String Mathematics, BLAST, and FASTA

Chapter in Bioinformatics and the Cell

Abstract

What is an e-value for ungapped and gapped BLAST? What are the Karlin-Altschul parameters that affect e-value calculation? How do nucleotide frequencies and match-mismatch matrices affect these parameters? What are the key algorithms for FASTA and BLAST, and how do their differences affect the sensitivity of sequence searches? This chapter addresses these questions and illustrates applications of string matching in genomics, transcriptomics, and proteomics, as well as in drug discovery.
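Several of the questions above concern Karlin-Altschul statistics for ungapped local alignment. In that framework the expected number of chance alignments scoring at least S between sequences of lengths m and n is E = K m n exp(-λS), where λ is the unique positive root of Σ_ij p_i p_j exp(λ s_ij) = 1, with p_i the background letter frequencies and s_ij the match/mismatch scores. The following is a minimal Python sketch, not the chapter's own code: it assumes a simple match/mismatch scoring scheme, solves for λ by bisection, and takes K as a caller-supplied constant because computing K requires a more involved series. The function names and the value `K_ASSUMED` are hypothetical, introduced only for illustration.

```python
import math
from typing import Dict

K_ASSUMED = 0.1  # hypothetical K, for illustration only (not derived here)


def karlin_altschul_lambda(freqs: Dict[str, float], match: float,
                           mismatch: float, tol: float = 1e-12) -> float:
    """Solve sum_{i,j} p_i p_j exp(lam * s_ij) = 1 for the positive root lam.

    Scores follow a simple match/mismatch scheme: s_ij = match if i == j,
    otherwise mismatch.
    """
    pairs = [(freqs[a] * freqs[b], match if a == b else mismatch)
             for a in freqs for b in freqs]
    expected = sum(p * s for p, s in pairs)
    if expected >= 0 or match <= 0:
        raise ValueError("need a negative expected score and a positive match score")

    def f(lam: float) -> float:
        return sum(p * math.exp(lam * s) for p, s in pairs) - 1.0

    # f(0) = 0, f is convex and f'(0) = expected score < 0, so f is negative
    # just above 0 and crosses zero exactly once at the positive root.
    lo, hi = 0.0, 1.0
    while f(hi) <= 0:        # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:     # plain bisection
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def e_value(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul E-value, E = K * m * n * exp(-lam * S)."""
    return k * m * n * math.exp(-lam * score)


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
for label, freqs in (("uniform", uniform), ("AT-rich", at_rich)):
    lam = karlin_altschul_lambda(freqs, match=1, mismatch=-1)
    print(label, round(lam, 4), e_value(20, 100, 10**6, lam, K_ASSUMED))
```

With +1/-1 scoring and uniform base frequencies the equation reduces to 0.25 exp(λ) + 0.75 exp(-λ) = 1, so λ = ln 3 ≈ 1.099; for the AT-rich composition λ = ln(33/17) ≈ 0.663. A smaller λ means a given raw score is less surprising, so with the same placeholder K the AT-rich E-value for S = 20 comes out thousands of times larger than the uniform one.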

References

  • Abraham EP, Chain E (1940) An enzyme from bacteria able to destroy penicillin. Rev Infect Dis 10(4):677–678

  • Abraham EP, Chain E, Fletcher CM, Florey HW, Gardner AD, Heatley NG, Jennings MA (1941) Further observations on penicillin. Lancet 238(6155):177–189

  • Alderwick LJ, Seidel M, Sahm H, Besra GS, Eggeling L (2006) Identification of a novel arabinofuranosyltransferase (AftA) involved in cell wall arabinan biosynthesis in Mycobacterium tuberculosis. J Biol Chem 281(23):15653–15661

  • Altschul SF (1996) Local alignment statistics. Methods Enzymol 274:460–480

  • Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410

  • Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402

  • Arava Y, Wang Y, Storey JD, Liu CL, Brown PO, Herschlag D (2003) Genome-wide analysis of mRNA translation profiles in Saccharomyces cerevisiae. Proc Natl Acad Sci USA 100(7):3889–3894

  • Bastianelli G, Bouillon A, Nguyen C, Crublet E, Petres S, Gorgette O, Le-Nguyen D, Barale JC, Nilges M (2011) Computational reverse-engineering of a spider-venom derived peptide active against Plasmodium falciparum SUB1. PLoS One 6(7):e21812

  • Bennetzen JL, Hall BD (1982) Codon selection in yeast. J Biol Chem 257(6):3026–3031

  • Bergsten E, Uutela M, Li X, Pietras K, Ostman A, Heldin CH, Alitalo K, Eriksson U (2001) PDGF-D is a specific, protease-activated ligand for the PDGF beta-receptor. Nat Cell Biol 3(5):512–516

  • Bhatia B, Ponia SS, Solanki AK, Dixit A, Garg LC (2014) Identification of glutamate ABC-transporter component in Clostridium perfringens as a putative drug target. Bioinformation 10(7):401–405

  • Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE et al (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447(7146):799–816

  • Blanchette M, Tompa M (2002) Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res 12(5):739–748

  • Blanchette M, Bataille AR, Chen X, Poitras C, Laganiere J, Lefebvre C, Deblois G, Giguere V, Ferretti V, Bergeron D et al (2006) Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res 16(5):656–668

  • Burge C, Karlin S (1997) Prediction of complete gene structures in human genomic DNA. J Mol Biol 268:78–94

  • Chuang SE, Daniels DL, Blattner FR (1993) Global regulation of gene expression in Escherichia coli. J Bacteriol 175(7):2026–2036

  • Cox SS, van der Giezen M, Tarr SJ, Crompton MR, Tovar J (2006) Evidence from bioinformatics, expression and inhibition studies of phosphoinositide-3 kinase signalling in Giardia intestinalis. BMC Microbiol 6:45

  • David E, Tramontin T, Zemmel R (2009) Pharmaceutical R&D: the road to positive returns. Nat Rev Drug Discov 8(8):609–610

  • Deng W, Lee J, Wang H, Miller J, Reik A, Gregory PD, Dean A, Blobel GA (2012) Controlling long-range genomic interactions at a native locus by targeted tethering of a looping factor. Cell 149(6):1233–1244

  • Deng W, Rupon JW, Krivega I, Breda L, Motta I, Jahn KS, Reik A, Gregory PD, Rivella S, Dean A et al (2014b) Reactivation of developmentally silenced globin genes by forced chromatin looping. Cell 158(4):849–860

  • Doolittle RF, Hunkapiller MW, Hood LE, Devare SG, Robbins KC, Aaronson SA, Antoniades HN (1983) Simian sarcoma virus onc gene, v-sis, is derived from the gene (or genes) encoding a platelet-derived growth factor. Science 221(4607):275–277

  • Drews J, Ryser S (1997) The role of innovation in drug development. Nat Biotechnol 15(13):1318–1319

  • Ehnman M, Missiaglia E, Folestad E, Selfe J, Strell C, Thway K, Brodin B, Pietras K, Shipley J, Ostman A et al (2013) Distinct effects of ligand-induced PDGFRalpha and PDGFRbeta signaling in the human rhabdomyosarcoma tumor cell and stroma cell compartments. Cancer Res 73(7):2139–2149

  • Ezzell C (2002) Proteins rule. Sci Am 286(4):40–47

  • Fernandez-Pinar R, Lo Sciuto A, Rossi A, Ranucci S, Bragonzi A, Imperi F (2015) In vitro and in vivo screening for novel essential cell-envelope proteins in Pseudomonas aeruginosa. Sci Rep 5:17593

  • Figeys D (2002) Adapting arrays and lab-on-a-chip technology for proteomics. Proteomics 2(4):373–382

  • Figeys D (2003a) Novel approaches to map protein interactions. Curr Opin Biotechnol 14(1):119–125

  • Figeys D (2003b) Proteomics in 2002: a year of technical development and wide-ranging applications. Anal Chem 75(12):2891–2905

  • Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM et al (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269(5223):496–512

  • Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA, Fleischmann RD, Bult CJ, Kerlavage AR, Sutton G, Kelley JM et al (1995) The minimal gene complement of Mycoplasma genitalium. Science 270(5235):397–403

  • Frishman D, Mironov A, Mewes HW, Gelfand M (1998) Combining diverse evidence for gene recognition in completely sequenced bacterial genomes. Nucleic Acids Res 26(12):2941–2947

  • Gal-Mor O, Finlay BB (2006) Pathogenicity islands: a molecular toolbox for bacterial virulence. Cell Microbiol 8(11):1707–1719

  • Gibbs JB (2000) Mechanism-based target identification and drug discovery in cancer research. Science 287(5460):1969–1973

  • Gilbert WV, Zhou K, Butler TK, Doudna JA (2007) Cap-independent translation is required for starvation-induced differentiation in yeast. Science 317(5842):1224–1227

  • Gumbel EJ (1958) Statistics of extremes. Columbia University Press, New York

  • Hacker J, Kaper JB (2000) Pathogenicity islands and the evolution of microbes. Annu Rev Microbiol 54:641–679

  • Hacker J, Blum-Oehler G, Muhldorfer I, Tschape H (1997) Pathogenicity islands of virulent bacteria: structure, function and impact on microbial evolution. Mol Microbiol 23(6):1089–1097

  • Hayes WS, Borodovsky M (1998) How to interpret an anonymous bacterial genome: machine learning approach to gene identification. Genome Res 8(11):1154–1171

  • Heath JR, Ribas A, Mischel PS (2016) Single-cell analysis tools for drug discovery and development. Nat Rev Drug Discov 15(3):204–216

  • Hofer A, Steverding D, Chabes A, Brun R, Thelander L (2001) Trypanosoma brucei CTP synthetase: a target for the treatment of African sleeping sickness. Proc Natl Acad Sci U S A 98(11):6412–6416

  • Hou C, Zhao H, Tanimoto K, Dean A (2008) CTCF-dependent enhancer-blocking by alternative chromatin loop formation. Proc Natl Acad Sci U S A 105(51):20398–20403

  • Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinform 11:119

  • Ingolia NT, Ghaemmaghami S, Newman JRS, Weissman JS (2009) Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324(5924):218–223

  • Ingolia NT, Lareau LF, Weissman JS (2011) Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147(4):789–802

  • Ingram VM (1956) A specific chemical difference between the globins of normal human and sickle-cell anaemia haemoglobin. Nature 178(4537):792–794

  • Ingram VM (1957) Gene mutations in human haemoglobin: the chemical difference between normal and sickle cell haemoglobin. Nature 180(4581):326–328

  • Jacob F, Monod J (1961) Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol 3:318–356

  • Kaneko T, Tanaka A, Sato S, Kotani H, Sazuka T, Miyajima N, Sugiura M, Tabata S (1995) Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. I. Sequence features in the 1 Mb region from map positions 64% to 92% of the genome. DNA Res 2(4):153–166, 191–198

  • Kaneko T, Sato S, Kotani H, Tanaka A, Asamizu E, Nakamura Y, Miyajima N, Hirosawa M, Sugiura M, Sasamoto S et al (1996) Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res 3(3):109–136

  • Kioussis D, Vanin E, deLange T, Flavell RA, Grosveld FG (1983) Beta-globin gene inactivation by DNA translocation in gamma beta-thalassaemia. Nature 306(5944):662–666

  • Kozak M (1981) Possible role of flanking nucleotides in recognition of the AUG initiator codon by eukaryotic ribosomes. Nucleic Acids Res 9(20):5233–5252

  • Kozak M (1991) Effects of long 5′ leader sequences on initiation by eukaryotic ribosomes in vitro. Gene Expr 1(2):117–125

  • Kozak M (1999) Initiation of translation in prokaryotes and eukaryotes. Gene 234(2):187–208

  • Krasemann EW, Meier V, Korenke GC, Hunneman DH, Hanefeld F (1996) Identification of mutations in the ALD-gene of 20 families with adrenoleukodystrophy/adrenomyeloneuropathy. Hum Genet 97(2):194–197

  • Kutlar A (2007) Sickle cell disease: a multigenic perspective of a single gene disorder. Hemoglobin 31(2):209–224

  • Laemmli UK (1970) Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227:680–685

  • Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W et al (2001) Initial sequencing and analysis of the human genome. Nature 409(6822):860–921

  • Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO et al (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326(5950):289–293

  • Lipman DJ, Pearson WR (1985) Rapid and sensitive protein similarity searches. Science 227(4693):1435–1441

  • Liu X, Jiang H, Gu Z, Roberts JW (2013) High-resolution view of bacteriophage lambda gene expression by ribosome profiling. Proc Natl Acad Sci U S A 110(29):11928–11933

  • MacKay VL, Li X, Flory MR, Turcott E, Law GL, Serikawa KA, Xu XL, Lee H, Goodlett DR, Aebersold R et al (2004) Gene expression analyzed by high-resolution state array analysis and quantitative proteomics: response of yeast to mating pheromone. Mol Cell Proteomics 3(5):478–489

  • Madden SL, Galella EA, Zhu J, Bertelsen AH, Beaudry GA (1997) SAGE transcript profiles for p53-dependent growth regulation. Oncogene 15(9):1079–1085

  • Meyer IM, Durbin R (2004) Gene structure conservation aids similarity based gene prediction. Nucleic Acids Res 32(2):776–783

  • Moffat JG, Rudolph J, Bailey D (2014) Phenotypic screening in cancer drug discovery – past, present and future. Nat Rev Drug Discov 13(8):588–602

  • Morita M, Shimozawa N, Kashiwayama Y, Suzuki Y, Imanaka T (2011) ABC subfamily D proteins and very long chain fatty acid metabolism as novel targets in adrenoleukodystrophy. Curr Drug Targets 12(5):694–706

  • Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5(7):621–628

  • Needleman SB, Wunsch CD (1970) A general method applicable to the search of similarities in the amino acid sequence of two proteins. J Mol Biol 48:443–453

  • Noedl H, Se Y, Schaecher K, Smith BL, Socheat D, Fukuda MM (2008) Evidence of artemisinin-resistant malaria in western Cambodia. N Engl J Med 359(24):2619–2620

  • Noedl H, Socheat D, Satimai W (2009) Artemisinin-resistant malaria in Asia. N Engl J Med 361(5):540–541

  • Noedl H, Se Y, Sriwichai S, Schaecher K, Teja-Isavadharm P, Smith B, Rutvisuttinunt W, Bethell D, Surasri S, Fukuda MM et al (2010) Artemisinin resistance in Cambodia: a clinical trial designed to address an emerging problem in Southeast Asia. Clin Infect Dis 51(11):e82–e89

  • Palstra RJ, Tolhuis B, Splinter E, Nijmeijer R, Grosveld F, de Laat W (2003) The beta-globin nuclear compartment in development and erythroid differentiation. Nat Genet 35(2):190–194

  • Pauling L, Itano HA, Singer SJ, Wells IC (1949) Sickle cell anemia, a molecular disease. Science 110(2865):543–548

  • Pearson WR (1990) Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol 183:63–98

  • Pearson WR (1994) Using the FASTA program to search protein and DNA sequence databases. Methods Mol Biol 24:307–331

  • Pearson WR (1998) Empirical statistical estimates for sequence similarity searches. J Mol Biol 276(1):71–84

  • Pearson WR, Lipman DJ (1988) Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A 85:2444–2448

  • Pietras K, Sjoblom T, Rubin K, Heldin CH, Ostman A (2003) PDGF receptors as cancer drug targets. Cancer Cell 3(5):439–443

  • Poulos MG, Batra R, Charizanis K, Swanson MS (2011) Developments in RNA splicing and disease. Cold Spring Harb Perspect Biol 3(1):a000778

  • Press WH, Teukolsky SA, Vetterling WT, Flannery BP (1992) Numerical recipes in C: the art of scientific computing. Cambridge University Press, Cambridge

  • Robertson G, Hirst M, Bainbridge M, Bilenky M, Zhao Y, Zeng T, Euskirchen G, Bernier B, Varhol R, Delaney A et al (2007) Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods 4(8):651–657

  • Saadatpour A, Lai S, Guo G, Yuan GC (2015) Single-cell analysis in cancer genomics. Trends Genet 31(10):576–586

  • Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B, Kinzler KW, Velculescu VE (2002) Using the transcriptome to annotate the genome. Nat Biotechnol 20(5):508–512

  • Salzberg SL, Delcher AL, Kasif S, White O (1998) Microbial gene identification using interpolated Markov models. Nucleic Acids Res 26(2):544–548

  • Schena M (1996) Genome analysis with gene expression microarrays. BioEssays 18(5):427–431

  • Schena M (2003) Microarray analysis. Wiley-Liss, New York

  • Segurel L, Bon C (2017) On the evolution of lactase persistence in humans. Annu Rev Genomics Hum Genet 18:297–319

  • Shine J, Dalgarno L (1974a) The 3′-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc Natl Acad Sci U S A 71(4):1342–1346

  • Shine J, Dalgarno L (1974b) Identical 3′-terminal octanucleotide sequence in 18S ribosomal ribonucleic acid from different eukaryotes. A proposed role for this sequence in the recognition of terminator codons. Biochem J 141(3):609–615

  • Shine J, Dalgarno L (1975) Determinant of cistron specificity in bacterial ribosomes. Nature 254(5495):34–38

  • Shoemaker RH (2006) The NCI60 human tumour cell line anticancer drug screen. Nat Rev Cancer 6(10):813–823

  • Sloane AJ, Duff JL, Wilson NL, Gandhi PS, Hill CJ, Hopwood FG, Smith PE, Thomas ML, Cole RA, Packer NH et al (2002) High throughput peptide mass fingerprinting and protein macroarray analysis using chemical printing strategies. Mol Cell Proteomics 1(7):490–499

  • Smircich P, Eastman G, Bispo S, Duhagon MA, Guerra-Slompo EP, Garat B, Goldenberg S, Munroe DJ, Dallagiovanna B, Holetz F et al (2015) Ribosome profiling reveals translation control as a key mechanism generating differential gene expression in Trypanosoma cruzi. BMC Genomics 16:443

  • Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147(1):195–197

  • Smyth RP, Davenport MP, Mak J (2012) The origin of genetic diversity in HIV-1. Virus Res 169(2):415–429

  • Smyth RP, Schlub TE, Grimm AJ, Waugh C, Ellenberg P, Chopra A, Mallal S, Cromer D, Mak J, Davenport MP (2014) Identifying recombination hot spots in the HIV-1 genome. J Virol 88(5):2891–2902

  • Steinberg MH, Rodgers GP (2001) Pathophysiology of sickle cell disease: role of cellular and genetic modifiers. Semin Hematol 38(4):299–306

  • Steitz JA, Jakes K (1975) How ribosomes select initiator regions in mRNA: base pair formation between the 3′ terminus of 16S rRNA and the mRNA during initiation of protein synthesis in Escherichia coli. Proc Natl Acad Sci U S A 72(12):4734–4738

  • Stormo GD, Schneider TD, Gold L, Ehrenfeucht A (1982a) Use of the ‘Perceptron’ algorithm to distinguish translational initiation sites in E. coli. Nucleic Acids Res 10(9):2997–3011

  • Taniguchi T, Weissmann C (1978) Inhibition of Qbeta RNA 70S ribosome initiation complex formation by an oligonucleotide complementary to the 3′ terminal region of E. coli 16S ribosomal RNA. Nature 275(5682):770–772

  • Tao H, Bausch C, Richmond C, Blattner FR, Conway T (1999) Functional genomics: expression analysis of Escherichia coli growing on minimal and rich media. J Bacteriol 181(20):6425–6440

  • Taramelli R, Kioussis D, Vanin E, Bartram K, Groffen J, Hurst J, Grosveld FG (1986) Gamma delta beta-thalassaemias 1 and 2 are the result of a 100 kbp deletion in the human beta-globin cluster. Nucleic Acids Res 14(17):7017–7029

  • Tech M, Merkl R (2003) YACOP: enhanced gene prediction obtained by a combination of existing methods. In Silico Biol 3(4):441–451

  • Tolhuis B, Palstra RJ, Splinter E, Grosveld F, de Laat W (2002) Looping and interaction between hypersensitive sites in the active beta-globin locus. Mol Cell 10(6):1453–1465

  • Trudel MV, Vincent AT, Attere SA, Labbe M, Derome N, Culley AI, Charette SJ (2016) Diversity of antibiotic-resistance genes in Canadian isolates of Aeromonas salmonicida subsp. salmonicida: dominance of pSN254b and discovery of pAsa8. Sci Rep 6:35617

  • Vasilescu J, Figeys D (2006) Mapping protein-protein interactions by mass spectrometry. Curr Opin Biotechnol 17(4):394–399

  • Velculescu VE, Zhang L, Vogelstein B, Kinzler KW (1995) Serial analysis of gene expression. Science 270(5235):484–487

  • Velculescu VE, Zhang L, Zhou W, Vogelstein J, Basrai MA, Bassett DE Jr, Hieter P, Vogelstein B, Kinzler KW (1997) Characterization of the yeast transcriptome. Cell 88(2):243–251

  • Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA et al (2001) The sequence of the human genome. Science 291(5507):1304–1351

  • Vlasschaert C, Xia X, Coulombe J, Gray DA (2015) Evolution of the highly networked deubiquitinating enzymes USP4, USP15, and USP11. BMC Evol Biol 15:230

  • Washburn MP, Wolters D, Yates JR 3rd (2001) Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat Biotechnol 19(3):242–247

  • Waterfield MD, Scrace GT, Whittle N, Stroobant P, Johnsson A, Wasteson A, Westermark B, Heldin CH, Huang JS, Deuel TF (1983) Platelet-derived growth factor is structurally related to the putative transforming protein p28sis of simian sarcoma virus. Nature 304(5921):35–39

  • Waterman MS, Vingron M (1994) Rapid and accurate estimates of statistical significance for sequence data base searches. Proc Natl Acad Sci U S A 91(11):4625–4628

  • Weigert MG, Garen A (1965) Base composition of nonsense codons in E. coli. Evidence from amino-acid substitutions at a tryptophan site in alkaline phosphatase. Nature 206(4988):992–994

  • Wilson DS, Nock S (2002) Functional protein microarrays. Curr Opin Chem Biol 6(1):81–85

  • Wu J, Tzanakakis ES (2013) Deconstructing stem cell population heterogeneity: single-cell analysis and modeling approaches. Biotechnol Adv 31(7):1047–1062

  • Xia X (2013) DAMBE5: a comprehensive software package for data analysis in molecular biology and evolution. Mol Biol Evol 30:1720–1728

  • Xia X (2017d) Self-organizing map for characterizing heterogeneous nucleotide and amino acid sequence motifs. Computation 5(4):43

  • Yates JR (2004a) Mass spectral analysis in proteomics. Annu Rev Biophys Biomol Struct 33:297–316

  • Yates JR (2004b) Mass spectrometry as an emerging tool for systems biology. BioTechniques 36(6):917–919

  • Yoon JH, De S, Srikantan S, Abdelmohsen K, Grammatikakis I, Kim J, Kim KM, Noh JH, White EJ, Martindale JL et al (2014) PAR-CLIP analysis uncovers AUF1 impact on target RNA fate and genome integrity. Nat Commun 5:5248

  • Zhang L, Zhou W, Velculescu VE, Kern SE, Hruban RH, Hamilton SR, Vogelstein B, Kinzler KW (1997) Gene expression profiles in normal and cancer cells. Science 276(5316):1268–1272

Appendix: Being Colorful Is Not Enough—How Are Stop Codons Decoded?

In the old and not-so-good days of 1965, three stop codons were known to exist. At that time, almost all molecular biologists studied bacteriophages (T4 and λ) and their hosts (typically E. coli). Sometimes a phage would acquire a mutation from a sense codon to a stop codon, leading to a truncated protein. Such a phage could grow only in certain E. coli strains carrying a suppressor mutation (typically a tRNA with a mutated anticodon that can base-pair with a stop codon); these strains were called suppressor strains. The first set of nonsense mutations was discovered and isolated by Richard Epstein and Charles Steinberg. The stop codon resulting from these nonsense mutations was named after their friend Harris Bernstein, whose last name means “amber” in German, and the associated suppressor was termed the amber suppressor. Any phage with a nonsense mutation that could grow only in this particular set of suppressor strains was said to harbor an amber mutation. The observation that an amber stop codon could be suppressed (i.e., read as a sense codon) in suppressor strains but not in other strains suggests some ambiguity in the meaning of the stop codon: it is interpreted differently in suppressor and non-suppressor strains.

Phages with another set of nonsense mutations could grow only in a different set of E. coli strains; the stop codon resulting from these mutations was named ochre, and the associated suppressor was termed the ochre suppressor. The third stop codon was named opal. In short, there were three disjoint sets of suppressor strains corresponding to three stop codons, but biologists did not know which stop codon corresponded to amber, ochre, or opal.

Martin Weigert and Alan Garen (1965) studied one particular amino acid site in the E. coli alkaline phosphatase gene. This site is occupied by tryptophan, encoded by UGG. An amber mutation (i.e., a nonsense mutation that can be suppressed by an amber suppressor) occurred at this site. In several revertants (in which the amber mutation had reverted to a sense codon), the site was found to be occupied by seven different amino acids: Glu, Lys, Leu, Gln, Ser, Trp, and Tyr (Fig. 1.5). Can we now match the amber codon to one of the stop codons?

Fig. 1.5 Data for matching the amber mutation to one of the three possible stop codons

Because mutation is rare, we may assume that each revertant differs from the amber mutant by a single nucleotide. Fig. 1.6 shows the candidate codons for each of the three stop codons. UGA is the least likely candidate because it cannot generate a codon for Glu, Lys, Gln, or Tyr by a single nucleotide change. UAA can also be excluded because (1) it cannot generate the Trp codon UGG by a single nucleotide change, and (2) it would require a double mutation from the original UGG codon. In contrast, UAG can mutate to a codon for each of the seven amino acids, and it can arise from the original UGG through a single nucleotide replacement. So the amber codon is UAG. (I have simplified matters here: in 1965 the genetic code was only partially deciphered, so Weigert and Garen were less certain about the sense codons than this account suggests.)

Fig. 1.6 Bioinformatic analysis to match the amber codon to UAG. The numbers indicate the number of matched nucleotides between a sense codon and a stop codon. Amino acids are shown next to their codons
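The single-nucleotide argument above can be checked mechanically. The Python sketch below is an illustration, not part of the original analysis: it assumes the standard genetic code (NCBI translation table 1) written in the RNA alphabet, and the names `CODE`, `REVERTANTS`, `one_step_amino_acids`, and `classify` are hypothetical helpers introduced here. For each candidate stop codon it reports whether the codon is one substitution away from the original UGG and which revertant amino acids cannot be reached from it by a single substitution.

```python
from itertools import product

# Standard genetic code (assumed: NCBI translation table 1), RNA alphabet.
# Codons are enumerated with the first position varying slowest in the
# order U, C, A, G; '*' marks a stop codon.
BASES = "UCAG"
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_STRING)}

# Amino acids observed at the site in the revertants (Fig. 1.5).
REVERTANTS = {"E": "Glu", "K": "Lys", "L": "Leu", "Q": "Gln",
              "S": "Ser", "W": "Trp", "Y": "Tyr"}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def one_step_amino_acids(codon: str) -> set:
    """Amino acids (or '*') encoded by every codon one substitution away."""
    return {CODE[c] for c in CODE if hamming(codon, c) == 1}


def classify(stop: str, original: str = "UGG"):
    """Return (is `stop` one substitution from `original`,
    sorted revertant amino acids NOT reachable from `stop` in one step)."""
    reachable = one_step_amino_acids(stop)
    missing = sorted(name for aa, name in REVERTANTS.items()
                     if aa not in reachable)
    return hamming(original, stop) == 1, missing


for stop in ("UAA", "UAG", "UGA"):
    from_ugg, missing = classify(stop)
    print(f"{stop}: one step from UGG = {from_ugg}; unreachable revertants = {missing}")
```

Running it reports that only UAG is both one substitution from UGG and one substitution from a codon for every observed amino acid; UAA cannot reach Trp, and UGA cannot reach Gln, Glu, Lys, or Tyr, in line with the reasoning in the text. Note that the check relies on the fully deciphered modern code, which was not available in 1965.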

This example illustrates several decision-making principles that are useful to us. The first is the KISS principle (“Keep it simple, stupid”), that is, using simplifying assumptions to make decision-making easier. For example, if we assume that double mutations (i.e., changes at two sites of the same codon) are so rare as to be negligible, then we can immediately reject UAA and UGA, because each would require a double mutation to generate codons for some of the seven amino acids (Trp for UAA; Glu, Lys, Gln, and Tyr for UGA). In terms of model selection, only the UAG model fits the data under the model condition (the assumption of no double mutations).

The second is the parsimony principle. We have three alternative hypotheses corresponding to the three stop codons. The UAG hypothesis needs a minimum of seven single point mutations to generate the seven revertants, each with a different amino acid, plus one point mutation to connect it to the original UGG codon, for a total of eight. The UGA hypothesis needs at least two point mutations to reach a Glu, Lys, Gln, or Tyr codon and one to reach each of the other three amino acids, a minimum of 11 point mutations, plus one point mutation to connect the original UGG codon, for a total of 12. The UAA hypothesis needs two point mutations to reach the UGG (Trp) revertant and one to reach each of the other six amino acids, as well as two point mutations from the original UGG codon, for a minimum of ten point mutations. The UAG hypothesis is therefore the most parsimonious and is chosen over the other two alternatives.
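The parsimony tally can be reproduced the same way. This is a minimal Python sketch under the same assumption (standard genetic code, RNA alphabet), with hypothetical helper names `min_steps` and `parsimony_cost`; for each hypothesis it counts the substitutions from UGG to the stop codon plus the fewest substitutions from the stop codon to some codon for each revertant amino acid, treating each revertant as an independent event, as the text does.

```python
from itertools import product

# Standard genetic code (assumed: NCBI translation table 1), RNA alphabet,
# first codon position varying slowest in the order U, C, A, G.
BASES = "UCAG"
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_STRING)}

# One-letter codes for Glu, Lys, Leu, Gln, Ser, Trp, Tyr (Fig. 1.5).
REVERTANTS = ("E", "K", "L", "Q", "S", "W", "Y")


def hamming(a: str, b: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def min_steps(codon: str, aa: str) -> int:
    """Fewest substitutions from `codon` to any codon for amino acid `aa`."""
    return min(hamming(codon, c) for c, a in CODE.items() if a == aa)


def parsimony_cost(stop: str, original: str = "UGG") -> int:
    """Minimum point mutations under one stop-codon hypothesis:
    original -> stop, plus stop -> each revertant counted separately."""
    return hamming(original, stop) + sum(min_steps(stop, aa) for aa in REVERTANTS)


costs = {stop: parsimony_cost(stop) for stop in ("UAG", "UAA", "UGA")}
print(costs)
print("most parsimonious:", min(costs, key=costs.get))
```

The totals come out as 8 for UAG, 10 for UAA, and 12 for UGA, so UAG is the most parsimonious hypothesis.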

The third is the likelihood principle, which we defer until substitution models have been introduced in a later chapter.

Copyright information

© 2018 Springer Science+Business Media LLC

Cite this chapter

Xia, X. (2018). String Mathematics, BLAST, and FASTA. In: Bioinformatics and the Cell. Springer, Cham. https://doi.org/10.1007/978-3-319-90684-3_1
