String Mathematics, BLAST, and FASTA

Xia, Xuhua

doi:10.1007/978-3-319-90684-3_1

Abstract

What is an e-value for ungapped and gapped BLAST? What are the Karlin-Altschul parameters that affect e-value calculation? How do nucleotide frequencies and match-mismatch matrices affect such parameters? What are the key algorithms for FASTA and BLAST? How do their differences affect the sensitivity of sequence search? This chapter addresses these questions and illustrates applications of string matching in genomics, transcriptomics, and proteomics, as well as in drug discovery.

References

Abraham EP, Chain E (1940) An enzyme from bacteria able to destroy penicillin. Rev Infect Dis 10(4):677–678
Abraham EP, Chain E, Fletcher CM, Florey HW, Gardner AD, Heatley NG, Jennings MA (1941) Further observations on penicillin. Lancet 238(6155):177–189
Alderwick LJ, Seidel M, Sahm H, Besra GS, Eggeling L (2006) Identification of a novel arabinofuranosyltransferase (AftA) involved in cell wall arabinan biosynthesis in Mycobacterium tuberculosis. J Biol Chem 281(23):15653–15661
Altschul SF (1996) Local alignment statistics. Meth Enzymol 274:460–480
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
Arava Y, Wang Y, Storey JD, Liu CL, Brown PO, Herschlag D (2003) Genome-wide analysis of mRNA translation profiles in Saccharomyces cerevisiae. Proc Natl Acad Sci USA 100(7):3889–3894
Bastianelli G, Bouillon A, Nguyen C, Crublet E, Petres S, Gorgette O, Le-Nguyen D, Barale JC, Nilges M (2011) Computational reverse-engineering of a spider-venom derived peptide active against Plasmodium falciparum SUB1. PLoS One 6(7):e21812
Bennetzen JL, Hall BD (1982) Codon selection in yeast. J Biol Chem 257(6):3026–3031
Bergsten E, Uutela M, Li X, Pietras K, Ostman A, Heldin CH, Alitalo K, Eriksson U (2001) PDGF-D is a specific, protease-activated ligand for the PDGF beta-receptor. Nat Cell Biol 3(5):512–516
Bhatia B, Ponia SS, Solanki AK, Dixit A, Garg LC (2014) Identification of glutamate ABC-transporter component in Clostridium perfringens as a putative drug target. Bioinformation 10(7):401–405
Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE et al (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447(7146):799–816
Blanchette M, Tompa M (2002) Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res 12(5):739–748
Blanchette M, Bataille AR, Chen X, Poitras C, Laganiere J, Lefebvre C, Deblois G, Giguere V, Ferretti V, Bergeron D et al (2006) Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res 16(5):656–668
Burge C, Karlin S (1997) Prediction of complete gene structures in human genomic DNA. J Mol Biol 268:78–94
Chuang SE, Daniels DL, Blattner FR (1993) Global regulation of gene expression in Escherichia coli. J Bacteriol 175(7):2026–2036
Cox SS, van der Giezen M, Tarr SJ, Crompton MR, Tovar J (2006) Evidence from bioinformatics, expression and inhibition studies of phosphoinositide-3 kinase signalling in Giardia intestinalis. BMC Microbiol 6:45
David E, Tramontin T, Zemmel R (2009) Pharmaceutical R&D: the road to positive returns. Nat Rev Drug Discov 8(8):609–610
Deng W, Lee J, Wang H, Miller J, Reik A, Gregory PD, Dean A, Blobel GA (2012) Controlling long-range genomic interactions at a native locus by targeted tethering of a looping factor. Cell 149(6):1233–1244
Deng W, Rupon JW, Krivega I, Breda L, Motta I, Jahn KS, Reik A, Gregory PD, Rivella S, Dean A et al (2014b) Reactivation of developmentally silenced globin genes by forced chromatin looping. Cell 158(4):849–860
Doolittle RF, Hunkapiller MW, Hood LE, Devare SG, Robbins KC, Aaronson SA, Antoniades HN (1983) Simian sarcoma virus onc gene, v-sis, is derived from the gene (or genes) encoding a platelet-derived growth factor. Science 221(4607):275–277
Drews J, Ryser S (1997) The role of innovation in drug development. Nat Biotechnol 15(13):1318–1319
Ehnman M, Missiaglia E, Folestad E, Selfe J, Strell C, Thway K, Brodin B, Pietras K, Shipley J, Ostman A et al (2013) Distinct effects of ligand-induced PDGFRalpha and PDGFRbeta signaling in the human rhabdomyosarcoma tumor cell and stroma cell compartments. Cancer Res 73(7):2139–2149
Ezzell C (2002) Proteins rule. Sci Am 286(4):40–47
Fernandez-Pinar R, Lo Sciuto A, Rossi A, Ranucci S, Bragonzi A, Imperi F (2015) In vitro and in vivo screening for novel essential cell-envelope proteins in Pseudomonas aeruginosa. Sci Rep 5:17593
Figeys D (2002) Adapting arrays and lab-on-a-chip technology for proteomics. Proteomics 2(4):373–382
Figeys D (2003a) Novel approaches to map protein interactions. Curr Opin Biotechnol 14(1):119–125
Figeys D (2003b) Proteomics in 2002: a year of technical development and wide-ranging applications. Anal Chem 75(12):2891–2905
Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM et al (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269(5223):496–512
Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA, Fleischmann RD, Bult CJ, Kerlavage AR, Sutton G, Kelley JM et al (1995) The minimal gene complement of Mycoplasma genitalium. Science 270(5235):397–403
Frishman D, Mironov A, Mewes HW, Gelfand M (1998) Combining diverse evidence for gene recognition in completely sequenced bacterial genomes. Nucleic Acids Res 26(12):2941–2947
Gal-Mor O, Finlay BB (2006) Pathogenicity islands: a molecular toolbox for bacterial virulence. Cell Microbiol 8(11):1707–1719
Gibbs JB (2000) Mechanism-based target identification and drug discovery in cancer research. Science 287(5460):1969–1973
Gilbert WV, Zhou K, Butler TK, Doudna JA (2007) Cap-independent translation is required for starvation-induced differentiation in yeast. Science 317(5842):1224–1227
Gumbel EJ (1958) Statistics of extremes. Columbia University Press, New York
Hacker J, Kaper JB (2000) Pathogenicity islands and the evolution of microbes. Annu Rev Microbiol 54:641–679
Hacker J, Blum-Oehler G, Muhldorfer I, Tschape H (1997) Pathogenicity islands of virulent bacteria: structure, function and impact on microbial evolution. Mol Microbiol 23(6):1089–1097
Hayes WS, Borodovsky M (1998) How to interpret an anonymous bacterial genome: machine learning approach to gene identification. Genome Res 8(11):1154–1171
Heath JR, Ribas A, Mischel PS (2016) Single-cell analysis tools for drug discovery and development. Nat Rev Drug Discov 15(3):204–216
Hofer A, Steverding D, Chabes A, Brun R, Thelander L (2001) Trypanosoma brucei CTP synthetase: a target for the treatment of African sleeping sickness. Proc Natl Acad Sci U S A 98(11):6412–6416
Hou C, Zhao H, Tanimoto K, Dean A (2008) CTCF-dependent enhancer-blocking by alternative chromatin loop formation. Proc Natl Acad Sci U S A 105(51):20398–20403
Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinform 11:119
Ingolia NT, Ghaemmaghami S, Newman JRS, Weissman JS (2009) Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324(5924):218–223
Ingolia NT, Lareau LF, Weissman JS (2011) Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147(4):789–802
Ingram VM (1956) A specific chemical difference between the globins of normal human and sickle-cell anaemia haemoglobin. Nature 178(4537):792–794
Ingram VM (1957) Gene mutations in human haemoglobin: the chemical difference between normal and sickle cell haemoglobin. Nature 180(4581):326–328
Jacob F, Monod J (1961) Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol 3:318–356
Kaneko T, Tanaka A, Sato S, Kotani H, Sazuka T, Miyajima N, Sugiura M, Tabata S (1995) Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. I. Sequence features in the 1 Mb region from map positions 64% to 92% of the genome. DNA Res 2(4):153–166, 191–198
Kaneko T, Sato S, Kotani H, Tanaka A, Asamizu E, Nakamura Y, Miyajima N, Hirosawa M, Sugiura M, Sasamoto S et al (1996) Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res 3(3):109–136
Kioussis D, Vanin E, deLange T, Flavell RA, Grosveld FG (1983) Beta-globin gene inactivation by DNA translocation in gamma beta-thalassaemia. Nature 306(5944):662–666
Kozak M (1981) Possible role of flanking nucleotides in recognition of the AUG initiator codon by eukaryotic ribosomes. Nucleic Acids Res 9(20):5233–5252
Kozak M (1991) Effects of long 5′ leader sequences on initiation by eukaryotic ribosomes in vitro. Gene Expr 1(2):117–125
Kozak M (1999) Initiation of translation in prokaryotes and eukaryotes. Gene 234(2):187–208
Krasemann EW, Meier V, Korenke GC, Hunneman DH, Hanefeld F (1996) Identification of mutations in the ALD-gene of 20 families with adrenoleukodystrophy/adrenomyeloneuropathy. Hum Genet 97(2):194–197
Kutlar A (2007) Sickle cell disease: a multigenic perspective of a single gene disorder. Hemoglobin 31(2):209–224
Laemmli UK (1970) Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227:680–685
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W et al (2001) Initial sequencing and analysis of the human genome. Nature 409(6822):860–921
Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO et al (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326(5950):289–293
Lipman DJ, Pearson WR (1985) Rapid and sensitive protein similarity searches. Science 227(4693):1435–1441
Liu X, Jiang H, Gu Z, Roberts JW (2013) High-resolution view of bacteriophage lambda gene expression by ribosome profiling. Proc Natl Acad Sci U S A 110(29):11928–11933
MacKay VL, Li X, Flory MR, Turcott E, Law GL, Serikawa KA, Xu XL, Lee H, Goodlett DR, Aebersold R et al (2004) Gene expression analyzed by high-resolution state array analysis and quantitative proteomics: response of yeast to mating pheromone. Mol Cell Proteomics 3(5):478–489
Madden SL, Galella EA, Zhu J, Bertelsen AH, Beaudry GA (1997) SAGE transcript profiles for p53-dependent growth regulation. Oncogene 15(9):1079–1085
Meyer IM, Durbin R (2004) Gene structure conservation aids similarity based gene prediction. Nucleic Acids Res 32(2):776–783
Moffat JG, Rudolph J, Bailey D (2014) Phenotypic screening in cancer drug discovery – past, present and future. Nat Rev Drug Discov 13(8):588–602
Morita M, Shimozawa N, Kashiwayama Y, Suzuki Y, Imanaka T (2011) ABC subfamily D proteins and very long chain fatty acid metabolism as novel targets in adrenoleukodystrophy. Curr Drug Targets 12(5):694–706
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5(7):621–628
Needleman SB, Wunsch CD (1970) A general method applicable to the search of similarities in the amino acid sequence of two proteins. J Mol Biol 48:443–453
Noedl H, Se Y, Schaecher K, Smith BL, Socheat D, Fukuda MM (2008) Evidence of artemisinin-resistant malaria in western Cambodia. N Engl J Med 359(24):2619–2620
Noedl H, Socheat D, Satimai W (2009) Artemisinin-resistant malaria in Asia. N Engl J Med 361(5):540–541
Noedl H, Se Y, Sriwichai S, Schaecher K, Teja-Isavadharm P, Smith B, Rutvisuttinunt W, Bethell D, Surasri S, Fukuda MM et al (2010) Artemisinin resistance in Cambodia: a clinical trial designed to address an emerging problem in Southeast Asia. Clin Infect Dis 51(11):e82–e89
Palstra RJ, Tolhuis B, Splinter E, Nijmeijer R, Grosveld F, de Laat W (2003) The beta-globin nuclear compartment in development and erythroid differentiation. Nat Genet 35(2):190–194
Pauling L, Itano HA, Singer SJ, Wells IC (1949) Sickle cell anemia, a molecular disease. Science 110(2865):543–548
Pearson WR (1990) Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol 183:63–98
Pearson WR (1994) Using the FASTA program to search protein and DNA sequence databases. Methods Mol Biol 24:307–331
Pearson WR (1998) Empirical statistical estimates for sequence similarity searches. J Mol Biol 276(1):71–84
Pearson WR, Lipman DJ (1988) Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A 85:2444–2448
Pietras K, Sjoblom T, Rubin K, Heldin CH, Ostman A (2003) PDGF receptors as cancer drug targets. Cancer Cell 3(5):439–443
Poulos MG, Batra R, Charizanis K, Swanson MS (2011) Developments in RNA splicing and disease. Cold Spring Harb Perspect Biol 3(1):a000778
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (1992) Numerical recipes in C: the art of scientific computing. Cambridge University Press, Cambridge
Robertson G, Hirst M, Bainbridge M, Bilenky M, Zhao Y, Zeng T, Euskirchen G, Bernier B, Varhol R, Delaney A et al (2007) Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods 4(8):651–657
Saadatpour A, Lai S, Guo G, Yuan GC (2015) Single-cell analysis in cancer genomics. Trends Genet 31(10):576–586
Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B, Kinzler KW, Velculescu VE (2002) Using the transcriptome to annotate the genome. Nat Biotechnol 20(5):508–512
Salzberg SL, Delcher AL, Kasif S, White O (1998) Microbial gene identification using interpolated Markov models. Nucleic Acids Res 26(2):544–548
Schena M (1996) Genome analysis with gene expression microarrays. BioEssays 18(5):427–431
Schena M (2003) Microarray analysis. Wiley-Liss, New York
Segurel L, Bon C (2017) On the evolution of lactase persistence in humans. Annu Rev Genomics Hum Genet 18:297–319
Shine J, Dalgarno L (1974a) The 3′-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc Natl Acad Sci U S A 71(4):1342–1346
Shine J, Dalgarno L (1974b) Identical 3′-terminal octanucleotide sequence in 18S ribosomal ribonucleic acid from different eukaryotes. A proposed role for this sequence in the recognition of terminator codons. Biochem J 141(3):609–615
Shine J, Dalgarno L (1975) Determinant of cistron specificity in bacterial ribosomes. Nature 254(5495):34–38
Shoemaker RH (2006) The NCI60 human tumour cell line anticancer drug screen. Nat Rev Cancer 6(10):813–823
Sloane AJ, Duff JL, Wilson NL, Gandhi PS, Hill CJ, Hopwood FG, Smith PE, Thomas ML, Cole RA, Packer NH et al (2002) High throughput peptide mass fingerprinting and protein macroarray analysis using chemical printing strategies. Mol Cell Proteomics 1(7):490–499
Smircich P, Eastman G, Bispo S, Duhagon MA, Guerra-Slompo EP, Garat B, Goldenberg S, Munroe DJ, Dallagiovanna B, Holetz F et al (2015) Ribosome profiling reveals translation control as a key mechanism generating differential gene expression in Trypanosoma cruzi. BMC Genomics 16:443
Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147(1):195–197
Smyth RP, Davenport MP, Mak J (2012) The origin of genetic diversity in HIV-1. Virus Res 169(2):415–429
Smyth RP, Schlub TE, Grimm AJ, Waugh C, Ellenberg P, Chopra A, Mallal S, Cromer D, Mak J, Davenport MP (2014) Identifying recombination hot spots in the HIV-1 genome. J Virol 88(5):2891–2902
Steinberg MH, Rodgers GP (2001) Pathophysiology of sickle cell disease: role of cellular and genetic modifiers. Semin Hematol 38(4):299–306
Steitz JA, Jakes K (1975) How ribosomes select initiator regions in mRNA: base pair formation between the 3′ terminus of 16S rRNA and the mRNA during initiation of protein synthesis in Escherichia coli. Proc Natl Acad Sci U S A 72(12):4734–4738
Stormo GD, Schneider TD, Gold L, Ehrenfeucht A (1982a) Use of the ‘Perceptron’ algorithm to distinguish translational initiation sites in E. coli. Nucleic Acids Res 10(9):2997–3011
Taniguchi T, Weissmann C (1978) Inhibition of Qbeta RNA 70S ribosome initiation complex formation by an oligonucleotide complementary to the 3′ terminal region of E. coli 16S ribosomal RNA. Nature 275(5682):770–772
Tao H, Bausch C, Richmond C, Blattner FR, Conway T (1999) Functional genomics: expression analysis of Escherichia coli growing on minimal and rich media. J Bacteriol 181(20):6425–6440
Taramelli R, Kioussis D, Vanin E, Bartram K, Groffen J, Hurst J, Grosveld FG (1986) Gamma delta beta-thalassaemias 1 and 2 are the result of a 100 kbp deletion in the human beta-globin cluster. Nucleic Acids Res 14(17):7017–7029
Tech M, Merkl R (2003) YACOP: enhanced gene prediction obtained by a combination of existing methods. In Silico Biol 3(4):441–451
Tolhuis B, Palstra RJ, Splinter E, Grosveld F, de Laat W (2002) Looping and interaction between hypersensitive sites in the active beta-globin locus. Mol Cell 10(6):1453–1465
Trudel MV, Vincent AT, Attere SA, Labbe M, Derome N, Culley AI, Charette SJ (2016) Diversity of antibiotic-resistance genes in Canadian isolates of Aeromonas salmonicida subsp. salmonicida: dominance of pSN254b and discovery of pAsa8. Sci Rep 6:35617
Vasilescu J, Figeys D (2006) Mapping protein-protein interactions by mass spectrometry. Curr Opin Biotechnol 17(4):394–399
Velculescu VE, Zhang L, Vogelstein B, Kinzler KW (1995) Serial analysis of gene expression. Science 270(5235):484–487
Velculescu VE, Zhang L, Zhou W, Vogelstein J, Basrai MA, Bassett DE Jr, Hieter P, Vogelstein B, Kinzler KW (1997) Characterization of the yeast transcriptome. Cell 88(2):243–251
Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA et al (2001) The sequence of the human genome. Science 291(5507):1304–1351
Vlasschaert C, Xia X, Coulombe J, Gray DA (2015) Evolution of the highly networked deubiquitinating enzymes USP4, USP15, and USP11. BMC Evol Biol 15:230
Washburn MP, Wolters D, Yates JR 3rd (2001) Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat Biotechnol 19(3):242–247
Waterfield MD, Scrace GT, Whittle N, Stroobant P, Johnsson A, Wasteson A, Westermark B, Heldin CH, Huang JS, Deuel TF (1983) Platelet-derived growth factor is structurally related to the putative transforming protein p28^sis of simian sarcoma virus. Nature 304(5921):35–39
Waterman MS, Vingron M (1994) Rapid and accurate estimates of statistical significance for sequence data base searches. Proc Natl Acad Sci U S A 91(11):4625–4628
Weigert MG, Garen A (1965) Base composition of nonsense codons in E. coli: evidence from amino-acid substitutions at a tryptophan site in alkaline phosphatase. Nature 206(4988):992–994
Wilson DS, Nock S (2002) Functional protein microarrays. Curr Opin Chem Biol 6(1):81–85
Wu J, Tzanakakis ES (2013) Deconstructing stem cell population heterogeneity: single-cell analysis and modeling approaches. Biotechnol Adv 31(7):1047–1062
Xia X (2013) DAMBE5: a comprehensive software package for data analysis in molecular biology and evolution. Mol Biol Evol 30:1720–1728
Xia X (2017d) Self-organizing map for characterizing heterogeneous nucleotide and amino acid sequence motifs. Computation 5(4):43
Yates JR (2004a) Mass spectral analysis in proteomics. Annu Rev Biophys Biomol Struct 33:297–316
Yates JR (2004b) Mass spectrometry as an emerging tool for systems biology. BioTechniques 36(6):917–919
Yoon JH, De S, Srikantan S, Abdelmohsen K, Grammatikakis I, Kim J, Kim KM, Noh JH, White EJ, Martindale JL et al (2014) PAR-CLIP analysis uncovers AUF1 impact on target RNA fate and genome integrity. Nat Commun 5:5248
Zhang L, Zhou W, Velculescu VE, Kern SE, Hruban RH, Hamilton SR, Vogelstein B, Kinzler KW (1997) Gene expression profiles in normal and cancer cells. Science 276(5316):1268–1272

Author information

Authors and Affiliations

University of Ottawa CAREG and Biology Department, Ottawa, ON, Canada
Xuhua Xia

Appendix: Being Colorful Is Not Enough—How Are Stop Codons Decoded?

In the old and not-so-good days of 1965, three stop codons were known to exist. At that time, almost all molecular biologists studied bacteriophages (T4 and λ) and their hosts (typically E. coli). Sometimes a phage would carry a mutation from a sense codon to a stop codon, leading to a truncated protein. Such a phage can grow only in certain E. coli strains that contain a suppressor mutation (typically a tRNA with a mutated anticodon that can base-pair with a stop codon). These strains were called suppressor strains. The first set of nonsense mutations was discovered and isolated by Richard Epstein and Charles Steinberg. The stop codon resulting from these nonsense mutations was named amber after their friend Harris Bernstein, whose last name means “amber” in German. The associated suppressor was termed the amber suppressor. Any phage with a nonsense mutation that could grow only in this particular set of suppressor strains was said to harbor an amber mutation. The observation that an amber stop codon could be suppressed (i.e., recognized as a sense codon) in suppressor strains but not in other strains suggests some ambiguity in the meaning of the stop codon: it is interpreted differently in suppressor and non-suppressor strains.

Phages with another set of nonsense mutations could grow only in a different set of E. coli strains, and the stop codon resulting from these nonsense mutations was named ochre. The associated suppressor was termed the ochre suppressor. The third stop codon was named opal. In short, there were three disjoint sets of suppressor strains corresponding to the three stop codons, but biologists did not know which of UAA, UAG, and UGA was amber, ochre, or opal.

Martin Weigert and Alan Garen (1965) studied one particular amino acid site in the alkaline phosphatase gene of E. coli. This site is occupied by tryptophan, encoded by UGG. An amber mutation (i.e., a nonsense mutation that can be suppressed by an amber suppressor) occurred at this site. In several revertants (in which the amber mutation had reverted to a sense codon), the site was found to be occupied by one of seven different amino acids: Glu, Lys, Leu, Gln, Ser, Trp, and Tyr (Fig. 1.5). Can we now match the amber codon to one of the stop codons?

Because mutation is rare, we may assume that each revertant differs from the amber mutant by a single nucleotide. The candidate codons are shown in Fig. 1.6 for each of the three stop codons. UGA is the least likely candidate because it cannot generate a codon for Glu, Lys, Gln, or Tyr by a single nucleotide change. UAA can also be excluded because (1) it cannot generate the Trp codon UGG by a single nucleotide change, and (2) it could arise from the original UGG codon only through a double mutation. In contrast, UAG can mutate by a single nucleotide change to a codon for each of the seven amino acids, and it can arise from the original UGG through a single nucleotide replacement. So the amber codon is UAG. (I have made things easier here: Weigert and Garen could not be so sure about the sense codons, which were only partially decoded in 1965.)
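The single-nucleotide argument above can be checked mechanically. The following is a minimal Python sketch, not part of the original analysis: it assumes the standard genetic code (which, as noted, was only partially known in 1965), and the names `CODE`, `neighbors`, and `reachable` are illustrative. For each stop codon it lists which of the seven revertant amino acids cannot be reached by one nucleotide change, and whether the stop codon is itself one change away from the original UGG.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code with codons enumerated in UCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter codes for Glu, Lys, Leu, Gln, Ser, Trp, Tyr (the seven revertants).
REVERTANTS = set("EKLQSWY")
ORIGINAL = "UGG"  # the Trp codon at the site studied


def neighbors(codon):
    """Return all nine codons that differ from `codon` at exactly one position."""
    return [codon[:i] + b + codon[i + 1:]
            for i in range(3) for b in BASES if b != codon[i]]


def reachable(codon):
    """Amino acids encoded by the single-nucleotide neighbors of `codon`."""
    return {CODE[n] for n in neighbors(codon)} - {"*"}


for stop in ("UAA", "UAG", "UGA"):
    missing = sorted(REVERTANTS - reachable(stop))
    print(stop,
          "unreachable revertants:", missing,
          "| one step from UGG:", ORIGINAL in neighbors(stop))
```

Under these assumptions only UAG leaves no revertant unreachable while also being a single change away from UGG, which matches the conclusion in the text.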

This example illustrates several decision-making principles that are useful to us. The first is the KISS principle (“Keep it simple, stupid”): use simplifying assumptions to make decisions easier. For example, if we assume that a double mutation (i.e., mutations at two sites of the codon) is so rare as to be negligible, then we can immediately reject UAA and UGA because both require double mutations to account for all seven amino acids. In terms of model selection, we state that only the UAG model fits the data given the model condition (the assumption of no double mutation).

The second is the parsimony principle. We have three alternative hypotheses corresponding to the three stop codons. The UAG hypothesis needs a minimum of seven single point mutations to generate the seven revertants, each with a different amino acid. It also needs one point mutation to connect to the original UGG codon, for a total of eight. The UGA hypothesis needs at least two point mutations to change to a Glu, Lys, Gln, or Tyr codon, and one mutation to each of the other three amino acids. This comes to a minimum of 11 point mutations, plus one point mutation to connect to the original UGG codon, for a total of 12. The UAA hypothesis needs one point mutation to each of six amino acids and two point mutations to a UGG revertant (eight in all), as well as two point mutations from the original UGG codon, leading to a minimum of ten point mutations. The UAG hypothesis is therefore the most parsimonious and is chosen over the other two alternatives.
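The parsimony counts can be reproduced with a short Python sketch. It rests on two assumptions that are not stated in the original: the standard genetic code is used, and the minimum number of point mutations between two codons is taken to be the number of positions at which they differ (their Hamming distance). The function names are illustrative only.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code with codons enumerated in UCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

REVERTANTS = "EKLQSWY"  # Glu, Lys, Leu, Gln, Ser, Trp, Tyr
ORIGINAL = "UGG"        # the Trp codon at the site studied


def hamming(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def min_steps(codon, aa):
    """Fewest point mutations from `codon` to any codon encoding amino acid `aa`."""
    return min(hamming(codon, c) for c, a in CODE.items() if a == aa)


def parsimony_score(stop):
    """Return (mutations needed to reach all revertants, mutations from UGG)."""
    to_revertants = sum(min_steps(stop, aa) for aa in REVERTANTS)
    from_original = hamming(ORIGINAL, stop)
    return to_revertants, from_original


for stop in ("UAG", "UAA", "UGA"):
    rev, orig = parsimony_score(stop)
    print(f"{stop}: revertants={rev}, from UGG={orig}, total={rev + orig}")
```

With these assumptions the totals come to 8 for UAG, 10 for UAA, and 12 for UGA, in agreement with the counts given in the text.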

The third is the likelihood principle, but we will not go there without first learning about substitution models, which are covered in a later chapter.

About this chapter

Cite this chapter

Xia, X. (2018). String Mathematics, BLAST, and FASTA. In: Bioinformatics and the Cell. Springer, Cham. https://doi.org/10.1007/978-3-319-90684-3_1

DOI: https://doi.org/10.1007/978-3-319-90684-3_1
Published: 06 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-90682-9
Online ISBN: 978-3-319-90684-3
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)
