Part of the Computational Biology book series (COBO, volume 20)


This book serves as an introduction to the new and exciting field of comparative gene finding. We introduce the field in its current state and go through the process of constructing a comparative gene finder by breaking it down into its separate building blocks. Before diving into the algorithmic details, we give a brief introduction to the underlying biological theory. In this chapter we introduce the basic concepts of genetics needed for this book and define the gene finding problem we have set out to solve. We then give a brief account of how approaches to the gene finding problem have developed historically, up to where the field stands today. In the last section we split the process of building a gene finder into its smaller parts; the rest of the book follows the same structure.


Keywords: Splice Site, Gene Finding, Input Sequence, European Bioinformatics Institute, Saccharomyces Genome Database



Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  1. Chalmers University of Technology, Gothenburg, Sweden
