Since the release of the first draft of the complete human DNA sequence in 2001, researchers have gone on to sequence many other organisms. This has led to the creation of comprehensive sequence libraries that are available for intensive study. The most recent genome to be sequenced is that of the gray short-tailed opossum (Monodelphis domestica). This metatherian (“marsupial”) species, the first of its type to be sequenced, may offer researchers insight not only into the evolution of the architecture and functional organization of mammalian genomes, but also into the human genome itself [12].
Much attention within computational biology research has focused on identifying gene products and gene locations from experimentally obtained DNA sequences. Predicting promoter sequences and the positions of transcription start sites can facilitate gene finding in DNA sequences. This is especially beneficial when the organisms of interest are higher eukaryotes, where the coding regions of genes are embedded in large expanses of non-coding DNA.
With the genomes of numerous organisms now completely sequenced, there is potential to gain invaluable biological information from these sequences. Computational prediction of promoters from nucleotide sequences is one of the most attractive topics in sequence analysis today. Current promoter prediction algorithms employ several gene features for prediction. These include homology with known promoters, the presence of particular motifs within the sequence, DNA structural characteristics, and the relative signatures of different regions of the sequence.
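As a rough illustration of one of the features listed above, the presence of particular motifs within a sequence, the following minimal sketch scans a DNA string for a TATA-box-like consensus. It is not the method used in this chapter: the function name `motif_positions`, the default consensus `TATAWAW`, and the example sequence are assumptions chosen only for illustration.

```python
import re

# Subset of IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def motif_positions(sequence, consensus="TATAWAW"):
    """Return 0-based start positions where the consensus matches.

    The default consensus is an illustrative TATA-box-like pattern,
    not a value taken from the chapter.
    """
    pattern = "".join(IUPAC[code] for code in consensus)
    # A lookahead reports overlapping matches as well.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]


# Hypothetical example sequence, made up for this sketch.
example = "GGCCTATAAAAGGCGCGTATATATCC"
print(motif_positions(example))
```

Real promoter predictors typically combine several such signals, often as weighted scores rather than exact matches, so a consensus hit on its own should be read as weak evidence at best.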
References
Bajic, V.B., Tan, S.L., Suzuki, Y. and Sugano, S. (2004) Promoter prediction analysis on the whole human genome, Nature Biotechnology, 22: 1467–1473.
Burden, S., Lin, Y.-X. and Zhang, R. (2005) Improving promoter prediction for the NNPP2.2 algorithm: a case study using Escherichia coli DNA sequences, Bioinformatics, 21: 601–607.
Chiaromonte, F., Miller, W. and Bouhassira, E.E. (2003) Gene length and proximity to neighbors affect genome-wide expression levels, Genome Research, 13: 2602–2608.
Dai, Y., Zhang, R. and Lin, Y.-X. (2006) The probability distribution of distance TSS-TLS is organism characteristic and can be used for promoter prediction. In: Ali, M. and Dapoigny, R. (eds) Advances in Applied Artificial Intelligence – Lecture Notes in Artificial Intelligence (LNAI 4031). Springer, Berlin Heidelberg New York, pp. 927–934.
Fickett, J.W. and Hatzigeorgiou, A.G. (1997) Eukaryotic promoter recognition, Genome Research, 7: 861–878.
Garcia-Hernandez, M., Berardini, T., Chen, G., Crist, D., Doyle, A., Huala, E., Knee, E., Lambrecht, M., Miller, N., Mueller, L.A., Mundodi, S., Reiser, L., Rhee, S.Y., Scholl, R., Tacklind, J., Weems, D.C., Wu, Y., Xu, I., Yoo, D., Yoon, J. and Zhang, P. (2002) TAIR: a resource of integrated Arabidopsis data, Functional & Integrative Genomics, 2: 239–253.
Gorm Pedersen, A., Baldi, P., Chauvin, Y. and Brunak, S. (1999) The biology of eukaryotic promoter prediction – a review, Computers & Chemistry, 23: 191–207.
Hughes, T.A. (2006) Regulation of gene expression by alternative untranslated regions, Trends in Genetics, 22: 119–122.
Knudsen, S. (1999) Promoter2.0: for the recognition of PolII promoter sequences, Bioinformatics, 15: 356–361.
Lemos, B., Bettencourt, B.R., Meiklejohn, C.D. and Hartl, D.L. (2005) Evolution of proteins and gene expression levels are coupled in Drosophila and are independently associated with mRNA abundance, protein length, and number of protein–protein interactions, Molecular Biology and Evolution, 22: 1345–1354.
Makita, Y., Nakao, M., Ogasawara, N. and Nakai, K. (2004) DBTBS: database of transcriptional regulation in Bacillus subtilis and its contribution to comparative genomics, Nucleic Acids Research, 32: D75–D77.
Mikkelsen, T.S., Wakefield, M.J., Aken, B., Amemiya, C.T., Chang, J.L., Duke, D., Garber, M., Gentles, A.J., Goodstadt, L., Heger, A., Jurka, J., Kamal, M., Mauceli, E., Searle, S.M.J., Sharpe, T., Baker, M.L., Batzer, M.A., Benos, P.V., Belov, K., Clamp, M., Cook, A., Cuff, J., Das, R., Davidow, L., Deakin, J.E., Fazzari, M.J., Glass, J.L., Grabherr, M., Greally, J.M., Gu, W., Hore, T.A., Huttley, G.A., Kleber, M., Jirtle, R.L., Koina, E., Lee, J.T., Mahony, S., Marra, M.A., Miller, R.D., Nicholls, R.D., Oda, M., Papenfuss, A.T., Parra, Z.E., Pollock, D.D., Ray, D.A., Schein, J.E., Speed, T.P., Thompson, K., VandeBerg, J.L., Wade, C.M., Walker, J.A., Waters, P.D., Webber, C., Weidman, J.R., Xie, X., Zody, M.C., Broad Institute Genome Sequencing Platform, Broad Institute Whole Genome Assembly Team, Marshall Graves, J.A., Ponting, C.P., Breen, M., Samollow, P.B., Lander, E.S. and Lindblad-Toh, K. (2007) Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences, Nature, 447: 167–178.
Ohler, U. and Niemann, H. (2001) Identification and analysis of eukaryotic promoters: recent computational approaches, Trends in Genetics, 17: 56–60.
Pandey, S.P. and Krishnamachari, A. (2006) Computational analysis of plant RNA Pol-II promoters, BioSystems, 83: 38–50.
Qiu, P. (2003a) Recent advances in computational promoter analysis in understanding the transcriptional regulatory network, Biochemical and Biophysical Research Communications, 309: 495–501.
Qiu, P. (2003b) Computational approaches for deciphering the transcriptional regulatory network by promoter analysis, Biosilico, 4: 125–133.
Reese, M.G. (2001) Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome, Computers & Chemistry, 26: 51–56.
Salgado, H., Gama-Castro, S., Peralta-Gil, M., Díaz-Peredo, E., Sánchez-Solano, F., Santos-Zavaleta, A., Martínez-Flores, I., Jiménez-Jacinto, V., Bonavides-Martínez, C., Segura-Salazar, J., Martínez-Antonio, A. and Collado-Vides, J. (2006) RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions, Nucleic Acids Research, 34: D394–D397.
Suzuki, Y., Yamashita, R., Sugano, S. and Nakai, K. (2004) DBTSS, DataBase of transcriptional start sites: progress report 2004, Nucleic Acids Research, 32: D78–D81.
Tan, T., Frenkel, D., Gupta, V. and Deem, M.W. (2005) Length, protein–protein interactions, and complexity, Physica A, 350: 52–62.
Wang, D., Hsieh, M. and Li, W. (2005) A general tendency for conservation of protein length across eukaryotic kingdoms, Molecular Biology and Evolution, 22: 142–147.
Zhang, J. (2000) Protein-length distributions for the three domains of life, Trends in Genetics, 16: 107–109.
Zhu, J. and Zhang, M.Q. (1998) SCPD: a promoter database of the yeast Saccharomyces cerevisiae, Bioinformatics, 15: 607–611.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Caldwell, R., Dai, Y., Srivastava, S., Lin, YX., Zhang, R. (2008). Improving Neural Network Promoter Prediction by Exploiting the Lengths of Coding and Non-Coding Sequences. In: Liu, Y., Sun, A., Loh, H.T., Lu, W.F., Lim, EP. (eds) Advances of Computational Intelligence in Industrial Systems. Studies in Computational Intelligence, vol 116. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78297-1_10
DOI: https://doi.org/10.1007/978-3-540-78297-1_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78296-4
Online ISBN: 978-3-540-78297-1
eBook Packages: Engineering, Engineering (R0)