
Improving Neural Network Promoter Prediction by Exploiting the Lengths of Coding and Non-Coding Sequences

Chapter in Advances of Computational Intelligence in Industrial Systems

Part of the book series: Studies in Computational Intelligence (SCI, volume 116)

Since the release of the first draft of the entire human DNA sequence in 2001, researchers have been inspired to continue with sequencing many other organisms. This has led to the creation of comprehensive sequencing libraries that are available for intensive study. The most recent genome to be sequenced is that of the gray, short-tailed opossum (Monodelphis domestica). This metatherian (“marsupial”) species, the first of its type to be sequenced, may offer researchers insight not only into the evolution of the architecture and functional organization of mammalian genomes, but also into the human genome [12].

Much attention within computational biology research has focused on identifying gene products and locations from experimentally obtained DNA sequences. Predicting promoter sequences and the positions of transcription start sites can facilitate the process of gene finding in DNA sequences. This is especially beneficial when the organisms of interest are higher eukaryotes, where the coding regions of the genes are situated in an expanse of non-coding DNA.

With the genomes of numerous organisms now completely sequenced, there is potential to gain invaluable biological information from these sequences. Computational prediction of promoters from nucleotide sequences is one of the most attractive topics in sequence analysis today. Current promoter prediction algorithms employ several gene features for prediction. These features include homology with known promoters, the presence of particular motifs within the sequence, DNA structural characteristics and the relative signatures of different regions in the sequence.
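As a rough illustration of one of the features listed above, the presence of particular motifs within the sequence, the following minimal Python sketch scans a DNA string for a TATA-box-like pattern. The consensus used here (TATAWAW, with W standing for A or T), the function names `find_motif` and `motif_features`, and the example sequence are assumptions made for illustration only; they are not taken from the chapter and do not represent the authors' method.

```python
import re

# Illustrative assumption: a TATA-box-like consensus TATAWAW (W = A or T),
# written as a regular expression. Not taken from the chapter.
TATA_BOX = "TATA[AT]A[AT]"


def find_motif(sequence, pattern=TATA_BOX):
    """Return 0-based start positions of all matches, including overlaps."""
    seq = sequence.upper()
    # A zero-width lookahead reports every start position, so overlapping
    # occurrences of the motif are not skipped.
    return [m.start() for m in re.finditer(f"(?=(?:{pattern}))", seq)]


def motif_features(sequence, pattern=TATA_BOX):
    """Summarize motif hits as simple features (hypothetical feature set)."""
    hits = find_motif(sequence, pattern)
    return {
        "count": len(hits),
        "first_position": hits[0] if hits else None,
    }


if __name__ == "__main__":
    example = "GGCCTATAAAAGGCGCGTATATATACC"  # made-up sequence
    print(find_motif(example))
    print(motif_features(example))
```

A real predictor would combine such motif hits with the other feature types mentioned above rather than rely on a single pattern match.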


References

  1. Bajic, V.B., Tan, S.L., Suzuki, Y. and Sugano, S. (2004) Promoter prediction analysis on the whole human genome, Nature Biotechnology, 22: 1467–1473.

  2. Burden, S., Lin, Y.-X. and Zhang, R. (2005) Improving promoter prediction for the NNPP2.2 algorithm: a case study using Escherichia coli DNA sequences, Bioinformatics, 21: 601–607.

  3. Chiaromonte, F., Miller, W. and Bouhassira, E.E. (2003) Gene length and proximity to neighbors affect genome-wide expression levels, Genome Research, 13: 2602–2608.

  4. Dai, Y., Zhang, R. and Lin, Y.-X. (2006) The probability distribution of distance TSS-TLS is organism characteristic and can be used for promoter prediction. In: Ali, M. and Dapoigny, R. (eds) Advances in Applied Artificial Intelligence – Lecture Notes in Artificial Intelligence (LNAI 4031). Springer, Heidelberg Berlin New York, pp. 927–934.

  5. Fickett, J.W. and Hatzigeorgiou, A.G. (1997) Eukaryotic promoter recognition, Genome Research, 7: 861–878.

  6. Garcia-Hernandez, M., Berardini, T., Chen, G., Crist, D., Doyle, A., Huala, E., Knee, E., Lambrecht, M., Miller, N., Mueller, L.A., Mundodi, S., Reiser, L., Rhee, S.Y., Scholl, R., Tacklind, J., Weems, D.C., Wu, Y., Xu, I., Yoo, D., Yoon, J. and Zhang, P. (2002) TAIR: a resource of integrated Arabidopsis data, Functional & Integrative Genomics, 2: 239–253.

  7. Gorm Pedersen, A., Baldi, P., Chauvin, Y. and Brunak, S. (1999) The biology of eukaryotic promoter prediction – a review, Computers & Chemistry, 23: 191–207.

  8. Hughes, T.A. (2006) Regulation of gene expression by alternative untranslated regions, Trends in Genetics, 22: 119–122.

  9. Knudsen, S. (1999) Promoter2.0: for the recognition of PolII promoter sequences, Bioinformatics, 15: 356–361.

  10. Lemos, B., Bettencourt, B.R., Meiklejohn, C.D. and Hartl, D.L. (2005) Evolution of proteins and gene expression levels are coupled in Drosophila and are independently associated with mRNA abundance, protein length, and number of protein–protein interactions, Molecular Biology and Evolution, 22: 1345–1354.

  11. Makita, Y., Nakao, M., Ogasawara, N. and Nakai, K. (2004) DBTBS: database of transcriptional regulation in Bacillus subtilis and its contribution to comparative genomics, Nucleic Acids Research, 32: D75–D77.

  12. Mikkelsen, T.S., Wakefield, M.J., Aken, B., Amemiya, C.T., Chang, J.L., Duke, D., Garber, M., Gentles, A.J., Goodstadt, L., Heger, A., Jurka, J., Kamal, M., Mauceli, E., Searle, S.M.J., Sharpe, T., Baker, M.L., Batzer, M.A., Benos, P.V., Belov, K., Clamp, M., Cook, A., Cuff, J., Das, R., Davidow, L., Deakin, J.E., Fazzari, M.J., Glass, J.L., Grabherr, M., Greally, J.M., Gu, W., Hore, T.A., Huttley, G.A., Kleber, M., Jirtle, R.L., Koina, E., Lee, J.T., Mahony, S., Marra, M.A., Miller, R.D., Nicholls, R.D., Oda, M., Papenfuss, A.T., Parra, Z.E., Pollock, D.D., Ray, D.A., Schein, J.E., Speed, T.P., Thompson, K., VandeBerg, J.L., Wade, C.M., Walker, J.A., Waters, P.D., Webber, C., Weidman, J.R., Xie, X., Zody, M.C., Broad Institute Genome Sequencing Platform, Broad Institute Whole Genome Assembly Team, Marshall Graves, J.A., Ponting, C.P., Breen, M., Samollow, P.B., Lander, E.S. and Lindblad-Toh, K. (2007) Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences, Nature, 447: 167–178.

  13. Ohler, U. and Niemann, H. (2001) Identification and analysis of eukaryotic promoters: recent computational approaches, Trends in Genetics, 17: 56–60.

  14. Pandey, S.P. and Krishnamachari, A. (2006) Computational analysis of plant RNA Pol-II promoters, BioSystems, 83: 38–50.

  15. Qiu, P. (2003a) Recent advances in computational promoter analysis in understanding the transcriptional regulatory network, Biochemical and Biophysical Research Communications, 309: 495–501.

  16. Qiu, P. (2003b) Computational approaches for deciphering the transcriptional regulatory network by promoter analysis, Biosilico, 4: 125–133.

  17. Reese, M.G. (2001) Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome, Computers and Chemistry, 26: 51–56.

  18. Salgado, H., Gama-Castro, S., Peralta-Gil, M., Díaz-Peredo, E., Sánchez-Solano, F., Santos-Zavaleta, A., Martínez-Flores, I., Jiménez-Jacinto, V., Bonavides-Martínez, C., Segura-Salazar, J., Martínez-Antonio, A. and Collado-Vides, J. (2006) RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions, Nucleic Acids Research, 34: D394–D397.

  19. Suzuki, Y., Yamashita, R., Sugano, S. and Nakai, K. (2004) DBTSS, DataBase of transcriptional start sites: progress report 2004, Nucleic Acids Research, 32: D78–D81.

  20. Tan, T., Frenkel, D., Gupta, V. and Deem, M.W. (2005) Length, protein–protein interactions, and complexity, Physica A, 350: 52–62.

  21. Wang, D., Hsieh, M. and Li, W. (2005) A general tendency for conservation of protein length across eukaryotic kingdoms, Molecular Biology and Evolution, 22: 142–147.

  22. Zhang, J. (2000) Protein-length distributions for the three domains of life, Trends in Genetics, 16: 107–109.

  23. Zhu, J. and Zhang, M.Q. (1998) SCPD: a promoter database of the yeast Saccharomyces cerevisiae, Bioinformatics, 15: 607–611.


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Caldwell, R., Dai, Y., Srivastava, S., Lin, YX., Zhang, R. (2008). Improving Neural Network Promoter Prediction by Exploiting the Lengths of Coding and Non-Coding Sequences. In: Liu, Y., Sun, A., Loh, H.T., Lu, W.F., Lim, EP. (eds) Advances of Computational Intelligence in Industrial Systems. Studies in Computational Intelligence, vol 116. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78297-1_10

  • DOI: https://doi.org/10.1007/978-3-540-78297-1_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78296-4

  • Online ISBN: 978-3-540-78297-1

  • eBook Packages: Engineering, Engineering (R0)
