Improving Neural Network Promoter Prediction by Exploiting the Lengths of Coding and Non-Coding Sequences

Caldwell, Rachel; Dai, Yun; Srivastava, Sheenal; Lin, Yan-Xia; Zhang, Ren

doi:10.1007/978-3-540-78297-1_10

Part of the book series: Studies in Computational Intelligence (SCI, volume 116)

Since the release of the first draft of the entire human DNA sequence in 2001, researchers have been inspired to sequence many other organisms. This has led to the creation of comprehensive sequencing libraries that are available for intensive study. The most recent genome to be sequenced is that of the gray short-tailed opossum (Monodelphis domestica). This metatherian (“marsupial”) species, the first of its type to be sequenced, may offer researchers insight not only into the evolution of the architecture and functional organization of mammalian genomes, but also into the human genome [12].

Much attention in computational biology research has focused on identifying gene products and locations from experimentally obtained DNA sequences. Predicting promoter sequences and the positions of transcription start sites can facilitate gene finding in DNA sequences. This is especially beneficial when the organisms of interest are higher eukaryotes, where the coding regions of the genes are situated in an expanse of non-coding DNA.

With the genomes of numerous organisms now completely sequenced, there is a potential to gain invaluable biological information from these sequences. Computational prediction of promoters from the nucleotide sequences is one of the most attractive topics in sequence analysis today. Current promoter prediction algorithms employ several gene features for prediction. These attributes include homology with known promoters, the presence of particular motifs within the sequence, DNA structural characteristics and the relative signatures of different regions in the sequence.
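As a rough illustration of one attribute named above, the presence of particular motifs within the sequence, the following sketch scans a DNA string for a simplified TATA-box-like pattern. The pattern `TATA[AT]A[AT]`, the function name `find_motif_hits`, and the toy sequence are assumptions made for this example only; they are not taken from the chapter, and real predictors generally rely on weight matrices or trained models rather than a fixed pattern.

```python
import re

# Hypothetical, simplified TATA-box-like consensus used only for illustration.
# The lookahead lets overlapping occurrences be reported as separate hits.
TATA_PATTERN = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_motif_hits(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based position, matched text) for every motif occurrence."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1)) for m in TATA_PATTERN.finditer(sequence)]


if __name__ == "__main__":
    toy_sequence = "gcgcTATAAAAggctataTATgc"  # made-up sequence, not real data
    for position, motif in find_motif_hits(toy_sequence):
        print(position, motif)
```

A motif hit on its own is weak evidence of a promoter, which is why such signals are usually combined with the other attributes listed above.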

References

[1] Bajic, V.B., Tan, S.L., Suzuki, Y. and Sugano, S. (2004) Promoter prediction analysis on the whole human genome, Nature Biotechnology, 22: 1467–1473.
[2] Burden, S., Lin, Y.-X. and Zhang, R. (2005) Improving promoter prediction for the NNPP2.2 algorithm: a case study using Escherichia coli DNA sequences, Bioinformatics, 21: 601–607.
[3] Chiaromonte, F., Miller, W. and Bouhassira, E.E. (2003) Gene length and proximity to neighbors affect genome-wide expression levels, Genome Research, 13: 2602–2608.
[4] Dai, Y., Zhang, R. and Lin, Y.-X. (2006) The probability distribution of distance TSS-TLS is organism characteristic and can be used for promoter prediction. In: Ali, M. and Dapoigny, R. (eds) Advances in Applied Artificial Intelligence – Lecture Notes in Artificial Intelligence (LNAI 4031). Springer, Heidelberg Berlin New York, pp. 927–934.
[5] Fickett, J.W. and Hatzigeorgiou, A.G. (1997) Eukaryotic promoter recognition, Genome Research, 7: 861–878.
[6] Garcia-Hernandez, M., Berardini, T., Chen, G., Crist, D., Doyle, A., Huala, E., Knee, E., Lambrecht, M., Miller, N., Mueller, L.A., Mundodi, S., Reiser, L., Rhee, S.Y., School, R., Tacklind, J., Weems, D.C., Wu, Y., Xu, I., Yoo, D., Yoon, J. and Zhang, P. (2002) TAIR: a resource of integrated Arabidopsis data, Functional & Integrative Genomics, 2: 239–253.
[7] Gorm Pedersen, A., Baldi, P., Chauvin, Y. and Brunak, S. (1999) The biology of eukaryotic promoter prediction – a review, Computers & Chemistry, 23: 191–207.
[8] Hughes, T.A. (2006) Regulation of gene expression by alternative untranslated regions, Trends in Genetics, 22: 119–122.
[9] Knudsen, S. (1999) Promoter2.0: for the recognition of PolII promoter sequences, Bioinformatics, 15: 356–361.
[10] Lemos, B., Bettencourt, B.R., Meiklejohn, C.D. and Hartl, D.L. (2005) Evolution of proteins and gene expression levels are coupled in Drosophila and are independently associated with mRNA abundance, protein length, and number of protein–protein interactions, Molecular Biology and Evolution, 22: 1345–1354.
[11] Makita, Y., Nakao, M., Ogasawara, N. and Nakai, K. (2004) DBTBS: database of transcriptional regulation in Bacillus subtilis and its contribution to comparative genomics, Nucleic Acids Research, 32: D75–D77.
[12] Mikkelsen, T.S., Wakefield, M.J., Aken, B., Amemiya, C.T., Chang, J.L., Duke, D., Garber, M., Gentles, A.J., Goodstadt, L., Heger, A., Jurka, J., Kamal, M., Mauceli, E., Searle, S.M.J., Sharpe, T., Baker, M.L., Batzer, M.A., Benos, P.V., Belov, K., Clamp, M., Cook, A., Cuff, J., Das, R., Davidow, L., Deakin, J.E., Fazzari, M.J., Glass, J.L., Grabherr, M., Greally, J.M., Gu, W., Hore, T.A., Huttley, G.A., Kleber, M., Jirtle, R.L., Koina, E., Lee, J.T., Mahony, S., Marra, M.A., Miller, R.D., Nicholls, R.D., Oda, M., Papenfuss, A.T., Parra, Z.E., Pollock, D.D., Ray, D.A., Schein, J.E., Speed, T.P., Thompson, K., VandeBerg, J.L., Wade, C.M., Walker, J.A., Waters, P.D., Webber, C., Weidman, J.R., Xie, X., Zody, M.C., Broad Institute Genome Sequencing Platform, Broad Institute Whole Genome Assembly Team, Marshall Graves, J.A., Ponting, C.P., Breen, M., Samollow, P.B., Lander, E.S. and Lindblad-Toh, K. (2007) Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences, Nature, 447: 167–178.
[13] Ohler, U. and Niemann, H. (2001) Identification and analysis of eukaryotic promoters: recent computational approaches, Trends in Genetics, 17: 56–60.
[14] Pandey, S.P. and Krishnamachari, A. (2006) Computational analysis of plant RNA Pol-II promoters, BioSystems, 83: 38–50.
[15] Qui, P. (2003a) Recent advances in computational promoter analysis in understanding the transcriptional regulatory network, Biochemical and Biophysical Research Communications, 309: 495–501.
[16] Qui, P. (2003b) Computational approaches for deciphering the transcriptional regulatory network by promoter analysis, Biosilico, 4: 125–133.
[17] Reese, M.G. (2001) Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome, Computers and Chemistry, 26: 51–56.
[18] Salgado, H., Gama-Castro, S., Peralta-Gil, M., Díaz-Peredo, E., Sánchez-Solano, F., Santos-Zavaleta, A., Martínez-Flores, I., Jiménez-Jacinto, V., Bonavides-Martínez, C., Segura-Salazar, J., Martínez-Antonio, A. and Collado-Vides, J. (2006) RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions, Nucleic Acids Research, 34: D394–D397.
[19] Suzuki, Y., Yamashita, R., Sugano, S. and Nakai, K. (2004) DBTSS, DataBase of transcriptional start sites: progress report 2004, Nucleic Acids Research, 32: D78–D81.
[20] Tan, T., Frenkel, D., Gupta, V. and Deem, M.W. (2005) Length, protein–protein interactions, and complexity, Physica A, 350: 52–62.
[21] Wang, D., Hsieh, M. and Li, W. (2005) A general tendency for conservation of protein length across eukaryotic kingdoms, Molecular Biology and Evolution, 22: 142–147.
[22] Zhang, J. (2000) Protein-length distributions for the three domains of life, Trends in Genetics, 16: 107–109.
[23] Zhu, J. and Zhang, M.Q. (1998) SCPD: a promoter database of the yeast Saccharomyces cerevisiae, Bioinformatics, 15: 607–611.

Author information

Authors and Affiliations

School of Biological Science, University of Wollongong, Australia
Rachel Caldwell & Ren Zhang
School of Mathematics and Applied Statistics, University of Wollongong, Australia
Yun Dai, Sheenal Srivastava & Yan-Xia Lin

Editor information

Editors and Affiliations

Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China
Ying Liu
School of Computer Engineering, Nanyang Technological University, Nanyang Avenue, Singapore, 639798
Aixin Sun & Ee-Peng Lim
Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117576
Han Tong Loh & Wen Feng Lu

Cite this chapter

Caldwell, R., Dai, Y., Srivastava, S., Lin, YX., Zhang, R. (2008). Improving Neural Network Promoter Prediction by Exploiting the Lengths of Coding and Non-Coding Sequences. In: Liu, Y., Sun, A., Loh, H.T., Lu, W.F., Lim, EP. (eds) Advances of Computational Intelligence in Industrial Systems. Studies in Computational Intelligence, vol 116. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78297-1_10

DOI: https://doi.org/10.1007/978-3-540-78297-1_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78296-4
Online ISBN: 978-3-540-78297-1
eBook Packages: Engineering, Engineering (R0)
