Neural Network System for Promoter Recognition

  • Vladimir B. Bajić
  • Ivan V. Bajić
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 45)


The computational prediction of regulatory components in genomic DNA is an attractive and complex research field. The main interest is in finding protein-coding genes in long stretches of non-mapped DNA. A particularly important part of gene finding is locating promoters, a specific group of regulatory components that lie just at the beginning of a gene and initiate the DNA transcription process. Computational methods for promoter recognition are not yet sufficiently developed, and current methods are prone to producing a large number of false predictions. We present a new method based on clustering the PCA-transformed DNA data, with further signal processing of the clustered data. The basic technical system consists of eleven neural networks (one SOM ANN and ten GRNNs). On an independent test set, the system shows increased recognition accuracy with a reduced level of false positive reporting. A special method of separating the data into a training set and a test set is used. The results achieved with the extended system appear to be currently the best in the class of those that use neural networks for promoter recognition.
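
As a rough illustration of two of the building blocks named in the abstract (a PCA transform followed by a GRNN), the following Python sketch may help. It is not the authors' system: the abstract does not specify the DNA encoding, the window length, the number of retained components, the SOM clustering stage, or how the ten GRNNs are combined, so the one-hot encoding, the toy 8-base sequences, `k = 2`, `sigma = 1.0`, and all function names below are assumptions made only for this example. The GRNN is written in its standard textbook form, a Gaussian-kernel weighted average of the training targets.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a flat vector of length 4*len(seq) (assumed encoding)."""
    idx = [BASES.index(b) for b in seq]
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), idx] = 1.0
    return out.ravel()

def pca_fit(X, k):
    """Return the data mean and the top-k principal directions (rows) of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def pca_transform(X, mean, components):
    """Project centred data onto the principal directions."""
    return (X - mean) @ components.T

def grnn_predict(train_x, train_y, query_x, sigma=1.0):
    """Standard GRNN output: Gaussian-kernel weighted average of training targets."""
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ train_y) / w.sum(axis=1)

# Toy data, invented for illustration only (not from the chapter).
promoter_like = ["TATAAAAA", "TATATAAA", "TATAAATA"]
background = ["GCGCGCGC", "CCGGCCGG", "GGCCGCGC"]
X = np.array([one_hot(s) for s in promoter_like + background])
y = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

mean, comps = pca_fit(X, k=2)          # k = 2 is an arbitrary choice
Z = pca_transform(X, mean, comps)      # PCA-transformed sequences
scores = grnn_predict(Z, y, Z, sigma=1.0)
print(np.round(scores, 3))
```

On this toy set the three TATA-like strings score above 0.5 and the three GC-rich strings below it. Scoring the training points themselves is done here only to keep the sketch short; it says nothing about accuracy on an independent test set, which is what the abstract reports.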


Transcription Start Site, Basic System, Extended System, False Recognition, Logic Block
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Vladimir B. Bajić (1)
  • Ivan V. Bajić (2)
  1. Centre for Engineering Research, Technikon Natal, Durban, South Africa
  2. Rensselaer Polytechnic Institute, Troy, USA
