
Exploring Homology Using the Concept of Three-State Entropy Vector

  • Armando J. Pinho
  • Sara P. Garcia
  • Paulo J. S. G. Ferreira
  • Vera Afreixo
  • Carlos A. C. Bastos
  • António J. R. Neves
  • João M. O. S. Rodrigues
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6282)

Abstract

The three-base periodicity usually found in exons has been used for several purposes, for example, the prediction of potential genes. In this paper, we use a data model, previously proposed for encoding the protein-coding regions of DNA sequences, to build signatures capable of supporting the construction of meaningful dendrograms. The model relies on the three-base periodicity and provides an estimate of the entropy associated with each of the three bases of the codons. We observe that the three entropy values differ from one another and also vary from species to species. Moreover, we provide evidence that this makes it possible to associate a three-state entropy vector with each species, and we show that similar species are characterized by similar three-state entropy vectors.
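A minimal Python sketch of the idea in the abstract follows, under stated assumptions. The in-frame coding sequence is modeled by three independent order-k finite-context (Markov) models, one per codon position, each with an additive (Laplace-style) estimator. The average code length at each position is taken as that position's entropy estimate. Species are then compared by Euclidean distance between their three-state vectors and grouped with UPGMA (average linkage), one common way to build a dendrogram. The function names `three_state_entropy` and `upgma`, the default order 2, the estimator parameter `alpha` and the distance measure are all hypothetical, illustrative choices, not the authors' settings.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def three_state_entropy(seq, order=2, alpha=1.0):
    """Average code length (bits/base) at each of the three codon positions.

    Assumes `seq` is an in-frame coding sequence (index 0 = first codon base).
    One order-`order` finite-context model is kept per codon position;
    `order` and `alpha` are illustrative assumptions, not published settings.
    """
    seq = seq.upper()
    # counts[phase][context][symbol], created lazily for each new context
    counts = [defaultdict(lambda: dict.fromkeys(ALPHABET, 0)) for _ in range(3)]
    bits, n = [0.0] * 3, [0] * 3
    for i in range(order, len(seq)):
        ctx, sym = seq[i - order:i], seq[i]
        if sym not in ALPHABET or any(c not in ALPHABET for c in ctx):
            continue  # skip ambiguous bases (e.g. N) without shifting the frame
        phase = i % 3
        table = counts[phase][ctx]
        # Additive estimate (Laplace when alpha = 1), taken before the update,
        # i.e. the cost an adaptive arithmetic coder would pay for this base.
        p = (table[sym] + alpha) / (sum(table.values()) + alpha * len(ALPHABET))
        bits[phase] -= math.log2(p)
        n[phase] += 1
        table[sym] += 1
    return [b / k if k else 0.0 for b, k in zip(bits, n)]


def upgma(vectors):
    """UPGMA clustering of named entropy vectors; returns a Newick-style string.

    Euclidean distance between vectors is an assumption of this sketch.
    """
    sizes = {name: 1 for name in vectors}
    names = list(vectors)
    dist = {frozenset((a, b)): math.dist(vectors[a], vectors[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
    while len(sizes) > 1:
        pair = min(dist, key=dist.get)  # closest pair of current clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        for c in list(sizes):
            if c not in pair:
                # Size-weighted mean of the two old distances (average linkage)
                dist[frozenset((merged, c))] = (
                    dist.pop(frozenset((a, c))) * sizes[a]
                    + dist.pop(frozenset((b, c))) * sizes[b]
                ) / (sizes[a] + sizes[b])
        del dist[pair]
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))
```

For example, `upgma({"x": [1.90, 1.90, 1.90], "y": [1.91, 1.90, 1.90], "z": [1.50, 1.80, 1.95]})` returns `"((x,y),z)"`, because the two nearly identical vectors are joined first. In practice, each vector would come from applying `three_state_entropy` to a species' pooled coding regions; how the regions are pooled is itself an assumption here.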

Keywords

DNA signature · DNA coding regions · DNA entropy · Markov models


Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Armando J. Pinho (1)
  • Sara P. Garcia (1)
  • Paulo J. S. G. Ferreira (1)
  • Vera Afreixo (2)
  • Carlos A. C. Bastos (1)
  • António J. R. Neves (1)
  • João M. O. S. Rodrigues (1)
  1. Signal Processing Lab, DETI / IEETA, University of Aveiro, Aveiro, Portugal
  2. Department of Mathematics, University of Aveiro, Aveiro, Portugal
