A method for classifying unaligned biological sequences

  • B. Tallur
  • J. Nicolas
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


The classification of protein sequences is of central importance in molecular biology. Biologists currently use various classification methods (Landès et al., 1992), but most of them require the sequences to be prealigned, and thus to be of equal length, using one of the several multiple alignment algorithms available, so that the sequences can be compared site by site. Two LLA-based approaches for classifying prealigned sequences have already been proposed (Lerman et al., 1994a), and their results compared favourably with those of most currently used methods. The first approach used the “preordonnance” coding and the second the idea of “significant windows”. The authors also suggested new directions of research towards a clustering method free from this rather strong prealignment constraint. The present paper reports recent developments of this research: a new method that, thanks to the “significant windows” approach, gets round the sequence comparison problem that arises when dealing with unaligned sequences.
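To illustrate why a window-based comparison does not need prealigned sequences of equal length, here is a minimal Python sketch. It is not the authors' LLA-based method: the function names `window_matches` and `best_window_similarity`, the fixed window width, and the plain identity count used as a score are all assumptions made for illustration only.

```python
def window_matches(seq_a, seq_b, width):
    """Score every pair of length-`width` windows by counting identical letters.

    Hypothetical helper for illustration; the identity count is a stand-in
    for whatever similarity index a real method would use.
    Returns a list of (score, start_a, start_b), best score first.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    pairs = []
    for i in range(len(seq_a) - width + 1):
        win_a = seq_a[i:i + width]
        for j in range(len(seq_b) - width + 1):
            win_b = seq_b[j:j + width]
            # Site-by-site comparison, but only inside the two windows.
            score = sum(x == y for x, y in zip(win_a, win_b))
            pairs.append((score, i, j))
    # Highest score first; ties broken by window positions.
    pairs.sort(key=lambda t: (-t[0], t[1], t[2]))
    return pairs


def best_window_similarity(seq_a, seq_b, width):
    """Fraction of identical letters in the best-matching window pair.

    Returns 0.0 when either sequence is shorter than the window.
    """
    pairs = window_matches(seq_a, seq_b, width)
    if not pairs:
        return 0.0
    return pairs[0][0] / width
```

The two sequences may have different lengths, because only windows of a common width are compared site by site. The exhaustive scan over all window pairs is quadratic in the sequence lengths, so a practical method would need a way to restrict the search.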


Window Size, Sequence Pair, Beam Search, Letter Pair, Significant Window
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Abe K., Gita N. (1982): Distances between strings of symbols: Review and remarks. ICPR6, Munich.
  2. Barker W.C., Hunt L., George D. (1988): Protein Seq. Data Anal., 1, 363.
  3. Dayhoff M.O., Barker W.C., Hunt L.T. (1983): Methods Enzymol., 91, 524–545.
  4. Dickerson R., Geis I. (1983): Hemoglobin, Benjamin/Cummings, Menlo Park, CA.
  5. Eriani G., Delarue E., Poch O., Gangloff J., Moras D. (1990): Partitions of tRNA synthetases into two classes based on mutually exclusive sets of sequence motifs. Nature, 347, 203–206.
  6. George D., Barker W., Hunt L. (1990): Mutation Data Matrix and Its Uses. Methods Enzymol., 183, 313–330.
  7. Gribskov M., McLachlan A., Eisenberg D. (1987): Profile analysis: Detection of distantly related proteins. Proc. Natl. Acad. Sci., 84, 4355–4358.
  8. Landès C., Henault A., Risler J.L. (1992): A comparison of several similarity indices used in the classification of protein sequences: a multivariate analysis. Nucleic Acids Research, 20, 3631–3637.
  9. Landès C., Perona J.J., Brunie S., Rould M.A., Zelwer C., Steitz T.A., Risler J.L. (1995): A structure-based multiple sequence alignment of all class I aminoacyl-tRNA synthetases. Biochimie, 77, 194–203.
  10. Lebbe J., Vignes R. (1992): Sélection d’un sous-ensemble de descripteurs maximalement discriminant. Troisièmes journées Symbolique-Numérique, Université de Paris Dauphine.
  11. Lebbe J., Vignes R. (1993): Local predictability in biological sequences, algorithms and application. Biochimie, 75, 371–378.
  12. Lerman I.C., Peter Ph., Leredde H. (1993): Principes et calculs de la méthode implantée dans le programme CHAVL (Classification Hiérarchique par Analyse de la Vraisemblance des Liens). La revue de Modulad, Numéro 12, Dec 93.
  13. Lerman I.C., Nicolas J., Tallur B., Peter Ph. (1994a): Classification of aligned biological sequences. New Approaches in Classification and Data Analysis, Springer Verlag, Berlin.
  14. Lerman I.C., Peter Ph., Risler J.L. (1994b): Matrices AVL pour la classification et alignement de séquences protéiques. Publication IRISA No 866, IRISA, Rennes, France.
  15. Risler J.L., Delorme M.O., Delacroix H., Hénault A. (1988): Amino acid substitutions in structurally related proteins: a pattern recognition approach. Determination of a new and efficient scoring matrix. Journal of Molecular Biology, 204, 1019–1029.

Copyright information

© Springer Japan 1998

Authors and Affiliations

  • B. Tallur (1)
  • J. Nicolas (1)
  1. Campus Universitaire de Beaulieu, IRISA, Rennes cedex, France
