Classification and Data Analysis in Genome Projects: Some Aspects of Mapping, Alignment and Tree Reconstruction

  • Conference paper
  • In: Information Systems and Data Analysis

Summary

Some statistical aspects of classification and data analysis in genome projects are reviewed and discussed. The construction of genetic maps involves seriation and hierarchical classification problems. The alignment of molecular data and tree reconstruction from genetic distance data are methodological tasks that can be viewed as two steps in the analysis of DNA fragments sequenced in a genome project. The combination of these two steps and the availability of an enormous amount of data on several levels (primary, secondary order, etc.) provide many challenging problems and applications for classification methods.

Copyright information

© 1994 Springer-Verlag Berlin · Heidelberg

About this paper

Cite this paper

Lausen, B. (1994). Classification and Data Analysis in Genome Projects: Some Aspects of Mapping, Alignment and Tree Reconstruction. In: Bock, HH., Lenski, W., Richter, M.M. (eds) Information Systems and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-46808-7_33

  • DOI: https://doi.org/10.1007/978-3-642-46808-7_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58057-7

  • Online ISBN: 978-3-642-46808-7

  • eBook Packages: Springer Book Archive
