Classification and Data Analysis in Genome Projects: Some Aspects of Mapping, Alignment and Tree Reconstruction

Lausen, Berthold

doi:10.1007/978-3-642-46808-7_33

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Summary

Some statistical aspects of classification and data analysis in genome projects are reviewed and discussed. The construction of genetic maps involves seriation and hierarchical classification problems. The alignment of molecular data and tree reconstruction from genetic distance data are methodological tasks that can be seen as two successive steps in the analysis of DNA fragments sequenced in a genome project. The combination of these two steps, together with the enormous amount of data available on several levels (primary order, secondary order, etc.), provides many challenging problems and applications for classification methods.
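
The two steps named above (alignment, then tree reconstruction from genetic distances) can be chained as in the following minimal Python sketch. The specific choices are illustrative assumptions, not methods taken from the chapter: Needleman-Wunsch global alignment with a simple linear scoring scheme (match +1, mismatch -1, gap -1), the Jukes-Cantor correction applied to the proportion of differing ungapped sites, and UPGMA as the hierarchical clustering step. The function names and the toy sequences are hypothetical.

```python
import math
from itertools import combinations


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b with a linear gap penalty (illustrative scores)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for the pair a[i-1], b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path from the bottom-right cell
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def jukes_cantor(x, y):
    """Jukes-Cantor distance between two aligned sequences; gap columns are skipped."""
    pairs = [(u, v) for u, v in zip(x, y) if u != "-" and v != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(1 for u, v in pairs if u != v) / len(pairs)
    if p >= 0.75:
        return math.inf  # saturation: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


def upgma(names, dist):
    """UPGMA on a dict mapping frozenset({x, y}) -> distance; returns nested tuples."""
    clusters = {name: (name, 1) for name in names}  # label -> (subtree, size)
    d = dict(dist)
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair (first one found on ties)
        x, y = sorted(pair)       # deterministic order of the two labels
        (tx, nx), (ty, ny) = clusters.pop(x), clusters.pop(y)
        del d[pair]
        new = f"({x},{y})"
        for k in clusters:
            # Size-weighted average of the distances to the two merged clusters
            dxk = d.pop(frozenset((x, k)))
            dyk = d.pop(frozenset((y, k)))
            d[frozenset((new, k))] = (nx * dxk + ny * dyk) / (nx + ny)
        clusters[new] = ((tx, ty), nx + ny)
    return next(iter(clusters.values()))[0]


def tree_from_sequences(seqs):
    """Align every pair, convert to Jukes-Cantor distances, then cluster with UPGMA."""
    names = sorted(seqs)
    dist = {}
    for x, y in combinations(names, 2):
        ax, ay = needleman_wunsch(seqs[x], seqs[y])
        dist[frozenset((x, y))] = jukes_cantor(ax, ay)
    return upgma(names, dist)


if __name__ == "__main__":
    # Hypothetical toy sequences: A/B and C/D each differ at one site
    toy = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAC",
           "C": "AAAAAGGGGG", "D": "AAAAAGGGGT"}
    print(tree_from_sequences(toy))  # (('A', 'B'), ('C', 'D'))
```

UPGMA was chosen here only for brevity; it assumes a constant rate of evolution (a molecular clock), which least-squares fitting of an additive tree or neighbor joining does not, and real data would call for a biologically motivated scoring scheme rather than these toy scores.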

Author information

Authors and Affiliations

Forschungsinstitut für Kinderernährung Dortmund (FKE), Heinstück 11, D-44225, Dortmund, Germany
Berthold Lausen

Editor information

Editors and Affiliations

Institut für Statistik, Rheinisch-Westfälische Technische Hochschule Aachen (RWTH), Wüllnerstr. 3, D-52056, Aachen, Germany
Hans-Hermann Bock
Forschungsstelle “Mathematische Logik” der Heidelberger Akademie der Wissenschaften Fachbereich Informatik, Universität Kaiserslautern, Erwin-Schrödinger-Str. 57, D-67653, Kaiserslautern, Germany
Wolfgang Lenski
Fachbereich Informatik, Universität Kaiserslautern, Erwin-Schrödinger-Str. 57, D-67653, Kaiserslautern, Germany
Michael M. Richter
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (DFKI), Erwin-Schrödinger-Str. 57, D-67653, Kaiserslautern, Germany
Michael M. Richter

About this paper

Cite this paper

Lausen, B. (1994). Classification and Data Analysis in Genome Projects: Some Aspects of Mapping, Alignment and Tree Reconstruction. In: Bock, HH., Lenski, W., Richter, M.M. (eds) Information Systems and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-46808-7_33

DOI: https://doi.org/10.1007/978-3-642-46808-7_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58057-7
Online ISBN: 978-3-642-46808-7
eBook Packages: Springer Book Archive