Summary
Some statistical aspects of classification and data analysis in genome projects are reviewed and discussed. The construction of genetic maps involves seriation and hierarchical classification problems. The alignment of molecular data and the reconstruction of trees from genetic distance data are methodological tasks that can be seen as two steps in the data analysis of DNA fragments sequenced in a genome project. The combination of these two steps and the availability of an enormous amount of data on several levels (primary, secondary order, etc.) provide many challenging problems and applications for classification methods.
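The two steps named in the summary, alignment followed by the computation of genetic distances as input for tree reconstruction, can be illustrated with a minimal Python sketch. It is not the chapter's method: the function names are hypothetical, the scoring values (match, mismatch, gap) are arbitrary illustrative choices, and the Jukes-Cantor correction is assumed here as one common way to turn the proportion of differing sites into a distance.

```python
import math


def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch style global alignment with a linear gap penalty.

    Returns the two gapped strings. Scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def jukes_cantor_distance(aligned_a, aligned_b):
    """Jukes-Cantor distance from two aligned sequences, ignoring gapped sites."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:
        return math.inf  # correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(sequences):
    """Symmetric matrix of pairwise distances: step 1 (align), step 2 (distance)."""
    k = len(sequences)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            ga, gb = global_alignment(sequences[i], sequences[j])
            dist[i][j] = dist[j][i] = jukes_cantor_distance(ga, gb)
    return dist
```

A matrix produced this way would be the input to a distance-based tree reconstruction method. Because the distances depend on the alignment, errors in the first step propagate to the second, which is one reason the combination of the two steps is statistically demanding.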
© 1994 Springer-Verlag Berlin · Heidelberg
Cite this paper
Lausen, B. (1994). Classification and Data Analysis in Genome Projects: Some Aspects of Mapping, Alignment and Tree Reconstruction. In: Bock, HH., Lenski, W., Richter, M.M. (eds) Information Systems and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-46808-7_33
DOI: https://doi.org/10.1007/978-3-642-46808-7_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58057-7
Online ISBN: 978-3-642-46808-7
eBook Packages: Springer Book Archive