Abstract
This paper reviews promising applications of pyramidal classification to biological data. We show that the overlapping and ordering properties of pyramids can yield insights that cannot be obtained with more classical methods. We illustrate this point with three applications: (i) a genome-scale sequence analysis, (ii) a new progressive multiple sequence alignment method, and (iii) a cluster analysis of transcriptomic data.
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Polaillon, G., Vescovo, L., Michaut, M., Aude, JC. (2007). Mining Biological Data Using Pyramids. In: Brito, P., Cucumel, G., Bertrand, P., de Carvalho, F. (eds) Selected Contributions in Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73560-1_37
DOI: https://doi.org/10.1007/978-3-540-73560-1_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73558-8
Online ISBN: 978-3-540-73560-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)