Mining Biological Data Using Pyramids

Polaillon, Géraldine; Vescovo, Laure; Michaut, Magali; Aude, Jean-Christophe

doi:10.1007/978-3-540-73560-1_37

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

This paper reviews promising applications of pyramidal classification to biological data. We show that overlapping and ordering properties can give new insights that cannot be obtained with more classical methods. We illustrate this point with three applications: (i) a genome-scale sequence analysis, (ii) a new progressive multiple sequence alignment method, and (iii) a cluster analysis of transcriptomic data.

Author information

Authors and Affiliations

Département Informatique, Supélec Plateau de Moulon, 3 rue Joliot-Curie, 91192, Gif-sur-Yvette cedex, France
Géraldine Polaillon & Laure Vescovo
Service de Biologie Intégrative et de Génétique Moléculaire CEA, CEA Saclay, 91191, Gif-sur-Yvette cedex, France
Magali Michaut & Jean-Christophe Aude

Editor information

Editors and Affiliations

Faculty of Economics, University of Porto, Rua Dr. Roberto Frias, 4200-464, Porto, Portugal
Paula Brito
ESG UQAM, 315 East, Sainte-Catherine Street, Montreal, Quebec, H2X 3X2, Canada
Guy Cucumel
Department Lussi, ENST Bretagne, 2 rue de la Châtaigneraie, CS 17607, 35576, Cesson-Sévigné Cedex, France
Patrice Bertrand
Centre of Computer Science (CIn), Federal University of Pernambuco (UFPE), Av. Prof. Luiz Freire s/n Cidade Universitária, CEP 50740-540, Recife-PE, Brazil
Francisco de Carvalho

About this chapter

Cite this chapter

Polaillon, G., Vescovo, L., Michaut, M., Aude, JC. (2007). Mining Biological Data Using Pyramids. In: Brito, P., Cucumel, G., Bertrand, P., de Carvalho, F. (eds) Selected Contributions in Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73560-1_37

DOI: https://doi.org/10.1007/978-3-540-73560-1_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73558-8
Online ISBN: 978-3-540-73560-1
eBook Packages: Mathematics and Statistics (R0)
