Abstract
In gene prediction, studying phenotypes is highly valuable both for reducing the number of candidate loci in association studies and for prioritizing disease gene candidates. Phenotypes visibly reflect genetic activity, which makes them potentially one of the most useful data types for functional studies. However, systematic use of these data has begun only recently. ‘Comparative phenomics’ is the analysis of genotype–phenotype associations across species and experimental methods, an emerging research field of utmost importance for gene discovery and gene function annotation. In this chapter, we review the use of phenotype data in the biomedical field. We give an overview of phenotype resources, focusing on PhenomicDB, a cross-species genotype–phenotype database that is the largest available collection of phenotype descriptions across species and experimental methods. We report on its latest extension, in which genotype–phenotype relationships can be viewed as graphical representations of similar phenotypes clustered together (‘phenoclusters’), supplemented with information from protein–protein interactions and Gene Ontology terms. We show that such phenoclusters represent a novel approach to grouping genes functionally and to predicting novel gene functions with high precision. We explain how these data and methods can be used to supplement the results of gene discovery approaches. The aim of this chapter is to assist researchers interested in understanding how phenotype data can be used effectively in gene discovery.
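To make the idea of grouping genes by similar phenotype descriptions concrete, the following is a minimal sketch only. It is not the method used by PhenomicDB or described in the chapter: it assumes free-text phenotype descriptions, represents each as a bag of words, and uses a simple greedy grouping by cosine similarity in place of a full clustering algorithm. The gene names, descriptions, function names, and the 0.5 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a phenotype description and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def phenocluster(descriptions, threshold=0.5):
    """Greedy grouping (an illustrative simplification, not the chapter's
    algorithm): each gene joins the first cluster whose seed description
    is at least `threshold` similar, otherwise it starts a new cluster."""
    vectors = {gene: Counter(tokenize(d)) for gene, d in descriptions.items()}
    clusters = []  # each cluster is a list of gene names; element 0 is the seed
    for gene, vec in vectors.items():
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters


# Hypothetical toy data: gene name -> free-text phenotype description.
toy = {
    "geneA": "embryonic lethal with abnormal spindle orientation",
    "geneB": "embryonic lethal abnormal spindle",
    "geneC": "reduced sleep and shortened lifespan",
    "geneD": "reduced sleep",
}
clusters = phenocluster(toy)
print(clusters)
```

Genes that land in the same group share phenotype vocabulary, so an annotated member of a group could suggest a candidate function for an unannotated one. A realistic pipeline would likely add stemming, stop-word removal, and term weighting before clustering.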
Copyright information
© 2011 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Groth, P., Leser, U., Weiss, B. (2011). Phenotype Mining for Functional Genomics and Gene Discovery. In: Yu, B., Hinchcliffe, M. (eds) In Silico Tools for Gene Discovery. Methods in Molecular Biology, vol 760. Humana Press. https://doi.org/10.1007/978-1-61779-176-5_10
DOI: https://doi.org/10.1007/978-1-61779-176-5_10
Publisher Name: Humana Press
Print ISBN: 978-1-61779-175-8
Online ISBN: 978-1-61779-176-5
eBook Packages: Springer Protocols