Abstract
Semantic similarity searches in ontologies are an important component of many bioinformatic algorithms, e.g., protein function prediction with the Gene Ontology. In this paper we consider the exact computation of score distributions for similarity searches in ontologies, and introduce a simple null hypothesis which can be used to compute a P-value for the statistical significance of similarity scores. We concentrate on measures based on Resnik’s definition of ontological similarity. A new algorithm is proposed that collapses subgraphs of the ontology graph and thereby allows fast score distribution computation. The new algorithm is several orders of magnitude faster than the naive approach, as we demonstrate by computing score distributions for similarity searches in the Human Phenotype Ontology.
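To make the quantities in the abstract concrete, here is a minimal sketch of the naive approach that the abstract's faster algorithm is compared against. It is not the subgraph-collapsing algorithm, which the abstract does not describe in detail. Everything in it is an illustrative assumption: the toy ontology `PARENTS`, the annotations `ANNOTATIONS`, the scoring rule (average over query terms of the best Resnik match in the target), and the null hypothesis that every set of `q` distinct terms is an equally likely query.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import log

# Hypothetical toy ontology: term -> list of parent terms (a DAG rooted at "root").
PARENTS = {"root": [], "a": ["root"], "b": ["root"], "c": ["a"], "d": ["a", "b"]}

# Hypothetical annotations: object -> directly annotated terms.
ANNOTATIONS = {"obj1": {"c"}, "obj2": {"d"}, "obj3": {"b"}, "obj4": {"c", "d"}}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(fraction of objects annotated to t or one of its descendants)."""
    counts = Counter()
    for terms in ANNOTATIONS.values():
        counts.update(set().union(*(ancestors(t) for t in terms)))
    # Assumes every term has at least one annotated object, as in the toy data.
    return {t: -log(counts[t] / len(ANNOTATIONS)) for t in PARENTS}


IC = information_content()


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(IC[t] for t in ancestors(t1) & ancestors(t2))


def score(query, target):
    """Assumed scoring rule: mean over query terms of the best match in the target."""
    return sum(max(resnik(q, t) for t in target) for q in query) / len(query)


def score_distribution(target, q):
    """Exact null distribution by enumerating every query of q distinct terms."""
    tally = Counter()
    for query in combinations(sorted(PARENTS), q):
        # Round so that equal scores are not split by floating-point noise.
        tally[round(score(query, target), 12)] += 1
    total = sum(tally.values())
    return {s: Fraction(c, total) for s, c in sorted(tally.items())}


def p_value(observed, dist):
    """Probability under the null of a score at least as large as the observed one."""
    return sum(p for s, p in dist.items() if s >= observed - 1e-12)


dist = score_distribution(ANNOTATIONS["obj4"], q=2)
print(dist)
print(p_value(score(("c", "d"), ANNOTATIONS["obj4"]), dist))
```

The enumeration visits every one of the "V choose q" possible queries, so its cost grows combinatorially with the number of terms V and the query size q. That growth is the reason a faster exact method is needed for an ontology the size of the Human Phenotype Ontology.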
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schulz, M.H., Köhler, S., Bauer, S., Vingron, M., Robinson, P.N. (2009). Exact Score Distribution Computation for Similarity Searches in Ontologies. In: Salzberg, S.L., Warnow, T. (eds) Algorithms in Bioinformatics. WABI 2009. Lecture Notes in Computer Science, vol 5724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04241-6_25
DOI: https://doi.org/10.1007/978-3-642-04241-6_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04240-9
Online ISBN: 978-3-642-04241-6
eBook Packages: Computer Science (R0)