Abstract
Ontologies are increasingly used to capture the semantics of information from various sources. Their applications range from artificial intelligence and natural language processing to web content and biology. This paper addresses the problem of finding similar objects, where each object is defined as a set of terms from an ontology. We consider tree-based ontologies in which a node represents a term and an edge weight defines the distance, or dissimilarity, between the corresponding terms. For the distance between objects, the Earth Mover’s Distance (EMD) is used, as it outperforms other distance measures such as average and minimum pairwise distance. EMD, however, is computationally intensive, as it requires solving a linear programming (LP) problem. We propose an efficient lower bound on EMD that is computed by aggregating the terms in the ontology at the first level of the tree. This reduces the number of terms, thereby decreasing the number of flow variables and making the computation faster. Range queries that use the lower bound run faster by up to a factor of 20, as approximately 97% of database objects are pruned, thereby saving expensive EMD calculations.
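The filter-and-refine idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. In place of the LP formulation the abstract describes, it uses the known closed form for EMD on a tree metric (the sum over edges of edge weight times the absolute difference in mass below that edge). Under that form, keeping only the first-level edges is one way to realise "aggregating the terms at the first level", and it can never exceed the full sum. The toy ontology `PARENT`, the uniform mass per term in `histogram`, and every function name below are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy ontology: child -> (parent, edge weight). "root" has no entry.
PARENT = {
    "A": ("root", 1.0),
    "B": ("root", 2.0),
    "A1": ("A", 0.5),
    "A2": ("A", 0.5),
    "B1": ("B", 1.0),
}


def histogram(terms):
    """Spread unit mass uniformly over an object's terms (an assumption)."""
    terms = set(terms)
    return {t: 1.0 / len(terms) for t in terms}


def subtree_mass(hist):
    """Return the mass located at or below each non-root node."""
    mass = defaultdict(float)
    for term, m in hist.items():
        node = term
        while node in PARENT:  # walk up until the root is reached
            mass[node] += m
            node = PARENT[node][0]
    return mass


def tree_emd(p, q, first_level_only=False):
    """EMD on a tree: sum of w(e) * |mass_p below e - mass_q below e|.

    With first_level_only=True only edges leaving the root are counted,
    which drops non-negative terms and therefore gives a lower bound.
    """
    mp, mq = subtree_mass(p), subtree_mass(q)
    total = 0.0
    for node, (parent, weight) in PARENT.items():
        if first_level_only and parent != "root":
            continue
        total += weight * abs(mp[node] - mq[node])
    return total


def range_query(query_terms, database, radius):
    """Return objects within `radius`, pruning with the lower bound first."""
    q = histogram(query_terms)
    hits, exact_calls = [], 0
    for name, terms in database.items():
        p = histogram(terms)
        if tree_emd(q, p, first_level_only=True) > radius:
            continue  # pruned: the exact distance cannot be within radius
        exact_calls += 1
        if tree_emd(q, p) <= radius:
            hits.append(name)
    return hits, exact_calls


if __name__ == "__main__":
    db = {"y": ["B1"], "z": ["A1"]}
    print(range_query(["A1", "A2"], db, radius=1.0))
```

In this toy query, object `y` is discarded by the lower bound alone, so the exact distance is evaluated only for `z`. With an LP-based EMD the saved exact evaluations would be the expensive part, which is the effect the abstract reports.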
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Saraswat, M. (2010). Efficiently Finding Similar Objects on Ontologies Using Earth Mover’s Distance. In: Bringas, P.G., Hameurlain, A., Quirchmayr, G. (eds) Database and Expert Systems Applications. DEXA 2010. Lecture Notes in Computer Science, vol 6262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15251-1_29
DOI: https://doi.org/10.1007/978-3-642-15251-1_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15250-4
Online ISBN: 978-3-642-15251-1
eBook Packages: Computer Science (R0)