Abstract
There is significant knowledge captured through annotations on the life sciences Web. In past research, we developed a methodology that uses support and confidence metrics from association rule mining to mine the association bridge (of termlinks) between pairs of controlled vocabulary (CV) terms across two ontologies. That (naive) approach did not exploit two sources of knowledge: the implicit knowledge captured in the hierarchical is-a structure of ontologies, and patterns of annotation in datasets that may affect the distribution of parent/child or sibling CV terms. In this research, we consider this knowledge. We aggregate termlinks over the sibling CV terms under a parent CV term and use them as additional evidence to boost the support and confidence scores of the parent CV term's associations. A weight factor (α) reflects the contribution from the child CV terms; its value can be varied to reflect the variance of confidence values among the sibling CV terms of a parent CV term. We illustrate the benefits of exploiting this knowledge through experimental evaluation.
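The aggregation described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: termlinks are represented as simple counts, child counts are added to the parent's counts after scaling by α, and the names `boosted_scores`, `pair_counts`, `term_counts`, and `children` are hypothetical rather than taken from the paper.

```python
from collections import Counter


def boosted_scores(parent, target, pair_counts, term_counts, children,
                   total_links, alpha):
    """Return (support, confidence) for the association parent -> target.

    Assumed scheme (not confirmed by the abstract): termlink counts of the
    parent's child terms are added to the parent's own counts, scaled by
    alpha. With alpha = 0 this reduces to plain support and confidence.
    """
    kids = children.get(parent, [])

    # Termlinks that connect the parent (or a child) to the target term.
    pair = pair_counts[(parent, target)] + alpha * sum(
        pair_counts[(child, target)] for child in kids
    )
    # All termlinks that involve the parent (or a child) on the source side.
    base = term_counts[parent] + alpha * sum(term_counts[c] for c in kids)

    support = pair / total_links if total_links else 0.0
    confidence = pair / base if base else 0.0
    return support, confidence


# Hypothetical toy data: parent term "P" with children "C1" and "C2",
# associated with a term "T" from the other ontology.
pair_counts = Counter({("P", "T"): 2, ("C1", "T"): 6, ("C2", "T"): 2})
term_counts = Counter({"P": 4, "C1": 8, "C2": 4})
children = {"P": ["C1", "C2"]}

naive = boosted_scores("P", "T", pair_counts, term_counts, children, 100, 0.0)
boosted = boosted_scores("P", "T", pair_counts, term_counts, children, 100, 0.5)
```

In this toy data the children carry more evidence for the association than the parent does, so a positive α raises both scores relative to the naive (α = 0) computation. Scaling the numerator and denominator of confidence by the same α keeps the value in [0, 1]; other weighting schemes are equally plausible readings of the abstract.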
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, WJ., Raschid, L., Sayyadi, H., Srinivasan, P. (2008). Exploiting Ontology Structure and Patterns of Annotation to Mine Significant Associations between Pairs of Controlled Vocabulary Terms. In: Bairoch, A., Cohen-Boulakia, S., Froidevaux, C. (eds) Data Integration in the Life Sciences. DILS 2008. Lecture Notes in Computer Science, vol 5109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69828-9_6
DOI: https://doi.org/10.1007/978-3-540-69828-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69827-2
Online ISBN: 978-3-540-69828-9
eBook Packages: Computer Science, Computer Science (R0)