Using Annotations from Controlled Vocabularies to Find Meaningful Associations

  • Woei-Jyh Lee
  • Louiqa Raschid
  • Padmini Srinivasan
  • Nigam Shah
  • Daniel Rubin
  • Natasha Noy
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4544)


This paper presents the LSLink (Life Science Link) methodology, which provides users with a set of tools to explore the rich Web of interconnected and annotated objects in multiple repositories and to identify meaningful associations. Consider a physical link between objects in two repositories, where the objects are annotated with controlled vocabulary (CV) terms from two ontologies. Using a set of LSLink instances generated from a background dataset of knowledge, we identify associations between pairs of CV terms that are potentially significant and may lead to new knowledge. We develop an approach based on the logarithm of the odds (LOD) to determine confidence and support scores for the associations between pairs of CV terms. Using a case study of Entrez Gene objects annotated with GO terms and linked to PubMed objects annotated with MeSH terms, we describe a user validation and analysis task for exploring potentially significant associations.
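The abstract does not give the LOD formula, so the following is only a rough illustration of how confidence, support, and a log-odds score could be computed for CV term pairs. It assumes each LSLink instance is a single Entrez Gene to PubMed link carrying the gene's GO terms and the article's MeSH terms, and it scores every co-occurring (GO, MeSH) pair with support (fraction of links containing both terms), confidence (an estimate of P(MeSH term | GO term)), and a standard log odds ratio from a 2x2 contingency table. The class `LSLinkInstance`, the function `score_pairs`, and the 0.5 pseudocount are hypothetical choices, not the authors' implementation or their exact LOD definition.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class LSLinkInstance:
    """Hypothetical record: one Entrez Gene -> PubMed link plus both sides' CV terms."""
    gene_id: str
    pmid: str
    go_terms: frozenset[str]    # GO terms annotating the gene
    mesh_terms: frozenset[str]  # MeSH terms annotating the article


def score_pairs(
    links: list[LSLinkInstance], pseudocount: float = 0.5
) -> dict[tuple[str, str], tuple[float, float, float]]:
    """Score every (GO, MeSH) pair that co-occurs on at least one link.

    Returns {(go, mesh): (support, confidence, log_odds)}. The log odds ratio
    is a standard stand-in, not the paper's exact LOD definition.
    """
    n = len(links)
    go_count: Counter[str] = Counter()     # links whose gene has the GO term
    mesh_count: Counter[str] = Counter()   # links whose article has the MeSH term
    pair_count: Counter[tuple[str, str]] = Counter()  # links with both terms
    for link in links:
        go_count.update(link.go_terms)
        mesh_count.update(link.mesh_terms)
        pair_count.update(product(link.go_terms, link.mesh_terms))

    scores = {}
    for (go, mesh), a in pair_count.items():
        b = go_count[go] - a      # GO term present, MeSH term absent
        c = mesh_count[mesh] - a  # MeSH term present, GO term absent
        d = n - a - b - c         # neither term present
        support = a / n                   # fraction of all links with both terms
        confidence = a / go_count[go]     # estimate of P(mesh | go)
        # Log odds ratio of the 2x2 table; the pseudocount added to every cell
        # (Haldane-Anscombe correction) keeps empty cells from producing log(0).
        log_odds = math.log(
            ((a + pseudocount) * (d + pseudocount))
            / ((b + pseudocount) * (c + pseudocount))
        )
        scores[(go, mesh)] = (support, confidence, log_odds)
    return scores
```

Under this scoring, a pair whose two terms nearly always appear on the same links gets a large positive log odds, while a pair whose terms mostly occur on different links gets a negative value. The pseudocount trades a small bias for finite scores on sparse pairs, which matters when a pair co-occurs on only a few links.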


Keywords: links between data objects, annotations, associations, controlled vocabularies, LOD, confidence and support scores, life science link (LSLink)





Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Woei-Jyh Lee (1)
  • Louiqa Raschid (1)
  • Padmini Srinivasan (2)
  • Nigam Shah (3)
  • Daniel Rubin (3)
  • Natasha Noy (3)

  1. University of Maryland, College Park, MD 20742, USA
  2. The University of Iowa, Iowa City, IA 52242, USA
  3. Stanford University, Stanford, CA 94305, USA
