Using Annotations from Controlled Vocabularies to Find Meaningful Associations

Lee, Woei-Jyh; Raschid, Louiqa; Srinivasan, Padmini; Shah, Nigam; Rubin, Daniel; Noy, Natasha

doi:10.1007/978-3-540-73255-6_20

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4544)

Included in the following conference series:

International Conference on Data Integration in the Life Sciences

Abstract

This paper presents the LSLink (Life Science Link) methodology, which provides users with a set of tools to explore the rich Web of interconnected and annotated objects in multiple repositories and to identify meaningful associations. Consider a physical link between objects in two repositories, where the objects are annotated with controlled vocabulary (CV) terms from two ontologies. Using a set of LSLink instances generated from a background dataset of knowledge, we identify associations between pairs of CV terms that are potentially significant and may lead to new knowledge. We develop an approach based on the logarithm of the odds (LOD) to determine the confidence and support of associations between pairs of CV terms. Using a case study of Entrez Gene objects annotated with GO terms linked to PubMed objects annotated with MeSH terms, we describe a user validation and analysis task to explore potentially significant associations.
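
The abstract does not give the exact LOD formula, so the following is only a minimal sketch of how a log-odds score and a support value could be computed for pairs of CV terms across a set of links. It assumes that each link instance is represented as a pair of term sets (one set per ontology), that the LOD is the log odds ratio of a 2x2 co-occurrence table, and that a pseudocount of 0.5 is added to avoid division by zero. The function name `lod_associations`, the pseudocount, and the term labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import product


def lod_associations(links, pseudocount=0.5):
    """Score term pairs across links (illustrative, not the paper's method).

    links: list of (terms_a, terms_b) pairs, one per link between two
    annotated objects, e.g. GO terms on a gene and MeSH terms on a paper.
    Returns {(term_a, term_b): {"support": ..., "lod": ...}} for every
    pair of terms that co-occurs on at least one link.
    """
    n = len(links)
    a_counts, b_counts, pair_counts = Counter(), Counter(), Counter()
    for terms_a, terms_b in links:
        set_a, set_b = set(terms_a), set(terms_b)
        a_counts.update(set_a)
        b_counts.update(set_b)
        pair_counts.update(product(set_a, set_b))

    results = {}
    for (ta, tb), both in pair_counts.items():
        only_a = a_counts[ta] - both          # links with ta but not tb
        only_b = b_counts[tb] - both          # links with tb but not ta
        neither = n - both - only_a - only_b  # links with neither term
        # Assumed smoothing: add a pseudocount to every cell of the table.
        a, b, c, d = (x + pseudocount for x in (both, only_a, only_b, neither))
        results[(ta, tb)] = {
            "support": both / n,                 # fraction of links with both
            "lod": math.log((a * d) / (b * c)),  # log odds ratio
        }
    return results


# Hypothetical term labels, for illustration only.
links = [
    ({"go_term_a"}, {"mesh_term_x"}),
    ({"go_term_a"}, {"mesh_term_x"}),
    ({"go_term_b"}, {"mesh_term_y"}),
    ({"go_term_b"}, {"mesh_term_y"}),
    ({"go_term_a"}, {"mesh_term_y"}),
]
scores = lod_associations(links)
for pair, stats in sorted(scores.items()):
    print(pair, round(stats["support"], 2), round(stats["lod"], 3))
```

Under these assumptions, a positive LOD marks a pair of terms that co-occur on links more often than their individual frequencies would suggest, and a negative LOD marks the opposite. The support value shows how much of the dataset backs the pair, which helps separate a strong score resting on a handful of links from one resting on many.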

Author information

Authors and Affiliations

University of Maryland, College Park, MD 20742, USA
Woei-Jyh Lee & Louiqa Raschid
The University of Iowa, Iowa City, IA 52242, USA
Padmini Srinivasan
Stanford University, Stanford, CA 94305, USA
Nigam Shah, Daniel Rubin & Natasha Noy

Editor information

Sarah Cohen-Boulakia, Val Tannen

About this paper

Cite this paper

Lee, WJ., Raschid, L., Srinivasan, P., Shah, N., Rubin, D., Noy, N. (2007). Using Annotations from Controlled Vocabularies to Find Meaningful Associations. In: Cohen-Boulakia, S., Tannen, V. (eds) Data Integration in the Life Sciences. DILS 2007. Lecture Notes in Computer Science, vol 4544. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73255-6_20

DOI: https://doi.org/10.1007/978-3-540-73255-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73254-9
Online ISBN: 978-3-540-73255-6
eBook Packages: Computer Science, Computer Science (R0)
