Semantic Data Integration for Life Science Entities

Synonyms

Data fusion; Duplicate detection; LSID; Object identification

Definition

An entity is the representation of a (not necessarily physical) real-world object, such as a gene, a protein, or a disease, within a database. To integrate information about the same entities from different databases, these representations must be analyzed to uncover the corresponding underlying objects. This process is called entity identification. A variation of entity identification is duplicate detection, which analyzes two or more entities to determine whether they represent the same real-world object. Finally, data fusion is the process of generating a single, homogeneous representation from multiple, possibly inconsistent entities that represent the same real-world object.
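The two notions of duplicate detection and data fusion can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the entry: the record layout, the sample values, the matching rule (same organism plus a string-similarity threshold on names), and the fusion policy (prefer the first source, fill gaps from the second, report conflicts).

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_duplicate(e1: dict, e2: dict, threshold: float = 0.8) -> bool:
    """Hypothetical duplicate test on descriptive attributes only."""
    # Assumed rule: entities of different organisms are never duplicates.
    if e1.get("organism") != e2.get("organism"):
        return False
    return name_similarity(e1["name"], e2["name"]) >= threshold


def fuse(e1: dict, e2: dict) -> tuple[dict, dict]:
    """Hypothetical fusion policy: prefer e1, fill gaps from e2.

    Returns the fused record and a dict of conflicting attribute values.
    """
    fused, conflicts = {}, {}
    for key in sorted(set(e1) | set(e2)):
        v1, v2 = e1.get(key), e2.get(key)
        if v1 is None:
            fused[key] = v2          # only the second source has a value
        elif v2 is None or v1 == v2:
            fused[key] = v1          # only the first has it, or both agree
        else:
            fused[key] = v1          # disagreement: keep e1, record conflict
            conflicts[key] = (v1, v2)
    return fused, conflicts


# Illustrative records from two imaginary databases.
db_a = {"name": "Tumor protein p53", "organism": "Homo sapiens", "length": 393}
db_b = {"name": "tumor protein P53", "organism": "Homo sapiens",
        "function": "transcription factor"}

if is_duplicate(db_a, db_b):
    record, disagreements = fuse(db_a, db_b)
    print(record)
    print(disagreements)
```

In this sketch the two records are judged duplicates because their names match when case is ignored, and the fused record combines the attributes of both while listing the differently capitalized name as a conflict. A real system would likely use domain-specific similarity measures and conflict-resolution rules.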

When entities have globally unique keys, such as ISBN numbers in the case of books, entity identification and duplicate detection are simple. However, in life science databases, one usually has only descriptive...
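To see why globally unique keys make the task simple, consider the following hedged Python sketch: records from different sources are grouped by a normalized key, and every group with more than one member is a set of duplicates. The record layout, the source names, the `normalize_isbn` helper, and the second sample ISBN are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def normalize_isbn(isbn: str) -> str:
    """Strip hyphens and spaces so that formatting differences do not matter."""
    return isbn.replace("-", "").replace(" ", "")


def group_by_key(records: list[dict], key: str = "isbn") -> dict[str, list[dict]]:
    """Group records that share the same normalized key value."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize_isbn(rec[key])].append(rec)
    return dict(groups)


# Illustrative records from two imaginary catalogs.
records = [
    {"source": "catalog_a", "isbn": "978-1-4614-8265-9",
     "title": "Encyclopedia of Database Systems"},
    {"source": "catalog_b", "isbn": "9781461482659",
     "title": "Encyclopedia of database systems"},
    {"source": "catalog_b", "isbn": "978-0-00-000000-2",  # made-up ISBN
     "title": "Some other book"},
]

groups = group_by_key(records)
duplicates = {k: v for k, v in groups.items() if len(v) > 1}
print(duplicates)
```

No similarity measure or threshold is needed here: equality of the key decides the match, which is exactly what is missing when only descriptive attributes are available.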

Author information

Corresponding author

Correspondence to Ulf Leser.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Leser, U. (2018). Semantic Data Integration for Life Science Entities. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_627
