Knowledge Representation and Ontologies

  • Kin Wah Fung
  • Olivier Bodenreider
Part of the Health Informatics book series (HI)


The representation of medical data and knowledge is fundamental in the field of medical informatics. Ontologies and related artifacts are important tools in knowledge representation, yet they are often given little attention and taken for granted. In this chapter, we give an overview of the development of medical ontologies, including available ontology repositories and tools. We highlight some ontologies that are particularly relevant to clinical research and describe with examples the benefits of using ontologies to facilitate research workflow management, data integration, and electronic phenotyping.


Knowledge representation; Biomedical ontologies; Research metadata ontology; Data content ontology; Ontology-driven knowledge bases; Data integration; Electronic phenotyping



This research was supported in part by the Intramural Research Program of the National Institutes of Health (NIH), National Library of Medicine (NLM).



Copyright information

© Springer International Publishing 2019

Authors and Affiliations

  1. Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, USA
