Identifying Disease-Centric Subdomains in Very Large Medical Ontologies: A Case-Study on Breast Cancer Concepts in SNOMED CT. Or: Finding 2500 Out of 300.000
Modern medical vocabularies can contain up to hundreds of thousands of concepts. In any particular use-case only a small fraction of these will be needed. In this paper we first define two notions of a disease-centric subdomain of a large ontology. We then explore two methods for identifying disease-centric subdomains of such large medical vocabularies. The first method is based on lexically querying the ontology with an iteratively extended set of seed queries. The second method is based on manual mapping between concepts from a medical guideline document and ontology concepts. Both methods include concept-expansion over subsumption and equality relations. We use both methods to determine a breast-cancer-centric subdomain of the SNOMED CT ontology. Our experiments show that the two methods produce a considerable overlap, but they also yield a large degree of complementarity, with interesting differences between the sets of concepts that they return. Analysis of the results reveals strengths and weaknesses of the different methods.
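As a rough illustration of the first method, the sketch below runs lexical seed queries against a toy ontology and then expands the hits over subsumption and equivalence relations. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the concept identifiers, labels, relations, the substring-matching rule, and the choice to expand downwards to subsumed concepts are all hypothetical and are not taken from SNOMED CT.

```python
from collections import deque

# Toy ontology. All identifiers, labels and relations are hypothetical
# illustration data, not actual SNOMED CT content.
LABELS = {
    "C1": "breast cancer",
    "C2": "carcinoma of breast",
    "C3": "ductal carcinoma of breast",
    "C4": "mastectomy",
    "C5": "fracture of femur",
}
IS_A = {"C3": {"C2"}}        # child -> set of parents (subsumption)
EQUIVALENT = {("C1", "C2")}  # unordered pairs of equivalent concepts


def lexical_hits(queries, labels):
    """Return concepts whose label contains any query string (assumed matching rule)."""
    lowered = [q.lower() for q in queries]
    return {cid for cid, label in labels.items()
            if any(q in label.lower() for q in lowered)}


def expand(concepts, is_a, equivalent):
    """Close a concept set under 'is subsumed by a member' and 'is equivalent to a member'."""
    neighbours = {}
    for child, parents in is_a.items():
        for parent in parents:
            neighbours.setdefault(parent, set()).add(child)  # walk down to children
    for a, b in equivalent:
        neighbours.setdefault(a, set()).add(b)               # equivalence is symmetric
        neighbours.setdefault(b, set()).add(a)
    seen, queue = set(concepts), deque(concepts)
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def subdomain(seed_rounds, labels=LABELS, is_a=IS_A, equivalent=EQUIVALENT):
    """Union the lexical hits of every round of seed queries, then expand once."""
    hits = set()
    for queries in seed_rounds:
        hits |= lexical_hits(queries, labels)
    return expand(hits, is_a, equivalent)


first = subdomain([["breast cancer"]])
second = subdomain([["breast cancer"], ["mastectomy"]])  # seed set extended in a second round
```

In this toy setting, extending the seed set in a second round only ever adds concepts, which mirrors the iterative character of the seed-query method described above.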
Keywords: identifying ontology subdomain; disease related concepts; ontology subsetting; mapping medical terminologies; seed queries; medical guidelines