Unsupervised Medical Subject Heading Assignment Using Output Label Co-occurrence Statistics and Semantic Predications

Kavuluru, Ramakanth; He, Zhenghao

doi:10.1007/978-3-642-38824-8_15


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7934)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems


Abstract

Librarians at the National Library of Medicine (NLM) tag each biomedical abstract indexed by the PubMed information system with terms from the Medical Subject Headings (MeSH) terminology. MeSH has over 26,000 terms, and indexers consult each article's full text to assign the set of terms most suitable for indexing it. Several recent automated attempts have focused on using the article title and abstract text to identify MeSH terms for the corresponding article. Most of these approaches rely on supervised machine learning techniques trained on already indexed articles and their MeSH terms. In this paper, we present a novel unsupervised approach that uses named entity recognition, relationship extraction, and output label co-occurrence frequencies of MeSH term pairs drawn from the existing set of 22 million articles already indexed with MeSH terms by librarians at NLM. The main goal of our study is to gauge the potential of output label co-occurrence statistics and relationships extracted from free text in unsupervised indexing approaches. In biomedical domains especially, output label co-occurrences are generally easier to obtain than training data consisting of document and label set pairs, owing to the sensitive nature of textual documents containing protected health information. Our methods achieve a micro F-score comparable to those obtained using supervised machine learning techniques trained on document and label set pairs. Baseline comparisons reveal strong prospects for further research in exploiting label co-occurrences and relationships extracted from free text when recommending terms for indexing biomedical articles.
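As a rough illustration of the two quantities the abstract relies on, the sketch below counts how often pairs of labels are assigned to the same document, uses those counts to rank terms that tend to accompany a set of seed terms, and computes a micro-averaged F-score. It is not the authors' method: the function names (`cooccurrence_counts`, `rank_candidates`, `micro_f1`), the summed-count scoring rule, and the toy label sets are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(label_sets):
    """Count how often each unordered label pair is assigned to the same document."""
    pair_counts = Counter()
    for labels in label_sets:
        # Sort so that each pair has one canonical (a, b) key with a < b.
        for a, b in combinations(sorted(set(labels)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def rank_candidates(seed_terms, pair_counts, top_k=3):
    """Rank non-seed terms by their summed co-occurrence with the seed terms.

    The summed-count score is an assumption for illustration only.
    """
    seeds = set(seed_terms)
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a in seeds and b not in seeds:
            scores[b] += n
        elif b in seeds and a not in seeds:
            scores[a] += n
    return [term for term, _ in scores.most_common(top_k)]


def micro_f1(gold_sets, predicted_sets):
    """Micro-averaged F-score: pool true/false positives over all documents."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sets, predicted_sets):
        gold, pred = set(gold), set(pred)
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy label sets standing in for already indexed articles (hypothetical data).
indexed = [
    ["Humans", "Diabetes Mellitus", "Insulin"],
    ["Humans", "Insulin", "Blood Glucose"],
    ["Humans", "Diabetes Mellitus", "Blood Glucose", "Insulin"],
    ["Humans", "Neoplasms"],
]
counts = cooccurrence_counts(indexed)
suggestions = rank_candidates(["Insulin"], counts)
score = micro_f1([{"A", "B"}, {"C"}], [{"A"}, {"C", "D"}])
```

Note that only the label sets are needed to build the counts, not the document text, which is the property the abstract highlights for settings where the documents themselves are sensitive.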



Author information

Authors and Affiliations

Division of Biomedical Informatics, Department of Biostatistics, University of Kentucky, Lexington, KY, USA
Ramakanth Kavuluru
Department of Computer Science, University of Kentucky, Lexington, KY, USA
Ramakanth Kavuluru & Zhenghao He


Editor information

Editors and Affiliations

Conservatoire National des Arts et Métiers, 2 rue Conté, 75003, Paris, France
Elisabeth Métais
School of Computing, Science and Engineering, University of Salford, The Crescent, M5 4WT, Salford, Lancashire, UK
Farid Meziane, Sunil Vadera & Mohamad Saraee
Department of Decision and Information Sciences, School of Business Administration, Oakland University, 306 Elliott Hall, 48309, Rochester, MI, USA
Vijayan Sugumaran


About this paper

Cite this paper

Kavuluru, R., He, Z. (2013). Unsupervised Medical Subject Heading Assignment Using Output Label Co-occurrence Statistics and Semantic Predications. In: Métais, E., Meziane, F., Saraee, M., Sugumaran, V., Vadera, S. (eds) Natural Language Processing and Information Systems. NLDB 2013. Lecture Notes in Computer Science, vol 7934. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38824-8_15


DOI: https://doi.org/10.1007/978-3-642-38824-8_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38823-1
Online ISBN: 978-3-642-38824-8
eBook Packages: Computer Science (R0)
