Automatic Keyphrase Extraction from Medical Documents

  • Kamal Sarkar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5909)


Keyphrases provide semantic metadata that summarize a document and enable readers to quickly determine whether an article falls within their fields of interest. This paper presents an automatic keyphrase extraction method based on naive Bayesian learning that exploits a number of domain-specific features to boost keyphrase extraction performance in the medical domain. The proposed method is compared with a popular keyphrase extraction algorithm called Kea.
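As a rough illustration of the general approach, the sketch below scores candidate phrases with a naive Bayes classifier over discretized features. It is only a sketch under stated assumptions: the abstract does not list the paper's domain-specific features, so the example uses the two generic features that Kea is known for (TF×IDF and relative position of first occurrence). The names `features`, `discretize`, and `NaiveBayes`, the cut points, and the toy training rows are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def features(tokens, phrase, doc_freq, n_docs):
    """Two generic features for a single-token candidate (assumed, Kea-style):
    TF x IDF and the relative position of the first occurrence."""
    tf = tokens.count(phrase) / len(tokens)
    # Smoothed IDF so that unseen phrases do not divide by zero.
    idf = math.log((n_docs + 1) / (doc_freq.get(phrase, 0) + 1))
    first = tokens.index(phrase) / len(tokens)
    return tf * idf, first


def discretize(value, cut_points):
    """Map a continuous value to a bin index given sorted cut points."""
    return sum(value > c for c in cut_points)


class NaiveBayes:
    """Naive Bayes over discretized features with Laplace smoothing."""

    def __init__(self, n_bins):
        self.n_bins = n_bins  # number of bins per feature (assumed equal)

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        # (class, feature index) -> counts of each bin value
        self.bin_counts = defaultdict(Counter)
        for row, y in zip(rows, labels):
            for i, b in enumerate(row):
                self.bin_counts[(y, i)][b] += 1
        return self

    def predict_proba(self, row):
        total = sum(self.class_counts.values())
        log_scores = {}
        for y, cy in self.class_counts.items():
            log_p = math.log(cy / total)  # class prior
            for i, b in enumerate(row):
                # Features are treated as independent given the class.
                log_p += math.log((self.bin_counts[(y, i)][b] + 1) / (cy + self.n_bins))
            log_scores[y] = log_p
        # Normalize in log space for numerical stability.
        m = max(log_scores.values())
        z = sum(math.exp(s - m) for s in log_scores.values())
        return {y: math.exp(s - m) / z for y, s in log_scores.items()}


# Toy training data: (tfidf_bin, first_occurrence_bin) -> is the candidate a keyphrase?
rows = [(2, 0), (2, 0), (1, 0), (0, 2), (0, 1), (1, 2), (0, 2)]
labels = [True, True, True, False, False, False, False]
model = NaiveBayes(n_bins=3).fit(rows, labels)

tokens = ["aspirin", "reduces", "fever", "and", "aspirin", "helps"]
tfidf, first = features(tokens, "aspirin", {"aspirin": 1}, n_docs=10)
row = (discretize(tfidf, [0.1, 0.3]), discretize(first, [0.2, 0.6]))
p_keyphrase = model.predict_proba(row)[True]
```

In a setup like this, candidates would be ranked by `p_keyphrase` and the top-scoring ones returned as keyphrases. Domain-specific features of the kind the paper describes would enter as additional discretized columns in each row.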


Domain specific keyphrase extraction, Medical documents, Text mining, Naïve Bayes



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Kamal Sarkar (1)
  1. Computer Science & Engineering Department, Jadavpur University, Kolkata, India
