A Comparative Study of Key Phrase Extraction for Cross-Domain Document Collections

  • Supaporn Tantanasiriwong
  • Choochart Haruechaiyasak
  • Sumanta Guha
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8839)


Extraction tools have become useful for text mining researchers who need to find keywords and keyphrases in documents. Extracting keywords and keyphrases from cross-domain information is more challenging, since the domains of interest differ in word usage. In this paper, two popular keyphrase extraction tools, Maui and Carrot, are investigated for extracting terms from cross-domain document databases. The characteristics of keyword and keyphrase matching across collections from different domains are presented and used to determine a keyphrase extraction tool for patent documents and scientific publications. In our experiment, the key point is matching a patent with its cited publication. For evaluation, the performance of cross-domain matching is measured by comparing similarity measures across the results of the extraction tools. The experimental results show that Maui proves to be the more appropriate keyphrase extraction tool for cross-domain document collection matching, with its best performance measured as a cosine similarity of 3.31% when compared with Carrot.
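The abstract does not specify how extracted keyphrases are turned into vectors before the cosine similarity is computed. As a minimal sketch, assuming each document is represented as a bag of its extracted keyphrases (counts of case- and whitespace-normalised phrases), the cosine similarity between a patent and its cited publication, and its average over many patent-publication pairs, could be computed as below. The function names `cosine_similarity` and `mean_pair_similarity` and the normalisation step are hypothetical, and the example phrases are illustrative rather than actual Maui or Carrot output.

```python
import math
from collections import Counter


def _normalise(phrase):
    # Hypothetical normalisation: lowercase and collapse internal whitespace,
    # so "Keyphrase  Extraction" and "keyphrase extraction" count as the same phrase.
    return " ".join(phrase.lower().split())


def cosine_similarity(phrases_a, phrases_b):
    """Cosine similarity between two keyphrase lists treated as bags of phrases."""
    a = Counter(_normalise(p) for p in phrases_a)
    b = Counter(_normalise(p) for p in phrases_b)
    # Dot product over the phrases the two documents share.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty keyphrase list shares nothing with the other document
    return dot / (norm_a * norm_b)


def mean_pair_similarity(pairs):
    """Average cosine similarity over (patent_phrases, publication_phrases) pairs."""
    if not pairs:
        return 0.0
    return sum(cosine_similarity(p, q) for p, q in pairs) / len(pairs)


if __name__ == "__main__":
    # Illustrative keyphrases only; not real output from Maui or Carrot.
    patent = ["keyphrase extraction", "patent document", "similarity measure"]
    publication = ["Keyphrase Extraction", "scientific publication", "similarity measure"]
    print(round(cosine_similarity(patent, publication), 4))  # prints 0.6667
```

Under this setup, running each tool over the same patent-publication pairs and comparing their `mean_pair_similarity` scores mirrors the kind of comparison the abstract describes, although the paper's actual preprocessing and term weighting (for example, TF-IDF weights instead of raw counts) may differ.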


Keywords: Keyphrase extraction tools · Cross-domain document collection · Patent · Publication · Similarity measures · Maui · Carrot





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Supaporn Tantanasiriwong (1)
  • Choochart Haruechaiyasak (2)
  • Sumanta Guha (1)
  1. Computer Science and Information Management, School of Engineering and Technology, Asian Institute of Technology, Thailand
  2. Speech and Audio Technology Laboratory, National Electronics and Computer Technology Center, National Science and Technology Development Agency, Thailand
