Weighting of Noun Phrases Based on Local Frequency of Nouns

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 700)

Abstract

Tf-idf is a well-known weighting measure for words in texts. It captures both how frequently a word occurs and how localized it is to particular texts, and it is widely used in information retrieval and text mining. However, many infrequent words receive the same tf-idf value. In this study, the words being weighted are noun phrases. This paper proposes a novel weighting measure for noun phrases that uses the local frequency of the nouns that make up a noun phrase. The proposed measure combines the tf-idf of a noun phrase with the average difference between the phrase's frequency and the frequencies of the nouns within it. The measure was evaluated on two datasets: 19,997 newsgroup texts written in English and 206 Wikipedia pages written in Japanese. The experiments showed that fewer noun phrases share the same value under the proposed measure than under tf-idf.
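The idea can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so everything below is an assumption made for illustration only: texts are modeled as lists of noun phrases (tuples of nouns), the difference is taken as noun frequency minus phrase frequency, and the two parts are combined by simple addition. The function names `tf_idf`, `noun_gap`, and `combined_weight` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def tf_idf(phrase, doc, docs):
    """Textbook tf-idf: frequency of `phrase` in `doc` times log(N / document frequency)."""
    tf = doc.count(phrase)
    df = sum(1 for d in docs if phrase in d)
    return tf * math.log(len(docs) / df) if df else 0.0


def noun_gap(phrase, doc):
    """Average, over the nouns in `phrase`, of (noun frequency - phrase frequency) in `doc`.

    ASSUMPTION: the sign and scope of the difference are guesses; the abstract
    only says the average of the difference between the two frequencies is used.
    """
    phrase_freq = doc.count(phrase)
    noun_counts = Counter(noun for p in doc for noun in p)
    return sum(noun_counts[n] - phrase_freq for n in phrase) / len(phrase)


def combined_weight(phrase, doc, docs):
    # ASSUMPTION: plain addition; the paper may combine the two parts differently.
    return tf_idf(phrase, doc, docs) + noun_gap(phrase, doc)


# Toy data (invented for this sketch): each text is a list of noun phrases.
doc0 = [("text", "mining"), ("data", "format"), ("text",), ("text",)]
doc1 = [("information", "retrieval")]
docs = [doc0, doc1]

a, b = ("text", "mining"), ("data", "format")
tie_under_tf_idf = tf_idf(a, doc0, docs) == tf_idf(b, doc0, docs)
tie_under_combined = combined_weight(a, doc0, docs) == combined_weight(b, doc0, docs)
```

In this toy data both phrases occur once in one text, so they tie under tf-idf, while the noun "text" also occurs outside the phrase "text mining" and separates the two under the combined score. This only illustrates the kind of tie-breaking the abstract describes; it does not reproduce the paper's measure or results.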

Keywords

Term weighting, Noun phrase, Information retrieval, Text mining

Notes

Acknowledgements

This work was supported by JSPS KAKENHI Grant Number 15K00426.

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Yasuhiro Yamada (1)
  • Yuusuke Himeno (2)
  • Tetsuya Nakatoh (3)

  1. Interdisciplinary Graduate School of Science and Engineering, Shimane University, Matsue-shi, Japan
  2. Interdisciplinary Faculty of Science and Engineering, Shimane University, Matsue-shi, Japan
  3. Research Institute for Information Technology, Kyushu University, Nishi-ku, Japan
