An Approach for Text Mining Based on Noun Phrases

  • Conference paper

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 39)

Abstract

Noun phrases have been proposed as descriptors for text mining vectors to overcome the poor semantics of the traditional bag-of-words (BOW) representation. However, the solutions found in the literature are unsatisfactory, mainly for two reasons: they rely on static definitions of noun phrases, and noun phrases per se do not support an adequate representation of relevance, since they are expressions that rarely repeat. We present an approach that addresses these problems by (i) introducing a process that allows noun phrases to be defined interactively and (ii) treating similar noun phrases as a single term. A case study compares the approach proposed in this paper with one based on BOW. The main contribution of this paper is an improvement to the preprocessing phase of text mining, which leads to better results in the overall process.
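To make point (ii) of the abstract concrete, the following is a minimal sketch of how near-identical noun phrases might be collapsed into a single term before counting frequencies. It is an illustration only, not the authors' implementation: the abstract does not say how similarity is measured, so the use of a normalized edit distance, the 0.8 threshold, and the function names (`edit_distance`, `similarity`, `merge_similar_phrases`, `term_frequencies`) are all assumptions.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def merge_similar_phrases(phrases, threshold=0.8):
    """Map each phrase to the first earlier phrase it is similar to.

    The threshold is a hypothetical value chosen for illustration.
    """
    representatives = []
    mapping = {}
    for phrase in phrases:
        key = phrase.lower().strip()
        for rep in representatives:
            if similarity(key, rep) >= threshold:
                mapping[phrase] = rep
                break
        else:
            # No sufficiently similar representative yet: start a new term.
            representatives.append(key)
            mapping[phrase] = key
    return mapping


def term_frequencies(phrases, threshold=0.8):
    """Count occurrences after collapsing similar phrases into one term."""
    mapping = merge_similar_phrases(phrases, threshold)
    return Counter(mapping[p] for p in phrases)


if __name__ == "__main__":
    noun_phrases = [
        "text mining approach",
        "text mining approaches",
        "noun phrase",
        "noun phrases",
        "bag of words",
    ]
    print(term_frequencies(noun_phrases))
```

With this grouping, singular and plural variants that would each occur once as separate descriptors are counted together, which is the kind of effect the abstract describes when it notes that raw noun phrases rarely repeat. A greedy first-match grouping is used here for brevity; the result depends on the order in which phrases are seen.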

Author information

Corresponding author

Correspondence to Hercules Antonio do Prado.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Pinheiro, M.S., do Prado, H.A., Ferneda, E., Ladeira, M. (2015). An Approach for Text Mining Based on Noun Phrases. In: Neves-Silva, R., Jain, L., Howlett, R. (eds) Intelligent Decision Technologies. IDT 2017. Smart Innovation, Systems and Technologies, vol 39. Springer, Cham. https://doi.org/10.1007/978-3-319-19857-6_45

  • DOI: https://doi.org/10.1007/978-3-319-19857-6_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19856-9

  • Online ISBN: 978-3-319-19857-6

  • eBook Packages: Engineering, Engineering (R0)