Computing with Words for Text Categorization

Aspects of Automatic Text Analysis

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 209)

Abstract

We discuss the use of selected elements of Zadeh’s computing with words and perceptions paradigm (cf. Zadeh and Kacprzyk [37, 38]) to formulate and solve the problem of automatic text document categorization. This problem is steadily gaining importance as the amount of textual information available on the Internet grows rapidly. The main issues addressed are document representation and classification. The use of fuzzy logic for both has already been studied in considerable depth, although for the latter, classification, mainly in a more general context. Our approach is based mainly on usuality qualification within the computing with words and perceptions paradigm, which is handled technically by Zadeh’s classic calculus of linguistically quantified propositions [36]. Moreover, we employ results on fuzzy (linguistic) queries in information retrieval, in particular various interpretations of the weights of query terms. The methods developed are illustrated on a well-known text corpus.
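
As a rough illustration of the calculus of linguistically quantified propositions mentioned in the abstract, the sketch below evaluates the truth of a statement of the form "Q of the X's are P" as the quantifier's membership value at the mean membership of the objects in P. It is a minimal sketch and not the chapter's actual method: the function names are hypothetical, and the piecewise-linear definition of "most" (0 up to 0.3, 1 from 0.8) is an assumed, commonly used choice rather than one taken from this page.

```python
from typing import Callable, Sequence


def most(x: float) -> float:
    """Assumed membership function of the relative quantifier 'most'.

    Piecewise linear: 0 up to 0.3, 1 from 0.8, linear in between.
    The breakpoints are an illustrative assumption.
    """
    if x >= 0.8:
        return 1.0
    if x <= 0.3:
        return 0.0
    return 2.0 * x - 0.6


def truth_of_quantified(
    memberships: Sequence[float],
    quantifier: Callable[[float], float] = most,
) -> float:
    """Truth of 'Q of the X's are P' for a relative quantifier Q.

    memberships[i] is the degree to which object i satisfies P.
    The result is quantifier(mean of memberships).
    """
    if not memberships:
        raise ValueError("at least one object is required")
    proportion = sum(memberships) / len(memberships)
    return quantifier(proportion)


# Hypothetical example: degrees to which four documents of one category
# are "about" a given term; the statement evaluated is
# "most documents of this category are about the term".
degrees = [1.0, 0.5, 0.5, 1.0]
truth = truth_of_quantified(degrees)
```

A usuality-qualified statement such as "usually, documents of this category contain the term" could be read in the same way, with "usually" playing the role of the quantifier; this reading is an interpretation of the abstract, not a formula quoted from the chapter.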

References

  1. R. Baeza-Yates and B. Ribeiro-Neto, editors. Modern Information Retrieval. Addison-Wesley, Reading, Massachusetts, 1999.

  2. R. K. Belew and C. J. van Rijsbergen. Finding Out About: A Cognitive Perspective on Search Engine Technology and the WWW. Cambridge University Press, New York, NY, USA, 2000.

  3. A. Bookstein. Fuzzy Requests: An Approach to Weighted Boolean Searches. Journal of the American Society for Information Science, 31:240–247, 1980.

  4. G. Bordogna, P. Bosc, and G. Pasi. Extended Boolean Information Retrieval in Terms of Fuzzy Inclusion. In O. Pons, M. A. Vila, and J. Kacprzyk, editors, Knowledge Management in Fuzzy Databases, pages 234–246. Physica, Heidelberg, New York, 2000.

  5. G. Bordogna, P. Carrara, and G. Pasi. Fuzzy Approaches to Extend Boolean Information Retrieval. In P. Bosc and J. Kacprzyk, editors, Fuzziness in Database Management Systems, pages 231–274. Physica, Heidelberg, 1995.

  6. G. Bordogna and G. Pasi. Application of Fuzzy Sets Theory to Extend Boolean Information Retrieval. In F. Crestani and G. Pasi, editors, Soft Computing in Information Retrieval, pages 21–47. Physica, Heidelberg, New York, 2000.

  7. C. Carlsson and R. Fuller. A New Look at Linguistic Importance Weighted Aggregation. In Proceedings of the Fourteenth European Meeting on Cybernetics and Systems Research, pages 169–174, Vienna, 1998. Austrian Society for Cybernetic Studies.

  8. M. Delgado, J. L. Verdegay, and M. A. Vila. On Aggregation Operations of Linguistic Labels. International Journal of Intelligent Systems, 8:351–370, 1993.

  9. D. Dubois, H. Fargier, and H. Prade. Beyond Min Aggregation in Multicriteria Decision: (Ordered) Weighted Min, Discri-min, Leximin. In R. R. Yager and J. Kacprzyk, editors, The Ordered Weighted Averaging Operators. Theory and Applications, pages 181–192. Kluwer Academic Publishers, Boston, Dordrecht, London, 1997.

  10. D. Dubois and H. Prade. Using Fuzzy Sets in Flexible Querying: Why and How? In T. Andreasen, H. Christiansen, and H. L. Larsen, editors, Flexible Query Answering Systems, pages 45–60. Kluwer Academic Publishers, Boston, Dordrecht, 1997.

  11. E. Herrera-Viedma. An Information Retrieval System with Ordinal Linguistic Weighted Queries Based on Two Weighting Elements. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 9:77–88, 2001.

  12. E. Herrera-Viedma. Modeling the Retrieval Process of an Information Retrieval System Using an Ordinal Fuzzy Linguistic Approach. Journal of the American Society for Information Science and Technology (JASIST), 52(6):460–475, 2001.

  13. T. Joachims. A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization. In Proceedings of ICML-97, 14th International Conference on Machine Learning, pages 143–151, Nashville, US, 1997. Morgan Kaufmann.

  14. J. Kacprzyk and S. Zadrożny. Computing with Words in Intelligent Database Querying: Standalone and Internet-Based Applications. Information Sciences, 134:71–109, 2001.

  15. J. Kacprzyk, S. Zadrożny, and A. Ziółkowski. FQUERY III+: a “human-consistent” database querying system based on fuzzy logic with linguistic quantifiers. Information Systems, 14:443–453, 1989.

  16. J. Kacprzyk and A. Ziółkowski. Database Queries with Fuzzy Linguistic Quantifiers. IEEE Transactions on Systems, Man and Cybernetics, 16:474–479, 1986.

  17. R. R. Korfhage. Information Storage and Retrieval. John Wiley and Sons, New York, 1997.

  18. D. H. Kraft, G. Bordogna, and G. Pasi. An Extended Fuzzy Linguistic Approach to Generalize Boolean Information Retrieval. Journal of Information Sciences, 2(3):119–134, 1994.

  19. D. H. Kraft, G. Bordogna, and G. Pasi. Fuzzy Set Techniques in Information Retrieval. In J. C. Bezdek, D. Dubois, and H. Prade, editors, Fuzzy Sets in Approximate Reasoning and Information Systems (The Handbook of Fuzzy Sets Vol. 3), pages 469–510. Kluwer Academic Publishers, Norwell, 1999.

  20. D. D. Lewis. Reuters-21578, Dist. 1.0. Online: http://www.research.att.com/~lewis.

  21. M. F. Porter. An Algorithm for Suffix Stripping. Program, 14(3):130–137, 1980.

  22. T. Radecki. Fuzzy Set Theoretical Approach to Document Retrieval. Information Processing and Management, 15:247–260, 1979.

  23. J. Rocchio. Relevance Feedback in Information Retrieval. In G. Salton, editor, The SMART Retrieval System: Experiments in Automatic Document Processing, pages 313–323. Prentice-Hall Inc., 1971.

  24. G. Salton, E. A. Fox, and H. Wu. Extended Boolean Information Retrieval. Communications of the ACM, 26(11):1022–1036, 1983.

  25. G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw Hill, New York, 1983.

  26. F. Sebastiani. A Tutorial on Automated Text Categorisation. In Proceedings of ASAI-99, 1st Argentinian Symposium on Artificial Intelligence, pages 7–35, Buenos Aires, 1999.

  27. Stop Words list. http://www.indiana.edu/cgi-bin-local/doIsearch.pl?Stopwords.

  28. C. J. van Rijsbergen. Information Retrieval. Butterworths, London, Boston, 1979.

  29. R. R. Yager. A Note on Weighted Queries in Information Retrieval Systems. Journal of the American Society for Information Science, 38:23–24, 1987.

  30. R. R. Yager. On Ordered Weighted Averaging Aggregation Operators in Multi-Criteria Decision Making. IEEE Transactions on Systems, Man and Cybernetics, 18:183–190, 1988.

  31. R. R. Yager and J. Kacprzyk, editors. The Ordered Weighted Averaging Operators: Theory and Applications. Kluwer Academic Publishers, Boston, 1997.

  32. Y. Yang. An Evaluation of Statistical Approaches to Text Categorization. Journal of Information Retrieval, 1(1/2):67–88, 1999.

  33. Y. Yang. A Study on Thresholding Strategies for Text Categorization. In W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, editors, Proceedings of ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'01), pages 137–145, New Orleans, US, 2001. ACM.

  34. Y. Yang and X. Liu. A Re-examination of Text Categorization Methods. In M. A. Hearst, F. Gey, and R. Tong, editors, Proceedings of ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'99), pages 42–49, Berkeley, US, 1999. ACM.

  35. L. A. Zadeh. The Concept of Linguistic Variable and its Applications to Approximate Reasoning. Parts I, II, III. Information Sciences, 8, 9:199–251 (8), 301–357 (8), 43–80 (9), 1975.

  36. L. A. Zadeh. A Computational Approach to Fuzzy Quantifiers in Natural Languages. Computers and Mathematics with Applications, 9:149–184, 1983.

  37. L. A. Zadeh and J. Kacprzyk, editors. Computing with Words in Information/Intelligent Systems. Part 1: Foundations. Physica, Heidelberg, New York, 1999.

  38. L. A. Zadeh and J. Kacprzyk, editors. Computing with Words in Information/Intelligent Systems. Part 2: Applications. Physica, Heidelberg, New York, 1999.

  39. S. Zadrożny, K. Ławcewicz, and J. Kacprzyk. Intelligent Linguistic Characterization and Retrieval of Textual Documents: An Internet-Based Application. In B. Bouchon-Meunier, L. Foulloy, and R. R. Yager, editors, Intelligent Systems for Information Processing — From Representation to Applications, pages 153–164. Elsevier, Amsterdam, 2003.

Copyright information

© 2007 Springer

About this chapter

Cite this chapter

Kacprzyk, J., Zadrożny, S. (2007). Computing with Words for Text Categorization. In: Aspects of Automatic Text Analysis. Studies in Fuzziness and Soft Computing, vol 209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-37522-7_16

  • DOI: https://doi.org/10.1007/978-3-540-37522-7_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37520-3

  • Online ISBN: 978-3-540-37522-7

  • eBook Packages: Engineering, Engineering (R0)
