Text mining at the term level

Feldman, Ronen; Fresko, Moshe; Kinar, Yakkov; Lindell, Yehuda; Liphstat, Orly; Rajman, Martin; Schler, Yonatan; Zamir, Oren

doi:10.1007/BFb0094806

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1510)

Included in the following conference series:

European Symposium on Principles of Data Mining and Knowledge Discovery

Abstract

Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. Previous work in text mining has focused on the word or the tag level. This paper presents an approach to performing text mining at the term level. The mining process starts by preprocessing the document collection and extracting terms from the documents. Each document is then represented by a set of terms and annotations characterizing the document. Terms and additional higher-level entities are then organized in a hierarchical taxonomy. In this paper we describe the Term Extraction module of the Document Explorer system and provide an experimental evaluation performed on a set of 52,000 documents published by Reuters in the years 1995–1996.
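
The pipeline outlined in the abstract (extract terms, represent each document as a set of terms, organize terms in a taxonomy) can be illustrated with a minimal sketch. This is not the Term Extraction module of Document Explorer, whose details the abstract does not give. As a stand-in it uses a naive heuristic: adjacent word pairs containing no stopword, kept only if they occur in several documents. The stopword list, the function names, the `min_docs` threshold, the sample documents, and the taxonomy mapping are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would rely on linguistic
# preprocessing (for example part-of-speech tagging) instead.
STOPWORDS = {"the", "of", "a", "an", "in", "on", "and", "to", "for", "by", "its"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def candidate_terms(text):
    """Return adjacent word pairs that contain no stopword (naive candidates)."""
    tokens = tokenize(text)
    return {
        f"{a} {b}"
        for a, b in zip(tokens, tokens[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    }

def extract_terms(docs, min_docs=2):
    """Keep candidates found in at least `min_docs` documents and
    return one term set per document."""
    per_doc = [candidate_terms(d) for d in docs]
    doc_freq = Counter(t for terms in per_doc for t in terms)
    kept = {t for t, n in doc_freq.items() if n >= min_docs}
    return [terms & kept for terms in per_doc]

def roll_up(doc_terms, taxonomy):
    """Count documents per taxonomy node, given a term -> parent-node mapping."""
    counts = Counter()
    for terms in doc_terms:
        # Each document contributes at most once to a given node.
        counts.update({taxonomy[t] for t in terms if t in taxonomy})
    return counts

# Made-up sample documents and a made-up two-node taxonomy.
docs = [
    "The central bank raised interest rates on Tuesday.",
    "Interest rates fell after the central bank meeting.",
    "The joint venture was approved by the central bank.",
    "A joint venture in Asia lifted profits.",
]
taxonomy = {
    "central bank": "finance",
    "interest rates": "finance",
    "joint venture": "business",
}

doc_terms = extract_terms(docs)
node_counts = roll_up(doc_terms, taxonomy)
```

Here `doc_terms` holds the set-of-terms representation of each document, and `node_counts` shows how a hierarchical grouping lets document counts be aggregated above the level of individual terms.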

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Bar-Ilan University, Ramat-Gan, Israel
Ronen Feldman, Moshe Fresko, Yakkov Kinar, Yehuda Lindell, Orly Liphstat & Yonatan Schler
Artificial Intelligence Laboratory (LIA), Swiss Federal Institute of Technology, Lausanne, Switzerland
Martin Rajman
Department of Computer Science, University of Washington, Seattle, WA
Oren Zamir

Editor information

Jan M. Żytkow, Mohamed Quafafou

About this paper

Cite this paper

Feldman, R. et al. (1998). Text mining at the term level. In: Żytkow, J.M., Quafafou, M. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 1998. Lecture Notes in Computer Science, vol 1510. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0094806

DOI: https://doi.org/10.1007/BFb0094806
Published: 19 October 2006
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65068-3
Online ISBN: 978-3-540-49687-8
eBook Packages: Springer Book Archive
