Mining Knowledge from Text Collections Using Automatically Generated Metadata

  • Conference paper
Practical Aspects of Knowledge Management (PAKM 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2569)

Abstract

Data mining is typically applied to large databases of highly structured information in order to discover new knowledge. In businesses and institutions, the amount of information held in repositories of text documents usually rivals or surpasses the amount found in relational databases. Though the amount of potentially valuable knowledge contained in document collections can be great, these collections are often difficult to analyze. Therefore, it is important to develop methods to efficiently discover knowledge embedded in these document repositories. In this paper we describe an approach for mining knowledge from text collections by applying data mining techniques to metadata records generated via automated text categorization. By controlling the set of metadata fields as well as the set of assigned categories, we can customize the knowledge discovery task to address specific questions. As an example, we apply the approach to a large collection of product reviews and evaluate the performance of the knowledge discovery.
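
The idea of mining metadata records rather than raw text can be illustrated with a minimal sketch. It assumes association-rule mining as the data mining technique and uses a tiny hand-written set of records in place of real categorizer output; the abstract does not specify either. The field names, category values, thresholds, and the helper names `to_items` and `mine_rules` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical metadata records, one per document. In the approach described
# above these would come from an automated text categorizer; here they are
# hand-written so the example is self-contained.
RECORDS = [
    {"product": "camera", "sentiment": "positive"},
    {"product": "camera", "sentiment": "positive"},
    {"product": "camera", "sentiment": "negative"},
    {"product": "phone", "sentiment": "negative"},
    {"product": "phone", "sentiment": "negative"},
]


def to_items(record):
    """Turn one metadata record into a set of 'field=category' items."""
    return frozenset(f"{field}={value}" for field, value in record.items())


def mine_rules(records, min_support=0.4, min_confidence=0.8):
    """Return single-antecedent rules as (lhs, rhs, support, confidence)."""
    transactions = [to_items(r) for r in records]
    n = len(transactions)

    # Count how many records contain each single item and each item pair.
    counts = Counter()
    for items in transactions:
        for size in (1, 2):
            for combo in combinations(sorted(items), size):
                counts[frozenset(combo)] += 1

    rules = []
    for itemset, count in counts.items():
        support = count / n
        if len(itemset) != 2 or support < min_support:
            continue
        a, b = sorted(itemset)
        # Try the rule in both directions; keep it if confidence is high enough.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / counts[frozenset([lhs])]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    for lhs, rhs, support, confidence in mine_rules(RECORDS):
        print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

Restricting which fields appear in each record, or which categories a field may take, changes which rules can surface. That is one way to read the abstract's point about customizing the knowledge discovery task to specific questions.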

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pierre, J.M. (2002). Mining Knowledge from Text Collections Using Automatically Generated Metadata. In: Karagiannis, D., Reimer, U. (eds) Practical Aspects of Knowledge Management. PAKM 2002. Lecture Notes in Computer Science, vol 2569. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36277-0_47

  • DOI: https://doi.org/10.1007/3-540-36277-0_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00314-4

  • Online ISBN: 978-3-540-36277-7

  • eBook Packages: Springer Book Archive
