Mining Knowledge from Text Collections Using Automatically Generated Metadata

Pierre, John M.

doi:10.1007/3-540-36277-0_47

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2569)

Included in the following conference series:

International Conference on Practical Aspects of Knowledge Management

Abstract

Data mining is typically applied to large databases of highly structured information in order to discover new knowledge. In businesses and institutions, the amount of information held in repositories of text documents usually rivals or surpasses the amount found in relational databases. Although document collections can contain a great deal of potentially valuable knowledge, they are often difficult to analyze. It is therefore important to develop methods that efficiently discover the knowledge embedded in these document repositories. In this paper we describe an approach for mining knowledge from text collections by applying data mining techniques to metadata records generated via automated text categorization. By controlling the set of metadata fields as well as the set of assigned categories, we can customize the knowledge discovery task to address specific questions. As an example, we apply the approach to a large collection of product reviews and evaluate the performance of the knowledge discovery.
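The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the keyword lists stand in for a trained text categorizer, the field names (`product`, `sentiment`), the sample reviews, and the thresholds are all hypothetical, and association-rule mining over pairs of metadata items is assumed here as one possible data mining technique.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists standing in for a trained text categorizer.
# Each metadata field maps candidate categories to trigger words.
CATEGORIZERS = {
    "product": {"camera": ["camera", "lens"], "laptop": ["laptop", "keyboard"]},
    "sentiment": {"positive": ["great", "love"], "negative": ["poor", "broke"]},
}


def categorize(text):
    """Build a metadata record: a set of 'field=category' items for one document."""
    words = set(text.lower().split())
    record = set()
    for field, categories in CATEGORIZERS.items():
        for category, keywords in categories.items():
            if words & set(keywords):
                record.add(f"{field}={category}")
    return frozenset(record)


def mine_rules(records, min_support=0.3, min_confidence=0.6):
    """Find single-antecedent rules lhs -> rhs over metadata items."""
    n = len(records)
    item_counts = Counter(item for r in records for item in r)
    pair_counts = Counter(
        pair for r in records for pair in combinations(sorted(r), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of records containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Made-up reviews for illustration only.
REVIEWS = [
    "great camera with a sharp lens",
    "love this camera",
    "the laptop keyboard broke",
    "poor laptop battery",
    "great laptop",
]

records = [categorize(text) for text in REVIEWS]
rules = mine_rules(records)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

Changing the fields or categories in `CATEGORIZERS` changes which rules can be found, which corresponds to the abstract's point that controlling the metadata fields and assigned categories tailors the discovery task to specific questions.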

Author information

Authors and Affiliations

Interwoven, Inc., 101 2nd Street, 4th Floor, San Francisco, CA 94105
John M. Pierre

Editor information

Editors and Affiliations

Department of Knowledge Engineering, University of Vienna, Brünner Str. 72, 1210, Vienna, Austria
Dimitris Karagiannis
Business Operation Systems, Esslenstr. 3, 8280, Kreuzlingen, Switzerland
Ulrich Reimer

About this paper

Cite this paper

Pierre, J.M. (2002). Mining Knowledge from Text Collections Using Automatically Generated Metadata. In: Karagiannis, D., Reimer, U. (eds) Practical Aspects of Knowledge Management. PAKM 2002. Lecture Notes in Computer Science, vol 2569. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36277-0_47

DOI: https://doi.org/10.1007/3-540-36277-0_47
Published: 16 December 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00314-4
Online ISBN: 978-3-540-36277-7
eBook Packages: Springer Book Archive
