Abstract
In this paper we investigate the applicability of a Rough Set model and method to discovering maximal associations in a collection of text documents, and compare it with the maximal association method. Both methods are based on computing co-occurrences of various sets of keywords, but it has been shown that the rules discovered by the Rough Set method are similar to maximal association rules, and that the Rough Set method is much simpler than the maximal association method. In addition, we present an alternative strategy for obtaining the taxonomies that both methods require: instead of building taxonomies from the labelled document collections themselves, we derive them from ontologies, so as to make effective use of the ontologies that will increasingly be deployed on the Internet.
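The abstract does not give the definitions, so the following is only a minimal sketch of how the two views can line up. It assumes the maximal association definition of Feldman et al.: a keyword set X drawn from a taxonomy category T is maximal in a document D when X = D ∩ T. It also assumes, as an illustration and not necessarily the authors' exact construction, that each category is treated as a rough-set attribute whose value for D is D ∩ T. Under that assumption the indiscernibility classes are exactly the groups of documents in which a keyword set is maximal. The taxonomy, documents and function names (`partition`, `maximal_support`, `approximations`, `rough_membership`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set

# Hypothetical taxonomy: each category (rough-set attribute) maps to its keywords.
TAXONOMY: Dict[str, Set[str]] = {
    "countries": {"iran", "iraq", "usa", "uk"},
    "topics": {"crude", "trade", "ship"},
}

# Hypothetical toy collection: each document reduced to its keyword set.
DOCS: List[Set[str]] = [
    {"iran", "iraq", "crude"},
    {"iran", "iraq", "crude", "ship"},
    {"iran", "iraq", "crude"},
    {"usa", "uk", "trade"},
    {"iran", "iraq", "usa", "crude"},
]


def partition(docs: List[Set[str]], category: str) -> Dict[FrozenSet[str], Set[int]]:
    """Group document indices by their value for `category`, i.e. D ∩ T.

    Each group is a rough-set indiscernibility class, and it is also the set
    of documents in which that keyword set is maximal w.r.t. the category.
    """
    classes: Dict[FrozenSet[str], Set[int]] = defaultdict(set)
    for i, doc in enumerate(docs):
        classes[frozenset(doc & TAXONOMY[category])].add(i)
    return dict(classes)


def support(docs, keywords) -> int:
    """Ordinary support: number of documents containing every keyword in X."""
    return sum(1 for doc in docs if keywords <= doc)


def maximal_support(docs, keywords, category) -> int:
    """Maximal support: documents whose keywords in `category` are exactly X."""
    return len(partition(docs, category).get(frozenset(keywords), set()))


def approximations(docs, category, target):
    """Lower and upper approximations of `target` under the category partition."""
    lower: Set[int] = set()
    upper: Set[int] = set()
    for block in partition(docs, category).values():
        if block <= target:  # class lies entirely inside the target
            lower |= block
        if block & target:  # class overlaps the target
            upper |= block
    return lower, upper


def rough_membership(docs, keywords, category, target) -> float:
    """Fraction of X's indiscernibility class that falls inside `target`."""
    block = partition(docs, category).get(frozenset(keywords), set())
    return len(block & target) / len(block) if block else 0.0


if __name__ == "__main__":
    x = {"iran", "iraq"}
    print(support(DOCS, x))                       # 4
    print(maximal_support(DOCS, x, "countries"))  # 3
    crude = partition(DOCS, "topics")[frozenset({"crude"})]
    print(sorted(crude))                          # [0, 2, 4]
    lower, upper = approximations(DOCS, "countries", crude)
    print(sorted(lower), sorted(upper))           # [4] [0, 1, 2, 4]
    print(rough_membership(DOCS, x, "countries", crude))
```

Here {iran, iraq} occurs in four documents but is maximal among the country keywords in only three, because document 4 also mentions usa. The rough membership value of 2/3 is the share of those three documents in which {crude} is the maximal topic set, so it plays the role a confidence value would for the rule {iran, iraq} ⇒ {crude}; reading it as the paper's confidence measure is an assumption.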
Copyright information
© 2003 Springer-Verlag London Limited
About this paper
Cite this paper
Bi, Y., Anderson, T., McClean, S. (2003). A Rough Set Model with Ontological Information for Discovering Maximal Association Rules in Document Collections. In: Bramer, M., Preece, A., Coenen, F. (eds) Research and Development in Intelligent Systems XIX. Springer, London. https://doi.org/10.1007/978-1-4471-0651-7_2
DOI: https://doi.org/10.1007/978-1-4471-0651-7_2
Publisher Name: Springer, London
Print ISBN: 978-1-85233-674-5
Online ISBN: 978-1-4471-0651-7
eBook Packages: Springer Book Archive