Meta-learning Models for Automatic Textual Document Categorization

Lai, Kwok-Yin; Lam, Wai

doi:10.1007/3-540-45357-1_11

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2035)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

We investigate two meta-model approaches for automatic textual document categorization. The first is a linear combination approach, for which we propose three strategies, each distilling a different way of estimating the merit of each component algorithm. Because the linear combination approach exploits only limited knowledge from the training document set, we propose a second meta-model approach, called Meta-learning Using Document Feature characteristics (MUDOF), which adds a meta-learning phase based on document feature characteristics. These characteristics, derived from the training document set, capture inherent properties of a particular category. Extensive experiments on a real-world document collection yield satisfactory performance.
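The abstract does not spell out the three weighting strategies or the exact meta-learner used by MUDOF, so the following is only a minimal sketch under stated assumptions. The linear combination part weights each component classifier's score by a normalised merit estimate (assumed here to be something like F1 on a validation set). The meta-learning part is a MUDOF-style selector that regresses each algorithm's per-category error on a single document feature characteristic (hypothetically, the number of positive training documents) and picks the algorithm with the lowest predicted error. The function names, the merit measure, the single-feature linear regression, and all numbers are illustrative assumptions, not the authors' method.

```python
from typing import Callable, Dict, List, Tuple


def merit_weights(merits: Dict[str, float]) -> Dict[str, float]:
    """Turn non-negative merit estimates (assumed, e.g. validation F1) into weights summing to 1."""
    total = sum(merits.values())
    if total <= 0:
        # No usable merit information: fall back to equal weights.
        return {name: 1.0 / len(merits) for name in merits}
    return {name: m / total for name, m in merits.items()}


def linear_combination(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the component classifiers' scores for one document and category."""
    return sum(weights[name] * scores[name] for name in weights)


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def train_error_models(
    feature_values: List[float], errors: Dict[str, List[float]]
) -> Dict[str, Callable[[float], float]]:
    """Hypothetical meta-learning step: per algorithm, regress per-category error on a category feature."""
    models: Dict[str, Callable[[float], float]] = {}
    for name, ys in errors.items():
        slope, intercept = fit_line(feature_values, ys)
        # Bind slope/intercept as defaults so each lambda keeps its own fit.
        models[name] = lambda x, s=slope, b=intercept: s * x + b
    return models


def select_algorithm(models: Dict[str, Callable[[float], float]], feature_value: float) -> str:
    """For a new category, pick the algorithm with the lowest predicted error."""
    return min(models, key=lambda name: models[name](feature_value))


# Linear combination with hypothetical merits: weights become 0.4 / 0.3 / 0.3.
w = merit_weights({"knn": 0.8, "svm": 0.6, "rocchio": 0.6})
print(linear_combination({"knn": 1.0, "svm": 0.0, "rocchio": 1.0}, w))  # ≈ 0.7

# Hypothetical history: feature value per past category and each algorithm's error on it.
models = train_error_models(
    [10, 50, 100, 200],
    {"knn": [0.39, 0.35, 0.30, 0.20], "svm": [0.25, 0.25, 0.25, 0.25]},
)
print(select_algorithm(models, 20))   # svm
print(select_algorithm(models, 300))  # knn
```

The contrast the abstract draws shows up in the two halves: the linear combination uses one global weight per algorithm, while the meta-learning step lets the preferred algorithm change from category to category according to that category's characteristics.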

Author information

Authors and Affiliations

Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Ho Sin Hang Engineering Building, Shatin, Hong Kong
Kwok-Yin Lai & Wai Lam

Editor information

Editors and Affiliations

Dept. of Computer Science and Information Systems, The University of Hong Kong, Pokfulam, Hong Kong, China
David Cheung
CSIRO Mathematical and Information Sciences, GPO Box 664, Canberra, ACT 2601, Australia
Graham J. Williams
Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave., Kowloon, Hong Kong, China
Qing Li

About this paper

Cite this paper

Lai, KY., Lam, W. (2001). Meta-learning Models for Automatic Textual Document Categorization. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_11

DOI: https://doi.org/10.1007/3-540-45357-1_11
Published: 11 April 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41910-5
Online ISBN: 978-3-540-45357-4
eBook Packages: Springer Book Archive