
A Fast Subspace Text Categorization Method Using Parallel Classifiers

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2012)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 7182)

Abstract

The number of electronic documents available to us grows daily, so methods that speed up document search and reduce classifier training times are increasingly important. Such data are frequently divided into several broad domains, each with many sub-category levels, and each domain constitutes a subspace that can be processed separately. In this paper, separate classifiers of the same type are trained on different subspaces, and a test vector is assigned to a subspace using a fast, novel method of subspace detection. This parallel classifier architecture was tested with a wide variety of basic classifiers, and its performance was compared with that of a single basic classifier trained on the full data space. The improvement in subspace learning was accompanied by a very significant reduction in training times for all types of classifiers used.
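The abstract describes the architecture but not its implementation details, so the following is a minimal sketch of the general idea, not the authors' method. Assumptions not taken from the paper: documents are dense TF-IDF vectors, each training document carries a known top-level domain label, the base learner is a linear SVM, and the fast subspace-detection step is approximated by nearest-centroid routing. The class name `ParallelSubspaceClassifier` and all helper names are hypothetical.

```python
# Sketch of a parallel subspace classifier architecture (illustrative only).
# ASSUMPTIONS (not from the paper): dense feature vectors, known domain
# labels at training time, and nearest-centroid routing as a stand-in for
# the paper's fast subspace-detection method.
import numpy as np
from sklearn.base import clone
from sklearn.svm import LinearSVC


class ParallelSubspaceClassifier:
    def __init__(self, base_estimator=None):
        # Any base classifier of a single type can be plugged in here.
        self.base_estimator = base_estimator or LinearSVC()

    def fit(self, X, y_domain, y_class):
        """Train one independent copy of the base classifier per domain."""
        self.domains_ = np.unique(y_domain)
        self.centroids_ = {}  # one routing prototype per subspace
        self.models_ = {}     # one classifier per subspace
        for d in self.domains_:
            mask = (y_domain == d)
            self.centroids_[d] = X[mask].mean(axis=0)
            # Each model sees only its own subspace, so the per-domain
            # training problems are smaller; this is where the reported
            # training-time reduction would come from.
            self.models_[d] = clone(self.base_estimator).fit(X[mask], y_class[mask])
        return self

    def predict(self, X):
        """Route each test vector to a subspace, then classify within it."""
        preds = []
        for x in X:
            # Fast subspace detection (stand-in): nearest centroid.
            d = min(self.domains_,
                    key=lambda k: np.linalg.norm(x - self.centroids_[k]))
            preds.append(self.models_[d].predict(x.reshape(1, -1))[0])
        return np.array(preds)
```

Because the per-subspace models are fully independent, they can also be trained concurrently, which is consistent with the parallel design and the training-time reductions the abstract reports.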






Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tripathi, N., Oakes, M., Wermter, S. (2012). A Fast Subspace Text Categorization Method Using Parallel Classifiers. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28601-8_12

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-28601-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28600-1

  • Online ISBN: 978-3-642-28601-8

  • eBook Packages: Computer Science; Computer Science (R0)
