Abstract
In this paper, we propose a multi-dimensional category model (MDCM) for classifying multi-dimensional text collections. Under this model, text classification can be parallelized and distributed, with each dimension processed separately. The model improves classifier performance in both accuracy and time complexity. Accuracy benefits in two ways. First, on each dimension, classifiers learn from a larger set of training documents spread over a small number of classes. Second, the best classifier can be selected for each dimension and the results from all dimensions combined. Time complexity improves because the learning and classifying phases can run in a parallel and distributed manner. The efficiency of MDCM is investigated on a drug information data set in which the first dimension assigns topics in monographs and the second dimension assigns primary therapeutic classes. The experimental results show that parallel text classification on MDCM performs better than the flat model in both accuracy and time complexity.
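The per-dimension idea can be illustrated with a minimal sketch. It is not the authors' implementation: the toy documents, the labels, and the helper names (`train_centroids`, `train_mdcm`, `classify_mdcm`) are hypothetical, and a simple term-count centroid classifier stands in for whichever classifier would be selected on each dimension. One model is trained per dimension in its own worker, and the per-dimension predictions are then combined into a single multi-dimensional label.

```python
import math
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


def train_centroids(docs, labels):
    """Sum term counts per class and L2-normalize each sum into a centroid."""
    sums = defaultdict(Counter)
    for text, label in zip(docs, labels):
        sums[label].update(tokenize(text))
    centroids = {}
    for label, counts in sums.items():
        norm = math.sqrt(sum(v * v for v in counts.values()))
        centroids[label] = {term: v / norm for term, v in counts.items()}
    return centroids


def classify(centroids, text):
    """Return the class whose centroid has the largest dot product with the document."""
    counts = Counter(tokenize(text))

    def score(label):
        centroid = centroids[label]
        return sum(v * centroid.get(term, 0.0) for term, v in counts.items())

    # sorted() makes tie-breaking deterministic
    return max(sorted(centroids), key=score)


def train_mdcm(docs, labels_by_dim):
    """Train one independent classifier per dimension, each in its own worker.

    Threads keep the sketch simple; for CPU-bound training in CPython,
    separate processes or machines would be needed for a real speed-up.
    """
    with ThreadPoolExecutor() as pool:
        futures = {
            dim: pool.submit(train_centroids, docs, labels)
            for dim, labels in labels_by_dim.items()
        }
        return {dim: future.result() for dim, future in futures.items()}


def classify_mdcm(models, text):
    """Combine the per-dimension predictions into one multi-dimensional label."""
    return {dim: classify(model, text) for dim, model in models.items()}


# Hypothetical toy data: two dimensions with two classes each.
docs = [
    "aspirin dose tablet daily",
    "aspirin bleeding stomach risk",
    "amoxicillin dose capsule daily",
    "amoxicillin rash allergy risk",
]
labels_by_dim = {
    "topic": ["dosage", "adverse", "dosage", "adverse"],
    "class": ["analgesic", "analgesic", "antibiotic", "antibiotic"],
}

models = train_mdcm(docs, labels_by_dim)
prediction = classify_mdcm(models, "amoxicillin capsule dose")
print(prediction)
```

In this toy setup, every classifier sees all four documents but only has to separate two classes, whereas a single classifier over every topic and class combination would have to separate four classes with one document each.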
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lertnattee, V., Theeramunkong, T. (2004). Parallel Text Categorization for Multi-dimensional Data. In: Liew, KM., Shen, H., See, S., Cai, W., Fan, P., Horiguchi, S. (eds) Parallel and Distributed Computing: Applications and Technologies. PDCAT 2004. Lecture Notes in Computer Science, vol 3320. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30501-9_10
DOI: https://doi.org/10.1007/978-3-540-30501-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24013-6
Online ISBN: 978-3-540-30501-9
eBook Packages: Computer Science, Computer Science (R0)