
Multi-Label Text Categorization Forecasting Probability Problem Using Support Vector Machine Techniques

  • Chapter
Information Technologies in Environmental Engineering

Part of the book series: Environmental Science and Engineering (ENVENG, volume 3)


Abstract

The volume of information available on the Internet means that ever more documents must be classified. Text categorization is performed not only by domain experts but also by automatic text categorization systems, and a system with a multi-label classifier is needed to process documents at this scale. This study develops a multi-label text categorization system for classifying multi-label documents. Data mapping transforms the data from a high-dimensional space to a lower-dimensional space of paired SVM output values, thus lowering the computational complexity. A pair-wise comparison approach sets the membership function for each predicted class so that all candidate classes can be judged. Finally, the overlapping area of two classes is obtained from the decision function to determine where a document is classified. A comparative study of multi-label approaches is performed on Reuters data sets. The empirical results indicate that the proposed system performs better than the other methods on the overall performance indices. They also indicate that a membership-function probability of 0.5 is a good criterion for separating correctly classified documents from incorrectly classified ones.
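The abstract gives no formulas, so the following is only a minimal sketch of how pairwise SVM outputs might be turned into per-class memberships and thresholded at 0.5. Everything specific in it is an assumption for illustration, not the authors' method: the sigmoid mapping, the averaging of pairwise memberships, the function names `class_memberships` and `predict_labels`, and the decision values themselves, which are made up rather than taken from the chapter's experiments.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    """Map a signed decision value to a (0, 1) membership score."""
    return 1.0 / (1.0 + math.exp(-x))


def class_memberships(
    pairwise: Dict[Tuple[str, str], float], classes: List[str]
) -> Dict[str, float]:
    """Average the pairwise memberships of each class.

    pairwise[(a, b)] is a hypothetical decision value of the SVM trained
    on class a versus class b: positive favors a, negative favors b.
    Averaging is an assumption made for this sketch.
    """
    memberships = {}
    for c in classes:
        scores = []
        for (a, b), value in pairwise.items():
            if a == c:
                scores.append(sigmoid(value))
            elif b == c:
                scores.append(sigmoid(-value))
        memberships[c] = sum(scores) / len(scores)
    return memberships


def predict_labels(memberships: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Assign every class whose membership reaches the threshold."""
    return sorted(c for c, m in memberships.items() if m >= threshold)


# Made-up decision values for one document over three topic labels.
classes = ["earn", "grain", "trade"]
pairwise = {
    ("earn", "grain"): 0.1,   # near the boundary: the two classes overlap
    ("earn", "trade"): 2.0,
    ("grain", "trade"): 1.5,
}
memberships = class_memberships(pairwise, classes)
print(predict_labels(memberships))  # prints ['earn', 'grain']
```

In this toy case the document sits close to the earn/grain boundary, so both classes clear the 0.5 threshold and the document receives two labels, while trade is rejected.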

Corresponding author

Correspondence to Hui-Min Chiang.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Chiang, HM., Wang, TY., Chiang, YM. (2011). Multi-Label Text Categorization Forecasting Probability Problem Using Support Vector Machine Techniques. In: Golinska, P., Fertsch, M., Marx-Gómez, J. (eds) Information Technologies in Environmental Engineering. Environmental Science and Engineering, vol 3. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19536-5_3
