Abstract
In a multi-label classification problem, each document is associated with a subset of labels, and documents typically consist of many features. Feature selection, which attempts to remove irrelevant and redundant features that can hinder performance, is therefore an important task in machine learning. This paper suggests transforming multi-label documents into single-label documents before applying a standard feature selection algorithm. In this process, a document is copied to each label to which it belongs, and all of its features are assigned to each of those labels. In this context, we conducted a comparative study of five feature selection methods. These methods are incorporated into traditional Naive Bayes classifiers adapted to deal with multi-label documents. Experiments conducted on benchmark datasets showed that the multi-label Naive Bayes (MLNB) classifier coupled with the GSS method delivered better performance than the MLNB classifier using the other methods.
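The copy transformation and the GSS score mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation, which the abstract does not show. The function names `copy_transform` and `gss_scores`, the toy documents, and the use of document-level term presence are all assumptions made for illustration. The score uses the commonly cited form of the GSS coefficient, P(t,c)P(¬t,¬c) − P(t,¬c)P(¬t,c), estimated from counts over the transformed documents.

```python
def copy_transform(docs):
    """Copy each multi-label document once per label it carries.

    docs: list of (tokens, labels) pairs.
    Returns a list of (tokens, label) single-label pairs; every copy
    keeps all features of the original document.
    """
    return [(tokens, label) for tokens, labels in docs for label in labels]


def gss_scores(single_label_docs, label):
    """GSS coefficient of every term for one label (hypothetical helper).

    Probabilities are estimated from document counts:
    GSS(t, c) = P(t, c) * P(not t, not c) - P(t, not c) * P(not t, c)
    """
    n = len(single_label_docs)
    vocab = {t for tokens, _ in single_label_docs for t in tokens}
    scores = {}
    for term in vocab:
        t_c = t_nc = nt_c = nt_nc = 0
        for tokens, doc_label in single_label_docs:
            has_term = term in set(tokens)
            in_class = doc_label == label
            if has_term and in_class:
                t_c += 1
            elif has_term:
                t_nc += 1
            elif in_class:
                nt_c += 1
            else:
                nt_nc += 1
        scores[term] = (t_c * nt_nc - t_nc * nt_c) / (n * n)
    return scores


# Toy data (assumed): the second document carries two labels.
docs = [
    (["goal", "match"], ["sport"]),
    (["vote", "match"], ["politics", "sport"]),
    (["vote", "law"], ["politics"]),
]
single = copy_transform(docs)          # four single-label documents
sport_scores = gss_scores(single, "sport")
top_terms = sorted(sport_scores, key=sport_scores.get, reverse=True)[:2]
```

Keeping the highest-scoring terms per label, as in `top_terms`, is one simple way such scores could feed a feature selection step before training a classifier.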
Acknowledgments
This research was financially supported by the Malaysian Ministry of Education (MOE) under grant no. ERGS/1/2013/ICT07/UKM/02/5.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Alhutaish, R., Omar, N., Abdullah, S. (2015). A Comparison of Multi-label Feature Selection Methods Using the Algorithm Adaptation Approach. In: Badioze Zaman, H., et al. Advances in Visual Informatics. IVIC 2015. Lecture Notes in Computer Science, vol 9429. Springer, Cham. https://doi.org/10.1007/978-3-319-25939-0_18
DOI: https://doi.org/10.1007/978-3-319-25939-0_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25938-3
Online ISBN: 978-3-319-25939-0
eBook Packages: Computer Science (R0)