A Comparison of Multi-label Feature Selection Methods Using the Algorithm Adaptation Approach

Alhutaish, Roiss; Omar, Nazlia; Abdullah, Salwani

doi:10.1007/978-3-319-25939-0_18

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9429)

Included in the following conference series:

International Visual Informatics Conference

Abstract

In a multi-label classification problem, each document is associated with a subset of labels, and usually with several labels at once. Documents also typically contain many features. Feature selection, which attempts to remove irrelevant and redundant features that can hinder classification performance, is therefore an important task in machine learning. This paper proposes transforming multi-label documents into single-label documents before applying a standard feature selection algorithm. In this transformation, each document is copied once for every label to which it belongs, and each copy carries all of the document's features. In this context, we conducted a comparative study of five feature selection methods. The methods are incorporated into a traditional Naive Bayes classifier that is adapted to handle multi-label documents. Experiments on benchmark datasets showed that the multi-label Naive Bayes (MLNB) classifier coupled with the GSS method performed better than the MLNB classifier coupled with the other feature selection methods.
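The copy transformation and the feature scoring step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `copy_transform` and `gss_scores` and the toy documents are hypothetical, and the scoring formula is assumed to be the commonly cited form of the GSS coefficient of Galavotti, Sebastiani and Simi, GSS(t, c) = P(t, c) P(¬t, ¬c) − P(t, ¬c) P(¬t, c), estimated here from raw document counts.

```python
from collections import Counter


def copy_transform(docs):
    """Copy each document once per label, keeping all of its features.

    docs: list of (features, labels) pairs.
    Returns a list of (feature_set, single_label) pairs.
    """
    return [(set(features), label) for features, labels in docs for label in labels]


def gss_scores(single_label_docs):
    """Score every (term, label) pair with the assumed GSS coefficient.

    Probabilities are estimated as counts divided by the number of
    single-label documents produced by the copy transformation.
    """
    n = len(single_label_docs)
    label_count = Counter(label for _, label in single_label_docs)
    term_count = Counter(t for feats, _ in single_label_docs for t in feats)
    joint = Counter((t, label) for feats, label in single_label_docs for t in feats)

    scores = {}
    for term in term_count:
        for label in label_count:
            n_t_c = joint[(term, label)]                  # has term, has label
            n_t_notc = term_count[term] - n_t_c           # has term, other label
            n_nott_c = label_count[label] - n_t_c         # lacks term, has label
            n_nott_notc = n - n_t_c - n_t_notc - n_nott_c  # lacks term, other label
            scores[(term, label)] = (
                n_t_c * n_nott_notc - n_t_notc * n_nott_c
            ) / (n * n)
    return scores


# Hypothetical toy corpus: the second document carries two labels.
docs = [
    ({"goal", "match"}, ["sport"]),
    ({"goal", "vote"}, ["sport", "politics"]),
    ({"vote", "law"}, ["politics"]),
]

single = copy_transform(docs)   # the two-label document appears twice
scores = gss_scores(single)

# Rank terms by their best score over all labels (one possible
# aggregation; the abstract does not say which one the paper uses).
best = {}
for (term, label), value in scores.items():
    best[term] = max(best.get(term, float("-inf")), value)
ranked = sorted(best, key=best.get, reverse=True)
```

In this sketch a positive score means the term tends to occur in documents of that label, and a negative score means it tends to occur elsewhere. Taking the maximum over labels is only one way to obtain a single ranking per term.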

Acknowledgments

This research was financially supported by the Malaysian Ministry of Education (MOE) under grant no. ERGS/1/2013/ICT07/UKM/02/5.

Author information

Authors and Affiliations

Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
Roiss Alhutaish, Nazlia Omar & Salwani Abdullah

Corresponding author

Correspondence to Roiss Alhutaish.

Editor information

Editors and Affiliations

Fac Info Science and Techn, Universiti Kebangsaan Malaysia, Selangor, Malaysia
Halimah Badioze Zaman
University of Cambridge, Cambridge, United Kingdom
Peter Robinson
Center for Digital Video Process, Dublin 9, Ireland
Alan F. Smeaton
Computer Science and Information Enginee, National Central University, Jhongli City, Taiwan
Timothy K. Shih
Kingston University, Kingston upon Thames, United Kingdom
Sergio Velastin
Universiti Kebangsaan Malaysia, Institute of Visual Informatics, Bangi, Malaysia
Azizah Jaafar
Universiti Kebangsaan Malaysia, Institute of Visual Informatics, Bangi, Malaysia
Nazlena Mohamad Ali

Cite this paper

Alhutaish, R., Omar, N., Abdullah, S. (2015). A Comparison of Multi-label Feature Selection Methods Using the Algorithm Adaptation Approach. In: Badioze Zaman, H., et al. Advances in Visual Informatics. IVIC 2015. Lecture Notes in Computer Science, vol 9429. Springer, Cham. https://doi.org/10.1007/978-3-319-25939-0_18

DOI: https://doi.org/10.1007/978-3-319-25939-0_18
Published: 27 October 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25938-3
Online ISBN: 978-3-319-25939-0
eBook Packages: Computer Science, Computer Science (R0)
