An Ensemble Approach to Multi-label Classification of Textual Data

Kurach, Karol; Pawłowski, Krzysztof; Romaszko, Łukasz; Tatjewski, Marcin; Janusz, Andrzej; Nguyen, Hung Son

doi:10.1007/978-3-642-35527-1_26

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7713)

Abstract

In this paper, we investigate different approaches to multi-label classification of textual data, with a special focus on ensemble techniques. Commonly used classifier ensembles combine the outputs of base learning models in order to improve the learning results. The multi-label classification problem introduces new challenges for ensemble learning methods. For instance, one needs to decide in which order it is better to aggregate the base learners: at the level of individual labels and then for whole label sets, or the other way around. We discuss this issue and experimentally compare selected approaches. In the experiments, we use data from the JRS’2012 Data Mining Competition, whose scope was topical classification of biomedical research papers, and as base learners we use the models employed by the winners of this contest.
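
The aggregation-order question raised in the abstract can be illustrated with a small sketch. This is not the authors' method, which the abstract does not specify; it only assumes that each base learner returns a confidence score per label and that a fixed threshold of 0.5 turns scores into a label set. The function names, the labels "A" and "B", and the scores are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical representation: one dict per base learner, label -> score in [0, 1].
Scores = Dict[str, float]


def aggregate_scores_then_select(models: List[Scores], threshold: float = 0.5) -> Set[str]:
    """Label level first: average each label's score, then choose the label set."""
    labels = set().union(*models)
    mean = {lab: sum(m.get(lab, 0.0) for m in models) / len(models) for lab in labels}
    return {lab for lab, score in mean.items() if score >= threshold}


def select_then_vote(models: List[Scores], threshold: float = 0.5) -> Set[str]:
    """Label set first: each learner chooses its own label set, then majority vote."""
    label_sets = [{lab for lab, score in m.items() if score >= threshold} for m in models]
    labels = set().union(*label_sets)
    # Keep a label only if strictly more than half of the learners selected it.
    return {lab for lab in labels if 2 * sum(lab in ls for ls in label_sets) > len(label_sets)}


# Made-up scores from three base learners for two labels.
models = [
    {"A": 0.9, "B": 0.45},
    {"A": 0.4, "B": 0.45},
    {"A": 0.4, "B": 0.9},
]

by_label = aggregate_scores_then_select(models)  # both labels pass the averaged threshold
by_set = select_then_vote(models)                # neither label wins a majority
```

On these made-up scores the two orders disagree: averaging first keeps both labels, while voting over per-learner label sets keeps none, which is the kind of difference the experimental comparison is concerned with.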

This research was supported by the National Centre for Research and Development (NCBiR) under grant SP/I/1/77065/10 within the strategic scientific research and experimental development program: “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.

Author information

Authors and Affiliations

Faculty of Mathematics, Informatics and Mechanics, The University of Warsaw, Banacha 2, 02-097, Warsaw, Poland
Karol Kurach, Krzysztof Pawłowski, Łukasz Romaszko, Marcin Tatjewski, Andrzej Janusz & Hung Son Nguyen

Editor information

Editors and Affiliations

School of Computer Science, Fudan University, Handan Road 220, 200433, Shanghai, China
Shuigeng Zhou
Chinese Academy of Sciences, Academy of Mathematics and Systems Science, Zhongguancun East Road 55, 100190, Beijing, China
Songmao Zhang
Department of Computer Science and Engineering, University of Minnesota, Union Street SE 200, 55455, Minneapolis, MN, USA
George Karypis

About this paper

Cite this paper

Kurach, K., Pawłowski, K., Romaszko, Ł., Tatjewski, M., Janusz, A., Nguyen, H.S. (2012). An Ensemble Approach to Multi-label Classification of Textual Data. In: Zhou, S., Zhang, S., Karypis, G. (eds) Advanced Data Mining and Applications. ADMA 2012. Lecture Notes in Computer Science, vol 7713. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35527-1_26

DOI: https://doi.org/10.1007/978-3-642-35527-1_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35526-4
Online ISBN: 978-3-642-35527-1
eBook Packages: Computer Science, Computer Science (R0)
