Abstract
Multi-label classification assigns a data item to one or several classes. Data with multiple labels arise in fields such as acoustic and visual scene analysis, news reports and medical diagnosis. In a generative framework, data with multiple labels can be interpreted as additive mixtures of the emissions of the individual sources. We propose a deconvolution approach to estimate the individual contribution of each source to a given data item. Similarly, the distributions of multi-label data are computed from the source distributions. In experiments with synthetic data, the novel approach is compared to existing models and yields more accurate parameter estimates, higher classification accuracy and better generalization to previously unseen label sets. These improvements are most pronounced on small training data sets. On real-world acoustic data, too, the algorithm outperforms other generative models, in particular on small training data sets.
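The abstract does not state the emission model, so the following is only a minimal sketch under loudly stated assumptions: each source emits an independent univariate Gaussian value, and a multi-label item is the plain sum of the emissions of the sources in its label set. Under these assumptions the distribution of a multi-label item is the convolution of the source distributions, which for Gaussians is again Gaussian with summed means and variances. "Deconvolution" is illustrated by the posterior mean of each source's contribution given the observed sum, using standard Gaussian conditioning. The names `Gaussian1D`, `combined_distribution` and `deconvolve` are hypothetical, and this is not the authors' estimation procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian1D:
    """Hypothetical univariate Gaussian emission model for one source (assumption)."""
    mean: float
    var: float


def combined_distribution(sources, label_set):
    """Distribution of x = sum of independent source emissions for the given labels.

    Adding independent Gaussian variables convolves their densities; the result
    is Gaussian with the summed means and summed variances.
    """
    mean = sum(sources[k].mean for k in label_set)
    var = sum(sources[k].var for k in label_set)
    return Gaussian1D(mean, var)


def deconvolve(x, sources, label_set):
    """Posterior mean E[s_k | x] of each source's contribution to the observed sum x.

    For independent Gaussian sources:
        E[s_k | x] = mu_k + (var_k / var_total) * (x - mu_total),
    so the residual is shared in proportion to each source's variance and the
    estimates add up to x.
    """
    total = combined_distribution(sources, label_set)
    residual = x - total.mean
    return {k: sources[k].mean + sources[k].var / total.var * residual
            for k in label_set}


if __name__ == "__main__":
    # Two illustrative sources; the values are made up for the example.
    sources = {"speech": Gaussian1D(2.0, 1.0), "noise": Gaussian1D(-1.0, 3.0)}
    labels = ["speech", "noise"]
    print(combined_distribution(sources, labels))  # Gaussian1D(mean=1.0, var=4.0)
    print(deconvolve(3.0, sources, labels))        # {'speech': 2.5, 'noise': 0.5}
```

In this toy setting the source with the larger variance absorbs the larger share of the unexplained residual. A real model would also need to learn the source parameters from multi-label training data, which the sketch does not attempt.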
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Streich, A.P., Buhmann, J.M. (2008). Classification of Multi-labeled Data: A Generative Approach. In: Daelemans, W., Goethals, B., Morik, K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2008. Lecture Notes in Computer Science, vol 5212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87481-2_26
DOI: https://doi.org/10.1007/978-3-540-87481-2_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87480-5
Online ISBN: 978-3-540-87481-2
eBook Packages: Computer Science, Computer Science (R0)