Model-Aware Representation Learning for Categorical Data with Hierarchical Couplings

Song, Jianglong; Zhu, Chengzhang; Zhao, Wentao; Liu, Wenjie; Liu, Qiang

doi:10.1007/978-3-319-68612-7_28

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10614)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

Learning an appropriate representation for categorical data is a critical yet challenging task. Current research embeds categorical data into vector or dis/similarity spaces; however, it either ignores the complex interactions within the data or overlooks the relationship between the representation and the learning model that consumes it. In this paper, we propose a model-aware representation learning framework for categorical data with hierarchical couplings, which simultaneously reveals the couplings from value to object and optimizes the fitness of the represented data for the follow-up learning model. An SVM-aware representation learning method is instantiated from this framework. Extensive experiments on ten UCI categorical datasets with diverse characteristics demonstrate that the representation learned by the proposed method can significantly improve learning performance (by up to 18.64%) compared with three competing methods.
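To make the contrast in the abstract concrete, the following is a minimal sketch, not the method proposed in the paper. It compares a one-hot embedding, which ignores interactions between values, with a simple co-occurrence embedding in which each value is described by how often it appears together with every other value, and each object is the average of its value vectors. The function names (`build_index`, `one_hot`, `cooccurrence_embedding`) and the toy data are assumptions made purely for illustration.

```python
from collections import defaultdict


def build_index(rows):
    """Assign one vector slot to every (attribute position, value) pair."""
    index = {}
    for j in range(len(rows[0])):
        # Sort the values so that slot order is deterministic.
        for v in sorted({row[j] for row in rows}):
            index[(j, v)] = len(index)
    return index


def one_hot(rows, index):
    """Baseline embedding: 1.0 in the slot of each value an object holds."""
    vectors = []
    for row in rows:
        vec = [0.0] * len(index)
        for j, v in enumerate(row):
            vec[index[(j, v)]] = 1.0
        vectors.append(vec)
    return vectors


def cooccurrence_embedding(rows, index):
    """Illustrative value-to-object embedding (an assumption, not the paper's).

    Value level: each value is represented by the conditional frequency of
    every (attribute, value) slot among the objects that hold that value.
    Object level: an object is the average of its value vectors.
    """
    counts = defaultdict(lambda: [0.0] * len(index))
    support = defaultdict(int)
    for row in rows:
        for j, v in enumerate(row):
            support[(j, v)] += 1
            for k, u in enumerate(row):
                counts[(j, v)][index[(k, u)]] += 1.0
    value_vecs = {
        key: [c / support[key] for c in vec] for key, vec in counts.items()
    }
    vectors = []
    for row in rows:
        vec = [0.0] * len(index)
        for j, v in enumerate(row):
            for i, x in enumerate(value_vecs[(j, v)]):
                vec[i] += x / len(row)
        vectors.append(vec)
    return vectors


# Toy data (hypothetical): two attributes, colour and size.
rows = [("red", "small"), ("red", "large"), ("blue", "large")]
index = build_index(rows)
print(one_hot(rows, index)[0])
print(cooccurrence_embedding(rows, index)[0])
```

Either set of vectors could then be passed to a standard classifier such as an SVM. The model-aware idea described in the abstract goes further by tuning the representation for the downstream model, which this sketch does not attempt.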

Author information

Authors and Affiliations

College of Computer, National University of Defense Technology, Changsha, China
Jianglong Song, Chengzhang Zhu, Wentao Zhao, Wenjie Liu & Qiang Liu
Advanced Analytics Institute, University of Technology Sydney, Ultimo, Australia
Chengzhang Zhu

Corresponding author

Correspondence to Jianglong Song.

Editor information

Editors and Affiliations

University of Lausanne, Lausanne, Switzerland
Alessandra Lintas
University of Genoa, Genoa, Italy
Stefano Rovetta
Universitat Pompeu Fabra, Barcelona, Spain
Paul F.M.J. Verschure
University of Lausanne, Lausanne, Switzerland
Alessandro E.P. Villa

About this paper

Cite this paper

Song, J., Zhu, C., Zhao, W., Liu, W., Liu, Q. (2017). Model-Aware Representation Learning for Categorical Data with Hierarchical Couplings. In: Lintas, A., Rovetta, S., Verschure, P., Villa, A. (eds) Artificial Neural Networks and Machine Learning – ICANN 2017. ICANN 2017. Lecture Notes in Computer Science, vol 10614. Springer, Cham. https://doi.org/10.1007/978-3-319-68612-7_28

DOI: https://doi.org/10.1007/978-3-319-68612-7_28
Published: 25 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68611-0
Online ISBN: 978-3-319-68612-7
eBook Packages: Computer Science, Computer Science (R0)
