Abstract
The support vector machine (SVM) is a supervised learning model, with associated learning algorithms, that analyzes data and recognizes patterns. The SVM performs well on classification in many applications; however, it was originally designed for numerical data. To apply the SVM to nominal data, most previous research either replaced each nominal value with a number or transformed it into a one-hot vector. Neither method preserves the structure of the original nominal data or the similarity between nominal values, which causes information loss and reduces classification performance. In this work, we design a novel coupled similarity metric for data with nominal attributes. Because this metric is pairwise, we also propose an adapted SVM that can handle it. Experimental results show that the proposed method outperforms the traditional SVM and other popular classification methods on various public data sets.
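The information loss attributed above to the two conventional encodings can be illustrated with a minimal sketch. It is not the coupled similarity metric proposed in the paper; the category names and helper functions below are hypothetical and chosen only for illustration. Integer coding imposes an arbitrary ordering on the values, while one-hot coding places every pair of distinct values at the same distance, so neither encoding can express that two nominal values are more alike than a third.

```python
from itertools import combinations
from math import sqrt

# Hypothetical nominal values; any intuitive similarity between
# "red" and "crimson" is an assumption made for illustration.
categories = ["red", "crimson", "blue"]


def one_hot(value, cats):
    """Return the one-hot vector of `value` over the category list `cats`."""
    return [1.0 if value == c else 0.0 for c in cats]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Encoding 1: replace each nominal value with a number (its list index).
integer_code = {c: float(i) for i, c in enumerate(categories)}
integer_dist = {
    (a, b): abs(integer_code[a] - integer_code[b])
    for a, b in combinations(categories, 2)
}

# Encoding 2: one-hot vectors.
one_hot_dist = {
    (a, b): euclidean(one_hot(a, categories), one_hot(b, categories))
    for a, b in combinations(categories, 2)
}

for pair in integer_dist:
    print(pair, "integer:", integer_dist[pair], "one-hot:", round(one_hot_dist[pair], 4))
```

Under integer coding the distances depend entirely on the arbitrary order of the list, and under one-hot coding all three pairs are equally far apart. A data-driven similarity between nominal values, such as the one the abstract proposes, is meant to avoid both effects.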
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, M., Li, J., Ou, Y., Cao, L. (2015). A Coupled Similarity Kernel for Pairwise Support Vector Machine. In: Cao, L., et al. Agents and Data Mining Interaction. ADMI 2014. Lecture Notes in Computer Science, vol 9145. Springer, Cham. https://doi.org/10.1007/978-3-319-20230-3_10
DOI: https://doi.org/10.1007/978-3-319-20230-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20229-7
Online ISBN: 978-3-319-20230-3
eBook Packages: Computer Science, Computer Science (R0)