Abstract
Time series classification in the dissimilarity space combines the advantages of elastic dissimilarity functions such as the dynamic time warping distance and the rich mathematical structure of Euclidean spaces. We applied dimension reduction using PCA followed by support vector learning on dissimilarity representations to 42 UCR datasets. The results suggest that time series classification in dissimilarity space has the potential to complement the state-of-the-art, because on these datasets the SVM classifiers perform better, and with higher confidence, than the nearest-neighbor classifier based on the dynamic time warping distance.
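As a rough sketch of the dissimilarity-space idea described above (not the authors' implementation), the first step maps each time series to the vector of its DTW distances to a set of prototype series; dimension reduction with PCA and SVM training would then operate on these vectors. All function names below are illustrative.

```python
# Sketch of the dissimilarity-representation step (illustrative only).
# Each time series is mapped to its vector of DTW distances to a set of
# prototype series; PCA and SVM learning (not shown) act on these vectors.

def dtw(a, b):
    """Dynamic time warping distance between two univariate series,
    using squared pointwise costs and taking the square root at the end."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] ** 0.5

def dissimilarity_representation(series, prototypes):
    """Map each series to its vector of DTW distances to the prototypes."""
    return [[dtw(s, p) for p in prototypes] for s in series]
```

With the full training set used as the prototype set, each series of length n becomes a point in a Euclidean space of dimension equal to the training-set size, which is what makes a subsequent PCA step attractive.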
References
Batista, G.E., Wang, X., Keogh, E.J.: A complexity-invariant distance measure for time series. In: SIAM International Conference on Data Mining, vol. 11, pp. 699–710 (2011)
Boser, B.E., Guyon, I., Vapnik, V.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152 (1992)
Bunke, H., Riesen, K.: Graph classification based on dissimilarity space embedding. In: da Vitoria Lobo, N., Kasparis, T., Roli, F., Kwok, J.T., Georgiopoulos, M., Anagnostopoulos, G.C., Loog, M. (eds.) Structural, Syntactic, and Statistical Pattern Recognition. LNCS, vol. 5342, pp. 996–1007. Springer, Heidelberg (2008)
Cao, L.J., Chua, K.S., Chong, W.K., Lee, H.P., Gu, Q.M.: A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine. Neurocomputing 55(1–2), 321–336 (2003)
Chen, Y., Garcia, E., Gupta, M., Rahimi, A., Cazzanti, L.: Similarity-based classification: concepts and algorithms. J. Mach. Learn. Res. 10, 747–776 (2009)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)
Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)
Duin, R., de Ridder, D., Tax, D.: Experiments with object based discriminant functions; a featureless approach to pattern recognition. Pattern Recogn. Lett. 18(11–13), 1159–1166 (1997)
Duin, R.P.W., Pekalska, E.: The dissimilarity space: bridging structural and statistical pattern recognition. Pattern Recogn. Lett. 33(7), 826–832 (2012)
Fu, T.: A review on time series data mining. Eng. Appl. Artif. Intell. 24(1), 164–181 (2011)
Geibel, P., Jain, B., Wysotzki, F.: SVM learning with the SH inner product. In: European Symposium on Artificial Neural Networks (2004)
Geurts, P.: Pattern extraction for time series classification. In: Siebes, A., De Raedt, L. (eds.) PKDD 2001. LNCS (LNAI), vol. 2168, pp. 115–127. Springer, Heidelberg (2001)
Graepel, T., Herbrich, R., Bollmann-Sdorra, P., Obermayer, K.: Classification on pairwise proximity data. In: Advances in Neural Information Processing Systems (1999)
Graepel, T., Herbrich, R., Schölkopf, B., Smola, A., Bartlett, P., Müller, K.-R., Obermayer, K., Williamson, R.: Classification on proximity data with LP-machines. In: International Conference on Artificial Neural Networks (1999)
Gudmundsson, S., Runarsson, T.P., Sigurdsson, S.: Support vector machines and dynamic time warping for time series. In: Joint Conference on Neural Networks (2008)
Haasdonk, B., Burkhardt, H.: Invariant kernels for pattern analysis and machine learning. Mach. Learn. 68, 35–61 (2007)
Hochreiter, S., Obermayer, K.: Support vector machines for dyadic data. Neural Comput. 18(6), 1472–1510 (2006)
Jain, B.J., Geibel, P., Wysotzki, F.: SVM learning with the Schur–Hadamard inner product for graphs. Neurocomputing 64, 93–105 (2005)
Jain, B.J., Spiegel, S.: Time series classification in dissimilarity spaces. In: Proceedings of the 1st International Workshop on Advanced Analytics and Learning on Temporal Data (2015)
Jain, B.J.: Generalized gradient learning on time series. Mach. Learn. 100(2), 587–608 (2015)
Kate, R.J.: Using dynamic time warping distances as features for improved time series classification. Data Min. Knowl. Discov. 30(2), 283–312 (2016)
Keogh, E., Zhu, Q., Hu, B., Hao, Y., Xi, X., Wei, L., Ratanamahatana, C.A.: The UCR Time Series Classification/Clustering Homepage (2011). www.cs.ucr.edu/~eamonn/time_series_data/
Laub, J., Müller, K.R.: Feature discovery in non-metric pairwise data. J. Mach. Learn. Res. 5, 801–818 (2004)
Lin, J., Keogh, E., Wei, L., Lonardi, S.: Experiencing SAX: a novel symbolic representation of time series. Data Min. Knowl. Discov. 15(2), 107–144 (2007)
Lines, J., Bagnall, A.: Time series classification with ensembles of elastic distance measures. Data Min. Knowl. Discov. 29(3), 565–592 (2015)
Livi, L., Rizzi, A., Sadeghian, A.: Optimized dissimilarity space embedding for labeled graphs. Inf. Sci. 266, 47–64 (2014)
Ong, C., Mary, X., Canu, S., Smola, A.J.: Learning with non-positive kernels. In: International Conference on Machine Learning (2004)
Pekalska, E., Duin, R.P.W.: The Dissimilarity Representation for Pattern Recognition. World Scientific, River Edge (2005)
Pekalska, E., Duin, R.P.W., Paclik, P.: Prototype selection for dissimilarity-based classifiers. Pattern Recogn. 39(2), 189–208 (2006)
Petitjean, F., Ketterlin, A., Gançarski, P.: A global averaging method for dynamic time warping, with applications to clustering. Pattern Recogn. 44(3), 678–693 (2011)
Riesen, K., Neuhaus, M., Bunke, H.: Graph embedding in vector spaces by means of prototype selection. In: Escolano, F., Vento, M. (eds.) GbRPR. LNCS, vol. 4538, pp. 383–393. Springer, Heidelberg (2007)
Riesen, K., Bunke, H.: Graph classification based on vector space embedding. Int. J. Pattern Recogn. Artif. Intell. 23(6), 1053–1081 (2009)
Spillmann, B., Neuhaus, M., Bunke, H., Pękalska, E., Duin, R.P.W.: Transforming strings to vector spaces using prototype selection. In: Yeung, D.-Y., Kwok, J.T., Fred, A., Roli, F., de Ridder, D. (eds.) Structural, Syntactic, and Statistical Pattern Recognition. LNCS, vol. 4109, pp. 287–296. Springer, Heidelberg (2006)
Subasi, A., Gursoy, M.I.: EEG signal classification using PCA, ICA, LDA and support vector machines. Expert Syst. Appl. 37(12), 8659–8666 (2010)
Xi, X., Keogh, E., Shelton, C., Wei, L., Ratanamahatana, C.A.: Fast time series classification using numerosity reduction. In: International Conference on Machine Learning (2006)
Xing, Z., Pei, J., Keogh, E.: A brief survey on sequence classification. ACM SIGKDD Explor. Newslett. 12(1), 40–48 (2010)
Acknowledgements
B. Jain was funded by the DFG Sachbeihilfe JA 2109/4-1.
A Performance Profiles
Performance profiles were introduced by Dolan and Moré to compare the efficiency of algorithms [7]. Here, we use performance profiles to compare differences in the classification accuracy of a collection of classifiers on a set of classification problems. The comparison is summarized by one curve per classifier, which is easier to read than a table of classification accuracies.
To define a performance profile, we assume that \(\mathbb {C}\) is a set of classifiers to be compared and \(\mathbb {P}\) is the set of all classification problems. For each classification problem \(p \in \mathbb {P}\) and each classifier \(c \in \mathbb {C}\), we write \(\rho _{c,p}\) for the performance of classifier c for problem p, measured by its classification accuracy. In performance profiles, we do not consider the absolute performance of a classifier in terms of its classification accuracy, but its relative performance with respect to the best performing classifier. The classifier with the best performance on problem p has classification accuracy
\[\rho _p^* = \max _{c \in \mathbb {C}} \rho _{c,p}.\]
Then the relative performance of classifier c on problem p is given by
\[r_{c,p} = \frac{\rho _p^* - \rho _{c,p}}{\rho _p^*}.\]
The relative performance \(r_{c, p}\) takes values in the interval [0, 1]: the better a classifier performs on a given problem, the lower its relative performance. Moreover, from
\[\rho _{c,p} = \left(1 - r_{c,p}\right) \rho _p^*\]
it follows that \(r_{c,p}\) is the factor by which the classification accuracy \(\rho _{c,p}\) deviates from the best classification accuracy \(\rho _p^*\).
Finally, the performance profile of classifier \(c \in \mathbb {C}\) over all problems \(p \in \mathbb {P}\) is the empirical cumulative distribution function
\[P_c(\tau ) = \frac{1}{|\mathbb {P}|}\,\Big |\big \{p \in \mathbb {P} \,:\, r_{c,p} \le \tau \big \}\Big |.\]
It is sufficient to keep three facts in mind to interpret performance profiles:
1. The value \(P_c(0)\) is the fraction of problems on which classifier c is best.
2. \(P_c(\tau )\) is the fraction of problems on which the performance of classifier c deviates at most by factor \(\tau \) from the best performance.
3. \(\tau _{\max }\) with \(P_c(\tau _{\max }) = 1\) is the maximum factor by which classifier c deviates from the best performance.
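The definitions above can be sketched as a short computation. The accuracy values below are hypothetical and for illustration only; the classifier and problem names are not from the paper.

```python
# Sketch: computing performance profiles from a table of classification
# accuracies (hypothetical numbers, for illustration only).

accuracies = {                      # accuracies[classifier][problem]
    "svm-pca": {"p1": 0.95, "p2": 0.80, "p3": 0.90},
    "1nn-dtw": {"p1": 0.90, "p2": 0.85, "p3": 0.90},
}
problems = ["p1", "p2", "p3"]

# Best accuracy per problem: rho*_p = max_c rho_{c,p}.
best = {p: max(acc[p] for acc in accuracies.values()) for p in problems}

# Relative performance r_{c,p} = (rho*_p - rho_{c,p}) / rho*_p,
# in [0, 1], lower is better.
rel = {c: {p: (best[p] - acc[p]) / best[p] for p in problems}
       for c, acc in accuracies.items()}

def profile(c, tau):
    """P_c(tau): fraction of problems with r_{c,p} <= tau."""
    return sum(rel[c][p] <= tau for p in problems) / len(problems)
```

Evaluating `profile(c, 0.0)` gives the fraction of problems on which classifier c is best (ties included), and `profile(c, tau)` traces out the curve plotted in a performance profile as tau grows.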
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Jain, B., Spiegel, S. (2016). Dimension Reduction in Dissimilarity Spaces for Time Series Classification. In: Douzal-Chouakria, A., Vilar, J., Marteau, P.F. (eds) Advanced Analysis and Learning on Temporal Data. AALTD 2015. Lecture Notes in Computer Science, vol 9785. Springer, Cham. https://doi.org/10.1007/978-3-319-44412-3_3
DOI: https://doi.org/10.1007/978-3-319-44412-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44411-6
Online ISBN: 978-3-319-44412-3