
Optimizing shapelets quality measure for imbalanced time series classification

  • Qiuyan Yan
  • Yang Cao

Abstract

Time series classification is considered one of the most challenging problems in data mining and is used in a broad range of fields. A biased class distribution makes the classification of minority time series objects even harder. A common approach is to extract or select representative features that retain the structure of a time series object. However, when the data distribution is imbalanced, traditional features cannot represent time series effectively, especially in a multi-class setting. In this paper, shapelets, a primitive time series mining technique, are applied to extract the most representative subsequences. In particular, we verify that IG (Information Gain) is unsuitable as a shapelet quality measure for imbalanced data sets, and we therefore propose two quality measures for shapelets on imbalanced binary and multi-class problems, respectively. Based on the extracted shapelet features, we select the diversified top-k shapelets under the new quality measures as the top-k best features and implement this procedure on a MapReduce framework. Lastly, two oversampling methods based on shapelet features are proposed to re-balance binary and multi-class time series data sets. We validate our methods on benchmark data sets by comparing with canonical classifiers and state-of-the-art time series algorithms, and the results show that the proposed algorithms outperform the compared methods with statistical significance.
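For readers unfamiliar with shapelet-based features, the Python sketch below illustrates the general idea of scoring a candidate subsequence against a labelled, imbalanced time series collection. It is only a minimal illustration: the function names, the distance-threshold split, and the class-balance-aware scoring rule shown here are assumptions chosen for clarity, not the quality measures proposed in the paper, which are defined in the full text.

```python
import numpy as np

def subsequence_distance(series, shapelet):
    """Minimum Euclidean distance between a shapelet and all
    equal-length subsequences of a (longer) time series."""
    m = len(shapelet)
    dists = [np.linalg.norm(series[i:i + m] - shapelet)
             for i in range(len(series) - m + 1)]
    return min(dists)

def balanced_split_score(distances, labels, threshold):
    """Hypothetical class-balance-aware quality score: average the
    per-class agreement of the split induced by the distance threshold,
    so the minority class weighs as much as the majority class.
    (The paper's actual measures differ; this is only illustrative.)"""
    labels = np.asarray(labels)
    near = np.asarray(distances) <= threshold
    per_class = []
    for c in np.unique(labels):
        mask = labels == c
        # Fraction of class-c objects on the "near" side of the split.
        side = near[mask].mean()
        per_class.append(max(side, 1.0 - side))
    return float(np.mean(per_class))

# Toy usage: score one candidate shapelet on a tiny imbalanced set
# (9 majority objects vs. 1 minority object).
rng = np.random.default_rng(0)
X = [rng.normal(size=60) for _ in range(9)] + \
    [np.concatenate([rng.normal(size=30), np.sin(np.linspace(0, 6, 30))])]
y = [0] * 9 + [1]
candidate = X[-1][30:45]        # subsequence drawn from the minority object
d = [subsequence_distance(s, candidate) for s in X]
print(balanced_split_score(d, y, threshold=np.median(d)))
```

A plain information-gain score computed on such a split tends to favour the majority class when the class ratio is extreme, which is the motivation, stated in the abstract, for replacing IG with imbalance-aware quality measures.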

Keywords

Imbalanced time series data · Feature selection · Shapelets · Quality measure

Notes

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61876186) and the Youth Science Foundation of China University of Mining and Technology (Grant No. 2013QNB16).


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
  2. School of Safety Engineering, China University of Mining and Technology, Xuzhou, China
  3. State Key Laboratory of Coal Resources and Safe Mining, Xuzhou, China
