Nowadays, the Internet of Things and e-Health produce huge collections of Time Series (TS) that are analyzed to classify the current status or to detect certain events, among other tasks. In two-class problems, when the positive events to detect are infrequent, the gathered data lack balance. Even in unsupervised learning, this imbalance reduces the generalization capability of the models. TS balancing algorithms have been proposed to solve this problem, but they have barely been studied; the existing approaches either draw some TS from a single bag of TS in order to generate a new synthetic one, or generate ghost points in the distance space. These solutions are suitable when there is only one data source and the datasets are univariate. However, in the context of the Internet of Things, where multiple data sources are available, these approaches may not perform coherently. Moreover, to the best of our knowledge, no TS balancing algorithm for multiple data sources and multivariate TS exists in the literature. In this research, we study two main concerns that should be considered when designing TS balancing algorithms: on the one hand, they should deal with multiple multivariate data sources; on the other hand, they should be shape preserving. As part of this work, a new algorithm is proposed for balancing multivariate TS datasets. A complete evaluation of the algorithm is performed on two real-world multivariate TS datasets from the e-Health domain: one on epilepsy crisis identification and the other on fall detection. A thorough analysis of the performance is presented, showing the advantages of considering the TS issues within the balancing algorithm.
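To make the idea of generating a synthetic series from a bag of minority-class series more concrete, the following is a minimal Python sketch of SMOTE-style interpolation applied to multivariate TS. It is not the algorithm proposed in the article, whose details are not given in this abstract. It assumes equal-length series, a plain Euclidean distance, and a single interpolation gap shared by all variables so that the variables of a synthetic series stay mutually consistent. The names `series_distance`, `synthesize` and `balance` are hypothetical.

```python
import math
import random


def series_distance(a, b):
    """Euclidean distance between two equal-length multivariate series.

    Each series is a list of samples; each sample is a tuple of variables.
    """
    return math.sqrt(sum((x - y) ** 2
                         for sample_a, sample_b in zip(a, b)
                         for x, y in zip(sample_a, sample_b)))


def synthesize(minority, k=2, rng=None):
    """Create one synthetic series by interpolating towards a near neighbour.

    Assumption: `minority` holds at least two series of identical shape.
    """
    rng = rng or random.Random()
    base = rng.choice(minority)
    # k nearest minority neighbours of the base series (excluding itself).
    others = [s for s in minority if s is not base]
    neighbours = sorted(others, key=lambda s: series_distance(base, s))[:k]
    mate = rng.choice(neighbours)
    # One gap for every variable and time step: the whole multivariate
    # series moves coherently from `base` towards `mate`.
    gap = rng.random()
    return [tuple(x + gap * (y - x) for x, y in zip(sample_b, sample_m))
            for sample_b, sample_m in zip(base, mate)]


def balance(minority, target_size, k=2, seed=0):
    """Return the minority series plus synthetic ones, up to target_size."""
    rng = random.Random(seed)
    balanced = list(minority)
    while len(balanced) < target_size:
        balanced.append(synthesize(minority, k, rng))
    return balanced
```

For example, `balance(minority, 6)` on a bag of three minority series would return the three originals followed by three interpolated ones. A shape-preserving variant would replace the Euclidean distance and the pointwise interpolation with operations that respect temporal alignment; that choice is left open in this sketch.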
In medical record databases for a rare disease, where a large number of patients do not have that disease, the counterpart class is the one corresponding to patients without the disease.
Remember that only the FALL TSs are TS_SMOTEd.
R Caret package.
Conflict of interest
The authors declare that they have no conflict of interest.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This research has been funded by the Spanish Ministry of Economy, Industry and Competitiveness (MINECO), under Grants TIN2014-56967-R and TIN2017-84804-R.
Cite this article
de la Cal, E.A., Villar, J.R., Vergara, P.M. et al. Design issues in Time Series dataset balancing algorithms. Neural Comput & Applic 32, 1287–1304 (2020). https://doi.org/10.1007/s00521-019-04011-4
- Imbalanced Time Series
- Correlation measures
- Human activity recognition
- Epilepsy onset recognition
- Fall detection