Evolving Systems, Volume 10, Issue 1, pp 13–28

Applying nature-inspired optimization algorithms for selecting important timestamps to reduce time series dimensionality

  • Muhammad Marwan Muhammad Fuad
Original Paper

Abstract

Time series account for a large share of the data available today. Time series mining covers several tasks, including classification, clustering, query-by-content, and prediction. Performing these tasks on raw time series is inefficient because the data are high-dimensional by nature, so time series are usually pre-processed to reduce their dimensionality before mining. In general, there are two main approaches to reducing time series dimensionality. The first, which we call landmark methods, finds characteristic features in the target time series. The second relies on data transformations, which map the time series from the original space into a reduced space where they can be managed more efficiently. The method we present in this paper takes a third approach: it projects a time series onto a lower-dimensional space by selecting important points in the time series. The novelty of our method is that these points are not chosen according to a geometric criterion, which is subjective in most cases, but through an optimization process. The other important characteristic of our method is that the important points are selected at the dataset level rather than at the level of a single time series, so every time series in a dataset is reduced to the same timestamps. The direct advantage of this strategy is that the distance defined on the low-dimensional space lower bounds the original distance applied to the raw data, which enables us to apply the popular GEMINI algorithm. Experiments on a wide variety of time series datasets, using different optimizers and covering the two major data mining tasks, classification and clustering, give promising results that validate our new method.
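
The lower-bounding property mentioned above can be illustrated with a minimal sketch, assuming the distance is Euclidean and that the reduced representation simply keeps the values at a shared set of timestamps. Under those assumptions the reduced distance sums a subset of the non-negative squared differences, so it cannot exceed the raw distance. The helper names (`project`, `range_query`) are hypothetical, the timestamps are drawn at random as a stand-in for the optimizer described in the paper, and the filter-and-refine loop only mirrors the general idea of GEMINI, not the author's implementation.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def project(series, timestamps):
    """Keep only the values at the selected timestamp indices."""
    return [series[t] for t in timestamps]


def range_query(dataset, query, timestamps, radius):
    """Filter-and-refine range query in the spirit of GEMINI (illustrative)."""
    reduced_query = project(query, timestamps)
    hits = []
    for idx, series in enumerate(dataset):
        # Filter: the reduced distance never exceeds the raw distance,
        # so discarding a series here cannot cause a false dismissal.
        if euclidean(project(series, timestamps), reduced_query) > radius:
            continue
        # Refine: confirm the surviving candidate on the raw data.
        if euclidean(series, query) <= radius:
            hits.append(idx)
    return hits


# Synthetic data, for illustration only.
rng = random.Random(0)
dataset = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(50)]
query = [rng.gauss(0, 1) for _ in range(64)]

# Assumption: a random choice of 8 timestamps shared by the whole dataset.
# In the paper these would come from an optimizer instead.
timestamps = sorted(rng.sample(range(64), 8))

hits = range_query(dataset, query, timestamps, radius=11.0)
```

Because the same timestamps are used for every series and for the query, the filter step compares like with like, which is what makes the lower bound hold across the whole dataset.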

Keywords

Classification, Clustering, Differential evolution, Genetic algorithm, Particle swarm optimization, Time series mining

Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. Aarhus University, MOMA, Aarhus N, Denmark