
Discretization of Time Series Dataset Using Relative Frequency and K-Nearest Neighbor Approach

  • Conference paper
Advanced Data Mining and Applications (ADMA 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6440)

Abstract

In this work, we propose an improved approach to time series data discretization, called the RFknn method, which uses the Relative Frequency and K-Nearest Neighbor functions. The main idea of the method is to improve the process of determining a sufficient number of intervals for discretizing time series data. The proposed approach improves the time series data representation by integrating it with the Piecewise Aggregate Approximation (PAA) and Symbolic Aggregate Approximation (SAX) representations. Each interval is represented as a symbol, which supports an efficient mining process in which a better knowledge model can be obtained without major loss of knowledge. The basic idea is neither to minimize nor to maximize the number of intervals of the temporal patterns over their class labels. The performance of RFknn is evaluated on 22 temporal datasets and compared with the original SAX time series discretization method using a similar representation. We show that RFknn can improve representation precision without losing the symbolic nature of the original SAX representation. The experimental results show that RFknn gives a better representation, with lower or comparable error rates.
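
For readers unfamiliar with the baseline that RFknn is compared against, the following is a minimal sketch of the standard PAA and SAX steps: z-normalize the series, average it over equal-width segments, then map each segment mean to a symbol using equiprobable breakpoints of the standard normal distribution. It does not implement RFknn itself, since the abstract does not describe how the relative frequency and k-nearest neighbor steps choose the intervals. The function names (`znorm`, `paa`, `sax_breakpoints`, `sax`) and the requirement that the series length be divisible by the number of segments are assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series carries no shape information; map it to zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-width segment.

    Assumption for this sketch: len(series) is divisible by n_segments.
    """
    n = len(series)
    if n_segments <= 0 or n % n_segments != 0:
        raise ValueError("series length must be a positive multiple of n_segments")
    width = n // n_segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(n_segments)]


def sax_breakpoints(alphabet_size):
    """Cut points that split the standard normal into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax(series, n_segments, alphabet_size):
    """Convert a numeric series to a SAX word over the letters 'a', 'b', ..."""
    cuts = sax_breakpoints(alphabet_size)
    word = []
    for value in paa(znorm(series), n_segments):
        # Index of the region the segment mean falls into.
        region = bisect_right(cuts, value)
        word.append(chr(ord("a") + region))
    return "".join(word)


if __name__ == "__main__":
    print(sax([1, 1, 1, 1, 5, 5, 5, 5], n_segments=2, alphabet_size=4))  # → ad
```

In this baseline the alphabet size (the number of intervals) is a fixed parameter chosen by the user. According to the abstract, determining a sufficient number of intervals is the step that RFknn aims to improve.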



Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abu Bakar, A., Mohammed Ahmed, A., Razak Hamdan, A. (2010). Discretization of Time Series Dataset Using Relative Frequency and K-Nearest Neighbor Approach. In: Cao, L., Feng, Y., Zhong, J. (eds) Advanced Data Mining and Applications. ADMA 2010. Lecture Notes in Computer Science, vol 6440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17316-5_18

  • DOI: https://doi.org/10.1007/978-3-642-17316-5_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17315-8

  • Online ISBN: 978-3-642-17316-5

  • eBook Packages: Computer Science, Computer Science (R0)
