Discretization of Time Series Dataset Using Relative Frequency and K-Nearest Neighbor Approach

Abu Bakar, Azuraliza; Mohammed Ahmed, Almahdi; Razak Hamdan, Abdul

doi:10.1007/978-3-642-17316-5_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6440)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

In this work, we propose an improved approach to time series data discretization that uses the Relative Frequency and K-Nearest Neighbor functions, called the RFknn method. The main idea of the method is to improve the process of determining a sufficient number of intervals for discretizing time series data. The approach improves the time series representation by integrating RFknn with the Piecewise Aggregate Approximation (PAA) and the Symbolic Aggregate Approximation (SAX) representations. Each interval is represented as a symbol, which supports an efficient mining process in which a better knowledge model can be obtained without major loss of knowledge. The aim is neither to minimize nor to maximize the number of intervals of the temporal patterns over their class labels. The performance of RFknn is evaluated on 22 temporal datasets and compared with the original SAX time series discretization method under a similar representation. We show that RFknn can improve the precision of the representation without losing the symbolic nature of the original SAX representation. The experimental results show that RFknn gives a better representation, with lower or comparable error rates.
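For readers unfamiliar with the baseline representation, the following is a minimal sketch of the standard PAA and SAX steps that the abstract refers to. It is not the RFknn method: the abstract does not describe how RFknn selects the number of intervals, so the alphabet size here is simply a fixed parameter. The function names (`znormalize`, `paa`, `sax_breakpoints`, `sax`) are hypothetical, and the sketch assumes the series length is divisible by the number of segments.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale a series to zero mean and unit variance (SAX assumes this)."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series carries no shape information; map it to zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-width segment."""
    if n_segments <= 0 or len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")
    width = len(series) // n_segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(n_segments)]


def sax_breakpoints(alphabet_size):
    """Cut points that split the standard normal into equiprobable intervals."""
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 2 and 26")
    normal = NormalDist()
    return [normal.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def sax(series, n_segments, alphabet_size):
    """Map a series to a SAX word: one symbol per PAA segment."""
    cuts = sax_breakpoints(alphabet_size)
    letters = "abcdefghijklmnopqrstuvwxyz"
    segments = paa(znormalize(series), n_segments)
    # Each segment mean falls into one interval; the interval index picks the symbol.
    return "".join(letters[bisect_right(cuts, value)] for value in segments)


if __name__ == "__main__":
    print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4, alphabet_size=4))
```

In this sketch each interval between adjacent breakpoints corresponds to one symbol, which is the sense in which "intervals are represented as a symbol". A method that chooses the number of intervals from the data, as RFknn is described as doing, would replace the fixed `alphabet_size` argument.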

Author information

Authors and Affiliations

Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, University Kebangsaan Malaysia, 43600, Bangi, Selangor Darul Ehsan, Malaysia
Azuraliza Abu Bakar, Almahdi Mohammed Ahmed & Abdul Razak Hamdan

Editor information

Editors and Affiliations

Faculty of Engineering and Information Technology, University of Technology Sydney, 2007, Sydney, NSW, Australia
Longbing Cao
College of Computer Science, Chongqing University, 400030, Chongqing, China
Yong Feng
College of Computer Science, Chongqing University, 400030, Chongqing, China
Jiang Zhong

Cite this paper

Abu Bakar, A., Mohammed Ahmed, A., Razak Hamdan, A. (2010). Discretization of Time Series Dataset Using Relative Frequency and K-Nearest Neighbor Approach. In: Cao, L., Feng, Y., Zhong, J. (eds) Advanced Data Mining and Applications. ADMA 2010. Lecture Notes in Computer Science, vol 6440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17316-5_18

DOI: https://doi.org/10.1007/978-3-642-17316-5_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17315-8
Online ISBN: 978-3-642-17316-5
eBook Packages: Computer Science, Computer Science (R0)
