Abstract
In this work, we propose an improved approach to time series data discretization, called the RFknn method, that uses the Relative Frequency and K-Nearest Neighbor functions. The main idea of the method is to improve the process of determining a sufficient number of intervals for discretizing time series data. The proposed approach improves the time series data representation by integrating it with the Piecewise Aggregate Approximation (PAA) and Symbolic Aggregate Approximation (SAX) representations. Each interval is represented as a symbol, which supports an efficient mining process in which a better knowledge model can be obtained without major loss of knowledge. The basic idea is not to minimize or maximize the number of intervals of the temporal patterns over their class labels. The performance of RFknn is evaluated on 22 temporal datasets and compared with the original SAX time series discretization method using a similar representation. We show that RFknn can improve representation precision without losing the symbolic nature of the original SAX representation. The experimental results show that RFknn gives a better representation, with lower or comparable error rates.
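For readers unfamiliar with the baseline, the following is a minimal Python sketch of PAA followed by SAX symbolization in its standard formulation: z-normalize the series, average it over equal-width frames (PAA), then map each frame mean to a symbol using breakpoints that cut the standard normal distribution into equiprobable regions. It is not RFknn. The abstract does not say how RFknn selects the number of intervals, so here the alphabet size (the number of intervals) is fixed by hand, as in the original SAX method the paper compares against. The function names and the requirement that the series length be a multiple of the segment count are simplifying assumptions of this sketch.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series carries no shape information; map it to all zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width frame.

    Simplifying assumption of this sketch: len(series) is a multiple of segments.
    """
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("this sketch assumes len(series) is a multiple of segments")
    width = n // segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(segments)]


def gaussian_breakpoints(alphabet_size):
    """Breakpoints splitting N(0, 1) into alphabet_size equiprobable regions."""
    dist = NormalDist()  # standard normal, mu=0, sigma=1
    return [dist.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax(series, segments, alphabet_size):
    """Convert a numeric series into a SAX word of length `segments`.

    alphabet_size is the number of discretization intervals; in plain SAX it
    is chosen by the user rather than derived from the data.
    """
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 2 and 26")
    breakpoints = gaussian_breakpoints(alphabet_size)
    symbols = "abcdefghijklmnopqrstuvwxyz"[:alphabet_size]
    word = []
    for value in paa(znormalize(series), segments):
        # Number of breakpoints <= value gives the index of the interval.
        word.append(symbols[bisect_right(breakpoints, value)])
    return "".join(word)
```

For example, `sax([1, 2, 3, 4, 5, 6, 7, 8], segments=4, alphabet_size=4)` returns `"abcd"`: the four frame means fall into the four equiprobable intervals in increasing order. The `alphabet_size` argument is the quantity whose determination RFknn is described as improving.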
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abu Bakar, A., Mohammed Ahmed, A., Razak Hamdan, A. (2010). Discretization of Time Series Dataset Using Relative Frequency and K-Nearest Neighbor Approach. In: Cao, L., Feng, Y., Zhong, J. (eds) Advanced Data Mining and Applications. ADMA 2010. Lecture Notes in Computer Science, vol 6440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17316-5_18
DOI: https://doi.org/10.1007/978-3-642-17316-5_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17315-8
Online ISBN: 978-3-642-17316-5
eBook Packages: Computer Science, Computer Science (R0)