Abstract
We introduce a classification framework for continuous multivariate stream data. The approach works in two steps. In the preprocessing step, it takes a sliding window of multivariate stream data as input and discretizes the data in the window into a string of symbols that characterize the signal changes. In the classification step, it applies a simple text classification algorithm to the discretized window. We evaluated both supervised and unsupervised classification algorithms: Naïve Bayes and SVM for the supervised case, and Jaccard, TF-IDF, Jaro and Jaro-Winkler for the unsupervised case. In our experiments, SVM and TF-IDF outperformed the other classification methods. In particular, classification accuracy improved when the correlation of attributes was considered along with the n-gram tokens of symbols.
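The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three-symbol alphabet (`U`, `D`, `S`), the threshold `eps`, the attribute-prefixed bigram tokens, and the nearest-neighbour use of Jaccard similarity are all assumptions chosen for illustration, and the attribute-correlation feature mentioned in the abstract is not modelled here.

```python
from typing import Dict, List, Sequence, Set, Tuple

Window = Dict[str, Sequence[float]]  # attribute name -> values in the sliding window


def discretize(values: Sequence[float], eps: float = 0.5) -> str:
    """Turn consecutive differences into symbols (assumed alphabet):
    U = up, D = down, S = steady (|change| <= eps)."""
    symbols: List[str] = []
    for prev, curr in zip(values, values[1:]):
        delta = curr - prev
        if delta > eps:
            symbols.append("U")
        elif delta < -eps:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def ngrams(text: str, n: int = 2) -> Set[str]:
    """Return the set of character n-grams of a symbol string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def window_tokens(window: Window, n: int = 2, eps: float = 0.5) -> Set[str]:
    """Discretize every attribute and collect attribute-prefixed n-gram tokens."""
    tokens: Set[str] = set()
    for name, values in window.items():
        for gram in ngrams(discretize(values, eps), n):
            tokens.add(f"{name}:{gram}")
    return tokens


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify(window: Window, labelled: List[Tuple[str, Window]],
             n: int = 2, eps: float = 0.5) -> str:
    """Return the label of the most Jaccard-similar labelled window."""
    query = window_tokens(window, n, eps)
    best_label, _ = max(
        labelled,
        key=lambda item: jaccard(query, window_tokens(item[1], n, eps)),
    )
    return best_label


# Hypothetical toy data: one attribute "x" with a rising and a falling example.
examples = [
    ("rising", {"x": [0, 1, 2, 3]}),
    ("falling", {"x": [3, 2, 1, 0]}),
]
print(classify({"x": [10, 12, 14, 15]}, examples))
```

Prefixing each token with its attribute name keeps the same symbol pattern on different attributes distinct, which is one simple way to feed multivariate windows to a set-based text similarity measure.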
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Seo, S., Kang, J., Lee, D., Ryu, K.H. (2006). Multivariate Stream Data Classification Using Simple Text Classifiers. In: Bressan, S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2006. Lecture Notes in Computer Science, vol 4080. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11827405_41
DOI: https://doi.org/10.1007/11827405_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37871-6
Online ISBN: 978-3-540-37872-3
eBook Packages: Computer Science, Computer Science (R0)