Multivariate Stream Data Classification Using Simple Text Classifiers

  • Conference paper
Database and Expert Systems Applications (DEXA 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4080)

Abstract

We introduce a classification framework for continuous multivariate stream data. The approach works in two steps. In the preprocessing step, it takes a sliding window of multivariate stream data as input and discretizes the data in the window into a string of symbols that characterize the signal changes. In the classification step, it applies a simple text classification algorithm to the discretized window. We evaluated both supervised and unsupervised classification algorithms: Naïve Bayes and SVM as supervised methods, and Jaccard, TF-IDF, Jaro, and Jaro-Winkler as unsupervised methods. In our experiments, SVM and TF-IDF outperformed the other methods. In particular, classification accuracy improved when the correlation between attributes was considered along with the n-gram tokens of symbols.
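
The two-step pipeline can be illustrated with a minimal sketch. The abstract does not specify the discretization scheme or the token format, so everything below is an assumption made for illustration: the three-symbol alphabet (`U`, `D`, `S` for up, down, steady), the threshold `eps`, the per-attribute token prefix, and the helper names `discretize`, `ngrams`, `tokens`, `jaccard`, and `classify` are all hypothetical. Jaccard similarity against labelled reference windows stands in for the similarity-based methods named above; the attribute-correlation feature is not modelled here.

```python
def discretize(series, eps=0.5):
    """Map one attribute's window to a symbol string (assumed alphabet):
    U = rise larger than eps, D = fall larger than eps, S = steady."""
    symbols = []
    for prev, cur in zip(series, series[1:]):
        delta = cur - prev
        if delta > eps:
            symbols.append("U")
        elif delta < -eps:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def ngrams(text, n=2):
    """Return the overlapping character n-grams of a symbol string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tokens(window, n=2, eps=0.5):
    """Tokenize a multivariate window (one list of values per attribute).
    Each n-gram is tagged with its attribute index (an assumed convention)."""
    out = []
    for idx, series in enumerate(window):
        out.extend(f"a{idx}:{gram}" for gram in ngrams(discretize(series, eps), n))
    return out


def jaccard(tokens_a, tokens_b):
    """Jaccard similarity between two token collections, treated as sets."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify(window, references, n=2, eps=0.5):
    """Return the label of the reference window most similar to `window`."""
    query = tokens(window, n, eps)
    return max(
        references,
        key=lambda label: jaccard(query, tokens(references[label], n, eps)),
    )


# Toy data, invented for this example: two attributes per window.
references = {
    "rising": [[1, 2, 3, 4, 5], [10, 10, 10, 10, 10]],
    "falling": [[5, 4, 3, 2, 1], [10, 10, 10, 10, 10]],
}
query = [[0, 1.2, 2.1, 3.5, 4.9], [9.9, 10.1, 10.0, 10.2, 10.0]]
label = classify(query, references)
```

On this toy data the query window shares all of its tokens with the "rising" reference, so `label` is `"rising"`. Swapping `jaccard` for a TF-IDF cosine similarity, or feeding the token counts to an SVM, would follow the same tokenization step.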


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Seo, S., Kang, J., Lee, D., Ryu, K.H. (2006). Multivariate Stream Data Classification Using Simple Text Classifiers. In: Bressan, S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2006. Lecture Notes in Computer Science, vol 4080. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11827405_41

  • DOI: https://doi.org/10.1007/11827405_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37871-6

  • Online ISBN: 978-3-540-37872-3

  • eBook Packages: Computer Science, Computer Science (R0)
