DCF: An Efficient Data Stream Clustering Framework for Streaming Applications

  • Conference paper
Database and Expert Systems Applications (DEXA 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4080)

Abstract

Streaming applications, such as environmental monitoring and vehicle location tracking, must handle high volumes of continuously arriving data and sudden fluctuations in those volumes, while also efficiently supporting multi-dimensional historical queries. Traditional database management systems are ill-suited to this task because continuously updating massive data streams requires an excessive number of disk I/Os. In this paper, we propose DCF (Data Stream Clustering Framework), a novel framework that supports efficient data stream archiving for streaming applications. DCF greatly reduces disk I/O in the storage system by grouping incoming data into clusters and storing these clusters instead of raw data elements. In addition, by controlling the cluster size, DCF can reliably store all incoming raw data even when the volume of incoming data fluctuates temporarily. Our experimental results show that our approach significantly reduces the number of disk accesses for both data insertion and retrieval.
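The abstract describes the core idea only at a high level: buffer incoming elements into clusters, write whole clusters instead of individual elements, and enlarge clusters when a burst arrives. The Python sketch below illustrates that idea under explicit assumptions and is not the authors' DCF algorithm. Clusters are formed by a hypothetical uniform grid over (x, y) positions, each cluster is summarised by a minimum bounding rectangle (MBR), one flushed cluster counts as one disk write, and cluster capacity scales with the per-tick arrival rate. The names `ClusteringArchiver`, `cell_size`, `base_capacity`, `max_capacity` and `nominal_rate` are illustrative only.

```python
from dataclasses import dataclass, field
from math import ceil


@dataclass
class Cluster:
    """A group of stream elements stored together, summarised by its MBR."""
    points: list = field(default_factory=list)
    min_x: float = float("inf")
    min_y: float = float("inf")
    max_x: float = float("-inf")
    max_y: float = float("-inf")

    def add(self, x: float, y: float) -> None:
        self.points.append((x, y))
        self.min_x = min(self.min_x, x)
        self.min_y = min(self.min_y, y)
        self.max_x = max(self.max_x, x)
        self.max_y = max(self.max_y, y)


class ClusteringArchiver:
    """Hypothetical sketch: buffer elements into per-grid-cell clusters and
    write each full cluster as a single unit (one simulated disk write)."""

    def __init__(self, cell_size: float, base_capacity: int,
                 max_capacity: int, nominal_rate: int):
        self.cell_size = cell_size
        self.base_capacity = base_capacity
        self.max_capacity = max_capacity
        self.nominal_rate = nominal_rate  # assumed normal elements per tick
        self.capacity = base_capacity
        self.open = {}        # grid cell -> currently open Cluster
        self.stored = []      # flushed clusters; stands in for the disk
        self.disk_writes = 0

    def _flush(self, cell) -> None:
        self.stored.append(self.open.pop(cell))
        self.disk_writes += 1

    def _adjust_capacity(self, rate: int) -> None:
        # Assumed policy: grow the cluster size in proportion to a burst so
        # fewer, larger writes absorb it; shrink back when the rate drops.
        factor = max(1, ceil(rate / self.nominal_rate))
        self.capacity = min(self.base_capacity * factor, self.max_capacity)
        # Flush any open cluster that is already at or over the new limit.
        full = [c for c, cl in self.open.items()
                if len(cl.points) >= self.capacity]
        for cell in full:
            self._flush(cell)

    def ingest_tick(self, batch) -> None:
        """Process all (x, y) elements that arrived during one time tick."""
        self._adjust_capacity(len(batch))
        for x, y in batch:
            cell = (int(x // self.cell_size), int(y // self.cell_size))
            cluster = self.open.setdefault(cell, Cluster())
            cluster.add(x, y)
            if len(cluster.points) >= self.capacity:
                self._flush(cell)

    def close(self) -> None:
        """Write out every partially filled cluster at end of stream."""
        for cell in list(self.open):
            self._flush(cell)


# Example: a burst in the second tick (40 elements vs. a nominal 8).
archiver = ClusteringArchiver(cell_size=10.0, base_capacity=4,
                              max_capacity=32, nominal_rate=8)
for batch in ([(1.0, 1.0)] * 8, [(2.0, 3.0)] * 40, [(1.5, 2.5)] * 8):
    archiver.ingest_tick(batch)
archiver.close()
elements = sum(len(c.points) for c in archiver.stored)
print(f"{elements} elements stored with {archiver.disk_writes} cluster writes")
```

In this toy model, the burst temporarily raises the cluster capacity from 4 to 20, so the 56 elements are written in 6 cluster writes rather than 56 per-element writes. The grid-based grouping is chosen only for simplicity. A real system would likely group elements by a proper clustering criterion and index the cluster MBRs, for example with an R-tree.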

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cho, K., Jo, S., Jang, H., Kim, S.M., Song, J. (2006). DCF: An Efficient Data Stream Clustering Framework for Streaming Applications. In: Bressan, S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2006. Lecture Notes in Computer Science, vol 4080. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11827405_12

  • DOI: https://doi.org/10.1007/11827405_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37871-6

  • Online ISBN: 978-3-540-37872-3

  • eBook Packages: Computer Science, Computer Science (R0)