Efficient k-NN Search on Streaming Data Series

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2750)

Abstract

Data streams are common in many recent applications, such as stock quotes, e-commerce data, system logs, and network traffic management. Compared with traditional databases, streaming databases pose new challenges for query processing because the data changes constantly over time. Index structures have been employed effectively in traditional databases to improve query performance. Index building time is not of particular interest in static databases, because it is easily amortized by the performance gains at query time. In streaming databases, however, the data is dynamic, so index building time must be negligibly small for an index to be usable in continuous query processing. In this paper, we propose efficient index structures and algorithms for various models of k nearest neighbor (k-NN) queries on multiple data streams. We find scalar quantization to be a natural choice for data streams and propose index structures, called VA-Stream and VA+-Stream, which are built by dynamically quantizing the incoming dimensions. VA+-Stream (and VA-Stream) can be used both as a dynamic summary of the database and as an index structure to facilitate efficient similarity query processing. The proposed techniques are update-efficient, dynamic adaptations of the VA-file and VA+-file, and are shown to achieve the same structures as their static versions. They can be generalized to handle aged queries, which are often used in trend-related analysis. A performance evaluation of VA-Stream and VA+-Stream shows that the index building time is negligibly small while query time is significantly improved.
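The general idea of quantizing each incoming dimension and filtering k-NN candidates on the approximations can be illustrated with a minimal sketch. It is not the paper's VA-Stream or VA+-Stream algorithm: it assumes equal-width cells over a fixed value range, the same number of bits for every dimension, and a sliding window, and the names `StreamApproximation`, `append`, `lower_bound`, and `knn` are hypothetical.

```python
import heapq
import math
from collections import deque


class StreamApproximation:
    """Sliding-window scalar quantization of several streams (illustrative only)."""

    def __init__(self, n_streams, window, bits, lo, hi):
        self.cells = 2 ** bits                # cells per dimension
        self.lo = lo                          # assumed fixed value range [lo, hi)
        self.width = (hi - lo) / self.cells   # assumption: equal-width cells
        # Per stream: exact values and their cell ids for the last `window` steps.
        self.exact = [deque(maxlen=window) for _ in range(n_streams)]
        self.codes = [deque(maxlen=window) for _ in range(n_streams)]

    def quantize(self, value):
        # Values outside [lo, hi) are clamped into the first or last cell.
        cell = int((value - self.lo) / self.width)
        return min(max(cell, 0), self.cells - 1)

    def append(self, values):
        """Ingest one new value per stream; only the new dimension is quantized."""
        for s, v in enumerate(values):
            self.exact[s].append(v)
            self.codes[s].append(self.quantize(v))

    def lower_bound(self, s, query):
        """Distance from the query to the cell box of stream s (never above exact)."""
        total = 0.0
        for q, c in zip(query, self.codes[s]):
            # Edge cells are treated as unbounded so clamped values stay covered.
            cell_lo = -math.inf if c == 0 else self.lo + c * self.width
            cell_hi = math.inf if c == self.cells - 1 else self.lo + (c + 1) * self.width
            if q < cell_lo:
                total += (cell_lo - q) ** 2
            elif q > cell_hi:
                total += (q - cell_hi) ** 2
        return math.sqrt(total)

    def exact_distance(self, s, query):
        return math.sqrt(sum((q - v) ** 2 for q, v in zip(query, self.exact[s])))

    def knn(self, query, k):
        """Filter on approximations first, then refine with exact distances."""
        bounds = sorted((self.lower_bound(s, query), s) for s in range(len(self.codes)))
        best = []  # max-heap (negated distances) holding the k best exact results
        for lb, s in bounds:
            if len(best) == k and lb >= -best[0][0]:
                break  # no remaining stream can beat the current k-th distance
            heapq.heappush(best, (-self.exact_distance(s, query), s))
            if len(best) > k:
                heapq.heappop(best)
        return sorted((-neg_d, s) for neg_d, s in best)


# Three streams, window of 3 time steps, 2 bits (4 cells) per dimension.
index = StreamApproximation(n_streams=3, window=3, bits=2, lo=0.0, hi=8.0)
for row in [(1.0, 5.0, 7.0), (1.5, 5.5, 7.5), (2.5, 4.5, 6.5), (3.5, 3.0, 0.5)]:
    index.append(row)
nearest = index.knn([1.5, 2.5, 3.5], k=1)
```

In this sketch an update touches only the newest dimension of each stream, which is the property the abstract emphasizes: the approximation is maintained incrementally rather than rebuilt. The query is assumed to have the same length as the window.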

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, X., Ferhatosmanoğlu, H. (2003). Efficient k-NN Search on Streaming Data Series. In: Hadzilacos, T., Manolopoulos, Y., Roddick, J., Theodoridis, Y. (eds) Advances in Spatial and Temporal Databases. SSTD 2003. Lecture Notes in Computer Science, vol 2750. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45072-6_6

  • DOI: https://doi.org/10.1007/978-3-540-45072-6_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40535-1

  • Online ISBN: 978-3-540-45072-6

  • eBook Packages: Springer Book Archive
