HierarchyScan: A Hierarchical Algorithm for Similarity Search in Databases Consisting of Long Sequences

  • Regular Paper
Knowledge and Information Systems

Abstract

In this paper, a hierarchical algorithm, HierarchyScan, is proposed to efficiently locate one-dimensional subsequences within a collection of sequences of arbitrary length. The algorithm correlates the stored sequences with the template pattern in the transformed domain, which identifies matching subsequences in a scale- and phase-independent fashion. This contrasts with approaches that compute Euclidean distance in the transformed domain. The transformed-domain representation of each original sequence is divided into multiple groups of coefficients, and matching proceeds hierarchically from the group with the greatest filtering capability to the group with the lowest. Only those subsequences whose maximum correlation value exceeds a predefined threshold are selected for additional screening. Compared with sequential scanning, the approach shows an order-of-magnitude speedup.
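The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the transform is a plain DFT, stored sequences and the template have equal length and are compared under circular shifts, coefficient groups are ordered by template magnitude as a stand-in for "filtering capability", and a candidate is discarded when its partial correlation plus an upper bound on the unprocessed coefficients falls below the threshold. The function names (`normalize`, `dft`, `coefficient_groups`, `hierarchical_match`) are hypothetical.

```python
import cmath
import math

def normalize(seq):
    """Remove the mean and scale to unit norm (scale independence)."""
    mean = sum(seq) / len(seq)
    centered = [v - mean for v in seq]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered] if norm > 0 else centered

def dft(seq):
    """Naive O(n^2) discrete Fourier transform, adequate for a small demo."""
    n = len(seq)
    return [sum(seq[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
            for k in range(n)]

def coefficient_groups(template_spec, n_groups):
    """Assumption: groups are ordered by decreasing template magnitude."""
    order = sorted(range(len(template_spec)), key=lambda k: -abs(template_spec[k]))
    size = math.ceil(len(order) / n_groups)
    return [order[i:i + size] for i in range(0, len(order), size)]

def hierarchical_match(seq, template, threshold, n_groups=4):
    """Return (matched, levels_examined) for equal-length seq and template."""
    n = len(template)
    x_spec = dft(normalize(seq))
    t_spec = dft(normalize(template))
    # Cross-spectrum: its inverse DFT is the circular cross-correlation.
    cross = [x_spec[k] * t_spec[k].conjugate() for k in range(n)]
    partial = [0.0] * n                         # correlation accumulated per lag
    remaining = sum(abs(c) for c in cross) / n  # bound on what is still unseen
    groups = coefficient_groups(t_spec, n_groups)
    for level, group in enumerate(groups, start=1):
        for k in group:
            remaining -= abs(cross[k]) / n
            for m in range(n):
                term = cross[k] * cmath.exp(2j * math.pi * k * m / n)
                partial[m] += term.real / n
        # Even the best lag cannot reach the threshold any more: discard early.
        if max(partial) + remaining < threshold:
            return False, level
    return max(partial) >= threshold, len(groups)

# Hypothetical data: a shifted, rescaled copy of the template and an unrelated sequence.
template = [math.sin(2 * math.pi * n / 16) + 0.5 * math.sin(4 * math.pi * n / 16)
            for n in range(16)]
shifted = [3.0 * template[(n - 5) % 16] + 7.0 for n in range(16)]
other = [math.sin(6 * math.pi * n / 16) for n in range(16)]
print(hierarchical_match(shifted, template, threshold=0.9))
print(hierarchical_match(other, template, threshold=0.9))
```

Taking the maximum over all lags makes the comparison independent of shift, and normalizing both inputs makes it independent of amplitude and offset. Under these assumptions the shifted copy should survive every level, while the unrelated sequence should be rejected after the first group of coefficients, which is where the savings over a full scan would come from.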

References

  1. R. Stam, R. Snodgrass. A bibliography on temporal databases, IEEE Bulletin on Data Engineering 11(4), 1988.

  2. K. K. Al-Taha, R. T. Snodgrass, M. D. Soo. Bibliography on spatiotemporal databases, Int. J. Geogr. Inf. System 8(1), 95–103, 1994.

  3. C. Faloutsos, W. Equitz, M. Flickner, W. Niblack, D. Petkovic, R. Barber. Efficient and effective querying by image content, J. Intelligent Information Systems, 3(3–4), 231–262, 1993.

  4. R. Agrawal, T. Imielinski, A. Swami. Database mining: A performance perspective, IEEE Trans. Knowledge and Data Engineering, Special Issue on Learning and Discovery in Knowledge-Based Databases, 1993.

  5. R. Agrawal, C. Faloutsos, A. Swami. Efficient similarity search in sequence databases. In: Fourth International Conference on Foundations of Data Organization and Algorithms, Chicago, October, 1993.

  6. C. Faloutsos, M. Ranganathan, Y. Manolopoulos. Fast subsequence matching in time-series databases. In: Proc. SIGMOD’94, 1994, pp. 419–429.

  7. C. Faloutsos, K.-I. Lin. FastMap: A fast algorithm for indexing, data mining, and visualization of traditional and multimedia datasets. In: Proc. SIGMOD’95, 1995, pp. 163–174.

  8. W. Lu, J. Han, B.C. Ooi. Discovery of general knowledge in large spatial databases. In: Proc. Far East Workshop on Geographic Information Systems, Singapore, 1993, pp. 275–289.

  9. H. V. Jagadish. A retrieval technique for similar shapes. In: Proc. SIGMOD’91, 1991, pp. 208–217.

  10. A. Papoulis. Probability, Random Variables, and Stochastic Processes, McGraw-Hill: New York, 1984.

  11. J. B. Lee, B. G. Lee. Transform domain filtering based on pipelining structure, IEEE Trans. Signal Processing 40(8), 2061–2064, 1992.

  12. P. P. Vaidyanathan. Orthonormal and biorthonormal filter banks as convolvers, and convolutional coding gain, IEEE Trans. Signal Processing 41(6), 2110–2130, 1993.

  13. S. G. Mallat. A theory for multiresolution signal decomposition: The wavelet representation, IEEE Trans. Pattern Analysis and Machine Intelligence 11(7), 674–693, 1989.

  14. E. F. Fama. The behavior of stock market prices, J. Business, January, 34–105, 1965.

  15. M. F. M. Osborne. Brownian motion in the stock market, Operations Research, March–April, 1959.

  16. R. Agrawal, K. Lin, H. S. Sawhney, K. Shim. Fast similarity search in the presence of noise, scaling, and translation in time-series databases. In: Proc. 21st International Conference on Very Large Databases, Zurich, Switzerland, September, 1995.

  17. R. Agrawal, G. Psaila, E. L. Wimmers, M. Zait. Querying shapes of histories. In: Proc. VLDB, Switzerland, 1995, pp. 502–514.

  18. C.-S. Li, V. Castelli, P. S. Yu. HierarchyScan: A hierarchical similarity search algorithm for databases of long sequences. In: Proc. ICDE, 1996, pp. 546–553.

  19. H. Shatkay, S. B. Zdonik. Approximate queries and representations for large data sequences. In: Proc. ICDE, 1996, pp. 536–545.

  20. G. Das, D. Gunopulos, H. Mannila. Finding similar time series. In: PKDD’97, 1997, pp. 88–100.

  21. B. Bollobas, G. Das, D. Gunopulos. Time-series similarity problems and well-separated geometric sets. In: 13th ACM Symposium on Computational Geometry, 1997, pp. 454–456.

  22. D. Rafiei, A. Mendelzon. Similarity based queries for time series data. In: SIGMOD, 1997, pp. 13–25.

  23. E. Keogh. Fast similarity search in the presence of longitudinal scaling in time series databases. In: Proc. IEEE International Conference on Tools with Artificial Intelligence, 1997, pp. 578–584.

Author information

Corresponding author

Correspondence to Chung-Sheng Li.

Additional information

This work was funded in part by grant no. NASA/CAN NCC5-101.

About this article

Cite this article

Li, CS., Yu, P.S. & Castelli, V. HierarchyScan: A Hierarchical Algorithm for Similarity Search in Databases Consisting of Long Sequences. Knowledge and Information Systems 1, 229–256 (1999). https://doi.org/10.1007/BF03325099

  • DOI: https://doi.org/10.1007/BF03325099
