Knowledge and Information Systems, Volume 1, Issue 2, pp 229–256

HierarchyScan: A Hierarchical Algorithm for Similarity Search in Databases Consisting of Long Sequences

Regular Paper


In this paper, a hierarchical algorithm, HierarchyScan, is proposed to efficiently locate one-dimensional subsequences within a collection of sequences of arbitrary length. The proposed algorithm performs correlation between the stored sequences and the template pattern in the transformed domain to identify subsequences in a scale- and phase-independent fashion. This contrasts with approaches based on computing the Euclidean distance in the transformed domain. In the proposed hierarchical algorithm, the transformed domain representation of each original sequence is divided into multiple groups of coefficients. The matching is performed hierarchically, from the group with the greatest filtering capability to the group with the lowest filtering capability. Only those subsequences whose maximum correlation value is higher than a predefined threshold are selected for additional screening. Compared with sequential scanning, this approach yields an order-of-magnitude speedup.
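The idea of screening by groups of transform coefficients can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a discrete Fourier transform as the transformed domain, windows of the same length as the template, and a pruning rule based on an upper bound for the coefficients not yet used. The names `dft`, `normalize`, `partial_correlation`, `hierarchy_scan`, and the `groups` argument are hypothetical and chosen only for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for a small demo)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * f * j / n) for j in range(n))
            for f in range(n)]


def normalize(x):
    """Remove the mean and scale to unit energy, so correlations lie in [-1, 1]."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered] if norm > 0 else centered


def partial_correlation(W, T, freqs, n):
    """Contribution of the coefficients in `freqs` to the circular correlation at each lag."""
    return [sum((W[f] * T[f].conjugate() * cmath.exp(2j * math.pi * f * k / n)).real
                for f in freqs) / n
            for k in range(n)]


def hierarchy_scan(sequence, template, groups, threshold):
    """Return (offset, max_correlation) for windows that survive every screening stage.

    Assumption: `groups` partitions the frequency indices 0..n-1 and is ordered
    from the most to the least discriminating group.
    """
    n = len(template)
    T = dft(normalize(template))
    matches = []
    for start in range(len(sequence) - n + 1):
        W = dft(normalize(sequence[start:start + n]))
        corr = [0.0] * n
        remaining = set(range(n))
        survived = True
        for group in groups:
            part = partial_correlation(W, T, group, n)
            corr = [c + p for c, p in zip(corr, part)]
            remaining -= set(group)
            # Upper bound on what the coefficients not yet used can still add.
            slack = sum(abs(W[f]) * abs(T[f]) for f in remaining) / n
            if max(corr) + slack < threshold:
                survived = False  # cannot reach the threshold: prune early
                break
        if survived:
            matches.append((start, max(corr)))
    return matches


if __name__ == "__main__":
    template = [0.0, 1.0, 3.0, 1.0, 0.0, -1.0, -3.0, -1.0]
    # A scaled and offset copy of the template is embedded at offset 4.
    sequence = [2.0] * 4 + [10.0 + 4.0 * v for v in template] + [2.0] * 4
    groups = [[0, 1, 7], [2, 6], [3, 4, 5]]
    print(hierarchy_scan(sequence, template, groups, threshold=0.95))
```

Normalizing each window makes the score insensitive to amplitude scaling and offset, and taking the maximum over all circular lags makes it insensitive to phase. Because the correlation is a sum over frequency coefficients, each group adds its own contribution, and a window can be discarded as soon as the partial sum plus the bound on the unused coefficients falls below the threshold.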


Keywords: Similarity search; template matching; time series; correlation; correlation coefficient; content-based retrieval; hierarchical search





Copyright information

© Springer-Verlag Singapore Pte. Ltd. 1999

Authors and Affiliations

  • Chung-Sheng Li (1)
  • Philip S. Yu (1)
  • Vittorio Castelli (1)
  1. IBM Thomas J. Watson Research Center, Yorktown Heights, USA
