A signature file supports fast search over text data. It is typically a very compact data structure designed to minimize disk accesses at query time. Query processing is performed in two stages: filtering, in which false negatives are guaranteed not to occur but false positives may, and query refinement, in which the false positives are removed.
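The two-stage process can be illustrated with a minimal Python sketch. It assumes a superimposed-coding scheme, in which each word sets a few hashed bits and a block's signature is the bitwise OR of its word signatures; the entry itself does not prescribe this scheme. The names `SIG_BITS`, `BITS_PER_WORD`, `word_signature`, `filter_candidates`, and `search` are hypothetical and chosen only for illustration.

```python
import hashlib
import re

SIG_BITS = 64       # signature width in bits (assumed value)
BITS_PER_WORD = 3   # hashed bit positions per word (assumed value)


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def word_signature(word):
    """Map a word to a bit mask with at most BITS_PER_WORD bits set."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word.lower()}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def block_signature(text):
    """Superimpose (bitwise OR) the signatures of all words in a block."""
    sig = 0
    for word in tokenize(text):
        sig |= word_signature(word)
    return sig


def filter_candidates(term, signatures):
    """Stage 1: linear scan of the signatures.

    A block is kept if every bit of the query signature is set in its
    signature. Blocks containing the term always pass (no false
    negatives); other blocks may pass by chance (false positives).
    """
    query = word_signature(term)
    return [i for i, sig in enumerate(signatures) if sig & query == query]


def search(term, blocks, signatures):
    """Stage 2: check each candidate against the actual text."""
    return [
        i for i in filter_candidates(term, signatures)
        if term.lower() in tokenize(blocks[i])
    ]


blocks = [
    "signature files allow fast filtering",
    "inverted files are the de facto standard",
    "refinement removes false positives",
]
signatures = [block_signature(b) for b in blocks]
```

With these definitions, `search("files", blocks, signatures)` returns `[0, 1]`. The filtering stage reads only the compact signatures, and the text itself is read only for the candidate blocks during refinement.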
Efficient and effective text indexing is a well-known and long-standing problem in information retrieval. While inverted files are the de facto standard for text indexing, in the early days their storage overhead was not acceptable for larger datasets. In addition, accessing an inverted file on disk may require a relatively large number of (expensive) disk seeks. The main motivation for signature files is to allow fast filtering of text by a linear scan of the signature file, which finds the text segments that may contain the queried term(s). Given that the found segments may be false positives, a refinement step is...
- 1. Baeza-Yates RA, Ribeiro-Neto BA. Modern information retrieval. New York: ACM Press/Addison-Wesley; 1999.
- 4. Deppisch U. S-tree: a dynamic balanced signature index for office retrieval. In: Proceedings of the 9th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1986. p. 77–87.
- 5. Frakes WB, Baeza-Yates RA. Information retrieval data structures & algorithms. Upper Saddle River: Prentice-Hall; 1992.