
Bit-sliced signature files for very large text databases on a parallel machine architecture

  • Conference paper
Advances in Database Technology — EDBT '94 (EDBT 1994)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 779)

Abstract

Free-text retrieval is an important problem that can benefit significantly from a parallel architecture. Signature methods have been proposed for answering text retrieval queries on parallel machines [Sta88, LF92], under the assumption that main memory is large enough to hold the entire signature file. We propose a Parallel Bit-Sliced Signature File method on a SIMD machine architecture for the case where the signature file exceeds the available memory. We argue that not all bit slices need to be examined; instead, we use a partial-fetch, slice-swapping algorithm. With this method, performance degrades gracefully as the database grows. We provide formulae for the optimal number of signature slices to fetch and match against the query signature. Arithmetic examples show that our method can handle a 128 GB database with a 2 s response time on a machine with the characteristics of the Connection Machine.
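The bit-sliced idea in the abstract can be illustrated with a small sequential sketch. It is not the authors' parallel algorithm. It assumes a hash-based mapping from words to signature bits, hypothetical demo parameters `F` (signature width) and `M` (bits per word), and it simply takes the first `k` query slices as a stand-in for partial fetch, whereas the paper derives the optimal number of slices to fetch. The `false_drop_estimate` helper uses the standard superimposed-coding approximation $(1 - e^{-Dm/F})^s$, which assumes independent bits; it is not the paper's own formula. All function names are hypothetical.

```python
import hashlib
import math

F = 64  # signature width in bits (hypothetical demo value)
M = 3   # bits each word sets in a signature (hypothetical demo value)


def word_bits(word, f=F, m=M):
    """Return the m distinct bit positions a word sets (hash-based, an assumption)."""
    positions = set()
    salt = 0
    while len(positions) < m:
        digest = hashlib.sha256(f"{word}:{salt}".encode()).digest()
        positions.add(int.from_bytes(digest[:4], "big") % f)
        salt += 1
    return positions


def build_slices(docs, f=F, m=M):
    """Bit-sliced layout: slices[j] has bit d set iff document d's signature has bit j set."""
    slices = [0] * f
    for d, text in enumerate(docs):
        for word in set(text.lower().split()):
            for j in word_bits(word, f, m):
                slices[j] |= 1 << d
    return slices


def query(slices, n_docs, words, max_slices=None, f=F, m=M):
    """AND together the slices selected by the query signature.

    With max_slices=k only the first k query slices are examined (a stand-in for
    partial fetch); the result is then a superset of the full-signature candidates.
    """
    qbits = sorted({j for w in words for j in word_bits(w.lower(), f, m)})
    if max_slices is not None:
        qbits = qbits[:max_slices]
    candidates = (1 << n_docs) - 1  # start with every document as a candidate
    for j in qbits:
        candidates &= slices[j]
        if candidates == 0:
            break  # no document can qualify; skip the remaining slices
    return [d for d in range(n_docs) if (candidates >> d) & 1]


def verify(docs, candidates, words):
    """Resolve false drops by checking the actual text of each candidate."""
    wanted = {w.lower() for w in words}
    return [d for d in candidates if wanted <= set(docs[d].lower().split())]


def false_drop_estimate(d_words, s_slices, f=F, m=M):
    """Approximate false-drop probability when s slices are examined.

    Each bit is assumed to be 1 independently with p = 1 - exp(-D*m/F).
    """
    p = 1.0 - math.exp(-d_words * m / f)
    return p ** s_slices


if __name__ == "__main__":
    docs = ["bit sliced signature files",
            "parallel machine architecture",
            "text retrieval with signature methods"]
    slices = build_slices(docs)
    cands = query(slices, len(docs), ["signature"], max_slices=1)
    print(verify(docs, cands, ["signature"]))  # [0, 2]
```

Examining fewer slices never loses a true match; it only admits more false drops, which the text check then removes. The trade-off between slices fetched and false drops is what the paper's formulae optimize. On a SIMD machine each processor would hold a share of the documents' bits and the AND would run in parallel; the Python integer bitmask here only mimics that layout sequentially.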

This research was sponsored partially by the Institute for Advanced Computer Studies (UMIACS), by the National Science Foundation under grants IRI-8719458, IRI-8958546 and IRI-9205273, and by donations from EMPRESS Software Inc. and Thinking Machines Inc.

References

  1. Stavros Christodoulakis and Christos Faloutsos. Design Considerations for a Message File Server. IEEE Transactions on Software Engineering, 10(2):201–210, March 1984.

  2. Christos Faloutsos. Signature-Based Text Retrieval Methods: A Survey. IEEE Data Engineering, pages 25–32, March 1990.

  3. Christos Faloutsos and Stavros Christodoulakis. Description and Performance Analysis of Signature File Methods for Office Filing. ACM Transactions on Office Information Systems, 5(3):237–257, July 1987.

  4. Christos Faloutsos and Raphael Chan. Fast Text Access Methods for Optical Disks: Designs and Performance Comparison. In Proceedings of the 14th International Conference on Very Large Databases, pages 280–293, Long Beach, California, August 1988.

  5. Christos Faloutsos and H. V. Jagadish. Hybrid Index Organizations for Text Databases. Technical Report UMIACS-TR-91-33 and CS-TR-2621, Department of Computer Science, University of Maryland, March 1991.

  6. R. Haskin. Special-Purpose Processors for Text Retrieval. Database Engineering, 4(1):16–29, September 1981.

  7. Zheng Lin and Christos Faloutsos. Frame Sliced Signature Files. IEEE Transactions on Knowledge and Data Engineering, 4(3):158–180, June 1992. Also available as UMD CS-TR-2146 and UMIACS-TR-88-88.

  8. Zheng Lin. CAT: An Execution Model for Concurrent Full Text Search. In PDIS, 1992.

  9. D. L. Lee and C. W. Leng. Partitioned Signature File: Designs and Performance Evaluation. ACM Transactions on Office Information Systems, 7(2):158–180, April 1989.

  10. George Panagopoulos. Bit-Sliced Signature Files for Very Large Databases on a Parallel Machine Architecture. Technical Report CSC-809, Department of Computer Science, University of Maryland, April 1992.

  11. Ron Sacks-Davis. Two Level Superimposed Coding Scheme for Partial Match Retrieval. Information Systems, 8(4):273–280, 1983.

  12. G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.

  13. Craig Stanfill. Parallel Computing for Information Retrieval: Recent Developments. Technical Report DR88-1, Thinking Machines Corporation, Cambridge, Mass., January 1988.

  14. Simon Stiassny. Mathematical Analysis of Various Superimposed Coding Methods. American Documentation, 11(2):155–169, February 1960.

  15. Harold S. Stone. Parallel Querying of Large Databases: A Case Study. IEEE Computer, 20(10):11–21, October 1987.

  16. D. Tsichritzis and S. Christodoulakis. Message Files. ACM Transactions on Office Information Systems, 1(1):88–98, January 1983.

  17. Thinking Machines Corporation, Cambridge, Mass. Parallel Instruction Set, Version 5.2, October 1989.

  18. G. K. Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Cambridge, MA, 1949.


Editor information

Matthias Jarke, Janis Bubenko, Keith Jeffery

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Panagopoulos, G., Faloutsos, C. (1994). Bit-sliced signature files for very large text databases on a parallel machine architecture. In: Jarke, M., Bubenko, J., Jeffery, K. (eds) Advances in Database Technology — EDBT '94. EDBT 1994. Lecture Notes in Computer Science, vol 779. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57818-8_65

  • DOI: https://doi.org/10.1007/3-540-57818-8_65

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-57818-5

  • Online ISBN: 978-3-540-48342-7

  • eBook Packages: Springer Book Archive
