
Hybrid Partition Inverted Files: Experimental Validation

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2458)

Abstract

The rapid growth of content available in digital form gives rise to large digital libraries intended to support millions of users and terabytes of data. Retrieving information efficiently is then a challenging task because of the size of the collection and its index. In this paper, our high-performance “hybrid” partition inverted index is validated through experiments with a 100 Gbyte collection from TREC-9 and TREC-10. The hybrid scheme combines the term and the document approaches to partitioning inverted indices across the nodes of a parallel system. Experiments on a parallel system show that this organization outperforms both the document and the term partitioning schemes. Our hybrid approach should support highly efficient searching for information in a large-scale digital library implemented atop a network of computers.
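To make the three partitioning schemes named in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not describe how the hybrid scheme splits the index, so the fixed-size chunking with round-robin placement shown here is only an assumption about one way to combine the term and document approaches. All function names, the `chunk_size` parameter, and the toy documents are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Build a plain inverted index: term -> sorted list of document ids."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in sorted(set(docs[doc_id].split())):
            index[term].append(doc_id)
    return dict(index)

def term_partition(index, num_nodes):
    """Term partitioning: a term's whole posting list lives on one node."""
    nodes = [dict() for _ in range(num_nodes)]
    for i, term in enumerate(sorted(index)):
        nodes[i % num_nodes][term] = list(index[term])
    return nodes

def document_partition(index, num_nodes):
    """Document partitioning: each node indexes only its own documents."""
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for term, postings in index.items():
        for doc_id in postings:
            nodes[doc_id % num_nodes][term].append(doc_id)
    return [dict(n) for n in nodes]

def hybrid_partition(index, num_nodes, chunk_size):
    """Assumed hybrid: cut each posting list into fixed-size chunks and
    place the chunks on nodes round-robin (an illustrative choice only)."""
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    next_node = 0
    for term in sorted(index):
        postings = index[term]
        for start in range(0, len(postings), chunk_size):
            nodes[next_node][term].extend(postings[start:start + chunk_size])
            next_node = (next_node + 1) % num_nodes
    return [dict(n) for n in nodes]

def lookup(nodes, term):
    """Answer a single-term query by merging partial lists from all nodes."""
    merged = []
    for node in nodes:
        merged.extend(node.get(term, []))
    return sorted(merged)
```

Under these assumptions, a frequent term ends up on a single node with term partitioning, on every node that holds a matching document with document partitioning, and on a number of nodes that grows with the length of its posting list in the hybrid layout. All three layouts return the same merged posting list from `lookup`.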

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xi, W., Sornil, O., Luo, M., Fox, E.A. (2002). Hybrid Partition Inverted Files: Experimental Validation. In: Agosti, M., Thanos, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2002. Lecture Notes in Computer Science, vol 2458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45747-X_31

  • DOI: https://doi.org/10.1007/3-540-45747-X_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44178-6

  • Online ISBN: 978-3-540-45747-3

  • eBook Packages: Springer Book Archive
