Abstract
The rapid increase in content available in digital form gives rise to large digital libraries, intended to support millions of users and terabytes of data. Retrieving information efficiently is then a challenging task because of the size of the collection and of its index. In this paper, we validate our high-performance “hybrid” partition inverted index through experiments with a 100 Gbyte collection from TREC-9 and TREC-10. The hybrid scheme combines the term and the document approaches to partitioning inverted indices across the nodes of a parallel system. Experiments on a parallel system show that this organization outperforms both the document and the term partitioning schemes. Our hybrid approach should support highly efficient searching for information in a large-scale digital library implemented atop a network of computers.
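The three partitioning schemes named in the abstract can be illustrated with a minimal Python sketch. Document and term partitioning follow their usual definitions. The abstract does not describe how the hybrid scheme is built, so `hybrid_partition` is only an assumption: it cuts each posting list into fixed-size chunks and deals the chunks across nodes. All function names, the `chunk_size` parameter, and the toy documents are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_index(docs):
    """Return {term: sorted list of ids of the documents containing it}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def document_partition(index, num_nodes):
    """Each node indexes only the documents assigned to it (doc_id mod nodes)."""
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for term, postings in index.items():
        for doc_id in postings:
            nodes[doc_id % num_nodes][term].append(doc_id)
    return [dict(node) for node in nodes]


def term_partition(index, num_nodes):
    """Each node holds the complete posting list of the terms assigned to it."""
    nodes = [dict() for _ in range(num_nodes)]
    for i, term in enumerate(sorted(index)):
        nodes[i % num_nodes][term] = list(index[term])
    return nodes


def hybrid_partition(index, num_nodes, chunk_size):
    """ASSUMED hybrid: split each posting list into chunks of chunk_size
    postings and deal the chunks round-robin, starting at a node that
    depends on the term. Not necessarily the scheme used in the paper."""
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for i, term in enumerate(sorted(index)):
        postings = index[term]
        for c, start in enumerate(range(0, len(postings), chunk_size)):
            chunk = postings[start:start + chunk_size]
            nodes[(i + c) % num_nodes][term].extend(chunk)
    return [dict(node) for node in nodes]


def merge(nodes):
    """Recombine per-node partial indices into one global index."""
    merged = defaultdict(list)
    for node in nodes:
        for term, postings in node.items():
            merged[term].extend(postings)
    return {term: sorted(p) for term, p in merged.items()}


# Toy collection (hypothetical data, for illustration only).
docs = {
    0: "digital library index",
    1: "inverted index",
    2: "digital index search",
    3: "parallel index",
}
index = build_index(docs)
by_doc = document_partition(index, num_nodes=2)
by_term = term_partition(index, num_nodes=2)
by_hybrid = hybrid_partition(index, num_nodes=2, chunk_size=2)
```

In this sketch a long posting list such as the one for `index` stays on a single node under term partitioning but is spread over several nodes under the assumed hybrid layout, while a short list still lives on one node. Merging the per-node pieces with `merge` recovers the same global index under all three schemes.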
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xi, W., Sornil, O., Luo, M., Fox, E.A. (2002). Hybrid Partition Inverted Files: Experimental Validation. In: Agosti, M., Thanos, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2002. Lecture Notes in Computer Science, vol 2458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45747-X_31
DOI: https://doi.org/10.1007/3-540-45747-X_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44178-6
Online ISBN: 978-3-540-45747-3
eBook Packages: Springer Book Archive