Abstract
In this paper we discuss the design of a parallel indexer for Web documents. By exploiting both data and pipeline parallelism, our prototype indexer efficiently builds a partitioned, compressed inverted index, a data structure commonly used by modern Web search engines. We discuss implementation issues and report the results of preliminary tests conducted on an SMP PC.
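To make the target data structure concrete, the following is a minimal sketch of a partitioned, compressed inverted index. It is not the paper's implementation: the abstract does not specify the partitioning scheme or the compression method, so document partitioning by `doc_id % num_partitions`, gap (delta) encoding, and variable-byte coding are all assumptions chosen for illustration, and every function name is hypothetical. Partitions are built sequentially here; in a data-parallel setting each partition could be handed to a separate worker.

```python
from collections import defaultdict


def vbyte_encode(numbers):
    """Variable-byte encode non-negative integers, 7 data bits per byte."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)  # low 7 bits; more bytes follow
            n >>= 7
        out.append(n | 0x80)      # high bit set marks the last byte
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


def build_partition(docs):
    """Map each term to a gap-encoded, vbyte-compressed posting list.

    docs: dict of doc_id (int) -> text (str).
    """
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        # Naive whitespace tokenization (an assumption, not the paper's parser).
        for term in set(docs[doc_id].lower().split()):
            postings[term].append(doc_id)
    index = {}
    for term, ids in postings.items():
        # Store the first id, then differences between consecutive ids.
        gaps = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
        index[term] = vbyte_encode(gaps)
    return index


def decode_postings(blob):
    """Recover sorted document ids from a compressed posting list."""
    ids, total = [], 0
    for gap in vbyte_decode(blob):
        total += gap
        ids.append(total)
    return ids


def build_partitioned_index(docs, num_partitions):
    """Split documents across partitions and index each one independently."""
    parts = [dict() for _ in range(num_partitions)]
    for doc_id, text in docs.items():
        parts[doc_id % num_partitions][doc_id] = text
    return [build_partition(p) for p in parts]
```

Because document identifiers in a posting list are sorted, the gaps between them tend to be small, which is what lets a byte-oriented code store most entries in a single byte.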
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Silvestri, F., Orlando, S., Perego, R. (2004). WINGS: A Parallel Indexer for Web Contents. In: Bubak, M., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds) Computational Science - ICCS 2004. ICCS 2004. Lecture Notes in Computer Science, vol 3036. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24685-5_33
DOI: https://doi.org/10.1007/978-3-540-24685-5_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22114-2
Online ISBN: 978-3-540-24685-5
eBook Packages: Springer Book Archive