WINGS: A Parallel Indexer for Web Contents

  • Fabrizio Silvestri
  • Salvatore Orlando
  • Raffaele Perego
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3036)

Abstract

In this paper we discuss the design of a parallel indexer for Web documents. By exploiting both data and pipeline parallelism, our prototype indexer efficiently builds a partitioned, compressed inverted index, a data structure commonly used by modern Web search engines. We discuss implementation issues and report the results of preliminary tests conducted on an SMP PC.
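
As a rough illustration of the ideas named in the abstract, the following minimal Python sketch builds a document-partitioned inverted index whose posting lists are stored as compressed gaps. It is not the WINGS implementation: the function names (`build_partition`, `build_index`, `vbyte_encode`, `vbyte_decode`), the two-stage parse/invert pipeline, the round-robin document partitioning, and the choice of variable-byte coding are all assumptions made for the example, since the abstract does not specify them. Python threads are used only to show the structure; they would not give real speed-up for CPU-bound work.

```python
import queue
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def vbyte_encode(numbers):
    """Variable-byte encode non-negative integers (assumed compression scheme)."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)   # low 7 bits, more bytes follow
            n >>= 7
        out.append(n | 0x80)       # high bit marks the last byte of a number
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:
            numbers.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return numbers


def build_partition(docs):
    """Pipeline parallelism: a parse stage feeds an invert stage through a queue.

    docs is a list of (doc_id, text) pairs with ascending doc_ids.
    Returns a dict mapping each term to its compressed list of doc_id gaps.
    """
    parsed = queue.Queue(maxsize=16)
    postings = defaultdict(list)

    def parse_stage():
        for doc_id, text in docs:
            parsed.put((doc_id, sorted(set(text.lower().split()))))
        parsed.put(None)  # end-of-stream marker

    def invert_stage():
        while True:
            item = parsed.get()
            if item is None:
                break
            doc_id, terms = item
            for term in terms:
                postings[term].append(doc_id)

    stages = [threading.Thread(target=parse_stage),
              threading.Thread(target=invert_stage)]
    for s in stages:
        s.start()
    for s in stages:
        s.join()

    index = {}
    for term, ids in postings.items():
        # Store the first doc_id, then the differences between consecutive ids.
        gaps = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
        index[term] = vbyte_encode(gaps)
    return index


def build_index(collection, n_partitions):
    """Data parallelism: each partition of the collection is indexed independently."""
    parts = [collection[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        return list(pool.map(build_partition, parts))
```

Calling `build_index(collection, 2)` on a list of `(doc_id, text)` pairs sorted by `doc_id` returns one term dictionary per partition; a posting list is recovered by decoding the bytes with `vbyte_decode` and taking a running sum of the gaps.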

Keywords

Main Memory, Document Collection, Pipeline Stage, Inverted Index, Inverted List
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Fabrizio Silvestri (1, 2)
  • Salvatore Orlando (3)
  • Raffaele Perego (1)

  1. Istituto di Scienze e Tecnologie dell’Informazione, ISTI–CNR, Pisa, Italy
  2. Dipartimento di Informatica, Università di Pisa, Italy
  3. Dipartimento di Informatica, Università Ca’ Foscari di Venezia, Italy