Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4312)

Abstract

Metadata harvesting has become a common technique for transferring a stream of data from one metadata repository or digital library system to another. As collections of metadata, and their associated digital objects, grow in size, ingesting these items at the destination archive can take a significant amount of time, depending on the type of indexing or post-processing that is required. This paper discusses an approach to parallelise the post-processing of data in a small cluster of machines or a multi-processor environment, without increasing the burden on the source data provider. Performance tests have been carried out on varying architectures, and the results indicate that this technique is promising for some scenarios and can be extended to more computationally intensive ingest procedures. In general, the technique presents a new approach for the construction of harvest-based distributed or component-based digital libraries, with better scalability than before.
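The core idea in the abstract, harvesting from the source only once and spreading the post-processing across several processors, can be illustrated with a minimal sketch. It assumes Python's `concurrent.futures.ProcessPoolExecutor` on a single multi-processor machine standing in for a small cluster; the record format, the `harvest` generator and the title-tokenising "post-processing" step are hypothetical stand-ins, not the paper's implementation.

```python
"""Sketch: harvest once from the source, post-process in parallel locally.

Assumptions (hypothetical, not taken from the paper): records arrive as dicts
from one sequential harvest, the post-processing step is simple title
tokenisation into an inverted index, and the workers are local processes
standing in for the nodes of a small cluster.
"""
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Set

Index = Dict[str, Set[str]]


def harvest(n: int) -> Iterator[dict]:
    """Hypothetical stand-in for a single sequential harvest (e.g. one
    OAI-PMH ListRecords stream). Only the parent process calls it, so the
    source provider sees one harvester regardless of the worker count."""
    for i in range(n):
        yield {"id": f"oai:example:{i}", "title": f"Record {i} about harvesting"}


def batches(records: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Split the incoming record stream into fixed-size batches."""
    it = iter(records)
    while batch := list(islice(it, size)):
        yield batch


def index_batch(batch: List[dict]) -> Index:
    """Per-batch post-processing: a partial inverted index (term -> record ids)."""
    partial: Index = defaultdict(set)
    for rec in batch:
        for term in rec["title"].lower().split():
            partial[term].add(rec["id"])
    return dict(partial)


def parallel_ingest(records: Iterable[dict], workers: int = 4, size: int = 100) -> Index:
    """Fan batches out to worker processes, then merge the partial indices."""
    index: Index = defaultdict(set)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in batch order; the merge is a set union,
        # so the final index does not depend on which worker handled a batch.
        for partial in pool.map(index_batch, batches(records, size)):
            for term, ids in partial.items():
                index[term] |= ids
    return dict(index)


if __name__ == "__main__":
    idx = parallel_ingest(harvest(1000), workers=4, size=100)
    print(len(idx["harvesting"]))  # 1000
```

Because the per-batch work is independent and the merge is a set union, the parallel result matches a sequential pass over the same records. Note that `Executor.map` collects its input eagerly, so this sketch buffers the whole harvest; a larger ingest would want bounded submission.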

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Suleman, H. (2006). Parallelising Harvesting. In: Sugimoto, S., Hunter, J., Rauber, A., Morishima, A. (eds) Digital Libraries: Achievements, Challenges and Opportunities. ICADL 2006. Lecture Notes in Computer Science, vol 4312. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11931584_11

  • DOI: https://doi.org/10.1007/11931584_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49375-4

  • Online ISBN: 978-3-540-49377-8

  • eBook Packages: Computer Science, Computer Science (R0)