Building HyperView Wrappers for Publisher Web Sites

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 1998)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1513)

Abstract

Electronic journals are becoming a major source of scientific information. Researchers interested only in certain topics do not have time to scan all possibly relevant journals on a regular basis. A digital library can assist them by providing a uniform, searchable interface for electronic journals. For this purpose, the digital library must establish a catalogue of metadata on the available journals, such as authors and titles of articles. If there is no cooperation with journal publishers, this metadata must be extracted from the publishers’ Web Sites, overcoming the intrinsic heterogeneity problems.

Within the framework of the ongoing Natural Sciences Digital Library project at the Free University of Berlin, we have designed a wrapper-mediator mechanism that copes with the heterogeneity problems of automatic metadata acquisition. It is based on our generic HyperView methodology for integration of Web Sites. From this methodology it inherits two elegant and effective features. First, the structure of the publisher site is specified with abstract graph-schemata, instead of being hard-coded in scripts for data acquisition. Second, a powerful view concept based on declarative graph-transformation rules is used for information extraction.
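
To make the two features named above more concrete, the following is a minimal illustrative sketch, not the HyperView system itself. It assumes a toy publisher site modelled as a typed graph; the names `Node`, `SCHEMA`, `RULE`, `conforms`, and `apply_rule`, as well as the sample journal data, are hypothetical and do not come from the paper. The schema and the extraction rule are plain data, so adapting to a different site layout would mean editing those declarations rather than rewriting acquisition code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A typed node of a (hypothetical) site graph."""
    type: str
    attrs: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)  # edge label -> list of Node


# Assumed graph schema: for each node type, the allowed edge labels
# and the node type each label must point to.
SCHEMA = {
    "Journal": {"issue": "Issue"},
    "Issue": {"article": "Article"},
    "Article": {},
}

# Assumed declarative view rule: follow a path of edge labels from the
# root, then project selected attributes of the nodes reached.
RULE = {
    "path": ["issue", "article"],
    "project": {"title": "title", "authors": "authors"},
}


def conforms(node, schema=SCHEMA):
    """Check recursively that a graph uses only edges the schema allows."""
    allowed = schema.get(node.type)
    if allowed is None:  # unknown node type
        return False
    for label, targets in node.edges.items():
        if label not in allowed:
            return False
        for target in targets:
            if target.type != allowed[label] or not conforms(target, schema):
                return False
    return True


def apply_rule(root, rule):
    """Evaluate the view rule and return one metadata record per match."""
    frontier = [root]
    for label in rule["path"]:
        frontier = [child for n in frontier for child in n.edges.get(label, [])]
    return [
        {out: n.attrs.get(src) for out, src in rule["project"].items()}
        for n in frontier
    ]


# Invented sample data standing in for a parsed publisher site.
site = Node("Journal", {"name": "Example Journal"}, {
    "issue": [Node("Issue", {"volume": 1}, {
        "article": [
            Node("Article", {"title": "Graph Views", "authors": ["A. Author"]}),
            Node("Article", {"title": "Site Schemata", "authors": ["B. Writer"]}),
        ],
    })],
})

records = apply_rule(site, RULE)
```

In this sketch, `conforms(site)` checks the sample graph against the schema, and `records` holds one title/authors dictionary per article reached by the rule's path.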

Supported by the German Research Society, Berlin-Brandenburg Graduate School on Distributed Information Systems (DFG grant no. GRK 316)

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Faulstich, L.C., Spiliopoulou, M. (1998). Building HyperView Wrappers for Publisher Web Sites. In: Nikolaou, C., Stephanidis, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 1998. Lecture Notes in Computer Science, vol 1513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49653-X_8

  • DOI: https://doi.org/10.1007/3-540-49653-X_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65101-7

  • Online ISBN: 978-3-540-49653-3

  • eBook Packages: Springer Book Archive
