
Object-Extraction-Based Hidden Web Information Retrieval

  • Conference paper
Advances in Web-Age Information Management (WAIM 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2419)


Abstract

Traditional search engines ignore the tremendous amount of information “hidden” behind the search forms of Web pages in large searchable electronic databases, known as the hidden Web. In this paper, we address the problem of designing a system for extracting and retrieving hidden Web information. We present a generic operational model of hidden Web information retrieval and describe its key techniques. We introduce a new Tag-Tree-based Object Extraction Technique for automatically extracting hidden Web information from Web pages. Based on this technique, we implement a retrieval algorithm for structured queries over hidden Web information. Test results are also reported.
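
The abstract names a tag-tree-based object extraction technique but does not spell it out. The following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a result page lists its data objects as sibling elements with the same tag, and that the element with the most same-tag children is the object container. The names `build_tag_tree`, `extract_objects`, and `SAMPLE_PAGE` are hypothetical and introduced here for illustration.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag (assumed subset, for illustration).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """One node of the tag tree: an element, or a '#text' leaf holding data."""

    def __init__(self, tag, parent=None, data=""):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.data = data


class TagTreeBuilder(HTMLParser):
    """Builds a tag tree from HTML using the standard-library parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(Node("#text", self.current, data.strip()))


def build_tag_tree(html):
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def iter_nodes(node):
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def text_of(node):
    """Concatenate the text under a node in document order."""
    if node.tag == "#text":
        return node.data
    return " ".join(t for t in (text_of(c) for c in node.children) if t)


def extract_objects(html):
    """Assumed heuristic: the element with the most same-tag children
    is the object container; each such child is one extracted object."""
    best, best_tag, best_count = None, None, 1
    for node in iter_nodes(build_tag_tree(html)):
        counts = Counter(c.tag for c in node.children if c.tag != "#text")
        if not counts:
            continue
        tag, count = counts.most_common(1)[0]
        if count > best_count:  # require at least two repeated siblings
            best, best_tag, best_count = node, tag, count
    if best is None:
        return []
    return [text_of(c) for c in best.children if c.tag == best_tag]


# Hypothetical result page returned by a search form.
SAMPLE_PAGE = (
    "<html><body><h1>Results</h1><ul>"
    "<li><b>Title A</b> 1999</li>"
    "<li><b>Title B</b> 2001</li>"
    "<li><b>Title C</b> 2002</li>"
    "</ul><p>Next</p></body></html>"
)

if __name__ == "__main__":
    for obj in extract_objects(SAMPLE_PAGE):
        print(obj)
```

A real extractor would need more than fan-out to separate data records from navigation menus and other repeated page furniture; the single heuristic here is kept only to make the tag-tree idea concrete.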




Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hui, S., Ling, Z., Yunming, Y., Fanyuan, M. (2002). Object-Extraction-Based Hidden Web Information Retrieval. In: Meng, X., Su, J., Wang, Y. (eds) Advances in Web-Age Information Management. WAIM 2002. Lecture Notes in Computer Science, vol 2419. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45703-8_31


  • DOI: https://doi.org/10.1007/3-540-45703-8_31


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44045-1

  • Online ISBN: 978-3-540-45703-9

  • eBook Packages: Springer Book Archive
