Research on Automate Discovery of Deep Web Interfaces

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9419)

Abstract

The main way to obtain information from the Deep Web is to submit query conditions through the query interfaces that sites provide, so discovering these interfaces is the first problem a Deep Web data integration system must solve. At present, most researchers assume that a query interface is defined only within the HTML form tag. This paper first proposes the concept of an interface block, then designs an interface block location method based on page and visual information, and finally treats the judgment of whether an interface block is a query interface as a special multi-class classification problem, solved with a classification algorithm that combines a C4.5 decision tree and an SVM. The experiments use the TEL-8 data set from UIUC, and the results indicate that the method achieves an accuracy of 97.30% and is feasible and practical.
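
The form-tag assumption mentioned in the abstract can be illustrated with a minimal sketch. It shows only that baseline, not the interface block method or the C4.5 and SVM classifier the paper proposes. The feature names, the helper `extract_form_features`, and the toy rule in `looks_like_query_interface` are all hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser

# Input types counted as features (an illustrative choice, not the paper's).
COUNTED_INPUT_TYPES = ("text", "password", "submit")


class FormFeatureParser(HTMLParser):
    """Collect simple control counts for every <form> element on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one feature dict per <form> seen
        self._current = None  # feature dict of the form being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"text": 0, "password": 0, "select": 0, "submit": 0}
            self.forms.append(self._current)
        elif self._current is not None:
            if tag == "select":
                self._current["select"] += 1
            elif tag == "input":
                # An <input> without a type attribute defaults to a text field.
                kind = (attrs.get("type") or "text").lower()
                if kind in COUNTED_INPUT_TYPES:
                    self._current[kind] += 1

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def extract_form_features(html):
    """Return a list of feature dicts, one per <form> in the HTML string."""
    parser = FormFeatureParser()
    parser.feed(html)
    parser.close()
    return parser.forms


def looks_like_query_interface(features):
    """Toy rule (an assumption): reject login forms, accept forms with inputs."""
    if features["password"] > 0:
        return False
    return features["text"] + features["select"] > 0
```

A sketch like this only sees controls nested inside a form element, which is the limitation the abstract points out: query controls laid out outside a form tag would be missed.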

Author information

Corresponding author

Correspondence to Hang Yu.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ye, F., Yu, H. (2015). Research on Automate Discovery of Deep Web Interfaces. In: Wang, J., et al. Web Information Systems Engineering – WISE 2015. WISE 2015. Lecture Notes in Computer Science, vol 9419. Springer, Cham. https://doi.org/10.1007/978-3-319-26187-4_14

  • DOI: https://doi.org/10.1007/978-3-319-26187-4_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26186-7

  • Online ISBN: 978-3-319-26187-4

  • eBook Packages: Computer Science, Computer Science (R0)
