Research on Automate Discovery of Deep Web Interfaces

Ye, Feiyue; Yu, Hang

doi:10.1007/978-3-319-26187-4_14

Conference paper
First Online: 18 December 2015

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9419)

Abstract

The main way to obtain information from the Deep Web is to submit query conditions through the query interfaces that sites provide, so discovering these interfaces is the first problem a Deep Web data integration system must solve. At present, most researchers assume that a query interface is defined only within the HTML form tag. This paper first proposes the concept of an interface block, then designs an interface block location method based on page and visual information, and finally treats the judgment of whether an interface block is a query interface as a special multi-class classification problem, which it solves with a classification algorithm combining a C4.5 decision tree and an SVM. The experiments use the TEL-8 data set from UIUC, and the results indicate that the method achieves an accuracy of 97.30% and is feasible and practical.
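
To make the idea of an interface block more concrete, the following is a minimal Python sketch, not the authors' method. It assumes that a candidate block is any container element (not only a form tag) that encloses form controls, and it derives a few simple counts that a classifier could take as input. The names BlockFinder, find_interface_blocks and features, and the choice of features, are hypothetical. The visual information and the C4.5/SVM classifier described in the abstract are not reproduced here.

```python
from html.parser import HTMLParser

# Assumption: these tags count as "form controls" for this illustration.
CONTROL_TAGS = {"input", "select", "textarea", "button"}
# Void elements never receive an end tag, so they are never containers.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "link", "meta", "source", "track", "wbr"}


class BlockFinder(HTMLParser):
    """Record every container element that encloses at least one control."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open containers
        self.blocks = []  # closed containers that held controls

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in CONTROL_TAGS:
            # For <input>, the control kind is its type (default "text").
            kind = (attrs.get("type") or "text").lower() if tag == "input" else tag
            # Credit the control to every enclosing container.
            for block in self.stack:
                block["controls"][kind] = block["controls"].get(kind, 0) + 1
        elif tag not in VOID_TAGS:
            self.stack.append({"tag": tag, "id": attrs.get("id"), "controls": {}})

    def handle_endtag(self, tag):
        # Ignore end tags with no open match (e.g. </select>, </button>).
        if tag not in [b["tag"] for b in self.stack]:
            return
        # Pop back to the matching element, tolerating unclosed children.
        while self.stack:
            block = self.stack.pop()
            if block["controls"]:
                self.blocks.append(block)
            if block["tag"] == tag:
                break


def find_interface_blocks(html):
    """Return candidate blocks in the order in which they are closed."""
    finder = BlockFinder()
    finder.feed(html)
    finder.close()
    # Containers left open at end of input are still candidates.
    leftovers = [b for b in reversed(finder.stack) if b["controls"]]
    return finder.blocks + leftovers


def features(block):
    """Hypothetical feature vector for a downstream classifier."""
    c = block["controls"]
    return {
        "n_controls": sum(c.values()),
        "n_text": c.get("text", 0),
        "n_select": c.get("select", 0),
        "has_password": c.get("password", 0) > 0,
        "has_submit": c.get("submit", 0) + c.get("button", 0) > 0,
    }


if __name__ == "__main__":
    page = (
        '<div id="search"><input type="text" name="q">'
        '<select name="cat"><option>All</option></select>'
        '<input type="submit" value="Go"></div>'
    )
    for block in find_interface_blocks(page):
        print(block["tag"], block["id"], features(block))
```

In this sketch a search box built from a plain div is found just as a form element would be, which is the case the form-tag assumption misses. Deciding whether such a block is really a query interface (rather than, say, a login block) is left to the classifier.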

Author information

Authors and Affiliations

School of Computer Engineering and Science, Shanghai University, Shanghai, China
Feiyue Ye & Hang Yu

Corresponding author

Correspondence to Hang Yu.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Jianyong Wang
Poznan University of Economics, Poznan, Poland
Wojciech Cellary
Florida Atlantic University, Boca Raton, Florida, USA
Dingding Wang
Victoria University, Melbourne, Victoria, Australia
Hua Wang
Florida International University, Miami, Florida, USA
Shu-Ching Chen
Florida International University, Miami, Florida, USA
Tao Li
Victoria University, Melbourne, Victoria, Australia
Yanchun Zhang

About this paper

Cite this paper

Ye, F., Yu, H. (2015). Research on Automate Discovery of Deep Web Interfaces. In: Wang, J., et al. Web Information Systems Engineering – WISE 2015. WISE 2015. Lecture Notes in Computer Science, vol 9419. Springer, Cham. https://doi.org/10.1007/978-3-319-26187-4_14

DOI: https://doi.org/10.1007/978-3-319-26187-4_14
Published: 18 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26186-7
Online ISBN: 978-3-319-26187-4
eBook Packages: Computer Science, Computer Science (R0)
