Abstract
The main way to obtain information from the Deep Web is to submit query conditions through the query interfaces that sites provide, so discovering these interfaces is the first problem a Deep Web data integration system must solve. At present, most researchers assume that a query interface is defined solely within the HTML form tag. This paper first proposes the concept of an interface block, then designs an interface block location method based on page and visual information, and finally treats the judgment of whether an interface block is a query interface as a special multi-class classification problem, solved with a classification algorithm that combines a C4.5 decision tree and an SVM. The experiments use the TEL-8 data set from UIUC, and the results indicate that the proposed method achieves an accuracy of 97.30% and is feasible and practical.
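As a rough illustration of the form-tag view of query interfaces that the abstract contrasts itself with, the following sketch collects one candidate per form element and counts a few of its fields. It is not the paper's interface block location method or its C4.5/SVM classifier; the class name `FormFeatureExtractor`, the function `extract_form_features`, and the chosen feature set are hypothetical and only show what kind of input such a classifier might receive.

```python
from html.parser import HTMLParser

# Input types counted as features (an assumed, illustrative choice).
COUNTED_INPUT_TYPES = ("text", "password", "submit")


class FormFeatureExtractor(HTMLParser):
    """Collect simple field counts for every <form> element in a page."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one feature dict per completed form
        self._current = None  # features of the form currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"text": 0, "password": 0, "select": 0, "submit": 0}
        elif self._current is not None:
            if tag == "input":
                # An <input> without a type attribute defaults to a text field.
                kind = (attrs.get("type") or "text").lower()
                if kind in COUNTED_INPUT_TYPES:
                    self._current[kind] += 1
            elif tag == "select":
                self._current["select"] += 1

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def extract_form_features(html):
    """Return a list of feature dicts, one per form found in the HTML."""
    parser = FormFeatureExtractor()
    parser.feed(html)
    parser.close()
    return parser.forms


page = """
<form action="/search">
  <input type="text" name="title">
  <select name="genre"><option>Any</option></select>
  <input type="submit">
</form>
<form action="/login">
  <input type="text" name="user">
  <input type="password" name="pw">
  <input type="submit">
</form>
"""
features = extract_form_features(page)
```

Here the second form contains a password field, a signal that could help a classifier separate login forms from search forms. Candidates that are not wrapped in a form tag are missed entirely by this sketch, which is the limitation the interface block concept is meant to address.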
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ye, F., Yu, H. (2015). Research on Automate Discovery of Deep Web Interfaces. In: Wang, J., et al. Web Information Systems Engineering – WISE 2015. WISE 2015. Lecture Notes in Computer Science, vol 9419. Springer, Cham. https://doi.org/10.1007/978-3-319-26187-4_14
DOI: https://doi.org/10.1007/978-3-319-26187-4_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26186-7
Online ISBN: 978-3-319-26187-4
eBook Packages: Computer Science (R0)