Abstract
In recent years, the Web has been rapidly “deepened” by the many searchable databases that have come online, whose data are hidden behind query interfaces. Accessing these invisible contents of the deep Web requires automatic processing of query interfaces. This entails automatic segmentation, i.e., the task of grouping related components of an interface together. Segmentation is divided into two steps: interface component labeling and interface component grouping. In this paper, we present a new approach that performs query interface segmentation using two-phase Conditional Random Fields (CRFs). In the first phase, one CRF model tags each component with a semantic label (attribute-name, operator, operand, or other); in the second phase, another CRF model groups related components. Experiments show that our approach yields high accuracy.
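The two-phase pipeline described above can be illustrated with a minimal sketch. It is not the trained model from the paper: the scores below are hand-set stand-ins for learned CRF weights, the toy form is invented, and encoding groups as B (begins a group) / I (inside a group) tags in the second phase is an assumption, since the abstract does not say how groups are represented. Only the decoding step (Viterbi over a linear chain) is shown; training is omitted.

```python
from typing import Callable, List, Sequence, Tuple

Component = Tuple[str, str]  # (HTML element kind, visible text); hypothetical encoding


def viterbi(obs: Sequence, labels: List[str],
            emit: Callable, trans: Callable) -> List[str]:
    """Return the highest-scoring label sequence of a linear-chain model."""
    # score[i][y] = best score of any labeling of obs[:i+1] that ends in y
    score = [{y: emit(obs[0], y) for y in labels}]
    back = []
    for x in obs[1:]:
        prev, row, ptr = score[-1], {}, {}
        for y in labels:
            best = max(labels, key=lambda p: prev[p] + trans(p, y))
            row[y] = prev[best] + trans(best, y) + emit(x, y)
            ptr[y] = best
        score.append(row)
        back.append(ptr)
    path = [max(labels, key=lambda y: score[-1][y])]
    for ptr in reversed(back):  # follow back-pointers to recover the path
        path.append(ptr[path[-1]])
    return path[::-1]


# Phase 1: semantic labeling (label set taken from the abstract).
LABELS1 = ["attribute-name", "operator", "operand", "other"]
OPERATOR_WORDS = {"contains", "exact", "starts with"}  # assumed cue words


def emit1(comp: Component, y: str) -> float:
    kind, text = comp
    if kind == "text":
        guess = "attribute-name"
    elif kind == "select" and text in OPERATOR_WORDS:
        guess = "operator"
    elif kind in ("textbox", "select"):
        guess = "operand"
    else:
        guess = "other"
    return 2.0 if y == guess else 0.0  # hand-set score, not a learned weight


LIKELY1 = {("attribute-name", "operator"), ("attribute-name", "operand"),
           ("operator", "operand"), ("operand", "operand")}


def trans1(p: str, y: str) -> float:
    return 0.5 if (p, y) in LIKELY1 else 0.0


# Phase 2: grouping, encoded here as B/I tags over the phase-1 labels.
LABELS2 = ["B", "I"]


def emit2(label: str, tag: str) -> float:
    guess = "I" if label in ("operator", "operand") else "B"
    return 2.0 if tag == guess else 0.0


def trans2(p: str, y: str) -> float:
    return 0.5 if (p, y) == ("B", "I") else 0.0


def to_groups(tags: List[str]) -> List[List[int]]:
    """Turn B/I tags into lists of component indices, one list per group."""
    groups: List[List[int]] = []
    for i, t in enumerate(tags):
        if t == "B" or not groups:
            groups.append([i])
        else:
            groups[-1].append(i)
    return groups


# Toy interface: Title [contains v] [____]  Price [____] [____]  (Search)
form: List[Component] = [
    ("text", "Title"), ("select", "contains"), ("textbox", ""),
    ("text", "Price"), ("textbox", ""), ("textbox", ""),
    ("submit", "Search"),
]

labels1 = viterbi(form, LABELS1, emit1, trans1)
tags2 = viterbi(labels1, LABELS2, emit2, trans2)
groups = to_groups(tags2)
print(labels1)
print(groups)
```

In this sketch the second phase sees only the labels produced by the first, which is what makes the pipeline two-phase; a trained system would replace `emit1`, `trans1`, `emit2`, and `trans2` with weighted feature functions learned from annotated interfaces.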
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dong, Y., Zhao, X., Zhang, G. (2011). A Study on Using Two-Phase Conditional Random Fields for Query Interface Segmentation. In: Gong, Z., Luo, X., Chen, J., Lei, J., Wang, F.L. (eds) Web Information Systems and Mining. WISM 2011. Lecture Notes in Computer Science, vol 6988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23982-3_45
DOI: https://doi.org/10.1007/978-3-642-23982-3_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23981-6
Online ISBN: 978-3-642-23982-3
eBook Packages: Computer Science, Computer Science (R0)