A Study on Using Two-Phase Conditional Random Fields for Query Interface Segmentation

Dong, Yongquan; Zhao, Xiangjun; Zhang, Gongjie

doi:10.1007/978-3-642-23982-3_45

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6988)

Included in the following conference series:

International Conference on Web Information Systems and Mining

Abstract

Recently, the Web has been rapidly “deepened” by the many searchable databases now online, whose data are hidden behind query interfaces. Accessing this otherwise invisible content of the deep Web requires processing query interfaces automatically. This entails automatic segmentation, i.e., the task of grouping related components of an interface together. Segmentation is divided into two steps: interface component labeling and interface component grouping. In this paper we present a new approach that performs query interface segmentation using two-phase Conditional Random Fields (CRFs). In the first phase, one CRF model tags each component with a semantic label (attribute-name, operator, operand, or other); in the second phase, another CRF model creates groups of related components. Experiments show that our approach yields high accuracy.
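
The two-phase structure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the component encoding, the hand-set scores, and the B/I group tags (B starts a group, I continues it) are all assumptions made for illustration, and a real CRF would learn its feature weights from annotated interfaces. Only the Viterbi decoding step of a linear-chain model is shown, applied once per phase.

```python
from typing import List, Tuple

Component = Tuple[str, str]  # (widget kind, visible text): a hypothetical encoding

LABELS = ["attribute-name", "operator", "operand", "other"]
GROUP_TAGS = ["B", "I"]  # assumed encoding: B starts a group, I continues it
OPERATOR_WORDS = {"contains", "starts with", "exact phrase"}  # toy lexicon


def viterbi(obs, states, emit, trans):
    """Best state sequence under additive emit/transition log-scores."""
    if not obs:
        return []
    # Each row maps state -> (best score so far, backpointer).
    rows = [{s: (emit(s, obs[0]), None) for s in states}]
    for x in obs[1:]:
        prev = rows[-1]
        row = {}
        for s in states:
            p = max(states, key=lambda q: prev[q][0] + trans(q, s))
            row[s] = (prev[p][0] + trans(p, s) + emit(s, x), p)
        rows.append(row)
    path = [max(states, key=lambda s: rows[-1][s][0])]
    for row in reversed(rows[1:]):
        path.append(row[path[-1]][1])
    return path[::-1]


# Phase 1: semantic labeling. Scores are hand-set stand-ins for learned weights.
def emit_label(label: str, comp: Component) -> float:
    kind, text = comp
    if kind == "text":
        best = "attribute-name"
    elif kind == "select" and text.lower() in OPERATOR_WORDS:
        best = "operator"
    elif kind in ("textbox", "select"):
        best = "operand"
    else:
        best = "other"
    return 2.0 if label == best else 0.0


def trans_label(prev: str, cur: str) -> float:
    likely = {("attribute-name", "operator"), ("operator", "operand"),
              ("attribute-name", "operand")}
    return 0.5 if (prev, cur) in likely else 0.0


# Phase 2: grouping. The observations are the labels produced by phase 1.
def emit_group(tag: str, label: str) -> float:
    best = "I" if label in ("operator", "operand") else "B"
    return 2.0 if tag == best else 0.0


def trans_group(prev: str, cur: str) -> float:
    return 0.5 if (prev, cur) == ("B", "I") else 0.0


def label_components(form: List[Component]) -> List[str]:
    return viterbi(form, LABELS, emit_label, trans_label)


def group_components(labels: List[str]) -> List[List[int]]:
    tags = viterbi(labels, GROUP_TAGS, emit_group, trans_group)
    groups: List[List[int]] = []
    for i, tag in enumerate(tags):
        if tag == "B" or not groups:
            groups.append([i])       # open a new group
        else:
            groups[-1].append(i)     # extend the current group
    return groups


# A made-up interface: "Title [contains] [____]  Author [____]  [Search]"
FORM = [("text", "Title"), ("select", "contains"), ("textbox", ""),
        ("text", "Author"), ("textbox", ""), ("submit", "Search")]

labels = label_components(FORM)
groups = group_components(labels)
print(labels)
print(groups)
```

On this toy form, the first pass labels the six components and the second pass splits them into three groups of component indices: the title condition, the author condition, and the submit button. Feeding phase-1 output into phase 2 as the observation sequence is one plausible way to chain the two models; the abstract does not specify how the second model consumes the first model's labels.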

Author information

Authors and Affiliations

School of Computer Science and Technology, Xuzhou Normal University, Xuzhou, China
Yongquan Dong, Xiangjun Zhao & Gongjie Zhang

Editor information

Editors and Affiliations

Department of Computer and Information Science, University of Macau, Av. Padre Tomás Pereira, Taipa, Macau, China
Zhiguo Gong
School of Computer, Shanghai University, 200444, Shanghai, China
Xiangfeng Luo
College of Computer and Software, Taiyuan University of Technology, 030024, Taiyuan, China
Junjie Chen
School of Computer and Information Engineering, Shanghai University of Electric Power, 200090, Shanghai, China
Jingsheng Lei
Department of Business Administration, Caritas Institute of Higher Education, 18 Chui Ling Road, Tseung Kwan O, Hong Kong, China
Fu Lee Wang

About this paper

Cite this paper

Dong, Y., Zhao, X., Zhang, G. (2011). A Study on Using Two-Phase Conditional Random Fields for Query Interface Segmentation. In: Gong, Z., Luo, X., Chen, J., Lei, J., Wang, F.L. (eds) Web Information Systems and Mining. WISM 2011. Lecture Notes in Computer Science, vol 6988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23982-3_45

DOI: https://doi.org/10.1007/978-3-642-23982-3_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23981-6
Online ISBN: 978-3-642-23982-3
eBook Packages: Computer Science, Computer Science (R0)
