A Study on Using Two-Phase Conditional Random Fields for Query Interface Segmentation

  • Conference paper
Web Information Systems and Mining (WISM 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6988)

Abstract

Recently, the Web has been rapidly “deepened” by many searchable online databases, whose data are hidden behind query interfaces. Automatic processing of query interfaces is required to access the invisible contents of the deep Web. This entails automatic segmentation, i.e., grouping related components of an interface together. Segmentation is divided into two steps: interface component labeling and interface component grouping. In this paper we present a new approach to query interface segmentation using two-phase Conditional Random Fields (CRFs). In the first phase, one CRF model tags each component with a semantic label (attribute-name, operator, operand, or other); in the second phase, another CRF model creates groups of related components. Experiments show that our approach yields high accuracy.
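
The two-phase idea can be illustrated with a minimal sketch. It is not the paper's implementation: the component representation, the feature scores, and the B/I (begin/inside) group encoding are all assumptions made for illustration, and the hand-set scores stand in for weights that a trained CRF would learn. Only the decoding step (Viterbi over a linear chain) is shown.

```python
from typing import Callable, List, Sequence

def viterbi(obs: Sequence, states: List[str],
            emit: Callable, trans: Callable) -> List[str]:
    """Return the highest-scoring state sequence for a linear-chain model.
    emit(state, observation) and trans(prev_state, state) are additive scores."""
    # best[i][s] = (score of best path ending in state s at position i, backpointer)
    best = [{s: (emit(s, obs[0]), None) for s in states}]
    for o in obs[1:]:
        prev = best[-1]
        col = {}
        for s in states:
            p = max(states, key=lambda q: prev[q][0] + trans(q, s))
            col[s] = (prev[p][0] + trans(p, s) + emit(s, o), p)
        best.append(col)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for col in reversed(best[1:]):
        path.append(col[path[-1]][1])
    return path[::-1]

# ---- Phase 1: label each interface component (hypothetical scores) ----
PHASE1_LABELS = ["attribute-name", "operator", "operand", "other"]
OPERATOR_WORDS = {"contains", "equals", "starts with"}  # assumed vocabulary

def emit1(label, comp):
    kind, text = comp
    if kind == "text":
        want = "attribute-name"
    elif kind == "select" and text in OPERATOR_WORDS:
        want = "operator"
    elif kind in ("select", "textbox"):
        want = "operand"
    else:  # buttons and anything unrecognised
        want = "other"
    return 2.0 if label == want else 0.0

def trans1(prev, cur):
    # Reward the usual left-to-right order within one query condition.
    good = {("attribute-name", "operator"),
            ("attribute-name", "operand"),
            ("operator", "operand")}
    return 1.0 if (prev, cur) in good else 0.0

# ---- Phase 2: group components using the phase-1 labels as observations ----
PHASE2_TAGS = ["B", "I"]  # B = begins a new group, I = continues the group

def emit2(tag, label):
    want = "I" if label in ("operator", "operand") else "B"
    return 2.0 if tag == want else 0.0

def trans2(prev, cur):
    return 0.5 if (prev, cur) == ("B", "I") else 0.0

def build_groups(comps, tags):
    groups = []
    for comp, tag in zip(comps, tags):
        if tag == "B" or not groups:
            groups.append([comp])
        else:
            groups[-1].append(comp)
    return groups

# A toy interface: (component kind, visible text); purely illustrative.
components = [("text", "Title"), ("select", "contains"), ("textbox", ""),
              ("text", "Author"), ("textbox", ""), ("button", "Search")]

labels = viterbi(components, PHASE1_LABELS, emit1, trans1)
tags = viterbi(labels, PHASE2_TAGS, emit2, trans2)
groups = build_groups(components, tags)
```

On this toy input the first pass labels the components, and the second pass splits them into a "Title" condition, an "Author" condition, and a standalone button. Feeding the phase-1 labels to the second model as observations mirrors the split between labeling and grouping described in the abstract; how the paper actually couples the two models is not stated here.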

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dong, Y., Zhao, X., Zhang, G. (2011). A Study on Using Two-Phase Conditional Random Fields for Query Interface Segmentation. In: Gong, Z., Luo, X., Chen, J., Lei, J., Wang, F.L. (eds) Web Information Systems and Mining. WISM 2011. Lecture Notes in Computer Science, vol 6988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23982-3_45

  • DOI: https://doi.org/10.1007/978-3-642-23982-3_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23981-6

  • Online ISBN: 978-3-642-23982-3

  • eBook Packages: Computer Science, Computer Science (R0)
