A Human-Machine Method for Web Table Understanding

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7923))

Abstract

Tabular data on the Web has become a rich source of structured data that ordinary users can explore. Because of this potential, Web tables have recently attracted a number of studies that aim to understand their semantics and to provide effective search and exploration mechanisms over them. Table understanding aims to identify, recognize, and interpret tabular structures in order to enable tasks such as data extraction, data interpretation, data integration, and search and analysis. In this paper, we propose a human-machine hybrid method for effectively understanding tables on the Web. We develop novel techniques to address four main problems in Web table understanding: Web table extraction, Web table interpretation, Web table integration, and Web table search and analysis. We also discuss some open problems in Web table understanding that need further investigation. We believe that Web table management will attract much more attention in the coming years.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, G. (2013). A Human-Machine Method for Web Table Understanding. In: Wang, J., Xiong, H., Ishikawa, Y., Xu, J., Zhou, J. (eds) Web-Age Information Management. WAIM 2013. Lecture Notes in Computer Science, vol 7923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38562-9_19

  • DOI: https://doi.org/10.1007/978-3-642-38562-9_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38561-2

  • Online ISBN: 978-3-642-38562-9

  • eBook Packages: Computer Science, Computer Science (R0)
