Understanding Tables on the Web

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7532)

Abstract

The Web contains a wealth of information, and a key challenge is to make this information machine-processable. In this paper, we study how to “understand” HTML tables on the Web, which goes one step beyond finding the schemas of tables. From 0.3 billion Web documents, we obtain 1.95 billion tables, of which 0.5-1% contain information about various entities and their properties. We argue that for computers to understand these tables, they must first have a brain: a general-purpose knowledge taxonomy comprehensive enough to cover the concepts (of worldly facts) in a human mind. Second, we argue that understanding a table is the process of finding the right position for the table in the knowledge taxonomy. Once a table is associated with a concept in the knowledge taxonomy, it is automatically linked to all other tables associated with the same concept, as well as to tables associated with related concepts. In other words, computers understand the semantics of the tables through the interconnections of concepts in the knowledge base. We illustrate a two-phase process. Our experimental results show that the approach is feasible and that it may benefit many useful applications such as web search.
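As a rough illustration of what "finding the right position for the table in the knowledge taxonomy" could look like, the sketch below assigns a concept to a table's entity column by summing per-entity concept scores, then groups tables that land on the same concept. This is not the paper's algorithm: the `TAXONOMY` dictionary, its scores, the table names, and the helper functions `best_concept` and `link_tables` are all hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: entity -> {concept: score}.
# All entries and numbers are made up for illustration only.
TAXONOMY = {
    "china": {"country": 0.9, "porcelain": 0.1},
    "india": {"country": 1.0},
    "brazil": {"country": 0.8, "wood": 0.2},
    "apple": {"company": 0.6, "fruit": 0.4},
    "microsoft": {"company": 1.0},
    "ibm": {"company": 1.0},
}


def best_concept(cells):
    """Return the concept with the highest summed score over the cells."""
    scores = defaultdict(float)
    for cell in cells:
        # Unknown entities contribute nothing to any concept.
        for concept, score in TAXONOMY.get(cell.strip().lower(), {}).items():
            scores[concept] += score
    if not scores:
        return None
    return max(scores, key=scores.get)


def link_tables(tables):
    """Group table names by the concept assigned to their entity column."""
    by_concept = defaultdict(list)
    for name, cells in tables.items():
        concept = best_concept(cells)
        if concept is not None:
            by_concept[concept].append(name)
    return dict(by_concept)


# Hypothetical tables, each reduced to its entity column.
tables = {
    "gdp": ["China", "India", "Brazil"],
    "population": ["India", "China"],
    "revenue": ["Apple", "Microsoft", "IBM"],
}
links = link_tables(tables)
```

In this toy setup, the ambiguous cell "Apple" resolves to the company sense because the other cells in the same column support it, and the two country tables become linked through their shared concept.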


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, J., Wang, H., Wang, Z., Zhu, K.Q. (2012). Understanding Tables on the Web. In: Atzeni, P., Cheung, D., Ram, S. (eds) Conceptual Modeling. ER 2012. Lecture Notes in Computer Science, vol 7532. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34002-4_11

  • DOI: https://doi.org/10.1007/978-3-642-34002-4_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34001-7

  • Online ISBN: 978-3-642-34002-4

  • eBook Packages: Computer Science, Computer Science (R0)
