Understanding Tables on the Web

  • Jingjing Wang
  • Haixun Wang
  • Zhongyuan Wang
  • Kenny Q. Zhu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7532)


The Web contains a wealth of information, and a key challenge is to make this information machine processable. In this paper, we study how to “understand” HTML tables on the Web, which goes one step beyond finding the schemas of tables. From 0.3 billion Web documents, we obtain 1.95 billion tables, of which 0.5–1% contain information about various entities and their properties. We argue that, for computers to understand these tables, they must first have a brain: a general-purpose knowledge taxonomy that is comprehensive enough to cover the concepts (of worldly facts) in a human mind. Second, we argue that understanding a table is the process of finding the right position for the table in the knowledge taxonomy. Once a table is associated with a concept in the knowledge taxonomy, it is automatically linked to all other tables associated with the same concept, as well as to tables associated with related concepts. In other words, computers understand the semantics of the tables through the interconnections of concepts in the knowledge base. In this paper, we illustrate a two-phase process. Our experimental results show that the approach is feasible and that it may benefit many useful applications such as Web search.
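The idea of placing a table in a taxonomy and then linking tables through shared concepts can be illustrated with a small sketch. This is not the paper's two-phase process, which the abstract does not detail. The toy taxonomy, its counts, the scoring rule (summing an estimated P(concept | value) over a column's values) and all function names below are assumptions made only for illustration.

```python
from collections import defaultdict

# Hypothetical toy isA taxonomy: concept -> {instance: co-occurrence count}.
# The concepts and counts are invented for this sketch.
TAXONOMY = {
    "country": {"china": 90, "india": 80, "brazil": 60},
    "emerging market": {"china": 30, "india": 25, "brazil": 20},
    "company": {"microsoft": 95, "apple": 70},
    "fruit": {"apple": 40, "banana": 50},
}


def concept_scores(values, taxonomy=TAXONOMY):
    """Score each concept by summing P(concept | value) over the column values."""
    scores = defaultdict(float)
    for value in values:
        key = value.strip().lower()
        # Total count of this value across all concepts (normalising constant).
        total = sum(instances.get(key, 0) for instances in taxonomy.values())
        if total == 0:
            continue  # value is unknown to the taxonomy
        for concept, instances in taxonomy.items():
            if key in instances:
                scores[concept] += instances[key] / total
    return dict(scores)


def best_concept(values, taxonomy=TAXONOMY):
    """Return the highest-scoring concept, or None if no value is known."""
    scores = concept_scores(values, taxonomy)
    return max(scores, key=scores.get) if scores else None


def link_tables(tables, taxonomy=TAXONOMY):
    """Group table names by the concept assigned to their entity column."""
    index = defaultdict(list)
    for name, column in tables.items():
        concept = best_concept(column, taxonomy)
        if concept is not None:
            index[concept].append(name)
    return dict(index)


# Hypothetical tables, each reduced to its entity column.
TABLES = {
    "gdp_by_country": ["China", "India", "Brazil"],
    "population": ["India", "China"],
    "tech_revenue": ["Microsoft", "Apple"],
}
LINKS = link_tables(TABLES)
```

In this sketch the ambiguous value "Apple" is resolved toward "company" because "Microsoft" appears in the same column, and the two tables assigned to "country" end up linked under that concept.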


Political Party, Semantic Search, Candidate Schema, Human Judge, Knowledge Taxonomy
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jingjing Wang (1)
  • Haixun Wang (2)
  • Zhongyuan Wang (2)
  • Kenny Q. Zhu (3)
  1. University of Washington, USA
  2. Microsoft Research Asia, China
  3. Shanghai Jiao Tong University, China
