Abstract
Tabular data on the Web has become a rich source of structured data that is useful for ordinary users to explore. Due to its potential, tables on the Web have recently attracted a number of studies with the goals of understanding the semantics of those Web tables and providing effective search and exploration mechanisms over them. Table understanding is to identify, recognize and interpret tabular structures to enable a variety of tasks such as data extraction, data interpretation, data integration, and search and analysis. In this paper, we propose a human-machine hybrid method for effectively understanding tables on the Web. We develop novel techniques to address four main problems in Web table understanding: Web table extraction, Web table interpretation, Web table integration, and Web table search and analysis. We also discuss some open problems that need more research investigation in Web table understanding. We believe that Web table management will attract much more attention in the coming years.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Bollacker, K.D., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: A collaboratively created graph database for structuring human knowledge. In: SIGMOD, pp. 1247–1250 (2008)
Cafarella, M.J., Halevy, A.Y., Wang, D.Z., Wu, E., Zhang, Y.: Webtables: exploring the power of tables on the web. VLDB 1(1), 538–549 (2008)
Deng, D., Li, G., Feng, J.: An efficient trie-based method for approximate entity extraction with edit-distance constraints. In: ICDE, pp. 762–773 (2012)
Deng, D., Li, G., Feng, J.: Top-k string similarity search with edit-distance constraints. In: ICDE (2013)
Elmeleegy, H., Madhavan, J., Halevy, A.Y.: Harvesting relational tables from lists on the web. VLDB 2(1), 1078–1089 (2009)
Elmeleegy, H., Madhavan, J., Halevy, A.Y.: Harvesting relational tables from lists on the web. VLDB J. 20(2), 209–226 (2011)
Fan, J., Li, G., Zhou, L.: Interactive sql query suggestion: Making databases user-friendly. In: ICDE, pp. 351–362 (2011)
Feng, J., Li, G.: Efficient fuzzy type-ahead search in xml data. IEEE Trans. Knowl. Data Eng. 24(5), 882–895 (2012)
Franklin, M.J., Kossmann, D., Kraska, T., Ramesh, S., Xin, R.: Crowddb: Answering queries with crowdsourcing. In: SIGMOD Conference, pp. 61–72 (2011)
Gonzalez, H., Halevy, A.Y., Jensen, C.S., Langen, A., Madhavan, J., Shapley, R., Shen, W., Goldberg-Kidon, J.: Google fusion tables: Web-centered data management and collaboration. In: SIGMOD Conference, pp. 1061–1066 (2010)
Ji, S., Li, G., Li, C., Feng, J.: Efficient interactive fuzzy keyword search. In: WWW, pp. 371–380 (2009)
Li, G., Deng, D., Feng, J.: Faerie: Efficient filtering algorithms for approximate dictionary-based entity extraction. In: SIGMOD Conference, pp. 529–540 (2011)
Li, G., Deng, D., Wang, J., Feng, J.: Pass-join: A partition-based method for similarity joins. VLDB 5(3), 253–264 (2011)
Li, G., Fan, J., Wu, H., Wang, J., Feng, J.: Dbease: Making databases user-friendly and easily accessible. In: CIDR, pp. 45–56 (2011)
Li, G., Ji, S., Li, C., Feng, J.: Efficient type-ahead search on relational data: a tastier approach. In: SIGMOD Conference, pp. 695–706 (2009)
Li, G., Ji, S., Li, C., Feng, J.: Efficient fuzzy full-text type-ahead search. VLDB J. 20(4), 617–640 (2011)
Li, G., Ooi, B.C., Feng, J., Wang, J., Zhou, L.: Ease: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data. In: SIGMOD Conference, pp. 903–914 (2008)
Li, G., Wang, J., Li, C., Feng, J.: Supporting efficient top-k queries in type-ahead search. In: SIGIR, pp. 355–364 (2012)
Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. VLDB 3(1), 1338–1347 (2010)
Liu, X., Lu, M., Ooi, B.C., Shen, Y., Wu, S., Zhang, M.: Cdas: A crowdsourcing data analytics system. VLDB 5(10), 1040–1051 (2012)
Marcus, A., Wu, E., Karger, D.R., Madden, S., Miller, R.C.: Human-powered sorts and joins. VLDB 5(1), 13–24 (2011)
Parameswaran, A.G., Garcia-Molina, H., Park, H., Polyzotis, N., Ramesh, A., Widom, J.: Crowdscreen: algorithms for filtering data with humans. In: SIGMOD Conference, pp. 361–372 (2012)
Pimplikar, R., Sarawagi, S.: Answering table queries on the web using column keywords. VLDB 5(10), 908–919 (2012)
Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: A core of semantic knowledge. In: WWW, pp. 697–706 (2007)
Venetis, P., Halevy, A.Y., Madhavan, J., Pasca, M., Shen, W., Wu, F., Miao, G., Wu, C.: Recovering semantics of tables on the web. VLDB 4(9), 528–538 (2011)
Wang, J., Li, G., Feng, J.: Trie-join: Efficient trie-based string similarity joins with edit-distance constraints. PVLDB 3(1), 1219–1230 (2010)
Wang, J., Li, G., Feng, J.: Fast-join: An efficient method for fuzzy token matching based string similarity join. In: ICDE, pp. 458–469 (2011)
Wang, J., Li, G., Kraska, T., Franklin, M.J., Feng, J.: Leveraging transitive relations for crowdsourced joins. In: SIGMOD (2013)
Wu, W., Li, H., Wang, H., Zhu, K.Q.: Probase: a probabilistic taxonomy for text understanding. In: SIGMOD Conference, pp. 481–492 (2012)
Yakout, M., Ganjam, K., Chakrabarti, K., Chaudhuri, S.: Infogather: entity augmentation and attribute discovery by holistic matching with web tables. In: SIGMOD Conference, pp. 97–108 (2012)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, G. (2013). A Human-Machine Method for Web Table Understanding. In: Wang, J., Xiong, H., Ishikawa, Y., Xu, J., Zhou, J. (eds) Web-Age Information Management. WAIM 2013. Lecture Notes in Computer Science, vol 7923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38562-9_19
Download citation
DOI: https://doi.org/10.1007/978-3-642-38562-9_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38561-2
Online ISBN: 978-3-642-38562-9
eBook Packages: Computer ScienceComputer Science (R0)