
Attribute Retrieval from Relational Web Tables

  • Arlind Kopliku
  • Karen Pinel-Sauvagnat
  • Mohand Boughanem
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7024)

Abstract

In this paper, we propose an attribute retrieval approach that extracts and ranks attributes from HTML tables. Given an instance (e.g., Tower of Pisa), we want to retrieve its attributes (e.g., height, architect) from the Web. Our approach uses HTML tables, which are probably the largest source for attribute retrieval. Three recall-oriented filters are applied to each table to check three properties: (i) whether the table is relational, (ii) whether the table has a header, and (iii) whether its attributes and values are in conformity. Candidate attributes are extracted from the tables and ranked with a combination of relevance features. Our approach can be applied to all instances and is shown to have high recall and reasonable precision. Moreover, it outperforms state-of-the-art techniques.
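The pipeline outlined in the abstract (filter tables, extract candidate attributes, rank them) can be illustrated with a minimal Python sketch. The abstract does not specify how the filters or the ranking are implemented, so everything here is an assumption made for illustration: the function names, the heuristics inside each filter, the single frequency feature, and the weighted-sum combination are hypothetical and are not the authors' method. Tables are represented as lists of rows, with the first row treated as the candidate header.

```python
from collections import Counter


def _is_number(cell):
    """Return True if the cell parses as a number (commas ignored)."""
    try:
        float(cell.replace(",", ""))
        return True
    except ValueError:
        return False


def is_relational(rows):
    # Hypothetical filter (i): at least two rows, two columns, equal row widths.
    return (len(rows) >= 2 and len(rows[0]) >= 2
            and all(len(r) == len(rows[0]) for r in rows))


def has_header(rows):
    # Hypothetical filter (ii): every first-row cell is non-empty, non-numeric text.
    return all(c.strip() and not _is_number(c) for c in rows[0])


def conforms(name, values):
    # Hypothetical filter (iii): a short attribute name with at least one value.
    return len(name.split()) <= 4 and any(v.strip() for v in values)


def extract_candidates(tables):
    """Count, per attribute name, the tables in which it passes all filters."""
    counts = Counter()
    for rows in tables:
        if not (is_relational(rows) and has_header(rows)):
            continue
        for col, name in enumerate(rows[0]):
            values = [r[col] for r in rows[1:]]
            if conforms(name, values):
                counts[name.strip().lower()] += 1
    return counts


def rank(features, weights):
    """Rank attributes by a weighted sum of features (an assumed combination)."""
    def score(attr):
        return sum(weights.get(f, 0.0) * v for f, v in features[attr].items())
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(features, key=lambda a: (-score(a), a))


# Toy input with placeholder cell values, not real data.
tables = [
    [["Name", "Height", "Architect"], ["Tower of Pisa", "h1", "a1"]],
    [["Name", "Height"], ["Tower of Pisa", "h1"], ["Another tower", "h2"]],
    [["1", "2"], ["3", "4"]],  # rejected: the first row is numeric, so no header
]
counts = extract_candidates(tables)
features = {attr: {"freq": float(n)} for attr, n in counts.items()}
ranking = rank(features, {"freq": 1.0})
```

In this toy run, the attributes that survive the filters in more tables rank higher. A fuller system would presumably combine several relevance features rather than frequency alone, but which features to use is not stated in the abstract.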

Keywords

information retrieval, attribute retrieval

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Arlind Kopliku (1)
  • Karen Pinel-Sauvagnat (1)
  • Mohand Boughanem (1)
  1. IRIT, University of Toulouse, France
