Low-Cost Supervision for Multiple-Source Attribute Extraction

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5449)

Abstract

Previous studies on extracting class attributes from unstructured text consider either Web documents or query logs as the source of textual data. Web search queries have been shown to yield attributes of higher quality. However, since many relevant attributes found in Web documents occur infrequently in query logs, Web documents remain an important source for extraction. In this paper, we introduce Bootstrapped Web Search (BWS) extraction, the first approach to extracting class attributes simultaneously from both sources. Extraction is guided by a small set of seed attributes and does not rely on further domain-specific knowledge. BWS is shown to improve both extraction precision and attribute relevance across 40 test classes.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Reisinger, J., Paşca, M. (2009). Low-Cost Supervision for Multiple-Source Attribute Extraction. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2009. Lecture Notes in Computer Science, vol 5449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00382-0_31

  • DOI: https://doi.org/10.1007/978-3-642-00382-0_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00381-3

  • Online ISBN: 978-3-642-00382-0

  • eBook Packages: Computer Science, Computer Science (R0)
