Portable Extraction of Partially Structured Facts from the Web

Salway, Andrew; Kelly, Liadh; Skadiņa, Inguna; Jones, Gareth J. F.

doi:10.1007/978-3-642-14770-8_38

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6233)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

A novel fact extraction task is defined to fill a gap between current information retrieval and information extraction technologies. It is shown that useful partially structured facts can be extracted about different kinds of entities in a broad domain, namely all kinds of places depicted in tourist images. Importantly, the approach does not rely on existing linguistic resources (gazetteers, taggers, parsers, etc.), and it was ported easily and cheaply between two rather different languages (English and Latvian). Previous fact extraction from the web has focused on structured data, e.g. (Building-LocatedIn-Town). In contrast, we extract richer and more interesting facts, such as a fact explaining why a building was built. Enough structure is maintained to facilitate subsequent processing of the information; for example, the partial structure enables straightforward template-based text generation. We report positive results for the correctness and interest of English and Latvian facts and for their utility in enhancing image captions.
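
To make the idea of template-based generation from a partially structured fact concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify how facts are represented, so the `PartialFact` fields (`entity`, `cue`, `remainder`), the `TEMPLATES` table, and the functions `render` and `enhance_caption` are all hypothetical names introduced here for illustration, and the example place and text are invented. The sketch assumes a fact consists of an entity, a fixed cue phrase that indicates the kind of fact, and a free-text remainder kept verbatim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialFact:
    """Hypothetical representation of a partially structured fact."""
    entity: str      # the place the fact is about
    cue: str         # fixed phrase indicating the kind of fact
    remainder: str   # unstructured text kept verbatim from the source


# Assumed templates, keyed by cue phrase (illustrative only).
TEMPLATES = {
    "was built to": "{entity} was built to {remainder}.",
    "is famous for": "{entity} is famous for {remainder}.",
}


def render(fact: PartialFact) -> str:
    """Fill the template for the fact's cue, with a generic fallback."""
    template = TEMPLATES.get(fact.cue, "{entity} {cue} {remainder}.")
    return template.format(
        entity=fact.entity,
        cue=fact.cue,
        remainder=fact.remainder.rstrip(". "),  # avoid a doubled full stop
    )


def enhance_caption(caption: str, facts: list[PartialFact]) -> str:
    """Append one generated sentence per fact to an image caption."""
    return " ".join([caption.rstrip()] + [render(f) for f in facts])


# Invented example data, not taken from the paper.
fact = PartialFact("Example Tower", "was built to", "mark the harbour entrance.")
print(enhance_caption("Example Tower at dusk.", [fact]))
```

Because only the entity and cue are structured, the templates stay simple while the free-text remainder carries the richer content, such as the reason a building was built.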

Author information

Authors and Affiliations

Centre for Digital Video Processing, School of Computing, Dublin City University, Dublin 9, Ireland
Andrew Salway, Liadh Kelly & Gareth J. F. Jones
Tilde, 75 Vienības Gatve, Rīga, 1004, Latvia
Inguna Skadiņa

Editor information

Editors and Affiliations

School of Computer Science, Reykjavik University, Kringlan 1, 103, Reykjavik, Iceland
Hrafn Loftsson
Department of Icelandic, University of Iceland, Árnagardur v/Sudurgötu, 101, Reykjavik, Iceland
Eiríkur Rögnvaldsson
Arni Magnusson Institute for Icelandic Studies, Neshagi 16, 101, Reykjavik, Iceland
Sigrún Helgadóttir

Cite this paper

Salway, A., Kelly, L., Skadiņa, I., Jones, G.J.F. (2010). Portable Extraction of Partially Structured Facts from the Web. In: Loftsson, H., Rögnvaldsson, E., Helgadóttir, S. (eds) Advances in Natural Language Processing. NLP 2010. Lecture Notes in Computer Science, vol 6233. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14770-8_38

DOI: https://doi.org/10.1007/978-3-642-14770-8_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14769-2
Online ISBN: 978-3-642-14770-8
eBook Packages: Computer Science, Computer Science (R0)