Making Structured Data Searchable via Natural Language Generation

Leidner, Jochen L.; Kamkova, Darya

doi:10.1007/978-3-642-40769-7_43


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8132)

Included in the following conference series:

International Conference on Flexible Query Answering Systems


Abstract

Relational databases store structured data, which is typically accessed through report builders based on SQL queries. To search, users must understand and fill out forms, which imposes a high cognitive load. Due to the success of Web search engines, users have become accustomed to the easier mechanism of natural language search for accessing unstructured data. However, such keyword-based search methods are not easily applicable to structured data, especially where structured records contain non-textual content such as numbers.

We present a method to make structured data, including numeric data, searchable with a Web search engine-like keyword search access mechanism. Our method is based on the creation of surrogate text documents using Natural Language Generation (NLG) methods that can then be retrieved by off-the-shelf search methods.
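The idea of surrogate documents can be illustrated with a minimal sketch. It is not the authors' implementation: the records, field names, sentence template, and the toy inverted index below are all assumptions made for illustration, and a real system would hand the generated text to an off-the-shelf search engine instead.

```python
import re
from collections import defaultdict

# Hypothetical records; companies, metrics, and values are invented for illustration.
RECORDS = [
    {"id": 1, "company": "Acme Corp", "metric": "CO2 emissions",
     "year": 2011, "value": 1200, "unit": "tonnes"},
    {"id": 2, "company": "Globex", "metric": "water use",
     "year": 2012, "value": 350, "unit": "megalitres"},
]

# Assumed sentence template; a simple stand-in for an NLG component.
TEMPLATE = "In {year}, {company} reported {metric} of {value} {unit}."


def generate_surrogate(record):
    """Render one structured record as an English sentence."""
    return TEMPLATE.format(**record)


def tokenize(text):
    """Lowercase and split into alphanumeric tokens, so numbers become terms too."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each token to the ids of records whose surrogate text contains it."""
    index = defaultdict(set)
    for record in records:
        for token in tokenize(generate_surrogate(record)):
            index[token].add(record["id"])
    return index


def search(index, query):
    """Rank record ids by the number of distinct query tokens they match."""
    scores = defaultdict(int)
    for token in set(tokenize(query)):
        for record_id in index.get(token, ()):
            scores[record_id] += 1
    return sorted(scores, key=lambda rid: (-scores[rid], rid))


INDEX = build_index(RECORDS)
```

With this sketch, a query such as `search(INDEX, "Acme emissions 2011")` matches the first record on both textual and numeric terms, because the number 2011 has become an ordinary token in the generated sentence.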

We demonstrate that this method is effective by applying it to two real-life-sized databases in a federated scenario: a proprietary database comprising corporate Environmental, Social and Governance (ESG) data and a public-domain environmental pollution database. Our evaluation includes speed and index size investigations and indicates the effectiveness (P@1 = 84%, P@5 = 92%) and practicality of the method.
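For readers unfamiliar with the P@k notation, the following sketch shows the textbook definition of precision at rank k, averaged over queries. The abstract does not state the exact variant used in the paper, so this is an assumption, and the rankings and relevance judgments below are invented toy data.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_k(runs, k):
    """Average precision at k over (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        raise ValueError("at least one query is required")
    return sum(precision_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)


# Invented toy data: two queries, each with a ranking and a set of relevant ids.
RUNS = [
    (["a", "b", "c", "d", "e"], {"a", "c"}),
    (["f", "g", "h", "i", "j"], {"g"}),
]
```

Calling `mean_precision_at_k(RUNS, 1)` and `mean_precision_at_k(RUNS, 5)` on the toy data gives the two figures that would be reported as P@1 and P@5.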



Author information

Authors and Affiliations

Catalyst Lab, Thomson Reuters Global Resources, Neuhofstrasse 1, CH-6340, Baar, Switzerland
Jochen L. Leidner & Darya Kamkova


Editor information

Editors and Affiliations

Department of Electronic Systems, Aalborg University, 6700, Esbjerg, Denmark
Henrik Legind Larsen
Department of Computer Science and Artificial Intelligence, University of Granada, 18071, Granada, Spain
Maria J. Martin-Bautista
Department of Computer Science and Artificial Intelligence, University of Granada, 18071, Granada, Spain
María Amparo Vila
CBIT, Roskilde University, Universitetsvej 1, 4000, Roskilde, Denmark
Troels Andreasen
CBIT, Roskilde University, 4000, Roskilde, Denmark
Henning Christiansen


About this paper

Cite this paper

Leidner, J.L., Kamkova, D. (2013). Making Structured Data Searchable via Natural Language Generation. In: Larsen, H.L., Martin-Bautista, M.J., Vila, M.A., Andreasen, T., Christiansen, H. (eds) Flexible Query Answering Systems. FQAS 2013. Lecture Notes in Computer Science, vol 8132. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40769-7_43


DOI: https://doi.org/10.1007/978-3-642-40769-7_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40768-0
Online ISBN: 978-3-642-40769-7
eBook Packages: Computer Science, Computer Science (R0)
