Making Structured Data Searchable via Natural Language Generation

with an Application to ESG Data

  • Conference paper
Flexible Query Answering Systems (FQAS 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8132)

Abstract

Relational databases are used to store structured data, which is typically accessed using report builders based on SQL queries. Searching such data requires users to understand and fill out forms, which imposes a high cognitive load. Owing to the success of Web search engines, users have become accustomed to the easier mechanism of natural language search for accessing unstructured data. However, such keyword-based search methods are not easily applicable to structured data, especially where structured records contain non-textual content such as numbers.

We present a method to make structured data, including numeric data, searchable with a Web search engine-like keyword search access mechanism. Our method is based on the creation of surrogate text documents using Natural Language Generation (NLG) methods that can then be retrieved by off-the-shelf search methods.
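The abstract does not spell out the generation templates or the search engine used. As a minimal sketch of the general idea, assuming one hand-written template per record type, a coarse verbal rendering of numbers, and a toy TF-IDF inverted index standing in for an off-the-shelf engine, surrogate documents could be generated and searched as follows. The records, field names and functions (`verbalize_number`, `generate_surrogate`, `SurrogateIndex`) are hypothetical and are not the authors' implementation.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical ESG-style records; field names and values are invented for illustration.
RECORDS = [
    {"id": "r1", "company": "Acme Corp", "year": 2011,
     "indicator": "CO2 emissions", "value": 1_200_000, "unit": "tonnes"},
    {"id": "r2", "company": "Globex", "year": 2011,
     "indicator": "water use", "value": 350, "unit": "megalitres"},
    {"id": "r3", "company": "Acme Corp", "year": 2012,
     "indicator": "water use", "value": 410, "unit": "megalitres"},
]


def verbalize_number(x: float) -> str:
    """Render a number in a word-searchable form (an assumed, very coarse scheme)."""
    if x >= 1_000_000:
        return f"{x / 1_000_000:g} million"
    if x >= 1_000:
        return f"{x / 1_000:g} thousand"
    return f"{x:g}"


def generate_surrogate(rec: dict) -> str:
    """Realize one record as an English sentence using a single fixed template."""
    return (f"In {rec['year']}, {rec['company']} reported {rec['indicator']} "
            f"of {verbalize_number(rec['value'])} {rec['unit']}.")


def tokenize(text: str) -> list[str]:
    """Lowercase, then split into words (letters plus optional digits) or decimal numbers."""
    return re.findall(r"[a-z]+\d*|\d+(?:\.\d+)?", text.lower())


class SurrogateIndex:
    """Toy inverted index with TF-IDF scoring, standing in for an off-the-shelf engine."""

    def __init__(self, docs: dict[str, str]):
        self.n_docs = len(docs)
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)  # term -> {doc_id: tf}
        for doc_id, text in docs.items():
            for term, tf in Counter(tokenize(text)).items():
                self.postings[term][doc_id] = tf

    def search(self, query: str, k: int = 5) -> list[tuple[str, float]]:
        scores: dict[str, float] = defaultdict(float)
        for term in set(tokenize(query)):
            posting = self.postings.get(term)
            if not posting:
                continue  # term does not occur in any surrogate document
            idf = math.log(1 + self.n_docs / len(posting))
            for doc_id, tf in posting.items():
                scores[doc_id] += tf * idf
        # Highest score first; ties broken by document id for determinism.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


surrogates = {r["id"]: generate_surrogate(r) for r in RECORDS}
index = SurrogateIndex(surrogates)
print(surrogates["r1"])  # In 2011, Acme Corp reported CO2 emissions of 1.2 million tonnes.
print(index.search("acme water use"))
```

In this sketch, turning each record into ordinary text makes numeric values such as 1,200,000 matchable as tokens ("1.2", "million"), so a plain keyword query can reach them; a real deployment would hand the surrogate documents to an existing search library instead of the toy index.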

We demonstrate the effectiveness of this method by applying it, in a federated scenario, to two databases of real-life size: a proprietary database of corporate Environmental, Social and Governance (ESG) data and a public-domain environmental pollution database. Our evaluation, which includes measurements of speed and index size, indicates that the method is effective (P@1 = 84%, P@5 = 92%) and practical.
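The abstract reports P@1 = 84% and P@5 = 92% but does not define the measure. Since P@5 exceeds P@1, one plausible reading, which is an assumption here, is the fraction of queries for which a relevant record appears among the top k results (sometimes called success@k), rather than standard precision at k. The sketch below computes both, averaged over queries, on invented rankings; the function names are hypothetical.

```python
from typing import Callable

Run = tuple[list[str], set[str]]  # (ranked document ids, set of relevant ids) for one query


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Standard P@k: fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def success_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """1.0 if at least one relevant result appears in the top k, else 0.0."""
    return 1.0 if any(d in relevant for d in ranked[:k]) else 0.0


def mean_over_queries(metric: Callable[[list[str], set[str], int], float],
                      runs: list[Run], k: int) -> float:
    """Average a per-query metric over all queries."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Two invented queries, each with exactly one relevant record.
runs: list[Run] = [
    (["a", "b", "c", "d", "e"], {"a"}),  # relevant record ranked first
    (["x", "y", "z", "w", "v"], {"z"}),  # relevant record ranked third
]
print(mean_over_queries(precision_at_k, runs, 1))  # 0.5
print(mean_over_queries(success_at_k, runs, 5))    # 1.0
print(mean_over_queries(precision_at_k, runs, 5))  # 0.2
```

With a single relevant record per query, standard P@5 can never exceed 0.2, which is why the choice of definition matters when interpreting reported figures.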

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Leidner, J.L., Kamkova, D. (2013). Making Structured Data Searchable via Natural Language Generation. In: Larsen, H.L., Martin-Bautista, M.J., Vila, M.A., Andreasen, T., Christiansen, H. (eds) Flexible Query Answering Systems. FQAS 2013. Lecture Notes in Computer Science, vol 8132. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40769-7_43

  • DOI: https://doi.org/10.1007/978-3-642-40769-7_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40768-0

  • Online ISBN: 978-3-642-40769-7

  • eBook Packages: Computer Science, Computer Science (R0)