AQA: Automatic Question Answering System for Czech

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9924)

Abstract

Question answering (QA) systems have become popular; however, the majority of them concentrate on the English language, and most are oriented to a specific, limited problem domain.

In this paper, we present a new question answering system called AQA (Automatic Question Answering). AQA is an open-domain QA system which allows users to ask all common questions related to a selected text collection. The first version of the AQA system is developed and tested for the Czech language, but we also plan to include more languages in future versions.

The AQA strategy consists of three main parts: question processing, answer selection, and answer extraction. All modules are syntax-based, with advanced scoring obtained by combining TF-IDF, the tree distance between the question and candidate answers, and other selected criteria. The answer extraction module utilizes a named entity recognizer, which allows the system to catch the entities that are most likely to answer the question.
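To illustrate the kind of scoring described above, the following is a minimal sketch of ranking candidate answer sentences by a weighted combination of TF-IDF similarity and one additional score. It is not the AQA implementation: the function names, the `extra_scores` parameter (a stand-in for a tree-distance or similar criterion computed elsewhere), the smoothed IDF variant, and the weight of 0.7 are all assumptions made for illustration. Tokenization is a naive whitespace split; a system for Czech would presumably work with lemmas from a morphological analyser instead.

```python
import math
from collections import Counter

PUNCT = ".,?!;:"


def tokenize(text):
    # Naive tokenization (assumption): lowercase, split on whitespace,
    # strip surrounding punctuation. Not a substitute for lemmatization.
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]


def idf_table(documents):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    # Terms unseen in the candidate collection are ignored.
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(question, sentences, extra_scores=None, tfidf_weight=0.7):
    # extra_scores is hypothetical: one value in [0, 1] per sentence,
    # e.g. a normalized tree-similarity score supplied by another module.
    docs = [tokenize(s) for s in sentences]
    idf = idf_table(docs)
    q_vec = tfidf_vector(tokenize(question), idf)
    if extra_scores is None:
        extra_scores = [0.0] * len(sentences)
    scored = []
    for sentence, doc, extra in zip(sentences, docs, extra_scores):
        tfidf_score = cosine(q_vec, tfidf_vector(doc, idf))
        combined = tfidf_weight * tfidf_score + (1.0 - tfidf_weight) * extra
        scored.append((combined, sentence))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    candidates = [
        "Brno is the second largest city in the Czech Republic.",
        "Prague is the capital of the Czech Republic.",
        "The Vltava is the longest river in the country.",
    ]
    for score, sentence in rank_candidates(
        "What is the capital of the Czech Republic?", candidates
    ):
        print(f"{score:.3f}  {sentence}")
```

A linear combination is used here only because it is the simplest way to merge several criteria into one ranking; how AQA actually weights its criteria is not specified in this abstract.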

Evaluation of the AQA system is performed on a previously published Simple Question-Answering Database, or SQAD, with more than 3,000 question-answer pairs.

Notes

  1. One text token per line with multiple attributes separated by tabs.

  2. In the current version, we use the morphological analyser Majka [5, 11] disambiguated by the DESAMB [7] tagger.

  3. TF-IDF stands for Term Frequency-Inverse Document Frequency.

References

  1. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.G.: DBpedia: a nucleus for a web of open data. In: Aberer, K., et al. (eds.) ASWC 2007 and ISWC 2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007)

  2. Fader, A., Zettlemoyer, L., Etzioni, O.: Open question answering over curated and extracted knowledge bases. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1156–1165. ACM (2014)

  3. Finkel, J.R., Grenager, T., Manning, C.: Incorporating non-local information into information extraction systems by Gibbs sampling. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, ACL 2005, pp. 363–370. Association for Computational Linguistics, Stroudsburg (2005). http://dx.doi.org/10.3115/1219840.1219885

  4. Horák, A., Medved’, M.: SQAD: simple question answering database. In: Eighth Workshop on Recent Advances in Slavonic Natural Language Processing, pp. 121–128. Tribun EU, Brno (2014)

  5. Jakubíček, M., Kovář, V., Šmerk, P.: Czech morphological tagset revisited. In: Proceedings of Recent Advances in Slavonic Natural Language Processing 2011, pp. 29–42 (2011)

  6. Kovář, V., Horák, A., Jakubíček, M.: Syntactic analysis using finite patterns: a new parsing system for Czech. In: Vetulani, Z. (ed.) LTC 2009. LNCS, vol. 6562, pp. 161–171. Springer, Heidelberg (2011)

  7. Šmerk, P.: Towards morphological disambiguation of Czech (2007)

  8. Ševčíková, M., Žabokrtský, Z., Straková, J., Straka, M.: Czech named entity corpus 1.1 (2014). http://hdl.handle.net/11858/00-097C-0000-0023-1B04-C, LINDAT/CLARIN digital library at Institute of Formal and Applied Linguistics, Charles University in Prague

  9. Shtok, A., Dror, G., Maarek, Y., Szpektor, I.: Learning from the past: answering new questions with past answers. In: Proceedings of the 21st International Conference on World Wide Web, pp. 759–768. ACM (2012)

  10. Yih, W.T., He, X., Meek, C.: Semantic parsing for single-relation question answering. In: Proceedings of ACL 2014, vol. 2, pp. 643–648. Citeseer (2014)

  11. Šmerk, P.: Fast morphological analysis of Czech. In: Proceedings of the RASLAN Workshop 2009, Brno (2009)

Acknowledgments

This work has been partly supported by the Grant Agency of CR within the project 15-13277S. The research leading to these results has received funding from the Norwegian Financial Mechanism 2009–2014 and the Ministry of Education, Youth and Sports under Project Contract no. MSMT-28477/2014 within the HaBiT Project 7F14047.

Author information

Corresponding author

Correspondence to Marek Medved’.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Medved’, M., Horák, A. (2016). AQA: Automatic Question Answering System for Czech. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science, vol 9924. Springer, Cham. https://doi.org/10.1007/978-3-319-45510-5_31

  • DOI: https://doi.org/10.1007/978-3-319-45510-5_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45509-9

  • Online ISBN: 978-3-319-45510-5

  • eBook Packages: Computer Science, Computer Science (R0)
