AQA: Automatic Question Answering System for Czech

Medved’, Marek; Horák, Aleš

doi:10.1007/978-3-319-45510-5_31

Conference paper
First Online: 03 September 2016

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9924)

Abstract

Question answering (QA) systems have become popular; however, the majority of them concentrate on English, and most are oriented to a specific, limited problem domain.

In this paper, we present a new question answering system called AQA (Automatic Question Answering). AQA is an open-domain QA system which allows users to ask all common questions related to a selected text collection. The first version of the AQA system is developed and tested for the Czech language, but we also plan to include more languages in future versions.

The AQA strategy consists of three main parts: question processing, answer selection, and answer extraction. All modules are syntax-based, with advanced scoring obtained by a combination of TF-IDF, tree distance between the question and candidate answers, and other selected criteria. The answer extraction module uses a named entity recognizer, which allows the system to catch the entities that are most likely to answer the question.
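
To make the TF-IDF part of the scoring concrete, the following is a minimal sketch of how candidate answer sentences could be ranked against a question. It is not the AQA implementation: the function names (`tokenize`, `rank_candidates`), the whitespace tokenization standing in for lemmas, the smoothed IDF, and the cosine similarity are all assumptions made for illustration, and the tree distance and other criteria mentioned above are left out.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercased whitespace tokens stand in for lemmas.
    return text.lower().split()


def rank_candidates(question, candidates):
    """Return (score, candidate) pairs sorted by descending TF-IDF cosine."""
    docs = [tokenize(c) for c in candidates]
    n_docs = len(docs)
    # Document frequency: in how many candidates each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))

    def idf(term):
        # Smoothed IDF so that terms unseen in the candidates stay finite.
        return math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0

    def vector(tokens):
        counts = Counter(tokens)
        return {t: (c / len(tokens)) * idf(t) for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q_vec = vector(tokenize(question))
    scored = [(cosine(q_vec, vector(doc)), cand)
              for doc, cand in zip(docs, candidates)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    sentences = [
        "brno is the second largest city",
        "prague is the capital of the czech republic",
        "the river flows north",
    ]
    for score, sentence in rank_candidates(
            "what is the capital of the czech republic", sentences):
        print(f"{score:.3f}  {sentence}")
```

In a combined scorer, this similarity would be only one feature; a syntax-based system as described above would presumably weigh it together with the tree distance before passing the best sentence to answer extraction.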

Evaluation of the AQA system is performed on a previously published Simple Question-Answering Database, or SQAD, with more than 3,000 question-answer pairs.

Notes

1. One text token per line with multiple attributes separated by tabs.
2. In the current version, we use the morphological analyser Majka [5, 11] disambiguated by the DESAMB [7] tagger.
3. TF-IDF stands for Term Frequency-Inverse Document Frequency.
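
The token-per-line format described in note 1 can be read with a few lines of code. The sketch below is only an illustration: the note does not say which attributes are stored or in what order, so the three columns (`word`, `lemma`, `tag`), the `Token` type, the `parse_vertical` function, and the placeholder tag values are all assumptions.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    # Hypothetical attribute set; the real column layout is not given in the source.
    word: str
    lemma: str
    tag: str


def parse_vertical(text: str) -> List[Token]:
    """Parse text with one token per line and tab-separated attributes."""
    tokens = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"expected 3 tab-separated attributes, got {len(fields)}: {line!r}")
        tokens.append(Token(*fields))
    return tokens


if __name__ == "__main__":
    # Placeholder tags, not values from the Czech morphological tagset.
    sample = "Kdo\tkdo\tPRON\nnapsal\tnapsat\tVERB\n"
    for token in parse_vertical(sample):
        print(token.word, token.lemma, token.tag)
```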

References

1. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.G.: DBpedia: a nucleus for a web of open data. In: Aberer, K., et al. (eds.) ASWC 2007 and ISWC 2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007)
2. Fader, A., Zettlemoyer, L., Etzioni, O.: Open question answering over curated and extracted knowledge bases. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1156–1165. ACM (2014)
3. Finkel, J.R., Grenager, T., Manning, C.: Incorporating non-local information into information extraction systems by Gibbs sampling. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, ACL 2005, pp. 363–370. Association for Computational Linguistics, Stroudsburg (2005). http://dx.doi.org/10.3115/1219840.1219885
4. Horák, A., Medved’, M.: SQAD: simple question answering database. In: Eighth Workshop on Recent Advances in Slavonic Natural Language Processing, pp. 121–128. Tribun EU, Brno (2014)
5. Jakubíček, M., Kovář, V., Šmerk, P.: Czech morphological tagset revisited. In: Proceedings of Recent Advances in Slavonic Natural Language Processing 2011, pp. 29–42 (2011)
6. Kovář, V., Horák, A., Jakubíček, M.: Syntactic analysis using finite patterns: a new parsing system for Czech. In: Vetulani, Z. (ed.) LTC 2009. LNCS, vol. 6562, pp. 161–171. Springer, Heidelberg (2011)
7. Šmerk, P.: Towards morphological disambiguation of Czech (2007)
8. Ševčíková, M., Žabokrtský, Z., Straková, J., Straka, M.: Czech named entity corpus 1.1 (2014). http://hdl.handle.net/11858/00-097C-0000-0023-1B04-C, LINDAT/CLARIN digital library at Institute of Formal and Applied Linguistics, Charles University in Prague
9. Shtok, A., Dror, G., Maarek, Y., Szpektor, I.: Learning from the past: answering new questions with past answers. In: Proceedings of the 21st International Conference on World Wide Web, pp. 759–768. ACM (2012)
10. Yih, W.T., He, X., Meek, C.: Semantic parsing for single-relation question answering. In: Proceedings of ACL 2014, vol. 2, pp. 643–648. Citeseer (2014)
11. Šmerk, P.: Fast morphological analysis of Czech. In: Proceedings of the RASLAN Workshop 2009, Brno (2009)

Acknowledgments

This work has been partly supported by the Grant Agency of CR within the project 15-13277S. The research leading to these results has received funding from the Norwegian Financial Mechanism 2009–2014 and the Ministry of Education, Youth and Sports under Project Contract no. MSMT-28477/2014 within the HaBiT Project 7F14047.

Author information

Authors and Affiliations

Faculty of Informatics, Natural Language Processing Centre, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Marek Medved’ & Aleš Horák

Corresponding author

Correspondence to Marek Medved’.

Editor information

Editors and Affiliations

Masaryk University, Brno, Czech Republic
Petr Sojka
Masaryk University, Brno, Czech Republic
Aleš Horák
Masaryk University, Brno, Czech Republic
Ivan Kopeček
Masaryk University, Brno, Czech Republic
Karel Pala

About this paper

Cite this paper

Medved’, M., Horák, A. (2016). AQA: Automatic Question Answering System for Czech. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science, vol 9924. Springer, Cham. https://doi.org/10.1007/978-3-319-45510-5_31

DOI: https://doi.org/10.1007/978-3-319-45510-5_31
Published: 03 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45509-9
Online ISBN: 978-3-319-45510-5
eBook Packages: Computer Science, Computer Science (R0)
