Quora Question Answer Dataset

  • Ahmad Aghaebrahimian
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10415)


We report on work in progress on compiling the Quora Question Answer dataset. The dataset is composed of questions posed on the Quora question-answering site. It is the only dataset that provides both sentence-level and word-level answers at the same time. Moreover, the questions in the dataset are authentic, i.e., asked by real users, which makes the dataset a much more realistic benchmark for Question Answering systems. We test the performance of a state-of-the-art Question Answering system on the dataset and compare it with human performance, which establishes an upper bound.
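The abstract says each question carries both a sentence-level and a word-level answer, and that a system is scored against human performance. A minimal Python sketch of what such a record and a two-level evaluation might look like follows. The `QARecord` layout, its field names, and the choice of sentence-selection accuracy and normalised exact match as metrics are hypothetical assumptions for illustration only; they are not the dataset's actual schema or the paper's evaluation protocol, and the toy records are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QARecord:
    """Hypothetical record layout (field names are assumptions, not the real schema)."""
    question: str
    sentences: List[str]   # candidate sentences from the answer text
    answer_sentence: int   # index of the gold sentence (sentence-level answer)
    answer_words: str      # gold short answer span (word-level answer)


def sentence_accuracy(records: List[QARecord], predicted: List[int]) -> float:
    """Fraction of questions whose predicted sentence index equals the gold index."""
    hits = sum(r.answer_sentence == p for r, p in zip(records, predicted))
    return hits / len(records)


def exact_match(records: List[QARecord], predicted: List[str]) -> float:
    """Fraction of word-level predictions equal to the gold span after
    lower-casing and collapsing whitespace (an assumed normalisation)."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    hits = sum(norm(r.answer_words) == norm(p) for r, p in zip(records, predicted))
    return hits / len(records)


if __name__ == "__main__":
    # Toy, invented examples; not taken from the Quora dataset.
    data = [
        QARecord("Who wrote Hamlet?",
                 ["Hamlet is a tragedy.", "It was written by William Shakespeare."],
                 1, "William Shakespeare"),
        QARecord("What is the capital of France?",
                 ["Paris is the capital of France.", "France is in Europe."],
                 0, "Paris"),
    ]
    # Score a (hypothetical) system and human annotators with the same metric.
    system = exact_match(data, ["Shakespeare", "paris"])
    human = exact_match(data, ["William Shakespeare", "Paris"])
    print(f"system EM={system:.2f}, human EM={human:.2f}")
```

Running it prints `system EM=0.50, human EM=1.00`. Scoring human answers with the same functions as the system gives the reference point the abstract describes: the human score acts as a practical upper bound that system scores can be measured against.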


Keywords: Dataset; Question answering; Sentence-level answer; Word-level answer



This research was partially funded by the Ministry of Education, Youth and Sports of the Czech Republic under SVV project number 260 453, core research funding, and GAUK 207-10/250098 of Charles University in Prague.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Charles University in Prague, Praha 1, Czech Republic
