The First Conversational Intelligence Challenge

  • Conference paper
The NIPS '17 Competition: Building Intelligent Systems

Abstract

The first Conversational Intelligence Challenge was conducted over 2017, with finals held at the NIPS conference. The challenge was aimed at evaluating the state of the art in non-goal-driven dialogue systems (chatbots) and at collecting a large dataset of human-to-machine and human-to-human conversations manually labelled for quality. We established a task for formal human evaluation of chatbots that tests a chatbot's capabilities in topic-oriented dialogue. Instead of traditional chit-chat, participating systems and humans were given the task of discussing a short text. Ten dialogue systems participated in the competition. The majority of them combined multiple conversational models, such as question answering and chit-chat systems, to make conversations more natural. The chatbots were evaluated by human assessors. Almost 1,000 volunteers were recruited and over 4,000 dialogues were collected during the competition. The final dialogue quality score for the best bot was 2.7, compared to 3.8 for humans. This demonstrates that current technology can support dialogue on a given topic, but with quality significantly lower than that of humans. To close this gap, we plan to continue the experiments by organising the next conversational intelligence competition. This future work will benefit from the data we collected and the dialogue systems we made available after the competition, as presented in the paper.

Notes

  1. https://developer.amazon.com/alexaprize
  2. https://en.wikipedia.org/wiki/Loebner_Prize
  3. https://bibinlp.umiacs.umd.edu/
  4. http://turing.tilda.ws/
  5. https://telegram.org
  6. https://messenger.com
  7. https://mongodb.com
  8. https://github.com/deepmipt/convai-testing-system
  9. https://github.com/DeepPavlov/convai/tree/master/2017/solutions
  10. https://github.com/sld/convai-bot-1337
  11. Unfortunately, we were not able to collect any more dialogues during the round

Acknowledgements

Participation of MB, VL and VM was supported by National Technology Initiative and PAO Sberbank project ID 0000000007417F630002.

Author information

Corresponding author

Correspondence to Mikhail Burtsev.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Burtsev, M. et al. (2018). The First Conversational Intelligence Challenge. In: Escalera, S., Weimer, M. (eds) The NIPS '17 Competition: Building Intelligent Systems. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-94042-7_2

  • DOI: https://doi.org/10.1007/978-3-319-94042-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94041-0

  • Online ISBN: 978-3-319-94042-7

  • eBook Packages: Computer Science, Computer Science (R0)
