Inductive Transfer Learning for Detection of Well-Formed Natural Language Search Queries

  • Bakhtiyar Syed
  • Vijayasaradhi Indurthi
  • Manish Gupta
  • Manish Shrivastava
  • Vasudeva Varma
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11438)


Users have been trained to type keyword queries on search engines. However, recently there has been a significant rise in the number of verbose queries. Often, such queries are not well-formed. A lack of well-formedness in the query might adversely impact the downstream pipeline that processes these queries. A well-formed natural language question as a search query helps considerably in reducing errors in downstream tasks and further helps in improved query understanding. In this paper, we employ an inductive transfer learning technique by fine-tuning a pretrained language model to identify whether a search query is a well-formed natural language question or not. We show that our model, trained on a recently released benchmark dataset spanning 25,100 queries, gives an accuracy of 75.03%, thereby improving by approximately 5 absolute percentage points over the state of the art.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Bakhtiyar Syed (1)
  • Vijayasaradhi Indurthi (1)
  • Manish Gupta (1, 2)
  • Manish Shrivastava (1)
  • Vasudeva Varma (1)

  1. IIIT Hyderabad, Hyderabad, India
  2. Microsoft, Hyderabad, India
