Skip to main content
Log in

Personalized weather information for low-literate farmers using multimodal dialog systems

  • Published:
International Journal of Speech Technology Aims and scope Submit manuscript

Abstract

Speech-based services over simple mobile phones are a viable way of providing information-access to under-served populations (low-literate, low-income, tech-shy, handicapped, linguistic minority, marginalized). Despite the simplicity and flexibility of speech input, telephone-based information services commonly rely on push-button (DTMF) input. This is primarily because high accuracy automatic speech recognition (ASR), that is essential for an end-to-end spoken interaction, is not available for several languages in developing regions. We share findings from an HCI design intervention for a dialog system-based weather information service for farmers in Pakistan. We demonstrate that a high accuracy ASR alone is not sufficient for effective, inclusive speech interfaces. We present the details of the iterative improvement of the existing service that had low task success rate (37.8%) despite being based on a very high accuracy ASR (trained on target language speech data). Based on a deployment spanning 23,997 phone calls from 6893 users over 10 months, we show that as multimodal input, user adaptation and context-specific help were added to supplement the ASR, the task success rate increased to 96.3%. Following this intervention, the service was made the national weather hotline of Pakistan.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

Data availability

Speech corpus is available publiclyFootnote 1.

Notes

  1. http://www.cle.org.pk/clestore/speechcorpus.htm

References

  • Ahmad, S. S. O., Naseem, M., & Raza, A. A. (2017). Maternal awareness for low-literate expecting parents via voice-based telephone services. HCI.

  • Batool, A., Razaq, S., Javaid, M., Fatima, B., & Toyama, K. (2017). Maternal complications: nuances in mobile interventions for maternal health in Urban Pakistan. In Proceedings of the Ninth International Conference on Information and Communication Technologies and Development (p. 3). ACM.

  • Bohus, D., & Rudnicky, A. (2005). LARRI: A language-based maintenance and repair assistant. Spoken Multimodal Human-Computer Dialogue in Mobile Environments, 203–218.

  • Bratt, H., Dowding, J., & Hunicke-Smith, K. (1995). The SRI telephone ATIS system. In Proceedings of the Spoken Language Systerns Technology Workshop (pp. 218–220).

  • Cuendet, S., Medhi, I., Bali, K., & Cutrell, E. (2013). VideoKheti: Making video content accessible to low-literate and novice users. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 2833–2842). ACM.

  • Ejaz, H., Hussain, S. A., & Raza, A. A. (2018). The case for IVR-based citizen journalism in Pakistan. In Proceedings of the 20th International Conference on Human-Computer Interaction with Mobile Devices and Services Adjunct (pp. 87–94). ACM.

  • Gram Vaani. (2017). Retrieved from http://www.gramvaani.org/.

  • Grover, A. S., Plauché, M., Barnard, E., & Kuun, C. (2009). HIV health information access using spoken dialogue systems: Touchtone vs. speech. In Information and Communication Technologies and Development (ICTD), 2009 International Conference on (pp. 95–107). IEEE.

  • Gulaid, M., & Vashistha, A. (2013). Ila Dhageyso: an interactive voice forum to foster transparent governance in Somaliland. In Proceedings of the Sixth International Conference on Information and Communications Technologies and Development: Notes-Volume 2 (pp. 41–44). ACM.

  • Lee, K. M., & Lai, J. (2005). Speech versus touch: A comparative study of the use of speech and DTMF keypad for navigation. International Journal of Human-Computer Interaction, 19(3), 343–360.

    Article  Google Scholar 

  • Litman, D. J., & Silliman, S. (2004). ITSPOKE: An intelligent tutoring spoken dialogue system. In Demonstration papers at HLT-NAACL 2004 (pp. 5–8). Association for Computational Linguistics.

  • Maneesha, V., & Abhishek, B. (2014). Innovative IVR system for farmers: Enhancing ICT adoption.

  • McTear, M. F. (2002). Spoken dialogue technology: Enabling the conversational user interface. ACM Computing Surveys (CSUR), 34(1), 90–169.

    Article  Google Scholar 

  • Medhi, I., Sagar, A., & Toyama, K. (2006). Text-free user interfaces for illiterate and semi-literate users. In Information and Communication Technologies and Development, 2006. ICTD’06. International Conference on (pp. 72–82). IEEE.

  • Moitra, A., Das, V., Vaani, G., Kumar, A., & Seth, A. (2016). Design lessons from creating a mobile-based community media platform in Rural India. In Proceedings of the Eighth International Conference on Information and Communication Technologies and Development (pp. 1–11).

  • Mudliar, P., Donner, J., & Thies, W. (2012). Emergent practices around CGNet Swara, voice forum for citizen journalism in rural India. In Proceedings of the Fifth International Conference on Information and Communication Technologies and Development (pp. 159–168). ACM.

  • Pakistan Bureau of Statistics. (2019). Retrieved from http://www.pbs.gov.pk/content/agriculture-statistics.

  • Pakistan Telecommunication Authority. (2019). Retrieved from https://www.pta.gov.pk//en/telecom-indicators.

  • Patel, N., Agarwal, S., Rajput, N., Nanavati, A., Dave, P., & Parikh, T. S. (2009). A comparative study of speech and dialed input voice interfaces in rural India. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 51–54). ACM.

  • Patel, N., Chittamuru, D., Jain, A., Dave, P., & Parikh, T. S. (2010). Avaaj otalo: a field study of an interactive voice forum for small farmers in rural india. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 733–742). ACM.

  • Pellom, B., Ward, W., Hansen, J., Cole, R., Hacioglu, K., Zhang, J., et al (2001). University of Colorado dialog systems for travel and navigation. In Proceedings of the first international conference on Human language technology research (pp. 1–6). Association for Computational Linguistics.

  • Qasim, M., Hussain, S., Habib, T., & Rahman, S. U. (2016a). Spoken dialog system framework supporting multiple concurrent sessions. In 2016 Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (OCOCOSDA) (pp. 116–121). IEEE.

  • Qasim, M., Nawaz, S., Hussain, S., & Habib, T. (2016b). Urdu speech recognition system for district names of Pakistan: Development, challenges and solutions. In 2016 Conference of the Criental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA) (pp. 28–32). IEEE.

  • Qiao, F., Sherwani, J., & Rosenfeld, R. (2010). Small-vocabulary speech recognition for resource-scarce languages. In Proceedings of the First ACM Symposium on Computing for Development (p. 3). ACM.

  • Rauf, S., Hameed, A., Habib, T., & Hussain, S. (2015). District names speech corpus for pakistani languages. In 2015 International Conference Oriental COCOSDA Held Jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE) (pp. 207–211). IEEE.

  • Raza, A. A., Milo, C., Alster, G., Sherwani, J., Pervaiz, M., Razaq, S., et al. (2012). Viral Entertainment as a vehicle for disseminating speech-based services to low-literate users. In International Conference on Information and Communication Technologies and Development (ICTD) (Vol. 2).

  • Raza, A. A., Saleem, B., Randhawa, S., Tariq, Z., Athar, A., Saif, U., et al. (2018). Baang: a viral speech-based social platform for under-connected populations. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (p. 643). ACM.

  • Raza, A. A., Tariq, Z., Randhawa, S., Saleem, B., Athar, A., Saif, U., et al. (2019). Voice-based quizzes for measuring knowledge retention in under-connected populations. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (p. 412). ACM.

  • Raza, A. A., Ul Haq, F., Tariq, Z., Pervaiz, M., Razaq, S., Saif, U., et al. (2013). Job opportunities through entertainment: virally spread speech-based services for low-literate users. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 2803–2812). ACM.

  • Reda, A., Panjwani, S., & Cutrell, E. (2011). Hyke: a low-cost remote attendance tracking system for developing regions. In Proceedings of the 5th ACM workshop on Networked systems for developing regions (pp. 15–20). ACM.

  • Roche, R., Hladilek, E., & Reid, S. (2006). Disaster recovery virtual roll call and recovery management system. Google Patents.

  • Rocheleau, B., & Wu, L. (2005). E-Government and financial transactions: Potential versus reality. The Electronic Journal of E-Government, 3(4), 219–230.

    Google Scholar 

  • Seneff, S., & Polifroni, J. (2000). Dialogue management in the Mercury flight reservation system. In Proceedings of the 2000 ANLP/NAACL Workshop on Conversational systems-Volume 3 (pp. 11–16). Association for Computational Linguistics.

  • Sharma Grover, A., Stewart, O., & Lubensky, D. (2009). Designing interactive voice response (IVR) interfaces: Localisation for low literacy users.

  • Sherwani, J. (2009). Speech interfaces for information access by low literate users (PhD Thesis). Carnegie Mellon University.

  • Sherwani, J., Ali, N., Mirza, S., Fatma, A., Memon, Y., Karim, M., et al. (2007). Healthline: Speech-based access to health information by low-literate users. In Information and Communication Technologies and Development, 2007. ICTD 2007. International Conference on (pp. 1–9). IEEE.

  • Sherwani, J., Palijo, S., Mirza, S., Ahmed, T., Ali, N., & Rosenfeld, R. (2009). Speech vs. touch-tone: Telephony interfaces for information access by low literate users. In Information and Communication Technologies and Development (ICTD), 2009 International Conference on (pp. 447–457). IEEE.

  • Swaminathan, S., Medhi Thies, I., Mehta, D., Cutrell, E., Sharma, A., & Thies, W. (2019). Learn2Earn: Using mobile airtime incentives to bolster public awareness campaigns. Proceedings of the ACM on Human-Computer Interaction, 3, 1–20.

    Article  Google Scholar 

  • The World Factbook—Central Intelligence Agency. (2017). Retrieved from https://www.cia.gov/library/publications/the-world-factbook/fields/2103.html#136.

  • Thies, I. M., & others. (2015). User interface design for low-literate and novice users: Past, present and future. Foundations and Trends® in Human–Computer Interaction, 8(1), 1–72.

  • Vashistha, A., Cutrell, E., Borriello, G., & Thies, W. (2015). Sangeet swara: A community-moderated voice forum in rural india. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (pp. 417–426). ACM.

  • Vashistha, A., Sethi, P., & Anderson, R. (2017). Respeak: A voice-based, crowd-powered speech transcription system. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (pp. 1855–1866). ACM.

  • Vashistha, A., Sethi, P., & Anderson, R. (2018). BSpeak: An accessible crowdsourcing marketplace for low-income blind people. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM.

  • Vashistha, A., Garg, A., & Anderson, R. (2019). ReCall: Crowdsourcing on basic phones to financially sustain voice forums. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 1–13).

  • Vashistha, A., Saif, U., & Raza, A. A. (2019). The internet of the orals. Communications of the ACM, 62(11), 100–103.

    Article  Google Scholar 

  • Wang, H., & Singhal, A. (2018). Audience-centered discourses in communication and social change: The ‘Voicebook’of Main Kuch Bhi Kar Sakti Hoon, an entertainment-education initiative in India. Journal of Multicultural Discourses, 13(2), 176–191.

    Article  Google Scholar 

  • White, J., Duggirala, M., Kummamuru, K., & Srivastava, S. (2012). Designing a voice-based employment exchange for rural India. In Proceedings of the Fifth International Conference on Information and Communication Technologies and Development (pp. 367–373). ACM.

  • Wolfe, N., Hong, J., Raza, A. A., Raj, B., & Rosenfeld, R. (2015). Rapid development of public health education systems in low-literacy multilingual environments: Combating ebola through voice messaging. In ISCA Special Interest Group on Speech and Language Technology in Education (SLaTE). INTERSPEECH.

  • Zainudeen, A., Samarajiva, R., & Sivapragasam, N. (2010). Cellbazaar, a mobile-based e-marketplace: Success factors and potential for expansion.

  • Zue, V., Seneff, S., Glass, J. R., Polifroni, J., Pao, C., Hazen, T. J., & Hetherington, L. (2000). JUPlTER: A telephone-based conversational interface for weather information. IEEE Transactions on Speech and Audio Processing, 8(1), 85–96.

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Haris Bin Zia.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Qasim, M., Zia, H.B., Athar, A. et al. Personalized weather information for low-literate farmers using multimodal dialog systems. Int J Speech Technol 24, 455–471 (2021). https://doi.org/10.1007/s10772-021-09806-2

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10772-021-09806-2

Keywords

Navigation