Personalized weather information for low-literate farmers using multimodal dialog systems

Qasim, Muhammad; Zia, Haris Bin; Athar, Awais; Habib, Tania; Raza, Agha Ali

doi:10.1007/s10772-021-09806-2

Personalized weather information for low-literate farmers using multimodal dialog systems

Published: 01 February 2021

Volume 24, pages 455–471, (2021)
Cite this article

International Journal of Speech Technology Aims and scope Submit manuscript

Muhammad Qasim¹,
Haris Bin Zia²,
Awais Athar³,
Tania Habib¹ &
…
Agha Ali Raza⁴

303 Accesses
3 Citations
Explore all metrics

Abstract

Speech-based services over simple mobile phones are a viable way of providing information-access to under-served populations (low-literate, low-income, tech-shy, handicapped, linguistic minority, marginalized). Despite the simplicity and flexibility of speech input, telephone-based information services commonly rely on push-button (DTMF) input. This is primarily because high accuracy automatic speech recognition (ASR), that is essential for an end-to-end spoken interaction, is not available for several languages in developing regions. We share findings from an HCI design intervention for a dialog system-based weather information service for farmers in Pakistan. We demonstrate that a high accuracy ASR alone is not sufficient for effective, inclusive speech interfaces. We present the details of the iterative improvement of the existing service that had low task success rate (37.8%) despite being based on a very high accuracy ASR (trained on target language speech data). Based on a deployment spanning 23,997 phone calls from 6893 users over 10 months, we show that as multimodal input, user adaptation and context-specific help were added to supplement the ASR, the task success rate increased to 96.3%. Following this intervention, the service was made the national weather hotline of Pakistan.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Development and Evaluation of a Voice Command App for Smartphone Interaction Using the Speech-to-Text API

Multimodal HALEF: An Open-Source Modular Web-Based Multimodal Dialog Framework

Spoken Dialog System in Bodo Language for Agro Services

Data availability

Speech corpus is available publicly^{Footnote 1}.

Notes

http://www.cle.org.pk/clestore/speechcorpus.htm

References

Ahmad, S. S. O., Naseem, M., & Raza, A. A. (2017). Maternal awareness for low-literate expecting parents via voice-based telephone services. HCI.
Batool, A., Razaq, S., Javaid, M., Fatima, B., & Toyama, K. (2017). Maternal complications: nuances in mobile interventions for maternal health in Urban Pakistan. In Proceedings of the Ninth International Conference on Information and Communication Technologies and Development (p. 3). ACM.
Bohus, D., & Rudnicky, A. (2005). LARRI: A language-based maintenance and repair assistant. Spoken Multimodal Human-Computer Dialogue in Mobile Environments, 203–218.
Bratt, H., Dowding, J., & Hunicke-Smith, K. (1995). The SRI telephone ATIS system. In Proceedings of the Spoken Language Systerns Technology Workshop (pp. 218–220).
Cuendet, S., Medhi, I., Bali, K., & Cutrell, E. (2013). VideoKheti: Making video content accessible to low-literate and novice users. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 2833–2842). ACM.
Ejaz, H., Hussain, S. A., & Raza, A. A. (2018). The case for IVR-based citizen journalism in Pakistan. In Proceedings of the 20th International Conference on Human-Computer Interaction with Mobile Devices and Services Adjunct (pp. 87–94). ACM.
Gram Vaani. (2017). Retrieved from http://www.gramvaani.org/.
Grover, A. S., Plauché, M., Barnard, E., & Kuun, C. (2009). HIV health information access using spoken dialogue systems: Touchtone vs. speech. In Information and Communication Technologies and Development (ICTD), 2009 International Conference on (pp. 95–107). IEEE.
Gulaid, M., & Vashistha, A. (2013). Ila Dhageyso: an interactive voice forum to foster transparent governance in Somaliland. In Proceedings of the Sixth International Conference on Information and Communications Technologies and Development: Notes-Volume 2 (pp. 41–44). ACM.
Lee, K. M., & Lai, J. (2005). Speech versus touch: A comparative study of the use of speech and DTMF keypad for navigation. International Journal of Human-Computer Interaction, 19(3), 343–360.
Article Google Scholar
Litman, D. J., & Silliman, S. (2004). ITSPOKE: An intelligent tutoring spoken dialogue system. In Demonstration papers at HLT-NAACL 2004 (pp. 5–8). Association for Computational Linguistics.
Maneesha, V., & Abhishek, B. (2014). Innovative IVR system for farmers: Enhancing ICT adoption.
McTear, M. F. (2002). Spoken dialogue technology: Enabling the conversational user interface. ACM Computing Surveys (CSUR), 34(1), 90–169.
Article Google Scholar
Medhi, I., Sagar, A., & Toyama, K. (2006). Text-free user interfaces for illiterate and semi-literate users. In Information and Communication Technologies and Development, 2006. ICTD’06. International Conference on (pp. 72–82). IEEE.
Moitra, A., Das, V., Vaani, G., Kumar, A., & Seth, A. (2016). Design lessons from creating a mobile-based community media platform in Rural India. In Proceedings of the Eighth International Conference on Information and Communication Technologies and Development (pp. 1–11).
Mudliar, P., Donner, J., & Thies, W. (2012). Emergent practices around CGNet Swara, voice forum for citizen journalism in rural India. In Proceedings of the Fifth International Conference on Information and Communication Technologies and Development (pp. 159–168). ACM.
Pakistan Bureau of Statistics. (2019). Retrieved from http://www.pbs.gov.pk/content/agriculture-statistics.
Pakistan Telecommunication Authority. (2019). Retrieved from https://www.pta.gov.pk//en/telecom-indicators.
Patel, N., Agarwal, S., Rajput, N., Nanavati, A., Dave, P., & Parikh, T. S. (2009). A comparative study of speech and dialed input voice interfaces in rural India. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 51–54). ACM.
Patel, N., Chittamuru, D., Jain, A., Dave, P., & Parikh, T. S. (2010). Avaaj otalo: a field study of an interactive voice forum for small farmers in rural india. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 733–742). ACM.
Pellom, B., Ward, W., Hansen, J., Cole, R., Hacioglu, K., Zhang, J., et al (2001). University of Colorado dialog systems for travel and navigation. In Proceedings of the first international conference on Human language technology research (pp. 1–6). Association for Computational Linguistics.
Qasim, M., Hussain, S., Habib, T., & Rahman, S. U. (2016a). Spoken dialog system framework supporting multiple concurrent sessions. In 2016 Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (OCOCOSDA) (pp. 116–121). IEEE.
Qasim, M., Nawaz, S., Hussain, S., & Habib, T. (2016b). Urdu speech recognition system for district names of Pakistan: Development, challenges and solutions. In 2016 Conference of the Criental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA) (pp. 28–32). IEEE.
Qiao, F., Sherwani, J., & Rosenfeld, R. (2010). Small-vocabulary speech recognition for resource-scarce languages. In Proceedings of the First ACM Symposium on Computing for Development (p. 3). ACM.
Rauf, S., Hameed, A., Habib, T., & Hussain, S. (2015). District names speech corpus for pakistani languages. In 2015 International Conference Oriental COCOSDA Held Jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE) (pp. 207–211). IEEE.
Raza, A. A., Milo, C., Alster, G., Sherwani, J., Pervaiz, M., Razaq, S., et al. (2012). Viral Entertainment as a vehicle for disseminating speech-based services to low-literate users. In International Conference on Information and Communication Technologies and Development (ICTD) (Vol. 2).
Raza, A. A., Saleem, B., Randhawa, S., Tariq, Z., Athar, A., Saif, U., et al. (2018). Baang: a viral speech-based social platform for under-connected populations. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (p. 643). ACM.
Raza, A. A., Tariq, Z., Randhawa, S., Saleem, B., Athar, A., Saif, U., et al. (2019). Voice-based quizzes for measuring knowledge retention in under-connected populations. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (p. 412). ACM.
Raza, A. A., Ul Haq, F., Tariq, Z., Pervaiz, M., Razaq, S., Saif, U., et al. (2013). Job opportunities through entertainment: virally spread speech-based services for low-literate users. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 2803–2812). ACM.
Reda, A., Panjwani, S., & Cutrell, E. (2011). Hyke: a low-cost remote attendance tracking system for developing regions. In Proceedings of the 5th ACM workshop on Networked systems for developing regions (pp. 15–20). ACM.
Roche, R., Hladilek, E., & Reid, S. (2006). Disaster recovery virtual roll call and recovery management system. Google Patents.
Rocheleau, B., & Wu, L. (2005). E-Government and financial transactions: Potential versus reality. The Electronic Journal of E-Government, 3(4), 219–230.
Google Scholar
Seneff, S., & Polifroni, J. (2000). Dialogue management in the Mercury flight reservation system. In Proceedings of the 2000 ANLP/NAACL Workshop on Conversational systems-Volume 3 (pp. 11–16). Association for Computational Linguistics.
Sharma Grover, A., Stewart, O., & Lubensky, D. (2009). Designing interactive voice response (IVR) interfaces: Localisation for low literacy users.
Sherwani, J. (2009). Speech interfaces for information access by low literate users (PhD Thesis). Carnegie Mellon University.
Sherwani, J., Ali, N., Mirza, S., Fatma, A., Memon, Y., Karim, M., et al. (2007). Healthline: Speech-based access to health information by low-literate users. In Information and Communication Technologies and Development, 2007. ICTD 2007. International Conference on (pp. 1–9). IEEE.
Sherwani, J., Palijo, S., Mirza, S., Ahmed, T., Ali, N., & Rosenfeld, R. (2009). Speech vs. touch-tone: Telephony interfaces for information access by low literate users. In Information and Communication Technologies and Development (ICTD), 2009 International Conference on (pp. 447–457). IEEE.
Swaminathan, S., Medhi Thies, I., Mehta, D., Cutrell, E., Sharma, A., & Thies, W. (2019). Learn2Earn: Using mobile airtime incentives to bolster public awareness campaigns. Proceedings of the ACM on Human-Computer Interaction, 3, 1–20.
Article Google Scholar
The World Factbook—Central Intelligence Agency. (2017). Retrieved from https://www.cia.gov/library/publications/the-world-factbook/fields/2103.html#136.
Thies, I. M., & others. (2015). User interface design for low-literate and novice users: Past, present and future. Foundations and Trends® in Human–Computer Interaction, 8(1), 1–72.
Vashistha, A., Cutrell, E., Borriello, G., & Thies, W. (2015). Sangeet swara: A community-moderated voice forum in rural india. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (pp. 417–426). ACM.
Vashistha, A., Sethi, P., & Anderson, R. (2017). Respeak: A voice-based, crowd-powered speech transcription system. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (pp. 1855–1866). ACM.
Vashistha, A., Sethi, P., & Anderson, R. (2018). BSpeak: An accessible crowdsourcing marketplace for low-income blind people. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM.
Vashistha, A., Garg, A., & Anderson, R. (2019). ReCall: Crowdsourcing on basic phones to financially sustain voice forums. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 1–13).
Vashistha, A., Saif, U., & Raza, A. A. (2019). The internet of the orals. Communications of the ACM, 62(11), 100–103.
Article Google Scholar
Wang, H., & Singhal, A. (2018). Audience-centered discourses in communication and social change: The ‘Voicebook’of Main Kuch Bhi Kar Sakti Hoon, an entertainment-education initiative in India. Journal of Multicultural Discourses, 13(2), 176–191.
Article Google Scholar
White, J., Duggirala, M., Kummamuru, K., & Srivastava, S. (2012). Designing a voice-based employment exchange for rural India. In Proceedings of the Fifth International Conference on Information and Communication Technologies and Development (pp. 367–373). ACM.
Wolfe, N., Hong, J., Raza, A. A., Raj, B., & Rosenfeld, R. (2015). Rapid development of public health education systems in low-literacy multilingual environments: Combating ebola through voice messaging. In ISCA Special Interest Group on Speech and Language Technology in Education (SLaTE). INTERSPEECH.
Zainudeen, A., Samarajiva, R., & Sivapragasam, N. (2010). Cellbazaar, a mobile-based e-marketplace: Success factors and potential for expansion.
Zue, V., Seneff, S., Glass, J. R., Polifroni, J., Pao, C., Hazen, T. J., & Hetherington, L. (2000). JUPlTER: A telephone-based conversational interface for weather information. IEEE Transactions on Speech and Audio Processing, 8(1), 85–96.
Article Google Scholar

Download references

Author information

Authors and Affiliations

University of Engineering and Technology, Lahore, Pakistan
Muhammad Qasim & Tania Habib
Information Technology University, Lahore, Pakistan
Haris Bin Zia
European Bioinformatics Institute, Cambridgeshire, UK
Awais Athar
Lahore University of Management Sciences, Lahore, Pakistan
Agha Ali Raza

Authors

Muhammad Qasim
View author publications
You can also search for this author in PubMed Google Scholar
Haris Bin Zia
View author publications
You can also search for this author in PubMed Google Scholar
Awais Athar
View author publications
You can also search for this author in PubMed Google Scholar
Tania Habib
View author publications
You can also search for this author in PubMed Google Scholar
Agha Ali Raza
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Haris Bin Zia.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Qasim, M., Zia, H.B., Athar, A. et al. Personalized weather information for low-literate farmers using multimodal dialog systems. Int J Speech Technol 24, 455–471 (2021). https://doi.org/10.1007/s10772-021-09806-2

Download citation

Received: 11 May 2020
Accepted: 06 January 2021
Published: 01 February 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s10772-021-09806-2

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Personalized weather information for low-literate farmers using multimodal dialog systems

Abstract

Access this article

Similar content being viewed by others

Development and Evaluation of a Voice Command App for Smartphone Interaction Using the Speech-to-Text API

Multimodal HALEF: An Open-Source Modular Web-Based Multimodal Dialog Framework

Spoken Dialog System in Bodo Language for Agro Services

Data availability

Notes

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Personalized weather information for low-literate farmers using multimodal dialog systems

Abstract

Access this article

Similar content being viewed by others

Development and Evaluation of a Voice Command App for Smartphone Interaction Using the Speech-to-Text API

Multimodal HALEF: An Open-Source Modular Web-Based Multimodal Dialog Framework

Spoken Dialog System in Bodo Language for Agro Services

Data availability

Notes

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation