Abstract
Interest in the proper treatment of mental health has been growing rapidly amid sweeping changes in society, family structure, and lifestyle. The COVID-19 pandemic has further accelerated this need worldwide, creating a large demand for digital therapeutics for this purpose. One key ingredient of such efforts is appropriately designed practice content for the prevention and treatment of mental illness. In this paper, we present novel deep generative models that compose mental training content based on the mindfulness approach, with a particular focus on delivering Acceptance and Commitment Therapy (ACT) through the self-talk technique. To this end, we first introduce an ACT script generator for mindfulness meditation. Using over one thousand sentences collected from various sources of ACT training practices, we develop a text generation model by fine-tuning a variant of GPT-2. Next, we introduce a voice generator that implements the self-talk technique as a text-to-speech application reading the ACT training scripts generated above. Computational and human evaluation results demonstrate the high quality of the generated training scripts and self-talk content. To the best of our knowledge, this is the first approach to generating meditation content with artificial intelligence techniques, reaching the human mind to care for and improve the mental health of individuals. Applications include core treatment content for digital therapeutics and meditation curriculum design.
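As a rough illustration of the text-generation step described above, the sketch below fine-tunes a public Korean GPT-2 checkpoint on a plain-text file of ACT practice sentences with the Hugging Face transformers library, then samples a new training script from a short prompt. The model name, file name, prompt, and hyperparameters are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch (not the authors' code): fine-tune a Korean GPT-2 variant on
# a line-per-sentence file of ACT practice sentences, then sample a script.
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          DataCollatorForLanguageModeling, TextDataset,
                          Trainer, TrainingArguments)

model_name = "skt/kogpt2-base-v2"   # public KoGPT2 checkpoint (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# "act_sentences.txt": one collected ACT sentence per line (hypothetical file).
train_dataset = TextDataset(tokenizer=tokenizer,
                            file_path="act_sentences.txt",
                            block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="act-gpt2",
                           num_train_epochs=5,
                           per_device_train_batch_size=4),
    data_collator=collator,
    train_dataset=train_dataset,
)
trainer.train()

# Generate an ACT-style training script from a short seed phrase.
prompt = tokenizer("지금 이 순간,", return_tensors="pt")
out = model.generate(**prompt, max_length=80, do_sample=True, top_p=0.95)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The generated script would then be passed to a text-to-speech model (e.g., a Tacotron 2 variant, as referenced by the authors) to render the self-talk audio in the user's own voice.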
Acknowledgement
This research was supported by (i) the Samsung Research Funding Center of Samsung Electronics under Project No. SRFC-TC1603-52, and (ii) the National Research Foundation of Korea (NRF) grant funded by the Korean government (No. 2020R1G1A1102683).
Cite this paper
Kim, S.H., Kim, J.H., Yang, J.A., Lee, J.Y., Lee, J.H. (2022). Touching Minds: Deep Generative Models Composing the Digital Contents to Practice Mindfulness. In: Kim, J.H., Singh, M., Khan, J., Tiwary, U.S., Sur, M., Singh, D. (eds.) Intelligent Human Computer Interaction. IHCI 2021. Lecture Notes in Computer Science, vol. 13184. Springer, Cham. https://doi.org/10.1007/978-3-030-98404-5_9
DOI: https://doi.org/10.1007/978-3-030-98404-5_9