Multimedia Tools and Applications, Volume 78, Issue 23, pp 33041–33068

MIMOSE: multimodal interaction for music orchestration sheet editors

An integrable multimodal music editor interaction system
  • Andrea Coletta
  • Maria De Marsico
  • Emanuele Panizzi
  • Bardh Prenkaj
  • Domenicomichele Silvestri


The increasing number and accuracy of sensors devoted to human-computer input are supporting the emergence of novel multimodal interaction paradigms. These, in turn, can unlock additional strategies for designing innovative, user-friendly systems. The underlying approaches to user-computer interaction leverage natural channels of communication (e.g. gestures and voice) and are therefore often less cumbersome than traditional interface modalities. This paper proposes a wrapper-based strategy to easily map keyboard shortcuts onto multimodal actions. The presented case study is a music editor. Such applications often overwhelm novice users and discourage them from using the software. MIMOSE (Multimodal Interaction for Music Orchestration Sheet Editors) addresses these limitations. Instead of relying on buttons and mixer pads to compose a musical work, it provides a gesture- and voice-based multimodal wrapper for music editor applications. The user assumes the role of an orchestra conductor: the wrapper translates user gestures and music-jargon keywords into mouse clicks or key presses, thereby substituting keyboard shortcuts with multimodal actions. This provides the user with an ecologically tuned, immersive interaction environment. Note that the wrapped application need not be open source: the events it already captures are simply delivered over channels other than the keyboard and mouse, and are triggered by multimodal actions instead of key presses. After presenting the features of the wrapper, we describe its application to an open source music editing tool and present a twofold evaluation. First, we separately evaluated the performance of each interaction modality in terms of accuracy and F1 score. Second, we asked real users to evaluate the usability of the application when extended by the wrapper.
The user evaluation relies on ad hoc tailored versions of the QUIS and SUXES questionnaires to assess the user-friendliness of the resulting application. The results are encouraging from both the technical-quality and the usability points of view. The wrapper at the core of MIMOSE can be adapted to other kinds of applications with minimal coding effort.
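The core idea of the wrapper, translating recognized gestures and hot-words into the keyboard shortcuts an unmodified editor already understands, can be illustrated with a small lookup-and-dispatch sketch. This is not the MIMOSE implementation: the `MultimodalEvent` type, the `SHORTCUT_MAP` table, its labels and shortcuts, and the `send_keys` callback are all hypothetical. The callback stands in for whatever OS-level key-injection mechanism a real deployment would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class MultimodalEvent:
    """Hypothetical recognizer output: the input channel and the recognized label."""
    channel: str  # e.g. "gesture" or "voice"
    label: str    # a gesture class or a music-jargon hot-word


# Hypothetical binding of (channel, label) pairs to shortcuts of the wrapped editor.
# Labels and key combinations are illustrative only.
SHORTCUT_MAP: Dict[Tuple[str, str], str] = {
    ("voice", "play"): "space",
    ("voice", "crescendo"): "ctrl+shift+c",
    ("gesture", "swipe_right"): "right",
    ("gesture", "raise_baton"): "ctrl+up",
}


class MultimodalWrapper:
    """Translates recognized multimodal events into keyboard shortcuts.

    The wrapped application is never modified: the wrapper only emits the
    shortcut through `send_keys` (a stand-in for real key injection).
    """

    def __init__(self, mapping: Dict[Tuple[str, str], str],
                 send_keys: Callable[[str], None]) -> None:
        self._mapping = dict(mapping)
        self._send_keys = send_keys

    def handle(self, event: MultimodalEvent) -> Optional[str]:
        """Send the shortcut bound to `event` and return it, or return None if unbound."""
        shortcut = self._mapping.get((event.channel, event.label))
        if shortcut is not None:
            self._send_keys(shortcut)
        return shortcut


if __name__ == "__main__":
    sent = []  # collects emitted shortcuts instead of pressing real keys
    wrapper = MultimodalWrapper(SHORTCUT_MAP, sent.append)
    wrapper.handle(MultimodalEvent("voice", "play"))    # bound: "space" is sent
    wrapper.handle(MultimodalEvent("gesture", "wave"))  # unbound: nothing is sent
    print(sent)  # ['space']
```

In this sketch, retargeting the wrapper to a different application only means supplying a different mapping table, which is one plausible reading of the "minimal coding effort" claim above.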


MIMOSE · Music sheet editor software · Multimodal wrapper · Gesture recognition · Hot-word recognition · User-system friendliness




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science, Sapienza University of Rome, Rome, Italy
