Fusion and Fission: Improved MMIA for Multi-modal HCI Based on WPS and Voice-XML

  • Jung-Hyun Kim
  • Kwang-Seok Hong
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4541)


This paper implements a Multi-Modal Instruction Agent (hereinafter, MMIA) that synchronizes audio and gesture modalities, and proposes improved fusion and fission rules based on the SNNR (Signal-Plus-Noise-to-Noise Ratio) and fuzzy values, built on an embedded KSSL (Korean Standard Sign Language) recognizer using a WPS (Wearable Personal Station) and VoiceXML. Our approach fuses and recognizes sentence- and word-based instruction models expressed in speech and KSSL, and then translates the recognition result, fissioned according to a weight-decision rule, into synthetic speech and a visual illustration (graphical display on an HMD, Head-Mounted Display) in real time. To ensure the validity of our approach, we evaluate performance in terms of the average recognition rates and recognition time of the MMIA. In the experiments, the average recognition rates of the MMIA for the prescribed 65 sentential and 156 word instruction models were 94.33% and 96.85% in clean environments, and 92.29% and 92.91% in noisy environments. In addition, the average recognition time was approximately 0.36 ms in both environments.
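The SNNR-driven weight decision described above can be sketched as a small late-fusion routine. This is a minimal illustration, not the authors' implementation: the membership thresholds (`low`, `high`) and all function names are assumptions chosen for the example, and the fuzzy membership is reduced to a piecewise-linear ramp.

```python
import math

def snnr_db(signal_plus_noise_power: float, noise_power: float) -> float:
    """SNNR in decibels: 10 * log10((S + N) / N)."""
    return 10.0 * math.log10(signal_plus_noise_power / noise_power)

def speech_weight(snnr: float, low: float = 5.0, high: float = 25.0) -> float:
    """Piecewise-linear fuzzy membership: trust speech more as SNNR rises.

    The 5 dB / 25 dB breakpoints are illustrative assumptions, not values
    from the paper.
    """
    if snnr <= low:
        return 0.0
    if snnr >= high:
        return 1.0
    return (snnr - low) / (high - low)

def fuse(speech_score: float, gesture_score: float, snnr: float) -> float:
    """Weighted late fusion of per-modality recognition confidences."""
    w = speech_weight(snnr)
    return w * speech_score + (1.0 - w) * gesture_score
```

In a noisy environment (low SNNR) the fused decision falls back on the gesture recognizer; in a clean environment it leans on speech, which mirrors the clean-versus-noisy recognition-rate split reported in the abstract.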


Keywords: Recognition Rate · Speech Recognition · Hand Gesture · Noisy Environment · Weight Decision




Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Jung-Hyun Kim (1)
  • Kwang-Seok Hong (1)

  1. School of Information and Communication Engineering, Sungkyunkwan University, 300, Chunchun-dong, Jangan-gu, Suwon, KyungKi-do, 440-746, Korea
