Multimodal Interfaces in Support of Human-Human Interaction
After building computers that paid no attention to communicating with humans, the computer science community has devoted significant effort over the years to more sophisticated interfaces that put the "human in the loop" of computers. These interfaces have improved usability by providing more appealing output (graphics, animations), easier-to-use input methods (mouse-based pointing, clicking, and dragging), and more natural interaction modes (speech, vision, gesture, etc.). Yet these interaction modes have mostly been restricted to human-machine interaction and have made severely limiting assumptions about sensor setup and expected human behavior. (For example, a gesture must be performed clearly in front of the camera and must have a well-defined start and end time.) Such assumptions, however, are unrealistic and have consequently limited the potential productivity gains, as the machine still operates in a passive mode, requiring the user to pay considerable attention to the technological artifact.