1 Introduction

Manipulating 3D digital content through mid-air gestures is a new experience for most people. In many applications, such as interactive virtual product exhibitions in public spaces and medical image displays in operating rooms, the tasks may include translation, rotation, and scaling of 3D components. To facilitate a natural mapping between controls and displays, intuitive gestures should be elicited from a set of user-defined gestures. However, because users differ in their experience with 3D applications and the related input devices, it is extremely difficult to develop consensus gestures. Despite this difficulty, identifying the common characteristics of intuitive gestures remains important for informing the development of gesture recognition algorithms. The objective of this research is therefore to study the common characteristics of intuitive gestures through a pilot experiment on 3D digital content manipulation.

2 Literature Review

Since 3D and mid-air hand gesture controls are natural, intuitive, and sanitary [13], the number of applications has increased significantly. The contexts include interactive navigation systems in museums [4], surgical imaging systems [1, 2], interactive public displays [5], and 3D modelling [6]. Based on the number of hands and the trajectory, mid-air gestures can be classified as one-handed or two-handed, with linear or circular movements, and with different degrees of freedom in the path (1D, 2D, or 3D) [7]. Independent of context, mid-air gestures can be classified as pointing, semaphoric, pantomimic, iconic, or manipulation [8]. The control tasks include select, release, accept, refuse, remove, cancel, navigate, identify, translate, and rotate [8]. Since the characteristics of a context can influence gesture vocabularies [9], the gestures reported for short-range human-computer interaction [10] and for TV controls [11, 12] differ. Even for the same task in the same context, users may prefer different gestures, influenced by their previous experience with different devices. When choosing an intuitive mid-air gesture for a specific task, it is therefore necessary to analyze the common characteristics of user-defined gestures.
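To make the taxonomy above concrete, the classification dimensions from [7] and [8] could be encoded as a small data structure. The following Python sketch is purely illustrative; all type and field names are our own assumptions, not part of any cited system.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical encoding of the taxonomy dimensions described above [7, 8].
class HandCount(Enum):
    ONE = 1
    TWO = 2

class Trajectory(Enum):
    LINEAR = "linear"
    CIRCULAR = "circular"

class PathDOF(Enum):
    D1 = 1  # movement constrained to a line
    D2 = 2  # movement within a plane
    D3 = 3  # free movement in space

class GestureStyle(Enum):
    POINTING = "pointing"
    SEMAPHORIC = "semaphoric"
    PANTOMIMIC = "pantomimic"
    ICONIC = "iconic"
    MANIPULATION = "manipulation"

@dataclass
class MidAirGesture:
    hands: HandCount
    trajectory: Trajectory
    path_dof: PathDOF
    style: GestureStyle
    task: str  # e.g. "rotate", "translate", "scale"

# Example: a two-handed circular manipulation gesture for a rotation task.
rotate_gesture = MidAirGesture(HandCount.TWO, Trajectory.CIRCULAR,
                               PathDOF.D2, GestureStyle.MANIPULATION, "rotate")
```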

3 Experiment

To explore the characteristics of hand gestures for manipulating 3D digital content, a pilot experiment was carried out in the context of an interactive exhibition system for virtual 3D product models. In a laboratory with controlled illumination, each participant stood at a fixed position 200 cm in front of a 50-inch TV. During the experiment, images simulating the rotating products were displayed on the TV, which was driven by a laptop computer operated with a mouse. To capture gesture characteristics, the motions of the body and hand joints were recorded by one overhead camera and two 3D depth cameras.

Each participant completed two trials, offering self-defined gestures for rotating product models about the vertical axis (Fig. 1). Participants were encouraged to provide separate gestures for starting and stopping the rotation. In the first trial, a Microsoft Kinect for Windows (v2) sensor, which can track 25 body joints per person, was mounted on top of the TV. The motion of the arms and hands was recorded by the Kinect Studio program running on a desktop computer, and the body-tracking images were displayed on a 23-inch monitor placed to the right of the TV. In the second trial, an Intel RealSense 3D Camera (F200) was used to extract the position and orientation of 22 joints of one hand. It was placed between the participant and the TV; its distance from the participant was adjusted according to arm length, and its height was adjusted to each participant's shoulder height. The motion of each hand gesture was recorded by the Hands Viewer program, running on a laptop computer with a 15-inch display placed to the lower right of the TV. Each participant thus performed the user-defined gesture tasks facing two 3D depth cameras at different distances. In addition, participants were encouraged to offer different gestures in the two trials.

Fig. 1. Experiment setup
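The two sensors report joint positions at different granularities: 25 body joints per person for the Kinect v2 and 22 joints per hand for the RealSense F200. The following sketch illustrates the kind of per-frame record that could be logged for later coding; it does not use the vendor SDKs, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Joint counts as reported by the two sensors used in the experiment.
KINECT_BODY_JOINTS = 25
REALSENSE_HAND_JOINTS = 22

@dataclass
class Joint:
    name: str
    x: float  # camera coordinates
    y: float
    z: float

@dataclass
class Frame:
    timestamp_ms: int
    joints: List[Joint]

def validate_frame(frame: Frame, expected_joints: int) -> bool:
    """Basic sanity check before writing a frame to the motion log."""
    return len(frame.joints) == expected_joints

# e.g. validate_frame(body_frame, KINECT_BODY_JOINTS) for Kinect data,
# or validate_frame(hand_frame, REALSENSE_HAND_JOINTS) for RealSense data.
```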

4 Results and Discussions

Twenty students majoring in the Master Program of Industrial Design were invited to participate in the experiment. Across the two trials of user-defined gestures, forty gestures were recorded. In the first trial, 8 one-hand and 12 two-hand gestures were observed; in the second trial, 11 one-hand and 9 two-hand gestures were observed. There was no significant difference between the numbers of one-hand and two-hand gestures. For systematic behavior coding and analysis, the gestures were categorized by hand pose, orientation, motion, and trajectory. Fourteen types of one-hand gestures (Table 1) and fifteen types of two-hand gestures (Table 2) were identified. Although many types of gestures were identified, common characteristics could be extracted. The open palm and the D handshape (American Sign Language) were the most intuitive hand poses. For one-hand gestures, moving along the circumference of a horizontal circle was the most intuitive motion and trajectory. For two-hand gestures, moving both hands at a constant relative distance along the circumference of a horizontal circle was the most intuitive motion and trajectory. Sample gestures recorded by the Hands Viewer program are shown in Fig. 2.

Table 1. User-defined gestures with one hand
Table 2. User-defined gestures with two hands
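A coded observation in the behavior coding scheme can be thought of as a tuple over the four coding dimensions (hand pose, orientation, motion, trajectory) plus the number of hands. The sketch below is a minimal illustration; the field names and the orientation values are assumptions, and the two example records paraphrase the most frequent patterns reported above.

```python
from dataclasses import dataclass

# Hypothetical record for one coded gesture observation, following the
# four coding dimensions described above.
@dataclass(frozen=True)
class GestureCode:
    hands: int         # 1 or 2
    pose: str          # e.g. "open palm", "D handshape (ASL)"
    orientation: str   # assumed values for illustration
    motion: str
    trajectory: str

# The most common one-hand and two-hand patterns found in the study:
one_hand_common = GestureCode(
    hands=1, pose="open palm", orientation="palm facing inward",
    motion="sweep", trajectory="circumference of a horizontal circle")

two_hand_common = GestureCode(
    hands=2, pose="open palm", orientation="palms facing each other",
    motion="move both hands at a constant relative distance",
    trajectory="circumference of a horizontal circle")
```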
Fig. 2. Sample gestures recorded by the Hands Viewer program
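The absence of a significant difference between the one-hand and two-hand gesture counts can be verified with a two-sided binomial test against an equal-preference null hypothesis. A minimal sketch, assuming SciPy is available:

```python
from scipy.stats import binomtest

# Counts reported above: (one-hand, two-hand) gestures per trial.
trials = {"trial 1": (8, 12), "trial 2": (11, 9)}

for name, (one_hand, two_hand) in trials.items():
    n = one_hand + two_hand
    # Two-sided test of H0: one-hand and two-hand gestures are equally likely.
    result = binomtest(one_hand, n, p=0.5, alternative="two-sided")
    print(f"{name}: one-hand={one_hand}/{n}, p={result.pvalue:.3f}")

# Both p-values (about 0.50 and 0.82) are well above 0.05, consistent
# with the observation of no significant difference.
```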

5 Conclusion

In this research, a systematic behavior coding scheme was developed to analyze and decompose user-defined gestures. In addition, the most intuitive hand poses and trajectories for rotating 3D virtual models about the vertical axis were identified. These results can inform the development of mid-air gesture interfaces and serve as a reference for 3D digital content manipulation.