1 Introduction

Gestures form an important aspect of human communication, to the point that people gesture even in telephone conversations. Gesture recognition can be viewed as the ability of a computer-based system to decode the meaning of a gesture [1]. Hand gesture recognition has many application areas, for instance sign language recognition, robotic arm control and Human Vehicle Interaction (HVI) [2].

In this study, the main application area of interest was sign language recognition. Hand gesture recognition has been shown to be more convenient than other conventional methods of human-computer interaction such as the mouse and keyboard [3]. There are two approaches to hand gesture recognition, namely data-glove-based and vision-based [1]. The vision-based approach can be categorized into appearance-based methods and 3D hand model-based methods. Appearance-based methods are preferred for real-time performance, because image processing on a 2D image is less complex. 3D hand model-based methods provide a richer description of hand features. However, because 3D hand models are articulated, deformable objects with many degrees of freedom, they require a very large image database to cover all the characteristic shapes under different views. Matching the query image frames from video input against all images in the database is time-consuming and computationally expensive [4].

The vision-based approach is considered to provide a more natural and intuitive human-computer interface [3]. However, hand gesture recognition has proved to be quite challenging due to the multiple contexts and interpretations of gestures, amid other challenges such as the complex, non-rigid characteristics of the hand [5]. Sign language (SL) is also primarily grounded in spatial and iconic characteristics. Hand parameters such as shape, motion and position in space, as well as lip movement and facial expressions, are used to decode the meaning of a sign [6].

Past work indicates that most research in sign language recognition is confined to a small subset of the whole sign language due to the constraints associated with vision-based hand gesture recognition [7]. This paper outlines the constraints associated with vision-based sign language hand gesture recognition.

2 Objective

The objectives of this study are to:

  • Analyze the constraints in the hand tracking and segmentation phase of a vision-based sign language hand gesture recognition system.

  • Analyze the constraints in the feature extraction phase of a vision-based sign language hand gesture recognition system.

  • Analyze the constraints in the classification phase of a vision-based sign language hand gesture recognition system.

3 Methodology

In this study, a qualitative research design was employed through desktop research. The research comprised document analysis, which can be defined as an orderly process for reviewing or assessing printed and electronic documents [8]. Document analysis has been applied in many research studies to triangulate other methods, but can also be used on its own [9]. It has been argued to be less time-consuming, because it involves data selection as opposed to data collection, and is hence suitable for repeated reviews [10].

Desktop research, as guided by [11], has been successfully employed by [2, 12] and many other authors to draw important conclusions; hence this method of data collection was used in this study. Twelve papers were reviewed. The papers were retrieved with the Google Scholar search engine using keywords matching the objectives.

4 Technology Description

This paper is based on identifying the constraints associated with the implementation of a vision-based sign language hand gesture recognition system. Different authors use different representations and terms for the phases that comprise a typical vision-based gesture recognition system. Table 1 below indicates some of the terms used.

Table 1. Vision-based hand gesture recognition system phases by different authors

As depicted in Table 1, the phases of a vision-based hand gesture recognition system are similar even though they represent different instances of different systems. The phases include image acquisition, hand tracking and segmentation, feature extraction, and classification and recognition. Below is a brief description of each phase and the constraints associated with it.

  • i. Image acquisition from camera

The first step in gesture recognition is to capture the gesture via a video camera, either attached to the computer or independent from it. The constraints in this phase may be due to a number of factors. For instance, the accuracy of gesture recognition may be affected by the following camera specifications: colour range, resolution and accuracy, frame rate, lens characteristics and the camera-computer interface [5].
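As an illustration only, a minimal acquisition sketch in Python with OpenCV is shown below; the camera index (0) and the 640 x 480 resolution are assumptions for the example, not values taken from the reviewed papers.

```python
import cv2

# Minimal sketch: acquire frames from a webcam for gesture recognition.
# Camera index 0 and the 640x480 resolution are illustrative assumptions.
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

while True:
    ok, frame = cap.read()   # BGR image; frame rate depends on the camera
    if not ok:
        break                # camera disconnected or stream ended
    cv2.imshow("acquisition", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```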

  • ii. Hand region segmentation

The main objective of the segmentation phase is to remove the background and noise, leaving only the Region of Interest (ROI), which is the only useful information in the image. This objective can be achieved in various ways, such as skin colour detection, hand shape feature detection and background subtraction [3]. A Bayesian classifier, which is a supervised learning model, can be used for skin colour segmentation, as can an unsupervised model such as K-Means clustering [3].
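The following is a minimal sketch of one such approach, skin colour thresholding in the HSV colour space with OpenCV; the threshold bounds are illustrative assumptions that would need tuning per dataset, and the morphological filtering is one common way to suppress the noise mentioned above.

```python
import cv2
import numpy as np

def segment_hand(frame_bgr):
    """Isolate a skin-coloured Region of Interest by thresholding in HSV.
    The bounds below are illustrative assumptions, not tuned values."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([0, 40, 60], dtype=np.uint8)    # assumed skin-tone range
    upper = np.array([25, 180, 255], dtype=np.uint8)
    mask = cv2.inRange(hsv, lower, upper)
    # Morphological opening/closing suppress background noise in the mask.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return cv2.bitwise_and(frame_bgr, frame_bgr, mask=mask), mask
```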

  • iii. Hand detection and tracking

Hand tracking is an important phase in gesture recognition and can be achieved through a number of algorithms. These algorithms rely on cues such as colour, template matching and motion in order to track the hand. They include Kalman filtering, particle filtering, optical flow, CAMShift, Viola-Jones and Mean Shift, among others [3, 15].

When using skin colour-based methods in the tracking phase, the skin colour may vary from one person to another, posing a major constraint. Hence the Hue-Saturation-Value (HSV) and YCbCr (luma, blue-difference and red-difference chroma) colour models are used, because they separate luminance from chrominance components and therefore give better results than other models.
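As a hedged illustration of colour-based tracking, the sketch below applies OpenCV's CAMShift to a hue-only histogram, so that luminance is excluded as argued above; the initial hand window is assumed to come from a prior detection or segmentation step.

```python
import cv2

# Illustrative CAMShift tracking sketch. `roi` = (x, y, w, h) is an initial
# hand window assumed to come from a prior detection/segmentation step.
def track_hand(cap, roi):
    x, y, w, h = roi
    ok, frame = cap.read()
    if not ok:
        return
    hsv_roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    # Histogram over hue only: luminance is deliberately excluded, in line
    # with the HSV/YCbCr argument above.
    hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    window = (x, y, w, h)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back_proj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        rect, window = cv2.CamShift(back_proj, window, term)
        yield rect  # rotated rectangle enclosing the tracked hand
```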

  • iv. Hand gesture classification and recognition

Classification of the gesture is also viewed as the point of recognition, because it is the last step of a hand gesture recognition system. This phase involves matching the features of the current gesture against stored features. Classification algorithms play an important role in the gesture recognition system, as they determine the accuracy of recognition. The speed of the classification algorithm is also important, especially for real-time systems where speed is of the essence [9]. Many algorithms can be applied in this phase. They can be categorized as mathematical model-based algorithms, such as the Hidden Markov Model (HMM) and Finite State Machine (FSM), or as soft computing algorithms, such as neural networks [3].
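For illustration, the sketch below trains a small neural network, one of the soft computing options mentioned above, on placeholder feature vectors; the feature dimension (7), the five gesture classes and the random data are assumptions standing in for real extracted features.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Illustrative sketch: classify gesture feature vectors with a small
# neural network. X (features) and y (gesture labels) would normally come
# from the feature extraction phase; the random data is a placeholder.
rng = np.random.default_rng(0)
X = rng.random((200, 7))          # e.g. 7 shape features per gesture
y = rng.integers(0, 5, size=200)  # 5 hypothetical gesture classes

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```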

5 Result

Constraints as identified by different authors are summarized in Table 2.

Table 2. Constraints as identified by different authors

The constraints are arranged by the phase in which they occur.

  • (a) Constraints associated with image acquisition

Image acquisition is the first step in vision-based sign language hand gesture recognition. It is done via a camera attached to the system or independent from it. Table 3 illustrates the constraints associated with image acquisition.

Table 3. Constraints associated with image acquisition [5]
  • (b) Constraints in the hand tracking and segmentation phase of a vision-based sign language hand gesture recognition system

The main constraint in hand tracking is brought about by the ability of the hand to move in different directions owing to its 27 degrees of freedom. Most researchers refer to this constraint as rotation. Other constraints in this phase include variation in the speed of hand gestures [3], variation in skin colour, illumination variation, background complexity, and occlusion. Table 4 below outlines the constraints associated with tracking and segmentation of hand gestures.

Table 4. Constraints associated with tracking and segmentation [5]
  • (c) Constraints in the feature extraction phase of a vision-based sign language hand gesture recognition system

The most notable constraints in this phase include rotation, scale and translation. The rotation constraint arises when the hand region is rotated in any direction in the scene. The scale constraint arises because of the different sizes of people's hands making the gestures. The translation problem is the variation of hand positions in different images, which leads to erroneous representation of the features [19]. Table 5 indicates the constraints that can be encountered in the feature extraction phase of a vision-based sign language hand gesture recognition system; a sketch of one common mitigation follows the table.

Table 5. Constraints in the feature extraction phase [5]
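As one widely used mitigation for the rotation, scale and translation constraints above, the sketch below computes Hu moments of a segmented hand mask; the seven values are (approximately) invariant to rotation, scaling and translation, so the same gesture yields similar features regardless of hand size or position. This is an illustrative choice, not the method of any particular reviewed paper.

```python
import cv2
import numpy as np

def hu_features(mask):
    """Hu moments of a binary hand mask: a classic feature set that is
    (approximately) invariant to rotation, scale and translation."""
    moments = cv2.moments(mask, binaryImage=True)
    hu = cv2.HuMoments(moments).flatten()
    # Log-scale the values, as they span many orders of magnitude.
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
```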
  • (d) Constraints in the classification and recognition phase of a vision-based sign language hand gesture recognition system

An appropriate classifier identifies gesture features and categorizes them either into predefined classes (supervised) or by their similarity (unsupervised) [20]. Limitations encountered in this phase include the large data sets required to train some classifiers, computational complexity, selection of optimum parameters and recognition of unknown gestures. Table 6 below outlines the constraints likely to be encountered in the classification phase.

Table 6. Constraints in the classification phase [5]

The constraints can also be categorized by cause. The three causes are the hand itself, the system and equipment in use, and environmental factors, as indicated in Fig. 1.

Fig. 1. Pictorial representation of gesture challenges or constraints [14]

6 Business Benefits

This study highlights the constraints encountered in vision-based system implementation in a logical way, since the constraints are presented in the phases in which they are most likely to occur. This will help researchers and gesture recognition system developers to easily identify the constraints they want to address using new algorithms or combinations of existing ones. This can be beneficial in many hand gesture recognition application areas such as robot control, game applications and sign language recognition, amongst others.

The results of this study can provide a basis for a better sign language hand gesture recognition system capable of full sign language interpretation. Sign language interpretation systems are beneficial for communication, because they assist hearing-impaired individuals to understand those without hearing impairments, and vice versa. Vision-based sign language interpretation systems enable communication in a natural way without the need for a human interpreter, hence they are likely to be more cost-efficient. Vision-based gesture recognition interpretation systems can be deployed as software applications on mobile phones, computers, laptops and tablets. This can facilitate communication for hearing-impaired individuals in public facilities such as banks, airports, churches and schools.

7 Conclusion

In this paper, the phases of a typical vision-based sign language hand gesture recognition system are identified, and the constraints that can be encountered in each phase are outlined. It is evident from the literature that the challenges begin right from the first phase, image acquisition, where camera resolution and quality can affect the gesture recognition rate. Background noise and lighting also pose serious constraints.

These constraints, coupled with many others mentioned in this paper, have resulted in the development of many algorithms. Each of these algorithms has its strengths and weaknesses; hence the choice of algorithm for a sign language application may vary from one researcher to another. Further work is needed to find better solutions to overcome these constraints.