
1 Introduction

With the development of the automobile and transportation industries, traffic accidents have caused great loss of property and harm to society. More than 20% of these accidents are caused by fatigue driving. Safe driving has become a hot issue in today’s society. It is therefore of great significance to develop a real-time, accurate fatigue detection system that sends a warning when the driver is tired, which can effectively reduce the occurrence of traffic accidents.

At present, fatigue detection follows three main directions. The first is based on the vehicle state: it uses signals such as the turning angle and the driving speed to decide whether the driver is fatigued. This method is susceptible to external interference, which greatly affects its detection accuracy. The second is based on the driver’s physiological information [7]: it measures the driver’s heart rate, pulse and other physiological signals to determine whether the driver is in a state of fatigue. This method requires the driver to wear a lot of measuring equipment, which is cumbersome and interferes with driving. The third is based on computer vision [6, 8,9,10]: it is non-intrusive, and fatigue-related facial features, such as eye closure duration and yawning, can be computed by analyzing changes in facial expression.

In fatigue detection, driver face detection and alignment are important. Joint face detection and alignment using multitask cascaded convolutional networks [1] has proven to be an effective method. Another very important step is the detection of the eye state. Compared with the traditional active infrared radiation method [2], imaging with a normal camera is a safer, passive approach. There are many methods for detecting the eye state, such as the AdaBoost classifier [3] and the SVM classifier [4]. However, their ability to express features is limited. Recently, convolutional neural networks (CNNs) have achieved remarkable progress in a variety of computer vision tasks. In this paper, we design a driver fatigue detection system using multitask cascaded convolutional networks. As shown in Fig. 1, the method consists of five parts: joint face detection and alignment using multitask cascaded convolutional networks, normalization of the current image and ground truth shape according to the scaled mean shape, extraction of the eye area, eye state recognition, and fatigue detection.

Fig. 1. Algorithm block diagram.

2 Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks

A fatigue detection system should have high recognition accuracy and detect fatigue effectively in real time. The main difficulties are detecting the driver’s face and aligning the eyes quickly and accurately, and overcoming the influence of varying illumination. Zhang et al. [1] propose a cascaded CNN-based framework for joint face detection and alignment, with a carefully designed lightweight CNN architecture for real-time performance. The overall pipeline is shown in Fig. 2. The input image is first resized to different scales to build an image pyramid, which is the input of the following three-stage cascaded framework.

Fig. 2. Joint face detection and alignment using multitask cascaded convolutional networks.

Stage 1: A fully convolutional network, called the proposal network (P-Net), is used to obtain the candidate facial windows and their bounding box regression vectors. The candidates are then calibrated based on the estimated bounding box regression vectors. After that, non-maximum suppression (NMS) is employed to merge highly overlapped candidates.

Stage 2: All candidates are fed to another CNN, called the refine network (R-Net), which further rejects a large number of false candidates, performs calibration with bounding box regression, and conducts NMS.

Stage 3: This stage is similar to the second stage, but here the aim is to identify face regions with more supervision. In particular, the network outputs the positions of five facial landmarks.
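
The NMS step that closes Stages 1 and 2 can be sketched as follows. This is a minimal greedy variant for illustration, not the implementation of [1]: the function name `nms`, the `[x1, y1, x2, y2]` box layout and the 0.5 IoU threshold are assumptions. Within each group of highly overlapped candidates, it keeps only the one with the highest score.

```python
import numpy as np


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    boxes:  (N, 4) array of [x1, y1, x2, y2] (layout is an assumption).
    scores: (N,) face confidence of each candidate.
    Returns the indices of the kept boxes, highest score first.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(scores)[::-1]  # candidate indices, best first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the best box with every remaining box.
        ix1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        iy1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        ix2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        iy2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # Drop candidates that overlap the best box too much.
        order = rest[iou <= iou_threshold]
    return keep
```

In a cascade of this kind, the threshold trades recall against the number of candidates passed on to the next stage, so it would normally be tuned per stage.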

3 Extraction of the Eye Area

3.1 Face Normalization

To extract the eye areas accurately, we first compute the average face. The current image and ground truth shape are then normalized according to the scaled mean shape by a 2D affine transformation, which changes the rotation angle, the scale, and the location of a shape. The transformation can be represented as Eq. (1).

$$ \left\{ {\begin{array}{*{20}l} {x_{i} = ax_{i}^{{\prime }} + by_{i}^{{\prime }} + c} \hfill \\ {y_{i} = dx_{i}^{{\prime }} + ey_{i}^{{\prime }} + f} \hfill \\ \end{array} } \right. $$
(1)

where \( (x_{i} ,y_{i} )^{T} \) is the coordinate of the ith feature point on the average face and \( (x_{i}^{{\prime }} ,y_{i}^{{\prime }} )^{T} \) is the coordinate of the ith feature point on the detected face. In matrix form, the transformation is given by Eq. (2).

$$ \left[ {\begin{array}{*{20}c} {x_{i} } \\ {y_{i} } \\ 1 \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} a & b & c \\ d & e & f \\ 0 & 0 & 1 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {x_{i}^{{\prime }} } \\ {y_{i}^{{\prime }} } \\ 1 \\ \end{array} } \right] = M\left[ {\begin{array}{*{20}c} {x_{i}^{{\prime }} } \\ {y_{i}^{{\prime }} } \\ 1 \\ \end{array} } \right] $$
(2)

For convenience, Eq. (2) can be rewritten as Eq. (3).

$$ U = Kh $$
(3)

where U is the feature point matrix of the average face, K is the feature point matrix of the detected face, and h is the affine transformation matrix. h can be computed as a least squares solution, given by Eq. (4).

$$ h = \left( {K^{T} K} \right)^{ - 1} K^{T} U $$
(4)

The current image and ground truth shape are then normalized according to the scaled mean shape, which changes the rotation angle, the scale, and the location of the detected face, as shown in Fig. 3.

Fig. 3. Normalization of the current image and ground truth shape according to the scaled mean shape.
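
As an illustration of Eqs. (3) and (4), the sketch below estimates h from landmark pairs with NumPy. It assumes a layout the text does not spell out: each row of K is \( (x_{i}^{\prime}, y_{i}^{\prime}, 1) \), each row of U is \( (x_{i}, y_{i}) \), so h is a 3 × 2 matrix whose columns hold (a, b, c) and (d, e, f). The function names are hypothetical. `numpy.linalg.lstsq` returns the same solution as Eq. (4) when K has full column rank, without forming the inverse explicitly.

```python
import numpy as np


def estimate_affine(detected, mean_shape):
    """Least squares affine fit mapping detected landmarks to the mean shape.

    detected:   (N, 2) landmarks (x', y') on the detected face.
    mean_shape: (N, 2) landmarks (x, y) on the average face.
    Returns h with shape (3, 2); row layout of K and U is an assumption.
    """
    detected = np.asarray(detected, dtype=float)
    U = np.asarray(mean_shape, dtype=float)
    K = np.hstack([detected, np.ones((len(detected), 1))])  # rows (x', y', 1)
    h, *_ = np.linalg.lstsq(K, U, rcond=None)  # solves min ||K h - U||
    return h


def apply_affine(points, h):
    """Map (N, 2) points through the affine transform h."""
    points = np.asarray(points, dtype=float)
    K = np.hstack([points, np.ones((len(points), 1))])
    return K @ h
```

With the five landmarks of Stage 3 there are ten equations for six unknowns, so the system is overdetermined and the least squares fit averages out small landmark errors.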

3.2 Eye Area Extraction

In this paper, we extract the eye areas based on the facial landmarks after normalization, as shown in Fig. 4. Each eye area has a size of 32 × 32 pixels.

Fig. 4. The extraction result of the eye area.
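
One plausible way to cut a 32 × 32 eye patch from the normalized face is sketched below. The text does not say how the patch is positioned relative to the landmark, so centering it on the eye landmark and shifting it inward at the image border are assumptions, as is the function name `crop_eye`.

```python
import numpy as np


def crop_eye(image, eye_center, size=32):
    """Crop a size x size patch centered on an eye landmark.

    image:      (H, W) or (H, W, C) array of the normalized face.
    eye_center: (x, y) eye landmark in pixel coordinates.
    Assumption: the patch is centered on the landmark and shifted inward
    when it would cross the image border.
    """
    height, width = image.shape[:2]
    half = size // 2
    cx, cy = int(round(eye_center[0])), int(round(eye_center[1]))
    x0 = int(np.clip(cx - half, 0, max(width - size, 0)))
    y0 = int(np.clip(cy - half, 0, max(height - size, 0)))
    return image[y0:y0 + size, x0:x0 + size]
```

Because the face has already been normalized to the mean shape, a fixed patch size covers roughly the same eye region for every driver.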

4 Eye State Recognition

A CNN expresses features better and avoids manual feature selection, so we use a convolutional neural network to recognize the eye state.

4.1 Convolutional Neural Network

To achieve high eye state recognition accuracy while detecting fatigue in real time, the proposed network uses three convolutional layers, as shown in Fig. 5. Each convolutional layer is followed by a pooling layer: the first by max pooling and the last two by average pooling. The ReLU layers add non-linearity and the Dropout layers prevent overfitting.

Fig. 5. Structure of the convolutional neural network.
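
The difference between the two pooling types used in the network can be illustrated on a single feature map. The sketch below is not the network of Fig. 5: the 2 × 2 non-overlapping window and the function name `pool2d` are assumptions, since the pooling sizes are not stated in the text.

```python
import numpy as np


def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping 2D pooling over a single-channel feature map.

    The window size is an assumption; rows and columns that do not fill
    a complete window are dropped.
    """
    height, width = feature_map.shape
    height, width = height - height % size, width - width % size
    # View the map as a grid of (size x size) blocks.
    blocks = feature_map[:height, :width].reshape(
        height // size, size, width // size, size
    )
    if mode == "max":
        return blocks.max(axis=(1, 3))   # strongest response per block
    if mode == "avg":
        return blocks.mean(axis=(1, 3))  # mean response per block
    raise ValueError("mode must be 'max' or 'avg'")
```

Max pooling keeps the strongest local response, while average pooling smooths the responses within each window.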

4.2 Activation Functions

The sigmoid and tanh functions are commonly used non-linear activation functions, but they suffer from vanishing gradients. We therefore use the ReLU (rectified linear unit) function, defined as Eq. (5).

$$ f(x) = \left\{ {\begin{array}{*{20}l} x \hfill & {if\,x \ge 0} \hfill \\ 0 \hfill & {if\,x < 0} \hfill \\ \end{array} } \right. $$
(5)

ReLU effectively alleviates the vanishing gradient problem, so that a deep neural network can be trained directly in a supervised manner. Because of its one-sided suppression, the network also produces sparse representations after the ReLU function.
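
The contrast with the sigmoid function can be checked numerically. The sketch below implements Eq. (5) and compares the two gradients; the function names are hypothetical, and taking the ReLU derivative at x = 0 to be 1 is a convention chosen here to match the \( x \ge 0 \) branch of Eq. (5).

```python
import numpy as np


def relu(x):
    """Eq. (5): x for x >= 0, otherwise 0."""
    return np.maximum(x, 0.0)


def relu_grad(x):
    """Derivative of ReLU; the value 1 at x = 0 is a convention."""
    return (np.asarray(x) >= 0).astype(float)


def sigmoid_grad(x):
    """Derivative of the sigmoid, s(x) * (1 - s(x)); at most 0.25."""
    s = 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))
    return s * (1.0 - s)
```

The sigmoid gradient never exceeds 0.25 and decays towards zero for large inputs, so repeated multiplication across layers shrinks it, whereas the ReLU gradient stays at 1 for every active unit. Negative inputs are mapped to exactly zero, which is the source of the sparse representations mentioned above.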

5 Fatigue Detection Based on PERCLOS

Once the eye state has been recognized, the next step is to detect driver fatigue based on PERCLOS (percentage of eyelid closure over the pupil over time). PERCLOS is an established parameter for detecting the level of drowsiness [5], which is judged against a PERCLOS threshold. It is calculated as Eq. (6).

$$ f_{PERCLOS} = \frac{{n_{close} }}{{N_{total} }} \times 100\% $$
(6)

Here \( n_{close} \) is the number of closed-eye frames in a period of time and \( N_{total} \) is the total number of frames in that period. When the driver is in a state of fatigue, the driver’s PERCLOS value is higher than normal. We set a PERCLOS threshold; when the driver’s PERCLOS value exceeds this threshold, the driver is considered fatigued.
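
Eq. (6) can be evaluated on a video stream with a sliding window of per-frame eye states, as in the sketch below. The class name `PerclosMonitor` and the window length of 150 frames are assumptions; the default threshold of 0.30 is the value reported in Sect. 6.3. PERCLOS is returned as a fraction in [0, 1] rather than a percentage.

```python
from collections import deque


class PerclosMonitor:
    """Sliding-window PERCLOS over per-frame eye states."""

    def __init__(self, window_size=150, threshold=0.30):
        # window_size (frames) is an assumption; threshold follows Sect. 6.3.
        self.frames = deque(maxlen=window_size)
        self.threshold = threshold

    def update(self, eye_closed):
        """Add one frame (True if the eyes are closed) and return PERCLOS."""
        self.frames.append(bool(eye_closed))
        return self.perclos()

    def perclos(self):
        """n_close / N_total over the frames currently in the window."""
        if not self.frames:
            return 0.0
        return sum(self.frames) / len(self.frames)

    def is_fatigued(self):
        """True once the window is full and PERCLOS exceeds the threshold."""
        window_full = len(self.frames) == self.frames.maxlen
        return window_full and self.perclos() > self.threshold
```

Waiting for a full window before raising a warning is a design choice made here so that a few closed-eye frames right after start-up do not trigger a false alarm.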

6 Experiment and Results

The system was implemented in VS2013, running on a Win7 system with an Intel(R) Core(TM) i7-6700HQ CPU (3.40 GHz), 32 GB memory, and an NVIDIA GeForce GTX 1070 GPU.

6.1 Training

To overcome the influence of illumination on the images, the training data must cover different light intensities, which enhances the robustness of the network, as shown in Fig. 6.

Fig. 6. Some of the training samples under different light intensities.

Since we perform eye state recognition, we use the following two kinds of data annotation in the training process:

  1. Negatives: 36 × 36 sample regions randomly cropped near the eye area whose intersection-over-union (IoU) ratio with every ground-truth eye is less than 0.4, as shown in Fig. 7.

    Fig. 7. Negative training samples.

  2. Positives: positive samples are divided into two types, open-eye samples and closed-eye samples, whose IoU with a ground-truth eye is above 0.6, as shown in Fig. 8.

    Fig. 8. Positive training samples.
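
The two annotation rules can be expressed with a small IoU helper. This is a sketch under stated assumptions: boxes are given as `(x1, y1, x2, y2)`, the function names are hypothetical, and candidates whose best IoU falls between 0.4 and 0.6 are simply discarded, which the text does not state explicitly.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def label_sample(candidate, ground_truth_eyes, neg_thresh=0.4, pos_thresh=0.6):
    """Label a cropped region as 'negative', 'positive', or None (unused).

    Thresholds follow the text; dropping the in-between band is an assumption.
    """
    best = max((iou(candidate, gt) for gt in ground_truth_eyes), default=0.0)
    if best < neg_thresh:
        return "negative"
    if best > pos_thresh:
        return "positive"
    return None
```

A positive sample would still need its open or closed label from the annotation of the frame it was cropped from.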

6.2 Training Results

We select eye images, both open and closed, as positive samples, and randomly crop several patches to collect negative samples. In total, 120000 images are used as training samples. The eye state recognition rate increases with the number of training iterations, as shown in Fig. 9.

Fig. 9. The recognition rate during training.

As the number of iterations increases, the accuracy gradually rises and finally fluctuates between 0.995 and 0.996. To test the performance of the network, we collected three video sequences; the accuracy for each is shown in Table 1.

Table 1. The test result of eye state.

Over 5 test videos comprising 1239 frames of 320 × 240 images, we computed the average time consumed by each module and by the whole method. Table 2 shows the results. The method meets the real-time requirement.

Table 2. The test of time consuming.

6.3 Fatigue Detection Based on PERCLOS

When the driver is in a state of fatigue, the driver’s PERCLOS value is higher than normal. By setting a PERCLOS threshold, the driver is considered fatigued when the PERCLOS value exceeds it. In this paper, the PERCLOS threshold is set to 0.30: when the driver is fatigued, the PERCLOS value is greater than 0.30. Fig. 10 shows the PERCLOS result.

Fig. 10. The PERCLOS value.

Figure 11 shows sample images of the detection results.

Fig. 11. Driver fatigue detection system.

7 Conclusion

In this paper, we propose a driver fatigue detection system. The system uses multitask cascaded convolutional networks for face detection and alignment, and then another convolutional neural network (CNN) for eye state recognition. Finally, we calculate the percentage of eyelid closure (PERCLOS) to detect fatigue. The eye state recognition method provides high accuracy and can detect fatigue effectively in real time. Tests show that the system implementation is successful and that the system infers fatigue reliably.