1 Introduction

Minimum flight crew workload [13] is identified as one of the key difficulties in the airworthiness certification of civil transport category aircraft; it is a multi-disciplinary problem that includes human factors. Pilot action analysis is one of the most important aspects of studying minimum flight crew workload [4]. Workload is divided into ten factors, nine of which are related to pilot actions: (1) Accessibility, ease and simplicity of operation; (2) Accessibility and conspicuity of necessary instruments; (3) Number, urgency and complexity of operating procedures; (4) Degree and duration of mental and physical effort; (5) Actions requiring a crewmember to be unavailable at his duty station; (6) Degree of automation provided in the aircraft systems to manage failures; (7) Communications and navigation workload; (8) Increased workload associated with any emergency; (9) Incapacitation of a flight crewmember. For each factor, certain information must be determined, such as the task, the distribution of action areas, action duration, the path of the action, the operating procedures, and so on. In this paper, we extract pilot action features from video data, use these features to analyze pilot action patterns, and then apply the results to workload assessment.

Action is complicated and involves the problem of multiple scales, which manifest mainly in time and space [5]. Different scales indicate different action patterns. We regard the cockpit as an intelligent environment and analyze action patterns with the help of this environment. At a general level, an intelligent environment can be decomposed into three main components [6]. The first is the core sensing technology, which records interactions with the environment; these may take the form of, for example, video, contact sensors, or motion sensors. The second is a data processing module, which infers decisions from the information gleaned by the sensing technology. The third and final component provides feedback to those within the environment via a suite of multi-modal interfaces. This paper focuses specifically on the data processing module, and in particular on the notion of activity recognition. Within the domain of intelligent environments, activity recognition can be viewed as the critical path toward a truly automated environment: it is tasked with extracting and establishing meaningful activities from a myriad of sensor activations. Although work in this area is still emerging, the initial results have been impressive.

2 Activity Recognition

Activity recognition is the process whereby an actor's behavior and his/her situated environment are monitored and analyzed to infer the ongoing activities [7]. It comprises many different tasks, namely activity modeling, behavior and environment monitoring, data processing, and pattern recognition. To perform activity recognition, it is therefore necessary to (see the pipeline sketch after this list):

1. Create computational activity models in a way that allows software systems/agents to conduct reasoning and manipulation.
2. Monitor and capture a user's behavior along with the state change of the environment.
3. Process perceived information through aggregation and fusion to generate a high-level abstraction of context or situation.
4. Decide which activity recognition algorithm to use, and finally
5. Carry out pattern recognition to determine the performed activity.
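To make these steps concrete, the skeleton below sketches how they might map onto code. Every class and method name here is hypothetical and merely mirrors the five steps above; it does not reflect any implementation from [7].

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """One raw reading from a sensor in the environment (hypothetical)."""
    timestamp: float
    sensor_id: str
    value: object

class ActivityRecognizer:
    """Illustrative skeleton mirroring the five steps listed above."""

    def __init__(self, activity_models, classifier):
        self.models = activity_models   # step 1: computational activity models
        self.classifier = classifier    # step 4: the chosen recognition algorithm

    def monitor(self, sensors):
        # Step 2: capture user behavior and environment state changes.
        return [Observation(s.time(), s.id, s.read()) for s in sensors]

    def fuse(self, observations):
        # Step 3: aggregate/fuse raw readings into a high-level context.
        return {o.sensor_id: o.value for o in observations}

    def recognize(self, context):
        # Step 5: match the fused context against the activity models.
        return self.classifier.predict(context, self.models)
```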

Monitoring an actor's behavior along with changes in the environment is a critical task in activity recognition. This monitoring process is responsible for capturing the contextual information from which activity recognition systems infer an actor's activity [8]. Depending on how the monitoring is performed and on the type of data it produces, there are currently two main activity recognition approaches: vision-based activity recognition and sensor-based activity recognition.

Traditional tracking is applied in 2D video space; it provides motion information in the 2D image plane but cannot recover depth. We instead use a stereo vision-based model for multi-object detection and tracking in surveillance systems. Unlike most existing monocular camera-based systems, the stereo vision system in our model overcomes the problems of illumination variation, shadow interference, and object occlusion. In each frame, a sparse set of feature points is identified in the camera coordinate system and then projected onto the 2D ground plane [9]. A kernel-based clustering algorithm groups the projected points according to their height values and locations on the plane. From the resulting clusters, the number, position, and orientation of objects in the surveillance scene can be determined for online multi-object detection and tracking.
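To illustrate the grouping step, the following sketch clusters projected feature points by ground-plane location and height. We use scikit-learn's MeanShift as a stand-in for the kernel-based clustering of [9]; the coordinates and the bandwidth value are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Each row: (x_ground, y_ground, height) for one feature point already
# projected from the camera coordinate system onto the ground plane.
points = np.array([
    [0.10, 0.20, 1.65], [0.12, 0.22, 1.60], [0.11, 0.18, 1.70],  # object 1
    [2.05, 1.10, 1.75], [2.00, 1.12, 1.72],                      # object 2
])

# Group points whose plane location and height are close; each resulting
# cluster corresponds to one object in the scene.
ms = MeanShift(bandwidth=0.5).fit(points)
for label, center in enumerate(ms.cluster_centers_):
    members = points[ms.labels_ == label]
    print(f"object {label}: position=({center[0]:.2f}, {center[1]:.2f}), "
          f"n_points={len(members)}")
```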

To enable long-term tracking, the key problem is detecting the object whenever it appears in the camera's field of view. This problem is aggravated by the fact that the object may change its appearance, making the appearance from the initial frame irrelevant. In addition, a successful long-term tracker should handle scale and illumination changes, background clutter, and partial occlusions, and operate in real time. The TLD method [10] has proven to be an efficient way to track a target in video. TLD is a framework designed for long-term tracking of an unknown object in a video stream. Its components are characterized as follows. The tracker estimates the object's motion between consecutive frames under the assumption that the frame-to-frame motion is limited and the object is visible; it is likely to fail, and never recover, if the object moves out of the camera view. The detector treats every frame as independent and performs a full scan of the image to localize all appearances that have been observed and learned in the past; like any detector, it makes two types of errors: false positives and false negatives. The learning component observes the performance of both the tracker and the detector, estimates the detector's errors, and generates training examples to avoid those errors in the future; it assumes that both the tracker and the detector can fail. By virtue of the learning, the detector generalizes to more object appearances and discriminates against the background.
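As a concrete illustration, the sketch below runs a TLD tracker over a recorded video using the legacy TLD implementation shipped with OpenCV's contrib modules. The video filename and the interactive ROI selection are assumptions for illustration, not part of our system.

```python
import cv2

# Hypothetical recording of the flight scene; requires opencv-contrib-python,
# which provides the legacy TLD tracker.
cap = cv2.VideoCapture("flight_scene.avi")
ok, frame = cap.read()

tracker = cv2.legacy.TrackerTLD_create()
# Manually select the initial bounding box (e.g. the pilot's hand).
bbox = cv2.selectROI("init", frame, False)
tracker.init(frame, bbox)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, bbox = tracker.update(frame)  # tracker + detector + learning inside
    if found:
        x, y, w, h = map(int, bbox)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
```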

3 Experiment and Conclusion

In this paper, we study multiple scales from the perspective of spatial scale. We set up three scales: hand, arm, and body. We believe that certain connections exist between these scales. Our main work in this paper is to track the movement of the pilot during a flight task. The result will be further used to analyze pilot actions.

First, a vision-based pattern recognition method is used to locate moving targets in the video. Then a logic-based method is used to recognize actions. We also set up a series of experiments in dynamic flight simulators simulating real flight missions. Two cameras are used to monitor the flight scene. We use stereo calibration to obtain the intrinsic and extrinsic parameters. The TLD method is used to track pilot movement in each view, and from the two calibrated views we obtain the 3D coordinates of the moving target. The multiple scale analysis can then be applied based on the position of the moving target in 3D space.
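A minimal sketch of the last step follows: recovering the 3D position of a tracked point from the two calibrated views. The calibration values below are placeholders; in practice the intrinsic matrices and the rotation/translation between the cameras come from cv2.stereoCalibrate run on synchronized checkerboard images, which is not shown here.

```python
import numpy as np
import cv2

# Placeholder calibration results (assumed values for illustration).
K1 = K2 = np.array([[800.0,   0.0, 320.0],
                    [  0.0, 800.0, 240.0],
                    [  0.0,   0.0,   1.0]])
R = np.eye(3)                          # rotation of camera 2 w.r.t. camera 1
T = np.array([[-0.3], [0.0], [0.0]])   # assumed 30 cm horizontal baseline

# Projection matrices: camera 1 at the origin, camera 2 displaced by (R, T).
P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K2 @ np.hstack([R, T])

def to_3d(pt1, pt2):
    """Triangulate one matched pixel pair into camera-1 coordinates."""
    a = np.array(pt1, dtype=np.float64).reshape(2, 1)
    b = np.array(pt2, dtype=np.float64).reshape(2, 1)
    X = cv2.triangulatePoints(P1, P2, a, b)  # homogeneous 4x1 result
    return (X[:3] / X[3]).ravel()

# e.g. the TLD bounding-box centers of the pilot's hand in the two views:
print(to_3d((340.0, 260.0), (300.0, 260.0)))
```

With these assumed parameters, a 40-pixel disparity corresponds to a depth of about 6 m (Z = f·B/d = 800 × 0.3 / 40), which is how the tracked 2D positions become 3D coordinates for the scale analysis.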

The experimental results show that our approach is effective for pilot action recognition: we can track the pilot's movement in real time and obtain accurate positions in 3D space.