Fault Injection, Detection and Treatment in Simulated Autonomous Vehicles
In the last few years autonomous vehicles have been on the rise. This increase in popularity, led by new technological advancements and availability to the regular consumer, has put them in a position where safety must now be a top priority. With the objective of increasing the reliability and safety of these vehicles, fault detection and treatment modules for autonomous vehicles were developed for an existing multi-agent platform that coordinates them to perform high-level missions. Additionally, a fault injection tool was developed to facilitate the study of said modules, alongside a fault categorization system to help the treatment module select the best course of action. The results obtained show the potential of the developed work: all the faults injected during the tests were detected within a time frame short enough to treat them adequately.
Keywords: Autonomous vehicles, Unmanned aerial vehicles, Fault injection, Fault detection, Fault treatment, Simulation, Safety
1 Introduction
Autonomous vehicles (AVs) have received a lot of attention in recent years thanks to their ability to perform tasks in places that humans can’t reach or that are too dangerous. This increase in popularity drives the need to guarantee that these systems are safe to operate, both for operators and for the surrounding population. To assure safety of operation, AVs must be resilient to failures that create dangerous situations. Since an AV can’t rely on the judgement of a human, it must detect and handle faults internally. The simplest way to achieve this is through redundant systems that compare each other’s outputs and can take over in case of a failure. However, the disadvantages of this approach are exacerbated in small AVs, as they can’t always accommodate the additional weight and space. The alternative is to analyse the data generated by the vehicle’s sensors to detect fault-related patterns and alter its behaviour to handle the fault.
Because research with real vehicles can be cumbersome and expensive, the solution to this problem was developed inside a simulation platform capable of coordinating AVs to perform high-level missions, which uses FSX (Flight Simulator X) as the simulation engine. This research is a continuation of the development of this platform, which currently does not have a fault handling system, a crucial component when dealing with this kind of vehicle. While the platform and the concept of the project can be applied to any AV, it was primarily developed and tested for large fixed-wing UAVs (Unmanned Aerial Vehicles).
The goal of this project is to develop a fault diagnosis system and incorporate it into the platform. This system must be easy to use and cover the most common failures in UAVs. In the end, the vehicle should be able to detect and correct fault scenarios on its own, while minimizing the overhead in computational resources.
To achieve this objective, several new modules were built and integrated into the existing platform. The first is a fault injection tool that allows the user to control fault injections during missions. Then, two modules were added to the vehicle agent: one for fault detection and the other for treatment. Finally, tests of these modules were conducted to assess fault detection rates and times, as well as the quality of the treatment and the computational impact on the platform.
The rest of this article is structured as follows. Section 2 quickly reviews the state of the art and previous related work. Section 3 details the implementation process, starting with fault-related tests made to FSX, and the fault injection, detection and treatment modules. In Sect. 4, a description of the performed experiments is presented alongside the results, with their discussion presented in Sect. 5. Finally, Sect. 6 concludes the article and elaborates on future work.
2 State of the Art
In this section a literature review is presented in two parts. First a more general view on fault detection methods is given, before exploring some related work where these methods are applied to AVs.
2.1 Fault Detection Methods
There is a large amount of relevant literature on fault detection, which has been a serious research topic at least since the 1970s. Throughout the years, several surveys have been published which detail the advancements in fault diagnosis.
Usually, these surveys divide fault detection methods into categories to simplify their classification. Different authors propose different but similar classifications. The simplest one was proposed by Gertler, with methods divided into those that make use of a model and those that don’t. Miljković used three groups: data methods and signal models; process model-based methods; and knowledge-based methods. The first two groups are identical to Gertler’s, with a new group for the recently developed machine learning methods. Isermann’s classification is the most complete and detailed, with several groups that relate to each other. The methods studied here were labelled using Gertler’s approach.
Model-free methods, also called data-driven methods, use the input and output data from the system under diagnosis to search for fault patterns. These are usually less accurate than model-based methods but use fewer computational resources, as they don’t need to make model-related calculations. Model-based methods, on the other hand, use a model of the system in conjunction with a combination of inputs and outputs that depends on the method, resulting in more accurate detection at the cost of a computing performance penalty.
Table 1. Summary of reviewed Fault Detection Methods
Table 2. Summary of most relevant fault detection literature in AVs
Angular rate sensors
- Avg. detection rate: 84%
- False-positive rate: 10%
- Avg. detection time: 36 s
Model-based observer
- Avg. detection time: 0.55 s
- False negatives: 6
- False positives: 9
Model-based observer and change detection with Z-test
- Model-based TIC avg.: 0.143
- Change detection avg. detection time: 0.8 s
Clustering (K-means and EM)
- Detection rate: 96%
- False-positive rate: 1.5%
- Detection time: “almost instant”
2.2 Fault Detection in Autonomous Vehicles
The existing literature on implementing fault detection methods in autonomous vehicles covers real vehicles, simulated ones and, in some cases, both. This literature review discusses only the works that study UAVs and present significant results.
Cork et al. used data collected from nominal flights to train neural networks that predict the output of a specific sensor, and compared the predictions with the measured values. When a large difference between the two was detected, the system knew something was not right. For a data-driven system it obtained good results and could even train while being used.
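The residual check at the core of this kind of approach can be illustrated with a minimal sketch. It is not the method of Cork et al.: the predictor below is a plain function standing in for a trained network, and the function name `residual_fault_flags` and the threshold value are assumptions made for illustration only.

```python
from typing import Callable, List, Sequence


def residual_fault_flags(
    inputs: Sequence[float],
    measured: Sequence[float],
    predictor: Callable[[float], float],
    threshold: float,
) -> List[bool]:
    """Flag each sample whose |measured - predicted| exceeds the threshold."""
    return [abs(m - predictor(x)) > threshold for x, m in zip(inputs, measured)]


def predict(x: float) -> float:
    # Hypothetical stand-in for a trained network: the healthy sensor reads 2 * x.
    return 2.0 * x


# The third measurement deviates strongly from the prediction.
flags = residual_fault_flags([1.0, 2.0, 3.0], [2.1, 4.0, 9.0], predict, threshold=0.5)
print(flags)
```

In practice the threshold trades detection rate against false positives, which is consistent with the two rates being reported separately for this kind of work.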
While not as popular as fixed-wing UAVs, single-rotor UAVs also exist. One such UAV was used as a platform to create a model-based observer system that detects faults in positioning sensors. This work concluded that detection was possible but was more difficult for additive and multiplicative faults than for faults that made the sensor readings freeze.
Freeman et al. monitored the aileron actuators of a light UAV using two distinct approaches: change detection (data-driven) and observers (model-based). Both systems were tested with real flight data. It was found that the model-based approach was better at detecting faults, but it was also noted that the process of modelling the UAV was time-consuming. Meanwhile, the data-driven method was easier to implement and could also detect most of the faults.
As for fault detection systems that use a game/simulation engine like the one used in this project, only one such case was found. Purvis et al. used the open-source flight simulator FlightGear to create a system that could inject, detect and treat faults related to the pitot-static system of a simulated commercial airliner. Their solution used clustering methods to label the flight data as faulty or not faulty, with very good results.
Table 2 summarizes the results of this small literature review, showing for each work the type of faulty system, fault detection method and experimental results. As expected, model-based methods worked better than data-driven ones, with Clustering being the better method when no model is used.
3 Implementation
The implementation process was divided into three parts. First, a classification system for UAV faults was created. Next, a fault injection system was implemented in the multi-agent platform. Lastly, the agent responsible for controlling the vehicles was extended to include both fault detection and fault treatment modules.
3.1 Fault Classification System
Table 3. UAV Fault Classification Table
Reduced lift and speed
Return to airport and emergency landing
Complete loss of lift
Emergency landing/crash where possible
Loss of comms with ATC and potential flyaway
Return to airport and emergency landing with visual indication of communications fault
Control surfaces (single, free float)
Extra effort and care in controlling aircraft required
Return to airport and emergency landing
Control surfaces (single, stuck)
Difficulty in controlling aircraft
Return to airport and emergency landing
Control surfaces (multiple)
Total loss of control
None, remaining sensors should be able to compensate faulty one
Loss of spatial awareness
Return to airport if possible, emergency landing/crash where possible otherwise
Complete loss of spatial awareness
Complete loss of sensors, control surfaces and electrical propulsion
Prolonged landing distance
Abort landing and retry in longest runway, using all of the runway
3.2 Injecting Faults in Flight Simulator X
Table 4. Fault severity scale
Flight control impact
No or very subtle alterations in control; could easily reach landing site and have no problems touching down in the designated area
Significant alterations in control; can reach landing site but might have difficulty landing in the designated area
Very compromised control; difficulty in reaching landing site
Very limited or no control at all
Before a fault can be injected, it must first be described. The user can create several faults that affect any number of aircraft, either at a given time or under a number of special conditions. Each fault is defined by a set of variables that determine when it is triggered, when it ends, how strong its effect should be and what behaviour it should follow. Each fault contains a list of the vehicles it can affect and a list of the faults that can be injected into these vehicles, so different vehicles can be injected with different faults. The user can also define in great detail the conditions that trigger the fault, which can be based on the aircraft speed, altitude or location, elapsed time, weather conditions, ground surface type, etc.
The value of the fault determines how severe its impact is or, in the case of control surfaces, the position at which they should be kept for the duration of the fault. The user can also choose the time behaviour that governs the fault injection, which can be set to permanent, intermittent, transient or noise. To simulate drift-like faults, a ramping variable was also added that specifies how much time the fault should take to reach the desired strength.
To facilitate the creation and modification of the faults to be injected in a mission, a graphical interface was created that lets the user specify changes quickly and intuitively. Figure 1 shows an example of this interface during use.
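To make the description of a fault more concrete, the sketch below models it as a small data structure. This is an illustration only, not the platform's actual data model: the class and field names (`FaultSpec`, `start_s`, `ramp_s`) are hypothetical, the ramp is assumed to be linear, and only the ramping of a permanent fault is computed; the other time behaviours are listed but not simulated.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class TimeBehaviour(Enum):
    # The four time behaviours named in the text.
    PERMANENT = "permanent"
    INTERMITTENT = "intermittent"
    TRANSIENT = "transient"
    NOISE = "noise"


@dataclass
class FaultSpec:
    fault_type: str          # hypothetical label, e.g. "engine_all"
    vehicles: List[str]      # vehicles the fault can affect
    value: float             # target strength of the fault
    behaviour: TimeBehaviour = TimeBehaviour.PERMANENT
    start_s: float = 0.0     # trigger time (assumed time-based trigger)
    ramp_s: float = 0.0      # time taken to reach full strength

    def strength_at(self, t: float) -> float:
        """Strength at time t, assuming a linear ramp after the trigger."""
        if t < self.start_s:
            return 0.0
        if self.ramp_s <= 0.0:
            return self.value
        progress = min((t - self.start_s) / self.ramp_s, 1.0)
        return self.value * progress


spec = FaultSpec("engine_all", ["uav1"], value=0.8, start_s=10.0, ramp_s=20.0)
print(spec.strength_at(5.0), spec.strength_at(20.0), spec.strength_at(30.0))
```

With a ramp of zero the fault reaches full strength at the trigger time, which corresponds to an abrupt fault; a long ramp approximates the drift-like faults mentioned above.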
Engine faults can be injected into individual engines or into all engines. Due to the limitations of FSX, only the “all engines” fault can make use of the strength value, with single engine faults restricted to a toggle that sets the engine on or off. The brakes fault is another toggle-type fault, which affects the aircraft when it is trying to slow down after landing. The communications fault is handled entirely through the platform messaging system and effectively blocks all messages from reaching or leaving the affected vehicle.
3.3 Fault Detection and Treatment
The engine fault detector uses a combination of limit and trend checkers on the available engine variable: the propeller speed. The trend checker continuously analyses the rate of change of the propeller speed and triggers when this value exceeds a predefined threshold. For situations in which the engine thrust decreases slowly over a long period of time, also called ramping, a limit checker was also implemented that simply verifies whether the engine RPM is too low (below 300 RPM in this case). Both checkers only trigger if the aircraft’s current altitude is lower than the desired one, to avoid falsely detecting a fault when the aircraft is descending.
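The combination of the two checkers can be sketched as follows. This is a minimal illustration rather than the platform's code: the class name, the `max_drop_rate` threshold and its value are assumptions, and the trend checker is interpreted here as reacting to how fast the propeller speed drops. Only the 300 RPM limit and the altitude condition come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EngineFaultDetector:
    max_drop_rate: float = 200.0   # RPM per second; hypothetical threshold
    min_rpm: float = 300.0         # limit quoted in the text
    _prev_rpm: Optional[float] = None
    _prev_t: Optional[float] = None

    def update(self, t: float, rpm: float, altitude: float,
               desired_altitude: float) -> bool:
        """Return True if an engine fault is detected for this sample."""
        drop_rate = 0.0
        if self._prev_rpm is not None and t > self._prev_t:
            # Trend checker: how fast the propeller speed is falling.
            drop_rate = (self._prev_rpm - rpm) / (t - self._prev_t)
        self._prev_rpm, self._prev_t = rpm, t

        # Both checkers are disabled unless the aircraft is below the
        # desired altitude, so an intentional descent is not flagged.
        if altitude >= desired_altitude:
            return False
        # Limit checker catches slow ramps that the trend checker misses.
        return drop_rate > self.max_drop_rate or rpm < self.min_rpm


det = EngineFaultDetector()
readings = [(0.0, 2400.0), (1.0, 2380.0), (2.0, 1500.0)]
results = [det.update(t, rpm, altitude=900.0, desired_altitude=1000.0)
           for t, rpm in readings]
print(results)
```

The limit checker is what makes ramping faults detectable: a slow decay never exceeds the rate threshold, but it eventually crosses the RPM floor.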
Faults related to the brakes are detected with another trend checker. As soon as the aircraft touches down, the detector starts analysing the rate at which the aircraft slows. If the deceleration stays too weak for too long (acceleration above −2 m/s\(^2\) for over 5 s in this case), a brake failure is detected.
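A minimal sketch of this brake check is given below, using the thresholds quoted above. The class name and the use of ground speed samples to estimate acceleration are assumptions; the platform may obtain the deceleration differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BrakeFaultDetector:
    accel_limit: float = -2.0   # m/s^2, from the text
    hold_time: float = 5.0      # s, from the text
    _prev_speed: Optional[float] = None
    _prev_t: Optional[float] = None
    _weak_since: Optional[float] = None

    def update(self, t: float, ground_speed: float) -> bool:
        """Feed one ground-speed sample taken after touchdown."""
        prev_t, prev_speed = self._prev_t, self._prev_speed
        self._prev_t, self._prev_speed = t, ground_speed
        if prev_t is None or t <= prev_t:
            return False  # need two samples to estimate acceleration

        accel = (ground_speed - prev_speed) / (t - prev_t)
        if accel > self.accel_limit:
            # Braking is too weak; remember when the weak period began.
            if self._weak_since is None:
                self._weak_since = prev_t
            return t - self._weak_since > self.hold_time
        self._weak_since = None  # adequate braking resets the timer
        return False


# Failed brakes: the aircraft only loses 0.5 m/s every second.
failing = BrakeFaultDetector()
failing_out = [failing.update(float(t), 40.0 - 0.5 * t) for t in range(8)]

# Healthy brakes: the aircraft loses 3 m/s every second.
healthy = BrakeFaultDetector()
healthy_out = [healthy.update(float(t), 40.0 - 3.0 * t) for t in range(8)]
print(failing_out, healthy_out)
```

Requiring the weak deceleration to persist for the whole hold time keeps a single noisy sample from triggering a false detection.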
Once a fault is detected, the fault treatment module gives it a classification and follows the recommended action. In cases where several faults have been detected it will perform the action associated with the fault with the highest severity. In extreme cases, such as full engine failures, this module will track the aircraft return course to the airport and deduce if the aircraft has enough altitude to reach it. If this is not the case, a new landing site that the aircraft knows not to be populated is chosen to prevent crashing into a building or humans.
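The decision logic described above can be sketched in a few lines. This is an assumption-laden illustration, not the platform's implementation: the text does not say how the module deduces whether the airport is reachable, so a simple glide-ratio estimate is used here, and all names, the severity numbering and the glide ratio value are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DetectedFault:
    name: str
    severity: int   # higher means more severe (assumed numbering)
    action: str     # recommended action from the classification table


def select_action(faults: List[DetectedFault]) -> Optional[str]:
    """Pick the action of the most severe detected fault."""
    if not faults:
        return None
    return max(faults, key=lambda f: f.severity).action


def can_glide_to(altitude_m: float, distance_m: float, glide_ratio: float) -> bool:
    # Assumed model: distance covered without thrust = altitude * glide ratio.
    return altitude_m * glide_ratio >= distance_m


def choose_landing_site(
    altitude_m: float,
    airport_distance_m: float,
    unpopulated_sites: List[Tuple[str, float]],   # (name, distance in metres)
    glide_ratio: float = 10.0,                    # hypothetical value
) -> Optional[str]:
    """Prefer the airport; otherwise the closest reachable unpopulated site."""
    if can_glide_to(altitude_m, airport_distance_m, glide_ratio):
        return "airport"
    reachable = [(dist, name) for name, dist in unpopulated_sites
                 if can_glide_to(altitude_m, dist, glide_ratio)]
    return min(reachable)[1] if reachable else None


faults = [
    DetectedFault("brakes", 2, "retry landing on longest runway"),
    DetectedFault("engine_all", 4, "emergency landing"),
]
sites = [("field_a", 3000.0), ("field_b", 4500.0)]
print(select_action(faults), choose_landing_site(500.0, 8000.0, sites))
```

Keeping the reachability test separate from the action selection mirrors the modular split between classification and treatment described in this section.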
4 Experimental Setup and Results
This section is organized in two main parts: first, an explanation of the tests is given, followed by the presentation and analysis of the results.
4.1 Test Configuration and Scenarios
The tests of the developed work were conducted in the proximity of an airport previously modelled in detail in the platform. It was chosen because it has an interesting layout, with two long runways and one short runway. The aircraft model used in the simulation was the Beechcraft Baron 58. It was picked for its relatively small size and its engine configuration, as it is the smallest and lightest aircraft with a twin-propeller configuration. The small size makes it comparable to larger UAVs such as the United States Air Force Predator in terms of wingspan and weight, while the dual engine configuration allows for more flexibility when testing.
For every test the aircraft was given a simple mission to perform, as seen in Fig. 2, which includes taking off, making a right bank turn while ascending, holding altitude for a few miles, performing another right bank turn while now descending, approaching the smallest runway at the airport and finally landing. The different colours represent the different flight phases. The tests were all conducted with FSX running at a simulation rate of 4x to reduce test times.
The tests were separated into two phases: in the first phase only the fault detectors are active, and in the second phase both the fault detection and treatment modules are operational. This way, a benchmark of the outcomes of the faults can be recorded first and then compared with the outcomes when the same test is run with the fault treatment module enabled. Table 5 shows a summary of the tests with all the settings used.
Table 5. Tests to be performed on the fault detection module.
Tests #3, #4 and #5 cover single engine full failure in all 3 flight phases, while tests #6, #7 and #8 do the same but with 2 failing engines. Finally, tests #9 to #14 test the effects of different ramping values in the different flight stages. This effect will only be tested with engine faults since this is the only one that supports continuous analog injection.
Finally, the fault treatment module is enabled and tests #1, #2, #4 and #6 are run again, to test its ability to treat the faults in the expected way and to compare the outcomes with those of the previous untreated tests.
4.2 Fault Detection Test Results
In test #0 the fault detectors did not pick up any fault and as expected the aircraft performed the complete mission without problems.
Table 6. Results of intermittent communications fault. Columns: injection timestamp (s), pause timestamp (s), injection delta (s), detection timestamp (s), detection delta (s).
The results of the engine faults can be seen in Table 7. All failures were detected and no false positives were recorded. In general, failures that occurred during takeoff were the fastest ones to be detected, followed by the ones during cruising, the descending ones being the slowest overall. Regarding the outcome, the only tests where the aircraft was able to complete the test flight were the ones with single engine failure. In the others the aircraft slowly descended until it hit the ground, without first deploying the landing gear.
Table 7. Results of the various engine faults. Columns: injection timestamp (s), detection timestamp (s), detection delta (s); recorded outcome for three of the tests: aircraft able to complete test flight.
4.3 Fault Treatment Test Results
With the treatment module enabled, the outcomes of the tests should vary to accommodate the injected faults. Starting with test #0, no changes were detected to mission execution and again no faults were detected.
In test #1 the brake failure was correctly identified once more on landing, but this time the aircraft aborts it, again taking off and making the necessary manoeuvres to approach the longest runway in the airport and land, as suggested in the categorization system. Even with the brake failure, the aircraft was able to stop within the length of the runway. The influence of the treatment module in this test can be seen in Fig. 5.
The fault injected in test #4 was also detected just like in the first test. The aircraft started the emergency landing protocol immediately by redirecting to the closest runway available to land as depicted in Fig. 7. Compared to the first test, where the aircraft was able to finish the mission in a safe manner, diverting to the airport immediately decreases the chances of an accident in case the fault propagates to the other engine.
Finally, in test #6 the fault was correctly identified, and the same emergency landing protocol was activated as in test #4. However, this time with both engines producing no thrust, the aircraft had no way of making it back to the airport. This was quickly detected and as a consequence the aircraft landed in a close field it knew was uninhabited, as can be seen in Fig. 8. In a real-world scenario this behaviour has the potential to decrease the number of accidents involving bystanders and decrease the probability of losing the aircraft in a crash.
Performance benchmarks were also conducted to test the impact of the new modules on the platform. The tests measured the CPU (Central Processing Unit) load, the memory allocated and the CPU time for the platform in three scenarios: just the Control Panel open; the Control Panel and Vehicle Agent running without the detection module; and all the modules active. Table 8 displays the results.
Table 8. Resources used by the platform with different active modules (tests were performed on a laptop with an Intel Core i7-4710HQ processor @ 3.30 GHz)
Columns: max. CPU load (%), max. memory (MB), CPU time per minute (s)
Control Panel (CP): ~0
CP + Vehicle Agent (VA): ~0
CP + VA + Detection Module: ~0.7
5 Discussion
The achieved results are promising, with all faults detected and no false positives, which suggests that the current implementation is accurate and resilient to false triggers. Detection times, on the other hand, were good overall but not great. This was to be expected, since simple fault detection methods were used, and could be improved by adopting more advanced methods such as those used in the literature mentioned in Sect. 2. Despite the slower reaction time, detection was fast enough to allow the treatment module to intervene in a positive way in otherwise dangerous scenarios.
With some detection times below one second, this simple approach managed to match the detection times of other works that used model-based approaches, such as Freeman et al. and Heredia et al., but can’t keep up in more demanding scenarios. On another note, this solution achieved an average detection time similar to that of Cork et al. The work of Purvis et al. is the most similar to this one, as it also uses a flight simulator as a testbed and a data-driven method. Their use of clustering methods allowed for better reaction times with similar detection performance.
6 Conclusion and Future Work
A fault injection tool was successfully implemented in an existing simulation platform, alongside a fault categorization system. Both components proved useful in the development of a simple but capable fault detection and treatment system for the aircraft controller. The fault detection module performed above expectations, with good detection performance during testing and results comparable to those of the works mentioned above, while using much simpler detection methods. The fault detection times were generally good, with time-sensitive faults such as brake and engine faults being detected quickly enough for the fault treatment module to act. This module also performed well, determining the best action to take when a fault occurred and always putting the safety of bystanders first by taking the surroundings of the vehicle into consideration. All of this was achieved while keeping the CPU and memory loads minimal.
The developed work sets a solid base for continuing fault-related research on this platform. The fault injection tool in particular is very useful for this kind of research, as it helps create detailed fault scenarios for the detection and treatment algorithms. Although it was tested with only one aircraft, it can handle concurrent fault injection in teams of multiple vehicles. The implementation of all the stages of a fault diagnosis system with a modular architecture also facilitates the future development of new algorithms without having to redesign the system.
While the results were satisfactory, they could be improved in the future by increasing the number of failures to detect and by using different and/or more sophisticated data-driven methods that analyse more data. To do this, it would likely be necessary to base the platform on another similar but more advanced simulator that offers more data for AI-controlled vehicles and supports more fault injection options than FSX. Detection times could also be improved by using the mission details to know what should be normal and abnormal behaviour for the aircraft at a certain location or time.
More information available online at https://docs.microsoft.com/en-us/previous-versions/microsoft-esp/cc526983(v=msdn.10).
- 1. Belcastro, C.M., et al.: Preliminary risk assessment for small unmanned aircraft. In: Proceedings of the 17th AIAA Aviation Technology, Integration, and Operations Conference, June 2017, Denver, Colorado, USA (2017)
- 3. Cork, L.R., Walker, R., Dunn, S.: Fault detection, identification and accommodation techniques for unmanned airborne vehicle. In: Proceedings of the 11th Australian International Aerospace Congress (AIAC 2005), 14–17 March 2005, Melbourne, Australia (2005)
- 5. Garrido, D.: Fault injection, detection and handling in autonomous vehicles. Master’s thesis, Faculty of Engineering of the University of Porto (2019)
- 10. Miljković, D.: Fault detection methods: a literature survey. In: Proceedings of the 34th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2011), 23–27 May 2011, Opatija, Croatia, pp. 750–755 (2011)
- 13. Silva, D.C.: Cooperative multi-robot missions: development of a platform and a specification language. Ph.D. thesis, Faculty of Engineering, University of Porto (2011)