1 Introduction

The Internet of Things (IoT) revolution has led to an exponential increase in the number of physical things that are connected to the Internet [23]. IoT and other connected data technologies, i.e. cyber-physical systems (CPS), strive to enhance existing infrastructure and operational systems. CPS can bring a number of benefits to citizens, businesses and governments. However, securing these systems in a piecemeal fashion has proven a monumental challenge [5]. With the continuous implementation of time- and safety-critical applications of CPS, the security risks and costs of potential attacks will continue to grow. As core components of safety-critical infrastructures, CPS such as the smart grid have become an attractive target for cyberattacks. To illustrate, hackers sabotaged the control system of Ukraine’s electric grid, causing a power outage affecting about 230,000 people. Without effective security controls, attackers are potentially able to access a CPS, sometimes using Internet-connected devices as an entry gateway, and cause damage from long distances [17].

Cybersecurity has rapidly become an issue of major contemporary relevance in computer science. With cybersecurity incidents increasing every year, the deficiencies of the state of the art have only become more apparent [12, 30, 31]. It is important to note that whilst the trend does appear to be an increasing frequency and seriousness of cybersecurity incidents, it remains an area of study which is often misunderstood and highly controversial. In many cases, cybersecurity technologies, data and heuristic methods are proprietary and unavailable for academic observation, or for free use by those who are less economically able but whom the literature recognises as being the most vulnerable to cybersecurity incidents [6]. Thus, the long-term sustainability of day-to-day computing technologies is questionable. For example, Heartbleed [13], a vulnerability potentially affecting the whole Internet, could not have been avoided because even a small human error can have catastrophic consequences. In that case, it was a short programming error introduced many years earlier, but its effect was catastrophic. The likelihood of some influence on security leading to a breach is a matter of chance resulting from the methods used in computer security, primarily the heuristic nature of computer security software.

The majority of cybersecurity incident datasets are proprietary, often coupled with the added caveat that when vulnerabilities are discovered they are either kept secret or are only shared within acutely technical circles. Consequently, before and during the vulnerability discovery phase there is an opportunity presented for potentially silent zero-day exploitation by attackers who could be extremely capable.

It can therefore be argued that in reality, the focus is not on the nature and content of an exploit specifically, but on how it is able to weave between computer security structures to achieve the end goal of exploitation. Thus, the objective of this research is to understand the risks and niceties of the largest publicly available dataset of cybersecurity incidents, with the overall aim of identifying patterns of importance within the dataset, particularly ones that are CPS specific. This is achieved by studying the characteristics of the VERIS Community Database (VCDB) of cybersecurity incidents. VCDB is the most accepted open-source dataset for cybersecurity incidents in industry, academia and government [16]. It records a number of cybersecurity incident features, including attack mode, actor type, impact, victim type, timeline and prose summaries, often accompanied by a hard reference to a source. At the time of writing, the dataset contains about 7000 unique incidents gleaned from real-life data breaches over a period of 12 years [26]. The existing VCDB dataset has extensive fielding, which is sufficient to support an extensive critical analysis of the mechanisms of cybersecurity incidents. Other datasets are too specific and do not provide a temporal dimension to security incidents.

The general trend within cybersecurity is for each organisation to take a different stance by using a different analytics platform to record cybersecurity incidents and data breaches. Indeed, cybersecurity risk analytics has become a lucrative venture [15]. This is unhelpful because not only does the data relating to cybersecurity incidents fail to reach a central information repository from which other users can benefit, but the organisations themselves may also implement means and models which are neither particularly effective nor accurate.

The motivation for this research is to analyse in detail the largest and most comprehensive dataset available for cybersecurity incidents in order to understand the context of cybersecurity incidents in greater depth. The objective is to map, by Monte Carlo simulation, a range of possibilities for future attack modes. An added motivation is to create a model for a repeatable approach to the analysis of this dataset (VCDB) and other cybersecurity incident datasets, so that the approach can be reused in other research studies.

The rest of the paper is organised as follows: Sect. 2 gives the key threats to CPS and the motivation behind this research. Section 3 covers recent studies that attempt to analyse data breaches. Section 4 details our methodology for feature analysis and the Monte Carlo simulation. Section 5 details the results of risk modelling. In Sect. 6, the results are thoroughly analysed with respect to frequency, increasing rate and loss, the principles of least privilege, human error and criticality. Section 7 concludes the paper and identifies future work avenues.

2 CPS security threats and motivation

The cybersecurity risks in CPS have become a serious concern for security professionals. This is particularly true as CPS is increasingly deployed in critical infrastructure, manufacturing and everyday life such as building control, medical devices and smart grid. When a connected endpoint is breached, a backdoor into other parts of the network is created. Hence, the result of malicious attacks can have severe consequences on human lives, business productivity and national security. In the following, we highlight some of the key security threats to CPS; we refer interested readers to the following works [3, 5, 9, 20, 32] for an extensive treatment of the topic.

Industry is driven by functional requirements with little attention to security. The massive growth in the number of CPS connected devices enlarges the attack surface. Most of these devices have long lifespans. Many devices do not get enough security updates; some never get updated at all. This is partially due to the fact that CPS devices have been managed by operational technology teams rather than IT departments, which excludes them from the proactive and coordinated efforts that are deployed to secure enterprise systems.

CPS devices are at risk of being compromised for various reasons. Some attacks, often heuristic, could hijack connected devices and turn them into email servers for mass spam, enlist them in botnets for executing DDoS (Distributed Denial of Service) attacks or simply cause interruption to business processes. The motivation behind these attacks could be financial, e.g. tampering with a physical utility meter or injecting false data to misinform the production process and cause monetary loss. Many attacks are also motivated by political reasons, such as interrupting power supply as part of cyber warfare.

The limited computation, communication and processing resources of common CPS devices make the application of classical data encryption and secure communication protocols impractical. Hence, many CPS devices do not encrypt communication between devices and with the cloud servers. Secure communication protocols must be used to protect against unsafe communication. More recently, with the rising popularity of the zero trust security model, micro-segmentation is used to assign CPS devices to a separate network to create private communication that keeps the transmitted data secure.

One of the common CPS vulnerabilities is the use of default passwords and device misconfiguration. To this day, most devices are shipped with default passwords and settings. Attackers can use this knowledge to brute-force their way into these devices. Weak credentials put both the user and their business at risk of attack. After gaining access to CPS devices, attackers can establish remote sessions and use them to monitor the owners without their knowledge. Furthermore, attackers can use CPS devices, either through their IP addresses or built-in GPS chip, to find a user’s physical location. Installing a VPN to secure a CPS can keep the IP address private. As CPS connects the physical and virtual worlds, unsecured devices which are, for instance, part of a home security system can cause massive risks to personal and business safety when compromised. Examples of remote access attacks include hijacking self-driving vehicles and even demanding that owners pay a ransom before control of the vehicle, machine or medical device is returned.

In conclusion, the plethora of different CPS devices brings various vulnerabilities. New trends such as the use of artificial intelligence (AI) and automation introduce a new vector of security threats to CPS. In modern AI-enabled CPS, simple coding errors can bring down the entire infrastructure under control. The human factor plays a key role in securing CPS; this threat can be addressed through educating individuals using or managing a CPS. Education goes beyond technical knowledge of how to secure a system to cover awareness about the impact of CPS and security attacks, which could be the difference between having a secure network and a security breach. The scale and complexity of a CPS make it expensive and time-consuming to secure CPS infrastructure. Security threats for CPS are only expected to intensify as more targets become available. With the rise of Industry 4.0, machine phishing will become a serious concern to the smart manufacturing sector. Manufacturers and other CPS users will have to invest more in addressing the serious security vulnerabilities and threats. This paper sets out to identify imminent security threats to CPS and beyond. We believe that understanding the sources of threats is the first line of defence against such risks, as it helps organisations implement their information security strategies, make well-informed governance decisions and mitigate risks at an early stage.

3 Related work

There is a recognised sparsity of complete incident datasets for cybersecurity [19]. Comparisons of various datasets have been performed, but this has not included a detailed consideration of the underlying scientific and social mechanisms of cybersecurity incidents [14]. The demand for research conducting data breach analysis is very well recognised; nonetheless, there appears to be a general sparsity within the literature of such works, particularly more searching research [1].

There are very recent and rigorous research papers which analyse data security breaches [4]. The authors of [31] ascertain the existing technological capability to mitigate insider threats within computer security systems by way of a mixed-method systematic review. This research does not take into account the malicious insider. In [21], a systematic literature review and meta-analysis on artificial intelligence in penetration testing and vulnerability assessment is conducted.

Such research efforts do not investigate the wider temporal dimensions of these breaches but instead focus on common mechanisms. There is clearly a demand for research investigating wider perspectives. This motivates our research to understand the nature of security threats so that new technologies can be developed around potential further findings.

4 Feature analysis and Monte Carlo simulation methods

The approach in this study was complex and time-consuming owing to the need to manually sort data. Data was downloaded from the vz-risk/vcdb [26] repository in JSON format and the schema analysed using purpose-built software. A software package, “VerisDB Analyst”, was created to parse that JSON data and present it both in a web application and through a REST API over HTTP [29]. The analysis conducted by the application was performed through MapReduce functions [11], which appear in and are explained by [27]. VerisDB Analyst is backed by a MongoDB database. In this particular study, the software was tested against both MongoDB Atlas (the cloud offering of MongoDB) and a local installation. The primary reason for this was to assess connection and database drop-outs.

The VerisDB Analyst application contains the MapReduce function in Algorithm 1. That MapReduce algorithm was implemented in respect of each type of property [27]. The algorithm was slightly tailored to the properties in the VERIS schema [26] because the schema does not presently adhere to the JSON specification [7]: some keys and values are arbitrarily stored in sub-objects and property names. This posed challenges for parsing the data, because the data hierarchy was complicated by those problems. As a direct consequence, the JavaScript used to make VerisDB Analyst is not optimal. It could be argued that the data should have been pre-corrected, but this approach was not taken because VERIS is open-source and it was more appropriate to raise it as an issue with the Verizon RISK team [28].
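To illustrate the parsing difficulty described above, the following is a minimal Python sketch (not the JavaScript used in VerisDB Analyst) that flattens a nested record into dotted property paths, so that information carried in key names, such as the actor type, becomes visible in the path. The function name `flatten` and the sample record are hypothetical; the record is only loosely shaped like a VERIS incident.

```python
from typing import Any, Iterator, Tuple


def flatten(node: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, leaf_value) pairs from a nested JSON-like object."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten(value, path)
    elif isinstance(node, list):
        # Arrays are unwound: every element is reported under the same path.
        for item in node:
            yield from flatten(item, prefix)
    else:
        yield prefix, node


# Hypothetical record, loosely shaped like a VERIS incident (assumption).
incident = {
    "actor": {"external": {"variety": ["Activist"], "country": ["US"]}},
    "action": {"hacking": {"vector": ["Web application"]}},
    "timeline": {"incident": {"year": 2013}},
}

pairs = sorted(flatten(incident))
for path, value in pairs:
    print(path, "=", value)
```

In this sketch the actor type ("external") and the attack mode ("hacking") appear as path segments rather than as values, which is the kind of structure that a generic per-property summary has to be tailored for.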

Algorithm 1 MapReduce function used by VerisDB Analyst

The purpose of Algorithm 1 was to extract only the relevant data for the purposes of summarising the cybersecurity incident data. Only the overlying trends were important, not the structure or metadata of the database state itself (in this case a JSON tree). The algorithm was implemented in JavaScript ES7 and run on node.js 8.8.1. It returned computations in respect of whole objects summarised from the whole JSON database as stored on MongoDB within 2441 ms. The host was a commodity HP G6 laptop (Intel i3-4030, 8 GB RAM, running single-threaded) re-purposed as a server running the LXD container platform. The application used for statistical analysis was held in a container with MongoDB 3.4 installed locally. Given that the JSON database is 19 MB and was processed around 5 times through MongoDB aggregation queries, this is a satisfactory return time, though it could be greatly enhanced with some further optimisations. Present reducibility is 75%, with 19 MB reduced to 4.75 MB.
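The general idea of summarising one property across all incidents with a map step and a reduce step can be sketched as follows. This is an illustrative Python approximation under stated assumptions, not a reproduction of Algorithm 1: the function names (`map_property`, `reduce_counts`, `summarise`) and the sample incidents are hypothetical.

```python
from collections import defaultdict
from functools import reduce
from typing import Any, Dict, Iterable, List, Tuple


def map_property(incident: Dict[str, Any], path: str) -> List[Tuple[str, int]]:
    """Map step: emit (value, 1) for each value found at a dotted property path."""
    node: Any = incident
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return []  # this incident does not report the field
        node = node[key]
    values = node if isinstance(node, list) else [node]
    return [(str(value), 1) for value in values]


def reduce_counts(acc: Dict[str, int], pair: Tuple[str, int]) -> Dict[str, int]:
    """Reduce step: sum the emitted counts per distinct value."""
    key, count = pair
    acc[key] += count
    return acc


def summarise(incidents: Iterable[Dict[str, Any]], path: str) -> Dict[str, int]:
    """Run the map step over every incident, then fold the pairs into totals."""
    emitted = (pair for inc in incidents for pair in map_property(inc, path))
    return dict(reduce(reduce_counts, emitted, defaultdict(int)))


# Hypothetical incidents, not taken from VCDB.
sample = [
    {"actor": {"internal": {"motive": ["Financial"]}}},
    {"actor": {"internal": {"motive": ["Fun", "Financial"]}}},
    {"actor": {"external": {"motive": ["Ideology"]}}},
]
counts = summarise(sample, "actor.internal.motive")
print(counts)
```

Incidents that do not report the requested field emit nothing, which mirrors the observation that not every VCDB entry carries every field.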

Data was extracted using the VerisDB Analyst web application, which analyses the whole dataset on the fly. The match property uses the MongoDB 3.4 implementation of $match and contains a condition that documents must satisfy in order to be returned, for example whether a specified property condition is true. The group property is simply a string representation of a property to be aggregated and counted using a standardised query structure; it uses the MongoDB 3.4 implementation of $group. The sort property contains a sub-object with a single key-value pair: the key is the property to sort against, and the integer value indicates whether to sort in ascending or descending order. The unwind property flattens arrays in the specified property path in the JSON hierarchy.
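The four query properties described above correspond to standard MongoDB aggregation stages. A minimal Python sketch of how they might be assembled into one pipeline is given below. The helper `build_pipeline`, the stage ordering (match, unwind, group, sort) and the example field paths are assumptions made for illustration; the actual query structure used by VerisDB Analyst is not given in the text.

```python
from typing import Any, Dict, List, Optional


def build_pipeline(
    group: str,
    match: Optional[Dict[str, Any]] = None,
    unwind: Optional[str] = None,
    sort_descending: bool = True,
) -> List[Dict[str, Any]]:
    """Assemble a count-per-value aggregation pipeline for one property."""
    pipeline: List[Dict[str, Any]] = []
    if match:
        # Keep only the documents that satisfy the condition.
        pipeline.append({"$match": match})
    if unwind:
        # Produce one document per element of the array at this path.
        pipeline.append({"$unwind": f"${unwind}"})
    # Count the documents per distinct value of the grouped property.
    pipeline.append({"$group": {"_id": f"${group}", "count": {"$sum": 1}}})
    # Sort by count: -1 is descending, 1 is ascending.
    pipeline.append({"$sort": {"count": -1 if sort_descending else 1}})
    return pipeline


# Hypothetical query: external actor varieties among hacking incidents.
pipeline = build_pipeline(
    group="actor.external.variety",
    match={"action.hacking": {"$exists": True}},
    unwind="actor.external.variety",
)
print(pipeline)
```

A list of this form is what a MongoDB driver expects for an aggregation call, for example `Collection.aggregate(pipeline)` in pymongo. Unwinding before grouping matters, because otherwise a whole array would be counted as a single value.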

Fig. 1 The results of the Monte Carlo simulation

5 Risk assessment modelling results, summarised data fields

Figure 1 shows the results of the Monte Carlo simulation. The results were obtained for each type of query using the VCDB Dataset as at 20 October 2017. Data was obtained for actors, attack mode, the impact of any attacks and victim demographics. It should be noted that not all entries within the VCDB dataset report on all incident information. Thus, some incidents may have contained information on one field of interest but not on others, e.g. might report attack data but not victim data.

Any reference to banding within this section means a group of the highest figures, which when taken together in a group or “band”, have central tendency. The central tendency was not significant of itself but represents two caveats: (i) a collection of similar occurrences may show central tendency because multiple collection fields could refer to the same overall field (a global field) and (ii) the central tendency could reflect a probabilistic confidence interval owing to a characteristic of the dataset which may or may not have been inferential but where there are too many degrees of freedom to formally prove specific inference from data properties.
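The notion of banding can be made concrete with a small sketch. The text does not state a formal rule for forming bands, so the rule below (start a new band whenever a count falls below 0.8 times the previous count) is purely an illustrative assumption, and the function name `band` is hypothetical. The input uses the attack-mode counts reported for Table 2 later in this section.

```python
from typing import Dict, List, Tuple


def band(counts: Dict[str, int], ratio: float = 0.8) -> List[List[Tuple[str, int]]]:
    """Split counts, sorted in descending order, into bands of similar magnitude.

    A new band starts whenever a value falls below `ratio` times the
    previous value. The 0.8 default is an illustrative assumption.
    """
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    bands: List[List[Tuple[str, int]]] = []
    for name, value in ordered:
        if bands and value >= ratio * bands[-1][-1][1]:
            bands[-1].append((name, value))
        else:
            bands.append([(name, value)])
    return bands


# Attack-mode counts as reported for Table 2 in this section.
attack_modes = {"error": 2038, "hacking": 1927, "misuse": 1409,
                "physical": 1343, "malware": 545, "social": 451}
bands = band(attack_modes)
for index, members in enumerate(bands, start=1):
    print(index, members)
```

With this threshold the sketch yields three bands (error and hacking; misuse and physical; malware and social), which agrees with the primary, secondary and tertiary bandings described for the attack modes. A different threshold would give a different grouping, so the bands should be read as descriptive rather than inferential.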

Table 1 Actors data in the analysed VCDB dataset

There appeared to be extensive data on actors within the VCDB dataset. The data relating to actors appears in Table 1. The dataset showed remarkable features across all fields in respect of each actor type (internal, external or partner).

Fig. 2 Attack actor type distribution

Internal actors tended to be motivated financially (602) compared to the external and partner actor groups. This was followed by the motivations of fun (200), which also featured strongly, corporate espionage (63), convenience (56) and grudges (48). In a large number of incidents, the motivation was not known (757). The dataset reported that a large proportion of incidents were accidents (1703).

Figure 2 shows the actor type distribution. Typical actors were end-users (615) and system administrators (93) who featured in the strongest band. This was followed by executives (83), financiers (65), cashiers (57), managers (47) and developers (45) who were groups which lie in a similar band of prevalence.

Most external actors originated from the USA (219) and Russia (124), which were in the highest band. China (53), Pakistan (42), Great Britain (41) and Syria (38) form the second highest band. The United Arab Emirates (30), Turkey (24) and North Korea (20) form the lowest band. There were 2975 records concerning external actors where the geographic origin of the attacker was not known.

Table 2 Attack data in the analysed VCDB

The majority of actors were motivated financially (1480), by political or religious ideology or protest (367), espionage (266) and fun (211). In the 1242 incidents involving external attackers, the motive of the attacker was not known. External actors were primarily activists (468), independent attackers (311), state affiliated (209) or the attacks took place during the commission of wider organised criminal offences (198). There was a secondary band of similar prevalences, which consisted of former employees (57) and nation states (39). In 2538 incidents, the variety of external actor was not known.

The majority of incidents where the actor was a partner of the victim involved originators from the USA (104). This was followed by a much smaller band, consisting of Great Britain (9), India (5), Canada (4), Republic of Ireland (2) and Australia (2). In the majority of incidents involving partners, the geographic origin of the actor could not be established (213).

The majority of incidents involving partners were accidental or unintentional (179). This was followed by a strong banding of financial motivation. The majority of actor varieties could not be established and were recorded as unknown (143).

There was a considerable amount of data relating to attack modes within Table 2. The attacks were grouped into modes, namely error, hacking, misuse, physical, malware, social, unknown and environmental. The group featuring the largest number of incidents was the “error” (2038) group together with “hacking” (1927); these groups were relatively similar in value. The “misuse” (1409) and “physical” (1343) groups formed a lower secondary banding of similar values. There was a much lower tertiary banding of the groups “malware” (545) and “social” (451). The banding was likely to be inferential to other fields.
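The text does not specify how the Monte Carlo simulation was designed. As one plausible reading only, the sketch below resamples attack modes from the observed counts (a multinomial resampling) and reports, for each mode, the range within which its share of incidents falls in most simulated runs. The function name `simulate_shares`, the number of draws and runs, the fixed seed and the 5th to 95th percentile range are all assumptions for illustration, not the authors' method.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple


def simulate_shares(counts: Dict[str, int], draws: int, runs: int,
                    seed: int = 0) -> Dict[str, Tuple[float, float]]:
    """Resample attack modes and return a (5th, 95th) percentile share per mode."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    modes = list(counts)
    weights = [counts[mode] for mode in modes]
    shares: Dict[str, List[float]] = {mode: [] for mode in modes}
    for _ in range(runs):
        # One simulated period: `draws` incidents drawn by observed frequency.
        sample = Counter(rng.choices(modes, weights=weights, k=draws))
        for mode in modes:
            shares[mode].append(sample[mode] / draws)
    result: Dict[str, Tuple[float, float]] = {}
    for mode, values in shares.items():
        values.sort()
        low = values[int(0.05 * (runs - 1))]
        high = values[int(0.95 * (runs - 1))]
        result[mode] = (low, high)
    return result


# Attack-mode counts as reported in the paragraph above.
observed = {"error": 2038, "hacking": 1927, "misuse": 1409,
            "physical": 1343, "malware": 545, "social": 451}
intervals = simulate_shares(observed, draws=1000, runs=500)
for mode, (low, high) in intervals.items():
    print(f"{mode}: {low:.3f} to {high:.3f}")
```

Under these assumptions, overlapping ranges (for example error and hacking) indicate modes whose ordering could plausibly swap in a future period, while clearly separated ranges indicate a stable ordering.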

Fig. 3 A summary of the attack discovery method in the studied dataset

Figure 3 shows a summary of the attack discovery method. The sample reported in this study was taken from incidents involving hacking. In 797 cases, the means of discovery were not known. Over 559 incidents were voluntarily disclosed by the actors in question. There were 151 incidents identified as a result of suspicious traffic on the network. In 86 cases, the incidents were discovered when employees reported observations. In 73 cases, the customer reported observations which led to discovery of the incident in question. HIDS detection was apparent in only five cases involving hacking.

Fig. 4 The number of security incidents per year

The incident rate per year appeared to be latent until it increased rapidly from 2009 (89) to 2010 (579); this appeared as the first peak in Fig. 4. There was a fall in the incident rate in 2011 (537), after which the incident rate rose dramatically, increasing from 1252 in 2012 to 1907 in 2013; this forms the second peak in Fig. 4. The trend almost entirely reversed in 2014 (902), plateaued through 2015 (862) and dropped further to 649 in 2016.
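The year-on-year changes behind this description can be checked with simple arithmetic. The following Python sketch uses only the yearly counts quoted in the paragraph above; the function name `yearly_change` is hypothetical.

```python
from typing import Dict, List, Tuple

# Incidents per year as quoted in the text for Fig. 4.
incidents_per_year: Dict[int, int] = {
    2009: 89, 2010: 579, 2011: 537, 2012: 1252,
    2013: 1907, 2014: 902, 2015: 862, 2016: 649,
}


def yearly_change(series: Dict[int, int]) -> List[Tuple[int, int, float]]:
    """Return (year, absolute change, relative change) against the previous year."""
    years = sorted(series)
    changes: List[Tuple[int, int, float]] = []
    for previous, current in zip(years, years[1:]):
        delta = series[current] - series[previous]
        changes.append((current, delta, delta / series[previous]))
    return changes


changes = yearly_change(incidents_per_year)
peak_year = max(incidents_per_year, key=incidents_per_year.get)
for year, delta, relative in changes:
    print(f"{year}: {delta:+d} ({relative:+.0%})")
```

The largest absolute rise is between 2011 and 2012, the peak is 2013, and the fall into 2014 is larger in absolute terms than any single-year rise, which is consistent with the two-peak description.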

Fig. 5 The financial loss per year reported in the dataset

The financial loss per year reported in the dataset is shown on a logarithmic scale in Fig. 5. The rate of change was relatively inconsistent from year to year but with an exponential increase between 2015 and 2016. The yearly financial loss in 2016 was approximately 1e10 USD, an exponential increase from previous years. From 2010 to 2015, there were exponential fluctuations in what appeared to closely resemble a sinusoidal function. The only exception to that sinusoidal function was in 2006 (9e3 USD), where there appeared to be an exponential decrease in the financial loss amount when compared to other years in the function.

Fig. 6 The results pertaining to security incident impact variety

The results pertaining to impact variety are illustrated in Fig. 6. The largest impact of cybersecurity incidents was in assets and fraud (245), followed by legal and regulatory (226), response and recovery (172) and brand damage (136). These impact values cover the period 1972 to 2016.

Fig. 7 An illustration of the results (summarised in Fig. 6) for impact rating

An illustration of the results for impact rating is provided in Fig. 7. For most incidents, the true impact rating was not fully understood (599). There were 14 incidents rated as having a major impact, 31 incidents were rated as being moderate, 274 minor and 244 were rated as having no impact at all, i.e. null.

Table 3 Victim demographics data

Victim demographics appear in Table 3. The most common country of victims was the USA (5121), followed by Great Britain (416), Canada (243) and Australia (96). Of all incidents, 169 were geographically unknown. The largest group of victim organisations (990) had an employee count of over 100,000. Slightly fewer (923) had an employee count of 1001 to 10,000. Those with 101 to 1000 employees sustained 700 incidents, and 938 victim organisations had between 1 and 100 employees. The majority of industries affected appeared to be in the public sector, with commercial banking (196) and Internet publishing (147) following.

6 Discussion

It is striking from an analysis of the dataset that despite the data being collected through the open-source community, it appears to represent the picture one might expect to see based upon the malicious trends industries are actively working to address.

6.1 Frequency of internal attacks

The dataset shows that from 1972 to 2016, overall the number of internal actors and the number of external actors are roughly equal. This is important because computer systems are designed to mitigate threats from the outside inward, not usually the other way around [18]. The traditional view of computer security is that it should be organised like an onion with layers of security zones [8]. These configurations clearly have not mitigated the threats in this study as depicted in Fig. 3.

It is notable that the majority of internal attackers were end-users and system administrators, the two staff groups most trusted in a computing environment. Whilst the motivation in many attacks was not known, the majority of attacks were financially motivated. Another large proportion of attacks are noted as having been carried out for fun. Both of these findings are remarkable. These trends are likely a result of the increasing adoption and use of computer technologies to solve an increasing number of problems that were previously done offline. It appears that the increasing adoption of computers has led to a widened opportunity for attacks to take place.

6.2 Increasing rate and loss

It is apparent in Fig. 4 that cybersecurity incidents have dramatically increased since 2010. It is notable that this is around the time of mass cloud technology adoption and the general move towards Internet services [22]. The exponentially increasing incident rate from 2010 to 2013 is suggestive of a gauging exercise within computer security, in which the yearly rate of adoption of online services changed too rapidly for computer scientists and computer users to keep up technologically with what clearly appears to be a widened opportunity for malicious activity. Though the incident rate appears to drop, the amount of loss occurring per year is increasing exponentially. Figure 5 appears to represent a substantial sinusoidal pattern about 1e8 which appears to scale down towards 2e8 before the annual loss figure rapidly increases to 1e12.

6.3 Principle of least privilege

The statistical distributions in Table 2 are indicative of what can be expected based upon the state of the art, which is potentially suggestive of the accuracy of VCDB. The dataset tends to show that there is a direct relation between the mode of attack and the mode of use; in fact, they are the same in most cases. For example, in Table 2, malware and hacking have strong relations to backdoors left by system administrators. Misuse incidents frequently present as being a result of privilege abuse. Social incidents tend to be as a result of phishing. The most common error incidents involve misdelivery of confidential information. It is clear that in each case, the usage of a computer system is often its means of exploitation and destruction.

This closely ties with [24], which is the seminal piece of work in computer security that defined the principle of least privilege (POLP). The principle echoes the patterns that can be observed in Table 2. Given what can be observed, POLP is not enough in principle to secure a computer system. This is because, according to the dataset, even a design which follows POLP is still vulnerable because it can still be misused to some extent. The vulnerability lies in the ability of use, which POLP cannot address. Given that POLP is the most popular design pattern for a computer security configuration [25], this is a serious problem. Following this design pattern means that usability and access are a direct trade-off for security, which is a highly unsatisfactory position resulting in the shortcomings that can be observed, depending on where the balance falls during design.

6.4 The role of human error

The majority of attack modes relate to functional work and access arrangements for computer systems. The vectors for each attack mode show in clarity that these incidents are a result of bad practices, for example sending confidential information in a way that poses a risk of misdelivery. However, not all vectors appear to relate to user error but to the mode of use itself. All modes of attack appear to be an overall abuse of the privilege of being able to use the computer system, not any specific permission target. In these cases, it is not always possible to accurately determine fault because fault itself may be a philosophical question.

The recurring theme of data stores as an affected asset (documents and databases) seems to indicate that the object of cybersecurity incidents is information. This agrees with the well-known position that data is one of the most valuable assets in hi-tech economies. Thus, it can be properly concluded that if abuse of privilege is the means to procure those documents, then privilege itself is the incurable vulnerability. It is, in effect, a vulnerability which cannot be patched because by so doing, the computer system would be rendered unusable for the purpose intended. This raises the question of how and by what means security can cease to be a trade-off with functionality.

6.5 Criticality

The results for Tables 1 and 2 are organised in bands of criticality. This was spontaneous and not a result of placement. The most important criticalities are that in the majority of fields, important information was unknown. It is impossible to say with any degree of precision that this is for any specific reason, but it is likely that this is an indicator of successful repudiation. It is a concerning pattern that successful repudiation could feature significantly and suggests that current technology is unable to deal with the scale of malicious activity which organisations are facing.

Geographic criticalities are extremely important. The majority of victims seem to be located within western states, though this could be because information relating to other states is kept confidential. Those organisations on the extremes of size (the largest and smallest) in terms of employee count appear to be more vulnerable than others to cybersecurity incidents. This is likely to be because a small organisation is probably less financially capable of investing in cybersecurity and has less access to expertise than a larger organisation. In the largest organisations, it is likely to be very difficult to detect breaches and to mitigate a potentially higher rate of internal malicious activity. The scalability of computer security infrastructures is a known problem within the literature [10].

The most frequent assets targeted are documents, personal computers, databases and web applications. This is not surprising because each asset usually has to be compromised carefully in tandem in order to obtain the confidential documents. The attack methods tend to reflect the type of incident involved. In particular, the public sector is affected the most by cybersecurity incidents, followed by commercial banking, with both having greater criticality. This is probably because they are ideal targets, given that the dominant motivations in the results seem to be financial and the asset targeted tends to be documents storing confidential information.

The majority of actors seem to come from the USA and Russia. Other strongly featuring geographic originators are countries with which there are known political and military conflicts, e.g. Syria, and so their presence among the results is not surprising. Most internal threats showed criticality in the end-user and financial motive, whereas external threats showed greater criticality in activism and nation state activity. This is consistent with what one might expect based on the state of global affairs.

The mode of attack and vectors tend to demonstrate a logical consistency with what is known about cyberattack patterns. For example, statistically phishing was strongly associated with the vector of email. The vector in most attack modes was not known. Those vectors and attack modes showing greatest criticality were physical access, email and LAN access/backdoors. The attack modes showed greatest criticality in carelessness and privilege abuse—this ties in closely with established cybersecurity best practices and known modes of attack. This is concerning because the attack rates are still remarkably high even though there are well known mitigations against the same.

Moreover, this data adds to the perspective [2] that existing means of securing computer systems cannot fully guarantee detection and mitigation of every threat. Although at first glance incident rates may appear to be decreasing, loss is increasing exponentially, which challenges the very foundations of computer security. It suggests that attackers are doing more damage through fewer activities and are thus rapidly becoming more capable.

7 Summary and conclusion

The objective of this research was to identify important trends in the largest publicly available dataset for cybersecurity. It also aimed to investigate the degree of randomness and the probability of extreme possibilities occurring in cybersecurity incidents. These incidents were drawn from the VCDB dataset because it is the largest publicly available cybersecurity incident dataset. This has been achieved with a successful Monte Carlo simulation and by implementing a MapReduce function that takes a sizeable JSON dataset and summarises it in order to respond to real-time queries from a web application. The general patterns identified from the MapReduce process were that there is a major increase in financial loss as a result of incidents, which are increasing in prevalence. It is becoming increasingly common for incidents to be internal in origin. It is becoming more common for nation states to engage in acts of cyber warfare. In general, the trend appears to be that cybersecurity incidents are motivated either by ideology or by financial gain. It is to be expected that loss will continue to increase, prevalence will increase and the role of nation states in cyberattacks is likely to increase.
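The two techniques named above can be illustrated with a minimal sketch. It is not the implementation used in this study: the record layout, the field names (`year`, `actor`, `loss_usd`), the sample values and the helper functions are hypothetical and chosen only for illustration, and the actual VCDB records follow a richer schema. The sketch first summarises incidents per year and actor type in a map/reduce style, then uses a simple Monte Carlo resampling of the observed losses to estimate the probability of an extreme aggregate loss.

```python
import random
from collections import defaultdict
from functools import reduce

# Hypothetical, simplified incident records. The field names and values are
# assumptions made for illustration; they are not the actual VCDB schema.
incidents = [
    {"year": 2012, "actor": "external", "loss_usd": 120_000},
    {"year": 2012, "actor": "internal", "loss_usd": 5_000},
    {"year": 2013, "actor": "external", "loss_usd": 2_500_000},
    {"year": 2013, "actor": "internal", "loss_usd": 40_000},
    {"year": 2013, "actor": "internal", "loss_usd": 15_000},
]


def map_incident(record):
    """Map step: emit a ((year, actor), (count, loss)) pair for one record."""
    return (record["year"], record["actor"]), (1, record["loss_usd"])


def reduce_pairs(acc, pair):
    """Reduce step: fold one pair into the running (count, loss) totals."""
    key, (count, loss) = pair
    prev_count, prev_loss = acc[key]
    acc[key] = (prev_count + count, prev_loss + loss)
    return acc


def summarise(records):
    """Return {(year, actor): (incident_count, total_loss)}."""
    initial = defaultdict(lambda: (0, 0))
    return dict(reduce(reduce_pairs, map(map_incident, records), initial))


def prob_total_exceeds(losses, n_incidents, threshold, trials=10_000, seed=0):
    """Monte Carlo estimate of P(sum of n_incidents sampled losses > threshold).

    Losses are resampled with replacement from the observed values, which
    assumes that future incidents resemble past ones.
    """
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    hits = 0
    for _ in range(trials):
        total = sum(rng.choice(losses) for _ in range(n_incidents))
        if total > threshold:
            hits += 1
    return hits / trials


summary = summarise(incidents)
losses = [record["loss_usd"] for record in incidents]
p_extreme = prob_total_exceeds(losses, n_incidents=5, threshold=5_000_000)
```

A summary of this kind is small enough to be held in memory and queried directly by a web application, while the simulation can be rerun with different thresholds to explore how likely extreme losses are under the stated resampling assumption.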

From the analysis in this study, it appears that the challenge faced by cybersecurity is on a theoretical battleground. Present ways of thinking about cybersecurity trade usability against security. For instance, generalised access alone is a violation of Saltzer’s principle of least privilege [24]. The balance between usability and security is, it is submitted, the source of the prevalence of modern cyberattacks. The field of computer security needs a fundamental way of providing access with a usable experience and implementing the intended functional requirements without any implication for security. To achieve that, there needs to be a theoretical rethink of cybersecurity.

Multiple temporal datasets also need to be compared with more recent data in order to understand the most recent security incident trends and the underlying social reasons for them, rather than considering statistics in isolation. We provide a model from which this exercise can be executed repeatedly and reliably against the same and other datasets.

The types of attacks studied in this paper show an increasing exploitation of vulnerabilities that are almost “extinct” in classical computing environments. This can be explained by the immature security mechanisms in Supervisory Control and Data Acquisition (SCADA) networks, which employ a number of purpose-built protocols such as DNP3, Modbus and BACnet. Often, the security protocols deployed in CPS do not correspond to their criticality, or they lack any security protection mechanism at all. Many of the potential attacks on CPS and IoT devices can be prevented using anomaly and/or intrusion detection systems; nonetheless, these are vulnerable to statistical induction as established in [30]. Consequently, there is a need for tailored, innovative security and management mechanisms at the device level to enhance CPS resilience.

Detailed patterns of activity reflecting the motivations of attackers need to be explored in order to understand specific types of cyberattacks. Theory then needs to be shaped to those realities. A significant amount of future academic research needs to build further datasets tailored to specific modes of enquiry. Research is also required to develop new mechanisms for the prevention of cybersecurity incidents.

It is submitted that, based upon the patterns observed in this research, there needs to be a greater understanding of how the stand-off between functional requirements and security can be addressed, particularly in CPS. It is apparent that functional requirements can and often do require a trade-off against security, and the relationship is directly proportionate. Yet in industry there is very strong motivation to deliver features in pressured commercial environments, where new features are often required in order to remain competitive. The pressure to release new features typically results in security being secondary. This is one area which requires further analysis and research.

“Zero trust” or trustless computing is also a key emerging paradigm which changes the approach taken to cybersecurity generally. However, it has so far been the child of commercial research organisations, with little output from academia. Academically rigorous outputs in relation to zero trust are required so that it can be applied in situations where the classical application of cybersecurity mitigations is impractical, such as in the case of IIoT, where devices may not have enough computational power to encrypt data. Examples of such emerging research in academia include [31].