1 Introduction

Occupational accident prevention is key to effective safety management in any industry. Exploring the causes and the sequence of events that led to accidents enables the deployment of interventions that prevent their recurrence. Manufacturing industries have seen an increase in occupational accidents owing to the use of high-end machinery with complex working procedures adopted for better productivity. The design and operational complexity of heavy material transfer equipment such as EOT cranes makes the workplace susceptible to various hazards and unforeseen accidents. The current study focuses on incident investigation data from the slab casting units of an integrated steel plant in India, where EOT crane operations are predominant. Adverse conditions such as working at height, high temperature, poor visibility due to smoke, collisions and electrical malfunction frequently lead to incidents (near misses and injury/property damage) in these units. Hence we analyzed 176 crane-related incidents that were recorded as incident reports in the online safety management system (SMS) during 2013–2016. These reports capture different attributes of an incident, such as date of incident, time of incident, brief description, incident category, impact, activity type and primary cause. The incident data are recorded by two methods. In the first method, regular employees of the organization report the incident by logging in to the SMS directly. In the second method, contractor supervisors report the incident manually on their paper template; a soft copy of that incident report is then sent by e-mail to the safety manager of the concerned section/subsection, who logs the incident in the online SMS. For this study, the authors extracted the incident data from 2014 to 2016 in Excel format to determine the root causes behind the incidents.

Researchers have carried out quantitative and qualitative analyses of crane accidents to extract their major causes. Several questionnaire-based studies have examined the major factors affecting tower crane safety and found operator proficiency to be the top factor behind many accidents [1,2,3,4]. Reference [5] quantified the risk associated with human errors in EOT crane operations using HTA, SHERPA and Fuzzy VIKOR. Reference [6] used the chi-square test and proportional analysis to identify crane-related hazards from accident data after the introduction of new OSHA rules for crane safety. Quantitative approaches such as data mining have not been explored to their full potential in the field of crane accidents. The emergence of data mining approaches such as support vector machine (SVM), neural network (NN), self-organizing map (SOM) and decision tree (DT) has helped in analyzing accident data in various industries and suggesting preventive measures [7]. Reference [8] adopted the K-means algorithm for analyzing risk factors in crane-related accidents and near misses from incident reports. Applying SOM and cluster analysis to accident data, [9] carried out risk assessment and explored common patterns of accidents. A comparative analysis using both multivariate analysis and artificial neural network (ANN) has been carried out for the prediction of road accidents, which ultimately helps road safety management [10]. The decision tree (DT) method is a data mining technique with considerable potential for analyzing accident data [7]. A DT is a top-down branching structure in which the information contained in a data set is systematically broken down and the relation between the input and output variables is established. A DT can analyze both categorical and numerical data and handle both regression and classification problems [11]. Rules generated by a decision tree often help in finding the behavioral patterns that lie in an accident data set [12].

This paper first describes the methodology in Sect. 2; a detailed schematic diagram of the methodology is shown in Fig. 1. Data pre-processing, attribute selection, the decision tree and the assessment of rules are covered in Sects. 2.1, 2.2, 2.3 and 2.4, respectively. Results and discussion are presented in Sects. 3 and 4, respectively. Finally, the conclusion of the study is given in Sect. 5.

Fig. 1 Schematic diagram of methodology adopted

2 Methodology

2.1 Data Pre-processing

Achieving high accuracy with any algorithm requires data of satisfactory quality, which is ensured through data pre-processing. In our study, the dataset contained 202 records before pre-processing. The data were cleaned by executing the following processes in sequence.

Data reduction.

Data reduction reduces the dimensions of the data while maintaining the integrity of the original data. We adopted dimensionality reduction, which reduces the number of attributes under consideration by dropping redundant variables, and thereby shrinks the data space. This was done manually, using domain knowledge to identify the significant attributes.

Duplicate data removal.

After reducing the data space with respect to attributes, it is also important to check the data for redundancy at the tuple level. Duplicate records were removed from the crane accident data considered in the study using Excel.
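
Tuple-level duplicate removal can be illustrated with a minimal Python sketch. The study itself used Excel; the field names and records below are hypothetical and serve only to show the idea of keeping the first occurrence of each identical tuple.

```python
def drop_duplicate_records(records):
    """Return the records with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for rec in records:
        # A record is identified by all of its (attribute, value) pairs.
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical incident records (not taken from the study's data).
records = [
    {"date": "2015-01-10", "shift": "A", "activity": "Operation"},
    {"date": "2015-01-10", "shift": "A", "activity": "Operation"},
    {"date": "2015-02-03", "shift": "B", "activity": "Maintenance"},
]
unique_records = drop_duplicate_records(records)
print(len(unique_records))
```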

Missing data imputation.

Missing values in tuples can be handled in different ways, such as ignoring those tuples, filling in the values manually, or using a global constant to fill all missing values of an attribute. In our data, the attribute “Risk Score” had missing values, which were handled using the EM (expectation maximization) algorithm.
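
The paper does not state which statistical model or software was used for the EM step. As a rough illustration only, the sketch below assumes the simplest possible model, in which the scores are independent draws from a single normal distribution; the function name and the sample values are hypothetical. Under this assumption EM converges to the mean and variance of the observed scores, whereas a multivariate EM would also exploit the other attributes.

```python
def em_impute_gaussian(values, n_iter=200):
    """Impute None entries, assuming i.i.d. draws from one normal distribution."""
    observed = [v for v in values if v is not None]
    n = len(values)
    m = n - len(observed)              # number of missing entries
    s1 = sum(observed)                 # sum of observed values
    s2 = sum(v * v for v in observed)  # sum of squared observed values
    mu, var = 0.0, 1.0                 # arbitrary starting parameters
    for _ in range(n_iter):
        # E-step: expected sufficient statistics of the missing entries.
        e1 = m * mu
        e2 = m * (mu * mu + var)
        # M-step: maximum-likelihood update from the completed statistics.
        mu = (s1 + e1) / n
        var = (s2 + e2) / n - mu * mu
    filled = [mu if v is None else v for v in values]
    return filled, mu, var


# Hypothetical risk scores with two missing entries.
risk_scores = [4.0, 6.0, None, 8.0, None, 6.0]
filled, mu, var = em_impute_gaussian(risk_scores)
print(filled)
```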

Handling missing data was the last step of our data cleaning process. After the pre-processing steps were executed in sequence, we obtained 176 crane accident records with 37 attributes, which met our quality criteria for mining.

2.2 Attribute Selection

Both structured and unstructured data can be exploited using data mining techniques to obtain valuable insights. For our study, however, we considered only structured data. Before applying any mining technique, it is important that the attributes selected give the most distinctive and clear picture of the stated objective. Since our goal was to study the hidden patterns in the crane accidents and to find the causes of these patterns, we selected 8 attributes as the most distinctive for our objective: Primary cause, Incident Location, Incident Category, Impact, Shift, Day, Month and Activity type.

2.3 Decision Tree

A decision tree is a widely used classification method represented as a tree structure, in which each internal node is a test on an attribute and each branch represents an outcome of that test. The leaf nodes hold the values of the target variable, that is, the class labels.

For any given tuple whose target variable is unknown, its attribute values are tested along a path of the decision tree from the root to a leaf node, which holds the class label prediction for the tuple. To divide each node distinctly, an attribute is selected using a splitting criterion. The splitting criterion determines the test to be performed at a node N so that the tuples at N are partitioned in the best way.
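
The root-to-leaf traversal described above can be sketched in a few lines of Python. The tree, attribute names and class labels below are a hypothetical toy example, not the tree obtained in this study.

```python
# Hypothetical toy tree: internal nodes test one attribute, leaves hold a class label.
TOY_TREE = {
    "attribute": "activity_type",
    "branches": {
        "Maintenance": {"label": "Near miss"},
        "Operation": {
            "attribute": "shift",
            "branches": {
                "A": {"label": "Near miss"},
                "B": {"label": "Injury"},
            },
        },
    },
}


def predict(node, record):
    """Walk from the root to a leaf, following the branch that matches the record."""
    while "label" not in node:
        value = record[node["attribute"]]
        node = node["branches"][value]
    return node["label"]


print(predict(TOY_TREE, {"activity_type": "Operation", "shift": "B"}))
```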

The information measures most commonly used as splitting criteria for classification are entropy and the Gini index. The Gini index, developed by Corrado Gini in 1912, is used in CART and is a measure of the impurity of a data set T.

The Gini index is given as:

$$ \text{Gini}(T) = 1 - \sum_{i} k_{i}^{2} $$

where k_i is the probability that an arbitrary tuple in T belongs to class c_i, and the sum runs over all classes.
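
As a small worked illustration of the formula, the Python sketch below estimates each class probability as the relative frequency of that class among the labels. The function name and the example labels are hypothetical. A pure node gives an index of 0, and a node split evenly between two classes gives 0.5.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of a list of class labels: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical label sets.
pure = ["Near miss"] * 4
mixed = ["Near miss", "Near miss", "Injury", "Injury"]
print(gini_index(pure), gini_index(mixed))
```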

The application of decision trees has broadened over the years and has found a strong base in exploring accident causes and severity. Reference [13] used the CART algorithm to identify the causes and types of traffic accidents in Brazil. Reference [14] used CHAID to predict the severity of bicycle crashes by establishing the relationship between 8 important categorical predictors of a crash and its severity. For our purpose, we use the CART algorithm to explore the patterns underlying the different incident categories and use expert knowledge to identify the causes of these patterns.

CART.

CART (classification and regression tree) is a cornerstone decision tree algorithm that adopts a greedy approach [15]. The algorithm works well even when no predefined relationship between the independent and dependent variables is assumed. It is a non-parametric model that establishes the relationship between the independent variables (characteristic variables) and the dependent variable (target variable) [13]. Apart from using the Gini index, CART also uses multivariate splits for attribute selection in some cases. A multivariate split considers more than one variable at a time when evaluating the splitting criterion. In our study we used Salford Predictive Modeler to run the CART algorithm with Incident Category as the target variable.
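
A single greedy step of a CART-style procedure can be sketched as follows. This is an illustrative simplification and not the procedure implemented in Salford Predictive Modeler: it considers only univariate one-value-versus-rest binary splits on categorical attributes and picks the split with the lowest weighted Gini index of the two child nodes. All names and records are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(records, target, attributes):
    """Return (weighted_gini, attribute, value) of the best one-vs-rest split."""
    best = None
    n = len(records)
    for attr in attributes:
        for value in sorted({r[attr] for r in records}):
            left = [r[target] for r in records if r[attr] == value]
            right = [r[target] for r in records if r[attr] != value]
            if not left or not right:
                continue  # a split must produce two non-empty children
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, attr, value)
    return best


# Hypothetical records: activity separates the classes perfectly, shift does not.
records = [
    {"activity": "Maintenance", "shift": "A", "category": "Near miss"},
    {"activity": "Maintenance", "shift": "B", "category": "Near miss"},
    {"activity": "Operation", "shift": "A", "category": "Injury"},
    {"activity": "Operation", "shift": "B", "category": "Injury"},
]
split = best_binary_split(records, "category", ["activity", "shift"])
print(split)
```

A full tree is grown by applying the same step recursively to each child node until a stopping rule is met.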

Rules Extraction (RE).

Decision rules can be extracted from a decision tree to provide more insight into the data. These rules take the form X => Y, which can be interpreted as IF X happens THEN Y occurs. Here X (the antecedent) is a set of conditions on the independent variables and Y (the consequent) is the value of the dependent variable. The number of such rules equals the number of leaf nodes in the tree, since each root-to-leaf path yields one rule. When the number of rules is large, they can be pruned on the basis of three parameters: the support, confidence and lift of a rule. In our case the total number of rules was limited because of the small size of the dataset and the few attributes considered, so no pruning was required.
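
Rule extraction and the three pruning measures can be sketched as below. The tree, records and helper names are hypothetical. The sketch reads off one IF-THEN rule per root-to-leaf path and uses the standard definitions: support is the share of all records that satisfy both X and Y, confidence is the share of records satisfying X that also satisfy Y, and lift is the confidence divided by the overall share of Y.

```python
def extract_rules(node, conditions=()):
    """Yield one (conditions, label) rule for every root-to-leaf path."""
    if "label" in node:
        yield list(conditions), node["label"]
        return
    for value, child in node["branches"].items():
        yield from extract_rules(child, conditions + ((node["attribute"], value),))


def rule_metrics(records, conditions, label, target):
    """Return (support, confidence, lift) of the rule conditions => label."""
    n = len(records)
    matches_x = [r for r in records if all(r[a] == v for a, v in conditions)]
    n_xy = sum(1 for r in matches_x if r[target] == label)
    n_y = sum(1 for r in records if r[target] == label)
    support = n_xy / n
    confidence = n_xy / len(matches_x) if matches_x else 0.0
    lift = confidence / (n_y / n) if n_y else 0.0
    return support, confidence, lift


# Hypothetical tree and records, for illustration only.
tree = {
    "attribute": "activity",
    "branches": {
        "Maintenance": {"label": "Near miss"},
        "Operation": {"label": "Injury"},
    },
}
records = [
    {"activity": "Maintenance", "category": "Near miss"},
    {"activity": "Maintenance", "category": "Near miss"},
    {"activity": "Operation", "category": "Injury"},
    {"activity": "Operation", "category": "Near miss"},
]
rules = list(extract_rules(tree))
for conditions, label in rules:
    print(conditions, "=>", label, rule_metrics(records, conditions, label, "category"))
```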

2.4 Assessment of Rules

The decision rules obtained from the decision tree were then reviewed manually to design investigative questions. These questions were communicated to a domain expert for an opinion. The purpose of obtaining the expert’s opinion was to find the major causes behind the patterns discovered in the accident data. Based on the expert’s opinion, we proposed some interventions to prevent these accidents from occurring. The schematic diagram explaining the steps involved in arriving at the causes of accidents is given in Fig. 2.

Fig. 2 Decision tree results

3 Results

The tree shows that accidents occurring during construction and maintenance lead to near misses, and no other attribute is tested in this case. For operation activity, on the other hand, more distinctive attributes are required to classify the tuples. The rules extracted from the tree using conditional operators are given in detail in Table 1. These rules were studied further to obtain investigative questions. The following set of inferences was obtained from the decision rules and communicated to the expert for an opinion:

Table 1 Decision tree rules
(1) Near misses (NM) are more in number in case of Construction/Maintenance activity.

(2) During Activity Type (AT) {Operation}, occurrence of near miss results in Impact = {Fatality/First Aid}, otherwise {Injury/Property damage}.

(3) During Activity Type {Operation}, during the months April, December, January, November and October, occurrence of near miss results in Impact {Equipment/Property damage}, otherwise Injury/Property damage (I) (in other months).

(4) Accidents happening on weekends (Fri, Sat, Sun) and at the start of the week (Mon) lead to near miss, otherwise Injury/Property damage (on other days).

These results were shared with the expert, whose opinion helped us learn the causes behind these patterns. The causes reported by the expert and the interventions we suggest are discussed in Sect. 4.

4 Discussion

Safety experts gave their suggestions after going through the results. Maintenance and construction activities involve more manual work and a larger workforce. Hence near miss cases are more frequent in these activities.

In contrast, operation activities involve more automation and a smaller workforce, so the probability of near misses is lower. Crane movement happens primarily during “Operation”, so hit/fall incidents can occur, and these mostly lead to fatality or first aid. During maintenance activities, by contrast, the equipment is static, so the probability of an incident having an impact is lower. In the studied plant, December to April are the months when production pressure is highest in order to achieve the requisite turnover; maintenance work is carried out afterwards. Incidents are therefore expected to be more frequent in these months and less frequent during maintenance work. A lethargic approach by operators on Saturday and Sunday, when senior authorities are absent, and on Monday, the first day of the week, leads to a greater number of incidents.

5 Conclusion

The current study focused on exploring the underlying causes of the accidents that frequently happen in those sections of a specific steel plant where the use of EOT cranes is predominant. Most of the incidents are caused by operator negligence or improper supervision at the workplace. Stringent safety guidelines, regular monitoring of supervisors’ activities, adequate training on the standard operating procedures (SOPs) to be followed, and the use of proper personal protective equipment (PPE) while working are some managerial implications that can help the plant management improve workplace safety and protect people from accidents.