1 Introduction

Vehicle density has grown rapidly worldwide over the past few decades [1]. India is no exception, and it is expanding its road network and transportation system to improve its standing on several fronts. The highly connected road and highway network is the backbone of India's continued socio-economic development. Alongside this growth, however, road accidents in India are also increasing. Frequent severe accidents not only threaten lives but also damage infrastructure and cause massive economic loss. According to the Ministry of Road Transport and Highways [2], the number of accidental deaths increased by 31% from 2007 to 2017, which has drawn considerable attention. The Indian Government and various state authorities have already implemented a range of safety measures and strict traffic rules.

Road safety is one of the primary indicators of a country's growth, together with other factors such as population and literacy. Addressing insufficient road safety requires dedicated action from several ministries, most notably law, planning, transport, education, public information, and health. Periodic road safety assessment has played a major role in the theory and practice of transport management systems. The states maintain a large amount of statistical data that can be used to extract the progress of individual states in road safety awareness. Because every state has a different population density, road infrastructure, and geographical location, the raw numbers of road accidents in different states are not directly comparable. The alarming situation therefore demands an alternative data mining tool that measures the performance of the states in road safety and relates it to influential parameters such as population, number of vehicles, and literacy rate. The correlation of these attributes with road accidents may reveal the actual performance of the different Indian states in reducing traffic mishaps. In this paper, we develop a novel line encoding method to demonstrate the progress of any entity from time-varying data. The method is easy to implement and discloses the pattern of positive or negative growth as time progresses. In the current analysis, we group Indian states into three categories: best performers, average performers, and stragglers. The results are satisfactory and may open up a new framework for performance analysis.

The remainder of the paper is structured as follows. We review previous work in Sect. 2. Section 3 proposes our approach to performance analysis. We illustrate the method with an example in Sect. 4. Experimental analysis and results are presented in Sect. 5, and Sect. 6 concludes the paper.

2 Literature Review

Various computational methods have already been employed to analyze road networks [3,4,5] and efficient transportation systems [6,7,8]. In [9], the road network is modeled as a graph with road junctions as vertices and the highways between them as edges, and its topological properties are analyzed. The study finds that the road network possesses the properties of a small-world assortative network. The most important junction points are identified based on betweenness centrality and are concluded to be potential congestion points. Wenxue et al. [10] proposed an improved method to analyze the accident rate of hazardous material road transportation. An agglomerative hierarchical clustering algorithm is introduced in [11] to analyze and divide the regions of a state based on the pattern of accidents. The cophenetic correlation coefficient is used to select the best distance measure for clustering, and the approach is applied to hourly road accident data from twenty-six districts of Gujarat. Tracking vehicles and monitoring driving habits is proposed in [12] to reduce the number of road accidents; a telematics system tracks each vehicle's position and reports errors for further analysis. A modified principal component analysis method [13] is proposed to analyze the relationship between a traffic accident evaluation index and the different causes of traffic accidents, including the number of people and vehicles, the type of road, and the corresponding environment. Data mining tools are used in [14] to predict the probability of accidents on specific roads, such as State Highways (SHs) or Ordinary District Roads (ODRs), by estimating the severity of accidents. To the best of our knowledge, no existing work addresses performance analysis of the states from recorded road accident statistics. Our proposed approach is an attempt to examine past accident data to assess the progress of different Indian states in terms of road safety measures.

3 Our Approach for Performance Analysis

In this section, we propose a novel line encoding method to compare the progress of the entities \((e_1,e_2, \dots , e_n \in E)\) from time-varying data. Progress means the variation in the performance of different entities with respect to time. An index, the progressive score, serves as the basis of comparison among the entities and signifies how much progress each entity has made throughout a given time range. For an entity \((e_i\in E)\), a set of random variables \(V=\{v_1,v_2,v_3,\dots ,v_n\}\) observed at the time instants \(T=\{t_1,t_2,t_3,\dots ,t_n\}\) is given. The X-axis denotes time and the Y-axis represents the real-valued attributes of the different entities. From a time instant \((t_n)\) to the successive time instant \((t_{n+1})\), the value of an entity \(e_i\) can increase, decrease, or remain approximately the same. The change is denoted as \(\varDelta v=v_{n+1}-v_n\). This change can be interpreted as a ray \(I_\alpha \) whose slope is \(\alpha \). The ray can take any of n orientations, which range from the positive Y-axis to the negative Y-axis in the clockwise direction. The aggregate value of all the \(I_\alpha \), i.e., \((I_{\alpha 1}+I_{\alpha 2}+I_{\alpha 3}+\dots + I_{\alpha {n-1}})\), gives the summative score for a single entity \(e_i\).

The encoded value for each \(e_i \in E\) is calculated using the following formulas:

$$\begin{aligned} \text {Progressive Score } (e_i) =\sum _{k=1}^{n-1} I_{\alpha k} \end{aligned}$$
(1)
$$\begin{aligned} I_\alpha = {\left\{ \begin{array}{ll} 0 & \text {if } v_n \simeq v_{n+1} \\ \frac{\varDelta v}{|v_{max}|} & \text {otherwise} \end{array}\right. } \end{aligned}$$
(2)

where \(|v_{max}|=max\{{v_1,v_2,v_3,\dots ,v_n}\}\).
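As a minimal sketch of Eqs. 1 and 2, the following Python function computes the progressive score of one entity from its time series. The function name `progressive_score`, the tolerance `tol` used to decide whether \(v_n \simeq v_{n+1}\), and the choice of the largest absolute value in the series as the normalizer \(|v_{max}|\) are assumptions made for illustration; none of them is fixed by the definitions above.

```python
from typing import Optional, Sequence


def progressive_score(values: Sequence[float],
                      v_max: Optional[float] = None,
                      tol: float = 0.0) -> float:
    """Sum the normalized changes between consecutive time instants (Eqs. 1-2)."""
    if len(values) < 2:
        return 0.0  # no consecutive pair, hence no change to encode
    if v_max is None:
        # Assumption: normalize by the largest absolute value in the series.
        v_max = max(abs(v) for v in values)
    if v_max == 0:
        return 0.0  # all values are zero, so every change is zero
    score = 0.0
    for prev, curr in zip(values, values[1:]):
        delta = curr - prev
        if abs(delta) <= tol:
            continue  # I_alpha = 0 when the value stays approximately the same
        score += delta / abs(v_max)
    return score
```

For instance, `progressive_score([10, 12, 12, 9])` accumulates 2/12 + 0 − 3/12. Note that with `tol = 0` the sum telescopes to the last value minus the first, divided by \(|v_{max}|\), so in this sketch the tolerance is what distinguishes the encoding from a plain end-to-end difference.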

4 Illustration with Example

Figure 1 illustrates the performance of four entities denoted as \(L_1, L_2, L_3\) and \(L_4\). Let the progressive score of all of them be 0 up to the time instant \(t_1\). At time instant \(t_2\), the progress of each entity is calculated: \(I_{L_1}=\frac{30-0}{30}=1\), \(I_{L_2}=\frac{10-0}{30}=0.33\), \(I_{L_3}=\frac{-20-0}{30}=-0.67\) and \(I_{L_4}=\frac{-30-0}{30}=-1\). In our experiment, the statistical performance data recorded for each year are treated as a time-varying set of values, and each state is defined as a separate entity. The progressive score is interpreted differently for individual attributes. For the road accident rate, the state with the lowest progressive score \((L_4)\) is the best performer, whereas for literacy the state with the highest progressive score \((L_1)\) is the best performer.
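The single-step scores quoted above can be reproduced with a short sketch. The end values 30, 10, −20 and −30 and the shared normalizer 30 are read from the fractions in the text; the dictionary layout and the variable names are illustrative assumptions.

```python
# Values at t1 and t2 for the four entities, as read from the worked fractions.
start = 0.0
end_values = {"L1": 30.0, "L2": 10.0, "L3": -20.0, "L4": -30.0}
v_max = 30.0  # shared normalizer used in the text

# I_alpha = (v_{n+1} - v_n) / |v_max| for the single step t1 -> t2.
increments = {name: (end - start) / abs(v_max) for name, end in end_values.items()}

# A lower score is better for accidents, a higher score is better for literacy.
best_for_accidents = min(increments, key=increments.get)
best_for_literacy = max(increments, key=increments.get)

for name, value in increments.items():
    print(f"{name}: {value:+.2f}")
print("best (accidents):", best_for_accidents)
print("best (literacy):", best_for_literacy)
```

The same scores thus yield opposite rankings depending on whether growth in the attribute is desirable.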

Fig. 1 Illustration of progressive score for different entities

Table 1 Example dataset of five variables

Let us take a set of 5 variables as illustrated in Table 1. Each variable has a set of values at different time instants. The difference between the values at two consecutive time instants is shown in Table 2. Using Eq. 1, we obtain the results in Table 3. Here, V1 and V2 are the lowest and highest valued entities, respectively. In terms of accident prevention, V1 therefore performs better than the other entities.

5 Result and Analysis

In this section, we analyze real data with the proposed method. We also validate the result on a benchmark dataset.

Table 2 Difference between two consecutive time instants
Table 3 Progressive score calculation

5.1 Data Collection

We have collected statistical data maintained by the Government of India to compare the performance of different states and union territories (UTs). The data include road accident data from 2001 to 2014, literacy rates from 1951 to 2011, total population data from 1951 to 2011, and total registered motor vehicles from 2001 to 2012. These were gathered from the National Crime Records Bureau (NCRB) [15, 16], the Ministry of Home Affairs, Govt. of India [17], the Census of India [18], and the Ministry of Highways and Road Transports, Govt. of India [2], respectively, and are used to assess the performance of our proposed approach.

5.2 Result over Different Attributes

Figure 2 depicts the results obtained from road accident data with our proposed method. It shows that West Bengal is the most progressive state in preventing road accidents from 2001 to 2014, whereas Kerala performed the worst. Here, a steep slope signifies a larger increase in accidents, which is costly for progress, whereas a gradual angle signifies a smaller increase in the number of accidents. A negative angle signifies a reduction in accidents, which is economical, whereas a horizontal straight line signifies no change between two successive points and is thus non-increasing. Tamil Nadu, Rajasthan, and Uttar Pradesh have more gradual angles than steep ones, which amounts to average progress.

Fig. 2 Progress of the states based on road accident data

Table 4 Classification of the States and Union Territories
Fig. 3 Progress of the states based on literacy

Table 4 classifies the states and Union Territories by performance, listed in alphabetical order. Most of the north-eastern states are classified as stragglers, which is a real concern. Geographical conditions and environmental hazards might be the reason for their unsatisfactory performance. Some newly formed states, such as Chhattisgarh and Uttarakhand, are grouped among the best performers, which is very encouraging.

Next, we examine the literacy progress of different states. Figure 3 shows that Delhi, Kerala, and Andaman & Nicobar Islands have made the best progress in literacy. Here, a steeper angle signifies an increase in literacy, which amounts to good progress, whereas a gradual angle amounts to poor progress and a horizontal straight line signifies a non-increasing literacy rate. Mizoram, Pondicherry, and Tamil Nadu have only a few steeper angles but no negative angle, which denotes that they performed well in literacy. Himachal Pradesh has the largest number of horizontal straight lines, which makes it an average performer.

Fig. 4 Progress of the states based on preventing population

Population control is another important attribute of progress for the states as a whole. Figure 4 shows that Karnataka and Gujarat have made the best progress in limiting population growth. Here, a steeper angle signifies an increase in population, which amounts to poor progress, whereas gradual angles amount to good progress and horizontal straight lines signify that the population is the same as in the previous year. Most of the states other than Karnataka and Gujarat have gradual angles, which makes them less progressive than these two in controlling population.

Fig. 5 Progress of the states based on preventing number of motor vehicles

We also analyze the number of motor vehicles registered in different states. Figure 5 shows that Gujarat, Madhya Pradesh, Rajasthan, and Punjab have made the best progress in controlling the number of motor vehicles. Only the best performer states have gradual angles throughout, which makes them more progressive than the others, whereas Delhi and Karnataka have made poor progress, which marks them as the worst performers.

5.3 Correlation Analysis

We have emphasized the prevention of road traffic accidents, which is our primary research concern, and consider it the dominant attribute behind the progress of each state. Other attributes, such as literacy rate, population, and number of vehicles, are also correlated with accidents. Correlation analysis shows how strongly pairs of variables are related. Two sets of values \(X=(x_1,x_2, \dots )\) and \(Y=(y_1,y_2, \dots )\) are correlated using Pearson correlation coefficient analysis (PCCA). The PCCA coefficient [19] r is defined as

$$\begin{aligned} r(x,y)=\frac{cov(x,y)}{S_x S_y} \end{aligned}$$
(3)

\(S_x\) and \(S_y\) are the standard deviations of X and Y, respectively, and \(cov(x,y)\) is the covariance of x and y:

$$\begin{aligned} cov(x,y)=\frac{1}{n-1} \sum (x_i-\overline{x})(y_i-\overline{y}) \end{aligned}$$
(4)

Here n is the number of paired data points, \(\overline{x}\) and \(\overline{y}\) are the means, and \(x_i\) and \(y_i\) are the ith values of X and Y, respectively. The Pearson correlation coefficient between two variables always lies between \(+\)1 and −1, where \(+\)1 denotes a perfect positive linear relationship and −1 a perfect negative one. We analyze different attributes against road traffic accidents to examine their relationship. The results show that road accidents are positively correlated with the growth in population and in the number of vehicles, with \(r = 0.92\) and 0.84 respectively, and negatively correlated with literacy rate, with \(r = -0.96\).
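A minimal sketch of Eqs. 3 and 4 in plain Python is given below. The function names and the small sample series are illustrative assumptions and are not the data used in this study; the standard deviations are assumed to use the same \(n-1\) denominator as the covariance in Eq. 4.

```python
import math
from typing import Sequence


def covariance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample covariance with the n - 1 denominator (Eq. 4)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r = cov(x, y) / (S_x * S_y) (Eq. 3)."""
    s_x = math.sqrt(covariance(x, x))  # sample standard deviation of x
    s_y = math.sqrt(covariance(y, y))  # sample standard deviation of y
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a series is constant")
    return covariance(x, y) / (s_x * s_y)


# Made-up series: y rises with x, z falls with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
z = [10.0, 8.0, 6.0, 4.0, 2.0]
print(pearson_r(x, y))
print(pearson_r(x, z))
```

In this toy input the first pair is perfectly increasing together and the second perfectly opposed, so the two printed coefficients sit at the two ends of the \([-1, +1]\) range up to floating-point rounding.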

5.4 Validation of the Proposed Approach

We have validated our result on the Adenoma attributes \((A1, A2, \dots , A8)\) of the sample dataset [20] to find their strength of association. Table 5 shows the PCCA similarity measure of the Adenoma attributes based on our approach. The strength of association among different attributes is shown in Table 6. According to our experiment, the primary attributes are A3 and A4, which is broadly consistent with the biological conclusion. Therefore, we infer that our method works well in finding primary attributes and their strength of association.

Table 5 PCCA similarity measure of Adenoma attributes
Table 6 Strength of association

6 Conclusions

Road accidents do not happen only because of improper road infrastructure; they are also influenced by other factors such as literacy rate, number of vehicles, and the population of individual states. Proper road safety management contributes to the overall development of the nation. Road safety has become one of the most important of the development goals set for a country like India. In this paper, we develop a line encoding method to analyze the progress of different Indian states in traffic awareness. Our results suggest that traffic control measures are not the only solution: the states also have to control population and the number of vehicles and raise the literacy of their residents to control road traffic accidents. For the last few years, the focus of the Indian government has been to develop a holistic traffic system in the hope that it will accelerate the growth of the country. In future, we may consider the geographical position of the states, weather conditions, and the influence of different highway networks to reveal other factors behind road traffic accidents.