
Journal of Modern Transportation, Volume 27, Issue 4, pp 235–249

An application of Bayesian multilevel model to evaluate variations in stochastic and dynamic transition of traffic conditions

  • Emmanuel Kidando
  • Ren Moses
  • Thobias Sando
  • Eren Erman Ozguven
Open Access
Article

Abstract

This study seeks to investigate the variations associated with lane lateral locations and days of the week in the stochastic and dynamic transition of traffic regimes (DTTR). In the proposed analysis, hierarchical regression models fitted using Bayesian frameworks were used to calibrate the transition probabilities that describe the DTTR. Datasets of two sites on a freeway facility located in Jacksonville, Florida, were selected for the analysis. The traffic speed thresholds to define traffic regimes were estimated using the Gaussian mixture model (GMM). The GMM revealed that two and three regimes were adequate mixture components for estimating the traffic speed distributions for Site 1 and 2 datasets, respectively. The results of hierarchical regression models show that there is considerable evidence that there are heterogeneity characteristics in the DTTR associated with lateral lane locations. In particular, the hierarchical regressions reveal that the breakdown process is more affected by the variations compared to other evaluated transition processes with the estimated intra-class correlation (ICC) of about 73%. The transition from congestion on-set/dissolution (COD) to the congested regime is estimated with the highest ICC of 49.4% in the three-regime model, and the lowest ICC of 1% was observed on the transition from the congested to COD regime. On the other hand, different days of the week are not found to contribute to the variations (the highest ICC was 1.44%) on the DTTR. These findings can be used in developing effective congestion countermeasures, particularly in the application of intelligent transportation systems, such as dynamic lane-management strategies.

Keywords

Dynamic transition of traffic regimes; Hierarchical model; Bayesian frameworks; Lane lateral locations; Days of the week; Disparity effect

1 Introduction

Establishing models that estimate the stochastic and dynamic transition of traffic regimes (DTTR) is important for predicting future traffic conditions and developing timely effective countermeasures to address congestion. For example, when two major traffic regimes—free-flow and congested regimes—are analyzed, the DTTR involves four transition phenomena. These include evolving from the free-flow to congested regime (breakdown), staying in the congested regime, congested to the free-flow regime (recovery), and staying in the free-flow regime in the next observation period. Since time is a major factor in their occurrences, the four transition processes can be referred to as the traffic regimes’ dynamic transition.

The DTTR is complex in nature and is influenced by several factors, such as driver behavior, demand, vehicle mix, and weather conditions. Furthermore, the DTTR can vary greatly by day of the week and by lateral lane location on the same highway. Understanding the impact of these factors is useful for implementing advanced traffic management strategies, such as variable speed limits, variable message signs, congestion pricing, and ramp metering, to improve the efficiency of traffic operations [1, 2].

Among the DTTR phenomena, the breakdown process is well studied in the literature, and its theory has recently been introduced in roadway capacity estimation [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]. One major limitation of many previous investigations of the breakdown phenomenon is that they ignore the operational differences due to lateral lane locations on freeways. In such analyses, the traffic data of a multi-lane facility are usually aggregated and implicitly treated as one unit [1, 13]. The resulting model is also called a complete-pooled model [14, 15], which indicates that the operational characteristics are averaged across lanes. In practice, however, the operational characteristics of freeway segments may vary significantly across lanes [1, 13, 16], and this variation is sometimes influenced by operational policies. For instance, in urban areas, some states in the USA restrict heavy vehicles to the lanes near the shoulder. Also, some states discourage drivers from using lanes near the median unless passing slow-moving vehicles. Moreover, the operational characteristics of the lanes near the shoulder can be influenced by weaving (merging onto the freeway and diverging to exit it) more strongly than those of lanes near the median [13, 17]. These factors introduce variations in the operating characteristics of a highway [18, 19]. Developing a model that does not take these characteristics into account and constrains the effect of influencing factors on the breakdown process to be the same across all lanes may lead to incorrect conclusions.

Recognizing the operational variations across different lanes and thus the breakdown process on the freeway, some empirical studies evaluated individual lanes separately. One study compared the complete-pooled and the lane-based approach to estimate the breakdown phenomenon on the diverging sections [1]. The study shows that using the lane-based approach significantly improves the accuracy of the extracted breakdown flow rate, while the aggregated approach underestimates the breakdown flow rate. Another study evaluated individual lane breakdown behavior on the merging freeway sections [16]. It also concludes that there is a significant difference in breakdown phenomenon among lanes.

Separating the data and developing a model for each group is also referred to as the no-pooled model [14]. One notable drawback of this model is that the operational characteristics of lanes are assumed to be independent, which implicitly suggests that the data come from completely different sources. In other words, such a model assumes that the operational characteristics of one lane do not affect other lanes. However, this may not be the case in traffic operations. The breakdown usually starts in one lane, generally a lane near the shoulder, and then other lanes follow [20]. Consequently, dependence in operational characteristics, as well as some similarities across different lanes, exists. Instead of conducting a separate analysis for each lane, some studies have utilized the hierarchical (random-effect) model to estimate the breakdown phenomenon [13]. This type of model is also referred to as a partial-pooled model. It provides a trade-off between the complete-pooled and no-pooled models by accounting for both the between-group and within-group variations [15, 21]. The hierarchical model also recognizes the group similarities and integrates such information into the parameter estimates [14, 21]. Using the hierarchical Weibull model, the study in [13] indicates that there is a significant variation in operational characteristics across different lanes on the freeway. Further, the study suggests that aggregating data could ignore the possibility of one lane being congested while the rest of the lanes on the same freeway segment are not (a partial breakdown or semi-congested state).

In summary, despite the growing literature in evaluating the probabilistic characteristics of the breakdown process, quantifying the disparity effects on the other transition phenomena that describe the DTTR is not studied in the literature. As a result, this study attempts to fill the research gap by developing hierarchical regression models to calibrate the transition probabilities that describe the DTTR and quantify the associated variations due to different lateral lane locations and days of the week. The parameters’ posterior distributions of the proposed models are all fitted via the Bayesian framework to account for model and parameter uncertainties. Moreover, the transition phenomena that define the DTTR are identified on the basis of the number of traffic regimes, which are estimated using the Gaussian mixture model (GMM). This study uses one-year traffic data collected from a freeway facility located in Jacksonville, Florida. To the best of the authors’ knowledge, the approach herein has not been presented in the existing literature.

2 Study sites and data description

Data for the analysis were acquired from the Regional Integrated Transportation Information System (RITIS) database. For the purposes of this study, two detectors (Sites 1 and 2) for the southbound traffic shown in Fig. 1a located on I-295 in Jacksonville, Florida, were selected. At these sites, the posted speed limit is 65 miles per hour (mph). The number of lanes at Site 1 is three, while Site 2 is four. All lanes are standard 12 feet wide. The two sites consist of general purpose lanes with no managed lanes. Both sites are located just upstream of off-ramps that are prone to being in the congested state especially during peak hours. Traffic variables gathered for modeling were traffic volume and speed aggregated at a 15-min interval. These data were collected from March 1, 2015, to March 31, 2016, excluding weekends and public holidays.
Fig. 1

Detector locations on Google map (a) and 24-h profiles of the traffic flow parameters at the two sites (b)

Figure 1b shows the 24-h time series of the speed variable at Sites 1 and 2 for all data (one-year data) used in modeling. These profiles reveal that both sites experience congestion only in the morning peak period, which runs from 6 a.m. to 9 a.m. Further assessment of the traffic speed variable in Fig. 1 suggests that Site 1 has relatively lower speeds than Site 2. In the free-flow state, the densest region of the time series scatter plot lies between 59 and 68 mph for Site 1 and between 61 and 81 mph for Site 2.

In order to obtain enough data on the breakdown and other transition events for modeling, only the morning peak period data were evaluated in the current study. The approach of grouping data into intervals, particularly to isolate the peak period, and then developing a statistical model is consistent with previous studies [22, 23, 24, 25]. A review of traffic data during the selected peak period indicates that more than 7800 observations on each lane were used to develop the dynamic transition model for Site 2. For Site 1, on the other hand, fewer than 2400 observations per lane were available to the authors for the period from March 1, 2015, to March 31, 2016. A possible reason is a detector malfunction. The descriptive statistics of the traffic data by lane for both sites are shown in Table 1.
Table 1

Descriptive statistics of flow parameters during the peak period

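| Variable | Metric | Site 1: Lane near median | Site 1: Middle lane | Site 1: Lane near shoulder | Site 2: Lane near median | Site 2: Inner-left lane | Site 2: Inner-right lane | Site 2: Lane near shoulder |
|---|---|---|---|---|---|---|---|---|
| Speed (mph) | Mean | 53.3 | 54.0 | 61.7 | 62.6 | 60.8 | 60.6 | 64.8 |
| Speed (mph) | Median | 60.5 | 60.3 | 66.9 | 71.8 | 69.3 | 67.3 | 69.5 |
| Speed (mph) | SD | 13.7 | 12.3 | 12.1 | 22.4 | 19.9 | 17.9 | 18.9 |
| Speed (mph) | Minimum | 18.5 | 20.5 | 19.1 | 5.5 | 5.2 | 3.0 | 11.1 |
| Speed (mph) | Maximum | 71.1 | 71.4 | 84.6 | 91.1 | 87.5 | 86.9 | 98.4 |
| Flow (veh/h/lane) | Mean | 1606.2 | 1637.9 | 1156.1 | 1359.7 | 1296.8 | 1110.6 | 791.2 |
| Flow (veh/h/lane) | Median | 1644 | 1648 | 1144 | 1364.0 | 1304.0 | 1132.0 | 856.0 |
| Flow (veh/h/lane) | SD | 339.2 | 219.1 | 308.1 | 372.3 | 250.3 | 306.1 | 482.4 |
| Flow (veh/h/lane) | Minimum | 492 | 672 | 304 | 40.0 | 180.0 | 12.0 | 12.0 |
| Flow (veh/h/lane) | Maximum | 2528 | 2216 | 1856 | 2420.0 | 2152.0 | 1852.0 | 1828.0 |
| Number of observations | | 2297 | 2300 | 2272 | 8071 | 8079 | 8082 | 7867 |

SD represents standard deviation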

3 Speed thresholds for clustering traffic states

To identify the traffic regimes using the speed variable, the speed distribution of each lane was examined. It was found that the speed distributions at both sites have more than one subpopulation. The subpopulations of the speed distribution were clustered into homogeneous components using the finite GMM. The GMM provides a highly flexible framework for fitting various distribution shapes, including data with heterogeneous characteristics such as the traffic speed variable [25, 26, 27, 28]. The GMM fitting the speed data \(y\) can be represented as follows:
$$\begin{aligned} & f\left( y \right) = \sum\limits_{i = 1}^{n} w_{i} N_{i} \left( {y|\mu_{i} ,\sigma_{i}^{2} } \right), \\ & \left( {w_{1} , \ldots ,w_{n} } \right)\,\sim\,{\text{Dirichlet}}\left( {1, \ldots ,1} \right), \\ & \mu_{1} , \ldots ,\mu_{n} \,\sim\,N\left( {0,100^{2} } \right), \\ & \sigma_{1} , \ldots ,\sigma_{n} \,\sim\,{\text{HalfCauchy}}\left( {0, 10} \right), \\ \end{aligned}$$
(1)
where \(N_{i} \left( {y|\mu_{i} ,\sigma_{i}^{2} } \right)\) is the Gaussian distribution of component \(i\), \(\mu_{i}\) is the mean parameter of component \(i\), \(\sigma_{i}\) is the standard deviation of component \(i\), \(n\) is the total number of Gaussian distributions in the mixture model, and \(w_{i}\) is the mixing probability of component \(i\).
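As a small illustration of the density in Eq. 1, the sketch below evaluates a two-component mixture in plain Python. The weights, means, and standard deviations are hypothetical values chosen only for the example; they are not estimates from this study, and the priors in Eq. 1 are not used here.

```python
import math

def normal_pdf(y, mu, sigma):
    """Gaussian density N(y | mu, sigma^2)."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(y, weights, means, sigmas):
    """f(y) = sum_i w_i * N(y | mu_i, sigma_i^2), the first line of Eq. 1."""
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sigmas))

# Hypothetical parameters (congested, free-flow); NOT taken from the paper.
weights = [0.3, 0.7]   # mixing probabilities, must sum to 1
means = [35.0, 65.0]   # component means in mph
sigmas = [8.0, 4.0]    # component standard deviations in mph

density_at_60 = mixture_pdf(60.0, weights, means, sigmas)
```

Because the weights sum to one and each component is a proper density, the mixture also integrates to one, which is a quick sanity check on any fitted set of parameters.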

Two GMMs were developed in the PyMC3 package for the Python programming language to detect the speed thresholds for clustering traffic conditions in the Site 1 and Site 2 datasets. The GMM parameters were estimated using Markov chain Monte Carlo (MCMC) simulation with the No-U-Turn sampler (NUTS) step. As indicated in Eq. 1, non-informative prior distributions were used in the model. The mixing probabilities were assumed to follow the Dirichlet distribution, similar to the studies in [27, 28]. For the mean parameters, the prior was a normal distribution with zero mean and a standard deviation of 100, \(N\left( {0, 100^{2} } \right).\) The standard deviation parameters in the model were assumed to follow the half-Cauchy distribution, \({\text{HalfCauchy}}\left( {0, 10} \right).\) In the analysis, a total of 10,000 iterations were sampled in each model; the initial 5000 iterations were discarded as warm-up samples, and the last 5000 iterations were used for inference. Convergence was monitored using the Gelman–Rubin statistic and trace plots.
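The convergence check mentioned above can be illustrated with the classic form of the Gelman–Rubin statistic. The sketch below is a plain-Python version run on synthetic chains; it is an assumption that this classic form is close to what was monitored, since the library's own diagnostic may use a refined variant, and the function name `gelman_rubin` is introduced here only for illustration.

```python
import random
import statistics

def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average within-chain variance; B: between-chain variance scaled by n.
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

random.seed(0)
# Synthetic chains that all sample the same distribution (well mixed).
mixed = [[random.gauss(0.0, 1.0) for _ in range(5000)] for _ in range(4)]
# Synthetic chains centered on different values (not converged).
stuck = [[random.gauss(float(k), 1.0) for _ in range(5000)] for k in range(4)]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

Values close to 1 indicate that the chains agree with each other, while values well above 1 indicate that they have not yet converged to a common distribution.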

To select the appropriate number of mixture components, the widely applicable information criterion (WAIC) was used in the analysis [29]. The analysis indicates that two mixture components are sufficient to approximate the speed distributions of all lanes in the Site 1 dataset. As presented in Fig. 2a, the two-component GMM predicts the field data distributions with reasonable accuracy. These mixture components can be referred to as the congested and free-flow regimes. Using the GMM estimates (mean and standard deviation), the speed thresholds were calculated, i.e., the speed values that minimize the classification error of data between the estimated components. This approach has been used before to calculate the speed thresholds that group speed data into different traffic regimes [25, 28, 30]. The results reveal that the lane near the shoulder has the highest speed threshold (63.1 mph), compared with the middle lane (56.1 mph) and the lane near the median (59 mph). Visual inspection of the speed distributions in Fig. 2a suggests that the shoulder lane has comparatively higher speeds than the middle and median lanes. The calculated speed thresholds presented in Fig. 2a were further used for modeling the dynamic transition of traffic conditions.
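One common way to obtain a threshold that minimizes the classification error between two Gaussian components is to find the speed at which their weighted densities are equal. The sketch below does this by bisection between the two component means. It is an illustration under that assumption, not necessarily the exact procedure used in the study, and the component parameters are hypothetical.

```python
import math

def weighted_pdf(y, w, mu, sigma):
    """Mixing weight times the Gaussian density of one component."""
    z = (y - mu) / sigma
    return w * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def speed_threshold(low, high, tol=1e-9):
    """Bisection for the speed where the two weighted densities are equal.

    low and high are (weight, mean, sd) tuples with low[1] < high[1].
    """
    def gap(y):
        return weighted_pdf(y, *low) - weighted_pdf(y, *high)

    a, b = low[1], high[1]
    if gap(a) <= 0 or gap(b) >= 0:
        raise ValueError("no sign change between the component means")
    while b - a > tol:
        mid = 0.5 * (a + b)
        if gap(mid) > 0:
            a = mid   # lower component still dominates, move right
        else:
            b = mid   # upper component dominates, move left
    return 0.5 * (a + b)

# Hypothetical components (weight, mean, sd); NOT estimates from the paper.
congested = (0.3, 35.0, 8.0)
free_flow = (0.7, 65.0, 4.0)
threshold = speed_threshold(congested, free_flow)
```

Observations below the returned speed are assigned to the lower-speed component and those above it to the higher-speed component. With three components, the same search can be applied to each pair of adjacent components.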
Fig. 2

Speed thresholds for clustering traffic regimes. a Site 1 dataset. b Site 2 dataset

For the Site 2 dataset, three components were found to best estimate the data distributions for each lane, corresponding to the free-flow, congestion on-set/dissolution (COD) or transitional flow, and congested regimes. As seen in Fig. 2b, the expected posterior distributions approximate the field data distributions well. In contrast to Site 1, the modeling results suggest that, for the threshold separating the COD and congested regimes, the lane near the median has the highest value (56.2 mph), followed by the inner-left lane (55.7 mph) and the inner-right lane (55 mph), while the lane near the shoulder has the lowest value (51 mph). A similar pattern was seen in the thresholds that separate the COD and free-flow regimes. The estimated trend for the Site 2 dataset mirrors what was reported in a previous study [13].

4 Modeling the dynamic transition of the traffic regimes

To analyze the dynamic transition of the estimated traffic regimes by the GMM, two Markov chain (MC) models were developed. The first model was the two-regime MC regression for Site 1 and the second model was the three-regime MC regression for Site 2 dataset. The discussions of the two MC regressions are presented in the following subsections.

4.1 Two-regime MC model

Suppose that the traffic states are observed as a sequence of a finite number of regimes at a discrete time interval \(t\) (\(t = 15\,\text{min}\)). The first-order MC model that probabilistically describes the transition of regimes is presented in Eq. 2. Note that the transition probabilities of this model are fitted with the explanatory variable \(x_{t}\) (flow rate at current time \(t\)) to account for variations or heterogeneity associated with the time-varying effect [24, 25]. The resulting transition probabilities are non-stationary: they vary as time progresses, depending on the currently observed flow rate.
$$p_{ij} \left( {x_{t} } \right) = {\text{Prob}}(S_{t + 1} = S_{j}^{{\prime }} |S_{t} = S_{i} , X_{t} = x_{t} ),$$
(2)
where \(p_{ij}\) is the probability of evolving from traffic regime i to j, \({\text{Prob}}(\,)\) is the probability function, \(S_{t}\) is the current observed traffic regime, \(S_{t + 1}\) is the next traffic regime, and \(S_{j}^{'}\) is the future estimated traffic regime.
Basically, the two-regime MC regression is defined by the four transition processes, which can be summarized in a matrix format as follows:
$$\varvec{P} = \left[ {\begin{array}{*{20}c} {p_{\text{ff}} } & {p_{\text{fc}} } \\ {p_{\text{cf}} } & {p_{\text{cc}} } \\ \end{array} } \right],$$
(3)
where the sum of each row equals 1, \(p_{\text{ff}}\) is the probability of staying in the free-flow regime, \(p_{\text{fc}}\) is the probability of evolving from the free-flow to the congested regime (breakdown probability), \(p_{\text{cf}}\) is the probability of evolving from the congested to the free-flow regime (recovery probability), and \(p_{\text{cc}}\) is the probability of staying in the congested regime.
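To make the structure of Eq. 3 concrete, the sketch below counts one-step transitions in a short, made-up sequence of regime labels and normalizes each row. This gives a stationary, purely empirical matrix and is only an illustration; the study itself fits flow-dependent probabilities by regression. The labels "f" and "c" and the example sequence are hypothetical.

```python
from collections import Counter

def transition_matrix(regimes, states=("f", "c")):
    """Row-normalized counts of one-step transitions between regimes."""
    counts = Counter(zip(regimes[:-1], regimes[1:]))
    matrix = {}
    for i in states:
        row_total = sum(counts[(i, j)] for j in states)
        matrix[i] = {
            j: (counts[(i, j)] / row_total if row_total else 0.0)
            for j in states
        }
    return matrix

# Hypothetical 15-min regime labels: f = free-flow, c = congested.
sequence = list("ffffccccfffcf")
P = transition_matrix(sequence)
# P["f"]["c"] plays the role of p_fc (breakdown), P["c"]["f"] of p_cf (recovery).
```

Each row of the resulting matrix sums to 1, matching the constraint stated for Eq. 3.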
The traffic regimes estimated by the GMM, such as the free-flow and congested regimes, are categorical in nature. Two regressions are commonly used to evaluate the influencing factors for categorical response variables: probit and logistic regression. We selected the logistic regression model because its results can be easily interpreted using the odds ratio. To investigate the disparity effects associated with different lateral lane locations and days of the week (i.e., Monday through Friday) in the DTTR, binary hierarchical logistic regressions were applied to estimate the transition probabilities presented in Eq. 3. In the analysis, traffic data are assumed to be nested within lanes and days of the week. In this case, data within the same group are hypothesized to be correlated [14, 21, 31]. Suppose that a freeway has L lanes and m vehicles observed in each lane in each day (m = 1, …, M, and M is the total number of vehicles on the freeway). The transition process of the traffic regime \(R_{ij}\) can be predicted as follows:
$$\begin{aligned} & R_{ij} \sim{\text{Bernoulli}}\left({p_{ij} \left({x_{t}} \right)} \right), \\ & p_{ij} \left({x_{t}} \right) = \frac{1}{{1 + \exp \left({- \eta_{m}} \right)}}, \\ & \eta_{m} = \alpha_{0l} + \alpha_{1} x_{mt} + \epsilon_{k}, \\ \end{aligned}$$
(4)
where \(\alpha_{0l}\) is the random intercept associated with the lane lateral location, with the lane ordinal number \(l = 1, \ldots , L; \alpha_{1}\) represents the flow rate parameter; and \(\epsilon_{k}\) is the random effect associated with the day of the week, k = 1,…,5.
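The link between the linear predictor and the transition probability in Eq. 4 can be sketched as follows. The slope 8.68 is the posterior mean that Table 2 reports for the breakdown model, and the predictor is assumed to be the natural log of the flow rate, as the "Log of traffic flow" label in that table suggests. The two lane-specific intercepts and the lane names are hypothetical, chosen only to show how a random intercept shifts the curve.

```python
import math

def breakdown_probability(flow, lane_intercept, log_flow_coef, day_effect=0.0):
    """Inverse logit of eta = alpha_0l + alpha_1 * x + eps_k (Eq. 4).

    x is taken to be the natural log of the flow rate (an assumption).
    """
    eta = lane_intercept + log_flow_coef * math.log(flow) + day_effect
    return 1.0 / (1.0 + math.exp(-eta))

ALPHA_1 = 8.68  # posterior mean of the flow coefficient, breakdown model, Table 2

# Hypothetical lane-level intercepts; NOT estimates from the paper.
lane_intercepts = {"lane A": -66.0, "lane B": -64.0}

probs_at_1800 = {
    lane: breakdown_probability(1800.0, a0, ALPHA_1)
    for lane, a0 in lane_intercepts.items()
}
```

With a common slope, a higher intercept gives a higher breakdown probability at every flow rate, which is how the random intercept expresses lane-to-lane differences.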

4.2 Three-regime MC model

In calibrating the transition probabilities for the Site 2 dataset, the dynamic transition was assumed to occur sequentially: from free-flow to congestion on-set and then to the congested regime, and from the congested regime to congestion dissolution and then to the free-flow regime. The congestion dissolution and congestion on-set are assumed to have similar characteristics and thus are considered as one regime in the current study. Based on the three-phase theory by Kerner et al. [32], which indicates that there is no direct transition between the congested and free-flow regimes, the transitions from the free-flow to the congested regime and from the congested to the free-flow regime are ignored in the current study. As a result, the transition probabilities for these processes were set to zero in the matrix (Eq. 5).
$$\varvec{\pi}= \left[ {\begin{array}{*{20}c} {\pi_{\text{ff}} } & {\pi_{\text{fo}} } & 0 \\ {\pi_{\text{of}} } & {\pi_{\text{oo}} } & {\pi_{\text{oc}} } \\ 0 & {\pi_{\text{co}} } & {\pi_{\text{cc}} } \\ \end{array} } \right],$$
(5)
where the sum of each row equals 1, \(\pi_{\text{ff}}\) is the probability of staying in the free-flow regime, \(\pi_{\text{fo}}\) is the probability of evolving from the free-flow to the COD regime, \(\pi_{\text{of}}\) is the probability of evolving from the COD to the free-flow regime, \(\pi_{\text{oo}}\) is the probability of staying in the COD regime, \(\pi_{\text{oc}}\) is the probability of evolving from the COD to the congested regime, \(\pi_{\text{co}}\) is the probability of evolving from the congested to the COD regime, and \(\pi_{\text{cc}}\) is the probability of staying in the congested regime.
As indicated in Eq. 5, the first and third rows each have two nonzero elements, which indicates that each of these rows contains two dependent transition processes. These transitions were calibrated using binary logistic random-effect regressions similar to those fitted for Eq. 3. In contrast, the transition processes in the second row, which include COD to free-flow, stay in the COD regime, and COD to the congested regime, were calibrated using the multinomial logistic random-effect regression (Eq. 6).
$$\begin{aligned} & R_{ij } \sim {\text{Multinomial}}\left( {\pi_{ij} \left( {x_{t} } \right)} \right), \\ & \pi_{ij} \left( {x_{t} } \right) = {\text{Prob}}\left( {R_{ij } = v} \right) = \frac{{{ \exp }\left( {\lambda_{mv} } \right)}}{{\mathop \sum \nolimits_{v = 1}^{V} { \exp }\left( {\lambda_{mv} } \right)}}, \\ & \lambda_{mv} = \beta_{0lv} + \beta_{1v} x_{mvt} + \varepsilon_{kv} , \\ \end{aligned}$$
(6)
where \(\pi_{ij}\) is the probability of evolving from regime i to j, \(\beta_{0lv}\) is the random intercept for the transition process \(v\), \(\beta_{1v}\) represents the flow rate parameter for the transition process \(v\), and \(\varepsilon_{kv}\) is the random-effect term for the transition process \(v\).
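The second row of Eq. 5 can be illustrated with the softmax form of Eq. 6. The sketch below uses the posterior means reported for the multinomial model in Table 3, with the stay in the COD regime as the base category (linear predictor fixed at zero) and all random effects set to zero. Treating the predictor as the natural log of flow is an assumption, and the function names are introduced here only for illustration.

```python
import math

def softmax(predictors):
    """Probabilities exp(lambda_v) / sum_u exp(lambda_u), as in Eq. 6."""
    shift = max(predictors.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - shift) for k, v in predictors.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def cod_row(flow):
    """Transition probabilities out of the COD regime at a given flow rate.

    Coefficients are the Table 3 posterior means; lane and day random
    effects are set to zero for this illustration.
    """
    x = math.log(flow)  # assumed natural log of flow (veh/h/lane)
    predictors = {
        "stay in COD": 0.0,                      # base category
        "COD to free-flow": 15.72 - 2.39 * x,
        "COD to congested": -7.01 + 0.82 * x,
    }
    return softmax(predictors)

row_low = cod_row(800.0)
row_high = cod_row(1800.0)
```

Comparing the two rows shows the direction of the flow effect: at the higher flow rate the COD to free-flow probability falls and the COD to congested probability rises.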

4.3 Parameter estimation for the two- and three-regime MC regressions

The NUTS step in the MCMC simulation implemented in the PyMC3 package was also used to calibrate the posterior distributions of the model parameters in Eqs. 4 and 6. Bayesian analysis requires prior distributions of the model parameters to be specified before the simulations. Figure 3 shows the prior distributions selected for the multilevel logistic and multinomial logistic regressions. As shown in Fig. 3, the random intercepts in both the two- and three-regime MC models were assigned a normal prior with mean \(\mu_{1}\) and standard deviation \(\sigma_{1}\), that is, \(\alpha_{0l} \;{\text{and}}\;\beta_{0lv} \,\sim\, N\left( {\mu_{1} ,\sigma_{1}^{2} } \right)\). To borrow strength across groups and smooth the group-level parameters, the hyper-parameters were shared by all intercept coefficients [21, 31]. The advantage of this type of hyper-parameter is that the resulting model combines the benefits of a complete-pooled model and a no-pooled model [31]. The hyper-parameter priors (hyper-priors) were also assigned non-informative distributions. For \(\mu_{1}\), a normal distribution with mean zero and a standard deviation of 100 was specified, \(\mu_{1} \,\sim\,N\left( {0, 100^{2} } \right)\), while for \(\sigma_{1}\), the half-Cauchy distribution \(\sigma_{1} \sim {\text{HalfCauchy}} \left( {0, 10} \right)\) was used. Note that the hyper-parameter \(\sigma_{1}\) was used to quantify the disparity effect due to lateral lane location.
For the flow model coefficients, the priors were normal distributions with a mean of zero and a standard deviation of 100, \(\alpha_{1} \;{\text{and}}\;\beta_{1v} \sim N\left( {0, 100^{2} } \right).\) Furthermore, the prior distribution for the random-effect parameters \({\epsilon}_{k}\) and \({\varepsilon}_{kv}\) associated with the days of the week was specified as a normal distribution with mean \(\mu_{2}\) and standard deviation \(\sigma_{2}\), where \(\mu_{2} \,\sim\,N\left( {0, 100^{2} } \right)\) and \(\sigma_{2} \,\sim\,{\text{HalfCauchy}}\left( {0, 10} \right).\) The parameter \(\sigma_{2}\) was used to quantify the variations associated with days of the week.
Fig. 3

Hierarchical structure of the multilevel regressions. a Logistic regression. b Multinomial logistic regression
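The dispersion hyper-parameters \(\sigma_{1}\) and \(\sigma_{2}\) described above are what the intra-class correlation (ICC) quoted in the abstract is built from. The formula is not given in this part of the article, so the sketch below assumes the standard latent-variable ICC for logistic models, in which the residual variance on the logit scale is \(\pi^{2}/3\). The function name `latent_icc` is introduced here only for illustration.

```python
import math

def latent_icc(sigma):
    """ICC on the logistic latent scale: sigma^2 / (sigma^2 + pi^2 / 3).

    Assumed formula; the article's own ICC definition is not shown here.
    """
    group_variance = sigma ** 2
    residual_variance = math.pi ** 2 / 3.0
    return group_variance / (group_variance + residual_variance)

# Posterior mean of sigma_1 for the breakdown model, as reported in Table 2.
icc_breakdown_lanes = latent_icc(3.02)
```

For the breakdown model this gives roughly 0.73, which is close to the value of about 73% quoted in the abstract. That agreement suggests, but does not confirm, that a formula of this kind was used.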

5 Results

Similar to the GMM parameter estimation, 10,000 iterations were found adequate in estimating the posterior distributions of the regression’s parameters (binary and multinomial logit). Also, the initial 5000 iterations were discarded and the last 5000 iterations were used for inference. The results of the estimated regressions are presented in Tables 2 and 3. In these tables, summaries of the posterior distributions—mean, standard deviation, and the 95% posterior credible intervals (CIs) of each parameter—are reported. The next subsections discuss the results of the analysis, starting with the results discussion of Site 1 followed by Site 2 and concluding the section by discussing the disparity effects associated with factors such as lane lateral locations and days of the week.
Table 2

Parameters posterior distributions summaries for Site 1 models

Binary logistic hierarchical regression

| Coefficients | Breakdown process \(\left( {P_{\text{fc}} } \right)\): Posterior mean | Posterior SD | 95% CI lower | 95% CI upper | Stay in the congested regime \(\left( {P_{\text{cc}} } \right)\): Posterior mean | Posterior SD | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|---|---|
| Intercept | −65.10 | 3.29 | −71.10 | −58.30 | −11.50 | 2.90 | −17.0 | −5.80 |
| Log of traffic flow | 8.68 | 0.36 | 8.00 | 9.35 | 1.80 | 0.40 | 1.06 | 2.50 |
| Dispersion \(\sigma_{1}\) | 3.02 | 1.74 | 0.75 | 6.79 | 0.87 | 1.00 | 0.07 | 2.80 |
| Dispersion \(\sigma_{2}\) | 0.25 | 0.19 | 0.01 | 0.59 | 0.17 | 0.20 | 0.00 | 0.50 |

Stay in the free-flow regime \(\left( {P_{\text{ff}} } \right)\) and recovery transition process \(\left( {P_{\text{cf}} } \right)\) were treated as the base category in the models

Table 3

Parameters posterior distributions summaries for Site 2 models

Binary logistic hierarchical regression

| Coefficients | Free-flow to COD \(\left( {\pi_{\text{fo}} } \right)\): Posterior mean | Posterior SD | 95% CI lower | 95% CI upper | Congested regime to COD \(\left( {\pi_{\text{co}} } \right)\): Posterior mean | Posterior SD | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|---|---|
| Intercept | −8.73 | 0.71 | −10.03 | −7.36 | −3.67 | 0.47 | −4.58 | −2.81 |
| Log of traffic flow | 1.02 | 0.08 | 0.86 | 1.18 | 0.28 | 0.06 | 0.17 | 0.4 |
| Dispersion \(\sigma_{1}\) | 0.62 | 0.55 | 0.13 | 1.54 | 0.18 | 0.28 | 0.03 | 0.5 |
| Dispersion \(\sigma_{2}\) | 0.21 | 0.13 | 0.05 | 0.46 | 0.22 | 0.12 | 0.05 | 0.49 |

Multinomial logistic hierarchical regression

| Coefficients | COD to free-flow \(\left( {\pi_{\text{of}} } \right)\): Posterior mean | Posterior SD | 95% CI lower | 95% CI upper | COD to congested regime \(\left( {\pi_{\text{oc}} } \right)\): Posterior mean | Posterior SD | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|---|---|
| Intercept | 15.72 | 1.11 | 13.56 | 17.84 | −7.01 | 1.24 | −9.41 | −4.52 |
| Log of traffic flow | −2.39 | 0.13 | −2.64 | −2.12 | 0.82 | 0.09 | 0.65 | 0.99 |
| Dispersion \(\sigma_{1}\) | 0.9 | 0.72 | 0.19 | 2.23 | 1.8 | 1.13 | 0.51 | 4.05 |
| Dispersion \(\sigma_{2}\) | 0.07 | 0.09 | 0 | 0.2 | 0.17 | 0.12 | 0.01 | 0.39 |

Stay in the COD regime was treated as the base category in the multinomial logistic regression, while the stay in the free-flow and stay in the congested regime were treated as the base category in the binary logistic regressions

5.1 Results of regression models for site 1

Two regression models were fitted to calibrate the transition probabilities of the breakdown and the stay in the congested regime processes. As presented in Table 2, the coefficient of the logarithm of the flow rate has a positive sign, which indicates that the probability of traffic breakdown increases as the flow rate increases. The estimate of this coefficient suggests that a 1% increase in the log-transformed flow rate increases the likelihood of breakdown by 8.68%. The 95% CI of this estimate does not contain zero, and thus the effect is statistically significant at the 95% credible level.
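Because the model uses a logit link, the coefficient acts on the log-odds of the transition. One way to read the 8.68 estimate, assuming the predictor is the natural log of the flow rate, is as the factor by which the odds of breakdown change for a given relative increase in flow. The short sketch below computes that factor; the function name `odds_multiplier` is introduced here only for illustration.

```python
import math

def odds_multiplier(log_flow_coef, relative_increase):
    """Factor applied to the odds when flow grows by the given fraction.

    With eta = a + b * ln(flow), raising flow by a fraction r adds
    b * ln(1 + r) to eta, so the odds are multiplied by (1 + r) ** b.
    """
    return (1.0 + relative_increase) ** log_flow_coef

breakdown_coef = 8.68  # posterior mean from Table 2 (breakdown model)
factor_for_one_percent = odds_multiplier(breakdown_coef, 0.01)
```

Under these assumptions a 1% rise in flow multiplies the odds of breakdown by a factor of about 1.09, which is close to, but not identical with, a direct percentage reading of the coefficient. The change in probability itself depends on the starting flow rate.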

Figure 4a displays the relationship between flow rate and breakdown probability using the posterior predictive lines. The figure shows an “S”-shaped trend in the relationship between the two variables. Although breakdowns were modeled as lifetime events in some previous studies [12, 33, 34], the pattern estimated in those studies is consistent with the pattern reported in Fig. 4a. Moreover, the boxplots presented in Fig. 4b were used to compare breakdown probability across lanes. This figure shows that the breakdown probabilities on the lane near the shoulder at 1000 veh/h/lane are even higher than those estimated at 2000 veh/h/lane for the lane near the median and the middle lane. Furthermore, the likelihood of breakdown on the lane near the shoulder at 2000 veh/h/lane is nearly one, while the middle lane and the lane near the median have a likelihood of approximately 0.5. Such a difference in the estimated likelihood at the same flow rate can lead to a partial breakdown on a highway. Similar observations are reported in a previous study [13], which suggests that lanes near the shoulder have lower capacity than lanes near the median.
Fig. 4

Breakdown probability and flow rate relationship for Site 1. a Breakdown probability versus flow rate. b Breakdown probability across lanes at different flow rates

For the stay in the congested regime, the flow rate coefficient estimate in Table 2 suggests that the likelihood of staying in the congested regime increases by 1.8% when the log-transformed flow rate increases by 1%. As with the breakdown transition, the comparison of the estimated probability across lanes reveals that, at the same flow rate, the likelihood of staying in the congested regime is higher on the lane near the shoulder than on the middle lane and the lane near the median (Fig. 5b).
Fig. 5

Stay in the congested transition probability and flow rate relationship for Site 1. a Stay in the congested regime probability versus flow rate. b Stay in the congested regime probability across lanes at the different flow rate

It is worth noting that the stay in the free-flow and the recovery (congested to free-flow) transition processes are not presented because they were treated as the base categories in the models. To clarify, the stay in the free-flow and breakdown probabilities in the transition matrix presented in Eq. 3 sum to 1. Since the logit link function was used in the hierarchical regression to fit the transition matrix, the estimates for the breakdown and for the stay in the free-flow regime are equal in magnitude but opposite in sign. Similarly, the estimates for the stay in the congested regime and for the recovery transition process are equal in magnitude but opposite in sign.
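The sign relationship described above follows from a property of the logit link: if a probability has linear predictor \(\eta\), its complement has linear predictor \(-\eta\). The sketch below checks this numerically for an arbitrary, hypothetical value of \(\eta\).

```python
import math

def inv_logit(eta):
    """Probability corresponding to a linear predictor on the logit scale."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

eta = -1.25  # hypothetical linear predictor for the breakdown transition
p_breakdown = inv_logit(eta)
p_stay_free_flow = 1.0 - p_breakdown  # the two entries in a row sum to 1

# The complement has the same linear predictor with the opposite sign.
eta_stay_free_flow = logit(p_stay_free_flow)
```

Since every term of the linear predictor changes sign, the intercept and the flow coefficient of the base-category process can be read directly from the reported estimates.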

5.2 Results of regression models for Site 2

Because the Site 2 dataset has three regimes (free-flow, COD, and congested), three regression models were fitted to calibrate the transition processes in Eq. 5: two binary and one multinomial logistic hierarchical regression. Table 3 gives the calibrated regression coefficients. The analysis of the free-flow to COD transition reveals that a 1% increase in the log-transformed flow rate increases the transition probability by 1.02%. For the transition from the congested to the COD regime (the queue discharging process), the results in Table 3 show a positive relationship: an increase in the flow rate on the highway increases the likelihood of discharging the queue. Specifically, the model estimate shows that, when the current state is the congested regime, a 1% increase in the log-transformed flow rate increases the queue discharge probability by 0.28%. The posterior predictive trend in the relationship between the congested to COD transition probability and the flow rate is shown in Fig. 6a. The predicted trend has high uncertainty because the whisker lines are widely spread around the expected predictive line. One reason for this high uncertainty is variation in the data. The comparison of the estimated transition probability in Fig. 6b shows that, at the same flow rate, queue discharge on the lane near the shoulder has a comparatively lower likelihood than on the other lanes.
Fig. 6 Transition probability and flow rate relationship for Site 2. a Congested to COD transition probability versus flow rate. b Congested to COD transition probability across lanes at different flow rates

The results of the multinomial logistic hierarchical regression in Table 3 were calibrated by considering the stay in the COD transition as the base category. This category was selected arbitrarily. One can select either the COD to free-flow or the COD to congested regime transition as the base category, and the analysis will yield the same interpretation. As indicated in Table 3, the coefficient of the traffic flow parameter for the COD to free-flow regime transition is negative. This suggests that increasing the traffic flow reduces the likelihood that the highway evolves to the free-flow state. In particular, the model coefficient reveals that a 1% increase in the log-transformed flow rate reduces the probability of this transition process by 2.39%. The association between the flow rate and the COD to free-flow transition probability is illustrated in Fig. 7a. As this figure shows, the estimated trend is a decreasing, “concave upward” curve. Comparing the COD to free-flow transition probability across lanes, the shoulder lane has the highest probability for this transition, while the lane near the median, the inner-right lane, and the inner-left lane have nearly the same likelihood at the same flow rate (Fig. 7b).
Fig. 7 Transition probability and flow rate relationship for Site 2. a COD to free-flow transition probability versus flow rate. b COD to free-flow transition probabilities across lanes at different flow rates
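The remark in the preceding paragraph that the base category was chosen arbitrarily can be illustrated with a small sketch. It assumes a plain multinomial logit (softmax) form with the base category's linear predictor fixed at zero. The linear predictor values and the names `softmax_with_base`, `eta_vs_stay`, and `eta_vs_free_flow` are hypothetical and are not taken from Table 3.

```python
import math

def softmax_with_base(linear_predictors: dict, base: str) -> dict:
    """Category probabilities when `base` has its linear predictor fixed at 0."""
    scores = {base: 0.0, **linear_predictors}
    total = sum(math.exp(v) for v in scores.values())
    return {name: math.exp(v) / total for name, v in scores.items()}

# Hypothetical log-odds relative to staying in COD (illustrative values only).
eta_vs_stay = {"cod_to_free_flow": -1.2, "cod_to_congested": 0.4}
probs_a = softmax_with_base(eta_vs_stay, base="stay_in_cod")

# Re-express the same model with COD to free-flow as the base category by
# subtracting the old free-flow predictor from every category.
shift = eta_vs_stay["cod_to_free_flow"]
eta_vs_free_flow = {
    "stay_in_cod": 0.0 - shift,
    "cod_to_congested": eta_vs_stay["cod_to_congested"] - shift,
}
probs_b = softmax_with_base(eta_vs_free_flow, base="cod_to_free_flow")

for name in probs_a:
    print(name, round(probs_a[name], 4), round(probs_b[name], 4))
```

In this sketch the coefficients change with the base category, but the implied transition probabilities do not, which is the sense in which the choice is arbitrary.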

As also presented in Table 3, the estimate for the COD to congested transition was significant at the 95% credible interval. The coefficient of the logarithm of traffic flow is 0.82, which indicates that a 1% increase in the logarithm of flow rate would increase the likelihood of the COD to congested transition by 0.82% relative to staying in the COD regime.
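One way to read the 0.82 coefficient is on the odds scale. The sketch below assumes that the covariate is the natural logarithm of the flow rate and that the effect is measured on the odds of the COD to congested transition relative to staying in COD. Under those assumptions the relative odds scale as flow raised to the power of the coefficient. The function name `relative_odds_change_pct` is hypothetical.

```python
# Coefficient on the log flow rate for the COD to congested transition (from the text).
beta_log_flow = 0.82

def relative_odds_change_pct(beta: float, flow_increase_pct: float) -> float:
    """Percent change in the odds versus the base category for a percent change
    in flow, assuming log-odds = intercept + beta * ln(flow)."""
    ratio = (1.0 + flow_increase_pct / 100.0) ** beta
    return (ratio - 1.0) * 100.0

change = relative_odds_change_pct(beta_log_flow, 1.0)
print(f"{change:.2f}% change in the odds relative to staying in COD")
```

For small changes in flow, this percentage is close to the coefficient itself, which matches the reading given above.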

5.3 Disparity effects caused by different lane lateral locations and days of the week

To quantify the disparity effects associated with lane lateral locations and different days of the week, the intra-class correlation coefficient (ICC) was calculated for each model. The ICC quantifies the proportion of variation that would not be accounted for in a model that ignores data clustering [35]. Alternatively, this value can be viewed as a measure of the correlation between observations within the same cluster. Because variances are non-negative in the model, the ICC ranges between 0 and 1. The disparity parameters presented in Tables 2 and 3 were used to calculate the ICC. The ICC analysis for the breakdown model (Site 1) shows that about 73% of the total variation is associated with the different lane lateral locations (Eq. 7). This share is larger than the within-group variation, which is the variation due to the standard logistic distribution. In this case, the breakdown events within the same lane are more similar than the breakdown events in different lanes.
$${\text{ICC}} = \left( \frac{\sigma_{1}^{2}}{\sigma_{1}^{2} + \sigma_{2}^{2} + \sigma_{\text{sl}}^{2}} \right) \times 100\% = \left( \frac{3.0^{2}}{3.0^{2} + 0.25^{2} + \frac{\pi^{2}}{3}} \right) \times 100\% = 73\% ,$$
(7)
where \(\sigma_{1}\) and \(\sigma_{2}\) are the disparity parameters for the lane lateral locations and the days of the week, respectively, and \(\sigma_{\text{sl}}^{2}\) represents the within-group variance, which is \(\sigma_{\text{sl}}^{2} = \frac{\pi^{2}}{3} = 3.29\) for the standard logistic distribution [35].
On the other hand, different days of the week were found to contribute 0.5% to the total variation (Eq. 8). Furthermore, for the Site 1 dataset, the ICC for the stay in the congested regime model is 18% for different lane lateral locations, while different days of the week contribute only 0.7% to the total variation.
$${\text{ICC}} = \left( \frac{\sigma_{2}^{2}}{\sigma_{1}^{2} + \sigma_{2}^{2} + \sigma_{\text{sl}}^{2}} \right) \times 100\% = \left( \frac{0.25^{2}}{3.0^{2} + 0.25^{2} + \frac{\pi^{2}}{3}} \right) \times 100\% = 0.5\% .$$
(8)
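The arithmetic in Eqs. 7 and 8 can be checked with a short sketch. The disparity parameters 3.0 and 0.25 are those of the Site 1 breakdown model shown in the equations. The names `icc_percent`, `sd_lane`, and `sd_day` are hypothetical labels introduced for this illustration.

```python
import math

# Within-group variance of the standard logistic distribution, about 3.29.
LOGISTIC_VARIANCE = math.pi ** 2 / 3

def icc_percent(sd_target: float, sd_other: float) -> float:
    """Share (in percent) of total variance attributed to the target grouping factor."""
    total = sd_target ** 2 + sd_other ** 2 + LOGISTIC_VARIANCE
    return 100.0 * sd_target ** 2 / total

# Disparity parameters used in Eqs. 7 and 8 (Site 1 breakdown model).
sd_lane, sd_day = 3.0, 0.25

icc_lane = icc_percent(sd_lane, sd_day)  # Eq. 7: lane lateral location
icc_day = icc_percent(sd_day, sd_lane)   # Eq. 8: day of the week
print(f"lane: {icc_lane:.1f}%, day of week: {icc_day:.1f}%")
```

The same function could be applied to the disparity parameters of the other models in Tables 2 and 3, provided each model has the same two grouping factors.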

Similar analyses were conducted for Site 2, and the estimates indicate that the lateral lane location has the largest impact on the COD to congested transition process (ICC = 49.4%) followed by the COD to free-flow transition (ICC = 19.7%), the free-flow to COD transition (ICC = 10.5%), and the congested to COD transition (ICC = 1%). For different days of the week, the congested to COD transition has the highest variation (ICC = 1.44%) followed by the free-flow to COD transition (ICC = 1.2%), COD to congested transition (ICC = 0.5%), and COD to free-flow transition (ICC = 0.1%).

In summary, there is considerable evidence that lane lateral locations contribute considerably more variation to the DTTR than different days of the week (considering only weekdays). This observation is consistent across the two sites. Moreover, the highest disparity estimate associated with different days of the week is 1.44%. Based on this estimate, one may conclude that different days of the week do not contribute meaningfully to variability in the DTTR. The study in [36] investigated the difference in flow capacity due to different days of the week using the analysis of variance (ANOVA) approach and reached the same conclusion: no variation in the estimated capacity flow is attributable to different days of the week.

6 Discussion

This study has presented an empirical approach for investigating the disparity effects of lateral lane locations and days of the week on the dynamic transition of traffic regimes (DTTR). In the analysis, Markov chain theory and hierarchical regressions were integrated to describe the transition processes and the dependence of traffic regimes, and to capture the hierarchical structure of the traffic observations. Historical traffic flow parameters (speed and flow) collected for 1 year (2015–2016) from two sites on a freeway were used.

Using the GMM, the speed threshold of each lane that defines traffic conditions was identified in the analysis. Overall, the results of the hierarchical regressions used to estimate the Markov chain (MC) transition probabilities indicated that the log-transformed flow rate is a significant variable, at 95% posterior credible intervals, in predicting the likelihood of evolving from one traffic regime to the next. The lane near the shoulder was estimated to have the highest likelihood of transitioning from one regime to the next compared to other lanes at a similar flow rate. The intra-class correlation coefficient (ICC) analysis revealed that different lane lateral locations contribute a significant percentage of the total variation in the DTTR for the Site 1 dataset. More specifically, the breakdown process was found to be more influenced by this variation than the other evaluated transition processes (ICC = 73%). For the Site 2 dataset, the largest variation due to lateral lane location was observed for the transition from the COD to the congested regime (ICC = 49.4%). Different days of the week, in contrast, were found not to cause variation in the transition probabilities describing the DTTR. The highest ICC estimate for days of the week among the fitted hierarchical models for Sites 1 and 2 was 1.44%.

The findings from this study could be used to enhance lane-distribution strategies in intelligent transportation system applications, particularly dynamic lane management, to improve operational efficiency. Furthermore, the results are expected to increase awareness among both researchers and practitioners of the variation associated with different lateral lane locations and days of the week in traffic operations. This information is also useful to transportation agencies in developing other congestion countermeasures.

One limitation of this study is that the detector data used in modeling the DTTR were not filtered to remove observations affected by overlapping bottlenecks between the exit and entrance ramps. Future research should account for this situation in the analysis. More research using data from sites with different characteristics is also required to validate the conclusions of the current study. In addition, it is not clear whether similar conclusions would be reached if a different data resolution, such as 2 min or 5 min, were used in modeling. In future work, different data resolutions can be used in the model and compared with the results of the current study. Future work could also analyze the effects of spatial heterogeneity, vehicle mix, weather, and driving characteristics on the DTTR and on the number of traffic regimes in the GMM. The two sites evaluated in this study have different geometric characteristics: two regimes were identified for Site 1, while three regimes optimally describe the operating speed for Site 2. It is not yet clear whether sites with similar geometric characteristics will yield a similar number of traffic regimes.

References

  1. Ma D, Nakamura H, Asano M (2013) Lane-based breakdown identification method at diverge sections for modeling breakdown probability. Transp Res Board 2395:83–92
  2. Iqbal MS, Hadi M, Xiao Y (2017) Predicting arterial breakdown probability: a data mining approach. J Intell Transp Syst Technol Plan Oper 21(3):190–201
  3. Shojaat S, Geistefeldt J, Parr SA, Wilmot CG, Wolshon B (2016) Sustained Flow Index Stochastic measure of freeway performance. Transp Res Board 2554:158–165
  4. Hong Z, Mahmassani HS, Chen Y (2015) Empirical analysis of freeway flow breakdown and recovery: the effect of snow weather. In: Transportation research board annual meeting, Washington DC
  5. Brilon W, Geistefeldt J, Regler M (2005) Reliability of freeway traffic flow: a stochastic concept of capacity. In: Transportation and traffic theory: flow, dynamics and human interaction, proceedings of the 16th international symposium on transport, College Park, Maryland
  6. Modi V, Kondyli A, Washburn SS, McLeod DS (2014) Freeway capacity estimation method for planning applications. J Transp Eng 140(9):05014004
  7. Elefteriadou L, Lertworawanich P (2003) Defining, measuring and estimating freeway capacity. In: TRB annual meeting, Transportation Research Board, Washington, DC
  8. Lorenz M, Elefteriadou L (2014) A probabilistic approach to defining freeway capacity and breakdown. In: Proceedings of the 4th international symposium on highway capacity, TRB-circular E-C018, Transportation Research Board, Washington, DC, 2000, pp 84–95
  9. Persaud B, Yagar S, Brownlee R (1998) Exploration of the breakdown phenomenon in freeway traffic. Transp Res Rec 1643:64–69
  10. Matt L, Elefteriadou L (2001) Defining freeway capacity as function of breakdown probability. Transp Res Rec 1776(1):43–51
  11. Dong J, Mahmassani HS (2009) Flow breakdown and travel time reliability. Transp Res Board 2124–20:203–212
  12. Xu TD, Hao Y, Peng ZR, Sun LJ (2013) Modeling probabilistic traffic breakdown on congested freeway flow. Can J Civ Eng 40(10):999–1008
  13. Xie K, Ozbay K, Yang H (2014) The heterogeneity of capacity distributions among different freeway lanes. In: Symposium celebrating 50 years of traffic flow theory, Portland, Oregon
  14. Gelman A, Hill J (2007) Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, New York
  15. Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2014) Bayesian data analysis. Taylor & Francis Group, Boca Raton
  16. Dehman A (2013) Breakdown maturity phenomenon at Wisconsin freeway bottlenecks. Transp Res Board 2395:1–11
  17. Daganzo CF (2002) A behavioral theory of multi-lane traffic flow part ii: merges and the onset of congestion. Transp Res Part B 36(2):159–169
  18. Fan W, Jiang X, Erdogan S, Sun Y (2016) Modeling and evaluating FAIR highway performance and policy options. Transp Policy 48:156–168
  19. Daganzo CF (2002) A behavioral theory of multi-lane traffic flow. Part I: long homogeneous freeway sections. Transp Res B Methodol 36(2):131–158
  20. Shiomi Y (2016) Controlling lane traffic flow for managing uncertainty in traffic breakdown. In: Transportation Research Board, Washington DC
  21. Kreft IGG, Leeuw JD (1998) Introducing multilevel modeling. Sage Publications, London
  22. Qi Y, Ishak S (2014) A Hidden Markov Model for short term prediction of traffic conditions on freeways. Transp Res Part C Emerg Technol 43(1):95–111
  23. Guo F, Li Q (2011) Multi-state travel time reliability models with skewed component distributions. Transp Res Board 2315:47–53
  24. Kidando E, Moses R, Sando T, Ozguven EE (2018) Evaluating recurring traffic congestion using change point regression and random variation markov structured model. Transp Res Board 2672(20):63–74
  25. Kidando E, Kitali A, Lyimo S, Sando T, Moses R, Kwigizile V, Chimba D (2018) Exploring the influence of rainfall on a Stochastic evolution of traffic conditions. In: Presented at the 97th annual meeting of the transportation research board, Washington DC
  26. Ko J, Guensler RL (2005) Characterization of congestion based on speed distribution: a statistical approach using gaussian mixture model. In: CD-ROM proceedings of the 84th annual meeting of the transportation research board, Washington, DC
  27. Park B-J, Zhang Y, Lord D (2010) Bayesian mixture modeling approach to account for heterogeneity in speed data. Transp Res B Methodol 44(5):662–673
  28. Kidando E, Moses R, Ozguven EE, Sando T (2017) Evaluating traffic congestion using the traffic occupancy and speed distribution relationship: an application of Bayesian Dirichlet process mixtures of generalized linear model. J Transp Technol 7(3):318–335
  29. Watanabe S (2010) Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J Mach Learn Res 11:3571–3594
  30. Elhenawy M, Rakha HA (2016) Expected travel time and reliability prediction using mixture linear regression. In: Presented on 95th annual meeting transportation research board annual meeting, Washington, DC
  31. Ntzoufras I (2009) Bayesian modeling using WinBUGS. Wiley, New Jersey
  32. Kerner BS, Klenov SL, Wolf DE (2002) Cellular automata approach to three-phase traffic theory. J Phys A: Math Gen 35(47):9971–10013
  33. Kondyli A, Elefteriadou L, Brilon W, Hall FL, Persaud B, Washburn S (2013) Development and evaluation of methods for constructing breakdown probability models. J Transp Eng 139(9):931–940
  34. Kim J, Mahmassani HS, Dong J (2010) Likelihood and duration of flow breakdown: modeling the effect of weather. Transp Res Rec 2188:19–28
  35. Szmaragd C, Clarke P, Steele F (2013) Subject specific and population average models for binary longitudinal data: a tutorial. Longitud Life Course Stud 4(2):147–165
  36. Yeon J, Hernandez S, Elefteriadou L (2009) Differences in freeway capacity by day of the week, time of day, and segment type. J Transp Eng 135(7):416–426

Copyright information

© The Author(s) 2019

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  • Emmanuel Kidando (1), corresponding author
  • Ren Moses (2)
  • Thobias Sando (3)
  • Eren Erman Ozguven (2)

  1. Department of Environmental and Civil Engineering, School of Engineering, Mercer University, Macon, USA
  2. Department of Civil and Environmental Engineering, FAMU-FSU College of Engineering, Tallahassee, USA
  3. School of Engineering, The University of North Florida, Jacksonville, USA
