Abstract
In intelligent transportation systems, accurate bus information helps passengers schedule their departure times and choose reasonable routes. This paper proposes an improved deep belief network (DBN) to predict bus travel time. The DBN is built from Gaussian–Bernoulli restricted Boltzmann machines, which extends the classical DBN to continuous data. A back-propagation (BP) neural network is then applied to further improve performance. Several experiments on real traffic data collected in Shenyang, China, validate the technique. Comparisons with typical forecasting methods, namely the k-nearest neighbor algorithm (k-NN), artificial neural network (ANN), support vector machine (SVM), and random forests (RFs), show that the proposed method is applicable to bus travel time prediction and outperforms these traditional methods.
References
Petersen NC, Rodrigues F, Pereira FC (2019) Multi-output bus travel time prediction with convolutional LSTM neural network. Expert Syst Appl 120:426–435
Yu B, Lam WH, Tam ML (2011) Bus arrival time prediction at bus stop with multiple routes. Transp Res C-Emerg 19(6):1157–1170
Yu B, Wang HZ, Shan WX, Yao BZ (2018) Prediction of bus travel time using random forests based on near neighbors. Comput-Aided Civ Inf 33(4):333–350
Williams BM, Hoel LA (2003) Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: theoretical basis and empirical results. J Transp Eng 129(6):664–672
Jeong RH (2005) The prediction of bus arrival time using automatic vehicle location systems data. Ph.D. dissertation, Department of Civil Engineering, Texas A&M University, Texas, USA
Chung EH, Shalaby A (2007) Expected time of arrival model for school bus transit using real-time global positioning system-based automatic vehicle location data. Int J Intell Transp Syst Res 11(4):157–167
Yang M, Liu Y, You Z (2010) The reliability of travel time forecasting. IEEE Trans Intell Transp Syst 11(1):162–171
Thomas T, Weijermars W, Berkum EV (2010) Predictions of urban volumes in single time series. IEEE Trans Intell Transp Syst 11(1):71–80
Rice J, Van Zwet E (2004) A simple and effective method for predicting travel times on freeways. IEEE Trans Intell Transp Syst 5(3):200–207
Kwon J, Coifman B, Bickel P (2000) Day-to-day travel-time trends and travel-time prediction from loop-detector data. Transp Res Rec 1717:120–129
Kwon J, Petty K (2005) Travel time prediction algorithm scalable to freeway networks with many nodes with arbitrary travel routes. Transp Res Rec 1935:147–153
You J, Kim TJ (2000) Development and evaluation of a hybrid travel time forecasting model. Transp Res C-Emerg 8(1):231–256
Smith BL, Williams BM, Oswald RK (2002) Comparison of parametric and nonparametric models for traffic flow forecasting. Transp Res C-Emerg 10(4):303–321
Chang H, Park D, Lee S, Lee H, Baek S (2010) Dynamic multi-interval bus travel time prediction using bus transit data. Transportmetr A 6(1):19–38
Adeli H (2001) Neural networks in civil engineering: 1989–2000. Comput-Aided Civ Inf 16(2):126–142
Chien SIJ, Ding Y, Wei C (2002) Dynamic bus arrival time prediction with artificial neural networks. J Transp Eng 128(5):429–438
Mazloumi E, Currie G, Rose G (2010) Using traffic flow data to predict bus travel time variability through an enhanced artificial neural network. In: Presented at the 12th world Congress on transport research. Lisbon, Portugal
Yao BZ, Chen C, Zhang L, Yu B, Wang YP (2019) Allocation method for transit lines considering the User Equilibrium for operators. Transp Res C-Emerg 105:666–682
Yu B, Ye T, Tian XM, Ning GB, Zhong SQ (2012) Bus travel-time prediction with a forgetting factor. J Comput Civ Eng 28(3):06014002
Gal A, Mandelbaum A, Schnitzler F, Senderovich A, Weidlich M (2017) Traveling time prediction in scheduled transportation with journey segments. Inf Syst 64:266–280
Yu B, Song XL, Guan F, Yang ZM, Yao BZ (2016) k-Nearest neighbor model for multiple-time-step prediction of short-term traffic condition. J Transp Eng-ASCE 142(6):04016018
Yao BZ, Chen C, Cao QD, Jin L, Zhang MH, Zhu HB, Yu B (2017) Short-term traffic speed prediction for an urban corridor. Comput-Aided Civ Inf 32(2):154–169
Reddy KK, Kumar BA, Vanajakshi L (2016) Bus travel time prediction under high variability conditions. Curr Sci 111(4):700
Wang WS, Liu JM, Yao BZ, Jiang YL, Wang YP, Yu B (2019) A data-driven hybrid control framework to improve transit performance. Transp Res C-Emerg 107:387–410
Chen M, Liu X, Xia J, Chien SI (2004) A dynamic bus arrival time prediction model based on APC data. Comput-Aided Civ Inf 19(5):364–376
Yu B, Yang ZZ, Chen K, Yu B (2010) Hybrid model for prediction of bus arrival times at next station. J Adv Transp 44(3):193–204
Shalaby A, Farhan A (2003) Bus travel time prediction model for dynamic operations control and passenger information systems. In: Presented at the 82nd TRB annual meeting, Washington D.C., USA
Vanajakshi L, Rilett LR (2007) Support vector machine technique for the short term prediction of travel time. In: Proceedings of intelligent vehicles symposium. Istanbul, Turkey, pp 600–605
Billings D, Yang JS (2006) Travel time prediction using a seasonal autoregressive integrated moving average time series model. In: International conference on systems, man, and cybernetics. Taipei, Taiwan, pp 2529–2534
Guin A (2006) Application of the ARIMA models to urban roadway travel time prediction-a case study. In: Intelligent transportation systems conference. Toronto, Ontario, Canada, pp 494–498
Kumar P, Sehgal V, Chauhan DS (2011) Performance evaluation of decision tree versus artificial neural network based classifiers in diversity of datasets. In: World congress on information and communication technologies (WICT). Mumbai, India, pp 798–803
Kumar P, Sehgal VK, Chauhan DS (2012) A benchmark to select data mining based classification algorithms for business intelligence and decision support systems. Int J Data Min Knowl Manag Process (IJDKP) 2(5):25–42
Weigend A (1993) On overfitting and the effective number of hidden units. Department of Computer Science, University of Colorado, Boulder, Colorado, USA, CU-CS-674-93
Li DQ, Fu BW, Wang YP, Lu GQ, Berezin Y, Stanley HE, Havlin S (2015) Percolation transition in dynamical traffic network with evolving critical bottlenecks. Proc Natl Acad Sci USA 112(3):669–672
Tang TQ, Shi YF, Wang YP, Yu GZ (2012) A bus-following model with an on-line bus station. Nonlinear Dyn 70(1):209–215
Sun D, Ni X, Zhang L (2016) A discriminated release strategy for parking variable message sign display problem using agent-based simulation. IEEE Trans Intell Transp Syst 17(1):38–47
Lv Y, Duan Y, Kang W, Li Z, Wang FY (2015) Traffic flow prediction with big data: a deep learning approach. IEEE Trans Intell Transp Syst 16(2):865–873
Siripanpornchana C, Panichpapiboon S, Chaovalit P (2016) Travel-time prediction with deep learning. In: Region 10 conference (TENCON), Singapore. Singapore, pp 1859–1862
Li LC, Qu X, Zhang J, Wang YG, Ran B (2019) Traffic speed prediction for intelligent transportation system based on a deep feature fusion model. J Intell Transp Syst. https://doi.org/10.1080/15472450.2019.1583965
Ran X, Shan Z, Fang Y, Lin C (2019) An LSTM-based method with attention mechanism for travel time prediction. Sensors 19(4):861
Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554
Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507
Huang W, Song G, Hong H, Xie K (2014) Deep architecture for traffic flow prediction: deep belief networks with multitask learning. IEEE Trans Intell Transp Syst 15(5):2191–2201
Koesdwiady A, Soua R, Karray F (2016) Improving traffic flow prediction with weather information in connected cars: a deep learning approach. IEEE Trans Veh Technol 65(12):9508–9517
Soua R, Koesdwiady A, Karray F (2016) Big-data-generated traffic flow prediction using deep learning and Dempster-Shafer theory. In: International joint conference on neural networks. Vancouver, BC, Canada, pp 3195–3202
Hrasko R, Pacheco AG, Krohling RA (2015) Time series prediction using restricted Boltzmann machines and backpropagation. Proc Comput Sci 55:990–999
Hinton GE (2012) A practical guide to training restricted Boltzmann machines. In: Neural networks: tricks of the trade. Springer, Berlin, Heidelberg, pp 599–619
Kumar A, Johari S, Proch D, Kumar P, Chauhan DS (2018) A tree based approach for data pre-processing and pattern matching for accident mapping on road networks. Proc Natl Acad Sci India Sect A Phys Sci 89(3):453–466
Acknowledgements
This work was supported by the National Natural Science Foundation of China (U1811463 and 51578112) and the State Key Laboratory of Structural Analysis for Industrial Equipment (S18307). The authors also gratefully acknowledge financial support from the China Scholarship Council.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendix
1.1 Classical RBM
An RBM is a generative energy-based model that can learn a probability distribution over a set of inputs. A classical RBM has binary-valued hidden and visible units. The energy of a joint configuration \(\left( {v,h} \right)\) of the visible and hidden units is given by:
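The displayed equation is absent from this copy. Assuming the standard binary RBM formulation, which matches the symbols defined next, the energy reads:

\[
E\left( v,h \right) = - \sum_{i} a_{i} v_{i} - \sum_{j} b_{j} h_{j} - \sum_{i,j} v_{i} h_{j} w_{ij}
\]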
where \(v_{i}\) and \(h_{j}\) are the binary states of visible unit i and hidden unit j, \(a_{i}\) and \(b_{j}\) are their biases and \(w_{ij}\) is the weight. Then, the probability that is assigned to every possible pair of a visible and a hidden vector is calculated via the energy function:
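The equation is missing here as well. In the standard formulation it is the Boltzmann distribution over joint configurations, with the partition function written as \(Z\) (a symbol assumed here, since the original notation is not visible):

\[
p\left( v,h \right) = \frac{1}{Z} {\text{e}}^{ - E\left( v,h \right)}, \qquad Z = \sum_{v,h} {\text{e}}^{ - E\left( v,h \right)}
\]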
Then, the probability of a particular visible state configuration \(v\) is derived by summing over all possible hidden vectors:
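A reconstruction of the missing marginal, obtained by summing the joint probability over \(h\) as the sentence describes (with \(Z = \sum_{v,h} {\text{e}}^{ - E\left( v,h \right)}\) assumed as the normalizing constant):

\[
p\left( v \right) = \frac{1}{Z} \sum_{h} {\text{e}}^{ - E\left( v,h \right)}
\]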
Similarly, the formula of \(p\left( h \right)\) is entirely analogous to that of \(p\left( v \right)\):
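By the same reasoning, and again assuming \(Z = \sum_{v,h} {\text{e}}^{ - E\left( v,h \right)}\), the missing expression would be:

\[
p\left( h \right) = \frac{1}{Z} \sum_{v} {\text{e}}^{ - E\left( v,h \right)}
\]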
Some other conditional expressions can also be derived as follows:
Thus, the probability that a particular visible unit is on, given a hidden vector, is:
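The equation did not survive extraction. For the standard binary RBM energy \(E\left( v,h \right) = - \sum_{i} a_{i} v_{i} - \sum_{j} b_{j} h_{j} - \sum_{i,j} v_{i} h_{j} w_{ij}\), the conditional takes the following form (the number 22 is inferred from the later cross-reference):

\[
p\left( v_{i} = 1\left| h \right. \right) = \frac{1}{1 + \exp \left( - a_{i} - \sum_{j} h_{j} w_{ij} \right)} \tag{22}
\]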
Similarly, for randomly selected training input \(v\), the binary state \(h_{j}\) of each hidden unit j is set to 1 with probability:
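This equation is also missing. Under the same standard energy function, the hidden-unit counterpart would be (number inferred from the later cross-reference):

\[
p\left( h_{j} = 1\left| v \right. \right) = \frac{1}{1 + \exp \left( - b_{j} - \sum_{i} v_{i} w_{ij} \right)} \tag{23}
\]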
Given \(\sigma \left( x \right) = \frac{1}{{1 + {\text{e}}^{ - x} }}\), formulas (22) and (23) can be rewritten as follows:
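The two rewritten formulas are not shown in this copy. In the standard RBM they are the logistic forms below; the numbers are inferred from the text's later reference to formula (25) for the expected value of \(h_{j}\):

\[
p\left( v_{i} = 1\left| h \right. \right) = \sigma \left( a_{i} + \sum_{j} h_{j} w_{ij} \right) \tag{24}
\]

\[
p\left( h_{j} = 1\left| v \right. \right) = \sigma \left( b_{j} + \sum_{i} v_{i} w_{ij} \right) \tag{25}
\]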
Given a set of \(C\) training cases \(\left\{ {v^{c} \left| {c \in \left\{ {1, \ldots ,C} \right\}} \right.} \right\}\), the goal is to maximize the average log probability of the set under the model’s distribution:
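The objective itself is missing. Read literally as the average log-likelihood that the sentence describes, it would be as follows, where \(\theta = \left\{ w_{ij} ,a_{i} ,b_{j} \right\}\) is shorthand assumed here for the model parameters:

\[
\max_{\theta} \; \frac{1}{C} \sum_{c = 1}^{C} \log p\left( v^{c} \right) \tag{26}
\]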
Then, the gradient or the derivative of the log probability of the training vector with respect to a weight \(w_{ij}\) has the following form:
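The gradient is not reproduced here. Assuming the usual form \(p\left( v \right) = \sum_{h} {\text{e}}^{ - E\left( v,h \right)} / \sum_{v,h} {\text{e}}^{ - E\left( v,h \right)}\), taking the logarithm splits it into two terms, which is consistent with the "first term" and "second term" discussed next:

\[
\frac{\partial \log p\left( v^{c} \right)}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}} \log \sum_{h} {\text{e}}^{ - E\left( v^{c} ,h \right)} - \frac{\partial}{\partial w_{ij}} \log \sum_{v,h} {\text{e}}^{ - E\left( v,h \right)} \tag{27}
\]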
The first term of formula (27) can be written as:
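A reconstruction of the missing step, assuming the standard RBM energy so that \(\partial \left( - E \right)/\partial w_{ij} = v_{i} h_{j}\); the right-hand side is the ratio quoted in the following sentence:

\[
\frac{\partial}{\partial w_{ij}} \log \sum_{h} {\text{e}}^{ - E\left( v^{c} ,h \right)} = \frac{\sum_{h} {\text{e}}^{ - E\left( v^{c} ,h \right)} v_{i}^{c} h_{j}}{\sum_{h} {\text{e}}^{ - E\left( v^{c} ,h \right)}}
\]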
Notice that the term \(\frac{{\sum\nolimits_{h} {{\text{e}}^{{ - E\left( {v^{c} ,h} \right)}} v_{i}^{c} h_{j} } }}{{\sum\nolimits_{h} {{\text{e}}^{{ - E\left( {v^{c} ,h} \right)}} } }}\) is just the expected value of \(v_{i}^{c} h_{j}\) given that \(v\) is clamped to the data vector \(v^{c}\). This is easy to compute since we know \(v_{i}^{c}\) and we can compute the expected value of \(h_{j}\) using formula (25).
The second term of formula (27) can also be written as:
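The corresponding equation is missing. Under the same assumption, \(\partial \left( - E \right)/\partial w_{ij} = v_{i} h_{j}\), differentiating the log of the normalizing sum gives:

\[
- \frac{\partial}{\partial w_{ij}} \log \sum_{v,h} {\text{e}}^{ - E\left( v,h \right)} = - \frac{\sum_{v,h} {\text{e}}^{ - E\left( v,h \right)} v_{i} h_{j}}{\sum_{v,h} {\text{e}}^{ - E\left( v,h \right)}}
\]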
Here, the term \(\frac{{\sum\nolimits_{v,h} {{\text{e}}^{{ - E\left( {v,h} \right)}} v_{i} h_{j} } }}{{\sum\nolimits_{v,h} {{\text{e}}^{{ - E\left( {v,h} \right)}} } }}\) is the expected value of \(v_{i} h_{j}\) under the model’s distribution. This expectation can be approximated well in finite time by the contrastive divergence (CD) algorithm.
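To make the CD approximation concrete, the following is a minimal NumPy sketch of a single-step variant (CD-1) for a binary RBM. It is an illustration under stated assumptions, not the authors' implementation: the names `cd1_step`, `W`, `a`, `b` and `lr` are hypothetical, `W[i, j]` stands for \(w_{ij}\), and hidden probabilities rather than sampled states are used in the statistics, which is one common choice.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_step(v0, W, a, b, lr, rng):
    """One CD-1 update for a binary RBM (illustrative sketch).

    v0: (batch, n_visible) binary data; W: (n_visible, n_hidden);
    a: visible biases; b: hidden biases; lr: learning rate.
    """
    # Positive phase: p(h_j = 1 | v) with v clamped to the data.
    ph0 = sigmoid(b + v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # Negative phase: one Gibbs step stands in for the model expectation.
    pv1 = sigmoid(a + h0 @ W.T)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(b + v1 @ W)

    n = v0.shape[0]
    dW = (v0.T @ ph0 - v1.T @ ph1) / n   # <v_i h_j>_d - <v_i h_j>_m (approx.)
    da = (v0 - v1).mean(axis=0)          # <v_i>_d - <v_i>_m (approx.)
    db = (ph0 - ph1).mean(axis=0)        # <h_j>_d - <h_j>_m (approx.)
    return W + lr * dW, a + lr * da, b + lr * db
```

Calling `cd1_step` repeatedly over mini-batches, with a generator from `np.random.default_rng`, gives a basic training loop; running more Gibbs steps per update would bring the negative statistics closer to the model's expectation at higher cost.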
By using \(\left\langle . \right\rangle_{d}\) and \(\left\langle . \right\rangle_{m}\) to represent the expected values under the training data and the model, respectively, formula (27) can be rewritten as:
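The rewritten formula is absent from this copy. With the two expectations identified in the preceding paragraphs, the standard result is:

\[
\frac{\partial \log p\left( v \right)}{\partial w_{ij}} = \left\langle v_{i} h_{j} \right\rangle_{d} - \left\langle v_{i} h_{j} \right\rangle_{m}
\]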
Thus, the update rule for weight \(w_{ij}\) is shown as follows:
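The update rule is not shown. Assuming plain gradient ascent on the log probability, with the gradient equal to the difference between the data and model expectations of \(v_{i} h_{j}\), it would read:

\[
\Delta w_{ij} = \varepsilon \left( \left\langle v_{i} h_{j} \right\rangle_{d} - \left\langle v_{i} h_{j} \right\rangle_{m} \right)
\]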
where \(\varepsilon\) is the learning rate.
The update rules for the biases are similarly derived to be:
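The bias updates are missing. In the standard RBM, where the energy is linear in \(a_{i} v_{i}\) and \(b_{j} h_{j}\) and \(\varepsilon\) is the learning rate, the same argument as for the weights gives:

\[
\Delta a_{i} = \varepsilon \left( \left\langle v_{i} \right\rangle_{d} - \left\langle v_{i} \right\rangle_{m} \right), \qquad \Delta b_{j} = \varepsilon \left( \left\langle h_{j} \right\rangle_{d} - \left\langle h_{j} \right\rangle_{m} \right)
\]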
1.2 Gaussian–Bernoulli RBM
The classical RBM uses binary logistic units for both the visible and hidden layers. Because the traffic data in this paper are continuous, the model is converted to accept continuous-valued inputs, as described in Refs. [42, 47]. To model continuous data, the binary visible units of the RBM are replaced by linear units with Gaussian noise, and the energy function of the GBRBM becomes:
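The energy function is not reproduced in this copy. The form below is the usual Gaussian–Bernoulli energy in which \(v_{i}\) enters the interaction term divided by \(\sigma_{i}\), chosen because the text later notes that \(v_{i}\) is scaled by the reciprocal of its standard deviation; the number 34 follows the text's own cross-reference:

\[
E\left( v,h \right) = \sum_{i} \frac{\left( v_{i} - a_{i} \right)^{2}}{2\sigma_{i}^{2}} - \sum_{j} b_{j} h_{j} - \sum_{i,j} \frac{v_{i}}{\sigma_{i}} h_{j} w_{ij} \tag{34}
\]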
where \(\sigma_{i}\) is the standard deviation of the Gaussian noise for visible unit i.
Given the energy function (34), the distribution \(p\left( {v\left| h \right.} \right)\) can be derived as follows:
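The derived distribution is missing. Assuming the energy \(E\left( v,h \right) = \sum_{i} \frac{\left( v_{i} - a_{i} \right)^{2}}{2\sigma_{i}^{2}} - \sum_{j} b_{j} h_{j} - \sum_{i,j} \frac{v_{i}}{\sigma_{i}} h_{j} w_{ij}\), completing the square in \(v_{i}\) yields an independent Gaussian for each visible unit:

\[
p\left( v_{i} \left| h \right. \right) = \mathcal{N}\left( v_{i} ;\; a_{i} + \sigma_{i} \sum_{j} h_{j} w_{ij} ,\; \sigma_{i}^{2} \right)
\]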
Thus, \(p\left( {h_{k} = 1\left| v \right.} \right)\) is computed as follows.
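The expression is not shown. With \(\sigma \left( x \right) = 1/\left( 1 + {\text{e}}^{ - x} \right)\), and assuming \(v_{i}\) enters the interaction term of the energy as \(v_{i} /\sigma_{i}\), it would be (numbered 36 as cited in the text):

\[
p\left( h_{k} = 1\left| v \right. \right) = \sigma \left( b_{k} + \sum_{i} \frac{v_{i}}{\sigma_{i}} w_{ik} \right) \tag{36}
\]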
Note that Eq. (36) is the same as in the classical RBM, except that \(v_{i}\) is scaled by the reciprocal of its standard deviation \(\sigma_{i}\).
The training procedure for a GBRBM is identical to that of an RBM. As in that case, we take the derivative shown in formula (27). We find that
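The result is missing from this copy. Assuming an interaction term \(- \sum_{i,j} \frac{v_{i}}{\sigma_{i}} h_{j} w_{ij}\) in the energy, so that \(\partial \left( - E \right)/\partial w_{ij} = v_{i} h_{j} /\sigma_{i}\), the weight gradient would be:

\[
\frac{\partial \log p\left( v \right)}{\partial w_{ij}} = \left\langle \frac{v_{i} h_{j}}{\sigma_{i}} \right\rangle_{d} - \left\langle \frac{v_{i} h_{j}}{\sigma_{i}} \right\rangle_{m}
\]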
Similarly,
which we estimate, as before, using the CD algorithm.
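As an illustration of how CD carries over to Gaussian visible units, here is a minimal NumPy sketch of one CD-1 step for a GBRBM. It is not the authors' code: the function name `gbrbm_cd1_step` and its arguments are hypothetical, the standard deviations `sigma` are assumed fixed rather than learned, and the energy is assumed to be \(\sum_{i} \frac{(v_{i} - a_{i})^{2}}{2\sigma_{i}^{2}} - \sum_{j} b_{j} h_{j} - \sum_{i,j} \frac{v_{i}}{\sigma_{i}} h_{j} w_{ij}\).

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gbrbm_cd1_step(v0, W, a, b, sigma, lr, rng):
    """One CD-1 update for a Gaussian-Bernoulli RBM (illustrative sketch).

    v0: (batch, n_visible) real-valued data; W: (n_visible, n_hidden);
    a, sigma: (n_visible,); b: (n_hidden,). sigma is held fixed.
    """
    # Hidden units see each visible value scaled by 1 / sigma_i.
    ph0 = sigmoid(b + (v0 / sigma) @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # Visible units are Gaussian: mean a_i + sigma_i * sum_j h_j w_ij.
    mean_v1 = a + sigma * (h0 @ W.T)
    v1 = mean_v1 + sigma * rng.standard_normal(mean_v1.shape)
    ph1 = sigmoid(b + (v1 / sigma) @ W)

    n = v0.shape[0]
    dW = ((v0 / sigma).T @ ph0 - (v1 / sigma).T @ ph1) / n
    da = ((v0 - v1) / sigma**2).mean(axis=0)
    db = (ph0 - ph1).mean(axis=0)
    return W + lr * dW, a + lr * da, b + lr * db
```

A common simplification, assumed here rather than taken from the paper, is to standardize each input feature to zero mean and unit variance and set `sigma` to ones, which reduces the scaling factors to 1.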
Cite this article
Chen, C., Wang, H., Yuan, F. et al. Bus travel time prediction based on deep belief network with back-propagation. Neural Comput & Applic 32, 10435–10449 (2020). https://doi.org/10.1007/s00521-019-04579-x
DOI: https://doi.org/10.1007/s00521-019-04579-x