Bus travel time prediction based on deep belief network with back-propagation


In an intelligent transportation system, accurate bus information is vital for passengers to schedule their departure times and make reasonable route choices. In this paper, an improved deep belief network (DBN) is proposed to predict bus travel time. By constructing the DBN from Gaussian–Bernoulli restricted Boltzmann machines, we extend the classical DBN to model continuous data. In addition, a back-propagation (BP) neural network is applied to further improve performance. Several experiments based on real traffic data collected in Shenyang, China, are conducted to validate the technique. Comparisons with typical forecasting methods, namely the k-nearest neighbor algorithm (k-NN), artificial neural network (ANN), support vector machine (SVM), and random forests (RFs), show that the proposed method is applicable to bus travel time prediction and outperforms these traditional methods.


Figs. 1–8 (images not included)


References

  1. Petersen NC, Rodrigues F, Pereira FC (2019) Multi-output bus travel time prediction with convolutional LSTM neural network. Expert Syst Appl 120:426–435
  2. Yu B, Lam WH, Tam ML (2011) Bus arrival time prediction at bus stop with multiple routes. Transp Res C-Emerg 19(6):1157–1170
  3. Yu B, Wang HZ, Shan WX, Yao BZ (2018) Prediction of bus travel time using random forests based on near neighbors. Comput-Aided Civ Inf 33(4):333–350
  4. Williams BM, Hoel LA (2003) Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: theoretical basis and empirical results. J Transp Eng 129(6):664–672
  5. Jeong RH (2005) The prediction of bus arrival time using automatic vehicle location systems data. Ph.D. dissertation, Department of Civil Engineering, Texas A&M University, Texas, USA
  6. Chung EH, Shalaby A (2007) Expected time of arrival model for school bus transit using real-time global positioning system-based automatic vehicle location data. Int J Intell Transp Syst Res 11(4):157–167
  7. Yang M, Liu Y, You Z (2010) The reliability of travel time forecasting. IEEE Trans Intell Transp Syst 11(1):162–171
  8. Thomas T, Weijermars W, Berkum EV (2010) Predictions of urban volumes in single time series. IEEE Trans Intell Transp Syst 11(1):71–80
  9. Rice J, Van Zwet E (2004) A simple and effective method for predicting travel times on freeways. IEEE Trans Intell Transp Syst 5(3):200–207
  10. Kwon J, Coifman B, Bickel P (2000) Day-to-day travel-time trends and travel-time prediction from loop-detector data. Transp Res Rec 1717:120–129
  11. Kwon J, Petty K (2005) Travel time prediction algorithm scalable to freeway networks with many nodes with arbitrary travel routes. Transp Res Rec 1935:147–153
  12. You J, Kim TJ (2000) Development and evaluation of a hybrid travel time forecasting model. Transp Res C-Emerg 8(1):231–256
  13. Smith BL, Williams BM, Oswald RK (2002) Comparison of parametric and nonparametric models for traffic flow forecasting. Transp Res C-Emerg 10(4):303–321
  14. Chang H, Park D, Lee S, Lee H, Baek S (2010) Dynamic multi-interval bus travel time prediction using bus transit data. Transportmetr A 6(1):19–38
  15. Adeli H (2001) Neural networks in civil engineering: 1989–2000. Comput-Aided Civ Inf 16(2):126–142
  16. Chien SIJ, Ding Y, Wei C (2002) Dynamic bus arrival time prediction with artificial neural networks. J Transp Eng 128(5):429–438
  17. Mazloumi E, Currie G, Rose G (2010) Using traffic flow data to predict bus travel time variability through an enhanced artificial neural network. Presented at the 12th World Congress on Transport Research, Lisbon, Portugal
  18. Yao BZ, Chen C, Zhang L, Yu B, Wang YP (2019) Allocation method for transit lines considering the User Equilibrium for operators. Transp Res C-Emerg 105:666–682
  19. Yu B, Ye T, Tian XM, Ning GB, Zhong SQ (2012) Bus travel-time prediction with a forgetting factor. J Comput Civ Eng 28(3):06014002
  20. Gal A, Mandelbaum A, Schnitzler F, Senderovich A, Weidlich M (2017) Traveling time prediction in scheduled transportation with journey segments. Inf Syst 64:266–280
  21. Yu B, Song XL, Guan F, Yang ZM, Yao BZ (2016) k-Nearest neighbor model for multiple-time-step prediction of short-term traffic condition. J Transp Eng-ASCE 142(6):04016018
  22. Yao BZ, Chen C, Cao QD, Jin L, Zhang MH, Zhu HB, Yu B (2017) Short-term traffic speed prediction for an urban corridor. Comput-Aided Civ Inf 32(2):154–169
  23. Reddy KK, Kumar BA, Vanajakshi L (2016) Bus travel time prediction under high variability conditions. Curr Sci 111(4):700
  24. Wang WS, Liu JM, Yao BZ, Jiang YL, Wang YP, Yu B (2019) A data-driven hybrid control framework to improve transit performance. Transp Res C-Emerg 107:387–410
  25. Chen M, Liu X, Xia J, Chien SI (2004) A dynamic bus arrival time prediction model based on APC data. Comput-Aided Civ Inf 19(5):364–376
  26. Yu B, Yang ZZ, Chen K, Yu B (2010) Hybrid model for prediction of bus arrival times at next station. J Adv Transp 44(3):193–204
  27. Shalaby A, Farhan A (2003) Bus travel time prediction model for dynamic operations control and passenger information systems. Presented at the 82nd TRB annual meeting, Washington D.C., USA
  28. Vanajakshi L, Rilett LR (2007) Support vector machine technique for the short term prediction of travel time. In: Proceedings of intelligent vehicles symposium. Istanbul, Turkey, pp 600–605
  29. Billings D, Yang JS (2006) Travel time prediction using a seasonal autoregressive integrated moving average time series model. In: International conference on systems, man, and cybernetics. Taipei, Taiwan, pp 2529–2534
  30. Guin A (2006) Application of the ARIMA models to urban roadway travel time prediction-a case study. In: Intelligent transportation systems conference. Toronto, Ontario, Canada, pp 494–498
  31. Kumar P, Sehgal V, Chauhan DS (2011) Performance evaluation of decision tree versus artificial neural network based classifiers in diversity of datasets. In: World congress on information and communication technologies (WICT). Mumbai, India, pp 798–803
  32. Kumar P, Sehgal VK, Chauhan DS (2012) A benchmark to select data mining based classification algorithms for business intelligence and decision support systems. Int J Data Min Knowl Manag Process (IJDKP) 2(5):25–42
  33. Weigend A (1993) On overfitting and the effective number of hidden units. Department of Computer Science, University of Colorado, Boulder, Colorado, USA, CU-CS-674-93
  34. Li DQ, Fu BW, Wang YP, Lu GQ, Berezin Y, Stanley HE, Havlin S (2015) Percolation transition in dynamical traffic network with evolving critical bottlenecks. Proc Natl Acad Sci USA 112(3):669–672
  35. Tang TQ, Shi YF, Wang YP, Yu GZ (2012) A bus-following model with an on-line bus station. Nonlinear Dyn 70(1):209–215
  36. Sun D, Ni X, Zhang L (2016) A discriminated release strategy for parking variable message sign display problem using agent-based simulation. IEEE Trans Intell Transp Syst 17(1):38–47
  37. Lv Y, Duan Y, Kang W, Li Z, Wang FY (2015) Traffic flow prediction with big data: a deep learning approach. IEEE Trans Intell Transp Syst 16(2):865–873
  38. Siripanpornchana C, Panichpapiboon S, Chaovalit P (2016) Travel-time prediction with deep learning. In: Region 10 conference (TENCON). Singapore, pp 1859–1862
  39. Li LC, Qu X, Zhang J, Wang YG, Ran B (2019) Traffic speed prediction for intelligent transportation system based on a deep feature fusion model. J Intell Transp Syst. https://doi.org/10.1080/15472450.2019.1583965
  40. Ran X, Shan Z, Fang Y, Lin C (2019) An LSTM-based method with attention mechanism for travel time prediction. Sensors 19(4):861
  41. Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554
  42. Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507
  43. Huang W, Song G, Hong H, Xie K (2014) Deep architecture for traffic flow prediction: deep belief networks with multitask learning. IEEE Trans Intell Transp Syst 15(5):2191–2201
  44. Koesdwiady A, Soua R, Karray F (2016) Improving traffic flow prediction with weather information in connected cars: a deep learning approach. IEEE Trans Veh Technol 65(12):9508–9517
  45. Soua R, Koesdwiady A, Karray F (2016) Big-data-generated traffic flow prediction using deep learning and Dempster–Shafer theory. In: International joint conference on neural networks. Vancouver, BC, Canada, pp 3195–3202
  46. Hrasko R, Pacheco AG, Krohling RA (2015) Time series prediction using restricted Boltzmann machines and backpropagation. Proc Comput Sci 55:990–999
  47. Hinton GE (2012) A practical guide to training restricted Boltzmann machines. In: Neural networks: tricks of the trade. Springer, Berlin, Heidelberg, pp 599–619
  48. Kumar A, Johari S, Proch D, Kumar P, Chauhan DS (2018) A tree based approach for data pre-processing and pattern matching for accident mapping on road networks. Proc Natl Acad Sci India Sect A Phys Sci 89(3):453–466



This work was supported by the National Natural Science Foundation of China (U1811463 and 51578112) and the State Key Laboratory of Structural Analysis for Industrial Equipment (S18307). The authors also gratefully acknowledge financial support from the China Scholarship Council.

Author information



Corresponding author

Correspondence to Baozhen Yao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Classical RBM

A restricted Boltzmann machine (RBM) is a generative energy-based model that learns a probability distribution over its inputs. A classical RBM has binary-valued visible and hidden units, with connections only between the two layers, and the energy of a joint configuration \(\left( {v,h} \right)\) of the visible and hidden units is:

$$E\left( {v,h} \right) = - \sum\limits_{i = 1}^{m} {a_{i} v_{i} } - \sum\limits_{j = 1}^{k} {b_{j} h_{j} } - \sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{k} {v_{i} h_{j} w_{ij} } }$$
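As a concrete check of the energy function, the sketch below evaluates \(E(v,h)\) for a tiny RBM in plain Python. The name `rbm_energy`, the list-of-lists layout `W[i][j]` for \(w_{ij}\), and the parameter values are illustrative assumptions, not taken from the paper.

```python
def rbm_energy(v, h, a, b, W):
    """Energy E(v, h) of a classical binary RBM.

    v: m binary visible states, h: k binary hidden states,
    a: m visible biases, b: k hidden biases, W[i][j]: weight w_ij.
    """
    visible_term = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_term = sum(b_j * h_j for b_j, h_j in zip(b, h))
    interaction = sum(v[i] * h[j] * W[i][j]
                      for i in range(len(v)) for j in range(len(h)))
    return -visible_term - hidden_term - interaction


# Hypothetical tiny RBM: m = 2 visible units, k = 1 hidden unit
a = [0.5, -0.25]
b = [0.25]
W = [[1.0], [-0.5]]
print(rbm_energy([1, 0], [1], a, b, W))  # -1.75
```

Lower energy corresponds to higher probability in the distribution defined next.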

where \(v_{i}\) and \(h_{j}\) are the binary states of visible unit i and hidden unit j, \(a_{i}\) and \(b_{j}\) are their biases, and \(w_{ij}\) is the weight between them. The network assigns a probability to every possible pair of visible and hidden vectors via this energy function:

$$p\left( {v,h} \right) = \frac{{{\text{e}}^{{ - E\left( {v,h} \right)}} }}{{\sum\nolimits_{v,h} {{\text{e}}^{{ - E\left( {v,h} \right)}} } }}$$

The probability of a particular visible vector \(v\) is obtained by summing over all possible hidden vectors:

$$p\left( v \right) = \sum\limits_{h} {p\left( {v,h} \right) = \frac{{\sum\nolimits_{h} {{\text{e}}^{{ - E\left( {v,h} \right)}} } }}{{\sum\nolimits_{v,h} {{\text{e}}^{{ - E\left( {v,h} \right)}} } }}}$$

Analogously, the marginal probability of a hidden vector \(h\) is obtained by summing over all possible visible vectors:

$$p\left( h \right) = \sum\limits_{v} {p\left( {v,h} \right) = \frac{{\sum\nolimits_{v} {{\text{e}}^{{ - E\left( {v,h} \right)}} } }}{{\sum\nolimits_{v,h} {{\text{e}}^{{ - E\left( {v,h} \right)}} } }}}$$
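For a handful of units, the denominator (the partition function) can be computed by brute force, which makes the joint and marginal formulas easy to check. The sketch below enumerates all \(2^{m+k}\) binary configurations of a hypothetical toy RBM; the function names and values are illustrative. The enumeration grows exponentially with the number of units, which is why training relies on approximations.

```python
import itertools
import math


def rbm_energy(v, h, a, b, W):
    # E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_ij v_i h_j w_ij
    return (-sum(ai * vi for ai, vi in zip(a, v))
            - sum(bj * hj for bj, hj in zip(b, h))
            - sum(v[i] * h[j] * W[i][j]
                  for i in range(len(v)) for j in range(len(h))))


def exact_distributions(a, b, W):
    """Return p(v, h), p(v) and p(h) for a tiny RBM by full enumeration."""
    states_v = list(itertools.product((0, 1), repeat=len(a)))
    states_h = list(itertools.product((0, 1), repeat=len(b)))
    # Unnormalized weights e^{-E(v, h)} for every (v, h) pair
    weights = {(v, h): math.exp(-rbm_energy(v, h, a, b, W))
               for v in states_v for h in states_h}
    Z = sum(weights.values())  # partition function: sum over all v and h
    p_joint = {vh: w / Z for vh, w in weights.items()}
    # Marginals: sum the joint over the other layer
    p_v = {v: sum(p_joint[(v, h)] for h in states_h) for v in states_v}
    p_h = {h: sum(p_joint[(v, h)] for v in states_v) for h in states_h}
    return p_joint, p_v, p_h


a, b, W = [0.5, -0.25], [0.25], [[1.0], [-0.5]]
p_joint, p_v, p_h = exact_distributions(a, b, W)
for v, p in sorted(p_v.items()):
    print(v, round(p, 4))
```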

The conditional distributions then follow directly:

$$p\left( {v\left| h \right.} \right) = \frac{{p\left( {v,h} \right)}}{p\left( h \right)} = \frac{{{\text{e}}^{{ - E\left( {v,h} \right)}} }}{{\sum\nolimits_{v} {{\text{e}}^{{ - E\left( {v,h} \right)}} } }}$$
$$p\left( {h\left| v \right.} \right) = \frac{{p\left( {v,h} \right)}}{p\left( v \right)} = \frac{{{\text{e}}^{{ - E\left( {v,h} \right)}} }}{{\sum\nolimits_{h} {{\text{e}}^{{ - E\left( {v,h} \right)}} } }}$$

Because an RBM has no visible–visible or hidden–hidden connections, these conditionals factorize over individual units. The probability that visible unit i is on, given a hidden vector \(h\), is therefore:

$$p\left( {v_{i} = 1\left| h \right.} \right) = \frac{{p\left( {v_{i} = 1,h} \right)}}{p\left( h \right)} = \frac{1}{{1 + {\text{e}}^{{ - \left( {a_{i} + \sum\nolimits_{j = 1}^{k} {h_{j} } w_{ij} } \right)}} }}$$

Similarly, given a randomly selected training vector \(v\), the binary state \(h_{j}\) of each hidden unit j is set to 1 with probability:

$$p\left( {h_{j} = 1\left| v \right.} \right) = \frac{{p\left( {h_{j} = 1,v} \right)}}{p\left( v \right)} = \frac{1}{{1 + {\text{e}}^{{ - \left( {b_{j} + \sum\nolimits_{i = 1}^{m} {v_{i} } w_{ij} } \right)}} }}$$

Using the logistic sigmoid \(\sigma \left( x \right) = \frac{1}{{1 + {\text{e}}^{ - x} }}\), formulas (22) and (23) can be written compactly as:

$$p\left( {v_{i} = 1\left| h \right.} \right) = \sigma \left( {a_{i} + \sum\limits_{j = 1}^{k} {h_{j} } w_{ij} } \right)$$
$$p\left( {h_{j} = 1\left| v \right.} \right) = \sigma \left( {b_{j} + \sum\limits_{i = 1}^{m} {v_{i} } w_{ij} } \right)$$
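The two sigmoid expressions translate directly into code. The sketch below (hypothetical names, toy parameters) implements them and, as a sanity check, compares \(p(h_j = 1 \mid v)\) with the definition \(p(h_j = 1, v)/p(v)\) evaluated by enumerating the hidden vectors; the partition function cancels in that ratio.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def p_h_given_v(v, b, W):
    # p(h_j = 1 | v) = sigma(b_j + sum_i v_i w_ij), one entry per hidden unit
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]


def p_v_given_h(h, a, W):
    # p(v_i = 1 | h) = sigma(a_i + sum_j h_j w_ij), one entry per visible unit
    return [sigmoid(a[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(a))]


def p_h_given_v_by_enumeration(v, j, a, b, W):
    """p(h_j = 1 | v) = p(h_j = 1, v) / p(v), summing e^{-E(v, h)} over h."""
    def neg_energy(h):
        return (sum(ai * vi for ai, vi in zip(a, v))
                + sum(bq * hq for bq, hq in zip(b, h))
                + sum(v[i] * h[jj] * W[i][jj]
                      for i in range(len(v)) for jj in range(len(h))))
    hs = list(itertools.product((0, 1), repeat=len(b)))
    numerator = sum(math.exp(neg_energy(h)) for h in hs if h[j] == 1)
    denominator = sum(math.exp(neg_energy(h)) for h in hs)
    return numerator / denominator


# Hypothetical RBM with m = 3 visible and k = 2 hidden units
a = [0.5, -0.25, 0.1]
b = [0.25, -0.4]
W = [[1.0, -0.3], [-0.5, 0.8], [0.2, 0.6]]
v = [1, 0, 1]
print(p_h_given_v(v, b, W))
print([p_h_given_v_by_enumeration(v, j, a, b, W) for j in range(len(b))])
```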

Given a set of \(C\) training cases \(\left\{ {v^{c} \left| {c \in \left\{ {1, \ldots ,C} \right\}} \right.} \right\}\), the goal is to maximize the log probability of the training set (equivalently, the average log probability per case) under the model's distribution:

$$\sum\limits_{c = 1}^{C} {\log p\left( {v^{c} } \right)} = \sum\limits_{c = 1}^{C} {\log \frac{{\sum\nolimits_{h} {{\text{e}}^{{ - E\left( {v^{c} ,h} \right)}} } }}{{\sum\nolimits_{v,h} {{\text{e}}^{{ - E\left( {v,h} \right)}} } }}}$$

The derivative of this log probability with respect to a weight \(w_{ij}\) is:

$$\frac{\partial }{{\partial w_{ij} }}\sum\limits_{c = 1}^{C} {\log p\left( {v^{c} } \right)} = \frac{\partial }{{\partial w_{ij} }}\sum\limits_{c = 1}^{C} {\left( {\log \sum\limits_{h} {{\text{e}}^{{ - E\left( {v^{c} ,h} \right)}} } - \log \sum\limits_{v,h} {{\text{e}}^{{ - E\left( {v,h} \right)}} } } \right)}$$

The derivative of the first term of formula (27) is:

$$\frac{\partial }{{\partial w_{ij} }}\sum\limits_{c = 1}^{C} {\log \sum\limits_{h} {{\text{e}}^{{ - E\left( {v^{c} ,h} \right)}} } } = \sum\limits_{c = 1}^{C} {\frac{{\sum\nolimits_{h} {{\text{e}}^{{ - E\left( {v^{c} ,h} \right)}} v_{i}^{c} h_{j} } }}{{\sum\nolimits_{h} {{\text{e}}^{{ - E\left( {v^{c} ,h} \right)}} } }}}$$

Notice that the term \(\frac{{\sum\nolimits_{h} {{\text{e}}^{{ - E\left( {v^{c} ,h} \right)}} v_{i}^{c} h_{j} } }}{{\sum\nolimits_{h} {{\text{e}}^{{ - E\left( {v^{c} ,h} \right)}} } }}\) is just the expected value of \(v_{i}^{c} h_{j}\) given that \(v\) is clamped to the data vector \(v^{c}\). This is easy to compute since we know \(v_{i}^{c}\) and we can compute the expected value of \(h_{j}\) using formula (25).

Similarly, the derivative of the second term of formula (27) is:

$$\frac{\partial }{{\partial w_{ij} }}\sum\limits_{c = 1}^{C} {\log \sum\limits_{v,h} {{\text{e}}^{{ - E\left( {v,h} \right)}} } } = \sum\limits_{c = 1}^{C} {\frac{{\sum\nolimits_{v,h} {{\text{e}}^{{ - E\left( {v,h} \right)}} v_{i} h_{j} } }}{{\sum\nolimits_{v,h} {{\text{e}}^{{ - E\left( {v,h} \right)}} } }}}$$

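Here, the term \(\frac{{\sum\nolimits_{v,h} {{\text{e}}^{{ - E\left( {v,h} \right)}} v_{i} h_{j} } }}{{\sum\nolimits_{v,h} {{\text{e}}^{{ - E\left( {v,h} \right)}} } }}\) is the expected value of \(v_{i} h_{j}\) under the model's distribution. Computing it exactly requires summing over all \(2^{m + k}\) joint configurations of the visible and hidden units, which is intractable for realistically sized layers. It is therefore approximated with the contrastive divergence (CD) algorithm, which replaces the model expectation with statistics gathered after a few steps of Gibbs sampling started at the training data.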

Using \(\left\langle \cdot \right\rangle_{d}\) and \(\left\langle \cdot \right\rangle_{m}\) to denote expectations under the data distribution and the model distribution, respectively, formula (27), averaged over the \(C\) training cases, can be rewritten as:

$$\frac{\partial }{{\partial w_{ij} }}\log p\left( v \right) = \left\langle {v_{i} h_{j} } \right\rangle_{d} - \left\langle {v_{i} h_{j} } \right\rangle_{m}$$
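For a toy RBM both expectations in this formula can be computed exactly, so the gradient can be checked against a finite-difference derivative of the average log-likelihood. The sketch below does this; all function names and parameter values are illustrative assumptions. In practice the model term \(\left\langle v_i h_j \right\rangle_m\) is not computed exactly but approximated with CD.

```python
import itertools
import math


def neg_energy(v, h, a, b, W):
    # -E(v, h) for a classical binary RBM
    return (sum(ai * vi for ai, vi in zip(a, v))
            + sum(bj * hj for bj, hj in zip(b, h))
            + sum(v[i] * h[j] * W[i][j]
                  for i in range(len(a)) for j in range(len(b))))


def all_states(n):
    return list(itertools.product((0, 1), repeat=n))


def avg_log_likelihood(data, a, b, W):
    """(1/C) * sum_c log p(v^c), with the partition function computed exactly."""
    vs, hs = all_states(len(a)), all_states(len(b))
    log_Z = math.log(sum(math.exp(neg_energy(v, h, a, b, W)) for v in vs for h in hs))
    return sum(math.log(sum(math.exp(neg_energy(v, h, a, b, W)) for h in hs)) - log_Z
               for v in data) / len(data)


def exact_gradient(data, i, j, a, b, W):
    """<v_i h_j>_d - <v_i h_j>_m, both expectations computed exactly."""
    m = len(a)
    # Data term: v clamped to each case; E[h_j | v] = sigmoid(b_j + sum_r v_r w_rj)
    data_term = sum(v[i] / (1.0 + math.exp(-(b[j] + sum(v[r] * W[r][j] for r in range(m)))))
                    for v in data) / len(data)
    # Model term: expectation of v_i h_j under the joint distribution p(v, h)
    weights = [(v, h, math.exp(neg_energy(v, h, a, b, W)))
               for v in all_states(m) for h in all_states(len(b))]
    Z = sum(w for _, _, w in weights)
    model_term = sum(w * v[i] * h[j] for v, h, w in weights) / Z
    return data_term - model_term


def numerical_gradient(data, i, j, a, b, W, eps=1e-6):
    # Central difference of the average log-likelihood with respect to w_ij
    W_plus = [row[:] for row in W]
    W_minus = [row[:] for row in W]
    W_plus[i][j] += eps
    W_minus[i][j] -= eps
    return (avg_log_likelihood(data, a, b, W_plus)
            - avg_log_likelihood(data, a, b, W_minus)) / (2 * eps)


a = [0.5, -0.25, 0.1]
b = [0.25, -0.4]
W = [[1.0, -0.3], [-0.5, 0.8], [0.2, 0.6]]
data = [(1, 0, 1), (1, 1, 0), (0, 0, 1)]
print(exact_gradient(data, 0, 1, a, b, W), numerical_gradient(data, 0, 1, a, b, W))
```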

The learning rule for weight \(w_{ij}\) is therefore:

$$\Delta w_{ij} = \varepsilon \left( {\left\langle {v_{i} h_{j} } \right\rangle_{d} - \left\langle {v_{i} h_{j} } \right\rangle_{m} } \right)$$

where \(\varepsilon\) is the learning rate.

The update rules for the biases are similarly derived to be:

$$\Delta a_{i} = \varepsilon \left( {\left\langle {v_{i} } \right\rangle_{d} - \left\langle {v_{i} } \right\rangle_{m} } \right)$$
$$\Delta b_{j} = \varepsilon \left( {\left\langle {h_{j} } \right\rangle_{d} - \left\langle {h_{j} } \right\rangle_{m} } \right)$$
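Putting the weight and bias rules together with CD gives a compact training step. The sketch below is one common variant of CD-1 for a binary RBM, written under stated assumptions: hidden states are sampled when driving the Gibbs chain, while the accumulated statistics use probabilities, which reduces sampling noise. The function name `cd1_update`, the learning rate and the toy data are illustrative and not taken from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cd1_update(batch, a, b, W, lr=0.1, rng=random):
    """One CD-1 step for a binary RBM; returns new lists (a, b, W).

    Positive statistics come from the data v0 and p(h | v0); negative
    statistics from one Gibbs step v0 -> h0 -> v1 -> p(h | v1).
    """
    m, k = len(a), len(b)
    dW = [[0.0] * k for _ in range(m)]
    da, db = [0.0] * m, [0.0] * k
    for v0 in batch:
        ph0 = [sigmoid(b[j] + sum(v0[i] * W[i][j] for i in range(m))) for j in range(k)]
        h0 = [1 if rng.random() < p else 0 for p in ph0]  # sampled hidden states
        pv1 = [sigmoid(a[i] + sum(h0[j] * W[i][j] for j in range(k))) for i in range(m)]
        v1 = [1 if rng.random() < p else 0 for p in pv1]  # sampled reconstruction
        ph1 = [sigmoid(b[j] + sum(v1[i] * W[i][j] for i in range(m))) for j in range(k)]
        for i in range(m):
            da[i] += v0[i] - v1[i]                           # <v_i>_d - <v_i>_m
            for j in range(k):
                dW[i][j] += v0[i] * ph0[j] - v1[i] * ph1[j]  # <v_i h_j>_d - <v_i h_j>_m
        for j in range(k):
            db[j] += ph0[j] - ph1[j]                         # <h_j>_d - <h_j>_m
    n = len(batch)
    new_a = [a[i] + lr * da[i] / n for i in range(m)]
    new_b = [b[j] + lr * db[j] / n for j in range(k)]
    new_W = [[W[i][j] + lr * dW[i][j] / n for j in range(k)] for i in range(m)]
    return new_a, new_b, new_W


# Hypothetical toy run: learn a single repeated binary pattern
rng = random.Random(0)
a, b = [0.0, 0.0, 0.0], [0.0, 0.0]
W = [[0.0, 0.0] for _ in range(3)]
batch = [[1, 1, 0]] * 4
for _ in range(100):
    a, b, W = cd1_update(batch, a, b, W, lr=0.1, rng=rng)
print([round(x, 2) for x in a])  # visible biases move toward the pattern
```

Because each visible-bias increment is \(v_i^{\text{data}} - v_i^{\text{recon}}\), units that are on in the pattern can only gain bias and units that are off can only lose it.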

Gaussian–Bernoulli RBM

The classical RBM uses binary logistic units for both its visible and hidden layers. Because the traffic data in this paper are continuous, we adopt the extension to continuous-valued inputs described in Refs. [42, 47]: the binary visible units are replaced by linear units with Gaussian noise, which gives the Gaussian–Bernoulli RBM (GBRBM) with energy function:

$$E\left( {v,h} \right) = \sum\limits_{i = 1}^{m} {\frac{{\left( {v_{i} - a_{i} } \right)^{2} }}{{2\sigma_{i}^{2} }}} - \sum\limits_{j = 1}^{k} {b_{j} h_{j} } - \sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{k} {\frac{{v_{i} }}{{\sigma_{i} }}} } W_{ij} h_{j}$$

where \(\sigma_{i}\) is the standard deviation of the Gaussian noise for visible unit i.
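A minimal sketch of the GBRBM energy, assuming the quadratic term enters with a positive sign, \(+\left( {v_{i} - a_{i} } \right)^{2}/2\sigma_{i}^{2}\); that is the sign under which \(p(v \mid h)\) is a normalizable Gaussian, and it matches the exponent used in the derivation of \(p(v \mid h)\) that follows. The check at the end confirms that, for fixed \(h\), the energy is lowest at \(v_{i} = a_{i} + \sigma_{i}\sum_j W_{ij}h_j\), the Gaussian mean. Names and values are hypothetical.

```python
def gbrbm_energy(v, h, a, b, W, sigma):
    """Energy of a Gaussian-Bernoulli RBM with real-valued v and binary h:

    E(v, h) = sum_i (v_i - a_i)^2 / (2 sigma_i^2) - sum_j b_j h_j
              - sum_i sum_j (v_i / sigma_i) W_ij h_j
    """
    quadratic = sum((v[i] - a[i]) ** 2 / (2 * sigma[i] ** 2) for i in range(len(v)))
    hidden_term = sum(b_j * h_j for b_j, h_j in zip(b, h))
    interaction = sum(v[i] / sigma[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return quadratic - hidden_term - interaction


# Hypothetical GBRBM: m = 2 real-valued visible units, k = 1 hidden unit
a, b = [0.0, 0.5], [0.25]
W = [[0.5], [1.0]]
sigma = [1.0, 0.5]
h = [1]
print(gbrbm_energy([1.0, -0.5], h, a, b, W, sigma))  # 2.75

# For fixed h the energy is minimized at the Gaussian mean a_i + sigma_i * sum_j W_ij h_j
mean = [a[i] + sigma[i] * sum(W[i][j] * h[j] for j in range(len(h))) for i in range(len(a))]
print(mean)  # [0.5, 1.0]
```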

Given the energy function (34), the distribution \(p\left( {v\left| h \right.} \right)\) can be derived as follows:

$$\begin{aligned} p(v \mid h) = \frac{\text{e}^{-E(v,h)}}{\int_v \text{e}^{-E(v,h)}\,\text{d}v} &= \frac{\text{e}^{-\sum_{i=1}^{m} \frac{(v_i - a_i)^2}{2\sigma_i^2} + \sum_{j=1}^{k} b_j h_j + \sum_{i=1}^{m}\sum_{j=1}^{k} \frac{v_i}{\sigma_i} W_{ij} h_j}}{\int_v \text{e}^{-\sum_{i=1}^{m} \frac{(v_i - a_i)^2}{2\sigma_i^2} + \sum_{j=1}^{k} b_j h_j + \sum_{i=1}^{m}\sum_{j=1}^{k} \frac{v_i}{\sigma_i} W_{ij} h_j}\,\text{d}v} \\ &= \prod_{i=1}^{m} \frac{1}{\sigma_i \sqrt{2\pi}} \, \text{e}^{-\frac{1}{2\sigma_i^2}\left(v_i - a_i - \sigma_i \sum_{j=1}^{k} W_{ij} h_j\right)^2} \end{aligned}$$
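The product form says that, given \(h\), the visible units are independent Gaussians with mean \(a_{i} + \sigma_{i}\sum_{j} W_{ij} h_{j}\) and standard deviation \(\sigma_{i}\). A sketch of sampling from this conditional with Python's standard `random.gauss`; the function names and toy values are hypothetical.

```python
import random


def gbrbm_visible_mean(h, a, W, sigma):
    # Mean of p(v_i | h): a_i + sigma_i * sum_j W_ij h_j
    return [a[i] + sigma[i] * sum(W[i][j] * h[j] for j in range(len(h)))
            for i in range(len(a))]


def sample_visible(h, a, W, sigma, rng=random):
    # Draw each v_i independently from N(mean_i, sigma_i^2)
    means = gbrbm_visible_mean(h, a, W, sigma)
    return [rng.gauss(mu, s) for mu, s in zip(means, sigma)]


# Hypothetical GBRBM: m = 2 visible units, k = 2 hidden units
a = [0.0, 0.5]
W = [[0.5, 2.0], [1.0, -1.0]]
sigma = [1.0, 0.5]
h = [1, 0]
rng = random.Random(42)
samples = [sample_visible(h, a, W, sigma, rng) for _ in range(20000)]
empirical_mean = [sum(s[i] for s in samples) / len(samples) for i in range(2)]
print(gbrbm_visible_mean(h, a, W, sigma))  # [0.5, 1.0]
print(empirical_mean)                      # close to [0.5, 1.0]
```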

The conditional probability \(p\left( {h_{j} = 1\left| v \right.} \right)\) of a hidden unit is obtained by summing over the states \(h_{\setminus j}\) of all other hidden units. Write \(- E\left( {v,h} \right) = x_{j} h_{j} + R\left( {v,h_{\setminus j} } \right)\), where \(x_{j} = b_{j} + \sum\nolimits_{i = 1}^{m} {\frac{{v_{i} }}{{\sigma_{i} }}W_{ij} }\) collects every term that involves \(h_{j}\) and \(R\) collects the remaining terms. The partition function and the factor involving \(R\) then cancel:

$$\begin{aligned} p(h_j = 1 \mid v) &= \frac{\sum_{h_{\setminus j}} p(v, h_j = 1, h_{\setminus j})}{p(v)} = \frac{\sum_{h_{\setminus j}} \text{e}^{x_j + R(v, h_{\setminus j})}}{\sum_{h_j \in \{0,1\}} \sum_{h_{\setminus j}} \text{e}^{x_j h_j + R(v, h_{\setminus j})}} \\ &= \frac{\text{e}^{x_j} \sum_{h_{\setminus j}} \text{e}^{R(v, h_{\setminus j})}}{\left(1 + \text{e}^{x_j}\right) \sum_{h_{\setminus j}} \text{e}^{R(v, h_{\setminus j})}} = \frac{1}{1 + \text{e}^{-\left(b_j + \sum_{i=1}^{m} \frac{v_i}{\sigma_i} W_{ij}\right)}} \end{aligned}$$

Note that Eq. (36) has the same form as its classical-RBM counterpart, formula (25), except that each \(v_{i}\) is scaled by the reciprocal of its standard deviation \(\sigma_{i}\).
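In code, the only change from the classical hidden-unit probability is the division of each \(v_i\) by \(\sigma_i\). A minimal sketch with hypothetical names and values; with all \(\sigma_i = 1\) it reduces to the classical formula.

```python
import math


def gbrbm_hidden_probs(v, b, W, sigma):
    # p(h_j = 1 | v) = sigmoid(b_j + sum_i (v_i / sigma_i) W_ij)
    probs = []
    for j in range(len(b)):
        x = b[j] + sum(v[i] / sigma[i] * W[i][j] for i in range(len(v)))
        probs.append(1.0 / (1.0 + math.exp(-x)))
    return probs


# Hypothetical GBRBM: m = 2 visible units, k = 1 hidden unit
v = [1.0, -0.5]
b = [0.25]
W = [[0.5], [1.0]]
print(gbrbm_hidden_probs(v, b, W, [1.0, 0.5]))  # sigmoid(0.25 + 0.5 - 1.0) = sigmoid(-0.25)
print(gbrbm_hidden_probs(v, b, W, [1.0, 1.0]))  # sigmoid(0.25 + 0.5 - 0.5) = sigmoid(0.25)
```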

The GBRBM is trained in the same way as the classical RBM. Taking the derivatives of the two terms of formula (27) as before, and using \(\partial E/\partial W_{ij} = - v_{i} h_{j} /\sigma_{i}\) from the energy function (34), we find that

$$\begin{aligned} \frac{\partial}{\partial W_{ij}} \sum_{c=1}^{C} \log \sum_{h} \text{e}^{-E(v^{c},h)} &= -\sum_{c=1}^{C} \frac{\sum_{h} \text{e}^{-E(v^{c},h)} \frac{\partial E(v^{c},h)}{\partial W_{ij}}}{\sum_{h} \text{e}^{-E(v^{c},h)}} \\ &= \frac{1}{\sigma_{i}} \sum_{c=1}^{C} \frac{\sum_{h} \text{e}^{-E(v^{c},h)} v_{i}^{c} h_{j}}{\sum_{h} \text{e}^{-E(v^{c},h)}} \end{aligned}$$


$$\frac{\partial}{\partial W_{ij}} \sum_{c=1}^{C} \log \int_{v} \sum_{h} \text{e}^{-E(v,h)}\,\text{d}v = \frac{1}{\sigma_{i}} \sum_{c=1}^{C} \frac{\int_{v} \sum_{h} \text{e}^{-E(v,h)} v_{i} h_{j}\,\text{d}v}{\int_{v} \sum_{h} \text{e}^{-E(v,h)}\,\text{d}v}$$

The model expectation on the right-hand side of the second expression is again estimated with the CD algorithm, as for the classical RBM.
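A sketch of one CD-1 step for the GBRBM, under stated assumptions. The weight gradient carries the \(1/\sigma_i\) factor from the derivation above; the bias gradients (\(a_i\) divided by \(\sigma_i^2\), \(b_j\) unchanged) are derived here from the energy function with the positive quadratic term and are not stated in the text. The reconstruction uses the Gaussian mean instead of a sample, a choice made in this sketch for lower variance. The function name, learning rate and data are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gbrbm_cd1_update(batch, a, b, W, sigma, lr=0.01, rng=random):
    """One CD-1 step for a Gaussian-Bernoulli RBM; returns new lists (a, b, W).

    Gradients (from the energy with a +(v_i - a_i)^2 / (2 sigma_i^2) term):
      W_ij: (<v_i h_j>_d - <v_i h_j>_m) / sigma_i
      b_j:  <h_j>_d - <h_j>_m
      a_i:  (<v_i>_d - <v_i>_m) / sigma_i**2
    Assumption: the reconstruction v1 is the Gaussian mean, not a sample.
    """
    m, k = len(a), len(b)
    dW = [[0.0] * k for _ in range(m)]
    da, db = [0.0] * m, [0.0] * k

    def hidden_probs(v):
        # p(h_j = 1 | v) with each v_i scaled by 1 / sigma_i
        return [sigmoid(b[j] + sum(v[i] / sigma[i] * W[i][j] for i in range(m)))
                for j in range(k)]

    for v0 in batch:
        ph0 = hidden_probs(v0)
        h0 = [1 if rng.random() < p else 0 for p in ph0]  # sampled hidden states
        v1 = [a[i] + sigma[i] * sum(W[i][j] * h0[j] for j in range(k))
              for i in range(m)]                          # noise-free reconstruction
        ph1 = hidden_probs(v1)
        for i in range(m):
            da[i] += (v0[i] - v1[i]) / sigma[i] ** 2
            for j in range(k):
                dW[i][j] += (v0[i] * ph0[j] - v1[i] * ph1[j]) / sigma[i]
        for j in range(k):
            db[j] += ph0[j] - ph1[j]
    n = len(batch)
    new_a = [a[i] + lr * da[i] / n for i in range(m)]
    new_b = [b[j] + lr * db[j] / n for j in range(k)]
    new_W = [[W[i][j] + lr * dW[i][j] / n for j in range(k)] for i in range(m)]
    return new_a, new_b, new_W


# Hypothetical single step from all-zero parameters on one data vector
a, b = [0.0, 0.0], [0.0, 0.0]
W = [[0.0, 0.0], [0.0, 0.0]]
sigma = [1.0, 0.5]
a, b, W = gbrbm_cd1_update([[2.0, -1.0]], a, b, W, sigma, lr=0.1, rng=random.Random(0))
print(a, b, W)  # [0.2, -0.4] [0.0, 0.0] [[0.1, 0.1], [-0.1, -0.1]]
```

Starting from zero weights, all hidden probabilities are 0.5 and the reconstruction equals the visible biases, so this first step moves \(a_i\) by \(\varepsilon v_i/\sigma_i^2\) and \(W_{ij}\) by \(0.5\,\varepsilon v_i/\sigma_i\).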


Cite this article

Chen, C., Wang, H., Yuan, F. et al. Bus travel time prediction based on deep belief network with back-propagation. Neural Comput & Applic 32, 10435–10449 (2020). https://doi.org/10.1007/s00521-019-04579-x



  • Bus travel time prediction
  • Multi-factor influence
  • Deep belief network
  • Machine learning models