1 Introduction

With the rapid development of society, urban traffic has increased substantially in recent years, leading to transportation problems such as congestion and accidents. Intelligent Transportation Systems (ITS) aim to address these problems and manage transportation intelligently. Traffic flow prediction, an essential task in ITS, predicts future flow from historical flows. Accurate prediction helps individual travelers make better travel decisions, alleviates traffic congestion, and improves operational efficiency for public transport and transport planning. Thus, accurate and efficient prediction is of great significance for ITS.

Many methods exist for traffic flow prediction, and they can be divided into three main classes: data-driven statistical methods, machine learning methods, and deep learning methods. Early work relied mainly on conventional statistical time-series methods such as the Auto-Regressive Integrated Moving Average (ARIMA) model [1] and its seasonal variant (SARIMA) [2]. However, many studies have found that traffic flow data are stochastic, highly variable, and nonlinear. Because ARIMA assumes linear relationships, it cannot adequately model such nonlinear traffic flow data.

Several machine learning approaches have also been proposed for traffic flow prediction, such as Support Vector Machines (SVM) [3], K-Nearest Neighbors (KNN), and online Support Vector Regression (SVR) [4]. KNN was first applied to traffic flow prediction in [5], and Sun et al. use a flow-aware WPT-KNN method to predict traffic parameters [6]. In [7], a spatio-temporal Bayesian multivariate adaptive-regression splines (ST-BMARS) model is developed to predict short-term freeway traffic flow. Additionally, an Artificial Neural Network (ANN) model is used for road traffic prediction and congestion control in [8].

In recent years, deep learning has drawn growing attention from researchers. Deep learning methods exploit deeper and more complex architectures to extract inherent features in data from the lowest to the highest level. Many deep learning methods have therefore been proposed for traffic flow prediction, such as the Stacked Auto-Encoder (SAE) [9, 10], Deep Belief Networks (DBN) [11, 12], and Recurrent Neural Networks (RNN). In [13], Ma et al. combined the Restricted Boltzmann Machine (RBM) with an RNN to form an RBM-RNN model that inherits the advantages of both. Zhao et al. proposed a Long Short-Term Memory (LSTM) based method that uses LSTM to extract the temporal features of traffic flow data [14]. Compared with other deep learning models, the Convolutional Neural Network (CNN) performs better at understanding and exploring the pattern characteristics of traffic data. Thus, in [15], a CNN-based method that learns traffic as images was proposed to predict traffic speed.

In general, short-term traffic flow prediction has been addressed well by some deep learning models, but several defects remain. First, long-term prediction is still not well solved. Second, existing models usually rely on a single parameter and ignore the objective correlation among traffic parameters. Third, most models adopt classic architectures with poor scalability and lack a design tailored to the specific prediction problem. In our work, exploiting the advantages of deep learning, and of CNNs in particular, we develop a novel CNN-based model called ResDeconvNN with three input channels and apply it to long-term traffic flow prediction. The spatio-temporal relations and the correlations among the three traffic parameters flow, speed and occupancy (FSO) are fully considered and used simultaneously in the prediction. We combine a residual network with a deconvolutional neural network to form the ResDeconvNN model, which extracts the spatio-temporal information of traffic patterns well. Experiments demonstrate that the proposed approach achieves lower mean relative error, mean absolute error, and root mean square error than the other existing methods.

2 Proposed Methodology

2.1 Basic Principle

It is generally acknowledged that CNNs show remarkable learning ability in pattern recognition and extract input features well. Compared with other deep learning models, a CNN has fewer weight parameters, and the raw data can be fed directly as input for automatic feature learning without distorting the input. Based on this, and to adapt to the transportation setting, we design a ResDeconvNN model for long-term traffic flow prediction. Flow, speed and occupancy (FSO) are the three main elements of traffic data; they are mutually correlated and describe the traffic state at a certain time and place. In this model, we exploit these correlations and predict the flow of the next day from these three historical parameters to improve prediction performance.

2.2 FSO Matrix Generation

The raw flow, speed and occupancy (FSO) data are collected by detectors on the road. Generally, FSO data from a detector arrive at a time interval of 5 min, and there is a certain distance between the detectors installed on the highway. For each FSO parameter, traffic information in both the time and space dimensions should be considered to predict traffic flow. We therefore let the x- and y-axes of a matrix represent the time and space dimensions. Mathematically, the time-space matrix is denoted by:

$$ X = \begin{bmatrix} x_{11} & x_{12} & \ldots & x_{1n} \\ x_{21} & x_{22} & \ldots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \ldots & x_{mn} \end{bmatrix} $$
(1)

Matrix X can be viewed as one of the three channels of an image, where n is the number of time intervals and m is the number of detector locations along the road. Pixel \( x_{ij} \) is the corresponding FSO value associated with space i and time j.

As mentioned above, the raw data are converted into 3 matrices that serve as 3 channels, representing the values of flow, speed and occupancy in one day, respectively. For each matrix, in the time dimension, we choose data collected from 7 am to 10 pm, because there is little traffic at night and its pattern is simple. This gives 180 time steps at a 5-min sampling interval, so the width of the matrix is 180. In the space dimension, we have 35 detectors and map their spatial sequence directly to the height dimension, so the height of the matrix is 35.
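As an illustration, the following minimal NumPy sketch (the array names and the per-day input shapes are assumptions, not the paper's code) stacks one day's per-detector flow, speed and occupancy series into the 35 × 180 × 3 matrix described above:

```python
import numpy as np

N_DETECTORS, N_STEPS = 35, 180   # 35 detectors; 7 am-10 pm at 5-min intervals

def build_fso_matrix(flow, speed, occupancy):
    """Stack one day's flow, speed and occupancy (each of shape (35, 180))
    into a single 3-channel spatio-temporal matrix of shape (35, 180, 3)."""
    x = np.stack([flow, speed, occupancy], axis=-1)
    assert x.shape == (N_DETECTORS, N_STEPS, 3)
    return x
```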

Finally, we merge the 3 channel matrices to generate a time-space FSO matrix. Considering the difference in the numerical range of each parameter, we normalize the data of each channel. Here, we adopt min-max normalization, defined as:

$$ x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}} $$
(2)

where \( x_{norm} \) is the normalized data of a channel, x is the original data of that channel, and \( x_{max} \) and \( x_{min} \) are the maximum and minimum values of the original data of that channel.
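A minimal sketch of this per-channel min-max normalization (the small epsilon is an added numerical safeguard, not part of Eq. (2)):

```python
import numpy as np

def minmax_normalize(x):
    """Normalize each channel of a (35, 180, 3) FSO matrix to [0, 1] as in Eq. (2)."""
    x = x.astype(np.float32)
    x_min = x.min(axis=(0, 1), keepdims=True)      # per-channel minimum
    x_max = x.max(axis=(0, 1), keepdims=True)      # per-channel maximum
    return (x - x_min) / (x_max - x_min + 1e-8)    # epsilon avoids division by zero
```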

2.3 The ResDeconvNN Model

The overall structure of the proposed ResDeconvNN model is shown in Fig. 1. Our method mainly incorporates two parts, where the first part is the residual net module and the second part is the deconvolutional neural network module.

Fig. 1.

The structure of the ResDeconvNN model, where Conv denotes a convolution layer, Max-Pooling a max pooling layer, Max unpool an unpooling layer, and Deconv a deconvolution layer.

First, to make long-term traffic flow prediction more accurate, we draw on the idea of residual learning: we introduce a residual structure to mitigate the vanishing-gradient problem in deep networks. Second, we design a deconvolutional neural network (DeconvNN) module to decode the traffic flow of the next day from the integrated spatial and temporal features.

The principle of the residual module is as follows.

Previous research has shown that as network depth increases, accuracy saturates and then degrades rapidly when more layers are added. Such degradation is not caused by overfitting, but by the fact that deeper networks become harder to optimize. However, adding identity mappings to a shallow network and reformulating its optimization greatly reduces the optimization difficulty of the whole network. Figure 2 shows a residual building block. Let the input of the block be x and the expected output be H(x) = F(x) + x. By connecting the input x directly to the output, the optimization goal is recast as the residual F(x) = H(x) − x. In most cases, optimizing F(x) is much easier than optimizing H(x).
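For illustration, an assumed Keras sketch of such a building block (the layer sizes and the 1 × 1 projection are illustrative choices, not the exact configuration listed in Table 1):

```python
import tensorflow as tf

def residual_block(x, filters, kernel_size=3):
    """H(x) = F(x) + x: two convolutions form F(x); the skip connection adds x back."""
    f = tf.keras.layers.Conv2D(filters, kernel_size, padding="same",
                               activation="relu")(x)
    f = tf.keras.layers.Conv2D(filters, kernel_size, padding="same")(f)
    if x.shape[-1] != filters:                       # 1x1 conv to match channel counts
        x = tf.keras.layers.Conv2D(filters, 1, padding="same")(x)
    return tf.keras.layers.ReLU()(tf.keras.layers.Add()([f, x]))
```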

Fig. 2.

Residual learning: a building block.

The Input Layer.

Unlike traditional models with only a single input channel, the input layer of our model has 3 channels, so that we can fully exploit the parametric correlation among flow, speed and occupancy. As described in Sect. 2.2, we transform the raw data into a spatio-temporal matrix with 3 channels. The input of the model is therefore a four-dimensional tensor whose dimensions are the batch size, the number of detectors, the number of time steps, and the number of channels.
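In Keras terms (a sketch, not the paper's exact code), the input and label tensors would look like:

```python
import tensorflow as tf

# One sample: 35 detectors x 180 time steps x 3 channels (flow, speed, occupancy);
# the corresponding label is the next day's flow, i.e. a single channel.
inputs = tf.keras.Input(shape=(35, 180, 3))   # batch dimension is implicit
labels_shape = (None, 35, 180, 1)             # flow of the next day
```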

The Convolution Layer.

The convolution layer is the key part of the model for learning the complex spatio-temporal characteristics of traffic data. In a convolution layer, the spatio-temporal feature maps of the previous layer are first convolved with different kernels. The convolution result is then fed into a nonlinear activation function to form more complex spatio-temporal features. The convolutional output can be written as:

$$ x_{j}^{l} = \varphi \left( \sum_{i = 1}^{c^{l-1}} x_{i}^{l-1} * k_{ij}^{l} + b_{j}^{l} \right) $$
(3)

where * denotes the convolution operation, l is the layer index, j is the index of the feature map at the lth layer, and \( c^{l-1} \) is the number of feature maps of the previous layer. \( x_{i}^{l-1} \) denotes an output feature map of the (l − 1)th layer; \( x_{j}^{l} \), \( k_{ij}^{l} \) and \( b_{j}^{l} \) represent the output feature map, kernel weights and bias at the lth layer. \( \varphi \) is the rectified linear unit activation function, defined as:

$$ \varphi \left( x \right) = \max \left( 0, x \right) $$
(4)
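In a deep learning framework, Eqs. (3) and (4) together correspond to a single convolution layer with a ReLU activation, e.g. (the filter count here is illustrative):

```python
import tensorflow as tf

x = tf.zeros([1, 35, 180, 3])                  # dummy 3-channel FSO input
conv = tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding="same",
                              activation="relu")      # Eq. (3) plus the ReLU of Eq. (4)
feature_maps = conv(x)
print(feature_maps.shape)                      # (1, 35, 180, 32)
```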

The Pooling Layer.

The pooling layer down-samples the convolution result so as to filter redundant information from the traffic features. The pooling operation therefore not only reduces the size of the feature maps and the number of trainable parameters, but also retains the significant pattern information. Common pooling operations include mean pooling, max pooling, and stochastic pooling. In this paper, max pooling is employed.
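For example (the pool size is illustrative):

```python
import tensorflow as tf

feature_maps = tf.zeros([1, 35, 180, 32])      # dummy convolution output
pooled = tf.keras.layers.MaxPool2D(pool_size=2)(feature_maps)
print(pooled.shape)                            # (1, 17, 90, 32): spatial size roughly halved
```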

The Deconvolution Layer.

The deconvolutional neural network is mainly composed of deconvolution layers and unpooling layers. Deconvolution is the inverse of convolution. In our model, we relate the forward pass of the deconvolution to the backward pass of the convolution and realize the deconvolution operation by following the backward derivation of the convolution layer; this is also called transposed convolution. The output of the deconvolution layer can therefore be defined as:

$$ x_{j}^{l} = \varphi \left( \sum_{i = 1}^{c^{l-1}} x_{i}^{l-1} * \left( k_{ij}^{l} \right)^{R} + b_{j}^{l} \right) $$
(5)

where * denotes the convolution operation, R denotes matrix transposition applied to the kernel \( k_{ij}^{l} \), and \( \varphi \) is the activation function.
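As a sketch, the transposed convolution of Eq. (5) can be realized with a standard framework layer (the shapes and filter counts below are illustrative):

```python
import tensorflow as tf

encoded = tf.zeros([1, 9, 45, 64])             # dummy encoded feature maps
deconv = tf.keras.layers.Conv2DTranspose(filters=32, kernel_size=3, strides=2,
                                         padding="same", activation="relu")
upsampled = deconv(encoded)
print(upsampled.shape)                         # (1, 18, 90, 32): spatial size doubled
```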

The Unpooling Layer.

The pooling operation loses information and is therefore irreversible. However, we can still realize an unpooling operation by following the backward derivation of the pooling layer. Since we adopt max pooling, for the jth feature map of the lth pooling layer we record the location of the maximum value while computing the pooling result. The output of the lth unpooling layer can then be defined as:

$$ x_{j}^{l} = \mathrm{unmp}\left( x_{j}^{l-1}, \mathrm{argmax}_{j} \right) $$
(6)

where unmp denotes the unpooling operation and \( \mathrm{argmax}_{j} \) is the recorded index of the position of the maximum value.
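A minimal NumPy sketch of max pooling with recorded argmax positions and the corresponding unpooling of Eq. (6), for a single 2D feature map (the paper's own TensorFlow implementation may differ):

```python
import numpy as np

def max_pool_with_argmax(x, k=2):
    """Max-pool a 2D map x with a k x k window, recording where each maximum was."""
    h, w = x.shape[0] // k, x.shape[1] // k
    pooled = np.zeros((h, w))
    argmax = np.zeros((h, w, 2), dtype=int)
    for i in range(h):
        for j in range(w):
            window = x[i*k:(i+1)*k, j*k:(j+1)*k]
            r, c = np.unravel_index(window.argmax(), window.shape)
            pooled[i, j] = window[r, c]
            argmax[i, j] = (i*k + r, j*k + c)   # location of the maximum in x
    return pooled, argmax

def max_unpool(pooled, argmax, out_shape):
    """Eq. (6): scatter pooled values back to their recorded positions; the rest stays 0."""
    out = np.zeros(out_shape)
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            r, c = argmax[i, j]
            out[r, c] = pooled[i, j]
    return out
```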

2.4 Model Optimization

To predict traffic flow, the model parameters must be trained on training samples, and a loss function is needed to describe the prediction accuracy of the model. In the training phase, the loss function of our model, optimized by stochastic gradient descent, is defined as:

$$ L = L_{mse} + L_{reg} + L_{mgdl} $$
(7)

where \( L_{mse} \) is the mean squared error (MSE) between the ground truth and the prediction, \( L_{reg} \) is a regularization loss that helps avoid overfitting, and \( L_{mgdl} \) measures the gradient difference loss between the predicted and real values.
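A hedged sketch of this composite loss (the exact form of the gradient difference term and the regularization coefficient are assumptions; the paper does not spell them out):

```python
import tensorflow as tf

def composite_loss(y_true, y_pred, weights, reg_coef=1e-4):
    """L = L_mse + L_reg + L_mgdl as in Eq. (7), for tensors of shape (B, H, W, 1)."""
    l_mse = tf.reduce_mean(tf.square(y_true - y_pred))
    l_reg = reg_coef * tf.add_n([tf.nn.l2_loss(w) for w in weights])
    # gradient difference loss: compare spatial gradients of truth and prediction
    gd_h = (tf.abs(y_true[:, 1:, :, :] - y_true[:, :-1, :, :])
            - tf.abs(y_pred[:, 1:, :, :] - y_pred[:, :-1, :, :]))
    gd_w = (tf.abs(y_true[:, :, 1:, :] - y_true[:, :, :-1, :])
            - tf.abs(y_pred[:, :, 1:, :] - y_pred[:, :, :-1, :]))
    l_mgdl = tf.reduce_mean(tf.abs(gd_h)) + tf.reduce_mean(tf.abs(gd_w))
    return l_mse + l_reg + l_mgdl
```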

3 Experiment and Results

3.1 Dataset Description

The real FSO data, associated with position and time, were collected in 2011 from detectors deployed on the Yan'an elevated highway in Shanghai. As shown in Fig. 3, the Yan'an elevated highway is marked in red; it connects the HongQiao transportation hub with the city center.

Fig. 3.

Location of the Yan'an elevated highway in Shanghai.

The process of our methodology is illustrated in Fig. 4. Owing to missing data from March 20 to March 23, only 361 days of data are available for the experiment. In addition, some abnormal elements need to be repaired, so the raw data are first preprocessed to remove them and are then transformed into spatio-temporal matrices with 3 channels. Because the previous day's data are used to predict the flow of the following day, there are 360 samples.

Fig. 4.

The process of the proposed methodology for traffic flow prediction.

For the division into training and test sets, we first form sample-label pairs from consecutive days: the traffic data of day i (the experimental inputs) form the ith sample, and the traffic data of day i + 1 (the experimental labels) form the corresponding label (i = 1, 2, …, 360). The 360 pairs are then shuffled to disrupt their order. As mentioned in Sect. 2.3, the 4th dimension of the input data is the number of channels; since the flow, speed and occupancy of the previous day are used to predict the flow of the next day, the 4th dimension of the data is 3 and that of the labels is 1. For the training set we select the 1st to 330th data and labels, and for the test set the 331st to 360th data and labels. That is, the training set contains 330 samples and the test set contains 30 samples.
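A minimal sketch of this pairing and split (the array names and the zero-filled placeholders are illustrative):

```python
import numpy as np

fso  = np.zeros((361, 35, 180, 3))     # placeholder: 361 days of 3-channel FSO matrices
flow = np.zeros((361, 35, 180, 1))     # placeholder: 361 days of flow matrices

idx = np.random.default_rng(0).permutation(360)   # shuffle the 360 day pairs
data, labels = fso[:-1][idx], flow[1:][idx]       # day i -> inputs, day i+1 -> labels
train_x, train_y = data[:330], labels[:330]       # 330 training samples
test_x,  test_y  = data[330:], labels[330:]       # 30 test samples
```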

3.2 Learning Rate and Network Iteration

We adopt an exponential decay schedule for the learning rate, a flexible method that adjusts the learning rate dynamically. In our model, the initial learning rate is 1.0, the decay coefficient is 0.5, the total number of iterations is 30000, and the learning rate is decayed every 2000 iterations. As training proceeds, the learning rate decreases exponentially, and the model eventually stabilizes at an optimal value. In our experiment, the value of the loss function L was 0.1545 at the beginning (with \( {\text{L}}_{\text{mse}} \) = 0.03827, \( {\text{L}}_{\text{reg}} \) = 0.00088, \( {\text{L}}_{\text{mgdl}} \) = 0.11535); during training the loss fluctuated slightly while decreasing and gradually converged. After 30000 iterations, it stabilized at 0.0550 (with \( {\text{L}}_{\text{mse}} \) = 0.00312, \( {\text{L}}_{\text{reg}} \) = 0.00077, \( {\text{L}}_{\text{mgdl}} \) = 0.05111).
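For reference, the described schedule can be expressed with a standard TensorFlow decay object (using staircase decay is an assumption; the paper only states the coefficients):

```python
import tensorflow as tf

schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1.0,   # initial learning rate
    decay_steps=2000,            # decay every 2000 iterations
    decay_rate=0.5,              # decay coefficient
    staircase=True)
print(float(schedule(0)), float(schedule(2000)), float(schedule(30000)))
# 1.0  0.5  ~3.05e-05  (= 1.0 * 0.5 ** 15 after 30000 iterations)
```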

3.3 Experimental Environment and Model Configuration

The experiments are conducted on a server with an i7-5820K CPU, 48 GB of memory, and an NVIDIA GeForce GTX 1080 GPU. The proposed model and the baseline models are implemented with the TensorFlow deep learning framework.

The configuration of our proposed model is shown in Table 1.

Table 1. Configuration of ResDeconvNN for traffic flow prediction

3.4 Results and Evaluation

We compare our method with several existing methods, including a basic method (RW, which predicts the current value using the last observed value), a classical method (ANN), and advanced deep learning methods (DBN, RNN and SAE). Prediction performance is measured by three criteria: Mean Relative Error (MRE), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). MAE and RMSE evaluate the absolute error between the prediction and the reality, while MRE evaluates the relative error. MRE, MAE and RMSE are defined as:

$$ MRE = \frac{1}{N}\sum_{i = 1}^{N} \frac{\left| y_{i}^{\prime} - y_{i} \right|}{y_{i}} $$
(8)
$$ MAE = \frac{1}{N}\sum_{i = 1}^{N} \left| y_{i}^{\prime} - y_{i} \right| $$
(9)
$$ RMSE = \sqrt{\frac{1}{N}\sum_{i = 1}^{N} \left( y_{i}^{\prime} - y_{i} \right)^{2}} $$
(10)

where \( y_{i} \) denotes the prediction for the ith test sample, \( y_{i}^{\prime} \) denotes the corresponding ground truth, and N denotes the number of samples in the test set.
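Direct NumPy implementations of Eqs. (8)-(10), keeping the paper's symbols (y is the prediction array and y_prime the ground truth, as defined above):

```python
import numpy as np

def mre(y_prime, y):   # Eq. (8): mean relative error
    return np.mean(np.abs(y_prime - y) / y)

def mae(y_prime, y):   # Eq. (9): mean absolute error
    return np.mean(np.abs(y_prime - y))

def rmse(y_prime, y):  # Eq. (10): root mean square error
    return np.sqrt(np.mean((y_prime - y) ** 2))
```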

The comparison results are shown in Table 2. In terms of MRE, our model obtains the lowest error among all methods except DBN, and its value is very close to that of DBN. In terms of the other two criteria, our method performs best among all six methods. We can therefore conclude that our method achieves better performance than the existing methods.

Table 2. Results of Experiment

Figures 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 and 16 show the predicted and real flow curves for randomly selected detectors on the Yan'an elevated highway in Shanghai. Figures 5, 6, 7, 8, 9 and 10 show the flow fitting curves of the 25th detector on December 12, and Figs. 11, 12, 13, 14, 15 and 16 show the flow fitting curves of the 2nd detector on December 28. As shown in these figures, the predicted curves fit the ground-truth curves closely, except where the ground-truth curve has pronounced peaks.

Fig. 5.

Prediction results of ResDeconvNN.

Fig. 6.

Prediction results of ANN.

Fig. 7.

Prediction results of DBN.

Fig. 8.

Prediction results of RNN.

Fig. 9.

Prediction results of SAE.

Fig. 10.

Prediction results of RW.

Fig. 11.

Prediction results of ResDeconvNN.

Fig. 12.

Prediction results of ANN.

Fig. 13.

Prediction results of DBN.

Fig. 14.

Prediction results of RNN.

Fig. 15.

Prediction results of SAE.

Fig. 16.

Prediction results of RW.

Figures 17, 18, 19 and 20 show heat maps of the predicted and real flow matrices. Figures 17 and 18 show the heat maps of the predicted and real flow matrix on December 2, respectively, and Figs. 19 and 20 show those on December 30. The heat maps clearly reveal the real traffic flow situation over a day.

Fig. 17.

Heat map visualized by the predicted flow matrix of the next day (2011/12/2).

Fig. 18.

Heat map visualized by the real flow matrix of the next day (2011/12/2).

Fig. 19.

Heat map visualized by the predicted flow matrix of the next day (2011/12/30).

Fig. 20.

Heat map visualized by the real flow matrix of the next day (2011/12/30).

4 Conclusions

In this paper, a model combining a residual network and a deconvolutional neural network is developed to predict long-term traffic flow accurately. The proposed method takes advantage of the residual idea, which allows it to learn the latent nonlinear traffic flow features. Its performance is also attributed to the correlation of FSO and the spatio-temporal correlations of the traffic data. Finally, we apply a deconvolutional neural network to decode the flow of the next day accurately. Experimental results show that our method is robust and obtains better prediction results than existing methodologies.