Abstract
Background modelling techniques use temporal, spatial, intensity and image-plane information to detect objects, and these features are integrated to extract the maximum information. Most background modelling techniques depend heavily on parameters such as a learning rate or a threshold. High dependency on parameters increases complexity and makes such methods difficult to control under changing weather conditions, so parameter-based techniques do not achieve high efficiency in outdoor computer vision applications, where illumination conditions are hard to predict. This paper presents a background modelling algorithm with little dependency on parameters that is robust to illumination changes. Since camera jitter has a major effect on modelling techniques, it is also addressed, and a new way of separating shadow from object is implemented. The performance of the algorithm is compared with other state-of-the-art methods.
1 Introduction
Vision-based surveillance has become a fast-growing trend in recent times. A good surveillance system can report any irregular behavior or movement that occurs in the surveillance region. Extracting information such as traffic flow, traffic density, vehicle speed and vehicle class helps traffic management authorities analyze the traffic, and these approaches need no human assistance for continuous monitoring of the video stream. Intelligent transportation systems use both vision-based and non-vision-based sensors for traffic monitoring. Computer vision techniques are more popular than traditional sensors because of their versatile features [1, 2]: their apparatus requires no pavement modification of highways during installation, multiple detection zones and lanes can be monitored at the same time, and information gathered from different locations can be linked into wide-area surveillance. Vision-based systems have been used successfully in many research applications [3–12]. Camera sensors and computational power have improved considerably, resulting in highly reliable and robust systems [13]. Many techniques have been proposed in the literature for detecting and segmenting moving objects, such as inter-frame differencing, edge detection, optical flow, thresholding and background subtraction; [14, 15] present detailed reviews of vision-based methodologies for traffic surveillance. Background modelling is widely used in computer vision systems for traffic surveillance applications [16–19]. The compelling reason to select background modelling for vision-based surveillance is that road conditions generally remain static, so the background can be modelled by observing the pixels that remain invariant over time in the image plane.
All variant pixels represent foreground or moving objects, which are segmented and categorized based on connected regions. Identifying each pixel as variant or invariant is the basic function of background modelling techniques. In general, detecting moving objects in a scene based on intensity alone has limited scope compared to using the spatial domain and texture. Constraints from the spatial domain and texture help differentiate the information and support comparisons in the temporal domain. Background modelling algorithms exploit temporal and spatial cues to manipulate the data at pixel or frame level. Pixel intensity, correlation with neighboring pixels and edge information are commonly integrated across the temporal and spatial domains to classify a pixel as background or foreground. This classification can be affected by various factors, which may arise in the temporal domain, the spatial domain or both due to irregular behavior. Parametric background modelling techniques must update their information in the temporal domain, which enhances the efficiency of the system; the results of temporal filtering are estimated wrongly if illumination changes are not addressed. Computer vision techniques are used extensively in many indoor and outdoor systems. Compared to indoor settings, outdoor vision under sunlight and varying climate is more challenging for extracting information through image and video processing. One difficult task is handling poor lighting or illumination conditions, which do not remain constant on the image plane: illumination changes gradually with the movement of the sun's position, or suddenly with cloud movement, fog or rain. Such illumination changes are one example of a factor that affects the temporal domain. The camera capturing the image plane can be installed at a specific height.
The camera height determines how much area is covered for traffic surveillance. If the camera is fixed on a pole or an overhead bridge, wind can shake the pole, or heavy traffic on the bridge can make it vibrate. In both cases camera jitter occurs, which disturbs the spatial-domain information. Camera jitter mainly affects background modelling techniques because these methods rely on updating parameters at each pixel location; vibrations cause the parameters for a particular location to be updated wrongly between two frames. The appearance of shadows along with moving objects can muddle the information of both the temporal and spatial domains, making it difficult to classify pixels as background or foreground in shaded regions. A robust vision-based surveillance system can detect and track moving or stopped vehicles under outdoor illumination changes, camera jitter and shadow appearance (Fig. 1).
2 Algorithm Description
This paper presents a pixel-based Tempo-Spatial Compactness Based Background Subtraction (TSCBS) algorithm that is robust to gradual and sudden illumination changes. The major contributions of this paper are: (1) a new approach to background subtraction using the temporal domain with spatial constraints; (2) a new way of determining and normalizing camera jitter; (3) a simple geometric-relations-based shadow extraction approach for vehicles.
3 Background Modeling
TSCBS is a pixel-based approach that processes each candidate pixel and classifies it as a background or foreground pixel. Two pre-processing steps are performed before the TSCBS operation; they are discussed in Sect. 4. The 0–255 gray-scale range is adequate for bit mapping in our approach. A 9 × 9 matrix centered at the candidate pixel, with adjacent pixels positioned according to their locations, is used for this purpose, as shown in (1). The current frame (\( Curr\;\text{Im} \)) and the background image (\( Back\;\text{Im} \)) are used in the TSCBS operation. The matrix from the current image is named the Current Candidate Matrix (CCM), and the matrix from the background image the Background Candidate Matrix (BCM).
All pixels classified as background are updated in the background image, so this image serves as the background model. CCM is multiplied by a two-dimensional Gaussian function (standard deviation = 1.2, size = 9), giving GCCM; the same process applied to BCM gives GBCM. The purpose of this Gaussian multiplication is to weight the positions in the matrix so that the center position carries more weight than the adjacent positions. The following steps make our approach robust against gradual and sudden illumination changes. When an illumination change occurs, pixel intensity values increase or decrease, so relying on intensity alone in the temporal domain does not lead to efficient results. The relationship between adjacent pixels remains (approximately) invariant, so the following rule is applied to deal with illumination changes: a derivative is applied to GCCM according to (2), giving DGCM, and the same process applied to GBCM gives DGBM (3). The limited scale (0–255) of RGB, the subsequent conversion to gray scale, and other external factors cause minor changes in the assumed invariant spatial relationship of adjacent pixels between two consecutive frames.
To address this issue, the gray scale is further reduced to the 1–51 range (255/5), so if the spatial relationship between two adjacent pixels varies within five intensity values in the current frame compared to the previous frame due to the aforementioned factors, the reduced scale assigns the same value to both pixels. DGCM is then divided by DGBM, giving DGM, as shown in (4).
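The steps of (2)–(4) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the exact Gaussian normalisation, quantisation order and derivative operator are not fully specified in the text, so a horizontal first difference stands in for the derivative here.

```python
import numpy as np

def dgm(ccm, bcm, alpha=0.1, sigma=1.2):
    """Sketch of the DGM computation of (2)-(4) for one candidate pixel.
    ccm, bcm: 9x9 current and background candidate matrices (floats)."""
    size = ccm.shape[0]
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))  # 2-D Gaussian weights
    gccm = ccm * g                                  # weighted current (GCCM)
    gbcm = bcm * g                                  # weighted background (GBCM)
    # reduce the gray scale to the 1-51 range (255/5); quantisation details assumed
    gccm = gccm // 5 + 1
    gbcm = gbcm // 5 + 1
    # first difference between adjacent pixels as the "derivative" (assumption)
    dgcm = np.diff(gccm, axis=1)
    dgbm = np.diff(gbcm, axis=1)
    # alpha avoids the indeterminate form when both differences are zero
    return (dgcm + alpha) / (dgbm + alpha)
```

When the current and background patches agree, every ratio in DGM is exactly 1, which is the signature of a background pixel in the next step.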
A constant \( \alpha = 0.1 \) is used to avoid the indeterminate form. Values in DGM between 0.98 and 1.02 are reassigned to 1 and the remaining values to 0, according to (5–6). This band allows a little variation of a background pixel between two consecutive frames.
If 76 (95 %) of the 81 (9 × 9 matrix) values in DGM are 1, the candidate pixel is considered a background pixel; otherwise it is a foreground pixel, according to (7–8). In the classified image, a background pixel is represented by 0 and a foreground pixel by 1.
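The decision rules (5)–(8) reduce to a band test plus a 95 % vote, which can be sketched directly (the band limits and fraction come from the text; the generic array interface is an assumption):

```python
import numpy as np

def classify_pixel(dgm_values, low=0.98, high=1.02, frac=0.95):
    """Rules (5)-(8): DGM values inside [low, high] become 1, the rest 0;
    the candidate pixel is background when at least `frac` of them are 1.
    Returns 0 for background, 1 for foreground (classified-image convention)."""
    binary = np.where((dgm_values >= low) & (dgm_values <= high), 1, 0)
    is_background = binary.sum() >= frac * binary.size
    return 0 if is_background else 1
```

For example, an all-ones DGM (unchanged patch) classifies as background (0), while a patch whose ratios all fall outside the band classifies as foreground (1).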
This process is repeated for every pixel. A stopped vehicle does not become part of the background model because this approach does not use learning parameters that absorb a static foreground region into the background after a specific time; a stopped vehicle therefore keeps appearing as a foreground region in subsequent frames.
4 Pre-processing and Camera Jitter
A 2D normalized Gaussian (9) with standard deviation 4 and size 9 is applied as a pre-processing step to remove high-frequency noise (10).
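The pre-smoothing of (9)–(10) can be sketched with a standard Gaussian filter; the truncation below approximates the 9 × 9, σ = 4 kernel described in the text rather than reproducing it exactly.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def denoise(frame, sigma=4.0, size=9):
    """Pre-processing step (9)-(10): Gaussian smoothing to suppress
    high-frequency noise. `truncate` limits the kernel radius to size//2,
    approximating a 9x9 window."""
    radius = size // 2
    return gaussian_filter(frame.astype(float), sigma=sigma,
                           truncate=radius / sigma)
```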
This approach is based on the tempo-spatial relation, so camera jitter between two consecutive frames disturbs the features used in the temporal and spatial domains. For a video captured at 25 frames/second, this jitter between two frames remains very small, so the disturbance of the spatial relation between adjacent pixels can be measured. Spatially, camera jitter shifts pixels to new locations.
These new pixel positions are not far away, because the impact of camera jitter between two consecutive frames is minor. To cope with this, each position of the CCM is used in turn as the candidate pixel, and a new CCM is created for it to perform the TSCBS operation. There are therefore 81 TSCBS operations, and the position that gives the maximum result in DGM is selected as the candidate pixel; the result of the equation is updated in the background image. In this way the best relation among nearby locations is found to deal with camera jitter. Figure 2 further elaborates this procedure.
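The 81-position search can be sketched as below. The matching score here (fraction of near-unity intensity ratios) stands in for the full TSCBS operation, and the band limits reuse those of (5)–(6); both are simplifying assumptions.

```python
import numpy as np

def best_offset(curr, back, pos, half=4):
    """Jitter compensation sketch: try every position in the 9x9
    neighbourhood of `pos` as the candidate pixel and keep the one whose
    current patch best matches the background patch at `pos`."""
    r0, c0 = pos
    bp = back[r0 - half:r0 + half + 1, c0 - half:c0 + half + 1]
    best, best_score = pos, -1.0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            r, c = r0 + dr, c0 + dc
            cp = curr[r - half:r + half + 1, c - half:c + half + 1]
            ratio = (cp + 0.1) / (bp + 0.1)          # alpha-guarded ratio
            score = np.mean((ratio > 0.98) & (ratio < 1.02))
            if score > best_score:                    # maximum DGM-like result
                best, best_score = (r, c), score
    return best
```

Shifting a frame by one pixel and running the search recovers the shifted location, which is exactly the behaviour needed to re-align the spatial relation after jitter.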
5 Foreground Segmentation
The classified image of (8) marks background pixels as 0 and foreground pixels as 1. In this section, all connected regions are segmented using Matlab built-in functions. A morphological operation is applied to fill holes in the segmented regions. The minimum number of pixels that constitutes a vehicle is learnt during an initial period; if a segmented foreground region has fewer pixels than this, it is considered a ghost foreground and discarded, and the background image is updated in the discarded region. Figure 3 describes the process of foreground segmentation. A true foreground region after vehicle tracking is shown as a rectangular box in the final detected image (Fig. 4).
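The paper uses Matlab built-in functions for this stage; an equivalent sketch with SciPy is shown below. The minimum-pixel threshold is learnt during an initial period in the paper, so the value 50 here is only a placeholder.

```python
import numpy as np
from scipy.ndimage import label, binary_fill_holes

def segment_foreground(classified, min_pixels=50):
    """Fill holes in the classified image, label connected regions, and
    drop 'ghost' regions smaller than min_pixels (placeholder threshold)."""
    filled = binary_fill_holes(classified.astype(bool))
    labels, n = label(filled)
    keep = np.zeros_like(labels, dtype=bool)
    for i in range(1, n + 1):
        region = labels == i
        if region.sum() >= min_pixels:   # keep only plausible vehicles
            keep |= region
    return keep.astype(np.uint8)
```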
6 Vehicle Tracking
Every foreground region is expanded by 60 pixels in length and 40 pixels in width, and the expanded area is assigned the value 1 in a tracking image. This accounts for possible vehicle movement in any direction between two consecutive frames. All entrances of the image plane are also assigned the value 1 so that new vehicles can enter the scene, as shown in Fig. 5. The classified image of (8) is multiplied by this tracking image, according to (11). In this way ghost foregrounds are removed and each vehicle is tracked in its predicted region. The main purposes of this process are to eliminate ghost regions and to search for objects only in the surroundings of regions detected as vehicles in the previous frame.
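The tracking image can be sketched as a dilation of the previous foreground mask plus open borders. Whether the 60/40-pixel expansion is applied per side or in total is not stated, so the symmetric expansion below is an assumption.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def tracking_mask(prev_regions, dy=60, dx=40):
    """Tracking image of Sect. 6: expand each previous foreground region
    by dy rows and dx columns in every direction, then mark all image
    borders as entrances so new vehicles can appear."""
    mask = binary_dilation(prev_regions.astype(bool),
                           structure=np.ones((2 * dy + 1, 2 * dx + 1)))
    mask = mask.astype(np.uint8)
    mask[0, :] = mask[-1, :] = 1   # top/bottom entrances
    mask[:, 0] = mask[:, -1] = 1   # left/right entrances
    return mask
```

Multiplying the next classified image by this mask suppresses detections that are neither near a previously tracked vehicle nor at an entrance, which is how ghost regions are eliminated.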
7 Shadow Extraction
A freeway video is used to apply and evaluate the shadow removal technique. The following procedure is applied to each detected foreground region to determine and extract shadow. Shadows that appear vertically along the side windows of a vehicle are extracted using the lane markings of the road. Three conditions are used to detect shadow at the front or back of the vehicle; we assume that only one of these two sides is under shadow. If the front of the vehicle is shadow free, the maximum intensity value in the first five rows of the evaluated foreground region, named Threshold1, is used for comparison, as shown in Fig. 6. Conversely, if the back of the vehicle is shadow free, the maximum intensity value in the last five rows of the region is used as Threshold1.
Foreground pixels are checked from bottom to top in case 1 (12) and from top to bottom in case 2 (13). In a detected region containing a vehicle and a possible shadow, the shadow does not occupy more than 50 % of the area, so the searching region is limited to half by the factor 0.5 in (12–13). The following two possibilities are used to determine the associated shadow:
- Possibility 1: an intensity value equal to or higher than Threshold1 in Shadow_SearchingRegion
- Possibility 2: a change of intensity value in Shadow_SearchingRegion
If either possibility is satisfied, the point where it occurs is used as the cut-off point, and all the searched area is discarded from the foreground region. In this step the gray scale is reduced to the 1–17 range (255/15) so that only major changes in intensity values, which lead to the aforementioned possibilities, are detected. At this stage our approach cannot separate occluded vehicles and considers them one object (Fig. 7).
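The shadow search of (12)–(13) can be sketched as a row scan. The per-row aggregation (row maximum against Threshold1, row mean for the quantised change test) is an assumption made for illustration, since the exact pixel-level formulation is in the equations.

```python
import numpy as np

def shadow_cutoff(region, from_bottom=True, levels=15):
    """Find the cut-off row of (12)-(13). Threshold1 is the maximum
    intensity in the five shadow-free rows; rows are scanned over at most
    half the region (case 1: bottom-up, case 2: top-down) until a pixel
    reaches Threshold1 or the quantised (255/15) intensity changes."""
    rows = region.shape[0]
    if from_bottom:
        thr = region[:5].max()                 # front side shadow free
        scan = range(rows - 1, rows // 2, -1)
    else:
        thr = region[-5:].max()                # back side shadow free
        scan = range(0, rows // 2)
    prev = None
    for r in scan:
        q = int(region[r].mean()) // levels    # reduced gray scale
        if region[r].max() >= thr or (prev is not None and q != prev):
            return r                           # cut-off row
        prev = q
    return None                                # no shadow boundary found
```

On a toy region whose bottom rows are uniformly dark (shadow) and whose upper rows are bright (vehicle), the scan stops at the first bright row, and everything below it is discarded as shadow.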
8 Experiment Results
This paper presents a new background subtraction technique that classifies each pixel as background or foreground and then updates the background model. The decision is based on how strong the compactness of a candidate pixel with its adjacent pixels is in the temporal and spatial domains. The spatial relationship of pixels makes it possible to handle both gradual and sudden illumination changes without depending on pixel intensity. The algorithm also handles camera jitter by selecting the optimum result. A geometry-based technique extracts the shadow from the foreground region using features of the vehicle. The algorithm was developed on a 1440 × 1080 freeway video, but its scope is general, so it was applied to various sizes as shown in Table 1. The ChangeDetection.net (CDNet 2014) benchmark dataset [20] is used to compare our method, in terms of F-Measure [21], with several MoG-based background subtraction algorithms. The F-Measure is the harmonic mean of recall and precision: precision is the fraction of detected pixels that belong to the object according to the ground truth, while recall relates to the fraction of object pixels missed. Detected results are compared with the ground truth provided in [20], and the F-Measure is computed using the BMC Wizard software [22]. The compared methods are MoG (S&G), Stauffer and Grimson's MoG algorithm [23]; MoG (ZZ), Zoran Zivkovic's MoG algorithm [24]; MoG (K&B), KaewTraKulPong and Bowden's MoG algorithm [25]; and regularized RMoG [21]. The following five datasets from [20] are selected to evaluate the performance of our algorithm.
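The F-Measure used for the comparison is the harmonic mean of precision and recall, computed from true positives, false positives and false negatives:

```python
def f_measure(tp, fp, fn):
    """F-Measure from pixel counts: precision = tp / (tp + fp),
    recall = tp / (tp + fn), F = harmonic mean of the two."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For instance, a detector with equal counts of true positives, false positives and false negatives scores F = 0.5, while a perfect detection scores F = 1.0.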
We believe our approach shows good potential for traffic surveillance applications. Numerous challenges that can affect true segmentation of vehicles are resolved in the suggested approach. In particular, the system requires no pre-defined threshold or learning time, which makes it fast, reliable and robust for all kinds of traffic roads.
References
Kanhere, N.K., Pundlik, S.J., Birchfield, S.T.: Vehicle segmentation and tracking from a low-angle off-axis camera. In: Proceeding IEEE Conference Computer Vision Pattern Recognition, vol. 2, pp. 1152–1157 (2005)
Pang, C.C.C., Lam, W.W.L., Yung, N.H.C.: A method for vehicle count in the presence of multiple-vehicle occlusions in traffic images. IEEE Trans. Intell. Transp. Syst. 8(3), 441–459 (2007)
Premaratne, P., Ajaz, S., Premaratne, M.: Hand gesture tracking and recognition system for control of consumer electronics. In: Huang, D.-S., Gan, Y., Gupta, P., Gromiha, M. (eds.) ICIC 2011. LNCS (LNAI), vol. 6839, pp. 588–593. Springer, Heidelberg (2012)
Premaratne, P., Nguyen, Q., Premaratne, M.: Human computer interaction using hand gestures. In: Huang, D.-S., McGinnity, M., Heutte, L., Zhang, X.-P. (eds.) ICIC 2010. CCIS, vol. 93, pp. 381–386. Springer, Heidelberg (2010)
Premaratne, P., Safaei, F., Nguyen, Q.: Moment invariant based control system using hand gestures. In: Huang, D.-S., Li, K., Irwin, G.W. (eds.) ICIC 2006. LNCIS, vol. 345, pp. 322–333. Springer, Heidelberg (2006)
Premaratne, P., Premaratne, M.: Image Matching using Moment Invariants. Neurocomputing 137, 65–70 (2014)
Premaratne, P., Ajaz, S., Premaratne, M.: Hand gesture tracking and recognition system using Lucas-Kanade algorithm for control of consumer electronics. Neurocomputing 116(20), 242–249 (2013)
Premaratne, P., Nguyen, Q.: Consumer electronics control system based on hand gesture moment invariants. IET Comput. Vis. 1, 35–41 (2007)
Yang, S., Premaratne, P., Vial, P.: Hand gesture recognition: an overview. In: 5th IEEE International Conference on Broadband Network and Multimedia Technology (2013)
Zou, Z., Premaratne, P., Premaratne, M., Monaragala, R., Bandara, N.: Dynamic hand gesture recognition system using moment invariants. In: ICIAfS, IEEE Computational Intelligence Society, Colombo, Sri Lanka, pp. 108–113 (2010)
Herath, D.C., Kroos, C., Stevens, C.J., Cavedon, L., Premaratne, P.: Thinking head: towards human centred robotics. In: 2010 11th International Conference on Control, Automation, Robotics and Vision (ICARCV), Singapore, pp. 2042–2047 (2010)
Minge, E.: Evaluation of Non-intrusive Technologies for Traffic Detection. Minnesota Department of Transportation, Office of Policy Analysis, Research and Innovation, SRF Consulting Group, US Department of Transportation, Federal Highway Administration (2010)
Morris, B., Trivedi, M.: Robust classification and tracking of vehicles in traffic video streams. In: IEEE Conference Intelligent Transportation Systems, pp. 1078–1083 (2006)
Kastrinaki, V., Zervakis, M., Kalaitzakis, K.: A survey of video processing techniques for traffic applications. Image Vis. Comput. 21, 359–381 (2003)
Buch, N., Velastin, S., Orwell, J.: A review of computer vision techniques for the analysis of urban traffic. IEEE Trans. Intell. Transp. Syst. 12, 920–939 (2011)
Mandellos, N.A., Keramitsoglou, I., Kiranoudis, C.T.: A background subtraction algorithm for detecting and tracking vehicle. Expert Syst. Appl. 38, 1619–1631 (2011)
Lima Azevedo, C., Cardoso, J., Ben-Akiva, M., Costeira, J.P., Marques, M.: Automatic vehicle trajectory extraction by aerial remote sensing. Presented at the 16th Euro Working Group Transportation, Procedia Soc. Behav. Sci., Porto, Portugal (2013)
Sánchez, A., Nunes, E., Conci, A.: Using adaptive background subtraction into a multilevel model for traffic surveillance. Integr. Comput. Aided Eng. 19(3), 239–256 (2012)
Unzueta, L., et al.: Adaptive multicue background subtraction for robust vehicle counting and classification. IEEE Trans. Intell. Transp. Syst. 13(2), 527–540 (2012)
Wang, Y., Jodoin, P.-M., Porikli, F., Konrad, J., Benezeth, Y., Ishwar, P.: Change Detection 2014 Benchmark (2014). http://wordpress-jodoin.dmi.usherb.ca/results2014/
Varadarajan, S., Wang, H., Miller, P., Zhou, H.: Fast convergence of regularised region-based mixture of Gaussians for dynamic background modelling. Comput. Vis. Image Underst. (CVIU) 136, 45–58 (2015)
Stauffer, C., Grimson, W.E.L.: Adaptive background mixture models for real-time tracking. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2246–2252 (1999)
Zivkovic, Z.: Improved adaptive gaussian mixture model for background subtraction. In: ICPR 2004, Proceedings of the 17th International Conference on Pattern Recognition, vol. 2, pp. 28–31 (2004)
KaewTraKulPong, P., Bowden, R.: An improved adaptive background mixture model for real-time tracking with shadow detection. In: Remagnino, P., Jones, G.A., Paragios, N., Regazzoni, C.S. (eds.) Advanced Video Based Surveillance Systems, pp. 135–144. Springer, New York (2001)
© 2016 Springer International Publishing Switzerland
Iftikhar, Z., Premaratne, P., Vial, P., Yang, S. (2016). Tempo-Spatial Compactness Based Background Subtraction for Vehicle Detection and Tracking. In: Huang, DS., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2016. Lecture Notes in Computer Science(), vol 9771. Springer, Cham. https://doi.org/10.1007/978-3-319-42291-6_9