Abstract
In this paper, we develop a method to transform a sequence of images into a sequence of events. Optical flow, the vector field of pointwise motion computed from a monocular image sequence, describes pointwise motion in an environment. The method extracts the global smoothness and continuity of the motion fields and detects collapses of this smoothness in long-time image sequences using the transportation of temporal optical flow fields.
1 Introduction
Optical flow is fundamentally pointwise local motion on an imaging plane (retina) [1,2,3,4]. This pointwise motion is low-level information for the perception of global motion [5,6,7,8, 34, 35]. In this paper, we introduce a model for the extraction of cues for the perception of global motion from optical flow fields using the temporal transportation [9, 10] of the fields along the time axis.
Flow vectors locally extract point correspondences between a pair of successive images on the retina [1]. These local correspondences are applied to motion tracking because the temporal evolution of a correspondence describes the temporal trajectory of a point in a video stream of images.
In the motion-perception pathway of the human brain, independent components of the optical flow field on the retina [11, 36] are transmitted from the middle temporal area (MT) to the medial superior temporal area (MST) [11,12,13,14,15,16]. There, pointwise local motion is transformed into intermediate-level information for motion cognition. Flying insects also control motion using optical flow. Honey bees navigate using optical flow [17,18,19,20,21]. The compound eyes [18, 19, 38, 39] of insects perceive spherical optical flow fields [38, 39]. The divergence of the spherical optical flow field indicates the direction of flight in the global environment [19]. Disparities between the optical flow fields on the left and right hemispheres control the direction of flight in the local environment. Therefore, temporal optical flow fields generated on the spherical retina of an omnidirectional camera system provide cues for navigation [39]. These geometric properties of optical flow fields on the spherical retina are the basis of insect-inspired visual navigation. Geometrical processing of optical flow fields on the spherical retina yields syntactical information for robot navigation [37,38,39].
Autonomous vehicles navigate using images captured by a planar retina [40, 41]. We have developed an algorithm for the generation of motion semantics from optical flow fields generated on a planar retina, which is the common imaging process for non-compound-eye systems.
In a previous paper, we introduced a model for the extraction of cues for recognising global spatial motion from scene flow fields using the temporal transportation of the vector field [40]. As a comparative study with our previous results, we apply the same idea to the optical flow field on a planar retina. This comparative study implies that, for global motion perception, the optical flow fields, which are computed from a monocular image sequence, possess properties similar to those of the scene flow fields.
2 Metric for Optical Flow Fields
Setting \(\varvec{u}(\varvec{x})=(u(\varvec{x}),v(\varvec{x}))^\top \) for \(\varvec{x}=(x,y)^\top \in \mathbf{R}^2\) to be the optical flow field on two-dimensional Euclidean space, the directional histogram [22] of \(\varvec{u}(\varvec{x})\) is obtained by integration of the magnitude of \(\varvec{u}(\varvec{x})\) in the region of interest (ROI), that is,
where \(\varOmega (\varvec{x})\), \(|\varOmega (\varvec{x})|\) and \(\varvec{x}\in \mathbf{R}^2\) are the ROI, the area measure of the ROI and the reference point of the ROI, respectively.
The distance between two optical flow fields \(\varvec{u}(\varvec{x})\) and \(\varvec{v}(\varvec{x})\) in the region \(\varLambda \) is defined as
where
for \(c_{\varvec{x}}(\theta ,\theta ')\ge 0\), using the transportation [9] of the directional histograms [22] of the fields.
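Before turning to the discrete formulation, the following Python sketch illustrates how such a magnitude-weighted cyclic directional histogram can be accumulated from a discrete flow field; the discretisation details (bin count, normalisation guard) are assumptions of ours, not the authors' implementation.

```python
import numpy as np

def directional_histogram(u, v, n_bins=16):
    """Cyclic directional histogram of a discrete flow field (u, v):
    each flow vector votes into the bin of its direction, weighted
    by its magnitude, and the histogram is normalised over the ROI."""
    theta = np.arctan2(v, u) % (2.0 * np.pi)          # direction in [0, 2*pi)
    mag = np.hypot(u, v)                              # pointwise magnitude
    bins = (theta / (2.0 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / max(hist.sum(), 1e-12)              # guard against zero flow
```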
For the discrete optical flow field \(\varvec{u}_{mn}=(u_{mn},v_{mn})^\top \) at the point \((m,n)^\top \) on the discrete plane \(\mathbf{Z}^2\), let \(\{f_{mn}(p)\}_{p=0}^{N-1}\) be the cyclic directional histogram for the directions \(\varvec{\omega }_{p}=(\cos 2\pi \frac{p}{N}, \sin 2\pi \frac{p}{N})^\top \). For the discrete cyclic histograms \(F_{mn}=\{f_{mn}(i)\}_{i=0}^{N-1}\) and \(G_{mn}=\{g_{mn}(i)\}_{i=0}^{N-1}\), such that \(f_{mn}(i+N)=f_{mn}(i)\) and \(g_{mn}(i+N)=g_{mn}(i)\), we define the transportation between the histograms as
Setting \(A_{ij}^{mn}(k)=|f_{mn}(i)-g_{mn}(j-k)|^2\), the minimisation of \(J_{mn}(k)\)
with the constraints of Eq. (5) is solved by linear programming for each \(k=0,1,\cdots , N-1\). Then, we define the metric between discrete vector fields \(\varvec{u}_{mn}\) and \(\varvec{v}_{mn}\) in the ROI \(\varLambda \) on the two-dimensional discrete plane \(\mathbf{Z}^2\) as
Figure 1 shows the process of transportation for a pair of circular histograms P and Q. (a) and (b) show the two probability distributions on a circle and their samples. The top row of (c) shows the residual values after the maximal flows have moved from each bin of P to the bins of Q. The bottom row of (c) shows the flows moved from P to Q as the maximum flow between the histograms.
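When the ground cost is the arc length on the circle, the cyclic transportation distance also admits a closed form due to Rabin et al. [10]: the minimum over an offset \(\alpha \) of the \(l_1\) distance between the cumulative histograms, attained at the median. The sketch below uses this closed form as a light-weight stand-in for the linear-programming formulation above; the pointwise distances would then be aggregated over the ROI \(\varLambda \) to obtain the metric between two fields.

```python
import numpy as np

def circular_transport(f, g):
    """Transportation distance between two cyclic histograms with
    arc-length ground cost, via the closed form of Rabin et al. [10]:
    min over alpha of sum |F_i - G_i - alpha|, attained at the median."""
    f = np.asarray(f, float) / np.sum(f)
    g = np.asarray(g, float) / np.sum(g)
    diff = np.cumsum(f - g)        # difference of cumulative distributions
    alpha = np.median(diff)        # optimal cyclic shift of mass
    return float(np.abs(diff - alpha).sum())
```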
3 Symbolisation of Global Motion
The temporal trajectory \(H(t;f)\) of the distance between a successive pair of optical flow fields \(\varvec{u}(\varvec{x},t+1)\) and \(\varvec{u}(\varvec{x},t)\) of the spatiotemporal image \(f(\varvec{x},t)\) is computed with the metric of Sect. 2.
Setting \(H_t(t;f)\) and \(H_{tt}(t;f)\) to be the first and second derivatives, respectively, of the trajectory \(H(t;f)\), we define the intervals \(I_i=[t_i,t_{i+1}]\) along the time axis t from pairs of successive points at which \(H_{tt}(t;f)=0\). Using the \(l_1\) linear approximation of \(H(t;f)\) such that \(\bar{H}(t;f)=a_it+b_i\) on \(I_i\), which minimises the criterion \(\sum _{j}\left| H(t_{i(j)};f)-(a_it_{i(j)}+b_i)\right| \), where \(t_{i(j)}\in I_i\), we allocate signs for spatial motion.
From the sign of \(a_i\), we define the symbols of motion of \(f(\varvec{x},t)\) in the interval \(I_i=[t_i,t_{i+1}]\) as \(\{\nearrow , \rightarrow , \searrow \}\), corresponding to \(a_i>0\), \(a_i=0\) and \(a_i<0\), respectively.
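A minimal sketch of this symbolisation, assuming the interval breakpoints have already been extracted from the zero-crossings of \(H_{tt}\), and using a least-squares slope as a simple stand-in for the \(l_1\) fit; the dead-band eps deciding when a slope counts as flat is also an assumption.

```python
import numpy as np

def symbolise(H, breakpoints, eps=1e-3):
    """Map each interval [t_i, t_{i+1}] of the trajectory H to a symbol in
    {up, flat, down} from the sign of the fitted slope a_i."""
    symbols = []
    for t0, t1 in zip(breakpoints[:-1], breakpoints[1:]):
        t = np.arange(t0, t1 + 1)
        a = np.polyfit(t, H[t0:t1 + 1], 1)[0]   # slope of the line fit
        symbols.append('up' if a > eps else 'down' if a < -eps else 'flat')
    return symbols
```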
4 Numerical Examples
For the numerical experiments, three image sequences from the left images of the KITTI Scene Flow Dataset 2015 [44] are selected.
For event extraction using Eq. (11), we employ S(H(t; f)) and \(S(\log H(t;f))\), since \(S(\log H(t;f))\) allows us to detect symbols from small perturbations of H(t; f).
Table 1 lists the statuses of the image sequences. Figure 2 shows the temporal trajectories of the transportation of the vector fields. Table 2 shows the event strings extracted by linear approximation using Eq. (10) and symbolisation using Eq. (11). These experiments show that the algorithm extracts symbol strings that describe the scene in front of a driving car in various environments.
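Putting the pieces together, the following is a hedged end-to-end sketch combining the earlier fragments. Here flow_between is a hypothetical helper returning the flow field between two frames, breakpoints is the interval list from the inflection analysis of Sect. 3, and the single global-ROI histogram per frame pair is a simplification of the pointwise aggregation over the ROI.

```python
import numpy as np

def event_string(frames, flow_between, breakpoints):
    """Distance trajectory H(t; f) between successive flow fields,
    followed by symbolisation of log H (cf. Eqs. (10) and (11))."""
    H = []
    for t in range(len(frames) - 2):
        u0, v0 = flow_between(frames[t], frames[t + 1])
        u1, v1 = flow_between(frames[t + 1], frames[t + 2])
        h0 = directional_histogram(u0, v0)
        h1 = directional_histogram(u1, v1)
        H.append(circular_transport(h0, h1))
    return symbolise(np.log(np.asarray(H) + 1e-12), breakpoints)
```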
5 Dictionary Generation
Tables 4 and 3 show the status and speed of objects in the synthetic video sequences. Figure 3 shows top views of the geometric configurations of objects in the synthetic video sequences. Table 5 shows the combinations of events, as symbol strings, captured by a vehicle-mounted camera in a synthetic world. In Figs. 4, 5 and 6, (a) and (b) show a frame of the image sequence and its optical flow field, respectively. In Figs. 4, 5 and 6 (c), \(\log H(t;f)\) and \(\bar{H}(t;f)\) are shown as the blue curve and the red polygonal curve, respectively.
Tables 6 and 7 show the strings \(S(\log H(t;f)) \) and S(H(t; f)) detected by the algorithm using \(\log H(t;f) \) and H(t; f), respectively. Since both \(\log H(t;f) \) and H(t; f) are approximated by polygonal curves for the extraction of symbol strings, events are described by using \(\vee \), \(\wedge \) and M based on the semi-local shapes of the curves.
Five pairs, 1 and 2, 5 and 7, 6 and 8, 13 and 14, and 15 and 16, provide the same environments with and without oncoming vehicles. These examples show that the pairs 1 and 2, 5 and 7, 6 and 8, and 13 and 14 possess the same properties of their symbol strings. The pairs 1 and 2, and 13 and 14, imply that the temporal transportation of optical flow fields achieves recognition of oncoming vehicles. The algorithm also detects acceleration and deceleration of the ego-vehicle.
The results observed for the pair 7 and 8 show that additional information is required to detect the direction of turning, since the optical flow fields for left and right turns possess the same statistical properties.
The difference between the results observed for the pair 15 and 16 depends on background properties caused by trees, since correspondences between a pair of natural scenes contain ambiguities. Moreover, the pointwise optical flow vectors are required for the detection of oncoming vehicles.
The algorithm does not distinguish left and right turns, since the temporal trajectories of the distance between the two fields possess the same shape profiles. However, it is possible to detect the starting frame of a turn, since the symbol \(\wedge \) is detected at that frame. For the detection of straight motion from real sequences, both symbol strings S(H(t; f)) and \(S(\log H(t;f))\) are necessary, since in real sequences of straight motion temporally local perturbations of the optical flow vectors are detected. These local perturbations produce perturbations in H(t; f) and \(\log H(t;f)\).
6 Discussions
For a function \(f(\varvec{x},t)\) defined in \(\mathbf{R}^n\), the total derivative with respect to the variable t is \(\frac{df}{dt}=\nabla f^\top \frac{d\varvec{x}}{dt}+\frac{\partial f}{\partial t}\). Mathematically, optical flow is the solution of the linear equation \(\frac{df}{dt}=0\). This ill-posed linear equation is solved by regularisation [3, 23] or by using local geometric constraints [1, 33].
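In practice, any regularised dense solver can supply the input flow fields; the paper does not prescribe a specific implementation. One possible choice is OpenCV's Farneback method (the file names below are placeholders):

```python
import cv2

# Dense optical flow between two consecutive grey-level frames with
# Farneback's polynomial-expansion method (one practical regularised solver).
prev = cv2.imread('frame_t.png', cv2.IMREAD_GRAYSCALE)
curr = cv2.imread('frame_t1.png', cv2.IMREAD_GRAYSCALE)
flow = cv2.calcOpticalFlowFarneback(prev, curr, None, pyr_scale=0.5,
                                    levels=3, winsize=15, iterations=3,
                                    poly_n=5, poly_sigma=1.2, flags=0)
u, v = flow[..., 0], flow[..., 1]   # pointwise flow components
```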
In medical volumetric-image analysis, for instance the motion analysis of moving organs, we are required to deal with volumetric images defined in three-dimensional Euclidean space \(\mathbf{R}^3\). In computer vision, optical flow is usually computed from planar images.
For motion analysis with range data, setting f(x, y, t) to be a grey-level image, we deal with the system of equations \(\frac{df}{dt}=0\) and \(\frac{dg}{dt}=0\), where \(g(x,y)=h(x,y,t)-z\) for the depth z of the temporal range image h(x, y, t) [24].
For colour and multi-channel images, the system of equations \(\nabla f_k^\top \varvec{u}+\partial _tf_k=0\) for \(k=1,2,\dots ,K\) is derived from the K-channel images [25, 26].
For the left image \(f(x_l,y_l,t)\) and the right image \(g(x_r,y_r,t)\) of temporal stereo-pair images, the system of equations \(\frac{df}{dt}=0\) and \(\frac{dg}{dt}=0\) yields the optical flow vectors \(\varvec{u}_l=(u_l,v_l)^\top \) and \(\varvec{u}_r=(u_r,v_r)^\top \) on the left and right images, respectively. After establishing correspondences between \(\varvec{x}_l\) and \(\varvec{x}_r\) and between \(\varvec{x}_l+\varvec{u}_l\) and \(\varvec{x}_r+\varvec{u}_r\), the stereo reconstruction algorithm computes the scene flow \(\dot{\varvec{X}}\) in space using disparities between the temporal stereo-pair images. The estimation of correspondences is established by solving a system of stereo-matching equations for the displacements \(\varvec{d}=(d,0)^\top \) and \(\varvec{d}'=(d'_1,d'_2)^\top \).
For images on a manifold \(\mathcal {M}\), the optical flow vector field is the solution of the equation \(\nabla _{\mathcal {M}}f^\top \varvec{u}+\partial _tf=0\), where \(\nabla _{\mathcal {M}}\) is the gradient operation on the manifold. For example, if \(\mathcal {M}\) is the unit sphere \(S^2\) in three-dimensional Euclidean space \(\mathbf{R}^3\), the gradient operation is \(\nabla _{S^2}=\varvec{e}_\theta \frac{\partial }{\partial \theta }+\varvec{e}_\phi \frac{1}{\sin \theta }\frac{\partial }{\partial \phi }\) in spherical coordinates \((\theta ,\phi )\).
Equation (18) allows us to compute the optical flow vectors on a spherical retina, which is a mathematical model of compound eyes.
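A sketch of this spherical gradient for an image stored on an equirectangular grid follows; the grid layout and cell-centred sampling are assumptions of ours.

```python
import numpy as np

def spherical_gradient(f):
    """Gradient on the unit sphere S^2 for an image f[theta, phi] on an
    equirectangular grid (theta: polar angle, phi: azimuth); returns the
    e_theta and e_phi components."""
    n_theta, n_phi = f.shape
    d_theta = np.pi / n_theta
    d_phi = 2.0 * np.pi / n_phi
    theta = (np.arange(n_theta) + 0.5) * d_theta       # cell centres, off-pole
    f_theta = np.gradient(f, d_theta, axis=0)
    f_phi = np.gradient(f, d_phi, axis=1) / np.sin(theta)[:, None]
    return f_theta, f_phi
```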
In this paper, we have shown a method to extract intermediate cues for motion perception from optical flow fields on the plane [34,35,36, 41]. It is possible to apply the event extraction method based on the transportation of optical flow fields to scene flow [40] and to the optical flow field on a non-planar retina [38]. In reference [38], we have shown a method to extract intermediate cues for motion perception from optical flow fields on a sphere.
Moreover, we have developed a method to decompose the optical flow fields [27, 28] on the surfaces of moving organs [42] by employing three-dimensional optical flow computation.
The optical flow fields between pairs of successive images in a sequence provide cues for image alignment. Aligning images along the time axis achieves the tracking of points in a video sequence [2]; therefore, tracking is a sequential alignment. Multiple alignment in space by deformation fields derives the deformation-based average of images.
For a collection of images \(\{f_i(\varvec{x})\}_{i=1}^m\), setting \(\varvec{u}_i(\varvec{x})\) to be the deformation fields, the minimiser f of the energy functional
with appropriate constraints derives the deformation-based average of the collection of images \(\{f_i(\varvec{x})\}_{i=1}^m\) [29, 43]. The deformation-based average was applied to the motion analysis of a volumetric beating-heart sequence.
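The functional itself is not reproduced above; one common form of such a deformation-based averaging energy (an assumption on our part, not necessarily the exact functional of [29, 43]) is

\[
E\left( f;\{\varvec{u}_i\}_{i=1}^m\right) =\sum _{i=1}^{m}\int \left| f(\varvec{x})-f_i(\varvec{x}+\varvec{u}_i(\varvec{x}))\right| ^2\,d\varvec{x}+\lambda \sum _{i=1}^{m}\int \Vert \nabla \varvec{u}_i(\varvec{x})\Vert ^2\,d\varvec{x},
\]

where the first term matches the average f to each deformed input and the second term enforces smooth deformation fields.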
The directional gradient of an image \(f(\varvec{x})\) at the point \(\varvec{x} = (x,y)^{\top }\) in the direction \(\varvec{\omega } = (\cos \theta , \sin \theta )^{\top }\) is computed as \(\varvec{\omega }^\top \nabla f\). The directional gradient evaluates the steepness, smoothness and flatness of \(f(\varvec{x})\) along the direction of the vector \(\varvec{\omega }\). Setting F to be an injective mapping of the gradient, the gradient-based feature constructed by F satisfies the relations \(F(\nabla f)=0\) if \(f=0\) and \(F(\nabla f)=F(\nabla g)\) if \(f=g+a\) for a constant a.
The census transform is computed by
where u is the Heaviside function. The directional histogram (DH) is computed by
such that \(h_{\varvec{x}}(\theta +2\pi ) = h_{\varvec{x}}(\theta )\), where \(\varvec{x}\in \mathbf{R}^2\) is the centre of the region \(\varOmega (\varvec{x})\). The vector \(\varvec{x}\) is used as the index of the DH. We call \(h_{\varvec{x}}(\theta )\) the HoG signature of f.
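A sketch of the census transform over a 3 \(\times \) 3 window follows; the window size is an assumption, and the Heaviside comparison is applied to each neighbour against the centre pixel.

```python
import numpy as np

def census_transform(f):
    """3x3 census transform: an 8-bit code per pixel that records, via the
    Heaviside comparison, whether each neighbour is brighter than the
    centre pixel."""
    h, w = f.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    centre = f[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]
    for bit, (dy, dx) in enumerate(shifts):
        neigh = f[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= ((neigh > centre).astype(np.uint8) << bit)
    return out
```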
The census transform encodes local geometric properties of the gradient vector field as a scalar function. The HoG signature encodes semi-global geometric properties of the gradient vector field as a scalar function. These encoded features are used for image matching and motion detection [30]. Our transform in Eq. (1) encodes the global geometric properties of motions on the retina as a scalar function using the optical flow vector field. Then, using this encoded motion vector field, we define a metric between a pair of motion fields for the extraction of events in video streams.
Since \(\varvec{v}=\frac{-f_t}{|\nabla f|^2}\nabla f\) is a solution of \(\frac{df}{dt}=0\), the optical flow vector is expressed as \(\varvec{u}=\frac{-f_t}{|\nabla f|^2}\nabla f+\alpha \nabla f^\perp \) for an appropriate scalar \(\alpha \), where \(\nabla f^\top (\nabla f^\perp )=0\). If the motion perpendicular to the gradient of the edges of the segments is small, that is, if \(\alpha \) is small, then \(\varvec{u}\sim \mu \nabla f\) for an appropriate real number \(\mu \). This relation between the optical flow field and the gradient field implies that the events in an image stream detected by the features encoded by Eq. (1) are those caused by temporal fluctuations of the gradient of the foreground.
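A sketch of this closed-form normal-flow component follows; the forward temporal difference and the eps regularisation of the denominator are assumptions of ours.

```python
import numpy as np

def normal_flow(f0, f1, eps=1e-6):
    """Normal flow v = -f_t * grad(f) / |grad(f)|^2, the solution of
    df/dt = 0 along the image gradient direction."""
    fy, fx = np.gradient(f0.astype(float))    # spatial gradient of frame t
    ft = f1.astype(float) - f0.astype(float)  # forward temporal derivative
    denom = fx ** 2 + fy ** 2 + eps           # |grad f|^2, regularised
    return -ft * fx / denom, -ft * fy / denom
```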
In ref. [32], an on-line algorithm for the detection of a polygonal curve from the time signal of a string of conversation dialogues was proposed, based on the randomised Hough transform. This algorithm is a pre-processing step for the construction of the syntactic trees of conversation dialogues. Event detection from video sequences is an extension of the syntactic analysis of dialogue signals to image sequences.
In pedestrian detection, annotated data for designing classifiers are generated using an artificially generated virtual world [31]. It is possible to extend this idea to event detection from images observed by a vehicle-mounted camera system. We generated symbol sequences from events in a virtual world. The events detected from the generated symbol strings coincide with the events detected from real-world test sequences.
7 Conclusions
We proposed a method for the symbolisation of the temporal transition of environments using statistical analysis of the flow field. The algorithm allows us to interpret a sequence of images as a string of events.
A machine can control a car to avoid incidents by detecting abnormalities using event strings stored in a dictionary. The symbolisation of temporal optical flow fields is suitable for the generation of entries in such a dictionary.
We have introduced a framework for the syntactical interpretation of dynamic scenes using the temporal transportation of optical flow fields. Our future work is to derive the semantics of motion fields from strings of symbols. Multiscale image analysis of dynamic scenes provides hierarchies of the motions [32] in the scenes, from temporally local deformations to global fluctuations. Therefore, these hierarchies of motions would define the syntactic structure and semantic meaning of a dynamic scene. The optical flow fields are important cues for the linguistic analysis of dynamic scenes.
References
Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: Proceedings of IJCAI 1981, pp. 674–679 (1981)
Tomasi, C., Kanade, T.: Detection and tracking of point features. Int. J. Comput. Vis. 9, 137–154 (1991)
Horn, B.K.P., Schunck, B.G.: Determining optical flow. Artif. Intell. 17, 185–203 (1981)
Hwang, S.-H., Lee, U.-K.: A hierarchical optical flow estimation algorithm based on the interlevel motion smoothness constraint. Pattern Recogn. 26, 939–952 (1993)
Vaina, L.M., Beardsley, S.A., Rushton, S.K. (eds.): Optic Flow and Beyond. SL, vol. 324. Springer, Dordrecht (2004). https://doi.org/10.1007/978-1-4020-2092-6
Duffy, C.J.: Optic flow analysis for self-movement perception. Int. Rev. Neurobiol. 44, 199–218 (2000)
Lappe, M., Bremmer, F., van den Berg, A.V.: Perception of self-motion from visual flow. Trends Cogn. Sci. 3, 329–336 (1999)
Calow, D., Krüger, N., Wörgötter, F., Lappe, M.: Statistics of optic flow for self-motion through natural scenes. In: Ilg, U., Bülthoff, H.H., Mallot, A.H., et al. (eds.) Dynamic Perception, pp. 133–138. IOS Press (2004)
Villani, C.: Optimal Transport. Old and New. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
Rabin, J., Delon, J., Gousseau, Y.: Transportation distances on the circle. JMIV 41, 147–167 (2011)
Sabatini, S.P.: A physicalist approach to first-order analysis of optic flow fields in extrastriate cortical areas. In: ICANN 1999 (1999)
Park, K.-Y., Jabri, M., Lee, S.-Y., Sejnowski, T.J.: Independent components of optical flows have MSTd-like receptive fields. In: Proceedings of the 2nd International Workshop on ICA and Blind Signal Separation, pp. 597–601 (2000)
Wurtz, R.: Optic flow: a brain region devoted to optic flow analysis? Curr. Biol. 8, R554–R556 (1998)
Greenlee, M.: Human cortical areas underlying the perception of optic flow: brain imaging studies. Int. Rev. Neurobiol. 44, 269–292 (2000)
Andersen, R.A.: Neural mechanisms of visual motion perception in primates. Neuron 18, 865–872 (1997)
Newsome, W.T., Paré, E.B.: A selective impairment of motion perception following lesions of the middle temporal visual area (MT). J. Neurosci. 8, 2201–2211 (1988)
Pan, C., Deng, H., Yin, X.-F., Liu, J.-G.: An optical flow-based integrated navigation system inspired by insect vision. Biol. Cybern. 105, 239–252 (2011)
Franceschini, N.: Visual guidance based on optic flow: a biorobotic approach. J. Physiol. Paris 98, 281–292 (2004)
Srinivasan, M.V.: Honeybees as a model for the study of visually guided flight, navigation, and biologically inspired robotics. Physiol. Rev. 91, 413–460 (2011)
Serres, J.R., Ruffier, F.: Optic flow-based collision-free strategies: from insects to robots. Arthropod Struct. Dev. 46, 703–717 (2017)
Sobey, P.J.: Active navigation with a monocular robot. Biol. Cybern. 71, 433–440 (1994)
Fisher, N.I.: Statistical Analysis of Circular Data. Cambridge University Press, Cambridge (1993)
Weickert, J., Schnörr, C.: Variational optic flow computation with a spatio-temporal smoothness constraint. J. Math. Imaging Vis. 14, 245–255 (2001)
Spies, H., Jähne, B., Barron, J.L.: Range flow estimation. Comput. Vis. Image Underst. 85, 209–231 (2002)
Barron, J.L., Klette, R.: Quantitative color optical flow. In: Proceedings of ICPR 2002, vol. 4, pp. 251–255 (2002)
Golland, P., Bruckstein, A.M.: Motion from color. Comput. Vis. Image Underst. 68, 346–362 (1997)
Kirisits, C., Lang, L.F., Scherzer, O.: Decomposition of optical flow on the sphere. GEM Int. J. Geomath. 5, 117–141 (2014)
Lang, L.F., Scherzer, O.: Optical flow on evolving sphere-like surfaces. Inverse Probl. Imaging 11, 305–338 (2017)
Rumpf, M., Wirth, B.: Variational methods in shape analysis. In: Scherzer, O. (ed.) Handbook of Mathematical Methods in Imaging, pp. 1819–1858. Springer, New York (2015). https://doi.org/10.1007/978-1-4939-0790-8_56
Hafner, D., Demetz, O., Weickert, J.: Why is the census transform good for robust optic flow computation? In: Kuijper, A., Bredies, K., Pock, T., Bischof, H. (eds.) SSVM 2013. LNCS, vol. 7893, pp. 210–221. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38267-3_18
Vázquez, D., López, A.M., Marín, J., Ponsa, D., Gómez, D.: Virtual and real world adaptation for pedestrian detection. IEEE PAMI 36, 797–809 (2014)
Imiya, A.: Detection of piecewise-linear signals by the randomized Hough transform. Pattern Recogn. Lett. 17, 771–776 (1996)
Imiya, A., Iwawaki, K.: Voting method for the detection of subpixel flow field. Pattern Recogn. Lett. 24, 197–214 (2003)
Ohnishi, N., Imiya, A.: Featureless robot navigation using optical flow. Connect. Sci. 17, 23–46 (2005)
Ohnishi, N., Imiya, A.: Appearance-based navigation and homing for autonomous mobile robot. Image Vis. Comput. 31, 511–532 (2013)
Ohnishi, N., Imiya, A.: Independent component analysis of optical flow for robot navigation. Neurocomputing 71, 2140–2163 (2008)
Alibouch, B., Radgui, A., Rziza, M., Aboutajdine, D.: Optical flow estimation on omnidirectional images: an adapted phase based method. In: Elmoataz, A., Mammass, D., Lezoray, O., Nouboud, F., Aboutajdine, D. (eds.) ICISP 2012. LNCS, vol. 7340, pp. 468–475. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31254-0_53
Torii, A., Imiya, A., Sugaya, H., Mochizuki, Y.: Optical flow computation for compound eyes: variational analysis of omni-directional views. In: De Gregorio, M., Di Maio, V., Frucci, M., Musio, C. (eds.) BVAI 2005. LNCS, vol. 3704, pp. 527–536. Springer, Heidelberg (2005). https://doi.org/10.1007/11565123_51
Mochizuki, Y., Imiya, A.: Pyramid transform and scale-space analysis in image analysis. In: Dellaert, F., Frahm, J.-M., Pollefeys, M., Leal-Taixé, L., Rosenhahn, B. (eds.) Outdoor and Large-Scale Real-World Scene Analysis. LNCS, vol. 7474, pp. 78–109. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34091-8_4
Kato, T., Itoh, H., Imiya, A.: Motion language of stereo image sequence. In: CVPR Workshops, pp. 1211–1218 (2017)
Ohnishi, N., Mochizuki, Y., Imiya, A., Sakai, T.: On-line planar area segmentation from sequence of monocular monochrome images for visual navigation of autonomous robot. In: VISAPP 2010, pp. 435–442 (2010)
Kameda, Y., Imiya, A.: The William Harvey code: mathematical analysis of optical flow computation for cardiac motion. In: Rosenhahn, B., Klette, R., Metaxas, D.N. (eds.) Human Motion, Understanding, Modelling, Capture, and Animation, Computational Imaging and Vision, vol. 36, pp. 81–104. Springer, Dordrecht (2006). https://doi.org/10.1007/978-1-4020-6693-1_4
Inagaki, S., Itoh, H., Imiya, A.: Multiple alignment of spatiotemporal deformable objects for the average-organ computation. In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) ECCV 2014. LNCS, vol. 8928, pp. 353–366. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16220-1_25