Abstract
In this chapter, motion analysis algorithms for the perception of the dynamics of the scene are described, in analogy with the vision systems of different living beings. With motion analysis algorithms it is possible to recover the 3D motion, almost in real time, from the analysis of sequences of time-varying 2D images. Paradigms of motion analysis have shown that the perception of movement derives from information about the objects, evaluating the presence of occlusions, texture, contours, etc. The algorithms described address the perception of the motion occurring in the real physical world, and not the apparent motion. Different methods of motion analysis are presented, from those with a limited computational load, such as those based on the difference of time-varying images, to more complex ones based on optical flow, considering contexts with different amounts of motion and scene environments of varying complexity. Algorithms to reconstruct the 3D structure of the scene (and the motion) are also described, i.e., to compute the coordinates of the 3D points of the scene whose 2D projections are known in each image of the time-varying sequence.
Notes
- 1.
Panel used in the Middle Ages by barbers. A red ribbon is wound helically around a white cylinder. The cylinder rotates continuously around its vertical axis, so all the points of the cylindrical surface move horizontally. Instead, this rotation produces the illusion that the red ribbon moves vertically upwards. The motion is ambiguous because corresponding points in motion cannot be found in the temporal analysis, as shown in the figure. In 1935 the psychologist Hans Wallach discovered that the illusion diminishes if the cylinder is shorter and wider, in which case the perceived motion is correctly lateral. The illusion is also resolved if the ribbon is textured.
- 2.
We recall that we also have the phenomenon of spatial aliasing, already described in Sect. 5.10 Vol. I. According to the Shannon–Nyquist theorem, a sampled continuous function (in the time or space domain) can be completely reconstructed if (a) the sampling frequency is equal to or greater than twice the frequency of the maximum spectral component of the input signal (this minimum sampling frequency is also called the Nyquist rate) and (b) the spectrum replicas are removed in the Fourier domain, leaving only the original spectrum. This latter removal is the anti-aliasing process, which corrects the signal by eliminating spurious space–time components.
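As a minimal numerical sketch of the sampling condition (the function and values below are illustrative, not taken from the text), consider a sinusoid of frequency \(f_{sig}\) sampled at rate \(f_s\): when \(f_s \ge 2f_{sig}\) the observed frequency is the true one, otherwise it is aliased to a lower frequency.

```python
def apparent_frequency(f_sig, fs):
    """Frequency observed after sampling a sinusoid of frequency f_sig at rate fs."""
    k = round(f_sig / fs)          # index of the nearest spectral replica
    return abs(f_sig - k * fs)

f_sig = 70.0                                 # Hz
print(apparent_frequency(f_sig, fs=200.0))   # 70.0 -> fs >= 2*f_sig, no aliasing
print(apparent_frequency(f_sig, fs=100.0))   # 30.0 -> undersampled, aliased copy
```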
- 3.
In fact, considering the smoothness term of (6.18), the Euler–Lagrange equation with respect to u contains the term
$$\begin{aligned} \begin{aligned} \frac{\partial }{\partial x}\frac{\partial (u_x^2+v_x^2+u_y^2+v_y^2)}{\partial u_x}+\frac{\partial }{\partial y}\frac{\partial (u_x^2+v_x^2+u_y^2+v_y^2)}{\partial u_y}&=2\frac{\partial }{\partial x}\frac{\partial u(x,y)}{\partial x}+2 \frac{\partial }{\partial y}\frac{\partial u(x,y)}{\partial y}\\&=2\Biggl (\frac{\partial ^2 u }{\partial x^2}+\frac{\partial ^2 u }{\partial y^2}\Biggr )=2(u_{xx}+u_{yy})=2\nabla ^2 u \end{aligned} \end{aligned}$$which corresponds to the second-order differential operator defined as the divergence of the gradient of the function u(x, y) in a Euclidean space. This operator is known as the Laplace operator or simply the Laplacian. The Laplacian of the function v(x, y) is derived similarly.
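In numerical implementations the Laplacian of the flow components is approximated with finite differences; a minimal sketch (illustrative, not the book's code) using the standard 5-point stencil is:

```python
import numpy as np
from scipy.ndimage import convolve

# 5-point finite-difference approximation of u_xx + u_yy, commonly used
# when minimizing the smoothness term of Horn–Schunck-type functionals.
LAPLACIAN_KERNEL = np.array([[0.0,  1.0, 0.0],
                             [1.0, -4.0, 1.0],
                             [0.0,  1.0, 0.0]])

def laplacian(u):
    """Discrete Laplacian of a flow component u(x, y) given as a 2D array."""
    return convolve(u, LAPLACIAN_KERNEL, mode="nearest")
```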
- 4.
The matrix \((A^TA)\) is known in the literature as the structure tensor of the image relative to a pixel p. The term derives from the concept of tensor, which generically indicates a linear algebraic structure able to describe mathematically a physical phenomenon that is invariant with respect to the adopted reference system. In this case, it concerns the analysis of the local motion associated with the pixels of the window W centered in p.
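A minimal sketch (window size and function names are illustrative assumptions) of how the entries of \(A^TA\) can be accumulated over the window W from the image gradients:

```python
import numpy as np
from scipy.ndimage import sobel, uniform_filter

def structure_tensor(img, window=5):
    """Per-pixel entries of A^T A computed over a window x window neighbourhood."""
    Ix = sobel(img, axis=1)              # horizontal image gradient
    Iy = sobel(img, axis=0)              # vertical image gradient
    # Local averages of Ix^2, Ix*Iy, Iy^2 (proportional to the sums over W).
    Jxx = uniform_filter(Ix * Ix, window)
    Jxy = uniform_filter(Ix * Iy, window)
    Jyy = uniform_filter(Iy * Iy, window)
    return Jxx, Jxy, Jyy
```

The eigenvalues of the resulting 2×2 matrix at each pixel indicate whether the local motion estimate is well conditioned (both large), affected by the aperture problem (one large, one small), or unreliable (both small).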
- 5.
This represents the generalization of iterative methods. In general, we want to solve a nonlinear equation \( f (x) = 0 \) by reducing it to the problem of finding a fixed point of a function \( y = g (x) \), that is, we want to find a solution \( \alpha \) such that \(f(\alpha )=0\Longleftrightarrow \alpha =g(\alpha )\). The iteration function has the form \(x^{(k+1)}= g(x^{(k)})\), which iteratively produces a sequence of values of x for each \(k\ge 0\), starting from an assigned initial value \( x ^ {(0)} \). Not all iteration functions g(x) guarantee convergence to the fixed point. It can be shown that, if g(x) is continuous and the sequence \( x ^ {(k)} \) converges, then it converges to a fixed point \( \alpha \), that is, \(\alpha =g(\alpha )\), which is also a solution of \( f (x) = 0 \).
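A minimal sketch of the scheme (the example equation is illustrative): solving \(f(x)=x-\cos x=0\) by rewriting it as \(x=g(x)=\cos x\), which converges from a reasonable starting point.

```python
import math

def fixed_point(g, x0, tol=1e-10, max_iter=100):
    """Iterate x_{k+1} = g(x_k) until successive values are within tol."""
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

alpha = fixed_point(math.cos, x0=1.0)
print(alpha)   # ~0.739085, the fixed point of cos(x), i.e., the root of x - cos(x)
```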
- 6.
A function f(x) with real values, defined on an interval, is called convex if the segment joining any two points of its graph lies above (or on) the graph itself. Convexity greatly simplifies the analysis and solution of an optimization problem. It can be shown that a convex function, defined on a convex set, either has no solution or has only global solutions, and cannot have exclusively local solutions.
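In symbols, the convexity condition for f defined on an interval I can be written as
$$f\bigl (\lambda x_1+(1-\lambda ) x_2\bigr )\le \lambda f(x_1)+(1-\lambda ) f(x_2)\qquad \forall \, x_1,x_2\in I,\;\; \forall \,\lambda \in [0,1]$$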
- 7.
A Goal–NoGoal event, according to FIFA regulations, occurs when the ball entirely crosses the ideal vertical plane parallel to the goal, passing through the inner edge of the horizontal white line that separates the playing field from the inner area of the goal itself.
- 8.
Let us specify better how the uncertainty of a stochastic variable is evaluated, which we know to be its variance, in order to motivate Eq. (6.110). In this case, we are initially interested in evaluating the uncertainty of the state vector prediction \(\varvec{\hat{x}}_{t-1}\), which, being multidimensional, is given by its covariance matrix \(\varvec{P}_{t-1}=Cov(\varvec{\hat{x}}_{t-1})\). Similarly, the uncertainty of the next value of the prediction vector \(\varvec{\hat{x}}_{t}\) at time t, after the transformation \( \mathbf {F} _ {t} \) obtained with (6.109), is given by
$$\begin{aligned} \varvec{P}_{t}=Cov(\varvec{\hat{x}}_{t})=Cov(\varvec{F}_t\varvec{\hat{x}}_{t-1})=\mathbf {F}_tCov(\varvec{\hat{x}}_{t-1})\mathbf {F}_t^T=\mathbf {F}_{t}\mathbf {P}_{t-1}\mathbf {F}_t^T \end{aligned}$$(6.111) Essentially, the linear transformation of the values \(\varvec{\hat{x}}_{t-1}\) (which have covariance matrix \(\varvec{P}_{t-1}\)) with the prediction matrix \( \varvec{F} _t \) modifies the covariance matrix of the vectors of the next state \(\varvec{\hat{x}}_{t}\) according to Eq. (6.114), where \( \varvec{ P} _ {t} \) represents the output covariance matrix of the linear prediction transformation, assuming a Gaussian state variable. In the literature, (6.114) represents the error propagation law for a linear transformation of a random variable whose mean and covariance are known, without necessarily knowing its exact probability distribution. Its proof is easily obtained by recalling the definition and properties of the expected value \( \varvec{\mu } _x \) and covariance \( \varvec{\Sigma } _x \) of a random variable \( \mathbf {x} \). In fact, considering a generic linear transformation \( \varvec{y} = \varvec{A}\mathbf {x} + \varvec{b} \), the new expected value and covariance matrix are derived from the original distribution as follows:
$$\begin{aligned} \varvec{\mu }_{y}=E[\varvec{y}]=E[\varvec{Ax}+\varvec{b}]=\varvec{A}E[\mathbf {x}]+\varvec{b}=\varvec{A\mu }_x+\varvec{b} \end{aligned}$$$$\begin{aligned} \begin{aligned} \varvec{\Sigma }_{y}&= E[(\varvec{y}-E[\varvec{y}])(\varvec{y}-E[\varvec{y}])^T ]\\&=E[(\varvec{Ax}+\varvec{b}-\varvec{A}E[\mathbf {x}]-\varvec{b}) (\varvec{Ax}+\varvec{b}-\varvec{A}E[\mathbf {x}]-\varvec{b}) ^T ]\\&=E[(\varvec{A}(\mathbf {x}-E[\mathbf {x}]))(\varvec{A}(\mathbf {x}-E[\mathbf {x}])) ^T ]\\&=E[\varvec{A}(\mathbf {x}-E[\mathbf {x}])(\mathbf {x}-E[\mathbf {x}]) ^T\varvec{A}^T ]\\&=\varvec{A}E[(\mathbf {x}-E[\mathbf {x}])(\mathbf {x}-E[\mathbf {x}]) ^T]\varvec{A}^T \\&=\varvec{A}\varvec{\Sigma }_x\varvec{A}^T \end{aligned} \end{aligned}$$In the context of the Kalman filter, error propagation also occurs in the propagation of the measurement prediction uncertainty (Eq. 6.105) from the previous state \(\mathbf {x}_{t|t-1}\), whose uncertainty is given by the covariance matrix \(\mathbf {P}_{t|t-1}\).
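A minimal numerical sketch (the state model and values are illustrative assumptions) of the prediction-step propagation \(\varvec{P}_{t}=\varvec{F}_{t}\varvec{P}_{t-1}\varvec{F}_{t}^T\):

```python
import numpy as np

# Constant-velocity state [position, velocity] with time step dt.
dt = 1.0
F = np.array([[1.0, dt],
              [0.0, 1.0]])          # prediction (state transition) matrix
P_prev = np.diag([4.0, 1.0])        # covariance of the previous estimate

P_pred = F @ P_prev @ F.T           # propagated covariance
print(P_pred)
# [[5. 1.]
#  [1. 1.]]  -> position uncertainty grows because of the velocity uncertainty
```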
- 9.
In the aeronautical context, the attitude of an aircraft (integral with the axes (X, Y, Z)) in 3D space is indicated by the angles of rotation around these axes, referred to, respectively, as the lateral, vertical, and longitudinal axes. The longitudinal rotation, around the Z-axis, indicates the roll; the lateral one, around the X-axis, indicates the pitch; and the vertical one, around the Y-axis, indicates the yaw. In the robotic context (for example, in the case of an autonomous vehicle), the attitude of the camera can be defined with 3 degrees of freedom, with the axes (X, Y, Z) indicating the lateral direction (side-to-side), the vertical direction (up-down), and the camera direction (looking). The rotations around the lateral, up-down, and looking axes retain the same meaning as for the aircraft.
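As an illustrative sketch (the composition order and sign conventions below are assumptions, not prescribed by the text), an attitude matrix can be composed from the three elementary rotations:

```python
import numpy as np

def rot_x(a):   # pitch: rotation about the lateral axis X
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):   # yaw: rotation about the vertical axis Y
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(a):   # roll: rotation about the longitudinal axis Z
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def attitude(yaw, pitch, roll):
    # One common convention: apply roll, then pitch, then yaw.
    return rot_y(yaw) @ rot_x(pitch) @ rot_z(roll)
```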
- 10.
The physical position of a point projected in the image plane, normally expressed in a metric unit (for example, mm), must be transformed into units of the image sensor, expressed in pixels, which typically do not correspond to a metric unit such as mm. The physical image plane is discretized by the pixels of the sensor, characterized by its horizontal and vertical spatial resolution expressed in pixels/mm. Therefore, the transformation of coordinates from mm to pixels introduces a horizontal scale factor \(p_x=\frac{npix_x}{dimx}\) and a vertical one \(p_y=\frac{npix_y}{dimy}\) by which the physical image coordinates (x, y) are multiplied. In particular, \(npix_x\times npix_y\) represents the horizontal and vertical resolution of the sensor in pixels, while \(dimx\times dimy\) indicates the horizontal and vertical dimensions of the sensor in mm. Often, to define the pixel rectangles, the aspect ratio is given as the ratio of the pixel width to its height (usually expressed in decimal form, for example, 1.25, or in fractional form such as 5/4 to avoid the problem of periodic decimal approximation). Furthermore, the coordinates must be translated with respect to the position of the principal point expressed in pixels, since the center of the sensor's pixel grid does not always coincide with the position of the principal point. The pixel coordinate system of the sensor is indicated with (u, v), with the principal point \( (u_0, v_0) \) given in pixels and with axes assumed parallel to those of the physical system (x, y). The accuracy of the pixel coordinates with respect to the physical ones depends on the resolution of the sensor and its dimensions.
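A minimal sketch of the conversion (the sensor values below are illustrative assumptions):

```python
# Sensor characteristics (assumed values for illustration).
npix_x, npix_y = 1920, 1080      # resolution in pixels
dimx, dimy = 7.68, 4.32          # physical size in mm
u0, v0 = 960.0, 540.0            # principal point in pixels

px = npix_x / dimx               # horizontal scale factor, pixels/mm
py = npix_y / dimy               # vertical scale factor, pixels/mm

def mm_to_pixel(x, y):
    """Map physical image coordinates (mm) to sensor coordinates (u, v) in pixels."""
    return u0 + px * x, v0 + py * y

print(mm_to_pixel(1.0, -0.5))    # (1210.0, 415.0)
```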
- 11.
The roto-translation transformation expressed by (6.209) indicates that the rotation \( \varvec{R} \) is performed first and then the translation \( \varvec{T} \). It is often reported with the operations inverted, that is, the translation first and the rotation afterwards, thus obtaining
$$\varvec{X}=\varvec{R}(\varvec{X_w}-\varvec{T})=\varvec{R}\varvec{X_w}+(-\varvec{R}\varvec{T})$$and in this case, in Eq. (6.210), the translation term \( \varvec{T} \) is replaced by \(-\varvec{R}\varvec{T}\).
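A quick numerical check of the equivalence of the two forms (the rotation angle, translation, and point below are illustrative):

```python
import numpy as np

theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
T = np.array([1.0, 2.0, 3.0])
Xw = np.array([4.0, 5.0, 6.0])

X1 = R @ (Xw - T)            # translate first, then rotate
X2 = R @ Xw + (-R @ T)       # rotate first, with translation term -R T
print(np.allclose(X1, X2))   # True
```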
- 12.
The distance between the projection center and the image plane is assumed to be infinite, with focal length \(f\rightarrow \infty \) and parallel projection lines.
- 13.
Recall that the rows of \( \varvec{R} \) represent the coordinates in the original space of the unit vectors along the coordinate axes of the rotated space, while the columns of \( \varvec{R} \) represent the coordinates in the rotated space of unit vectors along the axes of the original space.
- 14.
Recall here the properties of the rotation matrix \( \varvec{R} \): it is normalized, that is, the sum of the squares of the elements in each row or column is equal to 1, and it is orthogonal, i.e., the inner product of any pair of distinct rows or any pair of distinct columns is 0.
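A quick numerical verification of these properties (the example rotation is illustrative):

```python
import numpy as np

theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

print(np.allclose((R**2).sum(axis=1), 1.0))   # each row has unit norm
print(np.allclose((R**2).sum(axis=0), 1.0))   # each column has unit norm
print(np.allclose(R @ R.T, np.eye(3)))        # distinct rows/columns are orthogonal
```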
- 15.
The Grotta dei Cervi (Deer Cave) is located at Porto Badisco, near Otranto in Apulia, Italy, at a depth of 26 m below sea level. It is an important cave: discovered only in 1970, it is the most impressive Neolithic pictorial complex in Europe.
- 16.
Virtual Reality Modeling Language (VRML) is a description language that allows the simulation of three-dimensional virtual worlds. With VRML it is possible to describe virtual environments that include objects, light sources, images, sounds, and movies.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Distante, A., Distante, C. (2020). Motion Analysis. In: Handbook of Image Processing and Computer Vision. Springer, Cham. https://doi.org/10.1007/978-3-030-42378-0_6
DOI: https://doi.org/10.1007/978-3-030-42378-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-42377-3
Online ISBN: 978-3-030-42378-0