Abstract
In this chapter, motion analysis algorithms for the perception of the dynamics of the scene are described, in analogy with the vision systems of different living beings. With motion analysis algorithms it is possible to recover the 3D motion, almost in real time, from the analysis of sequences of time-varying 2D images. Paradigms of motion analysis have shown that the perception of movement derives from information about the objects, evaluating the presence of occlusions, texture, contours, etc. The algorithms described address the perception of the motion occurring in the real physical world, and not the apparent motion. Different methods of motion analysis are presented, from those with a limited computational load, such as those based on the difference of time-varying images, to more complex ones based on optical flow, considering contexts with different amounts of motion and scene environments of varying complexity. Algorithms to reconstruct the 3D structure of the scene (and the motion) are also described, i.e., to compute the coordinates of the 3D points of the scene whose 2D projections are known in each image of the time-varying sequence.
Notes
- 1.
Panel used in the Middle Ages by barbers. A red ribbon is wound helically around a white cylinder. The cylinder rotates continuously around its vertical axis, so all the points of the cylindrical surface move horizontally. Instead, this rotation produces the illusion that the red ribbon moves vertically upwards. The motion is ambiguous because corresponding points in motion cannot be found in the temporal analysis, as shown in the figure. In 1935 the psychologist Hans Wallach discovered that the illusion diminishes if the cylinder is shorter and wider, in which case the perceived motion is correctly lateral. The illusion is also resolved if the ribbon is textured.
- 2.
We recall that we also have the phenomenon of spatial aliasing, already described in Sect. 5.10 Vol. I. According to the Shannon–Nyquist theorem, a sampled continuous function (in the time or space domain) can be completely reconstructed if (a) the sampling frequency is equal to or greater than twice the frequency of the maximum spectral component of the input signal (this minimum sampling frequency is also called the Nyquist rate) and (b) the spectrum replicas are removed in the Fourier domain, leaving only the original spectrum. This latter removal is the anti-aliasing process, which corrects the signal by eliminating spurious space–time components.
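As a minimal numerical sketch of the sampling condition (the function and values below are illustrative, not taken from the text), consider a sinusoid of frequency \(f_{sig}\) sampled at rate \(f_s\): when \(f_s \ge 2f_{sig}\) the observed frequency is the true one, otherwise it is aliased to a lower frequency.

```python
def apparent_frequency(f_sig, fs):
    """Frequency observed after sampling a sinusoid of frequency f_sig at rate fs."""
    k = round(f_sig / fs)          # index of the nearest spectral replica
    return abs(f_sig - k * fs)

f_sig = 70.0                                 # Hz
print(apparent_frequency(f_sig, fs=200.0))   # 70.0 -> fs >= 2*f_sig, no aliasing
print(apparent_frequency(f_sig, fs=100.0))   # 30.0 -> undersampled, aliased copy
```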
- 3.
In fact, considering the smoothness term of (6.18), the Euler–Lagrange equation with respect to u contains the term
$$\begin{aligned} \begin{aligned} \frac{\partial }{\partial x}\frac{\partial (u_x^2+v_x^2+u_y^2+v_y^2)}{\partial u_x}+\frac{\partial }{\partial y}\frac{\partial (u_x^2+v_x^2+u_y^2+v_y^2)}{\partial u_y}&=2\frac{\partial }{\partial x}\frac{\partial u(x,y)}{\partial x}+2 \frac{\partial }{\partial y}\frac{\partial u(x,y)}{\partial y}\\&=2\Biggl (\frac{\partial ^2 u }{\partial x^2}+\frac{\partial ^2 u }{\partial y^2}\Biggr )=2(u_{xx}+u_{yy})=2\nabla ^2 u \end{aligned} \end{aligned}$$which corresponds to the second-order differential operator defined as the divergence of the gradient of the function u(x, y) in a Euclidean space. This operator is known as the Laplace operator or simply the Laplacian. The Laplacian of the function v(x, y) is derived similarly.
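In numerical implementations the Laplacian of the flow components is approximated with finite differences; a minimal sketch (illustrative, not the book's code) using the standard 5-point stencil is:

```python
import numpy as np
from scipy.ndimage import convolve

# 5-point finite-difference approximation of u_xx + u_yy, commonly used
# when minimizing the smoothness term of Horn–Schunck-type functionals.
LAPLACIAN_KERNEL = np.array([[0.0,  1.0, 0.0],
                             [1.0, -4.0, 1.0],
                             [0.0,  1.0, 0.0]])

def laplacian(u):
    """Discrete Laplacian of a flow component u(x, y) given as a 2D array."""
    return convolve(u, LAPLACIAN_KERNEL, mode="nearest")
```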
- 4.
The matrix \((A^TA)\) is known in the literature as the structure tensor of the image relative to a pixel p. The term derives from the concept of tensor, which generically indicates a linear algebraic structure able to describe mathematically a physical phenomenon that is invariant with respect to the adopted reference system. In this case, it concerns the analysis of the local motion associated with the pixels of the window W centered in p.
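A minimal sketch (window size and function names are illustrative assumptions) of how the entries of \(A^TA\) can be accumulated over the window W from the image gradients:

```python
import numpy as np
from scipy.ndimage import sobel, uniform_filter

def structure_tensor(img, window=5):
    """Per-pixel entries of A^T A computed over a window x window neighbourhood."""
    Ix = sobel(img, axis=1)              # horizontal image gradient
    Iy = sobel(img, axis=0)              # vertical image gradient
    # Local averages of Ix^2, Ix*Iy, Iy^2 (proportional to the sums over W).
    Jxx = uniform_filter(Ix * Ix, window)
    Jxy = uniform_filter(Ix * Iy, window)
    Jyy = uniform_filter(Iy * Iy, window)
    return Jxx, Jxy, Jyy
```

The eigenvalues of the resulting 2×2 matrix at each pixel indicate whether the local motion estimate is well conditioned (both large), affected by the aperture problem (one large, one small), or unreliable (both small).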
- 5.
This represents the generalization of iterative methods. In general, we want to solve a nonlinear equation \( f (x) = 0 \) by reducing it to the problem of finding a fixed point of a function \( y = g (x) \), that is, we want to find a solution \( \alpha \) such that \(f(\alpha )=0\Longleftrightarrow \alpha =g(\alpha )\). The iteration function has the form \(x^{(k+1)}= g(x^{(k)})\), which iteratively produces a sequence of values of x for each \(k\ge 0\), starting from an assigned initial value \( x ^ {(0)} \). Not all iteration functions g(x) guarantee convergence to the fixed point. It can be shown that, if g(x) is continuous and the sequence \( x ^ {(k)} \) converges, then it converges to a fixed point \( \alpha \), that is, \(\alpha =g(\alpha )\), which is also a solution of \( f (x) = 0 \).
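A minimal sketch of the scheme (the example equation is illustrative): solving \(f(x)=x-\cos x=0\) by rewriting it as \(x=g(x)=\cos x\), which converges from a reasonable starting point.

```python
import math

def fixed_point(g, x0, tol=1e-10, max_iter=100):
    """Iterate x_{k+1} = g(x_k) until successive values are within tol."""
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

alpha = fixed_point(math.cos, x0=1.0)
print(alpha)   # ~0.739085, the fixed point of cos(x), i.e., the root of x - cos(x)
```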
- 6.
A function f(x) with real values, defined on an interval, is called convex if the segment joining any two points of its graph lies above (or on) the graph itself. Convexity greatly simplifies the analysis and solution of an optimization problem. It can be shown that a convex function, defined on a convex set, either has no solution or has only global solutions, and cannot have exclusively local solutions.
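In symbols, the convexity condition for f defined on an interval I can be written as
$$f\bigl (\lambda x_1+(1-\lambda ) x_2\bigr )\le \lambda f(x_1)+(1-\lambda ) f(x_2)\qquad \forall \, x_1,x_2\in I,\;\; \forall \,\lambda \in [0,1]$$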
- 7.
A Goal–NoGoal event, according to FIFA regulations, occurs when the ball entirely crosses the ideal vertical plane parallel to the goal, passing through the inner edge of the horizontal white line that separates the playing field from the inner area of the goal itself.
- 8.
Let us specify better how the uncertainty of a stochastic variable is evaluated, which we know to be its variance, in order to motivate Eq. (6.110). In this case, we are initially interested in evaluating the uncertainty of the state vector prediction \(\varvec{\hat{x}}_{t-1}\), which, being multidimensional, is given by its covariance matrix \(\varvec{P}_{t-1}=Cov(\varvec{\hat{x}}_{t-1})\). Similarly, the uncertainty of the next value of the prediction vector \(\varvec{\hat{x}}_{t}\) at time t, after the transformation \( \mathbf {F} _ {t} \) obtained with (6.109), is given by
$$\begin{aligned} \varvec{P}_{t}=Cov(\varvec{\hat{x}}_{t})=Cov(\varvec{F}_t\varvec{\hat{x}}_{t-1})=\mathbf {F}_tCov(\varvec{\hat{x}}_{t-1})\mathbf {F}_t^T=\mathbf {F}_{t}\mathbf {P}_{t-1}\mathbf {F}_t^T \end{aligned}$$(6.111) Essentially, the linear transformation of the values \(\varvec{\hat{x}}_{t-1}\) (which have covariance matrix \(\varvec{P}_{t-1}\)) with the prediction matrix \( \varvec{F} _t \) modifies the covariance matrix of the vectors of the next state \(\varvec{\hat{x}}_{t}\) according to Eq. (6.114), where \( \varvec{ P} _ {t} \) represents the output covariance matrix of the linear prediction transformation, assuming a Gaussian state variable. In the literature, (6.114) represents the error propagation law for a linear transformation of a random variable whose mean and covariance are known, without necessarily knowing its exact probability distribution. Its proof is easily obtained by recalling the definition and properties of the expected value \( \varvec{\mu } _x \) and covariance \( \varvec{\Sigma } _x \) of a random variable \( \mathbf {x} \). In fact, considering a generic linear transformation \( \varvec{y} = \varvec{A}\mathbf {x} + \varvec{b} \), the new expected value and covariance matrix are derived from the original distribution as follows:
$$\begin{aligned} \varvec{\mu }_{y}=E[\varvec{y}]=E[\varvec{Ax}+\varvec{b}]=\varvec{A}E[\mathbf {x}]+\varvec{b}=\varvec{A\mu }_x+\varvec{b} \end{aligned}$$$$\begin{aligned} \begin{aligned} \varvec{\Sigma }_{y}&= E[(\varvec{y}-E[\varvec{y}])(\varvec{y}-E[\varvec{y}])^T ]\\&=E[(\varvec{Ax}+\varvec{b}-\varvec{A}E[\mathbf {x}]-\varvec{b}) (\varvec{Ax}+\varvec{b}-\varvec{A}E[\mathbf {x}]-\varvec{b}) ^T ]\\&=E[(\varvec{A}(\mathbf {x}-E[\mathbf {x}]))(\varvec{A}(\mathbf {x}-E[\mathbf {x}])) ^T ]\\&=E[\varvec{A}(\mathbf {x}-E[\mathbf {x}])(\mathbf {x}-E[\mathbf {x}]) ^T\varvec{A}^T ]\\&=\varvec{A}E[(\mathbf {x}-E[\mathbf {x}])(\mathbf {x}-E[\mathbf {x}]) ^T]\varvec{A}^T \\&=\varvec{A}\varvec{\Sigma }_x\varvec{A}^T \end{aligned} \end{aligned}$$In the context of the Kalman filter, error propagation also occurs in the propagation of the measurement prediction uncertainty (Eq. 6.105) from the previous state \(\mathbf {x}_{t|t-1}\), whose uncertainty is given by the covariance matrix \(\mathbf {P}_{t|t-1}\).
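A minimal numerical sketch (the state model and values are illustrative assumptions) of the prediction-step propagation \(\varvec{P}_{t}=\varvec{F}_{t}\varvec{P}_{t-1}\varvec{F}_{t}^T\):

```python
import numpy as np

# Constant-velocity state [position, velocity] with time step dt.
dt = 1.0
F = np.array([[1.0, dt],
              [0.0, 1.0]])          # prediction (state transition) matrix
P_prev = np.diag([4.0, 1.0])        # covariance of the previous estimate

P_pred = F @ P_prev @ F.T           # propagated covariance
print(P_pred)
# [[5. 1.]
#  [1. 1.]]  -> position uncertainty grows because of the velocity uncertainty
```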
- 9.
In the aeronautical context, the attitude of an aircraft (integral with the axes (X, Y, Z)) in 3D space is indicated by the angles of rotation around these axes, referred to, respectively, as the lateral, vertical, and longitudinal axes. The longitudinal rotation, around the Z-axis, indicates the roll; the lateral one, around the X-axis, indicates the pitch; and the vertical one, around the Y-axis, indicates the yaw. In the robotic context (for example, in the case of an autonomous vehicle), the attitude of the camera can be defined with 3 degrees of freedom, with the axes (X, Y, Z) indicating the lateral direction (side-to-side), the vertical direction (up-down), and the camera direction (looking). The rotations around the lateral, up-down, and looking axes retain the same meaning as for the aircraft.
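As an illustrative sketch (the composition order and sign conventions below are assumptions, not prescribed by the text), an attitude matrix can be composed from the three elementary rotations:

```python
import numpy as np

def rot_x(a):   # pitch: rotation about the lateral axis X
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):   # yaw: rotation about the vertical axis Y
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(a):   # roll: rotation about the longitudinal axis Z
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def attitude(yaw, pitch, roll):
    # One common convention: apply roll, then pitch, then yaw.
    return rot_y(yaw) @ rot_x(pitch) @ rot_z(roll)
```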
- 10.
The physical position of a point projected in the image plane, normally expressed in a metric unit (for example, mm), must be transformed into units of the image sensor, expressed in pixels, which typically do not correspond to a metric unit such as mm. The physical image plane is discretized by the pixels of the sensor, characterized by its horizontal and vertical spatial resolution expressed in pixels/mm. Therefore, the transformation of coordinates from mm to pixels introduces a horizontal scale factor \(p_x=\frac{npix_x}{dimx}\) and a vertical one \(p_y=\frac{npix_y}{dimy}\) by which the physical image coordinates (x, y) are multiplied. In particular, \(npix_x\times npix_y\) represents the horizontal and vertical resolution of the sensor in pixels, while \(dimx\times dimy\) indicates the horizontal and vertical dimensions of the sensor in mm. Often, to define the pixel rectangles, the aspect ratio is given as the ratio of the pixel width to its height (usually expressed in decimal form, for example, 1.25, or in fractional form such as 5/4 to avoid the problem of periodic decimal approximation). Furthermore, the coordinates must be translated with respect to the position of the principal point expressed in pixels, since the center of the sensor's pixel grid does not always coincide with the position of the principal point. The pixel coordinate system of the sensor is indicated with (u, v), with the principal point \( (u_0, v_0) \) given in pixels and with axes assumed parallel to those of the physical system (x, y). The accuracy of the pixel coordinates with respect to the physical ones depends on the resolution of the sensor and its dimensions.
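A minimal sketch of the conversion (the sensor values below are illustrative assumptions):

```python
# Sensor characteristics (assumed values for illustration).
npix_x, npix_y = 1920, 1080      # resolution in pixels
dimx, dimy = 7.68, 4.32          # physical size in mm
u0, v0 = 960.0, 540.0            # principal point in pixels

px = npix_x / dimx               # horizontal scale factor, pixels/mm
py = npix_y / dimy               # vertical scale factor, pixels/mm

def mm_to_pixel(x, y):
    """Map physical image coordinates (mm) to sensor coordinates (u, v) in pixels."""
    return u0 + px * x, v0 + py * y

print(mm_to_pixel(1.0, -0.5))    # (1210.0, 415.0)
```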
- 11.
The roto-translation transformation expressed by (6.209) indicates that the rotation \( \varvec{R} \) is performed first and then the translation \( \varvec{T} \). It is often reported with the operations inverted, that is, the translation first and the rotation afterwards, thus obtaining
$$\varvec{X}=\varvec{R}(\varvec{X_w}-\varvec{T})=\varvec{R}\varvec{X_w}+(-\varvec{R}\varvec{T})$$and in this case, in Eq. (6.210), the translation term \( \varvec{T} \) is replaced by \(-\varvec{R}\varvec{T}\).
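A quick numerical check of the equivalence of the two forms (the rotation angle, translation, and point below are illustrative):

```python
import numpy as np

theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
T = np.array([1.0, 2.0, 3.0])
Xw = np.array([4.0, 5.0, 6.0])

X1 = R @ (Xw - T)            # translate first, then rotate
X2 = R @ Xw + (-R @ T)       # rotate first, with translation term -R T
print(np.allclose(X1, X2))   # True
```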
- 12.
The distance between the projection center and the image plane is assumed to be infinite, with focal length \(f\rightarrow \infty \) and parallel projection lines.
- 13.
Recall that the rows of \( \varvec{R} \) represent the coordinates in the original space of the unit vectors along the coordinate axes of the rotated space, while the columns of \( \varvec{R} \) represent the coordinates in the rotated space of unit vectors along the axes of the original space.
- 14.
Recall here the properties of the rotation matrix \( \varvec{R} \): it is normalized, that is, the sum of the squares of the elements in each row or column is equal to 1, and it is orthogonal, i.e., the inner product of any pair of distinct rows or any pair of distinct columns is 0.
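A quick numerical verification of these properties (the example rotation is illustrative):

```python
import numpy as np

theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

print(np.allclose((R**2).sum(axis=1), 1.0))   # each row has unit norm
print(np.allclose((R**2).sum(axis=0), 1.0))   # each column has unit norm
print(np.allclose(R @ R.T, np.eye(3)))        # distinct rows/columns are orthogonal
```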
- 15.
The Grotta dei Cervi (Deer Cave) is located at Porto Badisco, near Otranto in Apulia, Italy, at a depth of 26 m below sea level. It is an important cave: discovered only in 1970, it is the most impressive Neolithic pictorial complex in Europe.
- 16.
Virtual Reality Modeling Language (VRML) is a description language that allows the simulation of three-dimensional virtual worlds. With VRML it is possible to describe virtual environments that include objects, light sources, images, sounds, and movies.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Distante, A., Distante, C. (2020). Motion Analysis. In: Handbook of Image Processing and Computer Vision. Springer, Cham. https://doi.org/10.1007/978-3-030-42378-0_6
DOI: https://doi.org/10.1007/978-3-030-42378-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-42377-3
Online ISBN: 978-3-030-42378-0