
Abstract

In this chapter, motion analysis algorithms for perceiving the dynamics of the scene, by analogy with the vision systems of different living beings, are described. With motion analysis algorithms, it is possible to derive the 3D motion, almost in real time, from the analysis of sequences of time-varying 2D images. Paradigms of motion analysis have shown that the perception of movement derives from information about the objects, evaluating the presence of occlusions, texture, contours, etc. The algorithms described address the perception of the motion occurring in the real physical world, and not of apparent motion. Different methods of motion analysis are presented, from those with a limited computational load, such as those based on the difference of time-varying images, to the more complex ones based on optical flow, considering contexts with different levels of moving entities and scene environments of different complexity. Finally, algorithms are described to reconstruct the 3D structure of the scene (and the motion), i.e., to calculate the coordinates of the 3D points of the scene whose 2D projections are known in each image of the time-varying sequence.


Notes

  1.

    A panel used in the Middle Ages by barbers (the barber's pole). A red ribbon is wrapped helically around a white cylinder, which rotates continuously about its vertical axis, so that all points of the cylindrical surface move horizontally. This rotation nevertheless produces the illusion that the red ribbon moves vertically upwards. The motion is ambiguous because corresponding moving points cannot be found in the temporal analysis, as shown in the figure. In 1935, the psychologist Hans Wallach discovered that the illusion weakens if the cylinder is shorter and wider, in which case the motion is correctly perceived as lateral. The illusion also disappears if the ribbon is textured.

  2.

    We recall that there is also the phenomenon of spatial aliasing, already described in Sect. 5.10, Vol. I. According to the Shannon–Nyquist theorem, a sampled continuous function (in the time or space domain) can be completely reconstructed if (a) the sampling frequency is equal to or greater than twice the maximum spectral component of the input signal (this minimum rate is also called the Nyquist rate) and (b) the spectrum replicas are removed in the Fourier domain, leaving only the original spectrum. The latter removal is the anti-aliasing process, which corrects the signal by eliminating spurious space–time components.
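
    As a minimal numeric sketch of condition (a) (a hypothetical example, not taken from the chapter), sampling a 10 Hz sinusoid below its Nyquist rate of 20 Hz makes it appear as a low-frequency alias:

    ```python
    import numpy as np

    f_signal = 10.0                        # Hz, maximum spectral component
    for f_s in (12.0, 50.0):               # below and above the 20 Hz Nyquist rate
        t = np.arange(0.0, 1.0, 1.0 / f_s)
        samples = np.sin(2 * np.pi * f_signal * t)
        # Dominant frequency recovered from the sampled signal's spectrum
        freqs = np.fft.rfftfreq(len(samples), d=1.0 / f_s)
        f_peak = freqs[np.argmax(np.abs(np.fft.rfft(samples)))]
        print(f"f_s = {f_s:4.0f} Hz -> dominant frequency ~ {f_peak:.1f} Hz")
    # At f_s = 12 Hz the peak appears near 2 Hz (an alias), not at 10 Hz.
    ```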

  3.

    In fact, considering the smoothness term of (6.18) and differentiating with respect to u, we have

    $$\begin{aligned} \begin{aligned} \frac{\partial (u_x^2+v_x^2+u_y^2+v_y^2)}{\partial u}&=2 \frac{\partial }{\partial u}\frac{\partial u(x,y)}{\partial x}+2 \frac{\partial }{\partial u}\frac{\partial u(x,y)}{\partial y}\\&=2\Biggl (\frac{\partial ^2 u }{\partial x^2}+\frac{\partial ^2 u }{\partial y^2}\Biggr )=2(u_{xx}+u_{yy})=2\nabla ^2 u \end{aligned} \end{aligned}$$

    which corresponds to the second-order differential operator defined as the divergence of the gradient of the function u(x, y) in a Euclidean space. This operator is known as the Laplace operator, or simply the Laplacian. The Laplacian of the function v(x, y) is derived similarly.
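
    In discrete implementations, the Laplacian appearing in this term is typically approximated with a finite-difference stencil over the image grid. A minimal sketch (assuming a 4-neighbour stencil, unit grid spacing, and replicated borders; not the chapter's code):

    ```python
    import numpy as np

    def laplacian(u: np.ndarray) -> np.ndarray:
        """Approximate u_xx + u_yy with the 4-neighbour stencil."""
        up = np.pad(u, 1, mode="edge")           # replicate border values
        return (up[:-2, 1:-1] + up[2:, 1:-1]     # vertical neighbours
                + up[1:-1, :-2] + up[1:-1, 2:]   # horizontal neighbours
                - 4.0 * u)
    ```

    The same operator applied to v gives the corresponding equation for the second flow component.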

  4.

    The matrix \((A^TA)\) is known in the literature as the structure tensor of the image relative to a pixel p. The term derives from the concept of a tensor, which generically indicates a linear algebraic structure able to describe mathematically a physical quantity that is invariant with respect to the adopted reference system. In this case, it concerns the analysis of the local motion associated with the pixels of the window W centered at p.
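
    A sketch of how \(A^TA\) could be assembled from the image gradients over the window W (a hypothetical implementation of the standard construction, with an arbitrarily chosen window half-width):

    ```python
    import numpy as np

    def structure_tensor(Ix, Iy, p, half=2):
        """Return the 2x2 matrix A^T A for the (2*half+1)^2 window W
        centred at pixel p = (row, col), given gradient images Ix, Iy."""
        r, c = p
        wx = Ix[r - half:r + half + 1, c - half:c + half + 1].ravel()
        wy = Iy[r - half:r + half + 1, c - half:c + half + 1].ravel()
        A = np.stack([wx, wy], axis=1)   # one gradient row per pixel of W
        return A.T @ A                   # sums Ix^2, Ix*Iy, Iy^2 over W
    ```

    The eigenvalues of this matrix characterize the local gradient distribution: two large eigenvalues indicate a well-conditioned system for estimating the local motion.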

  5.

    This represents the generalization of iterative methods. In general, we want to solve a nonlinear equation \( f (x) = 0 \) by reducing it to the problem of finding a fixed point of a function \( y = g (x) \); that is, we want to find a solution \( \alpha \) such that \(f(\alpha )=0\Longleftrightarrow \alpha =g(\alpha )\). The iteration function has the form \(x^{(k+1)}= g(x^{(k)})\), which iteratively produces a sequence of values x for each \(k\ge 0\), given an assigned initial value \( x ^ {(0)} \). Not all iteration functions g(x) guarantee convergence to the fixed point. It can be shown that, if g(x) is continuous and the sequence \( x ^ {(k)} \) converges, then it converges to a fixed point \( \alpha \), that is, \(\alpha =g(\alpha )\), which is also a solution of \( f (x) = 0 \).
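
    A minimal sketch of such an iteration (hypothetical example; g must be a contraction near the fixed point for convergence):

    ```python
    import math

    def fixed_point(g, x0, tol=1e-10, max_iter=1000):
        """Iterate x_{k+1} = g(x_k) until successive values agree within tol."""
        x = x0
        for _ in range(max_iter):
            x_next = g(x)
            if abs(x_next - x) < tol:
                return x_next
            x = x_next
        raise RuntimeError("fixed-point iteration did not converge")

    # Example: f(x) = x - cos(x) = 0 rewritten as x = g(x) = cos(x)
    print(fixed_point(math.cos, x0=1.0))   # ~0.739085, the fixed point of cos
    ```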

  6.

    A real-valued function f(x) defined on an interval is called convex if the segment joining any two points of its graph lies above the graph itself. Convexity simplifies the analysis and solution of optimization problems: it can be shown that a convex function, defined on a convex set, either has no minimum or has only global minima, and cannot have exclusively local minima.

  7.

    A Goal–NoGoal event, according to FIFA regulations, occurs when the ball entirely crosses the ideal vertical plane parallel to the goal, passing through the inner edge of the horizontal white line separating the playing field from the inner area of the goal itself.

  8.

    Let us specify more precisely how the uncertainty of a stochastic variable is evaluated, which we know to be its variance, to motivate Eq. (6.110). In this case, we are initially interested in evaluating the uncertainty of the state vector prediction \(\varvec{\hat{x}}_{t-1}\), which, being multidimensional, is given by its covariance matrix \(\varvec{P}_{t-1}=Cov(\varvec{\hat{x}}_{t-1})\). Similarly, the uncertainty of the next value of the prediction vector \(\varvec{\hat{x}}_{t}\) at time t, after the transformation \( \mathbf {F} _ {t} \) obtained with (6.109), is given by

    $$\begin{aligned} \varvec{P}_{t}=Cov(\varvec{\hat{x}}_{t})=Cov(\varvec{F}_t\varvec{\hat{x}}_{t-1})=\mathbf {F}_tCov(\varvec{\hat{x}}_{t-1})\mathbf {F}_t^T=\mathbf {F}_{t}\mathbf {P}_{t-1}\mathbf {F}_t^T \end{aligned}$$
    (6.111)

    Essentially, the linear transformation of the values \(\varvec{\hat{x}}_{t-1}\) (which have covariance matrix \(\varvec{P}_{t-1}\)) by the prediction matrix \( \varvec{F} _t \) modifies the covariance matrix of the vectors of the next state \(\varvec{\hat{x}}_{t}\) according to Eq. (6.114), where \( \varvec{ P} _ {t} \) represents the output covariance matrix of the linear prediction transformation, assuming a Gaussian state variable. In the literature, (6.114) represents the error propagation law for a linear transformation of a random variable whose mean and covariance are known, without necessarily knowing its exact probability distribution. Its proof is easily obtained by recalling the definition and properties of the expected value \( \mu _x \) and covariance \( \Sigma _x \) of a random variable \( \mathbf {x} \). In fact, considering a generic linear transformation \( y = Ax + b \), the new expected value and covariance matrix are derived from the original distribution as follows:

    $$\begin{aligned} \varvec{\mu }_{y}=E[\varvec{y}]=E[\varvec{Ax}+\varvec{b}]=\varvec{A}E[\mathbf {x}]+\varvec{b}=\varvec{A\mu }_x+\varvec{b} \end{aligned}$$
    $$\begin{aligned} \begin{aligned} \Sigma _{y}&=E[(\varvec{y}-E[\varvec{y}])(\varvec{y}-E[\varvec{y}])^T ]\\&=E[(\varvec{Ax}+\varvec{b}-\varvec{A}E[\mathbf {x}]-\varvec{b}) (\varvec{Ax}+\varvec{b}-\varvec{A}E[\mathbf {x}]-\varvec{b}) ^T ]\\&=E[(\varvec{A}(\mathbf {x}-E[\mathbf {x}]))(\varvec{A}(\mathbf {x}-E[\mathbf {x}])) ^T ]\\&=E[\varvec{A}(\mathbf {x}-E[\mathbf {x}])(\mathbf {x}-E[\mathbf {x}]) ^T\varvec{A}^T ]\\&=\varvec{A}E[(\mathbf {x}-E[\mathbf {x}])(\mathbf {x}-E[\mathbf {x}]) ^T]\varvec{A}^T \\&=\varvec{A}\varvec{\Sigma }_x\varvec{A}^T \end{aligned} \end{aligned}$$

    In the context of the Kalman filter, error propagation also applies to the measurement prediction uncertainty (Eq. 6.105), propagated from the previous state \(\mathbf {x}_{t|t-1}\), whose uncertainty is given by the covariance matrix \(\mathbf {P}_{t|t-1}\).
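
    A numeric sketch of this propagation law (a hypothetical constant-velocity model, not the chapter's notation):

    ```python
    import numpy as np

    dt = 1.0
    F = np.array([[1.0, dt],     # constant-velocity prediction matrix:
                  [0.0, 1.0]])   # position += velocity * dt
    P = np.diag([1.0, 4.0])      # prior covariance of [position, velocity]

    # Error propagation for the linear prediction x_t = F x_{t-1}
    P_next = F @ P @ F.T
    print(P_next)                # [[5., 4.], [4., 4.]]
    # The position variance grows from 1 to 5, coupled to the velocity term.
    ```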

  9.

    In the aeronautical context, the attitude of an aircraft (rigidly attached to the axes (X, Y, Z)) in 3D space is indicated by the angles of rotation around its axes, referred to, respectively, as lateral, vertical, and longitudinal. The rotation around the longitudinal Z-axis indicates the roll, the rotation around the lateral X-axis indicates the pitch, and the rotation around the vertical Y-axis indicates the yaw. In the robotic context (for example, for an autonomous vehicle), the attitude of the camera can be defined with 3 degrees of freedom, with the axes (X, Y, Z) indicating the lateral direction (side-to-side), the vertical direction (up-down), and the camera direction (looking). The rotations around the lateral, up-down, and looking axes retain the same meaning as for the aircraft axes.

  10.

    The physical position of a point projected onto the image plane, normally expressed in a metric unit such as mm, must be transformed into units of the image sensor, expressed in pixels, which typically do not correspond to a metric unit. The physical image plane is discretized by the pixels of the sensor, characterized by its horizontal and vertical spatial resolution expressed in pixels/mm. Therefore, the transformation of coordinates from mm to pixels introduces a horizontal scale factor \(p_x=\frac{npix_x}{dimx}\) and a vertical one \(p_y=\frac{npix_y}{dimy}\) by which to multiply the physical image coordinates (x, y). In particular, \(npix_x\times npix_y\) represents the horizontal and vertical resolution of the sensor in pixels, while \(dimx\times dimy\) indicates the horizontal and vertical dimensions of the sensor in mm. The pixel's rectangular shape is often characterized by the aspect ratio, the ratio of pixel width to height (usually expressed in decimal form, for example, 1.25, or in fractional form, such as 5/4, to avoid periodic decimal approximations). Furthermore, the coordinates must be translated with respect to the position of the principal point, expressed in pixels, since the center of the sensor's pixel grid does not always correspond to the position of the principal point. The pixel coordinate system of the sensor is indicated by (u, v), with the principal point \( (u_0, v_0) \) given in pixels and assumed to have axes parallel to those of the physical system (x, y). The accuracy of the pixel coordinates with respect to the physical ones depends on the resolution of the sensor and its dimensions.
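
    As a sketch (with hypothetical sensor parameters), the mm-to-pixel conversion described above can be written as:

    ```python
    def mm_to_pixel(x, y, npix_x=1920, npix_y=1080,
                    dimx=6.4, dimy=3.6, u0=960.0, v0=540.0):
        """Map physical image-plane coordinates (x, y) in mm to pixel
        coordinates (u, v). Hypothetical sensor: 1920x1080 pixels on a
        6.4 mm x 3.6 mm area, principal point at the grid centre."""
        p_x = npix_x / dimx      # horizontal scale factor [pixels/mm]
        p_y = npix_y / dimy      # vertical scale factor [pixels/mm]
        return u0 + p_x * x, v0 + p_y * y   # translate by the principal point

    print(mm_to_pixel(0.5, -0.2))           # -> (1110.0, 480.0)
    ```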

  11.

    The roto-translation transformation expressed by (6.209) indicates that the rotation \( \varvec{R} \) is performed first, followed by the translation \( \varvec{T} \). It is often reported with the operations inverted, that is, the translation first and then the rotation, thus having

    $$\varvec{X}=\varvec{R}(\varvec{X_w}-\varvec{T})=\varvec{R}\varvec{X_w}+(-\varvec{R}\varvec{T})$$

    and in this case, in Eq. (6.210), the translation term \( \varvec{T} \) is replaced with \(-\varvec{R}\varvec{T}\).
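
    A quick numeric check of this equivalence (hypothetical rotation about the Z-axis and arbitrary translation):

    ```python
    import numpy as np

    theta = np.deg2rad(30.0)
    R = np.array([[np.cos(theta), -np.sin(theta), 0.0],   # rotation about Z
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
    T = np.array([1.0, 2.0, 3.0])
    Xw = np.array([4.0, 5.0, 6.0])

    X1 = R @ (Xw - T)            # translation first, then rotation
    X2 = R @ Xw + (-R @ T)       # rotation first, translation term -R T
    assert np.allclose(X1, X2)
    ```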

  12.

    The distance between the projection center and the image plane is assumed infinite, with focal length \(f\rightarrow \infty \) and parallel projection lines.

  13.

    Recall that the rows of \( \varvec{R} \) represent the coordinates, in the original space, of the unit vectors along the coordinate axes of the rotated space, while the columns of \( \varvec{R} \) represent the coordinates, in the rotated space, of the unit vectors along the axes of the original space.

  14.

    Recall the properties of the rotation matrix \( \varvec{R} \): it is normalized, that is, the squares of the elements in each row or column sum to 1, and it is orthogonal, i.e., the inner product of any pair of distinct rows or of distinct columns is 0.
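
    Together, these two properties state that \( \varvec{R} \) is orthonormal, so its inverse is its transpose. A minimal numeric verification (hypothetical 2D rotation):

    ```python
    import numpy as np

    theta = np.deg2rad(45.0)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    # Rows/columns have unit norm and distinct pairs are orthogonal:
    assert np.allclose(R @ R.T, np.eye(2))
    assert np.allclose(np.linalg.inv(R), R.T)
    ```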

  15.

    The Grotta dei Cervi (Deer Cave) is located in Porto Badisco, near Otranto in Apulia, Italy, at a depth of 26 m below sea level. It is an important cave: in fact, it is the most impressive Neolithic pictorial complex in Europe, discovered only in 1970.

  16.

    Virtual Reality Modeling Language (VRML) is a language for the simulation of three-dimensional virtual worlds. With VRML it is possible to describe virtual environments that include objects, light sources, images, sounds, and movies.


Author information


Corresponding author

Correspondence to Arcangelo Distante.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Distante, A., Distante, C. (2020). Motion Analysis. In: Handbook of Image Processing and Computer Vision. Springer, Cham. https://doi.org/10.1007/978-3-030-42378-0_6

  • DOI: https://doi.org/10.1007/978-3-030-42378-0_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-42377-3

  • Online ISBN: 978-3-030-42378-0
