3D Dynamic Pose Estimation from Marker-Based Optical Data

  • W. Scott Selbie
  • Marcus J. Brown
Reference work entry


The desire to capture images of human movement has existed since prehistoric times (see chapter “Observing and Revealing the Hidden Structure of the Human Form in Motion Throughout the Centuries”). However, it is only since the late nineteenth century and the development of cameras able to capture multiple sequential images that the recording and quantitative analysis of movement have become possible. With modern cameras and high computational power now available, it is commonplace for researchers and clinicians to make detailed measurements, from which an estimation of the position and orientation (pose) of a human body during motion can be computed. This chapter focuses on the estimation of dynamic 3D pose based on optical motion capture systems that record the 3D location of markers attached to the body (see Fig. 1). In this chapter, we describe the estimation of the pose of a multibody model comprising segments that are connected by joints that constrain the direction and range of motion between those segments. There are three common deterministic solutions to the problem of pose estimation: direct, single body, and multibody. This chapter focuses on the two optimization methods, single body and multibody, that provide deterministic and discriminative solutions to the problem of pose estimation. Unlike direct pose estimation, these two approaches mitigate, to some extent, uncertainty in the data.


Skeletal modeling Pose estimation Motion-capture Inverse kinematics Soft tissue artifact Optimization 


For this chapter, the assumption underlying pose estimation is that the human body model is constructed from a set of rigid (nondeformable) segments (or bodies) (see  “Three-Dimensional Reconstruction of the Human Skeleton in Motion”). While this is not literally true, it allows a straightforward model that, for many biomechanical analyses, provides an adequate representation of the underlying skeletal structure for describing motor coordination and functional performance. Each segment is defined by a local anatomical reference frame (Cartesian coordinate system). These subject-specific anatomical reference frames (AF) are often defined by the location of anatomically palpable landmarks, by matching statistical shape models to surface geometry, or by system identification methods such as functional joints estimated from recorded movements. Regardless of the technique used to establish the reference frame, the common goal is to establish an anatomically relevant reference frame that can be determined reliably and reproducibly. The origin of the reference frame can be located anywhere, but for convenience in this chapter the origin is placed at the proximal end of a segment coincident with the distal end of an adjacent segment (a joint connecting to the parent segment) (Fig. 1). Each segment is restricted to having one parent segment, and the segment’s interaction with its parent segment is described by the specification of joint constraints acting at and around the origin of a segment relative to the parent segment. These joint constraints define the number of degrees of freedom and possibly a prescribed relative path of the segments comprising the joint. The number of degrees of freedom can be any integer value between zero and six. A joint constrained to zero degrees of freedom with no path constraint allows no relative motion between segments, while a joint with six degrees of freedom allows the segments to move independently of each other.
Fig. 1

A rigid body (part of a multibody model) is defined by an anatomical reference frame (AF) drawn as red vectors. The left leg is shown with N tracking markers \( m_i \) and vectors \( a_i \) describing the location of \( m_i \) in AF for the right thigh and shank. Note that this configuration of tracking markers is but one of many different configurations in common use. The right leg displays a cluster of markers (sometimes secured to a rigid shell) and the left leg displays skin-based tracking markers attached at palpable anatomical landmarks.

To estimate the pose of the multibody model, the 3D location of reflective markers attached to the segments is recorded by one or more optical sensors. It is beyond the scope of this chapter to describe the algorithms for identifying these 3D locations from the optical sensors, but regardless of the optical technology, the resultant 3D locations are used consistently between approaches. The tracking of each segment (pose estimation during a dynamic trial) is accomplished by establishing the location of the markers in the segment’s anatomical reference frame to which they are attached, recording the location of these markers in each frame of a motion trial, and by satisfying the specified joint constraints. A fundamental assumption of the algorithms presented in this chapter is that segments are rigid and the markers attached to those segments are secured rigidly and do not move relative to the segment to which they are attached. The number of markers required, and the number of segments to which markers are attached, depends on the structure of the multibody model and the pose estimation algorithm being used. The most important concept within this methodology is observability. Observability is dealt with in more detail later in the chapter; however, in short, a system is observable if the data are sufficient to describe, uniquely, the pose of the model. If the markers were truly attached rigidly to the underlying skeleton, i.e., a marker’s coordinates in the AF were invariant during movement, and the segments of the multibody model were truly rigid, and the markers were never occluded, this would be a straightforward chapter as all the pose estimation methods described in the scientific literature and textbooks would yield reliable pose estimations, and we could choose the mathematically simplest approach.

Any marker that is attached to the skin, however, can move relative to the underlying skeleton (Cappozzo et al. 1996). This relative motion occurs as flesh between the marker and skeleton deforms during movements, and is commonly known in the biomechanics community as soft tissue artifact (STA). It is, as yet, challenging to mitigate STA through mathematical approaches because, while STA is systematic, it varies on a case-by-case basis between individuals, between locations on the body, and between movements. Pose estimation algorithms that mitigate these “uncertainties” resulting from STA can improve the effectiveness of pose estimation dramatically.

The two pose estimation algorithms discussed in this chapter are common in the biomechanics community and are deterministic and discriminative. In other words, they rely solely on the structure of the multibody model and instantaneous data to estimate pose. This is in contrast with probabilistic pose estimation, in which prior information (e.g., models of STA or predictions based on the statistics of past performance) is incorporated into the pose estimation algorithm (see chapter “3D Dynamic Probabilistic Pose Estimation From Data Collected Using Cameras and Reflective Markers”).

State of the Art

Six Degree of Freedom (6DOF) Pose Estimation

This section describes an algorithm for six degree of freedom (6DOF) pose estimation, sometimes referred to as a segment optimization algorithm (Lu and O’Connor 1999) or single-body optimization. To estimate the pose of a segment at each frame of data, the 6DOF algorithm requires that a set of at least three noncollinear markers be attached to each segment. To clarify the need for three markers, we will describe the information available from one, two, or more markers on a segment. If a segment were to have a single marker attached to it, this marker would permit the estimation of translations of the segment along the three principal axes of the global reference frame (i.e., 3DOF). If a second marker were added, it would be possible to estimate rotations about two principal axes of the segment; however, rotations about an axis between the two markers would be undetectable (i.e., 5DOF). When a third marker is added, offset from the line between the first two, rotations about all three segmental axes become observable (i.e., 6DOF). Additional markers on a segment cannot increase the number of degrees of freedom but, as will be seen below, can be useful in a least-squares sense. This method is referred to as a 6DOF method because each segment (or joint) is considered to have six independent variables that describe its pose; three variables describe the location of the segment’s origin within the global reference frame (its position) and three variables describe the rotation about each of the principal axes of the segment (its orientation). In principle, each segment can be tracked independently of any other segment. This independence implies that there is no explicit linkage defined, i.e., there are no preconceived assumptions about the properties of any joint connecting segments. This means that the endpoints of a segment, and those of its proximal and distal adjacent neighbors, are free to move relative to each other, based directly and solely on the recorded motion capture (MoCap) data (Cappozzo et al. 1995). This independent estimation of the pose of the segments requires that markers used to track one segment are not used to track any other segment. It is quite common, however, for one marker to be used as a tracking marker on two adjacent segments. For example, a lateral knee marker may be used as a tracking marker on the thigh and the shank. In this situation, the thigh and shank segments are still 6DOF because six variables describe the motion of each segment, but the segments are no longer independent of each other.

The 6DOF algorithm we describe here estimates the pose of a segment using a least-squares procedure (Kepple and Stanhope 2000). Consider a point \( m_i \) attached to a segment, whose location is represented by vector \( a_i \) in the AF. The location of the same marker \( m_i \) in the global reference frame (GF) is represented by vector \( v_i \) (the recorded data). The relationship between \( a_i \) and \( v_i \) is given by:
$$ {v}_i={R}_{AG}{a}_i+{O}_{AG} $$
  • \( R_{AG} \) is a rotation matrix from AF to GF

  • \( O_{AG} \) is the translation from AF to GF.

The rotation matrix \( R_{AG} \) and translation vector \( O_{AG} \) may be computed at any instant, given that at least three noncollinear vectors \( a_i \) are assumed stationary in the AF and the corresponding \( v_i \) are recorded in the GF, by minimizing the sum-of-squares error expression:
$$ f\left({R}_{AG},{O}_{AG}\right)=\sum \limits_{i=1}^N{\left({v}_i-{R}_{AG}{a}_i-{O}_{AG}\right)}^2 $$
where N is the number of tracking targets on the segment. There are an infinite number of solutions of \( R_{AG} \) and \( O_{AG} \) that will produce minima for Eq. 2. Not all of these solutions result in \( R_{AG} \) being a rotation matrix, so we specify the orthonormal constraint \( {R}_{AG}^T{R}_{AG}=I \) as a boundary condition on the solution (Spoor and Veldpaus 1980):
$$ g\left({R}_{AG}\right)={R}_{AG}^T{R}_{AG}-I=0 $$
The method sets the gradient of Eq. 2 equal to the gradient of Eq. 3 times a set of Lagrangian multipliers:
$$ \mathit{\nabla f}\left({R}_{AG},{O}_{AG}\right)=\lambda \mathit{\nabla g}\left({R}_{AG}\right) $$
This results in a system of algebraic equations:
$$ \mathit{\nabla f}\left({R}_{AG},{O}_{AG}\right)-\lambda \mathit{\nabla g}\left({R}_{AG}\right)=0 $$
for which there exists an exact solution as long as N ≥ 3.
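The constrained least-squares problem above can be sketched numerically. The example below is a minimal sketch, not the Lagrange-multiplier procedure cited in the text: it swaps in the singular value decomposition (SVD) route, which is a common way to solve the same problem of minimizing the sum of squared residuals subject to the rotation matrix being orthonormal. The function name `fit_segment_pose`, the array layout, and the collinearity tolerance are assumptions made for illustration.

```python
import numpy as np

def fit_segment_pose(a, v):
    """Least-squares R, O such that v_i ~= R @ a_i + O.

    a: (N, 3) marker coordinates in the anatomical frame (AF).
    v: (N, 3) recorded marker coordinates in the global frame (GF).
    """
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    a_c = a - a.mean(axis=0)            # centre both marker sets
    v_c = v - v.mean(axis=0)
    # Observability: at least three markers that are not collinear
    # (tolerance is an assumed value in the units of the data)
    if a.shape[0] < 3 or np.linalg.matrix_rank(a_c, tol=1e-9) < 2:
        raise ValueError("need at least three noncollinear markers")
    H = a_c.T @ v_c                     # 3x3 cross-dispersion matrix
    U, _, Vt = np.linalg.svd(H)
    # Force det(R) = +1 so that R is a rotation and not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    O = v.mean(axis=0) - R @ a.mean(axis=0)
    return R, O
```

Called once per frame with the same AF coordinates and that frame's recorded marker positions, the function returns one pose estimate per frame; markers missing in a frame can simply be dropped from both arrays, provided at least three noncollinear markers remain.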

The 6DOF algorithm requires a minimum of three noncollinear tracking markers, but more can be accommodated because the 6DOF algorithm permits a solution for an over-specified system with an unlimited number of tracking markers on a segment. This over-specification means that, provided noise (or some features of STA) in the data is uncorrelated, the least-squares algorithm will act to minimize the effects of the noise. If one or more tracking targets are missing in any frame(s), the over-specification still allows a calculated segmental pose, provided at least three noncollinear targets are present. The observability for a 6DOF method is straightforward because it is simply N ≥ 3, provided the locations of the markers are fixed in the AF, and are not collinear. In principle, tracking markers can be placed anywhere on a rigid segment. In practice, marker placement on an anatomical segment is a compromise between distributing markers over the entire surface of a segment and placing markers in areas that exhibit minimal STA (Cappozzo et al. 1997). As concluded in a review article by Cereatti et al. (2006), there have been attempts to modify the 6DOF algorithm in order to mitigate the effects of STA (Cappozzo et al. 1997; Andriacchi et al. 1998), but none of these approaches have proved satisfactory.

Pose Estimation Using a Technical Reference Frame (TF)

While this chapter is focused on estimating pose from marker data, it is convenient at this time to discuss briefly pose estimation from two other 6DOF sensors: electromagnetic sensors and Moiré-phase tracking. It is beyond the scope of this chapter to describe the theory behind the sensor technology, but in summary, electromagnetic systems record the 6DOF pose of a sensor relative to an emitted electromagnetic dipole field. The Moiré-phase tracking (MPT) 3D motion capture system (Weinhandl et al. 2010) is a single-camera 3D motion tracking technology that tracks the 6DOF pose of a Moiré target (a lightweight, multilayer passive optical target; Weinhandl et al. 2010). The important idea to note is that these sensors describe their pose relative to an internal reference frame, not an anatomical frame. To put these 6DOF sensors in the context of marker-based MoCap (the focus of the chapter), we consider a slightly different approach to the 6DOF algorithm.

Consider the same markers \( m_i \) from Fig. 1, but instead of creating vectors \( a_i \) in the anatomical reference frame of the segment, we create vectors \( b_i \) in a technical reference frame (TF) defined by the markers (Fig. 2). In this description, the origin of the frame is located at one of the markers (\( m_2 \)), the principal axis is defined by the vector from \( m_2 \) to \( m_1 \), and the remaining axes are established from the principal axis and \( m_3 \). This adds another “layer” to the pose estimation, as it requires an additional step to include the transformation between this TF and the associated AF (\( R_{TA} \), \( O_{TA} \)).
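The construction of a technical frame from three markers can be sketched as follows. This is an illustration under stated assumptions rather than the chapter's own implementation: the principal axis is taken as the first (x) axis, the triad is made right-handed with cross products, and the function name `technical_frame` is hypothetical.

```python
import numpy as np

def technical_frame(m1, m2, m3):
    """Return (R_TG, O_TG) for a technical frame built from three markers.

    Origin at m2; first axis along m2 -> m1; m3 fixes the plane.
    Columns of R_TG are the TF unit axes expressed in the global frame.
    """
    m1, m2, m3 = (np.asarray(m, dtype=float) for m in (m1, m2, m3))
    x = m1 - m2
    x = x / np.linalg.norm(x)
    n = np.cross(x, m3 - m2)            # normal to the plane of the markers
    norm_n = np.linalg.norm(n)
    if norm_n < 1e-12:
        raise ValueError("markers are collinear; the frame is undefined")
    z = n / norm_n
    y = np.cross(z, x)                  # completes a right-handed triad
    return np.column_stack((x, y, z)), m2
```

With this frame, the TF coordinates of any marker follow as `b_i = R_TG.T @ (m_i - O_TG)`, evaluated once in a reference frame of data.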
Fig. 2

A rigid body (part of a multibody model) is defined by an anatomical reference frame (AF) drawn as red vectors. The left leg is shown with N tracking markers \( m_i \) and vectors \( b_i \) describing the location of \( m_i \) in a technical reference frame TF for the right thigh and shank. The left leg displays a Moiré-phase tracking sensor (top) and electromagnetic sensor (middle).

Using the same markers (\( m_i \)) from one frame of data, and assuming that the transformation from TF to AF is invariant, we can identify (\( R_{TA} \), \( O_{TA} \)) from vector calculus using the same methods used to define AF in the first place.

Consider a point \( m_i \) attached to a segment, whose location is represented by vector \( b_i \) in the TF. Eq. 1 is rewritten as:
$$ {v}_i={R}_{TG}{b}_i+{O}_{TG} $$
  • \( R_{TG} \) is a rotation matrix from TF to GF

  • \( O_{TG} \) is the translation from TF to GF

  • \( R_{TG} \) and \( O_{TG} \) are computed as in Eq. 4.

The resulting pose estimation (\( R_{TG} \), \( O_{TG} \)) in a local reference frame, defined by the markers independently of the anatomy, is similar to the pose estimates of the other 6DOF sensors. In marker-based MoCap, it is possible to define the relationship between the markers and the anatomy because markers can be placed in locations that have anatomical meaning. With the electromagnetic and MPT technologies, the anatomical locations can be identified with a pointer, or system identification methods can be used to identify the AF. As in the previous section, our goal is once again to identify (\( R_{AG} \), \( O_{AG} \)), but in this case, the least-squares solution computes the transformation from the TF to the GF (\( R_{TG} \), \( O_{TG} \)). The additional step is to include the transform from TF to AF (\( R_{TA} \), \( O_{TA} \)):
$$ {R}_{AG}={R}_{TG}{R}_{TA}^T $$
$$ {O}_{AG}={O}_{TG}-{R}_{AG}{O}_{TA} $$
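The composition can be checked by substitution. As an assumption about notation (it is not stated explicitly in the text), take (\( R_{TA} \), \( O_{TA} \)) to map TF coordinates to AF coordinates, \( a_i = R_{TA} b_i + O_{TA} \), in the same way that Eq. 1 maps AF coordinates to GF coordinates:
$$ \begin{aligned} b_i &= R_{TA}^{T}\left(a_i - O_{TA}\right) \\ v_i &= R_{TG}\,b_i + O_{TG} = R_{TG}R_{TA}^{T}\,a_i + \left(O_{TG} - R_{TG}R_{TA}^{T}\,O_{TA}\right) \\ \Rightarrow\quad R_{AG} &= R_{TG}R_{TA}^{T}, \qquad O_{AG} = O_{TG} - R_{AG}\,O_{TA} \end{aligned} $$
Under this reading, the rotations compose as \( R_{TG}R_{TA}^{T} \) and the offset \( O_{TA} \) must be rotated into the GF before it is subtracted. If \( R_{TA} \) is instead defined from AF to TF, the transpose moves accordingly.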

There is a considerable benefit to the 6DOF approach to pose estimation, as it is straightforward to implement with results that are easy to understand. The 6DOF solution has no local minima, and requires no guidance from users. Notably, 6DOF estimates a pose that is an accurate representation of the data, which is useful for identifying local problems. An example of such a local problem would be the swapping of the names of two markers between trials, or even within a trial (something not uncommon when working with many passive reflective marker-based MoCap systems). Such mislabeling of markers will cause obvious discontinuities in the pose estimations of a 6DOF segment, which can be easily identified and corrected. However, the deterministic assumption that neither STA nor noisy marker data occur can result in pose estimations where the adjacent endpoints of segments are dislocated from each other or “merge” together. While these pose solutions reflect the true marker data, and thus highlight the presence of noise and/or STA, they can present estimations of pose that are anatomically impossible. To highlight the serious challenge of STA, if the entire set of markers translates in unison (e.g., through inertial forces or impact), the estimated pose of the segment can be quite wrong. There is, however, no information in the relative configuration of the tracking markers to indicate that anything has gone awry, so this artifact cannot be mitigated. The next section, on inverse kinematics, discusses a deterministic approach to removing such an obvious artifact as joint disarticulation from the 6DOF model.

Inverse Kinematics (IK) Pose Estimation

Inverse kinematics (IK) is the search for, and identification of, an optimal pose of a multibody model with explicit joint constraints, such that the overall differences between the measured and model-estimated marker coordinates are minimized, in a least-squares sense, at a system level. Lu and O’Connor (1999) termed this process global optimization, but in this chapter, we will refer to this as multibody optimization or IK. IK, as described here, is a least-squares solution that may be considered an extension to the 6DOF pose estimation because if a joint is ascribed six degrees of freedom within the IK, the IK and 6DOF solutions are equivalent. The selection of appropriate joint constraints is idiosyncratic and depends on the number of markers being tracked, the context of the motion being analyzed, and many other factors; some of these factors will be discussed later in this section. As with 6DOF, the algorithms involved in IK pose estimation will be described in the context of marker-based optical 3D MoCap (Fig. 3).
Fig. 3

A multibody model showing the pelvis as a root segment (e.g., 6DOF with respect to the global reference frame) and joint constraints at the hip, knee, and ankle of the left leg. In this figure, the hip has 3DOF, the knee 5DOF, and the ankle 3DOF, but many other multibody configurations can be found in the literature

The solution to the IK is the pose of a multibody model that best matches the MoCap data, in terms of a least-squares criterion. In the Lu and O’Connor (1999) approach, the IK solution is found for each frame of data, independent of any previous or subsequent frames of data. Mathematically, van den Bogert and Su (2008) described this approach, based on the overall configuration of the multibody model, using a set of generalized coordinates q.

Generalized coordinates are the minimum set of independent variables that describe the pose. In this case, R and O of Eq. 1 now consist of multiple transformations and become a function of the generalized coordinate vector q:
$$ {v}_i=R(q){a}_i+O(q) $$
The expression that is minimized becomes:
$$ f\left(R,O\right)=f(q)=\sum \limits_{i=1}^N{\left({v}_i-R(q){a}_i-O(q)\right)}^2 $$
where N is the total number of targets on all the segments in the IK chain.
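A minimal sketch of this minimization is given below for a hypothetical planar chain of two segments joined by hinges, so that q holds two joint angles. The segment length, marker coordinates, Gauss-Newton iteration, and finite-difference Jacobian are all assumptions chosen for brevity; they are not taken from the cited work, and practical IK solvers use full 3D models and more robust optimizers.

```python
import numpy as np

# Hypothetical planar chain: root hinge at the origin, second hinge at the
# distal end of segment 1 (length L1). All values are illustrative.
L1 = 0.4
A1 = np.array([[0.1, 0.03], [0.3, -0.03]])    # markers in the segment-1 frame
A2 = np.array([[0.1, 0.03], [0.3, -0.03]])    # markers in the segment-2 frame

def rot(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s], [s, c]])

def model_markers(q):
    """Global marker positions R(q) a_i + O(q) for q = (angle 1, angle 2)."""
    R1 = rot(q[0])
    R2 = rot(q[0] + q[1])                     # child rotation composes with parent
    O2 = R1 @ np.array([L1, 0.0])             # joint constraint: shared endpoint
    return np.vstack((A1 @ R1.T, A2 @ R2.T + O2))

def solve_ik(v, q0, iters=50, h=1e-7):
    """Minimise sum_i |v_i - model_i(q)|^2 by Gauss-Newton iteration."""
    q = np.array(q0, dtype=float)
    for _ in range(iters):
        r = (v - model_markers(q)).ravel()    # residual vector for this q
        # Forward-difference Jacobian of the model with respect to q
        J = np.column_stack([
            (model_markers(q + h * e) - model_markers(q)).ravel() / h
            for e in np.eye(len(q))
        ])
        dq, *_ = np.linalg.lstsq(J, r, rcond=None)
        q += dq
        if np.linalg.norm(dq) < 1e-12:
            break
    return q
```

Because the joint constraint is built into `model_markers`, the two segments cannot separate at the shared endpoint, whatever the marker data contain; this is the practical difference from fitting each segment independently.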


Within an IK model, it is possible to rely more on data that are known, a priori, to contain less noise or be less affected by STA. This can be achieved by weighting the contribution of each marker to the objective function:
$$ f\left(R,O\right)=f(q)=\sum \limits_{i=1}^N{\alpha}_i{\left({v}_i-R(q){a}_i-O(q)\right)}^2 $$

The selection of the weights, \( \alpha_i \), can be made pragmatically and heuristically, or rules may be used that allow the computation of an optimal set of weights. Without a priori information, it is usually best to set \( \alpha_i \) to 1, but on occasion, when estimating pose, the user may want to ensure that certain segments follow the tracking targets with a higher degree of accuracy than other segments. For example, the user may want the distance between the foot and the floor (or recorded ground reaction force) to remain similar to the values that would be obtained using a 6DOF method because 6DOF is likely the best local estimate of the pose of the foot. Likewise, data from some markers may not be considered representative of the pose because they are noisy, so the weight of these data can be reduced. In some cases, the marker may be known to have substantial STA relative to one of the degrees of freedom (generalized coordinates) and the influence of the marker on this generalized coordinate can be removed.
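The effect of the weights can be illustrated with the simplest possible case: a planar segment attached to ground by a hinge at the origin, for which the weighted least-squares angle has a closed form. The sketch below is an illustration only; the marker coordinates, the size of the displacement standing in for STA, and the function names are assumptions.

```python
import math

def weighted_hinge_angle(a, v, alpha):
    """Angle theta minimising sum_i alpha_i * |v_i - R(theta) a_i|^2 in the plane.

    a, v: lists of (x, y) marker coordinates in the segment and global frames,
    both measured from the hinge centre. alpha: non-negative weights.
    """
    # Closed form: theta = atan2(sum alpha (a x v), sum alpha (a . v))
    s = sum(w * (ax * vy - ay * vx) for w, (ax, ay), (vx, vy) in zip(alpha, a, v))
    c = sum(w * (ax * vx + ay * vy) for w, (ax, ay), (vx, vy) in zip(alpha, a, v))
    return math.atan2(s, c)

def rotate(p, t):
    return (p[0] * math.cos(t) - p[1] * math.sin(t),
            p[0] * math.sin(t) + p[1] * math.cos(t))

a = [(0.30, 0.00), (0.15, 0.05)]           # two markers in the segment frame
true_theta = 0.5
v = [rotate(p, true_theta) for p in a]
v[1] = (v[1][0] + 0.02, v[1][1] - 0.01)    # marker 2 displaced (stand-in for STA)

equal = weighted_hinge_angle(a, v, [1.0, 1.0])
down = weighted_hinge_angle(a, v, [1.0, 0.1])
```

In this example, reducing the weight of the displaced marker moves the estimate `down` closer to `true_theta` than the equally weighted estimate `equal`, and a weight of zero removes the influence of that marker altogether.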

Observability of the Inverse Kinematics

As mentioned previously, the pose of a multibody model is observable if the data are sufficient to describe the pose uniquely. In the case of the 6DOF pose estimation, three or more rigidly attached, noncollinear targets are required to track each segment. When one target is placed on a rigid segment, three independent pieces of information can be obtained: the X, Y, and Z coordinates of the target. When a second target is placed on the segment, two further pieces of information are obtained. The number of new pieces of information for the second target is two, not three like the first target, because if we know the X and Y locations of the second target, then the Z coordinate is known because the distance between the first and second targets is fixed. Thus, two targets supply only five of the six unknowns. When a third target is added, one additional piece of information is supplied; note that the third target adds only one new piece of information because the distance from the third target to the first target and the distance from the third target to the second target are fixed. Still, with three noncollinear targets, we have sufficient information to fully solve the pose of a 6DOF segment.

With IK, not only is there the assumption of rigid segments, but there are also constraints added at the joints. A consequence of the joint constraints is that fewer than three markers may be sufficient to fully determine the pose of a segment. For example, a segment that has only one degree of freedom (e.g., one connected to a parent segment by a hinge joint) only requires one marker to fully determine the joint angle. It is not possible to just count markers, however, because if this one marker is coincident with the hinge joint, it does not provide any information and the pose is nonobservable. Therefore, the question of whether the markers provide sufficient information to determine the model’s pose is far more complex when joint constraints exist.

A straightforward approach to the problem would be to specify the number of targets required to track a segment, based solely on the type of joint connecting that segment to its parent. For example, Yeadon (1984) required two markers to track a segment connected to the parent via a ball joint (three degrees of freedom) or a universal joint (two degrees of freedom) and required only one marker when the segment was connected via a one degree of freedom hinge joint. Although this approach makes it likely that the system will be observable when these requirements are met, it can be overly conservative and will occasionally consider the model to be unobservable when in fact there is sufficient information available. For example, Schulz and Kimmel (2010) demonstrated that it is possible to track the pose of the thigh segment without actually placing any markers on the thigh. Yeadon’s method would declare this model to be unobservable. This is important because, for many activities, the STA of markers on the thigh is detrimental to an accurate estimate of the pose, and if the assumption of Schulz and Kimmel that the hip has three degrees of freedom and the knee has one degree of freedom is an accurate reflection of the movement, their approach could be useful for studying many activities.

To demonstrate how it is possible to calculate a general solution to the observability problem, consider the simple example of a single segment constrained to its parent (in this example, the ground) by a ball joint. This system can be fully described by three degrees of freedom: the Euler rotations \( \theta_x \), \( \theta_y \), and \( \theta_z \).

For this case, the general IK objective function Eq. 8 becomes:
$$ P(q)=\sum \limits_{i=1}^m{\left({R}^{\prime }(q){A}_i-{P}_i\right)}^2 $$
Assume there is only one target (m = 1) fixed to the segment, with local coordinates (\( A_x, A_y, A_z \)) in the AF and recorded global coordinates (\( P_x, P_y, P_z \)) in the GF. Applying Eq. (10) for this simple case of one segment connected to the ground via a ball joint with a single tracking target, the objective function is formed from the residual vector P(q):
$$ P(q)=\left|\begin{array}{c}{A}_x\cos {\theta}_z\cos {\theta}_y+{A}_y\left[\cos {\theta}_z\sin {\theta}_y\sin {\theta}_x+\sin {\theta}_z\cos {\theta}_x\right]+{A}_z\left[-\cos {\theta}_z\sin {\theta}_y\cos {\theta}_x+\sin {\theta}_z\sin {\theta}_x\right]-{P}_x\\ {}-{A}_x\sin {\theta}_z\cos {\theta}_y+{A}_y\left[-\sin {\theta}_z\sin {\theta}_y\sin {\theta}_x+\cos {\theta}_z\cos {\theta}_x\right]+{A}_z\left[\sin {\theta}_z\sin {\theta}_y\cos {\theta}_x+\cos {\theta}_z\sin {\theta}_x\right]-{P}_y\\ {}{A}_x\sin {\theta}_y-{A}_y\cos {\theta}_y\sin {\theta}_x+{A}_z\cos {\theta}_y\cos {\theta}_x-{P}_z\end{array}\right| $$
If a change in rotation (some combination of a change in \( \theta_x \), \( \theta_y \), \( \theta_z \)) exists for which the target does not move, then the system is not observable. In order to establish whether this is the case, it is necessary to discover if a situation exists where the cost function does not change with respect to changes in the joint angle. This exactly describes the Jacobian (or matrix of partial derivatives) of the cost function:
$$ \mathrm{Jacobian}=\left|\begin{array}{ccc}\frac{d\left({P}_x\right)}{d\left({\theta}_x\right)}& \frac{d\left({P}_x\right)}{d\left({\theta}_y\right)}& \frac{d\left({P}_x\right)}{d\left({\theta}_z\right)}\\ {}\frac{d\left({P}_y\right)}{d\left({\theta}_x\right)}& \frac{d\left({P}_y\right)}{d\left({\theta}_y\right)}& \frac{d\left({P}_y\right)}{d\left({\theta}_z\right)}\\ {}\frac{d\left({P}_z\right)}{d\left({\theta}_x\right)}& \frac{d\left({P}_z\right)}{d\left({\theta}_y\right)}& \frac{d\left({P}_z\right)}{d\left({\theta}_z\right)}\end{array}\right| $$
Calculating the Jacobian of the cost function described in Eq. 10:
$$ \frac{d\left({P}_x\right)}{d\left({\theta}_x\right)}={A}_y\left[\cos {\theta}_z\sin {\theta}_y\cos {\theta}_x-\sin {\theta}_z\sin {\theta}_x\right]+{A}_z\left[\cos {\theta}_z\sin {\theta}_y\sin {\theta}_x+\sin {\theta}_z\cos {\theta}_x\right] $$
$$ \frac{d\left({P}_x\right)}{d\left({\theta}_y\right)}=-{A}_x\cos {\theta}_z\sin {\theta}_y+{A}_y\cos {\theta}_z\cos {\theta}_y\sin {\theta}_x-{A}_z\cos {\theta}_z\cos {\theta}_y\cos {\theta}_x $$
$$ \frac{d\left({P}_x\right)}{d\left({\theta}_z\right)}=-{A}_x\sin {\theta}_z\cos {\theta}_y+{A}_y\left[-\sin {\theta}_z\sin {\theta}_y\sin {\theta}_x+\cos {\theta}_z\cos {\theta}_x\right]+{A}_z\left[\sin {\theta}_z\sin {\theta}_y\cos {\theta}_x+\cos {\theta}_z\sin {\theta}_x\right] $$
$$ \frac{d\left({P}_y\right)}{d\left({\theta}_x\right)}={A}_y\left[-\sin {\theta}_z\sin {\theta}_y\cos {\theta}_x-\cos {\theta}_z\sin {\theta}_x\right]+{A}_z\left[-\sin {\theta}_z\sin {\theta}_y\sin {\theta}_x+\cos {\theta}_z\cos {\theta}_x\right] $$
$$ \frac{d\left({P}_y\right)}{d\left({\theta}_y\right)}={A}_x\sin {\theta}_z\sin {\theta}_y-{A}_y\sin {\theta}_z\cos {\theta}_y\sin {\theta}_x+{A}_z\sin {\theta}_z\cos {\theta}_y\cos {\theta}_x $$
$$ \frac{d\left({P}_y\right)}{d\left({\theta}_z\right)}=-{A}_x\cos {\theta}_z\cos {\theta}_y+{A}_y\left[-\cos {\theta}_z\sin {\theta}_y\sin {\theta}_x-\sin {\theta}_z\cos {\theta}_x\right]+{A}_z\left[\cos {\theta}_z\sin {\theta}_y\cos {\theta}_x-\sin {\theta}_z\sin {\theta}_x\right] $$
$$ \frac{d\left({P}_z\right)}{d\left({\theta}_x\right)}=-{A}_y\cos {\theta}_y\cos {\theta}_x-{A}_z\cos {\theta}_y\sin {\theta}_x $$
$$ \frac{d\left({P}_z\right)}{d\left({\theta}_y\right)}={A}_x\cos {\theta}_y+{A}_y\sin {\theta}_y\sin {\theta}_x-{A}_z\sin {\theta}_y\cos {\theta}_x $$
$$ \frac{d\left({P}_z\right)}{d\left({\theta}_z\right)}=0 $$

To simplify these expressions, consider the state where \( \theta_x = 0,\ \theta_y = 0,\ \theta_z = 0 \).

The Jacobian now becomes:
$$ \mathrm{Jacobian}\ \mathrm{of}\ \mathrm{cost}\ \mathrm{function}=\left|\begin{array}{ccc}0& -{A}_z& {A}_y\\ {}{A}_z& 0& -{A}_x\\ {}-{A}_y& {A}_x& 0\end{array}\right| $$
The determinant of the Jacobian is:
$$ Det=0\left(0+{A}_x{A}_x\right)+{A}_z\left(0-{A}_x{A}_y\right)+{A}_y\left({A}_z{A}_x-0\right)=-{A}_x{A}_y{A}_z+{A}_x{A}_y{A}_z=0 $$

Since the determinant of the Jacobian is zero, it is not invertible and its rank is not full; thus, one target is not sufficient to estimate the pose of a segment connected to ground via a ball joint.
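The direction that cannot be observed can be identified directly. The Jacobian at zero rotation is the skew-symmetric (cross-product) matrix of the target's local position vector, so multiplying it by that vector gives zero:
$$ \left|\begin{array}{ccc}0& -{A}_z& {A}_y\\ {}{A}_z& 0& -{A}_x\\ {}-{A}_y& {A}_x& 0\end{array}\right|\left|\begin{array}{c}{A}_x\\ {}{A}_y\\ {}{A}_z\end{array}\right|=\left|\begin{array}{c}-{A}_z{A}_y+{A}_y{A}_z\\ {}{A}_z{A}_x-{A}_x{A}_z\\ {}-{A}_y{A}_x+{A}_x{A}_y\end{array}\right|=0 $$
To first order at this pose, a small rotation about the axis that passes through the joint center and the target leaves the target where it is. This is the one degree of freedom that a single target cannot register, and it is why the rank of the Jacobian is two rather than three.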

Assume now that two targets are attached to the segment: A1 and A2. In this case, the cost function (Eq. 10) for one segment that is connected to ground by a ball joint is:
$$ P(q)=\left|\begin{array}{c}{A}_{x_1}\cos {\theta}_z\cos {\theta}_y+{A}_{y_1}\left[\cos {\theta}_z\sin {\theta}_y\sin {\theta}_x+\sin {\theta}_z\cos {\theta}_x\right]+{A}_{z_1}\left[-\cos {\theta}_z\sin {\theta}_y\cos {\theta}_x+\sin {\theta}_z\sin {\theta}_x\right]-{P}_{x_1}\\ {}-{A}_{x_1}\sin {\theta}_z\cos {\theta}_y+{A}_{y_1}\left[-\sin {\theta}_z\sin {\theta}_y\sin {\theta}_x+\cos {\theta}_z\cos {\theta}_x\right]+{A}_{z_1}\left[\sin {\theta}_z\sin {\theta}_y\cos {\theta}_x+\cos {\theta}_z\sin {\theta}_x\right]-{P}_{y_1}\\ {}{A}_{x_1}\sin {\theta}_y-{A}_{y_1}\cos {\theta}_y\sin {\theta}_x+{A}_{z_1}\cos {\theta}_y\cos {\theta}_x-{P}_{z_1}\\ {}{A}_{x_2}\cos {\theta}_z\cos {\theta}_y+{A}_{y_2}\left[\cos {\theta}_z\sin {\theta}_y\sin {\theta}_x+\sin {\theta}_z\cos {\theta}_x\right]+{A}_{z_2}\left[-\cos {\theta}_z\sin {\theta}_y\cos {\theta}_x+\sin {\theta}_z\sin {\theta}_x\right]-{P}_{x_2}\\ {}-{A}_{x_2}\sin {\theta}_z\cos {\theta}_y+{A}_{y_2}\left[-\sin {\theta}_z\sin {\theta}_y\sin {\theta}_x+\cos {\theta}_z\cos {\theta}_x\right]+{A}_{z_2}\left[\sin {\theta}_z\sin {\theta}_y\cos {\theta}_x+\cos {\theta}_z\sin {\theta}_x\right]-{P}_{y_2}\\ {}{A}_{x_2}\sin {\theta}_y-{A}_{y_2}\cos {\theta}_y\sin {\theta}_x+{A}_{z_2}\cos {\theta}_y\cos {\theta}_x-{P}_{z_2}\end{array}\right| $$
Again, taking the simplest case and setting the orientation to
$$ {\theta}_x=0, \, {\theta}_y=0, \, {\theta}_z=0 $$

The Jacobian of the cost function now reduces to:

Jacobian of cost function = \( \left|\begin{array}{ccc}0& -{A}_{1_z}& {A}_{1_y}\\ {}{A}_{1_z}& 0& -{A}_{1_x}\\ {}-{A}_{1_y}& {A}_{1_x}& 0\\ {}0& -{A}_{2_z}& {A}_{2_y}\\ {}{A}_{2_z}& 0& -{A}_{2_x}\\ {}-{A}_{2_y}& {A}_{2_x}& 0\end{array}\right| \)

If targets A1 and A2 have coordinates (0, 0, A1z) and (0, 0, A2z), so that they are collinear along the Z axis, we would expect the system to be unobservable because the targets will not register rotation about the Z axis. For this case:

Jacobian of cost function = \( \left|\begin{array}{ccc}0& -{A}_{1_z}& 0\\ {}{A}_{1_z}& 0& 0\\ {}0& 0& 0\\ {}0& -{A}_{2_z}& 0\\ {}{A}_{2_z}& 0& 0\\ {}0& 0& 0\end{array}\right| \)

Column 3 is zero, so the Jacobian does not have full column rank and, as expected, the system is not observable.

Now, if the two targets do not form a line that points to the joint center, for example:
$$ {A}_1=\left(0.1,0.1,0.1\right)\ \mathrm{and}\ {A}_2=\left(-0.1,0.1,0.1\right) $$
The Jacobian now is:
$$ \mathrm{Jacobian}\ \mathrm{of}\ \mathrm{cost}\ \mathrm{function}=\left|\begin{array}{ccc}0& -0.1& 0.1\\ {}0.1& 0& -0.1\\ {}-0.1& 0.1& 0\\ {}0& -0.1& 0.1\\ {}0.1& 0& 0.1\\ {}-0.1& -0.1& 0\end{array}\right| $$

This matrix has a rank = 3, which is full column rank and thus marker information (A1 and A2) is independent and the model is fully observable.
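
A compact way to see why the geometry of the targets decides the rank is offered here as a supplementary sketch; the symbol \( \delta \theta =\left(\delta {\theta}_x,\delta {\theta}_y,\delta {\theta}_z\right) \) for a small rotation vector is introduced only for this illustration. Each 3 × 3 block of the Jacobian at zero orientation acts on \( \delta \theta \) as a cross product with the target position:
$$ \left|\begin{array}{ccc}0& -{A}_{i_z}& {A}_{i_y}\\ {}{A}_{i_z}& 0& -{A}_{i_x}\\ {}-{A}_{i_y}& {A}_{i_x}& 0\end{array}\right|\ \delta \theta ={A}_i\times \delta \theta $$
This product vanishes whenever \( \delta \theta \) is parallel to \( {A}_i \), so a single target cannot register rotation about the line joining it to the joint center. With two targets, no rotation remains unobservable only if no nonzero \( \delta \theta \) is parallel to both \( {A}_1 \) and \( {A}_2 \), that is, if the two targets are not collinear with the joint center.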

Therefore, the general solution for observability in inverse kinematics reduces to determining whether the Jacobian of the cost function of Eq. 10 has full column rank. If it does, we have sufficient information to determine the pose of the model. Conversely, if the Jacobian of the IK cost function does not have full column rank, there is not enough information to determine a unique pose for the model.
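
The rank test can be checked numerically. The following is a minimal sketch, assuming the ball joint example above evaluated at zero orientation; the function names (`skew_rows`, `jacobian`, `rank`) and the tolerance are hypothetical choices for this illustration, and the second target of the non-collinear case is taken as (−0.1, 0.1, 0.1), the position implied by rows 4 to 6 of the numerical Jacobian in the text.

```python
def skew_rows(a):
    """Three Jacobian rows contributed by one target at local position a,
    evaluated at theta_x = theta_y = theta_z = 0."""
    x, y, z = a
    return [[0.0, -z, y],
            [z, 0.0, -x],
            [-y, x, 0.0]]


def jacobian(targets):
    """Stack the 3 x 3 blocks of all targets into a (3 * n) x 3 matrix."""
    rows = []
    for a in targets:
        rows.extend(skew_rows(a))
    return rows


def rank(matrix, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]          # work on a copy
    n_rows, n_cols = len(m), len(m[0])
    r = 0                                   # number of pivots found so far
    for c in range(n_cols):
        pivot = max(range(r, n_rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue                        # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n_rows):
            factor = m[i][c] / m[r][c]
            for j in range(c, n_cols):
                m[i][j] -= factor * m[r][j]
        r += 1
        if r == n_rows:
            break
    return r


cases = {
    "one target": [(0.1, 0.1, 0.1)],
    "two targets collinear with the joint": [(0.0, 0.0, 0.1), (0.0, 0.0, 0.2)],
    "two targets not collinear": [(0.1, 0.1, 0.1), (-0.1, 0.1, 0.1)],
}
for name, targets in cases.items():
    r = rank(jacobian(targets))
    print(name, "rank", r, "observable" if r == 3 else "not observable")
```

Only the third configuration reaches rank 3, which is the full column rank needed for the three rotational degrees of freedom of the ball joint.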

IK Optimization Algorithms

In the general case, there is no analytic solution for the IK problem. We, therefore, summarize examples from two classes of implementation of a numerical solution to this optimization problem: direction search methods and global search methods.

Direction Search Methods (Newton’s Method)

To understand Newton’s method, consider a function f(q) and a search that starts at an initial vector \( q_0 \), moves through a series of vectors \( q_k \), and converges to a solution at \( q_{min} \).

Newton’s method is a three-step process:
  1. Compute the search direction
  2. Determine the length of the next step
  3. Use the results of steps 1 and 2 to obtain a new point \( q_{k+1} \)

These steps are repeated until a minimum is found.

To find the search direction (Step 1) using Newton’s method, consider a vector, q, located near the current value \( q_k \). The function f(q) can be approximated by a second-order Taylor series expansion about \( q_k \):
$$ f(q)=f\left({q}_k\right)+\varDelta f{\left({q}_k\right)}^T\left(q-{q}_k\right)+\frac{1}{2}{\left(q-{q}_k\right)}^TB\left({q}_k\right)\left(q-{q}_k\right)+\dots $$
where \( \varDelta f\left({q}_k\right) \) is the gradient of f at the current value \( q_k \) and B is the Hessian, the matrix of second partial derivatives of f, at \( q_k \).
Taking the derivative of the function in Eq. 11 with respect to q and ignoring the terms beyond second order (e.g., derivatives of the Hessian), we obtain:
$$ \varDelta f(q)=\varDelta f\left({q}_k\right)+B\left({q}_k\right)\left(q-{q}_k\right) $$
The approximated function has a stationary point, which is a minimum when B is positive definite, where this derivative is zero:
$$ 0=\varDelta f\left({q}_k\right)+B\left({q}_k\right)\left(q-{q}_k\right) $$
and thus the search direction, \( \left(q-{q}_k\right) \), can be obtained from:
$$ \left(q-{q}_k\right)=-{B}^{-1}\left({q}_k\right)\varDelta f\left({q}_k\right) $$

After solving for the search direction, \( \left(q-{q}_k\right) \), the next point in the search, \( q_{k+1} \), is found by moving in the direction of \( \left(q-{q}_k\right) \). Ideally, the step size is determined by the magnitude of the eigenvalues of the movement to ensure that we obtain a sufficient decrease in the cost function, without taking excessively small steps. In practice, step sizes that have worked for previous data sets are assumed to be sufficient. Once \( q_{k+1} \) is obtained, it is checked against a termination criterion (for example, whether \( \left(q-{q}_k\right) \) is small). If the termination criterion is satisfied, then a minimum for the global IK problem has been found. If the criterion is not met, the process is repeated, beginning at step 1 with \( q_{k+1} \) acting as the new current value \( q_k \).
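
The three steps can be sketched for the simplest possible case: a planar segment attached to ground by a hinge and tracked by one target at distance `a` from the joint, so that the pose is a single angle. This toy problem, the function names, and the fixed step length of 1 are assumptions made for illustration and are not taken from the chapter; a practical IK solver works with a vector of generalized coordinates and a matrix Hessian.

```python
import math


def cost(theta, a, p):
    """Half the squared distance between the rotated target and its measurement p."""
    ex = a * math.cos(theta) - p[0]
    ey = a * math.sin(theta) - p[1]
    return 0.5 * (ex * ex + ey * ey)


def gradient(theta, a, p):
    """First derivative of cost with respect to theta."""
    return a * (p[0] * math.sin(theta) - p[1] * math.cos(theta))


def hessian(theta, a, p):
    """Second derivative of cost with respect to theta (a 1 x 1 Hessian)."""
    return a * (p[0] * math.cos(theta) + p[1] * math.sin(theta))


def newton(theta0, a, p, tol=1e-10, max_iter=50):
    """Iterate from the seed theta0 until the step is smaller than tol."""
    theta = theta0
    for k in range(max_iter):
        g = gradient(theta, a, p)
        b = hessian(theta, a, p)
        if b <= 0.0:
            # The quadratic model has no minimum here: the seed is too far away.
            raise ValueError("Hessian is not positive at the current estimate")
        step = -g / b            # steps 1 and 2: direction, with step length 1
        theta += step            # step 3: new point q_{k+1}
        if abs(step) < tol:      # termination criterion: (q - q_k) is small
            return theta, k + 1
    raise RuntimeError("no convergence within max_iter iterations")


a = 0.3                                           # target distance from the joint
p = (a * math.cos(0.5), a * math.sin(0.5))        # measurement for a true angle of 0.5 rad
theta, n_iter = newton(0.2, a, p)                 # seed close to the true angle
print(theta, n_iter, cost(theta, a, p))
```

Starting the same search from a seed of 2.5 rad raises the `ValueError`, a small-scale illustration of how the outcome of a direction search depends on the initial estimate.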

Ideally, \( \varDelta f\left({q}_k\right) \) and B (the Hessian) are derived symbolically, but this is not always straightforward. Furthermore, even if the symbolic version of the Hessian is derived, computing the inverse of the Hessian, \( B^{-1} \), requires a system of linear equations to be solved, which can be computationally costly. In an alternative approach, called the quasi-Newton method, \( \varDelta f\left({q}_k\right) \) and B are approximated numerically from the change in the gradient between steps. Several methods of approximation have been proposed that all follow three primary assumptions:
  1. The Hessian must be symmetrical
  2. The model gradient must be equal to the function gradient at the current step and at the previous step
  3. The Hessian cannot change drastically between successive steps
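
As an illustration of how these three assumptions translate into an update rule, one widely used quasi-Newton scheme is the BFGS update. It is given here as an example only, not as the method used in any particular IK implementation, and the symbols \( s_k \) and \( y_k \) are introduced for this sketch. With the step \( {s}_k={q}_{k+1}-{q}_k \) and the gradient change \( {y}_k=\varDelta f\left({q}_{k+1}\right)-\varDelta f\left({q}_k\right) \), the second assumption becomes the secant condition:
$$ {B}_{k+1}{s}_k={y}_k $$
The BFGS update satisfies this condition, keeps the approximation symmetric, and changes it only by a low-rank correction:
$$ {B}_{k+1}={B}_k+\frac{y_k{y}_k^T}{y_k^T{s}_k}-\frac{B_k{s}_k{s}_k^T{B}_k}{s_k^T{B}_k{s}_k} $$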


The consequence of these assumptions is that convergence may be compromised. Unlike the 6DOF least-squares solution, the IK optimization has many possible solutions because the solution space typically has many local minima. If the initial estimated position, \( q_0 \), is “close” to the global minimum, the algorithm will likely converge to the correct solution. The initial estimated position, or “seed,” is therefore critical to the success of the algorithm. For the first frame of data, it is possible to use a 6DOF solution as the seed. For subsequent frames, the seed for the optimization algorithm at any given frame is the state of the model at the previous frame. This could be problematic if the solution at the previous frame was an inappropriate local minimum, because subsequent pose estimates are then held in this local minimum and diverge from the real solution. For example, the data collection volumes of most optical MoCap systems are smaller than the laboratory that they are in, and subjects often begin their movements outside the volume (for example, to ensure that they are at a constant speed while walking or running through the data collection volume). The first frame with complete data can often be relatively unreliable because it is captured near the edge of the calibrated volume, and therefore the likelihood of the optimization solution becoming trapped in a local minimum increases. To avoid this, one potential improvement to the algorithm is to compute the solution both forward and backward, in the hope that one of the passes will provide a better solution path.
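
The seeding scheme and the forward and backward passes can be sketched as follows. Everything in this block is hypothetical: `solve_frame` stands for any local IK solver that takes a frame and a seed and returns a pose with its residual, `toy_solver` is a stand-in that simply picks the stored candidate minimum nearest to the seed, and choosing the pass with the smaller total residual is one possible selection rule among several.

```python
def track(frames, first_seed, solve_frame):
    """Solve the frames in order, seeding each frame with the previous solution."""
    poses, residuals = [], []
    seed = first_seed
    for frame in frames:
        pose, residual = solve_frame(frame, seed)
        poses.append(pose)
        residuals.append(residual)
        seed = pose                      # the state at this frame seeds the next
    return poses, residuals


def track_both_ways(frames, seed_first, seed_last, solve_frame):
    """Run a forward and a backward pass and keep the one with the smaller total residual."""
    fwd_poses, fwd_res = track(frames, seed_first, solve_frame)
    bwd_poses, bwd_res = track(frames[::-1], seed_last, solve_frame)
    bwd_poses, bwd_res = bwd_poses[::-1], bwd_res[::-1]   # restore time order
    if sum(bwd_res) < sum(fwd_res):
        return bwd_poses, bwd_res
    return fwd_poses, fwd_res


def toy_solver(frame, seed):
    """Stand-in local solver: each frame lists (pose, residual) candidate minima,
    and the search ends in whichever candidate lies closest to the seed."""
    return min(frame, key=lambda candidate: abs(candidate[0] - seed))


# Each frame has a correct minimum (residual 0.0) and a spurious one (residual 0.5).
frames = [
    [(0.0, 0.0), (1.0, 0.5)],
    [(0.1, 0.0), (1.1, 0.5)],
    [(0.2, 0.0), (1.2, 0.5)],
]
# A poor seed at the first frame traps the forward pass; the last frame has a better seed.
poses, residuals = track_both_ways(frames, seed_first=0.9, seed_last=0.3,
                                   solve_frame=toy_solver)
print(poses, residuals)
```

In this toy data the forward pass stays in the spurious minimum for every frame, whereas the backward pass, started from a better seed at the last frame, follows the correct minimum throughout.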

Global Search Methods: Simulated Annealing

Simulated annealing (Higginson et al. 2005; Ingber 2012) is a Monte Carlo method in which the solution space is explored probabilistically by randomly searching near the best known solution. Simulated annealing is not prone to becoming trapped in a local minimum and therefore, given “enough” computing time (unfortunately, “enough” cannot be calculated but needs to be learned from experience), finds the global minimum. It is modeled after annealing in metallurgy, in which the thermodynamic free energy of a metal decreases as it cools. In simulated annealing, as the virtual temperature is lowered, the algorithm searches in a smaller and smaller region around the best known solution (Fig. 4).
Fig. 4

Flowchart of the simulated annealing algorithm. The size of the perturbation is based on the current temperature and the Metropolis criterion: \( \mathrm{Rand}\left(0,1\right)<{e}^{-\frac{f_i-{f}_{\mathrm{best}}}{T}} \)

Simulated annealing relies on two nonobvious principles:
  1. Some new values that do not reduce the cost function are accepted so that more of the solution space can be explored. (The accepted values are determined by the Metropolis criterion.)
  2. After making many estimates, and observing that the cost function declines slowly, one lowers the temperature and thus limits how far above the current minimum an accepted value can be. After lowering the temperature several times, only better values are accepted, and the optimization approaches the global minimum.


One of the biggest challenges to simulated annealing is that the algorithm is computationally expensive and, perhaps more problematically, it is not possible to determine whether the current solution is actually a global minimum without continuing the optimization indefinitely. In other words, there is no threshold or criterion for identifying that the search is complete. The user must decide how many iterations to perform in the optimization and accept that the minimum found in that time period may not be the global minimum. Despite its computational cost (time), simulated annealing is more robust than direction search algorithms. Nevertheless, most IK users opt for direction search algorithms because of time constraints.
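
The two principles and the fixed iteration budget can be sketched on a one-dimensional toy cost with several local minima. The cost function, the cooling schedule, and all parameter values are assumptions chosen for illustration. The acceptance test compares each candidate with the currently accepted value using exp(−(f_new − f_current)/T), the common textbook form of the Metropolis rule, which may differ in detail from the flowchart in Fig. 4.

```python
import math
import random


def cost(x):
    """Toy multimodal cost: global minimum f = 0 at x = 0, local minima elsewhere."""
    return x * x + 4.0 * (1.0 - math.cos(3.0 * x))


def simulated_annealing(f, lo, hi, x0, t0=10.0, t_min=1e-3, cooling=0.9,
                        iters_per_temp=300, seed=0):
    """Minimize f on [lo, hi]; returns the best point found and its cost."""
    rng = random.Random(seed)            # fixed seed so that runs are repeatable
    x_cur, f_cur = x0, f(x0)
    x_best, f_best = x_cur, f_cur
    t = t0
    while t > t_min:                     # the budget is set by the user, not by a test
        radius = (hi - lo) * t / t0      # search region shrinks as the temperature cools
        for _ in range(iters_per_temp):
            x_new = rng.uniform(max(lo, x_cur - radius), min(hi, x_cur + radius))
            f_new = f(x_new)
            # Metropolis rule: always accept improvements, and sometimes accept
            # worse values, less often as the temperature drops.
            if f_new <= f_cur or rng.random() < math.exp(-(f_new - f_cur) / t):
                x_cur, f_cur = x_new, f_new
                if f_cur < f_best:
                    x_best, f_best = x_cur, f_cur
        t *= cooling
    return x_best, f_best


x_best, f_best = simulated_annealing(cost, -5.0, 5.0, x0=4.0)
print(x_best, f_best)
```

The loop ends when the temperature schedule is exhausted, not when optimality has been established, which mirrors the point made above: the returned value is the best minimum found within the chosen budget.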

6DOF Versus IK

In many circumstances, the IK solution is likely to be more anatomically congruent and therefore preferable to the 6DOF solution, but the user must verify that the selected joint constraints are appropriate. For example, an experiment focused on understanding the kinematics of an injured knee, where translations and rotations occur as a result of the injury (e.g., anterior cruciate ligament damage), would likely not benefit from an IK solution in which the constraints, and the consequent prescribed motion of the knee joint, “hide” the pathology. Finally, it is well known that residual errors, i.e., differences between model predictions and marker measurements, computed by IK algorithms are reflections of noise in the marker data, soft tissue artifact, and inaccurate marker placement. A limitation of the IK algorithm, however, is that it has no straightforward mechanism to compensate for systematic noise, even though it can be used to identify its presence.

Future Directions

In this chapter, we have described the current state of deterministic pose estimation algorithms.

The scope for further evolution of deterministic algorithms is quite limited. Begon et al. (2016), for example, have introduced an approach that removes STA without modeling it, by ignoring marker information that is considered unreliable. For many segments of the human body, STA has a particularly disastrous effect on the axial rotation of the segment. In other words, the markers rotate about the long axis of the segment (upper arm, forearm, and thigh, to name a few). Begon’s solution was to ignore any information in the markers that would reflect axial rotation by projecting tracking markers onto the long axis of the segment. These projected markers influence only five of the degrees of freedom of a segment. The long axis rotation is then estimated from the pose of the adjacent constrained segment. The example given by Begon is movement of the upper arm, in which the elbow joint is constrained to have only two rotational degrees of freedom, and therefore the axial rotation of the upper arm is based on the pose of the forearm. There is some potential for improvements to deterministic pose estimation algorithms based on similarly clever rejection of data in isolated/idiosyncratic cases.

It is our belief that the future of marker-based pose estimation lies not in deterministic algorithms but in algorithms based on Bayesian Inference (Todorov 2007) (chapter  “3D Dynamic Probabilistic Pose Estimation from Data Collected Using Cameras and Reflective Markers”) and algorithms based on optimal control theory (Miller and Hamill 2015) ( “Optimal Control Modeling of Human Movement”). Bayesian Inference allows a principled way to mitigate the effects of STA by modeling the artifact and removing it. Optimal control theory is capable of generating motion independently of any recorded data by simulating the behavior according to some optimization criterion (e.g., minimum energy). The technique can be influenced by recorded data to ensure that the pose estimation is arbitrarily close to the recorded motion. Optimal control theory has the additional benefit of being able to generate solutions for unobservable, and even sparse, marker sets.

Lastly, it is important to consider algorithms for which the soft tissue artifact is considered important data reflective of an individual subject instead of an artifact to be removed. Michael Black’s laboratory at the Max Planck Institute for Intelligent Systems has been developing pose estimation algorithms based on statistical shape models (Loper et al. 2015). Instead of defining pose based on the position and orientation of an underlying skeleton, this research has focused on modeling the surface geometry of the subject and estimating the pose of the surface. Based on high-density surface scans of subjects performing movement, the statistical shape model is a parameterized surface that can subsequently be fit to sparse surface data (e.g., markers). These models are remarkably good at representing the surface of the body during motion. From a biomechanics perspective, a fundamental question is whether we can infer the multibody skeletal pose from this parameterized surface data.



  1. Andriacchi TP, Alexander EJ, Toney MK, Dyrby C, Sum J (1998) A point cluster method for in vivo motion analysis: applied to a study of knee kinematics. J Biomech Eng 120:743–749
  2. Begon M, Bélaise C, Naaim A, Lundberg A, Chèze L (2016) Multibody kinematics optimization with marker projection improves the accuracy of the humerus rotational kinematics. J Biomech (16):31111–31113
  3. Cappozzo A, Catani F, Della Croce U, Leardini A (1995) Position and orientation in space of bones during movement: anatomical definition and determination. Clin Biomech 10(4):171–178
  4. Cappozzo A, Catani F, Leardini A, Benedetti MG, Della Croce U (1996) Position and orientation in space of bones during movement: experimental artefacts. Clin Biomech 11(2):90–100
  5. Cappozzo A, Cappello A, Della Croce U, Pensalfini F (1997) Surface-marker cluster design criteria for 3-D bone movement reconstruction. IEEE Trans Biomed Eng 44(12):1165–1174
  6. Cereatti A, Della Croce U, Cappozzo A (2006) Reconstruction of skeletal movement using skin markers: comparative assessment of bone pose estimators. J Neuroeng Rehabil 3(1):7
  7. Higginson JS, Neptune RR, Anderson FC (2005) Simulated parallel annealing within a neighborhood for optimization of biomechanical systems. J Biomech 38:1938–1942
  8. Ingber L (2012) Adaptive simulated annealing. In: Oliveira H, Petraglia A, Ingber L, Machado M, Petraglia M (eds) Stochastic global optimization and its applications with fuzzy adaptive simulated annealing. Springer, New York, pp 33–61
  9. Kepple T, Stanhope S (2000) Moved software. In: Winters, Crago (eds) Biomechanics and neural control of posture and movement. Springer, New York
  10. Loper M, Mahmood N, Romero J, Pons-Moll G, Black MJ (2015) SMPL: a skinned multi-person linear model. ACM Trans Graph 34(6):248:1–248:16
  11. Lu TW, O’Connor JJ (1999) Bone position estimation from skin marker co-ordinates using global optimization with joint constraints. J Biomech 32:129–134
  12. Miller R, Hamill J (2015) Optimal footfall patterns for cost minimization in running. J Biomech 48:2858–2864
  13. Schulz BW, Kimmel WL (2010) Can hip and knee kinematics be improved by eliminating thigh markers? Clin Biomech 25:687–692
  14. Spoor C, Veldpaus F (1980) Rigid body motion calculated from spatial coordinates of markers. J Biomech 13(4):391–393
  15. Todorov E (2007) Probabilistic inference of multijoint movements, skeletal parameters and marker attachment from diverse motion capture data. IEEE Trans Biomed Eng 54:1927–1939
  16. Van Den Bogert AJ, Su A (2008) A weighted least squares method for inverse dynamic analysis. Comput Methods Biomech Biomed Eng 11(1):3–9
  17. Weinhandl JT, Armstrong BSR, Kusik TP, Barrows RT, O’Connor KM (2010) Validation of a single camera three-dimensional motion tracking system. J Biomech 43(7):1437–1440
  18. Yeadon MR (1984) The mechanics of twisting somersaults. Doctoral thesis. University of Calgary

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. HAS-Motion, Inc., Kingston, Canada
  2. C-Motion Inc., Germantown, USA

Section editors and affiliations

  • William Scott Selbie (1)
  1. Research, C-Motion, Inc., Germantown, USA
