Taylor's Method

What is it?

It's a method for recovering the 3D configuration of an articulated object, such as the human body, from its 2D projection.

The only assumption we make is that the 3D → 2D projection can be roughly modeled by a scaled orthographic projection.

Given this assumption, only three sources of information are needed:

• the 2D projection (2D position) of some points of the 3D model
• the lengths l of the segments of the 3D model (for the human body: the limb lengths)
• the scale s of the scaled orthographic projection

Who invented it?

The idea was presented by Camillo J. Taylor in his paper

which appeared in the Journal of Computer Vision and Image Understanding, Vol.80, pp. 677-684, 2000.

When is a scaled orthographic projection a good assumption?

3D to 2D projections are divided into

• parallel projections, where the direction of projection (DOP) is the same for all 3D points
    • in orthographic projections the DOPs are perpendicular to the image plane
    • in oblique projections the DOPs are not perpendicular to the image plane
• perspective projections, where the DOPs are not the same for all 3D points

In general 2D images of the 3D world are perspective projections: objects become smaller as they get further from the camera.

In a parallel projection, two objects of the same 3D size are projected to the same 2D size regardless of their distance to the camera.

But if the depth extent of the object of interest (its size along the viewing direction) is small compared to the camera ↔ object distance, the scaled orthographic projection is a good approximation of the perspective one. This is why it is also called weak perspective projection.
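To get a feeling for how good the approximation is, here is a small sketch that projects the same object once with a pinhole (perspective) model and once with a scaled orthographic model, and compares the results. The focal length f, the choice of the mean depth as reference depth, and the sample points are assumptions made for this illustration only.

```python
import numpy as np

def perspective(points, f):
    """Pinhole projection: u = f*x/z, v = f*y/z (each point has its own depth z)."""
    points = np.asarray(points, dtype=float)
    return f * points[:, :2] / points[:, 2:3]

def scaled_orthographic(points, f):
    """Weak perspective: one common scale s = f / mean depth for all points."""
    points = np.asarray(points, dtype=float)
    s = f / points[:, 2].mean()
    return s * points[:, :2]

def relative_deviation(points, f=1.0):
    """Largest difference between both projections, relative to the projected object size."""
    p = perspective(points, f)
    w = scaled_orthographic(points, f)
    size = np.ptp(w, axis=0).max()
    return float(np.abs(p - w).max() / size)

# Hypothetical object that is 0.5 units deep along the viewing direction
shape = np.array([[0.0, 0.0, -0.25],
                  [0.3, 0.5,  0.00],
                  [0.6, 1.0,  0.25]])

near = shape + [0.0, 0.0, 2.0]    # object centred 2 units from the camera
far = shape + [0.0, 0.0, 20.0]    # object centred 20 units from the camera

print("relative deviation, near:", relative_deviation(near))
print("relative deviation, far: ", relative_deviation(far))
```

With the same object depth, the deviation between the two projections shrinks as the camera ↔ object distance grows.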

Here is a sample of a 3D pose (markers: light blue) projected to a virtual image plane (the transparent rectangle) by a scaled orthographic projection. The 2D pose markers are colored in dark blue:

How does it work?

Assume that the projection from 3D to 2D can be modeled roughly by a scaled orthographic projection and that

• we know the 2D positions (u,v) in the image plane of some of the 3D model points (x,y,z) with u = s*x+dx, v=s*y+dy
• we know the relative lengths of the model segments
• we have an approximation for the scale s of the scaled orthographic projection

Suppose we know the 2D start point (u1,v1) and end point (u2,v2) of a model segment, and therefore its projected 2D length l', as well as its 3D length l. Comparing l' with l tells us how strongly the segment is foreshortened. Given the scale s of the projection, this foreshortening constrains the offset dz=z1-z2 between the corresponding 3D points (x1,y1,z1) and (x2,y2,z2) up to a sign.

Since we cannot tell which of the two points has the smaller z-coordinate, this sign ambiguity remains. It's the information loss inherent in a 3D to 2D projection.
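As a minimal sketch of this step: the function below computes the magnitude of dz for a single segment from its two 2D end points, its 3D length l and the scale s. It undoes the scaling of the projected length and then applies the Pythagorean theorem. The function name and the error handling are my own choices for illustration.

```python
import math

def depth_offset(p1, p2, length, scale):
    """Return |dz| for one segment.

    p1, p2 : 2D end points (u, v) of the segment in the image plane
    length : 3D length l of the segment
    scale  : scale s of the scaled orthographic projection
    """
    (u1, v1), (u2, v2) = p1, p2
    # Squared length of the segment's x/y part, i.e. projected length with the scale undone
    projected_sq = ((u1 - u2) ** 2 + (v1 - v2) ** 2) / scale ** 2
    radicand = length ** 2 - projected_sq
    if radicand < 0:
        # The projection is longer than the segment itself: l or s must be wrong
        raise ValueError("projected segment is longer than its 3D length")
    return math.sqrt(radicand)

# 3D segment from (0, 0, 0) to (3, 0, 4): length 5, projected with s = 2
print(depth_offset((0.0, 0.0), (6.0, 0.0), length=5.0, scale=2.0))  # 4.0
```

The function returns only the magnitude. Whether dz is +4 or -4 cannot be decided from this information.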

The maths of Taylor's method in a nutshell

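Oops! The last formula is wrong. The term subtracted below the square root has to be ((u1-u2)^2 + (v1-v2)^2) / s^2, so the result reads dz = ± sqrt( l^2 - ((u1-u2)^2 + (v1-v2)^2) / s^2 ).

There has to be a + and not a - between (u1-u2)^2 and (v1-v2)^2.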
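For reference, here is the derivation written out step by step. It is reconstructed from the definitions used in this text (u = s*x+dx, v = s*y+dy, segment length l), not copied from the paper, so treat the notation as an assumption.

```latex
\begin{align}
l^2 &= (x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2 \\
u_1 - u_2 &= s\,(x_1 - x_2), \qquad v_1 - v_2 = s\,(y_1 - y_2) \\
dz^2 = (z_1 - z_2)^2 &= l^2 - \frac{(u_1 - u_2)^2 + (v_1 - v_2)^2}{s^2} \\
dz &= \pm \sqrt{\, l^2 - \frac{(u_1 - u_2)^2 + (v_1 - v_2)^2}{s^2} \,}
\end{align}
```

The offsets dx and dy cancel in the differences u1-u2 and v1-v2, which is why they do not appear in the result.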

Do we get a unique result?

No! Unfortunately not…

Even if we know

• the scale of the scaled orthographic projection
• and the true bone lengths

there is still a huge set of possible poses that fit Taylor's equations.

Reason: since we have the +/- sign ambiguity for each bone of the body model (for each bone: 2 possible values of dz), a body model with N bones has 2^N possible poses.

In the following animation I rendered some of them to show how ambiguous the result still is: all the displayed 3D poses are projected to the same 2D pose!
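The 2^N count can be reproduced with a few lines of code. The sketch below enumerates every sign choice for a simple chain of bones and accumulates the z-coordinates of the joints. The chain layout, the root depth of 0 and the |dz| values are made up for illustration. A real body model is a tree rather than a chain, but the count is the same.

```python
from itertools import product

def candidate_depths(abs_dz, root_z=0.0):
    """Yield the joint z-coordinates of a bone chain for every +/- sign choice.

    abs_dz : list with |dz| for each bone, ordered from the root outwards
    root_z : depth assigned to the root joint (arbitrary reference)
    """
    for signs in product((+1, -1), repeat=len(abs_dz)):
        z = [root_z]
        for sign, dz in zip(signs, abs_dz):
            z.append(z[-1] + sign * dz)
        yield z

# Hypothetical chain with N = 3 bones
poses = list(candidate_depths([4.0, 1.5, 2.0]))
print(len(poses))  # 8 = 2^3
for z in poses:
    print(z)
```

Only the z-coordinates differ between the candidates. The x- and y-coordinates are fixed by the 2D positions and the scale, so every candidate projects to the same 2D pose. If a bone happens to have dz = 0, its two sign choices coincide and the number of distinct poses is smaller.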

What happens if we estimate the bone length or the scale wrong?

Then the dz value will be wrong!

Consider the difference below the square root.

Bone length:

• if the bone length l is estimated too big ⇒ minuend will be too big ⇒ dz will be too big
• if the bone length l is estimated too small ⇒ minuend will be too small ⇒ dz will be too small

Scale:

• if the scale s is estimated too big ⇒ subtrahend will be too small ⇒ dz will be too big
• if the scale s is estimated too small ⇒ subtrahend will be too big ⇒ dz will be too small
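These four cases can be checked numerically. The sketch below uses one made-up segment (true length 5, true scale 2, projected length 6, so the true |dz| is 4) and perturbs l and s by 10% in each direction. All numbers are assumptions chosen for illustration.

```python
import math

def abs_dz(l, l_proj, s):
    """|dz| from 3D length l, projected 2D length l_proj and scale s."""
    radicand = l ** 2 - (l_proj / s) ** 2
    # A negative radicand means the estimates are inconsistent with the image
    return math.sqrt(radicand) if radicand >= 0 else float("nan")

L_TRUE, S_TRUE, L_PROJ = 5.0, 2.0, 6.0

cases = [
    ("true values", L_TRUE, S_TRUE),
    ("l too big",   L_TRUE * 1.1, S_TRUE),
    ("l too small", L_TRUE * 0.9, S_TRUE),
    ("s too big",   L_TRUE, S_TRUE * 1.1),
    ("s too small", L_TRUE, S_TRUE * 0.9),
]

for label, l, s in cases:
    print(f"{label:12s} |dz| = {abs_dz(l, L_PROJ, s):.3f}")
```

If l or s is underestimated far enough, the difference below the square root becomes negative and no real dz exists at all. The sketch reports this case as nan.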

Underestimating the scale s ⇒ dz values get too small (same effect as underestimating the bone lengths):

Overestimating the scale s ⇒ dz values get too big (same effect as overestimating the bone lengths): 