Learning an Affine 2D Transformation

Important

The material covered in this exercise forms the basis for the mandatory assignment, but involves different data. We highly recommend that you read all of the material carefully before attempting to solve the exercise and that you complete this exercise before solving the assignment.

Figure 1: Naming the planes and transformation

Assume that an affine mapping is sufficient to transform image coordinates of a person walking in the atrium at ITU to an overview map (image) of the ground-floor of ITU. The goal of this exercise is to learn the coefficients of the affine mapping from training data and then map the tracking coordinates of the person in the video, given in the file trackingdata.dat, to the overview map.

As shown in Figure 1, denote

$I$: an image in the video ITUStudent.mov (for clarity $I = I_t$ where $t$ is time).
$G$: the ground-floor in $I$
$M$: overview map (seen from above) of the ground-floor plane of the ITU building that appears in $I$
${T}_G^M$: the affine transformation from the ground-floor $G$ to the overview map $M$. Note that the subscripts define where the transformation maps from and to.

The goal is to learn the coefficients of the affine transformation and use it to map the tracking data to the overview map.

Overall, your tasks are to:

Learn the coefficients of the affine transformation (the entries of the transformation matrix) ${T}_G^M$.
Apply the mapping ${T}_G^M$ to transform the tracking data from the ground-floor $G$ to the overview map $M$.
Display the location of the person on the ground-floor on the overview map.

Completing these steps will potentially allow you to compute metric measurements such as distances, speed and acceleration of the tracked person.

Overview

Figure 2: The three sub-regions of the person being tracked.

The data is provided in the data/ folder. The 3 pairs of corresponding points are the training data for learning the affine mapping, and the tracking data is the test data. The tracking data is read from trackingdata.dat, where each row $r_i$ contains the tracking coordinates for frame $i$ in the video ITUStudent.mov.

As shown in Figure 2, each rectangle is defined by the coordinate pairs $(x_1, y_1)$ and $(x_2, y_2)$, corresponding to the top-left and bottom-right corners of the rectangle. The rectangles for specific body parts are stored in a dictionary data that has the keys "body", "legs", and "all". Use these keys to access the arrays of the corresponding body parts.
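The access pattern can be sketched as follows. This is a minimal illustration that assumes the dictionary layout described above; the array values are made up and are not taken from trackingdata.dat.

```python
import numpy as np

# Hypothetical stand-in for the tracking dictionary: one row per frame,
# each row holding (x1, y1, x2, y2) for the top-left and bottom-right corners.
data = {
    "body": np.array([[10.0, 20.0, 30.0, 60.0]]),
    "legs": np.array([[12.0, 60.0, 28.0, 90.0]]),
    "all": np.array([[10.0, 20.0, 30.0, 90.0]]),
}

legs = data["legs"]              # N x 4 array with the "legs" rectangles
x1, y1, x2, y2 = legs[0]         # rectangle in the first frame
width, height = x2 - x1, y2 - y1
print(width, height)
```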

List of individual tasks

Task 1: Constructing the design matrix
Task 2: Implement the model
Task 3: Defining the affine transformation
Task 4: Affine transformation of points
Task 5: Plot results

Background on Affine mappings

An affine mapping $T$ is given by

$$ T = \begin{bmatrix} w_1 & w_2 & w_3 \\ w_4 & w_5 & w_6 \\ 0 & 0 & 1 \end{bmatrix}, $$

where $T$ transforms points between the planes by $p^{\prime} = T p$. Note that $p$ and $p^{\prime}$ are 2D homogeneous coordinates.

Define $\mathbf{w}$ as the vector of coefficients of $T$:

$$ \mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ w_3\\ w_4 \\ w_5 \\ w_6 \end{bmatrix}. $$

The goal is to linearly estimate $\mathbf{w}$ given corresponding point pairs from $G$ and $M$.
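To make the relation between $\mathbf{w}$, $T$ and $p^{\prime} = T p$ concrete, here is a small numerical sketch. The coefficient values and the point are arbitrary choices for illustration, not values from the exercise data.

```python
import numpy as np

# Hypothetical coefficients w_1..w_6, chosen only for illustration.
w = np.array([2.0, 0.0, 5.0, 0.0, 3.0, -1.0])

# Arrange the coefficients as the 3x3 matrix T; the last row is fixed to [0, 0, 1].
T = np.vstack((w.reshape(2, 3), [0.0, 0.0, 1.0]))

p = np.array([1.0, 2.0, 1.0])    # the point (1, 2) in homogeneous coordinates
p_prime = T @ p                  # p' = T p
print(p_prime)                   # [7. 5. 1.]
```

The third component of the result stays 1 because the last row of $T$ is $[0, 0, 1]$, so the first two components can be read directly as the transformed point.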

Task 1: Constructing the design matrix

This task is divided into

(1) a step showing the linear relationship between a single input-output pair yielding the first rows of the design matrix and
(2) a step extending this to multiple input-output point pairs.

Given an input $p=\begin{bmatrix} x \\ y \\1\end{bmatrix}$ and output $p'=\begin{bmatrix} x' \\ y' \\1\end{bmatrix}$, explain how the knowns and unknowns of the affine mapping $T$ can be represented in the form $A_1\mathbf{w}=b_1$, with
$$ A_1=\begin{bmatrix} x & y & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x & y & 1 \\ \end{bmatrix},\quad \mathbf{w}= \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \\ w_5 \\ w_6 \end{bmatrix} ,\quad b_1=\begin{bmatrix} x^{\prime} \\ y^{\prime} \\ \end{bmatrix}. $$
where the knowns are located in $A_1$ and the unknowns in $\mathbf{w}$.

This means that each pair of corresponding points yields two equations (two rows) in the design matrix $A$, and that $\mathbf{w}$ contains the model parameters.
Argue why at least $3$ pairs of corresponding points are required to solve for the unknowns $\mathbf{w}$.
Show that the unknowns $\mathbf{w}$ can be found through $\mathbf{w} = A^{-1}b$, where $A$ is the design matrix, containing only input values, and $b$ is the vector containing only output values.
$$ A = \begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_1 & y_1 & 1 \\ x_2 & y_2 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_2 & y_2 & 1 \\ x_3 & y_3 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_3 & y_3 & 1 \end{bmatrix}, b = \begin{bmatrix} x_1^{\prime} \\ y_1^{\prime} \\ x_2^{\prime} \\ y_2^{\prime} \\ x_3^{\prime} \\ y_3^{\prime} \end{bmatrix}. $$
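The structure of $A$ and $b$ can be checked numerically. The sketch below uses made-up input points and coefficients, and the helper design_rows is a hypothetical name introduced here. It uses np.linalg.solve in place of an explicit inverse, which gives the same $\mathbf{w}$ when $A$ is square and invertible.

```python
import numpy as np


def design_rows(x, y):
    """Two design-matrix rows for one input point (hypothetical helper)."""
    return np.array([[x, y, 1, 0, 0, 0],
                     [0, 0, 0, x, y, 1]], dtype=float)


# Made-up coefficients and input points, for illustration only.
w_true = np.array([2.0, 0.5, 5.0, -1.0, 3.0, -1.0])
inputs = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])

# Stack two rows per point; b holds the matching outputs x', y'.
A = np.vstack([design_rows(x, y) for x, y in inputs])
b = A @ w_true

# Two pairs give only 4 equations for 6 unknowns;
# three non-collinear pairs give 6 independent equations.
rank_two_pairs = np.linalg.matrix_rank(A[:4])
rank_three_pairs = np.linalg.matrix_rank(A)

# With a square, full-rank A the coefficients follow from A w = b.
w_est = np.linalg.solve(A, b)
print(rank_two_pairs, rank_three_pairs)
print(w_est)
```

With only two pairs the rank is 4, so two of the six coefficients remain undetermined; the third pair raises the rank to 6 as long as the three input points do not lie on a line.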

Loading the data

A pre-selected set of 3 point pairs is loaded:

# importing needed libraries
import matplotlib.pyplot as plt
import numpy as np
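import skimage.io  # import the io submodule explicitly; older scikit-image versions do not load it with a bare `import skimage`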

#### loading data
points_source = np.load('data/points_source.npy')
points_destination = np.load('data/points_destination.npy')

Visualizing the data

The 3 pairs of points are visualized in the following cell:

ITU_image = skimage.io.imread('data/image_ground.jpg')
ITU_RGB = ITU_image[:, :, :3]
plt.figure(figsize=(14, 8))
ITU_RGB[int(points_source[0, 1]) - 5:int(points_source[0, 1]) + 5,
int(points_source[0, 0]) - 5:int(points_source[0, 0]) + 5] = np.array([255, 0, 0])
ITU_RGB[int(points_source[1, 1]) - 5:int(points_source[1, 1]) + 5,
int(points_source[1, 0]) - 5:int(points_source[1, 0]) + 5] = np.array([0, 255, 0])
ITU_RGB[int(points_source[2, 1]) - 5:int(points_source[2, 1]) + 5,
int(points_source[2, 0]) - 5:int(points_source[2, 0]) + 5] = np.array([0, 0, 255])
plt.imshow(ITU_RGB)

ITU_Map = plt.imread('data/ITUMap.png')
plt.figure(figsize=(14, 8))
Map_RGB = ITU_Map[:, :, :3]
Map_RGB[int(points_destination[0, 1]) - 5:int(points_destination[0, 1]) + 5,
int(points_destination[0, 0]) - 5:int(points_destination[0, 0]) + 5] = np.array([1, 0, 0])
Map_RGB[int(points_destination[1, 1]) - 5:int(points_destination[1, 1]) + 5,
int(points_destination[1, 0]) - 5:int(points_destination[1, 0]) + 5] = np.array([0, 1, 0])
Map_RGB[int(points_destination[2, 1]) - 5:int(points_destination[2, 1]) + 5,
int(points_destination[2, 0]) - 5:int(points_destination[2, 0]) + 5] = np.array([0, 0, 1])
plt.imshow(Map_RGB);

### We can also visualize the points 
plt.figure(figsize=(14, 4))
plt.subplot(1, 2, 1) 
plt.plot(points_source[:, 0], points_source[:, 1], 'r*')
plt.ylim(256, 0)
plt.xlim(0, 320)
plt.title('The 3 source points')

plt.subplot(1, 2, 2)
plt.plot(points_destination[:, 0], points_destination[:, 1], 'b*')
plt.ylim(348, 0)
plt.xlim(0, 800)
plt.title('The 3 destination points')

Learning the affine model

The following tasks will create the calc_affine() function, which estimates the model parameters $\mathbf{w}$ of the affine model $T$ using $P^{G}$ and $P^{M}$ as inputs, where $P^{G}$ and $P^{M}$ are the points in the ground plane $G$ (points_source) and the overview map $M$ (points_destination).

Task 2: Implement the model

Implement the function calc_affine in the cell below. The function shall:

Create the design matrix $A$
Estimate the model parameters $\mathbf{w}$ using the inverse $A^{-1}$
Return the affine transformation matrix $T$

### affine transformation creation

def calc_affine(points_source, points_destination):
    """
    Estimate the affine transformation matrix using the corresponding points pairs
    
    Args:
        points_source: Points in the video 
        points_destination: Corresponding points in the map

    Returns:
        The affine matrix T
    """
    return None # Replace


T = calc_affine(points_source, points_destination)
print('The affine Transformation Matrix:\n', T)

The affine Transformation Matrix:
 [[ 1.56949934e-01  1.00049407e+00  1.18090580e+02]
 [-3.76976285e-01  1.97134387e-01  1.75141304e+02]
 [ 0.00000000e+00  0.00000000e+00  1.00000000e+00]]

Estimate points in the overview map

In the following tasks, you will use the transformation to display the path walked by a student in the overview map. The function load_data() (provided in the cell below) returns data, a dictionary with the different regions of the tracked person in $G$.

Each key in the dictionary data represents a body part of the person. The value is an $N\times 4$ matrix where each row, $(x_{u}, y_{u}, x_{l}, y_{l})$, contains the upper-left corner $(x_{u}, y_{u})$ and lower-right corner $(x_{l}, y_{l})$ of the bounding box enclosing the body part in frame $I_t$.

def load_data():
    """Loads the tracking data."""
    filename = "data/trackingdata.dat"
    data = np.loadtxt(filename)
    data = {"body": data[:, :4], "legs": data[:, 4:8], "all": data[:, 8:]}

    return data


### Load data 
data = load_data()

### Needed functions

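def to_homogeneous(points):
    """Append a row of ones: a (2,) vector or 2xN array becomes 3x1 or 3xN."""
    if len(points.shape) == 1:
        points = points.reshape((*points.shape, 1))
    return np.vstack((points, np.ones((1, points.shape[1]))))


def to_euclidean(points):
    """Divide by the last (homogeneous) row and return the first two rows."""
    return points[:2] / points[2]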


def get_center(part, i):
    """Returns center of body part.

    Parameters: part refers to a Nx4 array containing rectangle points for a specific
    body part. i refers to the frame index to fetch.
    """
    x = int((part[i, 0] + part[i, 2]) / 2)
    y = int((part[i, 1] + part[i, 3]) / 2)

    return np.array([x, y])
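The shapes produced by the two conversion helpers are easy to get wrong, so here is a short round-trip sketch. The helpers are repeated so the snippet runs on its own, and the coordinate is a made-up value.

```python
import numpy as np


def to_homogeneous(points):
    # Same helper as in the cell above, repeated so the snippet is self-contained.
    if len(points.shape) == 1:
        points = points.reshape((*points.shape, 1))
    return np.vstack((points, np.ones((1, points.shape[1]))))


def to_euclidean(points):
    return points[:2] / points[2]


center = np.array([160, 128])    # made-up pixel coordinate
h = to_homogeneous(center)       # column vector of shape (3, 1)
back = to_euclidean(h)           # column vector of shape (2, 1)
print(h.shape, back.shape, back.ravel())
```

Note that the round trip returns a 2x1 column rather than a flat vector, so a call such as ravel() may be convenient when collecting results into an Nx2 array.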

Task 3: Defining the affine transformation

Implement the function apply_affine such that it:

Transforms point into homogeneous coordinates using the to_homogeneous function.
Applies the transformation T to the homogeneous coordinate.
Returns the result as a Euclidean coordinate (use the to_euclidean function).

def apply_affine(T, point):
    """Apply affine transformation T to point.
    
    Args:
        T: the affine transformation
        point: the point to transform

    Returns:
        the transformed point p' 
    """
    return None # Replace

Task 4: Affine transformation of points

Implement transform_points such that it:

Uses get_center to get the center point of the specified body part for every frame in data.
Applies the affine transformation T to the center points using the apply_affine function.

Tip

Iterate over the tracking data and transform the body part center for each frame.

def transform_points(part, data, T):
    """
    Apply affine transformation to all points in tracking data for a specific part
    
    Args:
        part: body part 
        data: point data
        T: transformation matrix

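    Returns:
        A tuple (G_points, M_points) of Nx2 arrays: the part centers in G
        and their transformed positions in M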
    """
    return None # Replace


### Use your recently created function to transform the tracking data
G_points, M_points = transform_points("legs", data, T)

Task 5: Plot results

The cell below visualizes the G_points and M_points calculated above.

Modify the cell below to plot the tracking data for all body parts.
Inspect the video.
- Determine which body part is the most accurate after the transformation.
Reflect on the limitations of using the transformation T in this exercise.
- How can the accuracy be improved?
- What does it say about the model?

Hint

Is the model sensitive to the 3 selected point pairs? Which points are well represented by the model?
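One possible way to probe the hint numerically is to compare the condition number of the design matrix for different choices of the three input points. This is only a sketch under assumptions: the point sets are made up, design_matrix is a hypothetical helper, and the condition number is just one of several indicators of sensitivity.

```python
import numpy as np


def design_matrix(points):
    """6x6 design matrix for three input points (hypothetical helper)."""
    rows = []
    for x, y in points:
        rows.append([x, y, 1, 0, 0, 0])
        rows.append([0, 0, 0, x, y, 1])
    return np.array(rows, dtype=float)


# Made-up point sets: one well spread, one almost on a straight line.
spread = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
near_collinear = np.array([[0.0, 0.0], [100.0, 0.0], [200.0, 1.0]])

cond_spread = np.linalg.cond(design_matrix(spread))
cond_near = np.linalg.cond(design_matrix(near_collinear))
print(cond_spread, cond_near)
```

A larger condition number means that small errors in the selected points are amplified more strongly in the estimated coefficients, which is one reason the choice of the 3 point pairs matters.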

#### Visualize transformed points
plt.figure(figsize=(14, 6))
plt.plot(points_destination[:, 0], points_destination[:, 1], 'b*', label="3 points for transformation estimation")
plt.plot(M_points[:, 0], M_points[:, 1], 'g.', label="tracked person")
plt.ylim(348, 0)
plt.xlim(0, 800)
plt.title('ITU MAP plane')
plt.legend()
plt.show()

##### original plane visualization

plt.figure(figsize=(8, 6))
plt.plot(points_source[:, 0], points_source[:, 1], 'b*', label="3 points for transformation estimation")
plt.plot(G_points[:, 0], G_points[:, 1], 'g.', label="tracked person")
plt.ylim(256, 0)
plt.xlim(0, 320)
plt.title('Video plane')
plt.legend()
plt.show()