Clustering

This exercise involves working with K-means clustering, Mean-shift clustering, and Agglomerative clustering on the pose data used for the exercises in week 2 and week 10. You will apply and experiment with each method and then compare their outcomes by clustering the same sequence of human poses with all three methods, following similar steps.

The dataset tensor contains 1403 pose sequences. Each sequence is a 100-frame time series capturing human poses. Each pose consists of 25 skeletal joints, where each joint is defined by an x and a y coordinate ($25 \times 2 = 50$ values per pose). The shape of the dataset tensor is therefore $(1403, 100, 25 \times 2)$. For this exercise, you will use a single pose sequence of 100 frames and apply clustering to that sequence.

Task overview

For each clustering method you will:

  • Implement the method.
  • Plot the clusters in the sequence.
  • Plot the cluster centers.
List of individual tasks
  • Task 1: Introduction
  • Task 2: Setting up the data
  • Task 3: Fitting the algorithm
  • Task 4: K-means clustering visualization
  • Task 5: Cluster characteristics
  • Task 6: Elbow Method
  • Task 7: Fitting the mean-shift algorithm
  • Task 8: Visualizing clusters
  • Task 9: Reflection and the bandwidth parameter
  • Task 10: Dendrograms
  • Task 11: Fitting the agglomerative clustering algorithm
  • Task 12: Agglomerative clustering algorithm mean pose
  • Task 13: Compare and reflect on the methods
Task 1: Introduction

This task is about understanding the data.

  1. Explain the benefits of applying a clustering method to a sequence of pose data.
  2. Identify possible pitfalls.
# write reflections here

The following cell imports the required libraries, including the pose-plotting helpers from clustering_utils:

import numpy as np
import matplotlib.pyplot as plt
import warnings
import seaborn as sns
from sklearn.cluster import MeanShift
from sklearn.cluster import KMeans
from sklearn.cluster import AgglomerativeClustering
from scipy.spatial.distance import pdist, squareform
import scipy.cluster.hierarchy as shc
from scipy.cluster.hierarchy import dendrogram, linkage
from clustering_utils import *


# Suppress the specific warning
warnings.filterwarnings("ignore")
Task 2: Setting up the data
  1. Run the cell below to load and reshape the dataset, and extract a single pose sequence of 100 frames.
  2. Choose the 100th pose sequence as the dataset.
data = np.load('poses_norm.npy')
N,T,D,C = data.shape
reshaped_data = data.reshape(N,T,D*C)
sequence = reshaped_data[191]
print(sequence.shape)
(100, 50)

K-means Clustering

In this exercise you will use k-means clustering on a pose sequence.

Task 3: Fitting the algorithm
  1. Run the cell below to create an instance of the KMeans class with 3 clusters and to find clusters in the sequence.
# Specify the number of clusters (k)
k = 3  

# Create KMeans 
kmeans = KMeans(n_clusters=k, random_state=42)

# Fit to the data
kmeans.fit(sequence)
Task 4: K-means clustering visualization
  1. Use the labels_ attribute (see the documentation) to obtain the cluster labels.
  2. Use the plot_single_pose function to color each pose in the pose sequence according to which cluster it belongs to.
# Write your solution here

# 'cluster_labels' holds the cluster each frame belongs to
cluster_labels = kmeans.labels_

# Map cluster labels to colors
colors = {0: 'red', 1: 'blue', 2: 'green', 3: "orange", 4: "black", 5: "brown", 6: "yellow", 7: "cyan"}
plt.figure(figsize=(25,15))
for i in range(len(sequence)):
    plt.subplot(10, 10, i + 1)
    plot_single_pose(sequence[i], c=colors[cluster_labels[i]], head=False)
    plt.ylim(1, 0)
    plt.xlim(-1, 1)
Task 5: Cluster characteristics
  1. Run the cell below to extract and plot the cluster centers.

  2. Visually inspect the results and identify the characteristics of the poses in each cluster:

    • What distinguishes the clusters?
    • What do the cluster centers represent?
    • What do the clusters reveal about movements?
  3. Reason about the choice of 3 clusters and the effect on the result.

  4. Change the random state and fit the model again (a sketch is given after the cell below). Explain whether you obtain different groupings and why.

# Extract the cluster centers from the fitted model
centroids = kmeans.cluster_centers_
num_centroids = len(centroids)

# Set up subplots
fig, axes = plt.subplots(1, num_centroids, figsize=(num_centroids * 3, 3))

# Assuming you have a function plot_single_pose defined
for i in range(len(centroids)):
    plt.subplot(1, len(centroids), i+1)
    plt.title(f'Cluster center {i+1}')
    plot_single_pose(centroids[i], c=colors[i], head=False)
    plt.ylim(1, 0)
    plt.xlim(-1, 1)

plt.tight_layout()
plt.show()


# write your reflection here
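For step 4, a minimal sketch of refitting with a different seed is shown below (the seed 0 is an arbitrary choice; note that two runs can produce the same grouping even if the numeric labels are permuted):

# Refit K-means with a different random state (seed 0 is an arbitrary example)
kmeans_alt = KMeans(n_clusters=k, random_state=0)
kmeans_alt.fit(sequence)

# Compare the label assignments; identical groupings may still differ by a label permutation
print("Identical label arrays:", np.array_equal(kmeans.labels_, kmeans_alt.labels_))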
Task 6: Elbow Method

This task uses the Elbow Method to choose the number of clusters k. The cell below:

  • Applies k-means clustering to the sequence for each k in k_range.
  • Extracts the within-cluster sum of squares (WCSS) using kmeans.inertia_ and stores it in a list.
  • Plots the WCSS as a function of the number of clusters (the elbow curve).
  1. Visually determine the optimal number of clusters.

  2. Calculate the rate of change of the elbow curve and plot its absolute value (a sketch is given after the cell below). Use this curve to determine the optimal k.

  3. Revisit and run the previous tasks (Task 3, Task 4 and Task 5) with the optimal number of clusters.

  4. How does the elbow method influence the results in the previous tasks?

# Find the optimal k using the elbow method
k_range = range(2, 30) # values for the number of clusters `k`
inertia = []

for k in k_range:
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(sequence)
    inertia.append(kmeans.inertia_)

# Plot the elbow curve
plt.figure(figsize=(10, 5))
plt.plot(k_range, inertia, marker='o')
plt.xlabel('Number of Clusters (k)')
plt.ylabel('WCSS')
plt.title('Elbow Method for Optimal k')
plt.show()
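For step 2, one possible sketch of the rate-of-change plot is shown below; it assumes a simple first difference of the stored WCSS values is sufficient:

# Absolute first difference of the WCSS values as a rough rate of change
rate_of_change = np.abs(np.diff(inertia))

plt.figure(figsize=(10, 5))
plt.plot(list(k_range)[1:], rate_of_change, marker='o')
plt.xlabel('Number of Clusters (k)')
plt.ylabel('|Change in WCSS|')
plt.title('Absolute rate of change of the elbow curve')
plt.show()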
# Write your reflections here...

Mean-shift Clustering

This exercise is about applying mean-shift clustering to the sequence of human poses.

Task 7: Fitting the mean-shift algorithm
  1. Run the cell below to:
    • Create an instance of the MeanShift() class with a bandwidth of 0.629.
    • Cluster the pose sequence.
# Perform Mean Shift clustering
mean_shift = MeanShift(bandwidth=0.629)
mean_shift.fit(sequence)
Task 8: Visualizing clusters
  1. Extract the cluster labels for each pose in the sequence. Use the labels_ attribute as described in the documentation on mean-shift clustering.
  2. Visualize the pose sequence, assigning a distinct color to each pose based on its cluster.
  3. Extract and plot the cluster centers using the cluster_centers_ attribute. A sketch is given after the solution cell below.
# Write your solution here
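As mentioned above, a possible sketch is shown here; it reuses the colors mapping and the plot_single_pose helper from the K-means tasks and assumes the number of mean-shift clusters does not exceed the colors defined there:

# Cluster label for each frame and the cluster centers found by mean-shift
ms_labels = mean_shift.labels_
ms_centers = mean_shift.cluster_centers_

# Color each pose in the sequence by its cluster (same layout as in Task 4)
plt.figure(figsize=(25, 15))
for i in range(len(sequence)):
    plt.subplot(10, 10, i + 1)
    plot_single_pose(sequence[i], c=colors[ms_labels[i]], head=False)
    plt.ylim(1, 0)
    plt.xlim(-1, 1)
plt.show()

# Plot the cluster centers in the same way as the K-means centroids
plt.figure(figsize=(len(ms_centers) * 3, 3))
for i in range(len(ms_centers)):
    plt.subplot(1, len(ms_centers), i + 1)
    plt.title(f'Cluster center {i+1}')
    plot_single_pose(ms_centers[i], c=colors[i], head=False)
    plt.ylim(1, 0)
    plt.xlim(-1, 1)
plt.tight_layout()
plt.show()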
Task 9: Reflection and the bandwidth parameter
  1. Visually examine the plots. What are the characteristics of the poses belonging to each cluster?
  2. What effect does increasing or decreasing the bandwidth parameter by 0.2 (0.829 / 0.429) have, and why? A sketch for trying both values is given after the solution cell below.
Info

You might encounter longer run times for lower bandwidth values.

# Write your solution here
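To ground the reflection, one possible sketch is to refit with the suggested bandwidths and compare the number of clusters found (expect longer run times for the lower value):

# Refit mean-shift with the lower, original, and higher bandwidths and compare cluster counts
for bw in (0.429, 0.629, 0.829):
    ms = MeanShift(bandwidth=bw)
    ms.fit(sequence)
    print(f"bandwidth={bw}: {len(ms.cluster_centers_)} clusters")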

Agglomerative Clustering

This exercise is about applying hierarchical clustering to the human poses dataset.

Task 10: Dendrograms
  1. Run the cell below to generate a dendrogram.
  2. How many clusters would result from setting the distance threshold to 0.5, 1.5, 3.0, or 4.0? A sketch for counting them is given after the cell below.
  3. What types of relationships can agglomerative clustering reveal that non-hierarchical methods might miss?
colors = {0: 'black', 1: 'blue', 2: 'green', 3: 'red'}
sns.set_palette([colors[i] for i in range(len(colors))])
shc.set_link_color_palette(None)
plt.figure(figsize=(10,10))
plt.title('Dendrogram for Agglomerative Clustering')
plt.xlabel('Pose Index')
plt.ylabel('Distance')
linkage_matrix = shc.linkage(sequence, method ='ward', metric="euclidean")
Dendrogram = shc.dendrogram(linkage_matrix)
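For step 2, the cluster counts at each threshold can be read off the dendrogram, or checked with scipy's fcluster as in the sketch below (the thresholds are the ones listed above):

# Count the clusters obtained when cutting the dendrogram at each distance threshold
for t in (0.5, 1.5, 3.0, 4.0):
    labels_at_t = shc.fcluster(linkage_matrix, t=t, criterion='distance')
    print(f"threshold={t}: {len(np.unique(labels_at_t))} clusters")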
Task 11: Fitting the agglomerative clustering algorithm

The cell below creates an instance of the AgglomerativeClustering() class. The documentation on agglomerative clustering is given here.

  1. Select the distance threshold that results in 3 clusters, then apply agglomerative clustering to the sequence. A sketch using the distance_threshold parameter is given after the cell below.
  2. Extract the cluster labels for each pose in the sequence.
  3. Visualize the pose sequence by assigning a distinct color to each pose based on its cluster.
agg_clustering = AgglomerativeClustering(n_clusters=k, metric='euclidean', linkage='ward')

# Write your solution here (agglomerative clustering and cluster centers)
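As referenced in step 1, one possible sketch is shown below. It uses the distance_threshold parameter instead of a fixed n_clusters; the value 3.0 is a placeholder to be replaced with the threshold read off your dendrogram, and the plotting reuses colors and plot_single_pose from the earlier tasks:

# Cut the hierarchy at a distance threshold instead of fixing the number of clusters.
# The threshold 3.0 is a placeholder; replace it with the value from your dendrogram.
agg_thresh = AgglomerativeClustering(n_clusters=None, distance_threshold=3.0,
                                     metric='euclidean', linkage='ward')
agg_labels = agg_thresh.fit_predict(sequence)
print(f"{len(np.unique(agg_labels))} clusters")

# Color each pose in the sequence by its cluster
plt.figure(figsize=(25, 15))
for i in range(len(sequence)):
    plt.subplot(10, 10, i + 1)
    plot_single_pose(sequence[i], c=colors[agg_labels[i]], head=False)
    plt.ylim(1, 0)
    plt.xlim(-1, 1)
plt.show()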
Task 12: Agglomerative clustering algorithm mean pose
  1. The AgglomerativeClustering() class does not have an attribute or a method for directly extracting cluster centers. Calculate the mean pose within each cluster for $k = 3$ and plot it. A sketch is given after the solution cell below.
  2. Change the distance metric (try l1 and cosine) and repeat the previous tasks (Task 10, Task 11 and Task 12.1). Note that ward linkage only supports the euclidean metric, so you will also need to change the linkage for this step.
  3. What differences do you observe, and can you explain why they occur? Does a different distance metric result in more meaningful clusters?
# Write your solution here
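One sketch for step 1 is shown below; it assumes the labels of the 3-cluster solution are stored in agg_labels (as in the sketch from Task 11) and takes the mean pose as the per-cluster average of the 50-dimensional pose vectors:

# Mean pose per cluster: average the pose vectors of all frames assigned to that cluster
plt.figure(figsize=(9, 3))
for i in range(3):
    mean_pose = sequence[agg_labels == i].mean(axis=0)
    plt.subplot(1, 3, i + 1)
    plt.title(f'Cluster {i+1} mean pose')
    plot_single_pose(mean_pose, c=colors[i], head=False)
    plt.ylim(1, 0)
    plt.xlim(-1, 1)
plt.tight_layout()
plt.show()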

Comparison

Task 13: Compare and reflect on the methods
  1. Compare the clustered poses obtained with the different clustering methods (K-means, Mean-shift and Agglomerative clustering). What are the similarities and differences between the pose clusters across methods?
  2. Why can clusters look similar across methods?
  3. What are the main reasons behind differences in the clusters?
  4. The algorithms use different methods for determining the number of clusters. How do these differences impact the results for the pose data?
# write your reflection here