Clustering

This exercise involves working with K-means clustering, Mean-shift clustering, and Agglomerative clustering on the pose data used for the exercises in week 2 and week 10. You will apply and experiment with each method and then compare their outcomes by clustering the same sequence of human poses with all three methods, following similar steps.

The dataset tensor contains 1403 pose sequences. Each sequence is a 100-frame time series capturing human poses. Each pose consists of 25 skeletal joints, where each joint is defined by an x and a y coordinate ($25 \times 2 = 50$ values per pose). The shape of the dataset tensor is therefore $(1403, 100, 25 \times 2)$. For this exercise, you will use a single pose sequence of 100 frames and apply clustering to that sequence.

Task overview

For each clustering method you will:

  • Implement the method.
  • Plot the clusters in the sequence.
  • Plot the cluster centers.
List of individual tasks
  • Task 1: Introduction
  • Task 2: Setting up the data
  • Task 3: Fitting the algorithm
  • Task 4: K-means clustering visualization
  • Task 5: Cluster characteristics
  • Task 6: Elbow Method
  • Task 7: Fitting the mean-shift algorithm
  • Task 8: Visualizing clusters
  • Task 9: Reflection and the bandwidth parameter
  • Task 10: Dendrograms
  • Task 11: Fitting the agglomerative clustering algorithm
  • Task 12: Agglomerative clustering algorithm mean pose
  • Task 13: Compare and reflect on the methods
Task 1: Introduction

This task is about understanding the data.

  1. Explain the benefits of applying a clustering method to a sequence of pose data.
  2. Identify possible pitfalls.
# write reflections here

The following cell imports the required libraries, including the pose-plotting helpers from clustering_utils:

import numpy as np
import matplotlib.pyplot as plt
import warnings
import seaborn as sns
from sklearn.cluster import MeanShift
from sklearn.cluster import KMeans
from sklearn.cluster import AgglomerativeClustering
from scipy.spatial.distance import pdist, squareform
import scipy.cluster.hierarchy as shc
from scipy.cluster.hierarchy import dendrogram, linkage
from clustering_utils import *


# Suppress the specific warning
warnings.filterwarnings("ignore")
Task 2: Setting up the data
  1. Run the cell below to load and reshape the dataset, and extract a single pose sequence of 100 frames.
  2. Choose the 100th pose sequence as the dataset.
data = np.load('poses_norm.npy')
N,T,D,C = data.shape
reshaped_data = data.reshape(N,T,D*C)
sequence = reshaped_data[191]
print(sequence.shape)
(100, 50)

K-means Clustering

In this exercise you will use k-means clustering on a pose sequence.

Task 3: Fitting the algorithm
  1. Run the cell below to create an instance of the KMeans class with 3 clusters and to find clusters in the sequence.
# Specify the number of clusters (k)
k = 3  

# Create KMeans 
kmeans = KMeans(n_clusters=k, random_state=42)

# Fit to the data
kmeans.fit(sequence)
Task 4: K-means clustering visualization
  1. Use the labels_ attribute (see the documentation) to obtain the cluster labels.
  2. Use the plot_single_pose function to color each pose in the pose sequence according to which cluster it belongs to.
# Write your solution here

# 'cluster_labels' holds the cluster each frame belongs to
cluster_labels = kmeans.labels_

# Map cluster labels to colors
colors = {0: 'red', 1: 'blue', 2: 'green', 3: "orange", 4: "black", 5: "brown", 6: "yellow", 7: "cyan"}
plt.figure(figsize=(25,15))
for i in range(len(sequence)):
    plt.subplot(10, 10, i + 1)
    plot_single_pose(sequence[i], c=colors[cluster_labels[i]], head=False)
    plt.ylim(1, 0)
    plt.xlim(-1, 1)
Task 5: Cluster characteristics
  1. Run the cell below to extract and plot the cluster centers.

  2. Visually inspect the results and identify the characteristics of the poses in each cluster:

    • What distinguishes the clusters?
    • What do the cluster centers represent?
    • What do the clusters reveal about movements?
  3. Reason about the choice of 3 clusters and the effect on the result.

  4. Change the random state and fit the model again (a sketch is given after the cell below). Explain whether you obtain different groupings and why.

# Extract the cluster centers from the fitted model
centroids = kmeans.cluster_centers_
num_centroids = len(centroids)

# Set up subplots
fig, axes = plt.subplots(1, num_centroids, figsize=(num_centroids * 3, 3))

# Assuming you have a function plot_single_pose defined
for i in range(len(centroids)):
    plt.subplot(1, len(centroids), i+1)
    plt.title(f'Cluster center {i+1}')
    plot_single_pose(centroids[i], c=colors[i], head=False)
    plt.ylim(1, 0)
    plt.xlim(-1, 1)

plt.tight_layout()
plt.show()


# write your reflection here
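For step 4, a minimal sketch of refitting with a different seed is shown below (the seed 0 is an arbitrary choice; note that two runs can produce the same grouping even if the numeric labels are permuted):

# Refit K-means with a different random state (seed 0 is an arbitrary example)
kmeans_alt = KMeans(n_clusters=k, random_state=0)
kmeans_alt.fit(sequence)

# Compare the label assignments; identical groupings may still differ by a label permutation
print("Identical label arrays:", np.array_equal(kmeans.labels_, kmeans_alt.labels_))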
Task 6: Elbow Method

This task uses the Elbow Method to choose the number of clusters k. The cell below:

  • Applies k-means clustering to the sequence for each k in k_range.
  • Extracts the within-cluster sum of squares (WCSS) using kmeans.inertia_ and stores it in a list.
  • Plots the WCSS as a function of the number of clusters (the elbow curve).
  1. Visually determine the optimal number of clusters.

  2. Calculate the rate of change of the elbow curve and plot its absolute value (a sketch is given after the cell below). Use this curve to determine the optimal k.

  3. Revisit and run the previous tasks (Task 3, Task 4 and Task 5) with the optimal number of clusters.

  4. How does the elbow method influence the results in the previous tasks?

# Find the optimal k using the elbow method
k_range = range(2, 30) # values for the number of clusters `k`
inertia = []

for k in k_range:
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(sequence)
    inertia.append(kmeans.inertia_)

# Plot the elbow curve
plt.figure(figsize=(10, 5))
plt.plot(k_range, inertia, marker='o')
plt.xlabel('Number of Clusters (k)')
plt.ylabel('WCSS')
plt.title('Elbow Method for Optimal k')
plt.show()
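For step 2, one possible sketch of the rate-of-change plot is shown below; it assumes a simple first difference of the stored WCSS values is sufficient:

# Absolute first difference of the WCSS values as a rough rate of change
rate_of_change = np.abs(np.diff(inertia))

plt.figure(figsize=(10, 5))
plt.plot(list(k_range)[1:], rate_of_change, marker='o')
plt.xlabel('Number of Clusters (k)')
plt.ylabel('|Change in WCSS|')
plt.title('Absolute rate of change of the elbow curve')
plt.show()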
# Write your reflections here...

Mean-shift Clustering

This exercise is about applying mean-shift clustering to the sequence of human poses.

Task 7: Fitting the mean-shift algorithm
  1. Run the cell below to:
    • Create an instance of the MeanShift() class with a bandwidth of 0.629.
    • Cluster the pose sequence.
# Perform Mean Shift clustering
mean_shift = MeanShift(bandwidth=0.629)
mean_shift.fit(sequence)
Task 8: Visualizing clusters
  1. Extract the cluster labels for each pose in the sequence. Use the labels_ attribute as described in the documentation on mean-shift clustering.
  2. Visualize the pose sequence, assigning a distinct color to each pose based on its cluster.
  3. Extract and plot the cluster centers using the cluster_centers_ attribute. A sketch is given after the solution cell below.
# Write your solution here
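As mentioned above, a possible sketch is shown here; it reuses the colors mapping and the plot_single_pose helper from the K-means tasks and assumes the number of mean-shift clusters does not exceed the colors defined there:

# Cluster label for each frame and the cluster centers found by mean-shift
ms_labels = mean_shift.labels_
ms_centers = mean_shift.cluster_centers_

# Color each pose in the sequence by its cluster (same layout as in Task 4)
plt.figure(figsize=(25, 15))
for i in range(len(sequence)):
    plt.subplot(10, 10, i + 1)
    plot_single_pose(sequence[i], c=colors[ms_labels[i]], head=False)
    plt.ylim(1, 0)
    plt.xlim(-1, 1)
plt.show()

# Plot the cluster centers in the same way as the K-means centroids
plt.figure(figsize=(len(ms_centers) * 3, 3))
for i in range(len(ms_centers)):
    plt.subplot(1, len(ms_centers), i + 1)
    plt.title(f'Cluster center {i+1}')
    plot_single_pose(ms_centers[i], c=colors[i], head=False)
    plt.ylim(1, 0)
    plt.xlim(-1, 1)
plt.tight_layout()
plt.show()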
Task 9: Reflection and the bandwidth parameter
  1. Visually examine the plots. What are the characteristics of the poses belonging to each cluster?
  2. What effect does increasing or decreasing the bandwidth parameter by 0.2 (0.829 / 0.429) have, and why? A sketch for trying both values is given after the solution cell below.
Info

You might encounter longer run times for lower bandwidth values.

# Write your solution here
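To ground the reflection, one possible sketch is to refit with the suggested bandwidths and compare the number of clusters found (expect longer run times for the lower value):

# Refit mean-shift with the lower, original, and higher bandwidths and compare cluster counts
for bw in (0.429, 0.629, 0.829):
    ms = MeanShift(bandwidth=bw)
    ms.fit(sequence)
    print(f"bandwidth={bw}: {len(ms.cluster_centers_)} clusters")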

Agglomerative Clustering

This exercise is about applying hierarchical clustering to the human poses dataset.

Task 10: Dendrograms
  1. Run the cell below to generate a dendrogram.
  2. How many clusters would result from setting the distance threshold to 0.5, 1.5, 3.0, or 4.0? A sketch for counting them is given after the cell below.
  3. What types of relationships can agglomerative clustering reveal that non-hierarchical methods might miss?
colors = {0: 'black', 1: 'blue', 2: 'green', 3: 'red'}
sns.set_palette([colors[i] for i in range(len(colors))])
shc.set_link_color_palette(None)
plt.figure(figsize=(10,10))
plt.title('Dendrogram for Agglomerative Clustering')
plt.xlabel('Pose Index')
plt.ylabel('Distance')
linkage_matrix = shc.linkage(sequence, method ='ward', metric="euclidean")
Dendrogram = shc.dendrogram(linkage_matrix)
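For step 2, the cluster counts at each threshold can be read off the dendrogram, or checked with scipy's fcluster as in the sketch below (the thresholds are the ones listed above):

# Count the clusters obtained when cutting the dendrogram at each distance threshold
for t in (0.5, 1.5, 3.0, 4.0):
    labels_at_t = shc.fcluster(linkage_matrix, t=t, criterion='distance')
    print(f"threshold={t}: {len(np.unique(labels_at_t))} clusters")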
Task 11: Fitting the agglomerative clustering algorithm

The cell below creates an instance of the AgglomerativeClustering() class. The documentation on agglomerative clustering is given here.

  1. Select the distance threshold that results in 3 clusters, then apply agglomerative clustering to the sequence. A sketch using the distance_threshold parameter is given after the cell below.
  2. Extract the cluster labels for each pose in the sequence.
  3. Visualize the pose sequence by assigning a distinct color to each pose based on its cluster.
agg_clustering = AgglomerativeClustering(n_clusters=k, metric='euclidean', linkage='ward')

# Write your solution here (agglomerative clustering and cluster centers)
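As referenced in step 1, one possible sketch is shown below. It uses the distance_threshold parameter instead of a fixed n_clusters; the value 3.0 is a placeholder to be replaced with the threshold read off your dendrogram, and the plotting reuses colors and plot_single_pose from the earlier tasks:

# Cut the hierarchy at a distance threshold instead of fixing the number of clusters.
# The threshold 3.0 is a placeholder; replace it with the value from your dendrogram.
agg_thresh = AgglomerativeClustering(n_clusters=None, distance_threshold=3.0,
                                     metric='euclidean', linkage='ward')
agg_labels = agg_thresh.fit_predict(sequence)
print(f"{len(np.unique(agg_labels))} clusters")

# Color each pose in the sequence by its cluster
plt.figure(figsize=(25, 15))
for i in range(len(sequence)):
    plt.subplot(10, 10, i + 1)
    plot_single_pose(sequence[i], c=colors[agg_labels[i]], head=False)
    plt.ylim(1, 0)
    plt.xlim(-1, 1)
plt.show()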
Task 12: Agglomerative clustering algorithm mean pose
  1. The AgglomerativeClustering() class does not have an attribute or a method for directly extracting cluster centers. Calculate the mean pose within each cluster for $k = 3$ and plot it. A sketch is given after the solution cell below.
  2. Change the distance metric (try l1 and cosine) and repeat the previous tasks (Task 10, Task 11 and Task 12.1). Note that ward linkage only supports the euclidean metric, so you will also need to change the linkage for this step.
  3. What differences do you observe, and can you explain why they occur? Does a different distance metric result in more meaningful clusters?
# Write your solution here
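One sketch for step 1 is shown below; it assumes the labels of the 3-cluster solution are stored in agg_labels (as in the sketch from Task 11) and takes the mean pose as the per-cluster average of the 50-dimensional pose vectors:

# Mean pose per cluster: average the pose vectors of all frames assigned to that cluster
plt.figure(figsize=(9, 3))
for i in range(3):
    mean_pose = sequence[agg_labels == i].mean(axis=0)
    plt.subplot(1, 3, i + 1)
    plt.title(f'Cluster {i+1} mean pose')
    plot_single_pose(mean_pose, c=colors[i], head=False)
    plt.ylim(1, 0)
    plt.xlim(-1, 1)
plt.tight_layout()
plt.show()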

Comparison

Task 13: Compare and reflect on the methods
  1. Compare the clustered poses obtained with the different clustering methods (K-means, Mean-shift and Agglomerative clustering). What are the similarities and differences between the pose clusters across methods?
  2. Why can clusters look similar across methods?
  3. What are the main reasons behind differences in the clusters?
  4. The algorithms use different methods for determining the number of clusters. How do these differences impact the results for the pose data?
# write your reflection here