Neural networks

In this exercise, you will experiment with neural networks for multiclass classification of clothing images using PyTorch. The goal is to explore multilayer perceptron (MLP) and convolutional neural network (CNN) architectures, and in particular how network topology and optimisation strategies affect performance. Your main tasks will be to:

Train, modify and evaluate an MLP and a CNN on the FashionMNIST dataset
Tune your models through different optimisation strategies (dropout, early stopping and data augmentation)

Additionally, this page revises evaluation metrics for multiclass classification.

Dataset

This exercise uses the FashionMNIST dataset, which contains images of clothing articles (from Zalando). Each image is a $28\times 28$ pixel grayscale image labelled with one of ten classes of clothing articles. A total of $60,000$ training samples and $10,000$ test samples are provided. A subset of the images sorted by class is shown in Figure 1.

As you will see, the FashionMNIST dataset is sufficiently small for teaching purposes, but it is not a trivial dataset to work with.

Info

The dataset should be downloaded automatically when you load the libraries.

Framework

Most of the code is contained in Python scripts. Refer to the docstrings whenever you are in doubt.

The file networks.py contains a selection of neural architectures with different topologies. Inspect the predefined networks in the file.
PyTorchTrainer in trainers.py is the class used for performing training and evaluation. Inspect the source code.
The class MetricLogger in metrics.py contains methods for calculating the evaluation metrics:
- reset() sets the entries of the confusion matrix to zero.
- log(predicted, target) adds a log entry to the confusion matrix based on the predicted and target values.
- The one_hot argument in the constructor is needed since Scikit-learn provides numerical predictions while PyTorch provides one-hot encoded predictions.
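
To make the role of reset(), log() and one_hot concrete, here is a minimal sketch of a confusion-matrix logger. It is not the MetricLogger from metrics.py: the class name ConfusionLogger, the n_classes parameter and the mat attribute are hypothetical, and the sketch assumes that rows index the predicted class and columns the true class. Check the actual source for the real behaviour.

```python
import numpy as np


class ConfusionLogger:
    """Hypothetical stand-in for a confusion-matrix logger (not the course's MetricLogger)."""

    def __init__(self, one_hot: bool, n_classes: int = 10):
        self.one_hot = one_hot
        self.n_classes = n_classes
        self.reset()

    def reset(self):
        # Set every entry of the confusion matrix to zero.
        self.mat = np.zeros((self.n_classes, self.n_classes), dtype=np.int64)

    def log(self, predicted, target):
        predicted = np.asarray(predicted)
        target = np.asarray(target)
        if self.one_hot:
            # Per-class scores of shape (batch, n_classes) -> class indices.
            predicted = predicted.argmax(axis=1)
        # Assumed convention: rows = predicted class, columns = true class.
        np.add.at(self.mat, (predicted, target), 1)


# Three samples, three classes, predictions given as per-class scores.
logger = ConfusionLogger(one_hot=True, n_classes=3)
scores = np.array([[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.2, 0.2, 0.6]])
logger.log(scores, [1, 0, 1])
print(logger.mat)
```

With one_hot=False the same log() call would accept class indices directly, which is the form scikit-learn classifiers return from predict().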

Task 1: Examine the code

Read through each of the three files to get acquainted with their structure and content. Spend most of your time on the networks and the training setup, since these matter most in the first tasks.

Metrics for multiclass classification

This exercise will use evaluation metrics for multiclass classification. As in binary classification, the metrics are defined from the confusion matrix $C$. Figure 2 shows the $10\times 10$ confusion matrix $C$ for FashionMNIST obtained with a support vector machine. The true class is given on the x-axis and the predicted class on the y-axis, so $C_{i,j}$ counts the samples predicted as class $i$ whose true class is $j$.

Figure 2: Confusion matrix using SVM on FashionMNIST

Evaluation metrics

Below is a description of the evaluation metrics. Accuracy is a single number for the whole classifier, whereas precision and recall are defined for a specific class $i$, so in multiclass classification they are vectors with one entry per class.

Accuracy: The ratio of correct predictions to the total number of samples.

$$ accuracy = \frac{\sum_{i=1}^{10} C_{i,i}}{\sum_{i=1}^{10}\sum_{j=1}^{10} C_{i, j}} $$

Precision: The ratio of correct predictions for class $i$ to the total number of predictions of that class.

$$ precision_i = \frac{C_{i, i}}{\sum_{j=1}^{10} C_{i, j}} $$

Recall: The ratio of correct predictions for class $i$ to the number of samples belonging to that class.

$$ recall_i = \frac{C_{i, i}}{\sum_{j=1}^{10} C_{j, i}} $$
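
The three formulas map directly onto array operations. The sketch below uses NumPy and a made-up $3\times 3$ confusion matrix (the numbers are illustrative, not FashionMNIST results). The function names are hypothetical, and the code follows the convention of this page: rows are predicted classes, columns are true classes.

```python
import numpy as np


def accuracy(C):
    # Correct predictions (the diagonal) over all samples.
    return np.trace(C) / C.sum()


def precision(C):
    # Row i holds every sample predicted as class i.
    return np.diag(C) / C.sum(axis=1)


def recall(C):
    # Column i holds every sample whose true class is i.
    return np.diag(C) / C.sum(axis=0)


# Illustrative matrix: rows = predicted class, columns = true class.
C = np.array([[8, 1, 1],
              [2, 6, 0],
              [0, 3, 9]])

print(accuracy(C))   # 23 correct out of 30 samples
print(precision(C))  # one value per class: 8/10, 6/8, 9/12
print(recall(C))     # one value per class: 8/10, 6/10, 9/10
```

If a class is never predicted, its row sum is zero and the precision for that class is undefined. A real implementation would need to decide how to report that case.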