Note
The term linear classification actually refers to an affine model, as it includes a bias term.
This exercise is about linear classification (actually affine) and the visualization of decision boundaries. Notably, the parameters of the decision boundary will initially be adjusted manually (or randomly) and then learned using least squares.
The prediction function $f_{w}(\mathbf{x})$ predicts which class a data point belongs to by applying the $\text{sign}$ function to a linear combination of the input features and weights; the classification depends on whether this combination is positive or negative.
In the following section you will experiment with a linear classifier:
$$ f_{w}(\mathbf{x}) = \mathbf{y} = \text{sign}({w_0} + \mathbf{w}^\top\mathbf{x}) $$where $\mathbf{w}$ are the model weights, $w_0$ is the bias term, and $\mathbf{x}$ are the coordinates of the input.
The $\text{sign}$ function is given by:
$$ \text{sign}(z) = \begin{cases} -1 & \text{if } z \leq 0, \\ 1 & \text{if } z > 0. \end{cases} $$Alternatively, using the homogeneous representation, where the bias $w_0$ is prepended to $\mathbf{w}$ and a constant $1$ is prepended to $\mathbf{x}$, the classifier is expressed as:
$$ \mathbf{y} = \text{sign}(\mathbf{w}^\top\mathbf{x}) $$In binary classification the decision boundary separates the positive and negative classes and is defined by:
$$ f_w(\mathbf{x}) = 0 $$Points on the positive side of the boundary are classified as the positive class (1), while points on the negative side are classified as the negative class (-1).
For the affine model, the decision boundary is the set of points where:
$$ \mathbf{w}^\top \mathbf{x} + w_0 = 0 $$For a two-dimensional affine model (with features $x_1$ and $x_2$), the decision boundary is given by:
$$ w_1 x_1 + w_2 x_2 + w_0 = 0 $$For display purposes, the decision boundary can be expressed in terms of $x_1$ and $x_2$ by isolating $x_2$ on one side:
$$ x_2 = -\frac{w_0}{w_2} - \frac{w_1}{w_2} x_1 $$The cell below imports libraries and generates random data to be used for classification.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
np.random.seed(42) ## generate the same sequence of random points
# Generate 2 clusters of data, by drawing from a normal distribution.
S = np.eye(2) ## covariance matrix, set to identity matrix, i.e. x and y independent
p_pos = np.random.multivariate_normal([1, 1], S, 40)
p_neg = np.random.multivariate_normal([-1, -1], S, 40)
## 40 points (x,y) coordinates
p_pos.shape
The data of the positive and negative classes are stored in the variables `p_pos` and `p_neg`, respectively. The next cell visualizes the two classes.
fig, ax = plt.subplots()
ax.plot(p_pos[:, 0], p_pos[:, 1], "o", label='positive class')
ax.plot(p_neg[:, 0], p_neg[:, 1], "P", label='negative class')
plt.title("Data", fontsize=24)
plt.legend()
In the following task you will manually change the model parameters of a linear decision boundary and visualize the results.
Implement the function `linear_boundary` which, given the $x_1$-coordinates and the model parameters $w = [w_0, w_1, w_2]$, returns the corresponding $x_2$-values according to:
$$ x_2 = -\frac{w_0}{w_2} - \frac{w_1}{w_2} x_1 $$The variable `x_values` below provides the $x_1$-values (x-coordinates) over which the boundary will be plotted. The model parameters $w_0$, $w_1$, and $w_2$ define the position and slope of the boundary. Use the `linear_boundary` function to generate points for the decision boundary by implementing the following steps:
1. Define `w` with the manually selected model parameters.
2. Pass `x_values` and `w` to the `linear_boundary` function to calculate the corresponding $x_2$-values.
def linear_boundary(x, w):
"""
:param x: x values of the line.
:param w: List of model parameters [bias, slope] of the line.
:return: the x2 values which correspond to the y-values of the boundary / line .
"""
# Write solutions here
...
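A minimal sketch of one possible implementation, solving $w_0 + w_1 x_1 + w_2 x_2 = 0$ for $x_2$ (it assumes $w_2 \neq 0$, since the formula divides by $w_2$):
def linear_boundary(x, w):
    w0, w1, w2 = w
    return -w0 / w2 - (w1 / w2) * x  # x2-values of the boundary; assumes w2 != 0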
# Defining x-values and weights
x_values = np.linspace(-3, 3, 100)
w = None # write your solution here
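As an illustration (not the only valid choice): the clusters are centred at $(1, 1)$ and $(-1, -1)$, so a boundary through the origin with slope $-1$, e.g. $w = [0, 1, 1]$, is a reasonable manual guess:
w = [0.0, 1.0, 1.0]  # example parameters: boundary x2 = -x1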
# Plotting the data points and decision boundary
fig, ax = plt.subplots()
ax.plot(p_pos[:, 0], p_pos[:, 1], "o", label='Positive class')
ax.plot(p_neg[:, 0], p_neg[:, 1], "P", label='Negative class')
ax.plot(x_values, linear_boundary(x_values, w))
plt.title("2D Linear Decision Boundary", fontsize=20)
plt.legend()
plt.show()
# Write your reflections here
The performance of the model can be evaluated by calculating the accuracy, defined as the fraction of correct predictions: $$ \text{accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} $$
Implement the function `predict` that, given the data and the model parameters `w` as input, predicts whether a data point belongs to the negative or the positive class. Return 1 for points above the boundary (positive class) and -1 for points below the boundary (negative class). Manually select the model parameters and predict class labels for both the `p_neg` and `p_pos` variables by calculating the sign of the prediction function $f_w(\mathbf{x}) = z = w_0 + w_1 \cdot x_1 + w_2 \cdot x_2$ for each data point.
Implement and execute the `accuracy` function that returns the accuracy of the classifier over the entire training set by comparing the predicted classes to the actual classes.
The numerator in the accuracy formula can be found by counting the number of times the prediction matches the label, $\text{sign}(f_w(\mathbf{x}_i)) = y_i$, or equivalently the number of times $y_i \cdot f_w(\mathbf{x}_i) > 0$, where $y_i$ are the labels.
def predict(w, p):
"""
Predict class based on the linear decision boundary.
:param w: Model parameters [w0, w1, w2].
:param p: Data points to classify (Nx2 array).
:return: Predicted class labels (-1 for neg, 1 for pos).
"""
# Write solutions here
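A minimal sketch, computing $z = w_0 + w_1 x_1 + w_2 x_2$ for every row of `p` and mapping the $z \leq 0$ case to $-1$, as in the sign definition above:
def predict(w, p):
    z = w[0] + w[1] * p[:, 0] + w[2] * p[:, 1]  # f_w(x) for every point
    return np.where(z > 0, 1, -1)               # sign, with z <= 0 mapped to -1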
def accuracy(predictions,targets):
"""
:param predictions: 1xn array of predicted classes for the n data points.
:param targets: 1xn array of actual classes for the n data points.
:return: float, the fraction of correctly predicted points (num_correct / num_points).
"""
# Write solutions here
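One possible sketch: with labels in $\{-1, 1\}$, the fraction of correct predictions is simply the mean of the elementwise comparison:
def accuracy(predictions, targets):
    return np.mean(predictions == targets)  # num_correct / num_points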
pred_pos = None # write your solution here
pred_neg = None # write your solution here
acc_pos = None # write your solution here
acc_neg = None # write your solution here
print(f'Accuracy of decision line: {(acc_neg+acc_pos)/2:.3f}')
Accuracy of decision line: 0.925
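A hedged way to fill in the stubs above, reusing the manually chosen `w`; the targets are $+1$ for `p_pos` and $-1$ for `p_neg`. Averaging the per-class accuracies in the print statement gives the overall accuracy here because the classes are balanced (40 points each):
pred_pos = predict(w, p_pos)
pred_neg = predict(w, p_neg)
acc_pos = accuracy(pred_pos, np.ones(len(p_pos)))    # positive targets are +1
acc_neg = accuracy(pred_neg, -np.ones(len(p_neg)))   # negative targets are -1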
Implement the `learn_affine_classifier` function which, given arrays of positive and negative examples, uses least squares to estimate the model parameters `w` that separate the two classes. In particular the function should:
1. Combine the positive and negative points into a single dataset.
2. Create an array `labels` consisting of the labels (1 for positive class, -1 for negative class).
3. Add a bias column (a column of ones) to the data.
4. Learn the model parameters with least squares.
Run the cell below to plot the decision boundary. Then evaluate the accuracy of the learned decision boundary:
1. Use the `predict` function to predict class labels for the data points.
2. Use the `accuracy` function to evaluate the found model.
# 1
def learn_affine_classifier(p_pos, p_neg):
"""
Fit the decision boundary using least squares regression.
:param p_pos: Data points of the positive class.
:param p_neg: Data points of the negative class.
:return: The learned parameters w = [w0, w1, w2].
"""
# Combine the positive and negative points into a single dataset
data = None # Write your solution here
# Create the labels: 1 for positive class, -1 for negative class
labels = None # Write your solution here
# Add a bias column (column of 1s) to the data
X = None # Write your solution here
# Learn the model parameters
w_least_squares = None # Write your solution here
# Return the model parameters
return w_least_squares
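A sketch of one way to complete the function, using `np.linalg.lstsq` to solve $X\mathbf{w} \approx \text{labels}$ in the least-squares sense (any equivalent solver, e.g. the pseudo-inverse, would work as well):
def learn_affine_classifier(p_pos, p_neg):
    data = np.vstack([p_pos, p_neg])                                  # stack both classes
    labels = np.hstack([np.ones(len(p_pos)), -np.ones(len(p_neg))])  # +1 / -1 targets
    X = np.hstack([np.ones((len(data), 1)), data])                   # prepend bias column
    w_least_squares, *_ = np.linalg.lstsq(X, labels, rcond=None)     # solve X w ~ labels
    return w_least_squares                                           # w = [w0, w1, w2]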
# Find the decision boundary with least squares
w_learned = learn_affine_classifier(p_pos, p_neg)
# 2
# Define x-values for plotting the decision boundaries
x_values = np.linspace(-3, 3, 100)
# Plotting the learned decision boundary
fig, ax = plt.subplots()
ax.scatter(p_pos[:, 0], p_pos[:, 1], marker='o', label='Positive class')
ax.scatter(p_neg[:, 0], p_neg[:, 1], marker='x', label='Negative class')
ax.plot(x_values, linear_boundary(x_values, w_learned), color='green', label='Learned boundary: $x_2$ = %.2f + %.2f $x_1$' % (-w_learned[0] / w_learned[2], -w_learned[1] / w_learned[2]))
plt.title("Learned Decision Boundary", fontsize=20)
plt.legend()
plt.show()
# 3
pred_pos = None # write your solution here
pred_neg = None # write your solution here
acc_pos = None # write your solution here
acc_neg = None # write your solution here
print(f'Accuracy of learned boundary: {(acc_neg + acc_pos) / 2:.3f}')
Accuracy of learned boundary: 0.887
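As before, a possible way to fill in the stubs, now with the learned parameters:
pred_pos = predict(w_learned, p_pos)
pred_neg = predict(w_learned, p_neg)
acc_pos = accuracy(pred_pos, np.ones(len(p_pos)))
acc_neg = accuracy(pred_neg, -np.ones(len(p_neg)))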