In this exercise, the goal is to minimize the function
$$ f(x, y) = e^{-x^2 - y^2} \sin(x) \cos(y), $$ which combines exponential decay with sinusoidal variations. This makes it a compelling case for gradient descent optimization.
Unlike typical loss functions in machine learning, where minimization is performed with respect to model parameters given data, here the aim is simply to find the values of $x$ and $y$ that minimize $f(x, y)$. In this analogy, $f(x, y)$ plays the role of the error measure, but $x$ and $y$ are not model parameters; they are simply the variables of the mathematical function.
The following steps will be needed.
The function $f(x, y)$ and its gradient descent optimization will be implemented within the `ExpTrig` class defined in the cell below. The class includes the following methods, which must be implemented in the subsequent tasks:
- `forward`: must return the function value of $f(x, y)$, e.g., the predictions.
- `df_dx`: must return the partial derivative of the function with respect to $x$.
- `df_dy`: must return the partial derivative of the function with respect to $y$.
- `backward`: must return the gradient of $f(x, y)$ as a tuple (df_dx, df_dy). The name 'backward' is used as a reference to backpropagation.
- `display_function`: makes a figure of the function defined in `forward`.
# import packages
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
class ExpTrig:
    def forward(self, x, y):
        """
        Args:
            x: x-values (can be a single float or 1D numpy array)
            y: y-values (can be a single float or 1D numpy array)
        Returns:
            The function values of f (same shape as x and y)
        """
        # Write the code here

    def df_dx(self, x, y):
        """
        Args:
            x: x-values (can be a single float or 1D numpy array)
            y: y-values (can be a single float or 1D numpy array)
        Returns:
            The partial derivative of f with respect to x (same shape as x and y)
        """
        # Write the code here

    def df_dy(self, x, y):
        """
        Args:
            x: x-values (can be a single float or 1D numpy array)
            y: y-values (can be a single float or 1D numpy array)
        Returns:
            The partial derivative of f with respect to y (same shape as x and y)
        """
        # Write the code here

    def backward(self, x, y):
        """
        Args:
            x: x-values
            y: y-values
        Returns:
            The partial derivatives of the function (i.e., the gradient) as a tuple (df_dx, df_dy)
        """
        # Write the code here

    def display_function(self):
        # Evaluate f on a grid and draw it as a 3D surface
        x = np.linspace(-2, 2, 400)
        y = np.linspace(-2, 2, 400)
        x, y = np.meshgrid(x, y)
        z = self.forward(x, y)
        # Create a 3D plot
        fig = plt.figure(figsize=(10, 8))
        ax = fig.add_subplot(111, projection='3d')
        ax.plot_surface(x, y, z, cmap='viridis', alpha=0.8)
        ax.set_xlabel('X axis')
        ax.set_ylabel('Y axis')
        ax.set_zlabel('f(x, y)')
        ax.set_title('Surface of f(x, y) = e^{-x^2 - y^2} sin(x) cos(y)')
        plt.show()
Implement the `forward` function in the class (an illustrative sketch is given after the cell below).
# Grid of x, y points
f = ExpTrig()
f.display_function()
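For reference, one way `forward` could be written: a minimal sketch assuming only NumPy operations, so single floats and 1D (or meshgrid) arrays are handled uniformly. The class name here is purely illustrative; in the exercise the body belongs in ExpTrig.forward.
import numpy as np

class ExpTrigForwardSketch:
    def forward(self, x, y):
        # f(x, y) = exp(-x^2 - y^2) * sin(x) * cos(y), evaluated elementwise
        return np.exp(-x**2 - y**2) * np.sin(x) * np.cos(y)
Because only NumPy ufuncs are used, the same expression works for scalars and for the meshgrid arrays that `display_function` passes in.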
Use the plot to identify the minima of the function and discuss potential challenges associated with gradient-based optimization.
# Write your reflection here...
Gradient Descent is an iterative optimization method that updates input variables step by step by using the gradient of a function with respect to these variables. The updates are made by moving in the negative direction of the gradient, which corresponds to the direction of the steepest decrease in the function's value. The update rule is given by:
$$ x_{t+1} = x_t - \lambda \nabla_x f(x_t), $$ where $x_t$ is the current estimate of the variables, $\lambda$ is the learning rate (step size), and $\nabla_x f(x_t)$ is the gradient of $f$ evaluated at $x_t$.
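As a rough illustration of the update rule, a single step is shown below, using a hypothetical starting point and learning rate and the partial derivatives derived in the next task.
import numpy as np

# One gradient descent step on f(x, y) = exp(-x^2 - y^2) * sin(x) * cos(y)
x, y = -1.5, -1.0        # hypothetical current estimate
learning_rate = 0.2      # hypothetical step size (lambda)

e = np.exp(-x**2 - y**2)
df_dx = -2 * x * e * np.sin(x) * np.cos(y) + e * np.cos(x) * np.cos(y)
df_dy = -2 * y * e * np.sin(x) * np.cos(y) - e * np.sin(x) * np.sin(y)

# move in the negative gradient direction
x = x - learning_rate * df_dx
y = y - learning_rate * df_dy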
On a piece of paper, show that the partial derivatives of the function $f(x, y)$ with respect to $x$ and $y$, i.e., $\frac{\partial f}{\partial x} \quad \text{and} \quad \frac{\partial f}{\partial y}$ are
$$ \frac{\partial f}{\partial x} = -2xe^{-x^2 - y^2} \sin(x) \cos(y) + e^{-x^2 - y^2} \cos(x) \cos(y) $$
$$ \frac{\partial f}{\partial y} = -2ye^{-x^2 - y^2} \sin(x) \cos(y) - e^{-x^2 - y^2} \sin(x) \sin(y) $$
Recall the chain rule and product rule for finding the partial derivatives with respect to $x$ and $y$.
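One way to obtain the first expression is to apply the product rule to the factors $e^{-x^2 - y^2}$ and $\sin(x)\cos(y)$, using the chain rule on the exponential:
$$ \frac{\partial f}{\partial x} = \underbrace{\frac{\partial}{\partial x}\!\left(e^{-x^2 - y^2}\right)}_{-2x\,e^{-x^2 - y^2}} \sin(x)\cos(y) + e^{-x^2 - y^2}\,\underbrace{\frac{\partial}{\partial x}\!\left(\sin(x)\right)}_{\cos(x)}\cos(y). $$
The derivative with respect to $y$ follows analogously, with $\frac{\partial}{\partial y}\cos(y) = -\sin(y)$ producing the minus sign in the second term.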
# Write solution here
Implement the following functions in the `ExpTrig` class:
- `df_dx` and `df_dy`, which return the values of the partial derivatives for a given `x` and `y`. See the results in the previous task (in case you didn't complete the previous task, the result is given there).
- the `backward` method, so that it returns the gradient evaluated at a given (x, y).
An illustrative sketch follows the solution cell below.
# Write your solution here
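For reference, a possible sketch of these methods, assuming NumPy and the derivatives given above. It is shown as a separate, hypothetically named class so the cell runs on its own; in the exercise the bodies belong in ExpTrig.
import numpy as np

class ExpTrigGradientSketch:
    def forward(self, x, y):
        return np.exp(-x**2 - y**2) * np.sin(x) * np.cos(y)

    def df_dx(self, x, y):
        # product rule: d/dx[exp(-x^2 - y^2)] * sin(x)cos(y) + exp(-x^2 - y^2) * cos(x)cos(y)
        e = np.exp(-x**2 - y**2)
        return -2 * x * e * np.sin(x) * np.cos(y) + e * np.cos(x) * np.cos(y)

    def df_dy(self, x, y):
        # product rule with d/dy[cos(y)] = -sin(y)
        e = np.exp(-x**2 - y**2)
        return -2 * y * e * np.sin(x) * np.cos(y) - e * np.sin(x) * np.sin(y)

    def backward(self, x, y):
        # gradient of f(x, y) as a tuple (df_dx, df_dy)
        return self.df_dx(x, y), self.df_dy(x, y)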
The following steps outline the implementation of gradient descent for the function $f(x, y)$, which is defined in the `ExpTrig` class. The implementation of the `optimize_function` method involves several tasks that must be completed. Implement each of these steps in the cell below.
- Initialize Coordinates: Start by initializing `x` and `y` as `start_x` and `start_y`, respectively. These are your starting guesses for the optimization.
- Initialize History Lists: Create two lists, `x_all` and `y_all`, to record the coordinates visited during the optimization.
- Start Gradient Descent Loop: Implement a for-loop that iterates a specified number of times. Within each iteration:
  - Call the `backward` method of `func` to compute the gradients at the current estimate (x, y).
  - Update `x` and `y` by taking a step in the direction opposite to the gradient, scaled by the `learning_rate`.
  - Print `x`, `y`, and the function value $f(x, y)$ at these coordinates every 25 iterations.
  - Append `x` and `y` to `x_all` and `y_all`, respectively.
- Return the Result: the function should return `x_all` and `y_all`.
An illustrative sketch of one possible loop is given after the template cell below.
def optimize_function(func, start_x, start_y, learning_rate, iterations):
    """
    Optimize a given function using gradient descent.

    Args:
    - func (Function): A function object that must have 'forward' and 'backward' methods.
      The 'forward' method computes the function value at a given point (x, y),
      and the 'backward' method computes the gradient at that point.
    - start_x (float): The starting x-coordinate for the optimization.
    - start_y (float): The starting y-coordinate for the optimization.
    - learning_rate (float): The step size for each iteration in the gradient descent.
    - iterations (int): The total number of iterations for the optimization process.

    Returns:
    - x_all (list of float): List of all x-coordinates of the points visited during the optimization.
    - y_all (list of float): List of all y-coordinates of the points visited during the optimization.

    This function performs gradient descent on the provided function object. Starting from
    (start_x, start_y), it iteratively moves in the direction opposite to the gradient,
    with step sizes determined by the learning rate. The function's value and current
    coordinates are printed every 25 iterations. The function returns the history of
    coordinates visited during the optimization.
    """
    # return ...
    # Write the optimization routine here
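One possible reading of the steps above is sketched below. The function is named differently so it does not overwrite the template, and details such as whether the starting point is recorded in the history may differ from the intended solution.
def optimize_function_sketch(func, start_x, start_y, learning_rate, iterations):
    # start from the initial guesses
    x, y = start_x, start_y
    # history of visited coordinates (recording the starting point is a design choice)
    x_all, y_all = [x], [y]
    for i in range(iterations):
        # gradient at the current estimate
        grad_x, grad_y = func.backward(x, y)
        # step in the direction opposite to the gradient
        x = x - learning_rate * grad_x
        y = y - learning_rate * grad_y
        if i % 25 == 0:
            print(f"Iteration {i + 1}, x = {x}, y = {y}, f(x, y) = {func.forward(x, y)}")
        x_all.append(x)
        y_all.append(y)
    return x_all, y_all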
# Example optimization
x_arr, y_arr = optimize_function(ExpTrig(), start_x=-1.5, start_y=-1., learning_rate=0.2, iterations=100)
Iteration 1, x = -1.4877579973279986, y = -0.9851319464762178, f(x, y) = -0.022818031499845872
Iteration 26, x = -0.6685122063716475, y = -0.05961890104988543, f(x, y) = -0.39432935018582344
Iteration 51, x = -0.6532713185908388, y = -6.700323564826335e-05, f(x, y) = -0.39665295841433423
Iteration 76, x = -0.6532711870955082, y = -7.500448458167815e-08, f(x, y) = -0.3966529610854677
The cell below plots, on the surface of $f$, the optimization path returned by `optimize_function`.
x = np.linspace(-2, 2, 400)
y = np.linspace(-2, 2, 400)
x, y = np.meshgrid(x, y)
z = ExpTrig().forward(x,y)
# Assuming you have lists `x_values` and `y_values` containing the optimization path
x_values = np.array(x_arr)
y_values = np.array(y_arr)
z_values = np.exp(-np.array(x_values)**2 - np.array(y_values)**2) * np.sin(x_values) * np.cos(y_values)
# Plot the optimization path on the surface
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y, z, cmap='viridis', alpha=0.6)
ax.plot(x_values, y_values, z_values, marker='o', color='r', markersize=5, label='Optimization Path')
ax.plot(x_values[-1], y_values[-1], z_values[-1], marker='x', color='b', markersize=10, label='Final Point')
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('f(x, y)')
ax.set_title('Optimization Path on f(x, y) Surface')
ax.legend()
plt.show()
Assess the performance of the gradient descent algorithm.
# Write your reflections here...