Non-Linear Optimization: Multivariable Functions

In this exercise, the goal is to minimize the function

$$ f(x, y) = e^{-x^2 - y^2} \sin(x) \cos(y), $$

which combines exponential decay with sinusoidal variation. This makes it an interesting test case for gradient descent optimization.

Unlike typical loss functions in machine learning, where minimization is performed with respect to model parameters given data, the aim here is to find directly the values of $x$ and $y$ that minimize $f(x, y)$. In this analogy, $f(x, y)$ plays the role of the error measure, while $x$ and $y$ are not model parameters but simply the variables of the mathematical function.

Steps

The following steps are needed:

  • Compute the partial derivatives to obtain the gradient at a point.
  • Compute the forward (prediction) and backward (gradient) passes of the function.
  • Apply the forward and backward passes iteratively to optimize the function.
List of individual tasks:
  • Task 1: Forward pass
  • Task 2: Explain the function
  • Task 3: Backward pass pen and paper
  • Task 4: Backward implementation
  • Task 5: Gradient descent
  • Task 6: Optimization path
  • Task 7: Reflection

Class Implementation

The function $f(x, y)$ and its gradient descent optimization will be implemented within the ExpTrig class defined in the cell below. The class includes the following methods, which must be implemented in the subsequent tasks.

  1. forward must return the function value of f(x,y), i.e., the prediction.
  2. df_dx must return the partial derivative of the function with respect to x.
  3. df_dy must return the partial derivative of the function with respect to y.
  4. backward must return the gradient of f(x,y) as a tuple (df_dx, df_dy). The name 'backward' is used in reference to backpropagation.
  5. display_function creates a figure of the function defined in forward.
# import packages
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

class ExpTrig:

    def forward(self, x, y):
        """
        Args:
        x: x-values (Can be single float or 1D array)
        y: y-values (Can be single float or 1D array)

        Returns:
        The function values of f (same shape as x and y)
        """
        # Write the code here

    def df_dx(self, x, y):
        """
        Args:
        x: x-values (Can be single float or 1D numpy array)
        y: y-values (Can be single float or 1D numpy array)

        Returns:
        The partial derivative of f with respect to x (same shape as x and y)
        """
        # Write the code here

    def df_dy(self, x, y):
        """
        Args:
        x: x-values (Can be single float or 1D numpy array)
        y: y-values (Can be single float or 1D numpy array)

        Returns:
        The partial derivative of f with respect to y (same shape as x and y)
        """
        # Write the code here

    def backward(self, x, y):
        """
        Args:
        x: x-values
        y: y-values

        Returns:
        Partial derivatives of the function (i.e., the gradient) as a tuple (df_dx, df_dy)
        """
        # Write the code here

    def display_function(self):
        
        x = np.linspace(-2, 2, 400)
        y = np.linspace(-2, 2, 400)
        x, y = np.meshgrid(x, y)

        z = self.forward(x, y)

        # Create a 3D plot  
        fig = plt.figure(figsize=(10, 8))
        ax = fig.add_subplot(111, projection='3d')
        ax.plot_surface(x, y, z, cmap='viridis', alpha=0.8)
        ax.set_xlabel('X axis')
        ax.set_ylabel('Y axis')
        ax.set_zlabel('f(x, y)')
        ax.set_title('Surface of f(x, y) = e^{-x^2 - y^2} sin(x) cos(y)')
        plt.show()
Task 1: Forward pass
  1. Implement the forward function in the class.
  2. Run the cell below to visualize the function.
# Grid of x, y points
f = ExpTrig()

f.display_function()
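For reference, a minimal sketch of what the forward method might look like is shown below. This is one possible implementation using NumPy, not necessarily the intended one; the method would go inside the ExpTrig class defined above.

# Sketch: one possible forward method for the ExpTrig class
def forward(self, x, y):
    # f(x, y) = exp(-x^2 - y^2) * sin(x) * cos(y); works for scalars and NumPy arrays
    return np.exp(-x**2 - y**2) * np.sin(x) * np.cos(y)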
Task 2: Explain the function

Use the plot to identify the minima of the function and discuss potential challenges associated with gradient-based optimization.

# Write your reflection here...

Gradient Descent

Gradient descent is an iterative optimization method that updates the input variables step by step using the gradient of a function with respect to those variables. Each update moves in the negative gradient direction, which corresponds to the direction of steepest decrease in the function's value. The update rule is given by:

$$ x_{t+1} = x_t - \lambda \nabla_x f(x_t) $$

where:

  • $x_t \in \mathbb{R}^n$ represents the input to the function $f(x)$ at iteration $t$,
  • $x_{t+1}$ is the updated input at iteration $t+1$,
  • $\nabla_x f(x_t)$ is the gradient of $f$ with respect to $x$ at $x_t$,
  • $\lambda$ is the learning rate, which controls the size of each step, and
  • $t$ is the current iteration step.
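As an illustration, a single update step for the two variables in this exercise might look like the following sketch. It assumes that x, y, a learning rate lr, and a func object with a backward method (as in the ExpTrig class) are already defined.

# Sketch: one gradient descent update for f(x, y)
grad_x, grad_y = func.backward(x, y)   # gradient of f at the current point
x = x - lr * grad_x                    # step against the gradient in x
y = y - lr * grad_y                    # step against the gradient in y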
Task 3: Backward pass pen and paper

On a piece of paper, show that the partial derivatives of the function $f(x, y)$ with respect to $x$ and $y$, i.e., $\frac{\partial f}{\partial x} \quad \text{and} \quad \frac{\partial f}{\partial y}$ are

$$ \frac{\partial f}{\partial x} = -2xe^{-x^2 - y^2} \sin(x) \cos(y) + e^{-x^2 - y^2} \cos(x) \cos(y) $$ $$ \frac{\partial f}{\partial y} = -2ye^{-x^2 - y^2} \sin(x) \cos(y) - e^{-x^2 - y^2} \sin(x) \sin(y) $$
Hint

Recall the chain rule and the product rule for finding the partial derivatives with respect to $x$ and $y$.
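For example, one way to organize the calculation is to write $f(x, y) = u(x, y)\, v(x)\, w(y)$ with $u = e^{-x^2 - y^2}$, $v = \sin(x)$ and $w = \cos(y)$. The product rule then gives, for the $x$-derivative,

$$ \frac{\partial f}{\partial x} = \frac{\partial u}{\partial x}\, v\, w + u\, \frac{\partial v}{\partial x}\, w = \left(-2x\, e^{-x^2 - y^2}\right) \sin(x) \cos(y) + e^{-x^2 - y^2} \cos(x) \cos(y), $$

where $\frac{\partial u}{\partial x} = -2x\, e^{-x^2 - y^2}$ follows from the chain rule. The derivative with respect to $y$ is obtained in the same way.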

# Write solution here
Task 4: Backward implementation

Implement the following functions in the ExpTrig class:

  1. df_dx and df_dy, which return the values of the partial derivatives for given x and y. See the results in the previous task (the expressions are stated there in case you did not complete it).

  2. backward, which returns the gradient evaluated at a given (x, y). A reference sketch is given after the solution cell below.

# Write your solution here
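For reference, one possible implementation of the three methods is sketched below, based on the expressions from Task 3; your own version may differ. These methods would go inside the ExpTrig class.

# Sketch: possible df_dx, df_dy and backward methods for the ExpTrig class
def df_dx(self, x, y):
    # d/dx [exp(-x^2 - y^2) sin(x) cos(y)] via the product and chain rules
    return np.exp(-x**2 - y**2) * np.cos(y) * (np.cos(x) - 2 * x * np.sin(x))

def df_dy(self, x, y):
    # d/dy [exp(-x^2 - y^2) sin(x) cos(y)] via the product and chain rules
    return -np.exp(-x**2 - y**2) * np.sin(x) * (2 * y * np.cos(y) + np.sin(y))

def backward(self, x, y):
    # Gradient of f as a tuple of partial derivatives
    return self.df_dx(x, y), self.df_dy(x, y)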

Optimization Method

Task 5: Gradient descent

The following steps outline the implementation of gradient descent for the function $f(x, y)$ defined in the ExpTrig class. The optimize_function method involves several steps; implement each of them in the cell below.

  1. Initialize Coordinates: Start by initializing x and y as start_x and start_y , respectively. These are your starting guesses for the optimization.

  2. Initialize History Lists: Create two lists, x_all and y_all , to record the coordinates visited during the optimization.

  3. Start Gradient Descent Loop: Implement a for-loop that iterates a specified number of times. Within each iteration:

    • Use the backward method of func to compute the gradients at the current estimate (x, y) .
    • Update x and y by taking a step in the direction opposite to the gradient, scaled by the learning_rate.
    • Logging (optional): Add a logging mechanism to display the current state of the optimization. For example, print the iteration number, values x , y , and the function value, $f(x,y)$ at these coordinates every 25 iterations.
    • Record History: append the current values of x and y to x_all and y_all , respectively.
  4. Return the Result: the function should return x_all and y_all .

def optimize_function(func, start_x, start_y, learning_rate, iterations):
    """
    Optimize a given function using gradient descent.

    Args:
    - func (Function): A function object that must have 'forward' and 'backward' methods.
                       The 'forward' method computes the function value at a given point (x, y),
                       and the 'backward' method computes the gradient at that point.
    - start_x (float): The starting x-coordinate for the optimization.
    - start_y (float): The starting y-coordinate for the optimization.
    - learning_rate (float): The step size for each iteration in the gradient descent.
    - iterations (int): The total number of iterations for the optimization process.

    Returns:
    - x_all (list of float): List of all x-coordinates of the points visited during the optimization.
    - y_all (list of float): List of all y-coordinates of the points visited during the optimization.

    This function performs gradient descent on the provided function object. Starting from 
    (start_x, start_y), it iteratively moves in the direction opposite to the gradient, 
    with step sizes determined by the learning rate. The function's value and current 
    coordinates are printed every 25 iterations. The function returns the history of 
    coordinates visited during the optimization.
    """
    # return ...
    # Write the optimization routine here

# Example optimization
x_arr, y_arr = optimize_function(ExpTrig(), start_x=-1.5, start_y=-1., learning_rate=0.2, iterations=100)
Iteration 1, x = -1.4877579973279986, y = -0.9851319464762178, f(x, y) = -0.022818031499845872
Iteration 26, x = -0.6685122063716475, y = -0.05961890104988543, f(x, y) = -0.39432935018582344
Iteration 51, x = -0.6532713185908388, y = -6.700323564826335e-05, f(x, y) = -0.39665295841433423
Iteration 76, x = -0.6532711870955082, y = -7.500448458167815e-08, f(x, y) = -0.3966529610854677
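For reference, the body of optimize_function could be structured as follows. This is a sketch consistent with the steps above; the logging interval and print format are arbitrary choices.

# Sketch: one possible optimize_function implementation
def optimize_function(func, start_x, start_y, learning_rate, iterations):
    x, y = start_x, start_y          # 1. initialize coordinates
    x_all, y_all = [], []            # 2. history of visited points

    for i in range(iterations):      # 3. gradient descent loop
        grad_x, grad_y = func.backward(x, y)   # gradient at the current point
        x = x - learning_rate * grad_x         # step against the gradient
        y = y - learning_rate * grad_y

        if i % 25 == 0:              # optional logging every 25 iterations
            print(f"Iteration {i + 1}, x = {x}, y = {y}, f(x, y) = {func.forward(x, y)}")

        x_all.append(x)              # record history
        y_all.append(y)

    return x_all, y_all              # 4. return the visited coordinates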
Task 6: Optimization path
  1. Run the cell below to plot the optimization path produced by optimize_function.
  2. Identify areas where the optimization slows down or oscillates, and explain why these might occur.
  3. How might a different learning rate affect the optimization path? Consider the possibilities of paths that overshoot, oscillate, or stagnate.
  4. How does the shape of the surface (e.g., steep slopes or flat regions) influence the behavior of the optimization path?
x = np.linspace(-2, 2, 400)
y = np.linspace(-2, 2, 400)
x, y = np.meshgrid(x, y)
z = ExpTrig().forward(x, y)

# Assuming you have lists `x_values` and `y_values` containing the optimization path
x_values = np.array(x_arr)
y_values = np.array(y_arr)
z_values = np.exp(-np.array(x_values)**2 - np.array(y_values)**2) * np.sin(x_values) * np.cos(y_values)

# Plot the optimization path on the surface
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y, z, cmap='viridis', alpha=0.6)
ax.plot(x_values, y_values, z_values, marker='o', color='r', markersize=5, label='Optimization Path')
ax.plot(x_values[-1], y_values[-1], z_values[-1], marker='x', color='b', markersize=10, label='Final Point')
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('f(x, y)')
ax.set_title('Optimization Path on f(x, y) Surface')
ax.legend()
plt.show()
Task 7: Reflection

Assess the performance of the gradient descent algorithm.

  1. Use gradient descent to find the minimum from each of the following starting points (a sketch for running these experiments is given after this list):
$$ x_{0}=1.5, y_{0}=1.5 $$ $$ x_{0}=-1.5, y_{0}=-1.5 $$ $$ x_{0}=-1.0, y_{0}=1.3 $$ $$ x_{0}=1.2, y_{0}=-1.5 $$
  2. In the cell below, describe the main observations from these optimizations.
  3. Explain why the optimization process sometimes fails to find the global minimum.
  4. Use different learning rates (try 0.1, 0.5, 1.0) and discuss how they affect the results.
  5. List two issues commonly encountered in gradient descent optimization and provide a potential solution for each:
    • Describe each issue and explain why it poses a challenge.
    • Propose a potential solution to address each issue effectively.
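The experiments in points 1 and 4 could, for example, be scripted as in the sketch below, assuming ExpTrig and optimize_function are implemented as above.

# Sketch: run gradient descent from several starting points and learning rates
starting_points = [(1.5, 1.5), (-1.5, -1.5), (-1.0, 1.3), (1.2, -1.5)]
learning_rates = [0.1, 0.5, 1.0]

f = ExpTrig()
for lr in learning_rates:
    for x0, y0 in starting_points:
        x_arr, y_arr = optimize_function(f, start_x=x0, start_y=y0,
                                         learning_rate=lr, iterations=100)
        x_end, y_end = x_arr[-1], y_arr[-1]
        print(f"lr={lr}, start=({x0}, {y0}) -> end=({x_end:.4f}, {y_end:.4f}), "
              f"f(end)={f.forward(x_end, y_end):.4f}")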
# Write your reflections here...