Practical introduction to Python and Numpy

Important

We recommend that you use either Jupyter Lab (not Jupyter Notebook) or Visual Studio Code to solve the exercises, as other tools may display the exercises with bad formatting. The Jupyter Lab and the Visual Studio Code exercise files are located in different folders in the course material repository on OneDrive.

Overview of tasks

Task 1: Vector length
Task 2: List comprehensions
Task 3: Inner product
Task 4: Array indexing
Task 5: Length using numpy
Task 6: Calculating angles
Task 7: Distances
Task 8: Distances - continued

This exercise delves deeper into the syntax of Python and NumPy, providing guidelines for effectively working with arrays using both native Python lists and NumPy arrays. The first part of the exercise is about implementing basic linear algebra operations using lists, while the second part is about using NumPy arrays. Note that in this course we will primarily use NumPy when operating on vectors and matrices.

Run each code cell as you progress through the exercise. Incomplete cells are part of tasks and have to be completed by you.

Using native types in Python to implement basic linear algebra operations

The vectors va and vb are defined as:

va = [2, 2]
vb = [3, 4]

Vector length

The length (L2-norm) of a vector is defined as

$$ ||v|| = \sqrt{\sum_{i=1}^N v_i^2}. $$

Task 1: Vector length

Implement the length as a Python function in the code cell below.
Calculate the length of the vectors va and vb using the implementation from (1).
Verify the result using pen and paper.

Hints:

For-loops in Python loop through the elements of an iterable and take the current element as the iteration variable, similar to for-each loops in Java.
The range(x) function in Python returns an iterable of the integers $0,\dots, x-1$.
The length of a list l can be found using the len(l) function.
The ** operator implements exponentiation in Python. For the square root of x , use x**(1/2) .
Use Python's built-in help(<function/class/method>) function for additional documentation. In Jupyter Lab, you can also open a documentation popover by placing the text cursor on the desired symbol and pressing Shift + Tab.
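The hints above can be combined in a small sketch. It is not a solution to the task; the list `values` and the variable names are made up for illustration only.

```python
values = [1, 4, 9]  # hypothetical example data

# Loop over the indices 0, ..., len(values) - 1 using range and len.
total = 0
for i in range(len(values)):
    total += values[i]

# Loop directly over the elements (similar to a for-each loop in Java).
roots = []
for x in values:
    roots.append(x ** (1 / 2))  # ** is exponentiation; the power 1/2 gives the square root

print(total)  # 14
print(roots)  # [1.0, 2.0, 3.0]
```

Both loop styles visit the same elements; looping directly over the elements is usually preferred when the index itself is not needed.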

def length(v):
    ...


print('a', length(va))
print('b', length(vb))
assert length(va) == 8**0.5
assert length(vb) == 5

a 2.8284271247461903
b 5.0

List comprehensions

Using loops for list iteration requires quite a lot of boilerplate code. Fortunately, Python's list comprehensions are created to make list iteration more expressive and easier to understand.

A list comprehension has the following form

[f(e) for e in list]

where $f$ is an arbitrary function applied to each element $e$. This is equivalent to the map function in functional programming. Note: List comprehensions can also include guard rules. You can read more about list comprehensions here .

Python also provides a wealth of utility functions for performing common list operations. One such function is

sum(l)

which sums all elements in the list argument.
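As a minimal sketch of these building blocks, the example below uses a made-up list `numbers` to show a plain list comprehension, a comprehension with a guard, and the sum function.

```python
numbers = [1, 2, 3, 4, 5, 6]  # hypothetical example data

# Apply an expression (here: doubling) to every element, like map.
doubled = [2 * e for e in numbers]

# A guard (the trailing "if") keeps only the elements that satisfy a condition.
evens = [e for e in numbers if e % 2 == 0]

print(doubled)     # [2, 4, 6, 8, 10, 12]
print(evens)       # [2, 4, 6]
print(sum(evens))  # 12
```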

Task 2: List comprehensions

Implement the length2 function in the cell below by using a list comprehension and the sum function .
- First, exponentiate each element in a list comprehension, resulting in a new list of values.
- Then use the sum function to add all elements and calculate the square root of the sum.
Verify the result using pen and paper.

def length2(v):
    ...

print('a', length2(va))
print('b', length2(vb))
assert length2(va) == 8**0.5
assert length2(vb) == 5

a 2.8284271247461903
b 5.0

Task 3: Inner product

In this task you will calculate the dot product of two vectors using lists. The definition of the dot product:

$$ a\cdot b = \sum_{i=1}^N a_ib_i, $$

where $a$ and $b$ are $N$-dimensional vectors.

Complete the function dot below by implementing the inner (dot) product using either for-loops or list comprehensions.
- Note: If you want to use list comprehensions you need the function zip to interleave the two lists. The zip function is equivalent to zip in most functional programming languages. The documentation can be found here
Test the implementation on va and vb . Verify the results using pen and paper.
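If zip is new to you, the sketch below shows what it produces. The lists `xs` and `ys` are made up for illustration, and the example adds the lists element-wise rather than computing a dot product.

```python
xs = [1, 2, 3]     # hypothetical example data
ys = [10, 20, 30]

# zip pairs up the elements at matching positions.
pairs = list(zip(xs, ys))

# Each pair can be unpacked inside a list comprehension.
sums = [x + y for x, y in zip(xs, ys)]

print(pairs)  # [(1, 10), (2, 20), (3, 30)]
print(sums)   # [11, 22, 33]
```

Note that zip stops at the end of the shorter list, so it silently ignores extra elements if the two lists differ in length.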

def dot(a, b):
    ...

# Tests
assert dot(va, vb) == 14

Numpy

Numpy makes it way easier to work with multidimensional arrays and provides a significant performance increase when operating on arrays. Refer to this week's tutorial for further information.

The following code imports the numpy package and creates a $3\times 3$ matrix:

Note that the import statement renames numpy to np . This is the conventional alias in Python: it keeps the code short while keeping Numpy's functions in their own namespace (e.g. np.sum versus the built-in sum ).

import numpy as np

A = np.array([
    [1, 2, 3],
    [3, 4, 9],
    [5, 7, 3]
])

Use A.shape to get the dimensions (size) of the array. The shape property works on all Numpy arrays, e.g. (A*2).shape works as well (we will return to array operations later in this exercise).

The cell below prints the shape of A :

A.shape
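As a small self-contained sketch, the example below repeats the definition of A and adds a made-up one-dimensional array `v` to show how the shape differs between matrices and vectors.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [3, 4, 9],
    [5, 7, 3]
])
v = np.array([2, 3, 4, 5])  # hypothetical 1D example

print(A.shape)        # (3, 3): number of rows, number of columns
print((A * 2).shape)  # (3, 3): element-wise operations keep the shape
print(v.shape)        # (4,): a tuple with a single entry for a 1D array
```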

Slicing

Slicing allows you to select a subarray of elements using the <start>:<stop> notation, e.g. 0:2 . Inspect the code cell below for a few examples:

single = A[0]
print('single element', single)

vector = A[:2, 1] # A start index of 0 can be omitted, so :2 means 0:2.
print('vector of elements', vector)

matrix = A[:, :2]
print('matrix of elements\n', matrix)

single element [1 2 3]
vector of elements [2 4]
matrix of elements
 [[1 2]
 [3 4]
 [5 7]]

Negative indices are equivalent to counting backwards from the end of the array, i.e. -<idx> is equivalent to len(a)-<idx> . A few examples:

single = A[-1, -1]
print('single', single)

arange = A[0:-2, 0:-1]
print('arange', arange)

single 3
arange [[1 2]]
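The equivalence between negative and non-negative indices can be checked directly. The sketch below repeats the definition of A so that it runs on its own; the variable names `n_rows` and `n_cols` are chosen for illustration.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [3, 4, 9],
    [5, 7, 3]
])
n_rows, n_cols = A.shape

# -1 refers to the last index, i.e. length - 1.
print(A[-1, -1] == A[n_rows - 1, n_cols - 1])  # True

# As a stop value, -1 means "up to, but not including, the last element".
print(A[0, :-1])    # [1 2]

# As a start value, -2 means "from the second-to-last element onwards".
print(A[-2:, -2:])  # rows [4 9] and [7 3]
```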

Info

You can find the official documentation for Numpy slicing here .

Task 4: Array indexing

Use slicing to create the following variables:

Create a 2x2 matrix ur from the upper right corner of A .
Extract the 2nd row of A and store it in the variable row .
Extract the 1st column of A and store it in the variable col .

ur = ...
row = ...
col = ...

print('upper right\n', ur)
print('row', row)
print('column', col)

# Tests
assert np.all(ur == np.array([[2, 3], [4, 9]]))
assert np.all(row == np.array([3, 4, 9]))
assert np.all(col == np.array([1, 3, 5]))

upper right
 [[2 3]
 [4 9]]
row [3 4 9]
column [1 3 5]

Using Numpy array operations

While the list-based implementations from the first part seem fine for small inputs, they become unbearably slow for large arrays.

Let's try an example. The code below uses numpy to generate $1000000$-dimensional vectors of random numbers:

ta = np.random.randint(100, size=1000000)
tb = np.random.randint(100, size=1000000)

Info

In this course, the speed of your programs should not be a major concern.

Jupyter notebooks support the command %timeit <statement> , which runs a performance test on a given statement. This makes it possible to performance test the native implementation of the inner product from Task 3:

%timeit dot(ta, tb)

238 ms ± 5.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Not very fast, huh? Let's try using Numpy's built-in function for inner products, np.dot :

%timeit np.dot(ta, tb)

519 µs ± 5.95 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

With the timings shown above, this is roughly 450 times faster than the native implementation (on the test computer, anyway)! What about other list operations? Let's try the sum function:

%timeit sum(ta)
%timeit np.sum(ta)

77.6 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
408 µs ± 2.26 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Again, a similar performance improvement. Because of its performance, Numpy should always be used instead of native Python wherever possible. In general, you should expect a speed improvement of several orders of magnitude when using Numpy.

Keep in mind that speed is not a central topic of this course, but hey, there is no reason to waste our time either!
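The %timeit command only works in Jupyter and IPython. If you want to run a similar comparison in a plain Python script, one option is the timeit module from the standard library. The sketch below is an illustration under that assumption; the array size and the number of repetitions are arbitrary choices, and the measured times will differ between computers.

```python
import timeit

import numpy as np

data = np.random.randint(100, size=100_000)  # smaller than above to keep the run short

# Run each statement 10 times and compute the average time per run in seconds.
t_python = timeit.timeit(lambda: sum(data), number=10) / 10
t_numpy = timeit.timeit(lambda: np.sum(data), number=10) / 10

print(f"built-in sum: {t_python * 1e3:.3f} ms per run")
print(f"np.sum:       {t_numpy * 1e3:.3f} ms per run")
```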

Adapting to Numpy

This exercise is about adapting the length function implemented in Task 1 to Numpy. Overloaded operators are common in Numpy. For example, the ** operator can be used to raise the elements of a Numpy array x to any power, so x**4 raises all elements of the array x to the power of $4$.
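A short sketch of what overloaded operators do in practice, using a made-up array `x`. The last line shows how the same operator behaves differently on a plain Python list.

```python
import numpy as np

x = np.array([1, 2, 3])  # hypothetical example data

print(x ** 4)  # [ 1 16 81]: every element raised to the power of 4
print(x * 2)   # [2 4 6]: every element multiplied by 2
print(x + x)   # [2 4 6]: element-wise addition

# With a plain list, * repeats the list instead of multiplying the elements.
print([1, 2, 3] * 2)  # [1, 2, 3, 1, 2, 3]
```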

Task 5: Length using numpy

In the cell below, implement length_np using Numpy. You can use Numpy's sum function (np.sum ).
Test it on the provided input vec .

def length_np(v):
    ...

vec = np.array([2, 3, 4, 5])
length_np(vec)

Compare the Python and Numpy implementations using an array of random numbers:

vr = np.random.randint(100, size=10000)

%timeit length_np(vr)
%timeit length(vr)
%timeit length2(vr)

11.5 µs ± 83.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.39 ms ± 35.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.04 ms ± 29.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This should reveal a large difference between the Numpy and Python implementations.

Angles between vectors

The angle between vectors $\mathbf{u}$ and $\mathbf{v}$ is described by the following relation (as will be presented in the lecture):

$$ \cos \theta = \frac{\mathbf{u}\cdot \mathbf{v}}{\|\mathbf{u}\|\|\mathbf{v}\|} $$

or equivalently

$$ \mathbf{u}\cdot \mathbf{v} = \|\mathbf{u}\|\|\mathbf{v}\|\cos \theta $$
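To get from $\cos\theta$ to the angle itself, the inverse cosine is needed. The sketch below assumes a made-up value $\cos\theta = 0.5$ and uses Numpy's np.arccos and np.degrees; it also shows how two values can be packed into and unpacked from a tuple.

```python
import numpy as np

cos_theta = 0.5  # hypothetical value, not derived from any vectors in the exercise

theta_rad = np.arccos(cos_theta)   # inverse cosine, result in radians
theta_deg = np.degrees(theta_rad)  # the same angle converted to degrees

# A tuple groups several values; it can be unpacked into separate variables again.
result = (theta_rad, theta_deg)
rad, deg = result

print(rad, deg)  # approximately pi/3 radians and 60 degrees
```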

Task 6: Calculating angles

Use Numpy to implement the angle function in the code cell below. The function should return the angle in radians between inputs a and b .
Verify the example below using pen and paper.
Make a new function called angle2 , which returns a tuple containing the angle in radians and degrees. Verify the results using a calculator.

def angle(a, b):
    ...

a = np.array([2, 3, 4])
b = np.array([0, -1, 2])
print(angle(a, b)) # The result should be approximately 1.1426035712129559
assert np.isclose(angle(a, b), 1.1426035712129559) # Tolerate floating-point rounding differences

1.1426035712129559

Distances

The Euclidean distance between two vectors $\mathbf{a}$ and $\mathbf{b}$ is calculated as the length of the difference vector between $\mathbf{a}$ and $\mathbf{b}$, i.e. $\|\mathbf{a}-\mathbf{b}\|$.
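As a small worked sketch of this definition, the example below uses two made-up points `p` and `q` (not the vectors from the task). It computes the distance step by step and compares it with np.linalg.norm, which computes the L2-norm by default.

```python
import numpy as np

p = np.array([1.0, 2.0])  # hypothetical example points
q = np.array([4.0, 6.0])

diff = p - q                        # element-wise difference: [-3. -4.]
dist = np.sqrt(np.sum(diff ** 2))   # length of the difference vector

print(dist)                  # 5.0
print(np.linalg.norm(diff))  # the same value via Numpy's built-in norm
```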

Task 7: Distances

Use the code cell below to create two-dimensional vectors $\mathbf{a}=\begin{bmatrix}0\\0\end{bmatrix}$ and $\mathbf{b}=\begin{bmatrix}1\\1\end{bmatrix}$ using np.zeros and np.ones (refer to the tutorial for inspiration).
Calculate the distance between the points and print the result.
Create n-dimensional vectors $\mathbf{a}=\begin{bmatrix}0\\\vdots\\0\end{bmatrix}$ and $\mathbf{b}=\begin{bmatrix}1\\\vdots\\1\end{bmatrix}$ using np.zeros and np.ones (refer to the tutorial for inspiration) for $n=1, \dots, 10$. Calculate the distance between the vectors for each number of dimensions. Plot the distances as a function of $n$.

# Write your solution here

Task 8: Distances - continued

This task extends the previous task.

Explain the relationship in the figure between the number of dimensions and the distance.
Derive a formula for the distance between $\mathbf{a}$ and $\mathbf{b}$ as a function of the number of dimensions $n$, i.e. $f(n)=?$