Info
The tutorials may contain small exercises and these are all optional.
We assume you have read the reading material for this week prior to starting the tutorial. This tutorial covers the following topics:
Introduction to arrays and vectors in numpy.
Loading/Saving data.
Essential methods for data analysis/manipulation.
Elementary plotting using matplotlib.
Run the cell below to import Numpy and Matplotlib:
#Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
Numpy has several functions for creating arrays. The following are especially useful for this course (see the official Numpy documentation on array creation for more):
np.ones(shape), np.zeros(shape): Create an array of the given shape filled with ones or zeros, respectively.
np.linspace(start, stop, num), np.arange(start, stop, step): Create 1D arrays of evenly spaced values from start towards stop. linspace returns num values and includes stop (by default), whereas arange advances in increments of step and excludes stop.
np.random.uniform(size=shape), np.random.normal(loc, scale, size): Create arrays of random values drawn from a uniform distribution (on $[0, 1)$ by default) or from a normal/Gaussian distribution. For the Gaussian, loc = $\mu$ (mean) and scale = $\sigma$ (standard deviation).
Don't worry about memorizing them for now.
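The difference in endpoint handling between linspace and arange is a common source of off-by-one surprises. The short sketch below compares them; endpoint=False is a standard linspace parameter, while the particular values are only illustrative.

```python
import numpy as np

# linspace includes the stop value by default
a = np.linspace(0, 1, 5)                  # 0, 0.25, 0.5, 0.75, 1.0
# endpoint=False makes linspace exclude the stop value
b = np.linspace(0, 1, 5, endpoint=False)  # 0, 0.2, 0.4, 0.6, 0.8
# arange always excludes the stop value
c = np.arange(0, 1, 0.25)                 # 0, 0.25, 0.5, 0.75

print(a)
print(b)
print(c)
```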
The cell below shows a few samples of their use:
a_ones = np.ones((2, 3))  # 2 by 3 array of ones.
a_zeros = np.zeros((3, 2))  # 3 by 2 array of zeros.
a_linspace = np.linspace(0, 10, 5)  # 5 evenly spaced numbers from 0 to 10 (both endpoints included).
a_arange = np.arange(0, 10, 2)  # numbers from 0 up to, but excluding, 10 in steps of 2, so the largest value is 8.
a_uniform = np.random.uniform(size=(2, 2))  # 2 by 2 array of random numbers drawn from a uniform distribution on [0, 1).
a_normal = np.random.normal(size=(2, 2))  # 2 by 2 array of random numbers drawn from a standard normal/Gaussian distribution.
print('ones:\n', a_ones)
print('zeros:\n', a_zeros)
print('linspace:\n', a_linspace)
print('arange:\n', a_arange)
print('uniform:\n', a_uniform)
print('normal:\n', a_normal)
ones:
 [[1. 1. 1.]
 [1. 1. 1.]]
zeros:
 [[0. 0.]
 [0. 0.]
 [0. 0.]]
linspace:
 [ 0. 2.5 5. 7.5 10. ]
arange:
 [0 2 4 6 8]
uniform:
 [[0.93462787 0.69105722]
 [0.66622164 0.80950346]]
normal:
 [[ 0.07697199 0.58999498]
 [-1.85410955 -0.5251612 ]]
There is no need for iteration (i.e. loops) when creating arrays in numpy!
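As an illustration of this point, the sketch below builds the same array of squares once with an explicit Python loop and once with a single vectorized expression; the array of squares is just a hypothetical example.

```python
import numpy as np

n = 5

# Loop-based construction: works, but is unnecessary in Numpy
squares_loop = np.zeros(n)
for i in range(n):
    squares_loop[i] = i ** 2

# Vectorized construction: one expression, no explicit loop
squares_vec = np.arange(n) ** 2

print(squares_loop)
print(squares_vec)
print(np.array_equal(squares_loop, squares_vec))  # the values are identical
```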
The following example shows how to save Numpy arrays. Two common formats are:
np.save(file, arr): Stores the array in Numpy's binary .npy format (uncompressed; np.savez_compressed writes a compressed archive instead).
np.savetxt(fname, X): Stores the array as an uncompressed, human-readable text file (pass delimiter=',' for a CSV file).
import os
os.makedirs('./Data', exist_ok=True)  # make sure the output folder exists
a_normal_50 = np.random.normal(size=(50, 2))
## Saving the array as a binary npy file (Numpy data format)
np.save('./Data/RandomData.npy', a_normal_50)
a_arange_50 = np.arange(0, 100, 2)
np.save('./Data/StructuredData.npy', a_arange_50)
# Numpy can also save arrays as (uncompressed) plain-text files.
a_linspace_50 = np.linspace((1, 2), (10, 20), 10)  # 10 by 2 array
### Saving data as a regular txt file; a CSV file is also possible with delimiter=','
np.savetxt('./Data/Txt_file.txt', a_linspace_50)
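If you actually need compression, np.savez_compressed stores several named arrays in one zip-compressed .npz archive. The sketch below assumes a temporary folder from tempfile instead of ./Data and uses the hypothetical array names random and structured.

```python
import os
import tempfile
import numpy as np

data = np.random.normal(size=(50, 2))
out_dir = tempfile.mkdtemp()  # temporary folder used here instead of ./Data

npy_path = os.path.join(out_dir, 'data.npy')
npz_path = os.path.join(out_dir, 'data_compressed.npz')

np.save(npy_path, data)  # single array, binary and uncompressed
# Several named arrays in one zip-compressed archive
np.savez_compressed(npz_path, random=data, structured=np.arange(0, 100, 2))

with np.load(npz_path) as archive:
    names = archive.files         # names of the stored arrays
    restored = archive['random']  # arrays are accessed by name

print(names)
print(np.array_equal(restored, data))
```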
Numpy arrays can be loaded with the Numpy functions np.load(path) and np.loadtxt(path), as shown in the following example:
## Loading data stored as a binary npy file (Numpy data format)
A = np.load('./Data/RandomData.npy')
B = np.load('./Data/StructuredData.npy')
# Load data stored as a plain-text (e.g. txt or csv) file
C = np.loadtxt('./Data/Txt_file.txt')
# Note: A[:N] is a slice, i.e., the first N elements (rows) of A
print('A:\n',A[:5])
print('B:\n',B[:10])
print('C:\n',C[:5])
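Text files are often exchanged as CSV. The sketch below, assuming a temporary file path and a hypothetical header line x,y, writes a comma-separated file with np.savetxt and reads it back with np.loadtxt, skipping the header.

```python
import os
import tempfile
import numpy as np

table = np.linspace((1, 2), (10, 20), 10)  # 10 by 2 array, as in the saving example
csv_path = os.path.join(tempfile.mkdtemp(), 'table.csv')

# delimiter=',' writes comma-separated values; comments='' writes the header without '# '
np.savetxt(csv_path, table, delimiter=',', header='x,y', comments='')

# skiprows=1 skips the header line when loading back
loaded = np.loadtxt(csv_path, delimiter=',', skiprows=1)
print(loaded.shape)
print(np.allclose(loaded, table))
```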
A:
 [[-0.26213006 -0.5933807 ]
 [ 1.08625266 -1.29803499]
 [-0.84009091 -0.76566598]
 [ 0.94150557 0.75964165]
 [ 0.59257595 0.98924685]]
B:
 [ 0 2 4 6 8 10 12 14 16 18]
C:
 [[ 1. 2.]
 [ 2. 4.]
 [ 3. 6.]
 [ 4. 8.]
 [ 5. 10.]]
Numpy arrays are also used for handling multidimensional data, sometimes requiring operations along specific axes.
In this example, we calculate the average of $N$ random vectors.
The cell below defines an $N\times K$ matrix of random values:
N, K = 20, 10
r = np.random.uniform(size=(N, K))
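The Numpy function np.mean calculates averages over Numpy arrays. The axis argument specifies the axis that is averaged over: axis=0 averages over the rows, giving one value per column (here, the average of the $N$ random vectors, a vector of length $K$), while axis=1 averages over the columns, giving one value per row. This is demonstrated in the cell below: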
np.mean(r, axis=0)
The axis argument is supported by most of Numpy's reduction functions, including sum, std, min and max.
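To make the axis convention concrete, the sketch below compares the result shapes for axis=0 and axis=1 and shows keepdims=True, which keeps the reduced axis with length 1 so the result can be broadcast back against the original array. Centering the columns is only an illustrative use.

```python
import numpy as np

N, K = 20, 10
r = np.random.uniform(size=(N, K))

col_means = np.mean(r, axis=0)  # reduce over rows: one mean per column, shape (K,)
row_means = np.mean(r, axis=1)  # reduce over columns: one mean per row, shape (N,)
print(col_means.shape, row_means.shape)

# The same axis convention applies to other reductions
print(np.sum(r, axis=0).shape, r.max(axis=1).shape)

# keepdims=True keeps the reduced axis (length 1), so the result broadcasts against r
centered = r - r.mean(axis=0, keepdims=True)
print(np.allclose(centered.mean(axis=0), 0))
```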
This section covers essential methods for data analysis and manipulation. The methods will be used abundantly throughout the course and are worth paying careful attention to.
np.mean(array, axis), np.std(array, axis): Calculate the mean and the standard deviation, respectively, of a Numpy array of numbers (floats or integers), optionally along a given axis.
a.shape: Gives the shape (the size along each dimension) of an array a; len(array) gives the length of the first dimension only.
The : operator creates slices (subarrays) of an array A as A[start:stop:step]. Read more in the official Numpy indexing guide.
np.concatenate(list_of_arrays, axis): Joins a sequence of arrays along an existing axis.
Next, we consider a few examples to demonstrate the functionalities described above:
A = np.linspace(0,9,10)
B = np.array([
[-16, 15, -14, 13],
[-12, 11, -10, 9],
[-8, 7, -6, 5],
[-4, 3, -2, 1]
])
print('A:\n',A)
print('B:\n',B)
A:
 [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
B:
 [[-16 15 -14 13]
 [-12 11 -10 9]
 [ -8 7 -6 5]
 [ -4 3 -2 1]]
### Mean of an array
# Calling the mean function from the Numpy library to compute the mean of A.
print('Mean A:\n',np.mean(A))
# Most Numpy array manipulation methods can additionally be called from an array object
print('Mean of using Array method:\n',A.mean())
### Std of an array
print('Std of A:\n',np.std(A))
### Sum of an array
print('A sum:\n', np.sum(A))
### shape (size) of an array
print('A shape:\n',A.shape)
print('B shape:\n',B.shape)
## np.concatenate([A, B[0, :]]) example
print('Concatenation of A and Slice of B matrix:\n',np.concatenate([A,B[0,:]],axis=0))
Mean A:
 4.5
Mean of using Array method:
 4.5
Std of A:
 2.8722813232690143
A sum:
 45.0
A shape:
 (10,)
B shape:
 (4, 4)
Concatenation of A and Slice of B matrix:
 [ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. -16. 15. -14. 13.]
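The example above concatenates 1D arrays. For 2D arrays, the axis argument decides whether rows or columns are appended; the sketch below (with arbitrarily chosen slices of B) checks the resulting shapes and the difference between len and shape.

```python
import numpy as np

B = np.array([
    [-16, 15, -14, 13],
    [-12, 11, -10, 9],
    [-8, 7, -6, 5],
    [-4, 3, -2, 1]
])

# axis=0 appends rows: the arrays must have the same number of columns
tall = np.concatenate([B, B[:2, :]], axis=0)
# axis=1 appends columns: the arrays must have the same number of rows
wide = np.concatenate([B, B[:, :1]], axis=1)

print(tall.shape)  # 6 rows, 4 columns
print(wide.shape)  # 4 rows, 5 columns
print(len(tall))   # len() only reports the size of the first dimension
```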
### Slicing of array
print(B[:,0])
print(A[:5])
print('A[5:], A array except the first 5:\n',A[5:])
print('A[:-5], A array except the last 5:\n', A[:-5])
print('A[1::2] array of every second element of A starting from the second:\n',A[1::2])
[-16 -12 -8 -4]
[0. 1. 2. 3. 4.]
A[5:], A array except the first 5:
 [5. 6. 7. 8. 9.]
A[:-5], A array except the last 5:
 [0. 1. 2. 3. 4.]
A[1::2] array of every second element of A starting from the second:
 [1. 3. 5. 7. 9.]
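One detail worth knowing about slicing: a basic slice such as A[:5] is a view that shares memory with A, so writing to the slice also modifies A. The sketch below illustrates this and shows .copy() as the way to obtain an independent array.

```python
import numpy as np

A = np.linspace(0, 9, 10)
view = A[:5]     # basic slicing returns a view that shares memory with A
view[0] = 100.0  # writing to the view ...
print(A[0])      # ... also changes A

A2 = np.linspace(0, 9, 10)
independent = A2[:5].copy()  # .copy() allocates a new array
independent[0] = 100.0
print(A2[0])                 # A2 is unchanged
```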
### Adding of array
print('Adding a slice of A shape (4,) to B shape (4,4) using broadcasting:\n',A[:4]+B)
print('Adding constant to A (10,) using broadcasting:\n',A+10)
print('Adding single element array (shape (1,)) to B (shape (4,4)) using broadcasting:\n',B + np.array([10]))
### Elementwise multiplication of arrays
print('Elementwise multiplication of a slice of A (shape (4,)) to B (shape (4,4)) using broadcasting:\n',A[:4]*B)
### Elementwise division of arrays
print('Elementwise division of a slice of B (shape (4,)) and A (shape (4,)):\n',B[0,:]/A[1:5])
Adding a slice of A shape (4,) to B shape (4,4) using broadcasting:
 [[-16. 16. -12. 16.]
 [-12. 12. -8. 12.]
 [ -8. 8. -4. 8.]
 [ -4. 4. 0. 4.]]
Adding constant to A (10,) using broadcasting:
 [10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]
Adding single element array (shape (1,)) to B (shape (4,4)) using broadcasting:
 [[-6 25 -4 23]
 [-2 21 0 19]
 [ 2 17 4 15]
 [ 6 13 8 11]]
Elementwise multiplication of a slice of A (shape (4,)) to B (shape (4,4)) using broadcasting:
 [[ -0. 15. -28. 39.]
 [ -0. 11. -20. 27.]
 [ -0. 7. -12. 15.]
 [ -0. 3. -4. 3.]]
Elementwise division of a slice of B (shape (4,)) and A (shape (4,)):
 [-16. 7.5 -4.66666667 3.25 ]
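The broadcasting examples above stretch a shape (4,) array across the rows of B. A minimal sketch of the general rule, using a hypothetical 4 by 4 array built with np.arange: a (4, 1) column is stretched across the columns instead, and shapes that cannot be aligned raise a ValueError.

```python
import numpy as np

B = np.arange(16).reshape(4, 4)
row = np.array([1, 2, 3, 4])              # shape (4,): added to every row
col = np.array([[10], [20], [30], [40]])  # shape (4, 1): added to every column

print((B + row)[0])     # first row of the result
print((B + col)[:, 0])  # first column of the result

# Shapes that cannot be aligned raise an error
try:
    B + np.array([1, 2, 3])  # shape (3,) does not match the last axis of size 4
except ValueError as err:
    print('ValueError:', err)
```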
Just like the elementwise arithmetic operators, Numpy implements elementwise comparison operators (see the official guide for additional detail). For example, to find all elements of vr larger than $98$, write:
vr = np.array([0, 99, 5, 70, 24, 1, 200])  # Create an array of example values
vr > 98
This boolean array can be used to select elements from a Numpy array:
comparison = vr > 98
vr[comparison]
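Boolean arrays can be combined using the elementwise logical operators & (and) and | (or). The parentheses around each comparison are required, because & and | bind more tightly than comparison operators: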
vr[(vr < 2) | (vr > 98)]
Boolean indexing can also be used for assignment:
vr[vr > 50] = 0
vr
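Boolean arrays also work with other Numpy functions. The sketch below counts matching elements with sum, finds their indices with np.nonzero, and uses np.where(condition, x, y) to build a new array instead of assigning in place; it recreates vr so that it runs on its own.

```python
import numpy as np

vr = np.array([0, 99, 5, 70, 24, 1, 200])

mask = vr > 50
print(mask.sum())           # number of True entries
print(np.nonzero(mask)[0])  # indices where the condition holds

# np.where picks 0 where the condition is True and the original value otherwise,
# returning a new array and leaving vr unchanged
clipped = np.where(vr > 50, 0, vr)
print(clipped)
```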
Matplotlib's pyplot module provides a function-based API for creating and manipulating plots. plot and scatter will be the most frequently used functions in this course:
plot is typically used for drawing connected line segments described by x and y data.
scatter is used for plotting individual points, e.g. from a dataset.
Take a look at the following sample plot code and output:
x_range = np.linspace(0, 5, 50) # Creates an array of linearly spaced elements
y_linear = x_range + 3  # adding a constant to the numpy array (broadcasting)
y_quadratic = x_range**2  # elementwise exponentiation
y_exp = np.exp(x_range)  # exponential function applied elementwise to x_range
plt.plot(x_range, y_linear)
plt.plot(x_range, y_quadratic)
plt.plot(x_range,y_exp);
Scatter plots are two-dimensional plots of individual points. The example below creates a quadratic function, adds normally distributed random noise to it, and plots both the original (with plt.plot) and the noisy points (with plt.scatter).
x_range = np.linspace(-10, 10, 50) # Create the x-values for the plot
y_values = x_range**2 # Calculate the y-values for the quadratic
noise = np.random.normal(scale=5, size=50) # Create random noise
y_noise = y_values + noise # Add the noise to the y-values
plt.plot(x_range, y_values) # Plot the quadratic function
plt.scatter(x_range, y_noise); # Plot the noisy points
Matplotlib allows customization of plots. Some useful functionality is described below:
plt.plot takes an optional third argument, a format string, which adapts the styling of lines. Generally, a letter designating a color (e.g. r, g, b) and a symbol designating the line or marker style (e.g. +, --) are combined to produce a format, e.g. r+ for red plus-shaped markers.
plt.scatter takes an argument c for the color (either a single letter or a full color name) and an argument marker for the marker style (e.g. +, o).
Here is a basic example:
plt.plot(x_range, y_values, 'r--')
plt.scatter(x_range, y_noise, c='green', marker='d');
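A few more format strings are shown in the sketch below. plt.plot returns a list of Line2D objects, so the resulting color, line style and marker can be inspected; the data is arbitrary, and matplotlib.colors.to_rgb is used only to print the colors as RGB tuples.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgb

x = np.linspace(0, 5, 20)

plt.figure()
# 'b-': blue solid line, 'go': green circles without a line, 'k:': black dotted line
solid, = plt.plot(x, x, 'b-')
dots, = plt.plot(x, x + 1, 'go')
dotted, = plt.plot(x, x + 2, 'k:')

# Inspect the styling that each format string produced
print(to_rgb(solid.get_color()), solid.get_linestyle())
print(dots.get_marker(), dots.get_linestyle())
print(to_rgb(dotted.get_color()), dotted.get_linestyle())
```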
Matplotlib automatically assigns colors to lines and point series using an internally defined style; however, you can also set colors manually. The current style can be changed permanently using plt.style.use(style) or temporarily inside a with block using plt.style.context(style). A reference of the built-in style sheets is available in the Matplotlib documentation. The cell below shows an example:
# Create some normally and uniformly distributed noise (random, unstructured data).
xs, ys = np.random.normal(size=(2, 100))
xu, yu = np.random.uniform(size=(2, 100))
with plt.style.context('dark_background'):
    plt.scatter(xs, ys, marker='+')
    plt.scatter(xu, yu, marker='x')
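Under the hood, a style sheet is a set of rcParams values. The sketch below lists the available style names and checks that plt.style.context only changes a setting (here figure.facecolor, chosen as an example) for the duration of the with block.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgb

# Names of all style sheets that can be passed to plt.style.use / plt.style.context
print(plt.style.available)

before = plt.rcParams['figure.facecolor']
with plt.style.context('dark_background'):
    inside = plt.rcParams['figure.facecolor']  # value set by the style sheet
after = plt.rcParams['figure.facecolor']       # restored when the block ends

print(to_rgb(inside))
print(before == after)
```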
Legend, title, and axis labels can be added to plots using the following functions:
plt.legend(labels): Creates a legend from a list of labels, assigned to the previously plotted elements in order. It can also be called without arguments if a label was passed to each plotting call (e.g. plt.scatter(x, y, label='normal')).
plt.title(title): Sets the plot title to the string title. Use plt.suptitle to add an overall title to a figure containing multiple plots.
plt.ylabel(name)/plt.xlabel(name): Set the axis labels.
with plt.style.context('dark_background'):
    plt.scatter(xs, ys, marker='+')
    plt.scatter(xu, yu, marker='x')
    plt.legend(['normal', 'uniform'])
    plt.title('Comparison of distributions')
    plt.ylabel('Y')
    plt.xlabel('X')
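As mentioned above, the legend can also be built from label= arguments given to each plotting call. The sketch below regenerates the two random point clouds so that it runs on its own and reads the legend entries back from the object returned by plt.legend().

```python
import numpy as np
import matplotlib.pyplot as plt

xs, ys = np.random.normal(size=(2, 100))
xu, yu = np.random.uniform(size=(2, 100))

plt.figure()
plt.scatter(xs, ys, marker='+', label='normal')
plt.scatter(xu, yu, marker='x', label='uniform')
leg = plt.legend()  # no arguments: uses the label of each scatter call
plt.title('Comparison of distributions')
plt.xlabel('X')
plt.ylabel('Y')

# Legend entries appear in plotting order
print([text.get_text() for text in leg.get_texts()])
```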
Matplotlib makes it possible to combine multiple plots into one figure. The function plt.subplots
creates a figure with multiple sub-plots. The function returns a figure object and an array of axes objects. The axes objects are used to make plots in each subplot, add titles, and so forth. An example is shown in the cell below:
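x_small = np.linspace(0, 5, 50)  # x-values of y_linear and y_quadratic (x_range was redefined for the scatter example)
figure, axes = plt.subplots(2, 2, figsize=(7, 5))
axes[0, 0].plot(x_small, y_linear)
axes[0, 1].plot(x_small, y_quadratic)
axes[1, 0].scatter(xs, ys)
axes[1, 1].plot(x_range, y_values)
axes[1, 1].scatter(x_range, y_noise);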
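Titles and labels for individual subplots are set through the axes objects, using methods such as set_title and set_xlabel rather than plt.title. The sketch below (with illustrative data) uses a single row of subplots, in which case axes is a 1D array indexed as axes[0], adds an overall title with fig.suptitle, and calls fig.tight_layout to reduce overlapping labels.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 50)
fig, axes = plt.subplots(1, 2, figsize=(7, 3))  # one row: axes is a 1D array

axes[0].plot(x, x + 3)
axes[0].set_title('Linear')
axes[0].set_xlabel('x')

axes[1].plot(x, x**2)
axes[1].set_title('Quadratic')
axes[1].set_xlabel('x')

fig.suptitle('Two subplots')  # one title for the whole figure
fig.tight_layout()            # adjust spacing so labels do not overlap
```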
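To save a plot, use plt.savefig(output_path), which saves the current figure. With the default inline backend in a notebook, the figure is closed once its cell has finished, so call plt.savefig in the same cell as the plotting commands, or call savefig on a figure object such as the one returned by plt.subplots. An example is provided below:
figure.savefig('./outputs.pdf');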
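The sketch below saves a figure through its figure object, which works regardless of which figure pyplot currently considers active. It assumes a temporary folder and the hypothetical file name exp_plot.png; the file format is inferred from the extension, and dpi and bbox_inches='tight' are optional savefig parameters.

```python
import os
import tempfile
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 50)
fig, ax = plt.subplots()
ax.plot(x, np.exp(x))

out_path = os.path.join(tempfile.mkdtemp(), 'exp_plot.png')
# Saving through the figure object does not depend on the "current" figure
fig.savefig(out_path, dpi=150, bbox_inches='tight')
print(os.path.getsize(out_path) > 0)
```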