Lab Objectives¶
This lab aims to build a neural network with one hidden layer for classification. You will learn how to:
- Implement a 2-class classification neural network with a single hidden layer.
- Compute the cross-entropy loss.
- Implement forward and backward propagation.
Please try out the following cells and run the Python code in your notebook.
This is not an assignment and you do not need to submit it.
1 - Packages¶
First let's import all the packages that you will need during this lab. You can also get help by clicking on the following hyperlinks to the packages:
- numpy is the fundamental package for scientific computing with Python.
- sklearn provides simple and efficient tools for data mining and data analysis.
- matplotlib is a library for plotting graphs in Python.
- planar_utils provides various useful functions used in this notebook.
# Package imports
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
import sklearn.linear_model
from planar_utils import plot_decision_boundary, sigmoid, load_planar_dataset
%matplotlib inline
np.random.seed(1) # set a seed so that the results are consistent
2 - Dataset¶
Next, let's get the dataset you will work on. The following code will load a "flower" 2-class dataset into variables X and Y.
X, Y = load_planar_dataset()
X.shape
(2, 400)
Visualise the dataset using matplotlib. The data looks like a "flower" with some red (label y=0) and some blue (y=1) points. Our goal is to build a model to fit this data. In other words, we want the classifier to define regions as either red or blue.
# Visualise the data:
plt.scatter(X[0, :], X[1, :], c=Y.ravel(), s=40, cmap=plt.cm.Spectral);  # ravel Y so matplotlib accepts it as a colour array
We have:
- a numpy-array (matrix) X that contains your features (x1, x2)
- a numpy-array (vector) Y that contains your labels (red:0, blue:1).
Let's first get a better sense of what our data looks like. Specifically, how many training examples do we have, and what are the shapes of the variables X and Y?
Here is how we can get the shape of a numpy array: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.shape.html
shape_X = X.shape
shape_Y = Y.shape
m = shape_X[1] # training set size (number of examples)
print ('The shape of X is: ' + str(shape_X))
print ('The shape of Y is: ' + str(shape_Y))
print ('There are m = %d training examples' % (m))
The shape of X is: (2, 400)
The shape of Y is: (1, 400)
There are m = 400 training examples
3 - Simple Logistic Regression¶
Before building a full neural network, let's first see how logistic regression performs on this problem. We can use sklearn's built-in functions to do that. Run the code below to train a logistic regression classifier on the dataset.
# Train the logistic regression classifier
clf = sklearn.linear_model.LogisticRegressionCV();
clf.fit(X.T, Y.T.ravel());
We can now plot the decision boundary of this model. Run the code below.
# Plot the decision boundary for logistic regression
plot_decision_boundary(lambda x: clf.predict(x), X, Y)
plt.title("Logistic Regression")
# Print accuracy
LR_predictions = clf.predict(X.T)
print (f'Accuracy of logistic regression: {np.mean(Y == LR_predictions)*100}%')
Accuracy of logistic regression: 47.0%
Interpretation: The dataset is not linearly separable, so logistic regression doesn't perform well. Hopefully a neural network will do better. Let's try this now!
4 - Neural Network model¶
Logistic regression did not work well on the "flower dataset". We are going to train a Neural Network with a single hidden layer.
Here is our model: a neural network with a single hidden layer, where the hidden units use the tanh activation and the output unit uses the sigmoid activation.
Mathematically:
For one example $x^{(i)}$: $$z^{[1] (i)} = W^{[1]} x^{(i)} + b^{[1]}\tag{1}$$ $$a^{[1] (i)} = \tanh(z^{[1] (i)})\tag{2}$$ $$z^{[2] (i)} = W^{[2]} a^{[1] (i)} + b^{[2]}\tag{3}$$ $$\hat{y}^{(i)} = a^{[2] (i)} = \sigma(z^{ [2] (i)})\tag{4}$$ $$y^{(i)}_{prediction} = \begin{cases} 1 & \mbox{if } a^{[2](i)} > 0.5 \\ 0 & \mbox{otherwise} \end{cases}\tag{5}$$
Given the predictions on all the examples, we can also compute the cost $J$ as follows: $$J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \large\left(\small y^{(i)}\log\left(a^{[2] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[2] (i)}\right) \large \right) \small \tag{6}$$
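For example, a single example with label $y^{(i)} = 1$ and prediction $a^{[2](i)} = 0.9$ contributes $-\log(0.9) \approx 0.105$ to the sum, while a confident wrong prediction such as $a^{[2](i)} = 0.1$ contributes $-\log(0.1) \approx 2.303$, so the cost penalises confident mistakes much more heavily.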
About activation functions:
In addition to the Sigmoid function, there are more activation functions you may want to consider for use in hidden layers, such as Hyperbolic Tangent (Tanh) and Rectified Linear Activation (ReLU). More detail can be found in this tutorial: https://machinelearningmastery.com/choose-an-activation-function-for-deep-learning
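As a quick illustration (not required for the lab; the function names below are just for this sketch, since the notebook itself uses sigmoid from planar_utils and np.tanh), the three activations mentioned above can be written in a few lines of NumPy:
# Illustrative activation functions (names are ad hoc for this sketch)
def sigmoid_activation(z):
    # logistic sigmoid: squashes inputs into (0, 1)
    return 1 / (1 + np.exp(-z))

def tanh_activation(z):
    # hyperbolic tangent: squashes inputs into (-1, 1), zero-centred
    return np.tanh(z)

def relu_activation(z):
    # rectified linear unit: keeps positive inputs, zeroes out negative ones
    return np.maximum(0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid_activation(z))  # approximately [0.119 0.5 0.881]
print(tanh_activation(z))     # approximately [-0.964 0. 0.964]
print(relu_activation(z))     # [0. 0. 2.]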
Reminder: The general methodology to build a Neural Network is:
1. Define the neural network structure (# of input units, # of hidden units, etc.).
2. Initialize the model's parameters
3. Loop:
- Implement forward propagation
- Compute loss
- Implement backward propagation to get the gradients
- Update parameters (gradient descent)
We often build helper functions to compute steps 1-3 and then merge them into one function we call nn_model(). Once we've built nn_model() and learnt the right parameters, we can make predictions on new data.
4.1 - Define the neural network structure¶
def layer_sizes(X, Y):
"""
Arguments:
X -- input dataset of shape (input size, number of examples)
Y -- labels of shape (output size, number of examples)
Returns:
n_x -- the size of the input layer
n_h -- the size of the hidden layer
n_y -- the size of the output layer
"""
n_x = X.shape[0] # size of input layer
n_h = 4
n_y = Y.shape[0] # size of output layer
return (n_x, n_h, n_y)
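As a quick sanity check (an illustrative cell, assuming the X and Y loaded above with shapes (2, 400) and (1, 400)):
(n_x, n_h, n_y) = layer_sizes(X, Y)
print("The size of the input layer is: n_x = " + str(n_x))   # expect 2
print("The size of the hidden layer is: n_h = " + str(n_h))  # expect 4 (hard-coded above)
print("The size of the output layer is: n_y = " + str(n_y))  # expect 1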
4.2 - Initialize the model's parameters¶
- Initialize the weights matrices with random values.
  - Use np.random.randn(a,b) * 0.01 to randomly initialize a matrix of shape (a,b).
- Initialize the bias vectors as zeros.
  - Use np.zeros((a,b)) to initialize a matrix of shape (a,b) with zeros.
def initialize_parameters(n_x, n_h, n_y):
"""
Argument:
n_x -- size of the input layer
n_h -- size of the hidden layer
n_y -- size of the output layer
Returns:
params -- python dictionary containing parameters:
W1 -- weight matrix of shape (n_h, n_x)
b1 -- bias vector of shape (n_h, 1)
W2 -- weight matrix of shape (n_y, n_h)
b2 -- bias vector of shape (n_y, 1)
"""
np.random.seed(2)
W1 = np.random.randn(n_h,n_x) * 0.01
b1 = np.zeros((n_h,1))
W2 = np.random.randn(n_y,n_h) * 0.01
b2 = np.zeros((n_y,1))
assert (W1.shape == (n_h, n_x))
assert (b1.shape == (n_h, 1))
assert (W2.shape == (n_y, n_h))
assert (b2.shape == (n_y, 1))
parameters = {"W1": W1,
"b1": b1,
"W2": W2,
"b2": b2}
return parameters
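To check the initialisation (an illustrative cell, using the layer sizes from this dataset), confirm that each parameter has the expected shape:
parameters = initialize_parameters(n_x=2, n_h=4, n_y=1)
for name, value in parameters.items():
    print(name, value.shape)  # expect W1 (4, 2), b1 (4, 1), W2 (1, 4), b2 (1, 1)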
4.3 - The Loop¶
Implement forward_propagation().
- Look above at the mathematical representation (Equations (1)-(5)) of the classifier.
- We can use the function sigmoid(). It is built in (imported) in the notebook.
- We can also use the function np.tanh(). It is part of the numpy library.
- Steps:
  - Retrieve each parameter from the dictionary "parameters" (which is the output of initialize_parameters()) by using parameters[".."].
  - Implement forward propagation. Compute $Z^{[1]}, A^{[1]}, Z^{[2]}$ and $A^{[2]}$ (the vector of all your predictions on all the examples in the training set).
- Values needed in the backpropagation are stored in "cache". The cache will be given as an input to the backpropagation function.
def forward_propagation(X, parameters):
"""
Argument:
X -- input data of size (n_x, m)
parameters -- python dictionary containing your parameters (output of initialization function)
Returns:
A2 -- The sigmoid output of the second activation
cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
"""
# Retrieve each parameter from the dictionary "parameters"
W1 = parameters["W1"]
b1 = parameters["b1"]
W2 = parameters["W2"]
b2 = parameters["b2"]
# Implement Forward Propagation to calculate A2 (probabilities)
Z1 = np.dot(W1,X) + b1
A1 = np.tanh(Z1)
Z2 = np.dot(W2,A1) + b2
A2 = sigmoid(Z2)
assert(A2.shape == (1, X.shape[1]))
# Values needed in the backpropagation are stored in "cache". This will be given as an input to the backpropagation
cache = {"Z1": Z1,
"A1": A1,
"Z2": Z2,
"A2": A2}
return A2, cache
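As an illustrative check (assuming the X and parameters defined in the cells above), A2 should contain one probability per training example, and with the small random initialisation it should sit close to 0.5 before any training:
A2, cache = forward_propagation(X, parameters)
print(A2.shape)            # expect (1, 400): one probability per example
print(float(np.mean(A2)))  # close to 0.5 with the untrained parameters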
Compute cost
Now that we have computed $A^{[2]}$ (in the Python variable "A2"), which contains $a^{[2](i)}$ for every example, we can compute the cost function as follows:
$$J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \large{(} \small y^{(i)}\log\left(a^{[2] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[2] (i)}\right) \large{)} \small\tag{13}$$
- There are many ways to implement the cross-entropy loss. Here is how we implement $- \sum\limits_{i=1}^{m} y^{(i)}\log(a^{[2](i)})$:
logprobs = np.multiply(np.log(A2),Y)
cost = - np.sum(logprobs) # no need to use a for loop!
def compute_cost(A2, Y, parameters):
"""
Computes the cross-entropy cost given in equation (13)
Arguments:
A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
Y -- "true" labels vector of shape (1, number of examples)
parameters -- python dictionary containing your parameters W1, b1, W2 and b2
Returns:
cost -- cross-entropy cost given equation (13)
"""
m = Y.shape[1] # number of examples
# Compute the cross-entropy cost
logprobs = np.multiply(Y, np.log(A2)) + np.multiply((1 - Y), np.log(1 - A2))
cost = - np.sum(logprobs)/m
cost = float(np.squeeze(cost)) # makes sure cost is the dimension we expect.
# E.g., turns [[17]] into 17
assert(isinstance(cost, float))
return cost
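A tiny hypothetical example (made-up probabilities and labels, just to illustrate the call; the parameters argument is only kept to match the function signature):
A2_example = np.array([[0.9, 0.2, 0.7]])  # hypothetical predicted probabilities
Y_example = np.array([[1, 0, 1]])         # hypothetical labels
print(compute_cost(A2_example, Y_example, parameters))  # about 0.23: predictions mostly agree with the labels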
Implement the function backward_propagation().
Using the cache computed during forward propagation, we can now implement backward propagation.
Backpropagation is usually the hardest (most mathematical) part of deep learning. To help you review what you learned in the last lecture, here is a summary of gradient descent for this network: we will be using the six vectorised equations written out below.
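The six vectorised equations (matching the implementation in backward_propagation() below) are:
$$dZ^{[2]} = A^{[2]} - Y$$ $$dW^{[2]} = \frac{1}{m}\, dZ^{[2]} A^{[1]T}$$ $$db^{[2]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[2](i)}$$ $$dZ^{[1]} = W^{[2]T} dZ^{[2]} \ast \left(1 - (A^{[1]})^{2}\right)$$ $$dW^{[1]} = \frac{1}{m}\, dZ^{[1]} X^{T}$$ $$db^{[1]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[1](i)}$$
where $\ast$ denotes element-wise multiplication and $dZ^{[l](i)}$ is the $i$-th column of $dZ^{[l]}$.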
- Tip: To compute dZ1 you'll need to compute $g^{[1]'}(Z^{[1]})$. Since $g^{[1]}(\cdot)$ is the tanh activation function, if $a = g^{[1]}(z)$ then $g^{[1]'}(z) = 1 - a^2$. So you can compute $g^{[1]'}(Z^{[1]})$ using (1 - np.power(A1, 2)).
def backward_propagation(parameters, cache, X, Y):
"""
Implement the backward propagation using the instructions above.
Arguments:
parameters -- python dictionary containing our parameters
cache -- a dictionary containing "Z1", "A1", "Z2" and "A2".
X -- input data of shape (2, number of examples)
Y -- "true" labels vector of shape (1, number of examples)
Returns:
grads -- python dictionary containing your gradients with respect to different parameters
"""
m = X.shape[1]
# First, retrieve W1 and W2 from the dictionary "parameters".
W1 = parameters["W1"]
b1 = parameters["b1"]
W2 = parameters["W2"]
b2 = parameters["b2"]
# Retrieve also A1 and A2 from dictionary "cache".
A1 = cache["A1"]
A2 = cache["A2"]
Z1 = cache["Z1"]
Z2 = cache["Z2"]
# Backward propagation: calculate dW1, db1, dW2, db2.
# corresponding to 6 equations shown above
dZ2 = A2 - Y
dW2 = (1 / m) * np.dot(dZ2, A1.T)
db2 = (1 / m) * (np.sum(dZ2, axis=1, keepdims=True))
dZ1 = np.dot(W2.T,dZ2) * (1 - np.power(A1, 2))
dW1 = (1 / m) * (np.dot(dZ1, X.T))
db1 = (1 / m) * (np.sum(dZ1, axis=1, keepdims=True))
grads = {"dW1": dW1,
"db1": db1,
"dW2": dW2,
"db2": db2}
return grads
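An illustrative sanity check (assuming the parameters and cache from the cells above): each gradient should have exactly the same shape as the parameter it updates.
grads = backward_propagation(parameters, cache, X, Y)
for name in ["W1", "b1", "W2", "b2"]:
    assert grads["d" + name].shape == parameters[name].shape  # dW1 matches W1, db1 matches b1, etc.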
Implement the update rule.
We will use (dW1, db1, dW2, db2) in order to update parameters (W1, b1, W2, b2).
General gradient descent rule: $ \theta = \theta - \alpha \frac{\partial J }{ \partial \theta }$ where $\alpha$ is the learning rate and $\theta$ represents a parameter. The learning rate is a hyperparameter of this model.
Illustration: The gradient descent algorithm with a good learning rate (converging) and a bad learning rate (diverging).

def update_parameters(parameters, grads, learning_rate):
"""
Updates parameters using the gradient descent update rule given above
Arguments:
parameters -- python dictionary containing your parameters
grads -- python dictionary containing your gradients
learning_rate -- the learning rate used in the gradient descent update
Returns:
parameters -- python dictionary containing your updated parameters
"""
# Retrieve each parameter from the dictionary "parameters"
W1 = parameters["W1"]
b1 = parameters["b1"]
W2 = parameters["W2"]
b2 = parameters["b2"]
# Retrieve each gradient from the dictionary "grads"
dW1 = grads["dW1"]
db1 = grads["db1"]
dW2 = grads["dW2"]
db2 = grads["db2"]
# Update rule for each parameter
W1 = W1 - learning_rate * dW1
b1 = b1 - learning_rate * db1
W2 = W2 - learning_rate * dW2
b2 = b2 - learning_rate * db2
parameters = {"W1": W1,
"b1": b1,
"W2": W2,
"b2": b2}
return parameters
4.4 - Integrate parts 4.1, 4.2 and 4.3 in nn_model()¶
Goal: Build our neural network model in nn_model().
The neural network model needs to use the previous functions in the right order.
def nn_model(X, Y, n_h, learning_rate, num_iterations = 10000, print_cost=False):
n_x = layer_sizes(X, Y)[0]
n_y = layer_sizes(X, Y)[2]
# Initialize parameters
parameters = initialize_parameters(n_x, n_h, n_y)
W1 = parameters["W1"]
b1 = parameters["b1"]
W2 = parameters["W2"]
b2 = parameters["b2"]
# Loop (gradient descent)
for i in range(0, num_iterations):
# Forward propagation
A2, cache = forward_propagation(X, parameters)
# Cost function
cost = compute_cost(A2, Y, parameters)
# Backpropagation
grads = backward_propagation(parameters, cache, X, Y)
# Update rule for each parameter
parameters = update_parameters(parameters, grads, learning_rate)
# If print_cost=True, print the cost every 1000 iterations
if print_cost and i % 1000 == 0:
print ("Cost after iteration %i: %f" %(i, cost))
# Returns parameters learnt by the model. They can then be used to predict output
return parameters
4.5 - Predictions¶
Goal: Use the model to make predictions by building predict(), which uses forward propagation to compute the outputs and then thresholds them.
Reminder: predictions = $y_{prediction} = \mathbb{1}\{activation > 0.5\} = \begin{cases} 1 & \text{if}\ activation > 0.5 \\ 0 & \text{otherwise} \end{cases}$
If we would like to set the entries of a matrix X to 0 or 1 based on a threshold, we can write: X_new = (X > threshold)
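For example (a tiny made-up array, just to show the thresholding):
probs = np.array([[0.1, 0.6, 0.4, 0.9]])  # hypothetical activations
print(probs > 0.5)                        # [[False  True False  True]] -- behaves as 0/1 in arithmetic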
def predict(parameters, X):
"""
Using the learned parameters, predicts a class for each example in X
Arguments:
parameters -- python dictionary containing your parameters
X -- input data of size (n_x, m)
Returns
predictions -- vector of predictions of our model (red: 0 / blue: 1)
"""
A2, cache = forward_propagation(X, parameters)
predictions = (A2 > 0.5)
return predictions
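A quick check of the output (using whatever parameters are currently in memory; the trained model is run in the next cell) is to look at the mean prediction, i.e. the fraction of points classified as blue:
predictions = predict(parameters, X)
print("mean of predictions = " + str(np.mean(predictions)))  # fraction of examples predicted as class 1 (blue)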
It is time to run the model and see how it performs on a planar dataset. Run the following code to test your model with a single hidden layer of $n_h$ hidden units.
parameters = nn_model(X, Y, n_h = 4, learning_rate = 1.2, num_iterations = 10000, print_cost = True)
# Plot the decision boundary
plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
plt.title("Decision Boundary for hidden layer size " + str(4))
# Print accuracy
predictions = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)).item() / Y.size * 100) + '%')
Cost after iteration 0: 0.693048
Cost after iteration 1000: 0.288083
Cost after iteration 2000: 0.254385
Cost after iteration 3000: 0.233864
Cost after iteration 4000: 0.226792
Cost after iteration 5000: 0.222644
Cost after iteration 6000: 0.219731
Cost after iteration 7000: 0.217504
Cost after iteration 8000: 0.219554
Cost after iteration 9000: 0.218585
Accuracy: 90%
Accuracy is really high compared to what we got with logistic regression (47%). The model has learnt the leaf patterns of the flower! Neural networks are able to learn even highly non-linear decision boundaries.
To summarise, you've learnt to:
- Build a complete neural network with a hidden layer
- Make good use of a non-linear hidden unit
- Implement forward propagation and backpropagation, and train a neural network