Logistic Regression with a Neural Network mindset

General Architecture of the learning algorithm

It’s time to design a simple algorithm to distinguish cat images from non-cat images.

I will build a Logistic Regression model, using a Neural Network mindset. The following figure explains why Logistic Regression is actually a very simple Neural Network!

Mathematical expression of the algorithm:

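For one example $x^{(i)}$:

$z^{(i)} = w^T x^{(i)} + b$

$\hat{y}^{(i)} = a^{(i)} = sigmoid(z^{(i)})$

$\mathcal{L}(a^{(i)}, y^{(i)}) = -y^{(i)}\log(a^{(i)}) - (1-y^{(i)})\log(1-a^{(i)})$

The cost is then computed by summing over all $m$ training examples and dividing by $m$:

$J = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(a^{(i)}, y^{(i)})$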
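To make the loss concrete, here is a small sketch that evaluates the cross-entropy loss $-(y\log(a) + (1-y)\log(1-a))$ for three made-up examples and averages it into a cost. The function name cross_entropy_loss and the numbers are my own assumptions for illustration, not part of the original exercise.

```python
import numpy as np

def cross_entropy_loss(a, y):
    """Per-example loss: -(y*log(a) + (1-y)*log(1-a))."""
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Hypothetical activations and true labels for m = 3 examples.
a = np.array([0.9, 0.2, 0.6])
y = np.array([1, 0, 0])

losses = cross_entropy_loss(a, y)  # one loss value per example
cost = losses.mean()               # average over the m examples
print(losses, cost)
```

The third example (activation 0.6 for a label of 0) contributes the largest loss, which is the behaviour we want: the further the activation is from the true label, the higher the penalty.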

Key steps:
In this exercise, I will carry out the following steps:

• Initialize the parameters of the model
• Learn the parameters for the model by minimizing the cost
• Use the learned parameters to make predictions (on the test set)
• Analyse the results and conclude

Building the parts of our algorithm

The main steps for building a Neural Network are:

1. Define the model structure (such as number of input features)

2. Initialize the model’s parameters

3. Loop:

• Calculate current loss (forward propagation)

• Calculate current gradient (backward propagation)

• Update parameters (gradient descent)

You often build steps 1-3 separately and then integrate them into one function called model().
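Step 2 can be sketched as follows. I am assuming here that the parameters start at zero and that w is stored as a column vector with one entry per input feature; the name initialize_with_zeros is a hypothetical helper, not something defined in the text above.

```python
import numpy as np

def initialize_with_zeros(dim):
    """Return a (dim, 1) weight vector of zeros and a scalar bias of 0.

    Assumption: zero initialization. For logistic regression the cost is
    convex, so starting from zeros does not prevent learning.
    """
    w = np.zeros((dim, 1))
    b = 0.0
    return w, b

w, b = initialize_with_zeros(3)
print(w.shape, b)
```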

Helper functions

sigmoid

Using code from “Python Basics”, implement sigmoid(). As seen in the figure above, I need to compute $sigmoid( w^T x + b) = \frac{1}{1 + e^{-(w^T x + b)}}$ to make predictions. I will use np.exp().
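A minimal sketch of sigmoid() along these lines, assuming the input is either a scalar or a NumPy array (np.exp() works element-wise on both):

```python
import numpy as np

def sigmoid(z):
    """Compute 1 / (1 + e^(-z)) element-wise for a scalar or array z."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([0.0, 2.0])))
```

Because the function is vectorized, the same call handles one example or a whole row of $w^T X + b$ values at once.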

Forward and Backward propagation

Now that our parameters are initialized, we can do the “forward” and “backward” propagation steps for learning the parameters.

Exercise: Implement a function propagate() that computes the cost function and its gradient.

Hints:

Forward Propagation:

• I get X
• I compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, …, a^{(m-1)}, a^{(m)})$
• I calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)})\right]$

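Here are the two formulas I will be using for the gradients:

$\frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T$

$\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}(a^{(i)}-y^{(i)})$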
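One possible sketch of propagate() is shown below. It assumes X has shape (number of features, m), Y has shape (1, m), and w is a column vector; the gradients are the standard derivatives of the cross-entropy cost with respect to w and b. The return format (a dictionary of gradients plus the cost) is my own choice for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    """Forward and backward pass for logistic regression.

    Assumed shapes: w (n, 1), X (n, m), Y (1, m); b is a scalar.
    """
    m = X.shape[1]

    # Forward propagation: activations and cost.
    A = sigmoid(np.dot(w.T, X) + b)  # shape (1, m)
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

    # Backward propagation: dJ/dw = X (A - Y)^T / m, dJ/db = mean(A - Y).
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m

    grads = {"dw": dw, "db": db}
    return grads, float(cost)

# Tiny made-up dataset: 2 features, 2 examples.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
Y = np.array([[1.0, 0.0]])
grads, cost = propagate(np.zeros((2, 1)), 0.0, X, Y)
print(grads, cost)
```

With all parameters at zero every activation is 0.5, so the cost equals $\log 2$ regardless of the data, which is a handy sanity check.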

Optimization

• I have initialized the parameters.
• I am also able to compute the cost function and its gradient.
• Now, I want to update the parameters using gradient descent.

Exercise: Write down the optimization function. The goal is to learn $w$ and $b$ by minimizing the cost function $J$. For a parameter $\theta$, the update rule is $\theta = \theta - \alpha \text{ } d\theta$, where $\alpha$ is the learning rate.
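The update rule can be sketched as a plain gradient descent loop. This is only one way to write it: the signature, the default values, and the record_every parameter are assumptions of mine, and sigmoid() and propagate() are repeated so the snippet runs on its own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))
    grads = {"dw": np.dot(X, (A - Y).T) / m, "db": np.sum(A - Y) / m}
    return grads, cost

def optimize(w, b, X, Y, num_iterations=100, learning_rate=0.1, record_every=10):
    """Run gradient descent and return the learned w, b, last grads, and costs."""
    costs = []
    grads = {}
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)

        # Update rule: theta = theta - alpha * d(theta)
        w = w - learning_rate * grads["dw"]
        b = b - learning_rate * grads["db"]

        # Keep a sparse history of the cost (hypothetical convenience).
        if i % record_every == 0:
            costs.append(cost)
    return w, b, grads, costs

# Made-up 1-feature dataset: negative inputs are class 0, positive are class 1.
X = np.array([[-2.0, -1.0, 1.0, 2.0]])
Y = np.array([[0.0, 0.0, 1.0, 1.0]])
w, b, grads, costs = optimize(np.zeros((1, 1)), 0.0, X, Y, num_iterations=200)
print(w, b, costs[0], costs[-1])
```

On this toy dataset the recorded cost should fall from its starting value as the weight grows positive, which is what we expect when gradient descent is working.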

predict

Exercise: The previous function will output the learned w and b. We are able to use w and b to predict the labels for a dataset X. Implement the predict() function. There are two steps to computing predictions:

1. Calculate $\hat{Y} = A = \sigma(w^T X + b)$

2. Convert the entries of A into 0 (if activation <= 0.5) or 1 (if activation > 0.5), and store the predictions in a vector Y_prediction. If you wish, you can use an if/else statement in a for loop (though there is also a way to vectorize this).
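Here is a sketch of the vectorized variant of predict(), assuming the same shapes as before (w is (n, 1), X is (n, m)). The comparison A > 0.5 replaces the if/else loop; the example values are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, X):
    """Return a (1, m) array of 0/1 predictions for the columns of X."""
    # Step 1: activations for every example.
    A = sigmoid(np.dot(w.T, X) + b)

    # Step 2: threshold at 0.5 (an activation of exactly 0.5 maps to 0).
    Y_prediction = (A > 0.5).astype(float)
    return Y_prediction

# Hypothetical parameters and three examples with two features each.
w = np.array([[1.0], [-1.0]])
b = 0.0
X = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0]])
print(predict(w, b, X))
```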

What to remember

We’ve implemented several functions that:

• Initialize (w,b)
• Optimize the loss iteratively to learn parameters (w,b):
    • computing the cost and its gradient
    • updating the parameters using gradient descent
• Use the learned (w,b) to predict the labels for a given set of examples

Merge all functions into a model

You will now see how the overall model is structured by putting all the building blocks (the functions implemented in the previous parts) together, in the right order.

Exercise: Implement the model function. Use the following notation:

• Y_prediction_test for your predictions on the test set
• Y_prediction_train for your predictions on the train set
• w, costs, grads for the outputs of optimize()
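Putting the pieces together, model() could look roughly like the sketch below. The helper functions are compact restatements so the snippet is self-contained, and the signature, default hyperparameters, accuracy calculation, and dictionary keys are assumptions for illustration rather than the exact solution to the exercise.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))
    return {"dw": np.dot(X, (A - Y).T) / m, "db": np.sum(A - Y) / m}, cost

def optimize(w, b, X, Y, num_iterations, learning_rate):
    costs, grads = [], {}
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        w = w - learning_rate * grads["dw"]
        b = b - learning_rate * grads["db"]
        if i % 100 == 0:
            costs.append(cost)
    return w, b, grads, costs

def predict(w, b, X):
    return (sigmoid(np.dot(w.T, X) + b) > 0.5).astype(float)

def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.1):
    """Initialize, optimize, predict, and report accuracy (in percent)."""
    # Assumption: zero initialization, one weight per input feature.
    w, b = np.zeros((X_train.shape[0], 1)), 0.0

    w, b, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate)

    Y_prediction_train = predict(w, b, X_train)
    Y_prediction_test = predict(w, b, X_test)

    # Accuracy = 100% minus the percentage of mismatched labels.
    train_accuracy = 100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100
    test_accuracy = 100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100

    return {"costs": costs, "w": w, "b": b,
            "Y_prediction_train": Y_prediction_train,
            "Y_prediction_test": Y_prediction_test,
            "train_accuracy": train_accuracy,
            "test_accuracy": test_accuracy}

# Toy data (made up): one feature, label 1 when the feature is positive.
X_train = np.array([[-2.0, -1.0, 1.0, 2.0]])
Y_train = np.array([[0.0, 0.0, 1.0, 1.0]])
X_test = np.array([[-3.0, 3.0]])
Y_test = np.array([[0.0, 1.0]])
result = model(X_train, Y_train, X_test, Y_test, num_iterations=500)
print(result["train_accuracy"], result["test_accuracy"])
```

For the cat classifier, X_train and X_test would be the flattened image arrays instead of this toy data; the structure of the function stays the same.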
