Abstract

Gradient checking is an important method to verify whether our backpropagation code is correct. In this article, I will first introduce the theory of n-layer gradient checking, and then implement gradient checking for a 3-layer network step by step. I hope that after reading this article, you will be able to use gradient checking to find and fix bugs in your own neural network's backpropagation.

Theory

We know that we can use $\frac{J(\theta+\epsilon)-J(\theta-\epsilon)}{2\epsilon}$ to estimate the gradient of $J$ at $\theta$.
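As a quick illustration (my own toy example, not from the article), the two-sided estimate recovers the derivative of $J(\theta)=\theta^2$ at $\theta=3$, whose true value is $6$:

```python
# Toy demonstration of the centered-difference gradient estimate.
def J(theta):
    return theta ** 2

epsilon = 1e-7
theta = 3.0

# Two-sided (centered) difference: much more accurate than the
# one-sided (J(theta + eps) - J(theta)) / eps estimate.
grad_approx = (J(theta + epsilon) - J(theta - epsilon)) / (2 * epsilon)
print(grad_approx)  # very close to the true derivative 6.0
```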

But an n-layer neural network has many parameters $W^{[1]},b^{[1]},W^{[2]},b^{[2]},\dots,W^{[l]},b^{[l]}$, and backpropagation produces the corresponding gradients $dW^{[1]},db^{[1]},dW^{[2]},db^{[2]},\dots,dW^{[l]},db^{[l]}$. We can merge these parameters into one big parameter vector $\theta$, which contains $\theta^{[1]},\theta^{[2]},\dots,\theta^{[l]}$, and likewise merge the gradients into $d\theta$, which contains $d\theta^{[1]},d\theta^{[2]},\dots,d\theta^{[l]}$. For gradient checking, we can fix the other $l-1$ parameters and treat $J$ as a function of the single parameter $\theta^{[i]}$, so we can estimate the gradient of $J$ with respect to $\theta^{[i]}$:

$$d\theta_{approx}^{[i]} = \frac{J(\theta^{[1]},\dots,\theta^{[i]}+\epsilon,\dots,\theta^{[l]}) - J(\theta^{[1]},\dots,\theta^{[i]}-\epsilon,\dots,\theta^{[l]})}{2\epsilon}$$

Thus we can calculate $d\theta_{approx}^{[i]}$ for every $\theta^{[i]}$, and then measure whether $d\theta_{approx}$ is close to $d\theta$ within our acceptable range using this formula:

$$difference = \frac{\left\lVert d\theta_{approx} - d\theta \right\rVert_2}{\left\lVert d\theta_{approx} \right\rVert_2 + \left\lVert d\theta \right\rVert_2}$$

I will set $\epsilon=10^{-7}$. If $difference \le 10^{-7}$, we can say our backpropagation is correct; if $10^{-7} < difference \le 10^{-5}$, we should carefully check whether our backpropagation is correct; and if $difference > 10^{-5}$, there is a great possibility that our backpropagation is incorrect.
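In NumPy, the difference formula can be evaluated directly with `np.linalg.norm`; the two vectors below are made-up stand-ins for $d\theta$ and $d\theta_{approx}$, chosen so the check passes:

```python
import numpy as np

# Hypothetical gradient from backprop and its numerical approximation.
dtheta = np.array([1.0, 2.0, 3.0])
dtheta_approx = np.array([1.0 + 1e-8, 2.0 - 1e-8, 3.0])

# Relative difference: L2 distance normalized by the sum of the norms,
# so the result is scale-independent.
numerator = np.linalg.norm(dtheta_approx - dtheta)
denominator = np.linalg.norm(dtheta_approx) + np.linalg.norm(dtheta)
difference = numerator / denominator
print(difference)  # below the 1e-7 threshold, so backprop looks correct
```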

Implementation

First of all, we need to implement the dictionary_to_vector and vector_to_dictionary functions to convert the parameters dictionary into a single vector and back:

vector_to_dictionary
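The article's original helper code is not shown here, so the following is only a minimal sketch of how dictionary_to_vector and vector_to_dictionary could work; the `keys` and `shapes` arguments are my own additions to keep the example self-contained, and the key order must be the same in both directions:

```python
import numpy as np

def dictionary_to_vector(parameters, keys):
    # Flatten each parameter array into a column and stack them
    # into one big column vector theta, in the given key order.
    return np.concatenate(
        [parameters[k].reshape(-1, 1) for k in keys], axis=0
    )

def vector_to_dictionary(theta, shapes):
    # Rebuild the parameter dictionary from the flat vector.
    # `shapes` maps key -> original array shape, in the same order
    # that dictionary_to_vector used for flattening.
    parameters = {}
    start = 0
    for key, shape in shapes.items():
        size = int(np.prod(shape))
        parameters[key] = theta[start:start + size].reshape(shape)
        start += size
    return parameters
```

A round trip through both functions should return arrays identical to the originals, which is an easy sanity check before using them in the gradient check.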

Then we can implement the forward and backward propagation of the neural network:

backward_propagation_n
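Since the article's original code is not reproduced here, the following is a hedged sketch of a 3-layer network with ReLU hidden layers, a sigmoid output, and the cross-entropy cost; the layer structure and function signatures are my assumptions:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation_n(X, Y, parameters):
    # Forward pass: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID.
    m = X.shape[1]
    W1, b1 = parameters["W1"], parameters["b1"]
    W2, b2 = parameters["W2"], parameters["b2"]
    W3, b3 = parameters["W3"], parameters["b3"]

    Z1 = W1 @ X + b1
    A1 = relu(Z1)
    Z2 = W2 @ A1 + b2
    A2 = relu(Z2)
    Z3 = W3 @ A2 + b3
    A3 = sigmoid(Z3)

    # Cross-entropy cost, averaged over the m examples.
    cost = -np.sum(Y * np.log(A3) + (1 - Y) * np.log(1 - A3)) / m
    cache = (X, Z1, A1, Z2, A2, Z3, A3, W2, W3)
    return cost, cache

def backward_propagation_n(X, Y, cache):
    # Backward pass: derivatives of the cost w.r.t. every parameter.
    m = X.shape[1]
    (X, Z1, A1, Z2, A2, Z3, A3, W2, W3) = cache

    dZ3 = A3 - Y                       # sigmoid + cross-entropy simplification
    dW3 = dZ3 @ A2.T / m
    db3 = np.sum(dZ3, axis=1, keepdims=True) / m
    dA2 = W3.T @ dZ3
    dZ2 = dA2 * (Z2 > 0)               # ReLU derivative
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dA1 = W2.T @ dZ2
    dZ1 = dA1 * (Z1 > 0)
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2,
            "dW3": dW3, "db3": db3}
```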

Then we can implement the gradient check:

Instructions

For each i in num_parameters:

• To compute J_plus[i]:
1. Set $\theta^{+}$ to np.copy(parameters_values)
2. Set $\theta^{+}_i$ to $\theta^{+}_i + \varepsilon$
3. Calculate $J^{+}_i$ using forward_propagation_n(x, y, vector_to_dictionary($\theta^{+}$)).
• To compute J_minus[i]: do the same thing with $\theta^{-}$
• Compute $gradapprox[i] = \frac{J^{+}_i - J^{-}_i}{2 \varepsilon}$

Thus, we get a vector gradapprox, where gradapprox[i] is an approximation of the gradient with respect to parameter_values[i]. You can now compare this gradapprox vector to the gradients vector from backpropagation. Just like for the 1D case (Steps 1’, 2’, 3’), compute:

$$difference = \frac{\left\lVert grad - gradapprox \right\rVert_2}{\left\lVert grad \right\rVert_2 + \left\lVert gradapprox \right\rVert_2}$$
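The loop described above can be sketched as follows. Since the network code is not reproduced here, I substitute a toy cost $J(\theta)=\frac{1}{2}\lVert\theta\rVert^2$ (my stand-in for forward_propagation_n), whose exact gradient is $\theta$ itself, so the check should report a tiny difference:

```python
import numpy as np

def toy_cost(theta):
    # Stand-in for the real cost function: J(theta) = 0.5 * ||theta||^2,
    # whose analytic gradient is simply theta.
    return 0.5 * np.sum(theta ** 2)

def gradient_check(theta, grad, epsilon=1e-7):
    num_parameters = theta.shape[0]
    J_plus = np.zeros((num_parameters, 1))
    J_minus = np.zeros((num_parameters, 1))
    gradapprox = np.zeros((num_parameters, 1))

    for i in range(num_parameters):
        # Perturb only the i-th component, keeping the rest fixed.
        theta_plus = np.copy(theta)
        theta_plus[i] += epsilon
        J_plus[i] = toy_cost(theta_plus)

        theta_minus = np.copy(theta)
        theta_minus[i] -= epsilon
        J_minus[i] = toy_cost(theta_minus)

        gradapprox[i] = (J_plus[i] - J_minus[i]) / (2 * epsilon)

    # Relative difference between the analytic and numerical gradients.
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    return numerator / denominator

theta = np.array([[1.0], [-2.0], [0.5]])
difference = gradient_check(theta, theta)  # exact gradient of toy_cost is theta
print(difference)  # well below 1e-7: the "backprop" gradient checks out
```

For a real network, `toy_cost(theta_plus)` would be replaced by forward_propagation_n(x, y, vector_to_dictionary($\theta^{+}$)) as in the instructions above.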

Output: It seems that there were errors in the backward_propagation_n code we implemented! It is a good thing that we implemented the gradient check. Go back to backward_propagation and try to find and correct the errors:

It should be:

Then we run the gradient checking test code again: the backpropagation is correct now!