## Abstract

Gradient checking is an important method for verifying whether our backpropagation code is correct. In this article, I will first introduce the theory of n-layer gradient checking, and then implement gradient checking for a 3-layer network step by step. I hope that after reading this article, you will be able to use gradient checking to find the bugs in your own neural network's backpropagation.

## Theory

We know that we can use the centered difference

$$d\theta_{approx} = \frac{J(\theta+\epsilon)-J(\theta-\epsilon)}{2\epsilon}$$

to estimate the gradient of $J$ at $\theta$.
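For example, here is a minimal sketch of this estimate on a toy one-parameter cost $J(\theta)=\theta^2$ (this cost function is an assumption for illustration, not the article's network):

```python
# Centered-difference estimate of dJ/dtheta for a toy cost J(theta) = theta**2.
def J(theta):
    return theta ** 2

theta = 3.0
eps = 1e-7
grad_approx = (J(theta + eps) - J(theta - eps)) / (2 * eps)
analytic_grad = 2 * theta  # dJ/dtheta = 2*theta

print(grad_approx)    # close to 6.0
print(analytic_grad)  # 6.0
```

The centered difference agrees with the analytic derivative to within roughly $\epsilon^2$ for smooth costs, which is why it makes a good reference value.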

But an $l$-layer neural network has many parameters, $W^{[1]},b^{[1]},W^{[2]},b^{[2]},\dots,W^{[l]},b^{[l]}$, and backpropagation produces the corresponding gradients $dW^{[1]},db^{[1]},dW^{[2]},db^{[2]},\dots,dW^{[l]},db^{[l]}$. We can merge all the parameters into one big parameter named $\theta$, which contains $\theta^{[1]},\theta^{[2]},\dots,\theta^{[l]}$, and likewise merge the gradients into $d\theta$, which contains $d\theta^{[1]},d\theta^{[2]},\dots,d\theta^{[l]}$. For gradient checking, we fix the other $l-1$ parameters and treat $J$ as a function of the single parameter $\theta^{[i]}$, so we can estimate the gradient of $J$ with respect to $\theta^{[i]}$:

$$d\theta_{approx}^{[i]} = \frac{J(\theta^{[1]},\dots,\theta^{[i]}+\epsilon,\dots)-J(\theta^{[1]},\dots,\theta^{[i]}-\epsilon,\dots)}{2\epsilon}$$

Thus we can calculate $d\theta_{approx}^{[i]}$ for every $\theta^{[i]}$, and then measure whether $d\theta_{approx}$ is close to $d\theta$ within an acceptable range using this formula:

$$difference = \frac{\left\|d\theta_{approx} - d\theta\right\|_2}{\left\|d\theta_{approx}\right\|_2 + \left\|d\theta\right\|_2}$$

I will set $\epsilon = 10^{-7}$. If $difference \le 10^{-7}$, we can say our backpropagation is correct; if $10^{-7} < difference \le 10^{-5}$, we should look carefully at whether our backpropagation is correct; and if $difference > 10^{-5}$, there is a great possibility that our backpropagation is incorrect.

## Implement

First of all, we need to implement the dictionary_to_vector and vector_to_dictionary functions to convert the parameters dictionary into a single vector and back:

### vector_to_dictionary
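A minimal sketch of the two conversion helpers; the key order and the 3-layer shapes (e.g. `W1` of shape `(5, 4)`) are assumptions for illustration:

```python
import numpy as np

# Assumed key order and shapes for a 3-layer network.
KEYS = ["W1", "b1", "W2", "b2", "W3", "b3"]
SHAPES = {"W1": (5, 4), "b1": (5, 1), "W2": (3, 5), "b2": (3, 1),
          "W3": (1, 3), "b3": (1, 1)}

def dictionary_to_vector(parameters):
    # Stack each parameter, reshaped to a column vector, in a fixed key order.
    return np.concatenate([parameters[k].reshape(-1, 1) for k in KEYS], axis=0)

def vector_to_dictionary(theta):
    # Slice theta back into arrays with the original shapes.
    parameters, start = {}, 0
    for k in KEYS:
        size = SHAPES[k][0] * SHAPES[k][1]
        parameters[k] = theta[start:start + size].reshape(SHAPES[k])
        start += size
    return parameters
```

The important property is that the two functions are exact inverses, so we can perturb one entry of the flat vector and rebuild a valid parameters dictionary for the forward pass.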

Then we can implement the forward and backward propagation of the neural network:

### backward_propagation_n
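A possible sketch of `forward_propagation_n` and `backward_propagation_n` for a 3-layer network (ReLU → ReLU → sigmoid with cross-entropy cost; this architecture and the cache layout are assumptions):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation_n(X, Y, parameters):
    m = X.shape[1]
    W1, b1 = parameters["W1"], parameters["b1"]
    W2, b2 = parameters["W2"], parameters["b2"]
    W3, b3 = parameters["W3"], parameters["b3"]

    Z1 = np.dot(W1, X) + b1;  A1 = relu(Z1)
    Z2 = np.dot(W2, A1) + b2; A2 = relu(Z2)
    Z3 = np.dot(W3, A2) + b3; A3 = sigmoid(Z3)

    # Cross-entropy cost averaged over the m examples.
    cost = (1.0 / m) * np.sum(-Y * np.log(A3) - (1 - Y) * np.log(1 - A3))
    cache = (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3)
    return cost, cache

def backward_propagation_n(X, Y, cache):
    m = X.shape[1]
    (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) = cache

    dZ3 = A3 - Y
    dW3 = (1.0 / m) * np.dot(dZ3, A2.T)
    db3 = (1.0 / m) * np.sum(dZ3, axis=1, keepdims=True)

    dA2 = np.dot(W3.T, dZ3)
    dZ2 = dA2 * (Z2 > 0)          # derivative of ReLU
    dW2 = (1.0 / m) * np.dot(dZ2, A1.T)
    db2 = (1.0 / m) * np.sum(dZ2, axis=1, keepdims=True)

    dA1 = np.dot(W2.T, dZ2)
    dZ1 = dA1 * (Z1 > 0)
    dW1 = (1.0 / m) * np.dot(dZ1, X.T)
    db1 = (1.0 / m) * np.sum(dZ1, axis=1, keepdims=True)

    return {"dW3": dW3, "db3": db3, "dW2": dW2, "db2": db2,
            "dW1": dW1, "db1": db1}
```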

then we can implement the gradient_check:

#### Instructions

For each i in num_parameters:

• To compute J_plus[i]:
1. Set $\theta^{+}$ to np.copy(parameters_values)
2. Set $\theta^{+}_i$ to $\theta^{+}_i + \varepsilon$
3. Calculate $J^{+}_i$ using forward_propagation_n(x, y, vector_to_dictionary($\theta^{+}$)).
• To compute J_minus[i]: do the same thing with $\theta^{-}$
• Compute $gradapprox[i] = \frac{J^{+}_i - J^{-}_i}{2 \varepsilon}$

Thus, we get a vector gradapprox, where gradapprox[i] is an approximation of the gradient with respect to parameter_values[i]. You can now compare this gradapprox vector to the gradients vector from backpropagation. Just as in the single-parameter case, compute the relative difference between the two:
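The loop above can be sketched as follows. For simplicity, `J` here takes the flattened parameter vector directly (in the article's setting it would wrap `forward_propagation_n(x, y, vector_to_dictionary(theta))`); the function name and signature are assumptions:

```python
import numpy as np

def gradient_check_n(J, parameters_values, grad, epsilon=1e-7):
    num_parameters = parameters_values.shape[0]
    gradapprox = np.zeros((num_parameters, 1))

    for i in range(num_parameters):
        theta_plus = np.copy(parameters_values)   # step 1
        theta_plus[i] += epsilon                  # step 2
        J_plus = J(theta_plus)                    # step 3

        theta_minus = np.copy(parameters_values)
        theta_minus[i] -= epsilon
        J_minus = J(theta_minus)

        gradapprox[i] = (J_plus - J_minus) / (2 * epsilon)

    # Relative difference between the backprop gradient and the estimate.
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    return numerator / denominator
```

Note that each iteration only perturbs one entry of the copied vector, so the cost function sees the original values everywhere else, exactly as the theory section describes.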

Output:

It seems that there were errors in the backward_propagation_n code we implemented! It is a good thing we implemented the gradient check. Go back to backward_propagation_n and try to find and correct the errors:

It should be:

Then we run the gradient checking test code again:

The backpropagation is correct now!