### Introduction

Nowadays, neural networks have become a popular topic. Compared with traditional statistical machine learning methods, neural networks perform well on image processing and computer vision tasks. Thanks to their powerful learning ability, they attract more and more attention from researchers as well as industrial users.

A neural network is composed of a) an input layer, b) one or more hidden layers, and c) an output layer. Each hidden layer and the output layer has an activation function that transforms the input of every neuron in that layer into that neuron's output.

### Details

As the graph above shows, the neural network has one input layer [x1, x2, x3], one hidden layer [z1, z2, z3, z4], and one output layer [y1, y2]. Between each pair of connected layers, every arc carries a weight that indicates how much the previous neuron's output contributes to the input of the next neuron. These weights are the main objects we need to train during the NN training process.

Besides the layer-to-layer weights, the activation function is another key component when training and evaluating a neural network. *ReLU*, *linear*, *tanh*, and *sigmoid* are popular choices for the hidden layers. For the output layer, the *softmax* function is often used.
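These activation functions can be sketched in NumPy as follows (a minimal illustration, not tied to any particular framework):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Keeps positive inputs, zeroes out negative ones.
    return np.maximum(0.0, x)

def tanh(x):
    # Squashes input into (-1, 1).
    return np.tanh(x)

def softmax(x):
    # Turns a vector of scores into a probability distribution.
    # Subtracting the max keeps the exponentials numerically stable.
    e = np.exp(x - np.max(x))
    return e / e.sum()
```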

#### Back Propagation

Back propagation is widely used for training neural networks. Its idea is to differentiate the loss function with respect to each arc weight. But how can we do that? In the picture above, assume the weight from $x_i$ to $z_j$ is $w_{ij}$, the weight from $z_j$ to $y_k$ is $v_{jk}$, the activation functions of the hidden layer and the output layer are both the sigmoid function, and the loss is the square loss:

$$

L(y,y^*) = \frac{1}{2}[(y_1-y^*_1)^2 + (y_2- y^*_2)^2]

$$
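This loss can be written in a couple of lines (the function name `square_loss` is my own):

```python
import numpy as np

def square_loss(y, y_star):
    # L(y, y*) = 1/2 * sum_k (y_k - y*_k)^2
    y, y_star = np.asarray(y), np.asarray(y_star)
    return 0.5 * np.sum((y - y_star) ** 2)
```

For example, `square_loss([1.0, 0.0], [0.0, 0.0])` evaluates to 0.5.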

Writing $m_j = \sum_i w_{ij} x_i$ for the pre-activation input of hidden neuron $z_j$, and differentiating $L$ with respect to $w_{ij}$ via the chain rule, we get:

$$

\frac{\partial L}{\partial w_{ij}} = \sum_k \frac{\partial L}{\partial y_k}\cdot\frac{\partial y_k}{\partial z_j}\cdot\frac{\partial z_j}{\partial m_j}\cdot\frac{\partial m_j}{\partial w_{ij}}

$$

This derivative is used to update the weight $w_{ij}$ at each iteration, e.g. by gradient descent: $w_{ij} \leftarrow w_{ij} - \eta\,\frac{\partial L}{\partial w_{ij}}$, where $\eta$ is the learning rate.
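One back-propagation step for the 3-4-2 network above can be sketched as follows. This is a minimal illustration under stated assumptions: sigmoid activations in both layers, the square loss, `W` and `V` standing in for the $w_{ij}$ and $v_{jk}$ weights, and arbitrary example values for the input, target, initialization, and learning rate.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
x = np.array([0.5, -0.2, 0.1])     # input [x1, x2, x3] (example values)
y_star = np.array([1.0, 0.0])      # target [y*_1, y*_2] (example values)
W = rng.normal(size=(3, 4))        # w_ij: input -> hidden
V = rng.normal(size=(4, 2))        # v_jk: hidden -> output
lr = 0.1                           # learning rate (an assumed value)

# Forward pass: m_j and n_k are the pre-activation inputs.
m = x @ W
z = sigmoid(m)
n = z @ V
y = sigmoid(n)

# Backward pass: apply the chain rule layer by layer.
dL_dy = y - y_star                 # dL/dy_k for the square loss
dL_dn = dL_dy * y * (1.0 - y)      # sigmoid'(n_k) = y_k (1 - y_k)
dL_dV = np.outer(z, dL_dn)         # dL/dv_jk
dL_dz = V @ dL_dn                  # the sum over k in the formula above
dL_dm = dL_dz * z * (1.0 - z)      # sigmoid'(m_j) = z_j (1 - z_j)
dL_dW = np.outer(x, dL_dm)         # dL/dw_ij

# Gradient-descent update.
W -= lr * dL_dW
V -= lr * dL_dV
```

Note that `dL_dz = V @ dL_dn` is exactly the $\sum_k$ in the derivative: each hidden neuron receives gradient contributions from every output it feeds.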

#### Forward Propagation

Forward propagation uses the current weights to compute the final output: the input is passed through the network layer by layer, using the weights from the most recent update, to obtain the prediction.
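For the 3-4-2 network above, a forward pass might look like this (sigmoid activations throughout; the all-zero weights are placeholders, not trained values):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W, V):
    # Hidden layer: m_j = sum_i w_ij * x_i, then z_j = sigmoid(m_j).
    z = sigmoid(x @ W)
    # Output layer: n_k = sum_j v_jk * z_j, then y_k = sigmoid(n_k).
    return sigmoid(z @ V)

x = np.array([1.0, 0.5, -0.5])   # input [x1, x2, x3]
W = np.zeros((3, 4))             # w_ij placeholder weights
V = np.zeros((4, 2))             # v_jk placeholder weights
y = forward(x, W, V)             # with all-zero weights, y = [0.5, 0.5]
```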