The following are the equations used to implement a neural network. From Nielsen’s online textbook, I collected the equations needed for an immediate, basic implementation of a neural network trained with the non-stochastic version of gradient descent. Do check out the online textbook; it is thorough and comprehensive. As complementary material and extra practice, I made some notes here (**update: no longer available**), including some extra steps in the mathematical derivations. This is implemented in the Python package kero. See this link for the first related tutorial.

The first two equations below show a single iteration of gradient descent. Note that the stochastic version is typically implemented by averaging the gradients over batches of data. This is not shown here (it is covered in the notes mentioned above).
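As an illustration only, here is a minimal sketch of one gradient descent iteration, following the update rule from Nielsen's book: each parameter p is replaced by p - η ∂C/∂p, where η is the learning rate. For simplicity it assumes all weights and biases are flattened into one list of floats, and that the gradients are already computed (e.g. by backpropagation). The helpers `gradient_descent_step` and `average_gradients` are hypothetical names, not part of kero. `average_gradients` shows the batch averaging used by the stochastic variant.

```python
def gradient_descent_step(params, grads, eta):
    """One iteration of gradient descent: p <- p - eta * dC/dp.

    params and grads are flat lists of floats of the same length
    (hypothetical layout: weights and biases flattened together).
    """
    return [p - eta * g for p, g in zip(params, grads)]


def average_gradients(per_example_grads):
    """Average per-example gradients over a batch (stochastic variant).

    per_example_grads is a list of gradient lists, one per training example.
    """
    n = len(per_example_grads)
    return [sum(column) / n for column in zip(*per_example_grads)]


if __name__ == "__main__":
    # Toy cost C(w) = (w - 3)^2 with gradient dC/dw = 2 (w - 3).
    w = [0.0]
    for _ in range(100):
        w = gradient_descent_step(w, [2.0 * (w[0] - 3.0)], eta=0.1)
    print(w[0])  # close to the minimizer w = 3
```

With η = 0.1, each step multiplies the distance to the minimum by 0.8, so the toy run converges towards w = 3. In the non-stochastic version the gradient is computed over all the training data at once; the stochastic version would instead pass `average_gradients` over a mini-batch to `gradient_descent_step`.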

Let F be the function that outputs the true values. Let NN be the neural network that computes the predicted values, denoted a^L (the activations of the output layer L), as shown below.
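To make the roles of F and a^L concrete, here is a small sketch of how NN might compute a^L and compare it with F. It assumes the conventions of Nielsen's book, namely sigmoid activations with a^l = σ(w^l a^(l-1) + b^l) and the quadratic cost C = 1/(2n) Σ_x ||F(x) - a^L(x)||². It uses plain Python lists rather than a matrix library. The function names `feedforward` and `quadratic_cost` are hypothetical and do not describe kero's API.

```python
import math


def sigmoid(z):
    """Logistic activation sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def feedforward(x, weights, biases):
    """Return a^L, the output-layer activations of the network for input x.

    weights[l] is a matrix (list of rows) and biases[l] a vector for each
    layer after the input layer; each layer computes sigmoid(w a + b).
    """
    a = list(x)
    for w, b in zip(weights, biases):
        a = [sigmoid(sum(w_jk * a_k for w_jk, a_k in zip(row, a)) + b_j)
             for row, b_j in zip(w, b)]
    return a


def quadratic_cost(inputs, F, weights, biases):
    """C = 1/(2n) * sum over inputs x of ||F(x) - a^L(x)||^2."""
    n = len(inputs)
    total = 0.0
    for x in inputs:
        y = F(x)                              # true values
        a_L = feedforward(x, weights, biases)  # predicted values
        total += sum((y_j - a_j) ** 2 for y_j, a_j in zip(y, a_L))
    return total / (2 * n)


if __name__ == "__main__":
    # A 2-2-1 network with all weights and biases set to zero.
    W = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0]]]
    B = [[0.0, 0.0], [0.0]]
    print(feedforward([1.0, 2.0], W, B))                              # [0.5]
    print(quadratic_cost([[1.0, 2.0], [3.0, 4.0]], lambda x: [1.0], W, B))  # 0.125
```

With all parameters at zero, every neuron outputs σ(0) = 0.5. With a hypothetical F that always returns 1, each input contributes (1 - 0.5)² = 0.25, so the cost over two inputs is 0.5 / 4 = 0.125. Training then means adjusting the weights and biases with the gradient descent updates to make a^L match F.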