Deep Learning and Neural Network with kero

1. PART 1: preparing data
2. PART 2: train model and plot its progress

*kero version 0.6.2*

*Note: this is still at an experimental phase. User interface will not be very friendly yet 😦*

*Figure 1. (Top) MSE over 1000 epochs of training. (Bottom) The horizontal axis indexes the data points (there are 24 in this post) and the vertical axis shows the norm value. The blue circle is the norm before training and the red dot is the norm after training. Ideally, training moves each red dot towards 0; at a data point with 0 norm, the neural network predicts the true value exactly.*

We continue from PART 1. Note that even at the end of this post, the implementation is not yet geared towards practical usage or ease of application. It is still at the research stage: for example, we will do some speed testing.

**Update**: development is discontinued. It has been good practice, but it will certainly be more productive to use an established library such as TensorFlow or PyTorch. Do check them out!

### Loading data

Consider SECTION 1 of **testNNupdater2C.py**. Using *prep()* defined in **testNNupdater2aux.py** from PART 1, we load all the necessary items, including the training data (*input_set*), its corresponding true output (*Y0_set*) and the neural network object. Note that *Y0_set* **is randomized** on every run.

### Training Neural Network Model

Now consider SECTION 2 of **testNNupdater2C.py**. We create the object that trains the model, *NetworkUpdater()*, and call three of its functions:

- set_settings(). This sets the gradient descent scheme we want. For now, we follow regular stochastic gradient descent.
- set_training_data(). This takes our training data.
- set_neural_network(). This passes the neural network that we want to train to the updater object.

Some concepts here, including number of epochs, batch size, regular stochastic gradient descent can be found in the note I wrote, here. Cheers!
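To make batch size and number of epochs concrete, here is a minimal sketch of how 24 data points could be split into shuffled batches of 4. The helper `make_batches` is hypothetical and is not part of kero; the count of updates assumes one weight update per batch, which is the usual convention for mini-batch stochastic gradient descent.

```python
import random

def make_batches(items, batch_size, shuffle=True, seed=None):
    """Split items into consecutive batches, optionally shuffling first.

    Hypothetical helper for illustration only; not kero's implementation.
    """
    items = list(items)
    if shuffle:
        random.Random(seed).shuffle(items)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# 24 data points and batch_size=4, as in the settings used in this post.
data_indices = range(24)
batches = make_batches(data_indices, batch_size=4, seed=0)

# Assuming one weight update per batch:
updates_per_epoch = len(batches)          # 24 / 4 = 6
total_updates = updates_per_epoch * 1000  # over 1000 epochs
print(updates_per_epoch, total_updates)   # prints: 6 6000
```

With shuffling switched on, each epoch would see the same 24 points grouped into different batches.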

We are in SECTION 3 now. Basically, start training! This is done using the update_wb() function, which is the heart of our neural network training.
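For intuition about what a training call of this kind does, the following is a rough NumPy sketch of mini-batch stochastic gradient descent on a single sigmoid layer. It is not kero's update_wb(): the function `train_sgd`, the learning rate, the cost 0.5 * squared error and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sgd(X, Y0, W, b, batch_size=4, no_of_epoch=200,
              learning_rate=0.5, seed=0):
    """Mini-batch SGD on one sigmoid layer (illustrative, not kero's code).

    X : (n, d_in) inputs, Y0 : (n, d_out) true outputs,
    W : (d_in, d_out) weights, b : (d_out,) biases.
    Returns the updated W, b and the MSE recorded after every epoch.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    mse_list = []
    for _ in range(no_of_epoch):
        order = rng.permutation(n)  # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            A = sigmoid(X[idx] @ W + b)          # feed forward
            # Gradient of 0.5 * ||A - Y0||^2 with respect to z
            delta = (A - Y0[idx]) * A * (1.0 - A)
            # Average the gradient over the batch, then step downhill
            W = W - learning_rate * X[idx].T @ delta / len(idx)
            b = b - learning_rate * delta.mean(axis=0)
        mse_list.append(float(np.mean((sigmoid(X @ W + b) - Y0) ** 2)))
    return W, b, mse_list

# Synthetic data: 24 points whose targets come from a hidden "teacher" layer.
rng = np.random.default_rng(42)
X = rng.normal(size=(24, 3))
Y0 = sigmoid(X @ rng.normal(size=(3, 2)) + rng.normal(size=2))

W, b, mse_list = train_sgd(X, Y0, rng.normal(size=(3, 2)), np.zeros(2))
print("MSE after first epoch:", mse_list[0])
print("MSE after last epoch :", mse_list[-1])
```

The returned `mse_list` plays the same role as the list plotted in Figure 1: one MSE value per epoch, which we hope to see decreasing.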

### Evaluating Performance

SECTION 4 prints the mean squared error (MSE) at roughly every 10% of completion. Of course we would like the MSE to decrease (hopefully) with every epoch. SECTION 5 prints the time taken, in seconds, to perform update_wb(), i.e. the training. It then prints the time in seconds, minutes, hours and days if this process were repeated 10,000 times. This 10,000 is just a factor for a rough estimate of the time taken if the whole process were 10,000 times larger, perhaps by having 10,000 times more data points, or perhaps by having 10,000 times the number of neurons.

Let us see the output. The estimated time at 10,000 times the workload is about 1.25 days. I will keep this in mind and see if anything can be done faster. Note that the MSE values show a decrease, which is desirable. In case anything does not look right, I will spend extra time checking the details of the implementation.

```
---aux---
Initializing a Neural Network object.
--- test2C ---
Initializing a Neural Network object.
 + update_wb().
epoch | mse value
 + epoch { 100 } 0.46130921580291995
 + epoch { 201 } 0.4355913582596121
 + epoch { 302 } 0.4130500519336926
 + epoch { 403 } 0.3934130717616613
 + epoch { 504 } 0.37638130123898367
 + epoch { 605 } 0.36161320734305336
 + epoch { 706 } 0.3488187540253349
 + epoch { 807 } 0.3377184625080933
 + epoch { 908 } 0.3280573038928975

n (no of data points) = 24

time taken [s] = 10.765251398086548
time taken at 10k x [s] = 107652.51398086548
time taken at 10k x [min] = 1794.208566347758
time taken at 10k x [hr] = 29.903476105795967
time taken at 10k x [day] = 1.2459781710748319
--------------TEST PREDICTION----------------------
completed...
```
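The extrapolated times in the output are plain unit conversions of the measured time multiplied by 10,000. A small sketch, using a hypothetical helper `extrapolate_time` and assuming the cost scales linearly with the factor (which is only a rough guess):

```python
def extrapolate_time(elapsed_seconds, factor=1e4):
    """Scale a measured time by `factor` and express it in several units."""
    scaled = elapsed_seconds * factor
    return {
        "s": scaled,
        "min": scaled / 60,
        "hr": scaled / 3600,
        "day": scaled / (3600 * 24),
    }

# Measured training time taken from the output above.
estimate = extrapolate_time(10.765251398086548)
for unit, value in estimate.items():
    print(f"time taken at 10k x [{unit}] = {value:.2f}")
```

The `day` entry comes out at about 1.25, matching the figure quoted earlier.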

Finally, the section TEST PREDICTION plots figure 1. The MSE indeed decreases over 1000 epochs of training, which is good. The **norm** value for each data point is the Euclidean distance between the true output and the output predicted by the neural network, divided by the output size. The nearer to 0 the norm value is, the more accurate the prediction. The figure also shows that training generally moves the norm values towards zero. Do play around with different parameters to see the different outcomes.
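As a small worked example of the norm value, the sketch below applies the same formula as the script (Frobenius norm of the difference divided by the output size) to made-up numbers. The function name `scaled_norm` and the sample outputs are hypothetical.

```python
import numpy as np

def scaled_norm(Y_pred, Y_true):
    """Frobenius norm of (Y_pred - Y_true), divided by the output size."""
    output_size = len(Y_true)
    return np.linalg.norm(Y_pred - Y_true, ord="fro") / output_size

# Made-up column-vector outputs with two output neurons.
Y_true = np.array([[0.0], [0.0]])
Y_before = np.array([[0.6], [0.8]])  # pretend prediction before training
Y_after = np.array([[0.3], [0.4]])   # pretend prediction after training

norm_before = scaled_norm(Y_before, Y_true)  # about 1.0 / 2 = 0.5
norm_after = scaled_norm(Y_after, Y_true)    # about 0.5 / 2 = 0.25
print(norm_before, norm_after)
```

Here the prediction moves closer to the true output, so the norm shrinks, which is the behaviour we hope to see for the red dots in Figure 1.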

That is all for now!

**testNNupdater2C.py**

```python
import testNNupdater2aux as taux
import kero.multib.NeuralNetwork as nn
import kero.utils.utils as ut
import numpy as np
import time
import matplotlib.pyplot as plt

print("--- test2C ---")

# ---------------- SECTION 1 --------------------
# input_set : list of numpy matrix.
# Y_set : list of numpy matrix. Output computed by NN
# Y0_set : list of numpy matrix. True/observed output
#   the grand objective is to train NN so that Y_set is equal to Y0_set
# -------------------------------------------
# this is a collection of a_l_set and z_l_set over all data points
#   z_l_set is the collection of z values over all layers, l=2,3,...L
#   and a_l_set is the corresponding activated values
#   Recall: a_l_set and z_l_set each is a list of numpy matrices

out = taux.prep(print_data=False)
input_set=out["input_set"]
Y_set=out["Y_set"]
Y0_set=out["Y0_set"]
collection_of_fed_forward_a_l=out["collection_of_fed_forward_a_l"]
collection_of_fed_forward_z_l=out["collection_of_fed_forward_z_l"]
weights=out["weights"]
biases=out["biases"]
NeuralNetwork=out["NeuralNetwork"]

a_L_set = Y_set

# ---------------- SECTION 2 --------------------
nu = nn.NetworkUpdater()
nu.set_settings(method="RegularStochastic",
    method_specific_settings={
        "batch_size":4,
        "no_of_epoch":1000,
        "shuffle_batch":True,
    })
nu.set_training_data(input_set,Y0_set)
nu.set_neural_network(NeuralNetwork)

# ---------------- SECTION 3 --------------------
L = len(weights) + 1
n = len(input_set)
AF = nn.activationFunction(func = "Sigmoid")
start = time.time()

# print("input_set[0]:",input_set[0])
# collection_of_batches = ut.partition_list(input_set, 6, do_shuffle=False )
# for batch in collection_of_batches:
#   print(" > batch item: ")
#   for x in batch:
#     ut.print_numpy_matrix(x,formatting="%6.2f",no_of_space=5)
#   print(" ----------")

weights_next, biases_next, mse_list = nu.update_wb(input_set, Y0_set,
    weights, biases, AF,
    mse_mode="compute_only", verbose=11)
end = time.time()
elapsed = end - start

# ---------------- SECTION 4 --------------------
print("epoch | mse value ")
mark = 1
for i in range(len(mse_list)):
    if mark >= 0.1*len(mse_list) or i==0:
        print(" + epoch {",i ,"} ", mse_list[i])
        mark = 1
    else:
        mark = mark + 1

fig = plt.figure()
ax1 = fig.add_subplot(211)
plt.plot(range(len(mse_list)), mse_list)

# ---------------- SECTION 5 --------------------
print("")
print("n (no of data points) = ",n)
print("")
print("time taken [s] = ", elapsed)
print("time taken at 10k x [s] = ", elapsed*1e4)
print("time taken at 10k x [min] = ", elapsed*1e4/(60))
print("time taken at 10k x [hr] = ", elapsed*1e4/(3600))
print("time taken at 10k x [day] = ", elapsed*1e4/(3600*24))

print("--------------TEST PREDICTION----------------------")
# this is another rough measure of accuracy
#
# 1. norm_before : normalized euclidean distance between the true point and the initial guess point (value predicted by untrained model)
# 2. norm_after : normalized euclidean distance between the true point and the value predicted by trained model
# A trained model is supposed to give a smaller norm_after (nearer to zero).
# In other words, the predicted points should be closer to the true value after training

count = 1
norm_before_collection = []
norm_after_collection = []
for one_input, one_Y0 in zip(input_set, Y0_set):
    # a_l_set : list of numpy matrix
    output_size = len(one_Y0)
    a_1 = one_input

    # Before training
    test0_a_l_set, _ = nu.feed_forward(weights, biases, a_1, AF,
        verbose=False,
        matrix_formatting="%6.2f")
    Y_before = test0_a_l_set[-1]

    # After training
    test_a_l_set, _ = nu.feed_forward(weights_next, biases_next, a_1, AF,
        verbose=False,
        matrix_formatting="%6.2f")
    Y_after = test_a_l_set[-1]

    norm_before = np.linalg.norm(Y_before-one_Y0,ord="fro")/output_size
    norm_after = np.linalg.norm(Y_after-one_Y0,ord="fro")/output_size
    norm_before_collection.append(norm_before)
    norm_after_collection.append(norm_after)

    # print(" one_input: ")
    # ut.print_numpy_matrix(one_input,formatting="%6.2f",no_of_space=10)
    # print(" one_Y0: ")
    # ut.print_numpy_matrix(one_Y0,formatting="%6.5f",no_of_space=10)
    # print(" Y_before: ")
    # ut.print_numpy_matrix(Y_before,formatting="%6.5f",no_of_space=10)
    # print(" Y_after: ")
    # ut.print_numpy_matrix(Y_after,formatting="%6.5f",no_of_space=10)
    count = count + 1

ax2 = fig.add_subplot(212)
plt.scatter(range(1,len(norm_before_collection)+1),norm_before_collection,facecolors='none',edgecolor="b",label="before")
plt.scatter(range(1,len(norm_after_collection)+1),norm_after_collection,label="after",facecolors='r',edgecolor="r")
plt.plot(np.linspace(1,len(norm_after_collection)+1,len(norm_after_collection)+1),[0]*(len(norm_after_collection)+1),"r")
ax1.set_xlabel("epoch")
ax1.set_ylabel("MSE")
ax2.set_xlabel("data points")
ax2.set_ylabel("norms")
ax2.legend()

print("completed...")
plt.show()
```