The following link is the extended summary for the machine learning course. Below a preview is shown.

# Month: January 2019

Hi! This is week 2 of my PhD at NTU as an employee of Alibaba. I will attempt to clear course requirements – and of course learn new stuffs! Here are the related topics that I will review:

- Numerical Algorithm
- Statistics
- Image Processing

It seems like I will proceed with image processing for my research; but anyway, there are still things pending approvals and what-not.

Update soon!

(20190123) Here is the link to extended summary to Andrew Ng’s coursera machine learning course.

Hiya!

This is quite similar to the announcement here, but this time round we are using scipy to find the optimal parameters to fit the result to the experimental data. This should be better since the optimization package should be a lot more robust. See the post in the following link Parameter Estimation for Differential Equations using Scipy Least Square.

Cheers~

The post is about parameter estimation based on gradient descent specifically for the differential equation , i.e. we are looking for a good that best fits the experimental data. We will go through the methods of choosing initial parameter guesses and running the optimization. This post is more fully explained in this note: Differential Equation Parameter Estimations. The data used here is a fraction of the sample data I was given. The codes and sample data can also be found **here**.

As usual, do use virtual environment for cleaner package management (you can see here). The packages needed can be installed via pip.

pip install numpy pip install scipy pip install matplotlib

Put the scripts **scipyode_header.py**, **cubicODE.py** and **cubicODE_post_processing.py** into the project folder – in the virtual environment if you are using one. The code can be found at the end of the post.

**PART 1. Loading data**. In scipyode_header.py, customize load_data() as you see fit. As it is, it will load the data starting from a maximum point, and assume that the csv file contains 2 columns: column 1 is for variable t, column 2 is n. The file does not have any header.

Go into cubicODE.py, set the following toggles.plot_data True initial_guess_mode False optimize_mode False Run

**cubicODE.py**. You will see figure like figure 1. In this example, the relevant data points (blue) are the points that overlap with the orange plot.**PART 1. Define the ODE**. Write the ODE in*def f(t,n,f_args)*.**PART 2**.**Choosing initial guesses**. Set any number of initial guesses you want to try out with by setting*collection_of_initial_guesses*. Since we optimize [G,k1,k2,k3], an example guess is [0,1e-3,0.5e-3,1.2e-7]. Set the following toggles and then run**cubicODE.py.**Plot like figure 1 will be generated.

plot_data False initial_guess_mode True optimize_mode False **PART 2. Optimization**. Once the choices of initial guesses are decided upon, set the following toggles and then run**cubicODE.py.**The optimized parameters from each initial guess will be saved as a .par file, and the save name is set by the variable*save_name*. The code in our example used 3 initial guesses and*save_name = “20190110_cubic”.*Thus,*20190110_cubic_1.par, 20190110_cubic_2.par*and*20190110_cubic_3.par*will be created.

plot_data False initial_guess_mode False optimize_mode True **Loading optimized parameters**. Go to cubicODE_post_processing.py. If we want to load all the 3 saved .par files, then set*load_name = “20190110_cubic”*and*serial_number = [1,2,3].*Then run**cubicODE_post_processing.****py.**The plot results are shown in figure 2(A, B, C), showing good fit. The optimized parameters for each initial guess can be read off from the console, as in figure 2(D). The smaller cost value is, the better the fit is generally.

The optimized parameters seem to produce good fitting. We are done!

Figure 1.Green plot shows the experimental data points while the rest are generated using RK45 using 3 different initial guesses, given by *collection_of_initial_guesses* = [ [0,1e-3,0.5e-3,1.2e-7], [0,1e-2,0,1e-7], [0,0,1.5e-3,1e-7] ].

Figure 2. (A, B, C) shows the results of parameter estimation from 3 different initial guesses. (D) shows the results in the console. For each serial number, the optimized parameters are printed as [G, k1, k2, k3].

**scipyode_header.py**

import numpy as np from scipy.integrate import RK45 import matplotlib.pyplot as plt import csv def RK45_wrapper(x0, time_checkpoints, f, f_args, stepsize=None): if stepsize is None: stepsize = 0.01*(time_checkpoints[-1]-time_checkpoints[0]) t_set = [time_checkpoints[0]] x_set = [x0] t_checkpoint_new = [time_checkpoints[0]] x_checkpoint = [x0] f_temp = lambda t,y : f(t,y, f_args) for i in range(len(time_checkpoints)-1): t1 = time_checkpoints[i+1] if i>0: x0 = ox.y t0 = ox.t else: t0 = time_checkpoints[i] ox = RK45(f_temp,t0,x0,t1,max_step=stepsize) while ox.t < t1: msg = ox.step() t_set.append(ox.t) x_set.append(ox.y) x_checkpoint.append(ox.y) t_checkpoint_new.append(ox.t) return x_set, t_set, x_checkpoint, t_checkpoint_new def read_csv(filename, header_skip=1,get_the_first_N_rows = 0): # filename : string, name of csv file without extension # headerskip : nonnegative integer, skip the first few rows # get_the_first_N_rows : integer, the first few rows after header_skip to be read # if all rows are to be read, set to zero # example # xx = read_csv('tryy', 3) # xx = [[np.float(x) for x in y] for y in xx] # nice here # print(xx) # print(np.sum(xx[0])) out = [] with open(str(filename)+'.csv') as csv_file: data_reader = csv.reader(csv_file) count = 0 i = 0 for row in data_reader: if count < header_skip: count = count + 1 else: out.append(row) if get_the_first_N_rows>0: i = i + 1 if i == get_the_first_N_rows: break return out def load_data(plot_data=False): print(" --+ load_data().") # load data filename = 'sample_data' all_data = read_csv(filename, header_skip=0) all_data = [[np.float(x) for x in y] for y in all_data] X=[x[0] for x in all_data] Y=[x[1] for x in all_data] # customize your data loading here max_index = np.argmax(Y) relevant_data= all_data[max_index:] Xr = [x[0] for x in relevant_data] Yr = [x[1] for x in relevant_data] print(" data size = ", len(Xr)) print(" peak = ", X[max_index],",", Y[max_index]) if plot_data: fig = plt.figure() ax=fig.add_subplot(111) ax.scatter(X,Y,3) ax.scatter(X[max_index],Y[max_index]) ax.plot(Xr,Yr,'r') ax.set_xlabel("t") ax.set_ylabel("n") plt.show() return Xr, Yr

**cubicODE.py**

import numpy as np import matplotlib.pyplot as plt import scipyode_header as sh import pickle from scipy.optimize import least_squares # TOGGLES: initial_guess_mode = 0 optimize_mode = 1 # SETTINGS: save_name = "20190110_cubic" # --------------- PART 1 ---------------- # # Load data, prepare the differential equation def f(t,n, f_args): # function for pendulum motion with friction dndt = [f_args[0] - f_args[1] * n - f_args[2] * n**2 - f_args[3] * n**3] return dndt print("\n------------- LOADING DATA ----------------\n") Xr, Yr = sh.load_data(plot_data=0) x0 = np.array([Yr[0]]) t_set_expt = [np.array([x]) for x in Xr] n_set_expt = [np.array([x]) for x in Yr] print("x0 = ",x0) print("Xr[:5] = ",Xr[:5]) print("Yr[:5] = ",Yr[:5]) print("t_set_expt[:5] = ",t_set_expt[:5]) print("n_set_expt[:5] = ",n_set_expt[:5]) # --------------- PART 2 ---------------- # # try out initial guesses # initial guess [G, k1, k2, k3] collection_of_initial_guesses = [ [0,1e-3,0.5e-3,1.2e-7], [0,1e-2,0,1e-7], [0,0,1.5e-3,1e-7] ] if initial_guess_mode: print("\n------------- INITIAL TESTING ----------------\n") color_scheme = np.linspace(0,1,len(collection_of_initial_guesses)) fig = plt.figure() ax = fig.add_subplot(111) ax.scatter(Xr,Yr,3,color='g',label="expt data") for i in range(len(collection_of_initial_guesses)): f_args0 = collection_of_initial_guesses[i] # print(f_args0) this_label = "initial guess" + str(i+1) _, _, x_test, t_test = sh.RK45_wrapper(x0, t_set_expt, f, f_args0, stepsize=None) ax.scatter(t_test,x_test,3,color=(1-color_scheme[i],0,color_scheme[i]),label=this_label) ax.legend() ax.set_xlabel("t") ax.set_ylabel("n") if optimize_mode: def target_function(f_args): # "checkpoint" here refers to the intended data points # since RK45 will generate more points in between the checkpoints. # What happens is just that with smaller time steps in between the checkpoints, # we can compute the checkpoints more precisely. _, _, x_checkpoint, t_checkpoint_new = sh.RK45_wrapper(x0, t_set_expt, f, f_args, stepsize=None) collection_of_f_i = [] # each i is a data point in this particular example for x1, x2 in zip(x_checkpoint,n_set_expt): this_norm = np.linalg.norm(np.array(x1)-np.array(x2)) collection_of_f_i.append(this_norm) return np.array(collection_of_f_i) print("\n------------- OPTIMIZATION STAGE ----------------\n") for i in range(len(collection_of_initial_guesses)): f_args0 = collection_of_initial_guesses[i] print(" initial guess = ", f_args0) optimized_res = least_squares(target_function,f_args0) f_args_optimized = optimized_res.x SAVE_DATA = { "initial_guess": f_args0, "optimized": f_args_optimized, "cost":optimized_res.cost } print(" + optimized result = ",f_args_optimized) print(" + cost = ",optimized_res.cost) output = open(save_name+"_"+ str(i+1) +".par", 'wb') pickle.dump(SAVE_DATA, output) output.close() plt.show()

**cubicODE_post_processing.py**

import numpy as np import matplotlib.pyplot as plt import pickle import scipyode_header as sh # SETTINGS load_name = "20190110_cubic" serial_number = [1,2,3] print("\n------------- LOADING DATA ----------------\n") def f(t,n, f_args): # function for pendulum motion with friction dndt = [f_args[0] - f_args[1] * n - f_args[2] * n**2 - f_args[3] * n**3] return dndt Xr, Yr = sh.load_data(plot_data=0) x0 = np.array([Yr[0]]) t_set_expt = [np.array([x]) for x in Xr] n_set_expt = [np.array([x]) for x in Yr] print("x0 = ",x0) print("Xr[:5] = ",Xr[:5]) print("Yr[:5] = ",Yr[:5]) print("t_set_expt[:5] = ",t_set_expt[:5]) print("n_set_expt[:5] = ",n_set_expt[:5]) print("\n------------- PLOTTING ----------------\n") color_scheme = np.linspace(0,1,len(serial_number)) for j in range(len(serial_number)): fig = plt.figure() ax = fig.add_subplot(111) ax.scatter(Xr,Yr,3,color='g',label="expt data") i = serial_number[j] this_label = "initial guess" + str(i+1) pkl_file = open(load_name+'_'+str(i)+'.par', 'rb') SAVE_DATA = None SAVE_DATA = pickle.load(pkl_file) pkl_file.close() # print(SAVE_DATA) f_args_initial = SAVE_DATA["initial_guess"] f_args_optimized = SAVE_DATA["optimized"] cost = SAVE_DATA["cost"] _, _, x_op, t_op = sh.RK45_wrapper(x0, t_set_expt, f, f_args_optimized, stepsize=None) print("Serial number:",j+1, " - Optimized parameters = " ) print(" ",f_args_optimized) print(" cost = ", cost) title_msg = load_name +":"+ str(j+1) ax.plot(t_op,x_op,color=(1-color_scheme[j],0,color_scheme[j]),label=this_label) ax.legend() ax.set_title(title_msg) ax.set_xlabel("t") ax.set_ylabel("n") plt.show()

In this post we use Convolutional Neural Network, with VGG-like convnet structure for MNIST problem: i.e. we train the model to recognize hand-written digits. We mainly follow the official keras guide, in this link.

Download MNIST file that has been converted into CSV form; I got it from this link.

The jupyter notebook detailing the attempt in this post is found **here **by the name **keras_test2.ipynb**.

**Starting with a conclusion**: it works pretty well, for a very quick training, the model can recognize hand-written digit with 98% accuracy.

As shown below, our input is 28 by 28 with 1 channel (1 color), since the hand-written digit is stored in a 28 by 28-pixel greyscale image. The layers used are

- 2x 2D convolutional layers with 32x 3 by 3 filters followed by max pooling for each 2 by 2 block of pixels. Then dropout layer is used; this is to prevent over-fitting.
- 2x 2D convolutional layers with 64x 3 by 3 filters followed by max pooling for each 2 by 2 block of pixels. Then dropout layer is used.
- Flatten layer just reshapes 2D image-like output from the previous layer to a 1D list of values. The first denses layer has 256 neurons, followed by dropout layer and finally a dense layer of 10 neurons corresponding to 10 classes or 10 different digits in MNIST. All activation functions are ReLu except the last one, softmax, as usual.

See the link here on how the data is prepared for training (i.e. the missing code shown as … partial code… below).

# ... partial code ... model = Sequential() # input: 28x28 images with 1 channels -> (28 ,28, 1) tensors. model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28,28,1))) model.add(Conv2D(32, (3, 3), activation='relu')) model.add(MaxPooling2D(pool_size=(2, 2))) model.add(Dropout(0.25)) model.add(Conv2D(64, (3, 3), activation='relu')) model.add(Conv2D(64, (3, 3), activation='relu')) model.add(MaxPooling2D(pool_size=(2, 2))) model.add(Dropout(0.25)) model.add(Flatten()) model.add(Dense(256, activation='relu')) model.add(Dropout(0.5)) model.add(Dense(10, activation='softmax')) sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True) model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy']) model.fit(x_train, y_train, batch_size=16, epochs=10) model.evaluate(x_test, y_test, batch_size=32)

For a quick training, this model obtains a very high accuracy of 0.98, as shown below.

Epoch 1/10 6400/6400 [==============================] - 6s 904us/step - loss: 0.8208 - acc: 0.7206 Epoch 2/10 6400/6400 [==============================] - 2s 379us/step - loss: 0.2427 - acc: 0.9266 Epoch 3/10 6400/6400 [==============================] - 2s 379us/step - loss: 0.1702 - acc: 0.9483 Epoch 4/10 6400/6400 [==============================] - 2s 380us/step - loss: 0.1353 - acc: 0.9589 Epoch 5/10 6400/6400 [==============================] - 2s 373us/step - loss: 0.1117 - acc: 0.9650 Epoch 6/10 6400/6400 [==============================] - 2s 379us/step - loss: 0.1080 - acc: 0.9697 Epoch 7/10 6400/6400 [==============================] - 2s 374us/step - loss: 0.0881 - acc: 0.9734 Epoch 8/10 6400/6400 [==============================] - 2s 375us/step - loss: 0.0880 - acc: 0.9736 1s - los Epoch 9/10 6400/6400 [==============================] - 2s 377us/step - loss: 0.0690 - acc: 0.9766 Epoch 10/10 6400/6400 [==============================] - 2s 373us/step - loss: 0.0686 - acc: 0.9800 100/100 [==============================] - 0s 940us/step

Hello!

I had planned to develop kero further. But at the end of the day, that should be just for practice; look at the existing libraries and APIs such as keras and tensorflow: they are already very extensive and also well maintained. On that note, I should remind myself that one does not try to beat the giants in their game, unless I have something really innovative to offer.

That said, while implementing kero, I had had fun reading up the theories down to the fundamental of basic implementation of a neural network. For now, I think exploring tutorials with keras is a good idea as well!

To start this off, here is the first tutorial. Enjoy!

*Update :
(20190104) second tutorial on MNIST.*

The problem we try to solve here is the **remainder problem**. We train our neural network to find the remainder of a number randomly drawn from 0 to 99 inclusive when it is divided by 17. For example, given 20, the remainder is 3.

The code (in Jupyter notebook) detailing the results of this post can be found **here **by the name **keras_test1.ipynb**. In all the tests, we use only 1 hidden layers made of 64 neurons and different input and output layers to take into account the context of the problem. With the context taken into account, we show that we can help the neural network model train better!

**Test 1A and Test 1B**

*Note: See the corresponding sections in the Jupyter notebook.*

We start with a much simpler problem. Draw a random number from 0 to 10 inclusive. We find their remainders when divided by 10, which is quite trivial. From **test 1A**, with 4 epochs, we see a steady improvement in prediction accuracy up to 82%. With 12 epochs in **test 1B**, our accuracy is approximately 100%. Good!

**Test 2A and Test 2B**

Now, we raise the hurdle. We draw wider range of random numbers, from 0 to 99 inclusive. To be fair we give the neural network more data points for training. We get pretty bad outcome; the trained model in **test 2A** suffers the problem of predicting only 1 outcome (it always predicts the remainder is 0). In **test 2B**, we perform the same training, but for longer epochs. The problem still occurs.

**Test 3A**

Now we solve the problem in test 2A and 2B by contextualizing the problem. Notice that in test 1A, 1B, 2A and 2B, there is only 1 input (i.e. 1 neuron in the input layer) which exactly corresponds to the random number whose remainder is to be computed.

Now, in this test, we convert it into 2 inputs, splitting the unit and tenth digits. For example, if the number is 64, the input to our neural network is now (6,4). If the number is 5, then it becomes (0,5). This is done using **extract_digit()** function. The possible “concept” that the neural network can learn is the fact that for division by 10, only the last digit matters. That is to say, if our input is (a,b) after the conversion, then only b matters.

What do we get? 100% accuracy! All is good.

**Test 3B**

Finally, we raise the complexity and solve our original problem. We draw from 0 to 99 inclusive, and find the remainder from division with 17. We use **extract_digit()** function here as well. Running it over 24 epochs, we get an accuracy of 96% (and it does look like it can be improved)!

**Conclusion**? First thing first, this is just a demonstration of neural network using keras. But more importantly, contextualizing the input does help!

The code for Test3B can be found in the following.

[1]

import numpy as np from keras.models import Sequential from keras.layers import Dense

[2]

N = 100 D = 17 def simple_binarizer17(y, bin_factor=1, bin_shift=0): out = [0+bin_shift]*17 out[y] = 1*bin_factor return out def extract_digit(x): b = x%10 a = (x-b)/10 return [int(a),int(b)] X0_train = np.random.randint(N+1,size=(256000,1)) Y_train = np.array([simple_binarizer17(x%D) for x in np.transpose(X0_train).tolist()[0]]) X0_test = np.random.randint(N+1,size=(100,1)) Y_test = np.array([simple_binarizer17(x%D) for x in np.transpose(X0_test).tolist()[0]]) X_train = np.array([extract_digit(X[0]) for X in X0_train]) X_test = np.array([extract_digit(X[0]) for X in X0_test]) for X0,X in zip(X0_train[:10],X_train[:10]): print(X0,"->",X)

[3]

model = Sequential() model.add(Dense(units=64, activation='relu', input_dim=2)) model.add(Dense(units=17, activation='softmax')) model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy']) model.fit(X_train, Y_train, epochs=24, batch_size=32) loss_and_metrics = model.evaluate(X_test, Y_test, batch_size=10) print("--LOSS and METRIC--") print(loss_and_metrics) print("--PREDICT--") classes = model.predict(X_test, batch_size=16)

[4]

count = 0 correct_count = 0 for y0,y in zip(Y_test,classes): count = count+1 correct_pred = False if np.argmax(y0)==np.argmax(y): correct_pred = True correct_count = correct_count + 1 if count<20: print(np.argmax(y0),"->",np.argmax(y), "(",correct_pred,")") accuracy = correct_count/len(Y_test) print("accuracy = ", accuracy)

This is an extended glossary of some sorts. I intended it to be a quick reference for anyone starting a research into cognitive sciences. Find the full notes in the following link: