Alibaba-NTU Talent Program

Hi! This is week 2 of my PhD at NTU as an employee of Alibaba. I will attempt to clear course requirements – and of course learn new stuffs! Here are the related topics that I will review:

  1. Numerical Algorithm
  2. Statistics
  3. Image Processing

It seems like I will proceed with image processing for my research; but anyway, there are still things pending approvals and what-not.

Update soon!

(20190123) Here is the link to extended summary to Andrew Ng’s coursera machine learning course.

Parameter Estimation for Differential Equations using Scipy Least Square

home > Research

The post is about parameter estimation based on gradient descent specifically for the differential equation \frac{dn}{dt}=G-k_1 n-k_2 n^2-k_3 n^3, i.e. we are looking for a good (G,k_1,k_2,k_3) that best fits the experimental data. We will go through the methods of choosing initial parameter guesses and running the optimization. This post is more fully explained in this note: Differential Equation Parameter Estimations. The data used here is a fraction of the sample data I was given. The codes and sample data can also be found here.

As usual, do use virtual environment for cleaner package management (you can see here). The packages needed can be installed via pip.

pip install numpy
pip install scipy
pip install matplotlib

Put the scripts and into the project folder – in the virtual environment if you are using one. The code can be found at the end of the post.

  1. PART 1. Loading data. In, customize load_data() as you see fit. As it is, it will load the data starting from a maximum point, and assume that the csv file contains 2 columns: column 1 is for variable t, column 2 is n. The file does not have any header.
    Go into, set the following toggles.

    plot_data True
    initial_guess_mode False
    optimize_mode False

    Run You will see figure like figure 1. In this example, the relevant data points (blue) are the points that overlap with the orange plot.

  2. PART 1. Define the ODE. Write the ODE in def f(t,n,f_args).
  3. PART 2. Choosing initial guesses. Set any number of initial guesses you want to try out with by setting collection_of_initial_guesses.  Since we optimize [G,k1,k2,k3], an example guess is [0,1e-3,0.5e-3,1.2e-7]. Set the following toggles and then run Plot like figure 1 will be generated.
    plot_data False
    initial_guess_mode True
    optimize_mode False
  4. PART 2. Optimization. Once the choices of initial guesses are decided upon, set the following toggles and then run The optimized parameters from each initial guess will be saved as a .par file, and the save name is set by the variable save_name. The code in our example used 3 initial guesses and save_name = “20190110_cubic”. Thus, 20190110_cubic_1.par, 20190110_cubic_2.par and 20190110_cubic_3.par will be created.
    plot_data False
    initial_guess_mode False
    optimize_mode True
  5. Loading optimized parameters. Go to If we want to load all the 3 saved .par files, then set load_name = “20190110_cubic” and serial_number = [1,2,3]. Then run The plot results are shown in figure 2(A, B, C), showing good fit. The optimized parameters for each initial guess can be read off from the console, as in figure 2(D). The smaller cost value is, the better the fit is generally.

The optimized parameters seem to produce good fitting. We are done!


Figure 1.Green plot shows the experimental data points while the rest are generated using RK45 using 3 different initial guesses, given by collection_of_initial_guesses = [  [0,1e-3,0.5e-3,1.2e-7], [0,1e-2,0,1e-7], [0,0,1.5e-3,1e-7] ].


Figure 2. (A, B, C) shows the results of parameter estimation from 3 different initial guesses. (D) shows the results in the console. For each serial number, the optimized parameters are printed as [G, k1, k2, k3].

import numpy as np
from scipy.integrate import RK45
import matplotlib.pyplot as plt
import csv

def RK45_wrapper(x0, time_checkpoints, f, f_args, stepsize=None):
	if stepsize is None:
		stepsize = 0.01*(time_checkpoints[-1]-time_checkpoints[0])
	t_set = [time_checkpoints[0]]
	x_set = [x0]
	t_checkpoint_new = [time_checkpoints[0]]
	x_checkpoint = [x0]

	f_temp = lambda t,y : f(t,y, f_args)

	for i in range(len(time_checkpoints)-1):	
		t1 = time_checkpoints[i+1]
		if i>0:
			x0 = ox.y
			t0 = ox.t
			t0 = time_checkpoints[i]
		ox = RK45(f_temp,t0,x0,t1,max_step=stepsize)
		while ox.t < t1:
			msg = ox.step()
	return x_set, t_set, x_checkpoint, t_checkpoint_new

def read_csv(filename, header_skip=1,get_the_first_N_rows = 0):
    # filename : string, name of csv file without extension
    # headerskip : nonnegative integer, skip the first few rows
    # get_the_first_N_rows : integer, the first few rows after header_skip to be read
    #   if all rows are to be read, set to zero

    # example
    # xx = read_csv('tryy', 3)
    # xx = [[np.float(x) for x in y] for y in xx] # nice here
    # print(xx)
    # print(np.sum(xx[0]))
    out = []
    with open(str(filename)+'.csv') as csv_file:
        data_reader = csv.reader(csv_file)
        count = 0
        i = 0
        for row in data_reader:
            if count < header_skip:
                count = count + 1
                if get_the_first_N_rows>0:
                	i = i + 1
                	if i == get_the_first_N_rows:
    return out

def load_data(plot_data=False):
	print(" --+ load_data().")

	# load data
	filename = 'sample_data'
	all_data = read_csv(filename, header_skip=0)
	all_data = [[np.float(x) for x in y] for y in all_data]

	X=[x[0] for x in all_data]
	Y=[x[1] for x in all_data]

	# customize your data loading here
	max_index = np.argmax(Y)
	relevant_data= all_data[max_index:]
	Xr = [x[0] for x in relevant_data]
	Yr = [x[1] for x in relevant_data]

	print("     data size = ", len(Xr))
	print("     peak = ", X[max_index],",", Y[max_index])
	if plot_data:
		fig = plt.figure()

	return Xr, Yr

import numpy as np
import matplotlib.pyplot as plt
import scipyode_header as sh
import pickle
from scipy.optimize import least_squares

initial_guess_mode = 0
optimize_mode = 1
save_name = "20190110_cubic"

# --------------- PART 1 ----------------
# Load data, prepare the differential equation

def f(t,n, f_args): # function for pendulum motion with friction 
	dndt = [f_args[0] - f_args[1] * n - f_args[2] * n**2 - f_args[3] * n**3]
	return dndt
print("\n------------- LOADING DATA ----------------\n")
Xr, Yr = sh.load_data(plot_data=0)

x0 = np.array([Yr[0]])
t_set_expt = [np.array([x]) for x in Xr]
n_set_expt = [np.array([x]) for x in Yr]
print("x0 = ",x0)
print("Xr[:5] = ",Xr[:5])
print("Yr[:5] = ",Yr[:5])
print("t_set_expt[:5] = ",t_set_expt[:5])
print("n_set_expt[:5] = ",n_set_expt[:5])

# --------------- PART 2 ----------------
# try out initial guesses

# initial guess [G, k1, k2, k3]
collection_of_initial_guesses = [ 

if initial_guess_mode:
	print("\n------------- INITIAL TESTING ----------------\n")
	color_scheme = np.linspace(0,1,len(collection_of_initial_guesses))
	fig = plt.figure()
	ax = fig.add_subplot(111)
	ax.scatter(Xr,Yr,3,color='g',label="expt data")
	for i in range(len(collection_of_initial_guesses)):
		f_args0 = collection_of_initial_guesses[i]
		# print(f_args0)
		this_label = "initial guess" + str(i+1)
		_, _, x_test, t_test = sh.RK45_wrapper(x0, t_set_expt, f, f_args0, stepsize=None)		

if optimize_mode:
	def target_function(f_args):
		# "checkpoint" here refers to the intended data points
		#   since RK45 will generate more points in between the checkpoints.
		#   What happens is just that with smaller time steps in between the checkpoints,
		#   we can compute the checkpoints more precisely.
		_, _, x_checkpoint, t_checkpoint_new = sh.RK45_wrapper(x0, t_set_expt, f, f_args, stepsize=None)
		collection_of_f_i = []
		# each i is a data point in this particular example
		for x1, x2 in zip(x_checkpoint,n_set_expt):
			this_norm = np.linalg.norm(np.array(x1)-np.array(x2))
		return np.array(collection_of_f_i)
	print("\n------------- OPTIMIZATION STAGE ----------------\n")

	for i in range(len(collection_of_initial_guesses)):
		f_args0 = collection_of_initial_guesses[i]
		print(" initial guess = ", f_args0)
		optimized_res = least_squares(target_function,f_args0)
		f_args_optimized = optimized_res.x

			"initial_guess": f_args0,
			"optimized": f_args_optimized,

		print(" + optimized result = ",f_args_optimized)
		print(" + cost = ",optimized_res.cost)
		output = open(save_name+"_"+ str(i+1) +".par", 'wb')
		pickle.dump(SAVE_DATA, output)

import numpy as np
import matplotlib.pyplot as plt
import pickle
import scipyode_header as sh

load_name = "20190110_cubic"
serial_number = [1,2,3] 

print("\n------------- LOADING DATA ----------------\n")
def f(t,n, f_args): # function for pendulum motion with friction 
	dndt = [f_args[0] - f_args[1] * n - f_args[2] * n**2 - f_args[3] * n**3]
	return dndt
Xr, Yr = sh.load_data(plot_data=0)
x0 = np.array([Yr[0]])
t_set_expt = [np.array([x]) for x in Xr]
n_set_expt = [np.array([x]) for x in Yr]
print("x0 = ",x0)
print("Xr[:5] = ",Xr[:5])
print("Yr[:5] = ",Yr[:5])
print("t_set_expt[:5] = ",t_set_expt[:5])
print("n_set_expt[:5] = ",n_set_expt[:5])

print("\n------------- PLOTTING ----------------\n")
color_scheme = np.linspace(0,1,len(serial_number))

for j in range(len(serial_number)):
	fig = plt.figure()
	ax = fig.add_subplot(111)
	ax.scatter(Xr,Yr,3,color='g',label="expt data")
	i = serial_number[j]
	this_label = "initial guess" + str(i+1)
	pkl_file = open(load_name+'_'+str(i)+'.par', 'rb')
	SAVE_DATA = pickle.load(pkl_file)
	# print(SAVE_DATA)
	f_args_initial = SAVE_DATA["initial_guess"]
	f_args_optimized = SAVE_DATA["optimized"]
	cost  = SAVE_DATA["cost"]
	_, _, x_op, t_op = sh.RK45_wrapper(x0, t_set_expt, f, f_args_optimized, stepsize=None)	
	print("Serial number:",j+1, " - Optimized parameters = " )
	print(" ",f_args_optimized)
	print(" cost = ", cost)
	title_msg = load_name +":"+ str(j+1)

Convolutional Neural Network with keras: MNIST

home > Machine Learning

In this post we use Convolutional Neural Network, with VGG-like convnet structure for MNIST problem: i.e. we train the model to recognize hand-written digits. We mainly follow the official keras guide, in this link.

Download MNIST file that has been converted into CSV form; I got it from this link.

The jupyter notebook detailing the attempt in this post is found here by the name keras_test2.ipynb.

Starting with a conclusion: it works pretty well, for a very quick training, the model can recognize hand-written digit with 98% accuracy.

As shown below, our input is 28 by 28 with 1 channel (1 color), since the hand-written digit is stored in a 28 by 28-pixel greyscale image. The layers used are

  1. 2x 2D convolutional layers with 32x 3 by 3 filters  followed by max pooling for each 2 by 2 block of pixels. Then dropout layer is used; this is to prevent over-fitting.
  2. 2x 2D convolutional layers with 64x 3 by 3 filters followed by max pooling for each 2 by 2 block of pixels. Then dropout layer is used.
  3. Flatten layer just reshapes 2D image-like output from the previous layer to a 1D list of values. The first denses layer has 256 neurons, followed by dropout layer and finally a dense layer of 10 neurons corresponding to 10 classes or 10 different digits in MNIST. All activation functions are ReLu except the last one, softmax, as usual.

See the link here on how the data is prepared for training (i.e. the missing code shown as … partial code… below).

# ... partial code ...

model = Sequential()
# input: 28x28 images with 1 channels -> (28 ,28, 1) tensors.
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28,28,1)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Dense(256, activation='relu'))
model.add(Dense(10, activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy']), y_train, batch_size=16, epochs=10)
model.evaluate(x_test, y_test, batch_size=32)

For a quick training, this model obtains a very high accuracy of 0.98, as shown below.

Epoch 1/10
6400/6400 [==============================] - 6s 904us/step - loss: 0.8208 - acc: 0.7206
Epoch 2/10
6400/6400 [==============================] - 2s 379us/step - loss: 0.2427 - acc: 0.9266
Epoch 3/10
6400/6400 [==============================] - 2s 379us/step - loss: 0.1702 - acc: 0.9483
Epoch 4/10
6400/6400 [==============================] - 2s 380us/step - loss: 0.1353 - acc: 0.9589
Epoch 5/10
6400/6400 [==============================] - 2s 373us/step - loss: 0.1117 - acc: 0.9650
Epoch 6/10
6400/6400 [==============================] - 2s 379us/step - loss: 0.1080 - acc: 0.9697
Epoch 7/10
6400/6400 [==============================] - 2s 374us/step - loss: 0.0881 - acc: 0.9734
Epoch 8/10
6400/6400 [==============================] - 2s 375us/step - loss: 0.0880 - acc: 0.9736 1s - los
Epoch 9/10
6400/6400 [==============================] - 2s 377us/step - loss: 0.0690 - acc: 0.9766
Epoch 10/10
6400/6400 [==============================] - 2s 373us/step - loss: 0.0686 - acc: 0.9800
100/100 [==============================] - 0s 940us/step


Neural Network Tutorials with keras: a Detour


I had planned to develop kero further. But at the end of the day, that should be just for practice; look at the existing libraries and APIs such as keras and tensorflow: they are already very extensive and also well maintained. On that note, I should remind myself that one does not try to beat the giants in their game, unless I have something really innovative to offer.

That said, while implementing kero, I had had fun reading up the theories down to the fundamental of basic implementation of a neural network. For now, I think exploring tutorials with keras is a good idea as well!

To start this off, here is the first tutorial. Enjoy!

Update :
(20190104) second tutorial on MNIST.


Neural Network with keras: Remainder Problem

home > Machine Learning

The problem we try to solve here is the remainder problem. We train our neural network to find the remainder of a number randomly drawn from 0 to 99 inclusive when it is divided by 17. For example, given 20, the remainder is 3.

The code (in Jupyter notebook) detailing the results of this post can be found here by the name keras_test1.ipynb. In all the tests, we use only 1 hidden layers made of 64 neurons and different input and output layers to take into account the context of the problem. With the context taken into account, we show that we can help the neural network model train better!

Test 1A and Test 1B

Note: See the corresponding sections in the Jupyter notebook.

We start with a much simpler problem. Draw a random number from 0 to 10 inclusive. We find their remainders when divided by 10, which is quite trivial. From test 1A, with 4 epochs, we see a steady improvement in prediction accuracy up to 82%. With 12 epochs in test 1B, our accuracy is approximately 100%. Good!

Test 2A and Test 2B

Now, we raise the hurdle. We draw wider range of random numbers, from 0 to 99 inclusive. To be fair we give the neural network more data points for training. We get pretty bad outcome; the trained model in test 2A suffers the problem of predicting only 1 outcome (it always predicts the remainder is 0). In test 2B, we perform the same training, but for longer epochs. The problem still occurs.

Test 3A

Now we solve the problem in test 2A and 2B by contextualizing the problem. Notice that in test 1A, 1B, 2A and 2B, there is only 1 input (i.e. 1 neuron in the input layer) which exactly corresponds to the random number whose remainder is to be computed.

Now, in this test, we convert it into 2 inputs, splitting the unit and tenth digits. For example, if the number is 64, the input to our neural network is now (6,4). If the number is 5, then it becomes (0,5). This is done using extract_digit() function. The possible “concept” that the neural network can learn is the fact that for division by 10, only the last digit matters. That is to say, if our input is (a,b) after the conversion, then only b matters.

What do we get? 100% accuracy! All is good.

Test 3B

Finally, we raise the complexity and solve our original problem. We draw from 0 to 99 inclusive, and find the remainder from division with 17. We use extract_digit() function here as well. Running it over 24 epochs, we get an accuracy of 96% (and it does look like it can be improved)!

Conclusion? First thing first, this is just a demonstration of neural network using keras. But more importantly, contextualizing the input does help!

The code for Test3B can be found in the following.


import numpy as np
from keras.models import Sequential
from keras.layers import Dense


N = 100
D = 17

def simple_binarizer17(y, bin_factor=1, bin_shift=0):
    out = [0+bin_shift]*17
    out[y] = 1*bin_factor
    return out

def extract_digit(x):
    b = x%10
    a = (x-b)/10
    return [int(a),int(b)]

X0_train = np.random.randint(N+1,size=(256000,1))
Y_train = np.array([simple_binarizer17(x%D) for x in np.transpose(X0_train).tolist()[0]])
X0_test = np.random.randint(N+1,size=(100,1))
Y_test = np.array([simple_binarizer17(x%D) for x in np.transpose(X0_test).tolist()[0]])

X_train = np.array([extract_digit(X[0]) for X in X0_train])
X_test = np.array([extract_digit(X[0]) for X in X0_test])
for X0,X in zip(X0_train[:10],X_train[:10]):


model = Sequential()

model.add(Dense(units=64, activation='relu', input_dim=2))
model.add(Dense(units=17, activation='softmax'))
              metrics=['accuracy']), Y_train, epochs=24, batch_size=32)

loss_and_metrics = model.evaluate(X_test, Y_test, batch_size=10)
print("--LOSS and METRIC--")
classes = model.predict(X_test, batch_size=16)


count = 0
correct_count = 0
for y0,y in zip(Y_test,classes):    
    count = count+1    
    correct_pred = False
    if np.argmax(y0)==np.argmax(y):
        correct_pred = True
        correct_count = correct_count + 1
    if count<20:
        print(np.argmax(y0),"->",np.argmax(y), "(",correct_pred,")")
accuracy = correct_count/len(Y_test)
print("accuracy = ", accuracy)