Parameter Estimation for Differential Equations using Gradient Descent

home > Research

The post is about parameter estimation based on gradient descent specifically for the differential equation \frac{dn}{dt}=G-k_1 n-k_2 n^2-k_3 n^3, i.e. we are looking for a good (G,k_1,k_2,k_3) that best fits the experimental data. We will go through the methods of choosing initial parameter guesses and running the optimization. This post is more fully explained in this note: Differential Equation Parameter Estimations. The data used here is a fraction of the sample data I was given. The codes and sample data can also be found here.

Consider this link instead Parameter Estimation for Differential Equations using Scipy Least Square since it utilizes more robust optimization component from scipy.

Place, and (the codes are at the end of the post) in the project folder. Some required python packages include matplotlib, scipy and kero; do pip install them.

  1. In the command line, cd into the directory where the project is.
  2. Preliminary. Prepare sample_data.csv containing data points (t,n). First column is for t (horizontal axis), second column is n (vertical axis). Set the following toggles in
    plot_data True
    do_initial_testing False
    start_op False

    Run python This plots the sample data, shown in figure 1. In this context, the relevant data points are the data after the peak. The orange curve shows the plot of only the extracted relevant data.

  3. Initial testing stage. Set the following toggles in
    plot_data False
    do_initial_testing True
    start_op False

    Set collection_of_p_init. You can place as many guesses as you want. In this example,

    collection_of_p_init =[
      [0,1e-3,0.5e-3,1.2e-7], # guess 1
      [0,1e-2,0,1e-7], # guess 2
      [0,0,1.5e-3,1e-7], # guess 3

    we try using 3 different initial parameters, where each of them is of the format [G,k1,k2,k3], corresponding to the 4 parameters that we want to fit. Then, run the command:  python The results are shown in figure 2. It can be seen that the first and third guesses are closer to the experimental data; we might want to use the parameters near these values for random initiation in the optimization stage (but for demonstration sake, we will not do it here).

  4. Optimization stage. Set the following toggles in
    plot_data False
    do_initial_testing False
    start_op True

    Set the preferred settings. p_init_max and p_init_min are the range of random initial parameters to try from. We can set the number of attempts via the variable no_of_tries, and of course each attempt starts with a random initial parameter based on the range we just specified. Run the command:  python You will see optimization in progress with decreasing mse values.

  5. Viewing the optimization result. We want to load all 3 fitting_data files we have created. Thus, in fitting, set
    filename = 'fitting_data'
    save_data_series = [1,2,3]

    Run the command:  python fitting Figure 3 shows how after every iteration, the resulting curve is approaching the experimental data.

Finally, we want to see what is the best parameter p. When viewing the optimization result, the parameters will be printed. We are more or less done!

What else to do?

In this post I have only run a few iterations over a few number of random parameter initializations. To obtain the best results, we should run them over larger iterations and try more random initial parameters.

That’s all, have fun~


Figure 1. A plot of n-t from the sample data is in blue. The orange data is in

Figure 2. The curves show 3 different plots from 3 different guesses of initial parameters. The red curve is the experimental data. The console shows recommended setting for learning_rate variable.


Figure 3. As the algorithm is iterated, the curve goes from red to blue. The figures generally show that the curve approaches the experimental data and that MSE trend decreases along the iteration of i.

import numpy as np
import kero.utils.utils as ut
import param_estimate as pe
import matplotlib.pyplot as plt
from scipy.integrate import RK45
import csv
import pickle

es = pe.DE_param_estimator()

print("----------------- Load Data -------------------------------")
Xr, Yr = pe.load_data(plot_data=0)

x0 = np.array([Yr[0]])
t_set_expt = [np.array([x]) for x in Xr]
n_set_expt = [np.array([x]) for x in Yr]
print("x0 = ",x0)
print("Xr[:5] = ",Xr[:5])
print("Yr[:5] = ",Yr[:5])
print("t_set_expt[:5] = ",t_set_expt[:5])
print("n_set_expt[:5] = ",n_set_expt[:5])


print("----------------- Prepare Model ---------------------------") 
# MODEL used to fit this data
# We are estimating the parameters of the following differential equations:
#   dn/dt = F(n, p) = G - k1 * n - k2 * n**2 - k3 * n**3
#   where p is the parameter (G,k1,k2,k3)
def F(y,p):
	return p[0] - p[1] * y -p[2] * y**2 - p[3] * y**3
diff_part_G = lambda x: 1
diff_part_k1 = lambda x: -x 
diff_part_k2 = lambda x: -x**2 
diff_part_k3 = lambda x: -x**3 
list_of_derivatives = [diff_part_G, diff_part_k1, diff_part_k2, diff_part_k3]
print("  Model prepared.")

do_initial_testing = 0
if do_initial_testing:
	print("---------------- Prepare Initial guess --------------------")
	# Before optimizing, try playing around the p_init values here
	# We will choose the range of uniform random values for guess p based on tests conducted here
	# Write down choices that seem good and any remarks here 
	#   1. p_init = [0,5e-2,2e-3,1.1e-7] # near the real value. all n less than expt data
	#   2. p_init = [0,5e-2,1e-3,1.1e-7] # seems to be better than 1
	#   3. p_init = [0,5e-2,0,1.1e-7] # all n above expt data 
	#   4. p_init = [0,1e-1,1e-4,1.1e-7] # small n above expt data, larger n below
	#   5. p_init = [0,1e-2,1e-3,1.1e-7] # exceedingly close! Let's vary around these values instead

	collection_of_p_init =[
		[0,1e-3,0.5e-3,1.2e-7], # guess 1
		[0,1e-2,0,1e-7], # guess 2
		[0,0,1.5e-3,1e-7], # guess 3

	fig = plt.figure()
	ax = fig.add_subplot(111)
	no_of_test = len(collection_of_p_init)
	for i in range(no_of_test):
		p_init = collection_of_p_init[i] # guess p
		time_checkpoints = t_set_expt
		def f(t,y):
			return np.array([F(y[0],p_init)])
		_,_, x_checkpoint, t_checkpoint_new = pe.RK45_wrapper(x0, time_checkpoints, f, stepsize=None)
		ax.plot(t_checkpoint_new,x_checkpoint, label="guess "+str(i+1), color=(0,1-1*(i+1)/no_of_test,1*(i+1)/no_of_test))

		# To find what is the best learning_rate
		print(" testing initial guess parameters: ",i+1)
		es.update_p(t_set_expt, n_set_expt, F, p_init, list_of_derivatives, verbose=11)
	ax.plot(Xr,Yr, color = "r", label="expt values")
	ax.legend( )

# ------------------------------------------------------------------------------- #
start_op = 1
if start_op:	
	print("----------------- Start Optimizing ---------------------------")
	# ********* Settings *************
	p_init_max = [1e-10,0 ,0,0.9e-7] # [Gmax,k1max,k2max,k3max]
	p_init_min = [1e-10, 2e-2,2e-3,1.2e-7] # [Gmin ,k1min,k2min,k3min]
	no_of_tries = 3
	save_name = 'fitting_data'
	es.learning_rate= 1e-16
	no_of_iterations = 10
	save_interval = 1
	# ********************************

	n_set_expt_MAT = pe.RK45_output_to_list(n_set_expt) # MATRIX list form
	for j in range(no_of_tries):
		p_init = []
		for a,b in zip(p_init_min, p_init_max):
		p_now= p_init
		save_count = 0
		SAVE_DATA = []
		print("TRY ", j+1," / ",no_of_tries)
		mse_list = []
		for i in range(no_of_iterations + 1):
			if i>0:
				# for i =0 , initial state, do not iterate yet
				p_now = es.update_p(t_set_expt, n_set_expt, F, p_now, list_of_derivatives, verbose=0)
			#------------------------ FOR REALTIME OBSERVATION -----------
			def f(t,y):
				return np.array([F(y[0],p_now)])
			time_checkpoints = t_set_expt
			_,_, n_set_next, t_set_next = pe.RK45_wrapper(x0, time_checkpoints, f, stepsize=None)
			n_set_next_MAT = pe.RK45_output_to_list(n_set_next)
			mse = pe.MSE(n_set_next_MAT,n_set_expt_MAT)
			print(" i = ", i , " -- > mse = ", mse)
			# print("   p_new = ", p_now)

			#------------------------ TO BE SAVED -------------------------
			save_count = save_count + 1
			if save_count == save_interval or i==0:
				save_count = 0
				data_store = {
				"mse_list": mse_list
		output = open(save_name+"_"+ str(j+1) +".par", 'wb')
		pickle.dump(SAVE_DATA, output)

print("\nClosing Program...")

import param_estimate as pe
import matplotlib.pyplot as plt
import numpy as np
import time
import pickle

filename = 'fitting_data'
save_data_series = [1,2,3]

for j in save_data_series:
	pkl_file = open(filename+'_'+str(j)+'.par', 'rb')
	SAVE_DATA = pickle.load(pkl_file)

	fig = plt.figure(j)
	ax = None
	ax2 = None
	ax = fig.add_subplot(211)

	# ------------------- EXPT DATA ---------------------------------=
	Xr, Yr = pe.load_data(plot_data=0)

	# ------------------- DATA FROM OPTIMIZATION ---------------------
	color_scheme = np.linspace(0,1,len(SAVE_DATA))
	for i in range(len(SAVE_DATA)):
		data_store = SAVE_DATA[i]
		n_set = data_store["n_set"]
		t_set = data_store["t_set"]
		p = data_store["p"]
		mse_list = data_store["mse_list"]
		print(" i = ", i)
		print("   p = ", p )
		if i==0:
			this_label = "init curve" 
			this_label = str(i)
		ax.plot(t_set,n_set, color=(1-color_scheme[i],0,color_scheme[i]), label=this_label)
		if i ==0:
			this_title = "initial G,k1,k2,k3 = " + str(p)
	ax2 = fig.add_subplot(212)

print("\nClosing Post-Processing Program...")

# We are estimating the parameters of the following differential equations:
#   dn/dt = F(n, p) = G - k1 * n - k2 * n**2 - k3 * n**3
#   where p is the parameter (G,k1,k2,k3)

import numpy as np
import kero.utils.utils as ut
from scipy.integrate import RK45
import matplotlib.pyplot as plt
import csv

class DE_param_estimator:
	def __init__(self):
		self.learning_rate = 1e-21
	# l2
	def update_p(self,t_set_expt, n_set_expt, F, p_now, list_of_derivatives,
		verbose = False):
		# t_set_expt: list of numpy array
		#   for example: [array([25.2047]), array([25.279]), array([25.3533]), array([25.4277]),...]
		# n_set_expt: list of numpy array
		#   for example: [array([789.696]), array([784.109]), array([763.528]), array([726.782]),...]
		# p_now: list of parameters
		# return
		#   p_next: list of parameters
 		nabla_mse = self.nabla_MSE(t_set_expt, n_set_expt, F, p_now, list_of_derivatives)
 		if verbose>10:
 			# temp = self.learning_rate*nabla_mse
 			temp2 = np.transpose(np.matrix(p_now))
 			for_recommend = []
 			if verbose>30:
	 			print("    nabla_mse = ")
	 			print("    p_now =")
	 		det_list = []
	 		with np.errstate(divide='ignore'):
	 			for k in range(len(p_now)):
	 				det = 1e-2*p_now[k]/nabla_mse.item(k)
	 				if verbose>20:
		 				print("    p_now[k]/(nabla_mse[k]*100) =",det)
		 			det2 = np.log10(np.abs(det))
		 			if (det2!=np.inf) and (det2!=-np.inf) and (not np.isnan(det2)):
		 	if verbose>20: print("      det_list = ",det_list)
		 	print("       recommended learning rate ~ ", 10**np.min(det_list))
 		p_next= np.transpose(np.matrix(p_now)) - self.learning_rate * nabla_mse
 		p_next = np.transpose(p_next).tolist()
 		p_next = p_next[0]
 		return p_next
	# L3	
	def dkdMSE(self, t_set_expt, n_set_expt, F, p_now, diff_part):
		# lambda function : d/dk d/dt of n
		# t_set_expt, n_set_expt : experimental data
		N = len(t_set_expt)
		dkmse = 0 

		def f(t,y):
			return np.array([F(y[0],p_now)])

		x0 = n_set_expt[0] # array
		time_checkpoints= t_set_expt
		x_set, t_set, x_checkpoint, t_checkpoint_new = RK45_wrapper(x0, time_checkpoints, f, stepsize=None)

		for i in range(1,len(n_set_expt)):			
			ni = x_checkpoint[i][0]
			# print(" ni =", ni)
			m = i+1
			diff_term = 0
			for j in range(m):
				diff_term = diff_term + diff_part(x_checkpoint[j][0])
			diff_term = diff_term / m 
			dkmse = dkmse + ( ni - n_set_expt[i][0])* diff_term
		dkmse = 1/N*dkmse 
		return dkmse

	def nabla_MSE(self, t_set_expt, n_set_expt, F, p_now, list_of_derivatives,
		verbose = False):
		# list_of_derivatives: list of lambda functions

		dGdMSE = self.dkdMSE(t_set_expt, n_set_expt, F, p_now, list_of_derivatives[0])
		dk1mse = self.dkdMSE(t_set_expt, n_set_expt, F, p_now, list_of_derivatives[1])
		dk2mse = self.dkdMSE(t_set_expt, n_set_expt, F, p_now, list_of_derivatives[2])
		dk3mse = self.dkdMSE(t_set_expt, n_set_expt, F, p_now, list_of_derivatives[3])
		# print("dGdMSE=", dGdMSE)
		# print("dk1mse=",dk1mse)
		# print("dk2mse=",dk2mse)
		# print("dk3mse=",dk3mse)
		nabla_mse = np.transpose(np.matrix([dGdMSE, dk1mse, dk2mse, dk3mse]))
		return nabla_mse

def load_data(plot_data=False):
	print(" --+ load_data().")

	# load data
	filename = 'sample_data'
	all_data = read_csv(filename, header_skip=0)
	all_data = [[np.float(x) for x in y] for y in all_data]

	X=[x[0] for x in all_data]
	Y=[x[1] for x in all_data]

	# customize your data loading here
	max_index = np.argmax(Y)
	relevant_data= all_data[max_index:]
	Xr = [x[0] for x in relevant_data]
	Yr = [x[1] for x in relevant_data]

	print("     data size = ", len(Xr))
	print("     peak = ", X[max_index],",", Y[max_index])
	if plot_data:
		fig = plt.figure()

	return Xr, Yr

def read_csv(filename, header_skip=1,get_the_first_N_rows = 0):
    # filename : string, name of csv file without extension
    # headerskip : nonnegative integer, skip the first few rows
    # get_the_first_N_rows : integer, the first few rows after header_skip to be read
    #   if all rows are to be read, set to zero

    # example
    # xx = read_csv('tryy', 3)
    # xx = [[np.float(x) for x in y] for y in xx] # nice here
    # print(xx)
    # print(np.sum(xx[0]))
    out = []
    with open(str(filename)+'.csv') as csv_file:
        data_reader = csv.reader(csv_file)
        count = 0
        i = 0
        for row in data_reader:
            if count < header_skip:
                count = count + 1
                if get_the_first_N_rows>0:
                	i = i + 1
                	if i == get_the_first_N_rows:
    return out

def MSE(Y_set,Y0_set):
	# Y_set is a list of Y
	# Y0_set is a list of Y0
	# Assume Y_set and Y0_set have the same size, otherwise error will be raised
	# Y (numpy matrix) is a vector: computed data from model
	# Y0 (numpy matrix) is a vector: observed/true data
	# Assume Y and Y0 have the same size, otherwise error will be raised
	# Assume all entries are real
	mse = 0
	n = len(Y_set)
	if not (n==len(Y0_set)):
		print("MSE(). Error. Inconsistent dimension between Y_set and Y_set0. Return nothing.")
	for i in range(n):
		Y = Y_set[i]
		Y0 = Y0_set[i]
		Nout1 = len(Y.tolist())
		Nout2 = len(Y0.tolist())
		if not (Nout1==Nout2):
			print("MSE(). Error. Inconsistent dimension between Y and Y0. Return nothing.")
		# now compute the norms between true and model output values
		for j in range(Nout1):
			mse = mse + (np.abs(Y0.item(j)-Y.item(j)))**2
	mse = mse / (2*n)

	return mse

def RK45_wrapper(x0, time_checkpoints, f, stepsize=None):
	# example
	# def f(t,y):
	# 	# dx/dt = f(x,t)
	# 	return np.array([-1*y[0]])
	# x0 = np.array([1.0]) # initial value
	# time_checkpoints = np.linspace(0,10,11)
	# x_set, t_set, x_checkpoint, t_checkpoint_new = RK45_wrapper(x0, time_checkpoints, f, stepsize=None)

	# t_checkpt_diff = []
	# for t,t_new in zip(time_checkpoints, t_checkpoint_new):
	# 	t_checkpt_diff.append(abs(t-t_new))
	# print("max checkpt diff = ", np.max(t_checkpt_diff))

	if stepsize is None:
		stepsize = 0.01*(time_checkpoints[-1]-time_checkpoints[0])
	t_set = [time_checkpoints[0]]
	x_set = [x0]
	t_checkpoint_new = [time_checkpoints[0]]
	x_checkpoint = [x0]

	for i in range(len(time_checkpoints)-1):	
		t1 = time_checkpoints[i+1]
		if i>0:
			x0 = ox.y
			t0 = ox.t
			t0 = time_checkpoints[i]
		ox = RK45(f,t0,x0,t1,max_step=stepsize)
		while ox.t < t1:
			msg = ox.step()
	return x_set, t_set, x_checkpoint, t_checkpoint_new

def RK45_output_to_list(xx):
	return [np.matrix((x.tolist())[0]) for x in xx]