## Cubic B-Splines Interpolation


The assignment for CE7453 numerical algorithm course was: Given a set of points in 2D, create a program that outputs the control points of a cubic B-spline curve that interpolates these 2D points. Here are some examples for viewing purposes! (I will only release the source code once the course is completed).


My notes for this numerical algorithm course can be found here.
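For readers who want a feel for the problem, here is a minimal sketch of one textbook variant. It is not the assignment solution: it assumes a uniform knot vector and natural end conditions (zero second derivative at both ends), and the function name `interpolate_cubic_bspline` is made up for illustration. With these assumptions the control points satisfy P[i-1] + 4 P[i] + P[i+1] = 6 D[i] at every data point D[i], which is a tridiagonal linear system.

```python
def interpolate_cubic_bspline(points):
    """Control points P_{-1}, ..., P_{n+1} of a uniform cubic B-spline
    through `points`, assuming natural end conditions (hypothetical helper)."""
    n = len(points) - 1
    if n < 1:
        raise ValueError("need at least two points")
    dim = len(points[0])
    # Natural end conditions force P_0 = D_0 and P_n = D_n.
    P = [list(p) for p in points]
    m = n - 1  # number of interior unknowns P_1 .. P_{n-1}
    if m > 0:
        # Right-hand side of P_{i-1} + 4 P_i + P_{i+1} = 6 D_i, i = 1..n-1,
        # with the known end control points moved to the right.
        rhs = [[6.0 * c for c in points[i]] for i in range(1, n)]
        for d in range(dim):
            rhs[0][d] -= points[0][d]
            rhs[-1][d] -= points[n][d]
        # Thomas algorithm for the tridiagonal matrix (1, 4, 1).
        diag = [4.0] * m
        for i in range(1, m):
            w = 1.0 / diag[i - 1]
            diag[i] -= w
            for d in range(dim):
                rhs[i][d] -= w * rhs[i - 1][d]
        sol = [None] * m
        sol[-1] = [rhs[-1][d] / diag[-1] for d in range(dim)]
        for i in range(m - 2, -1, -1):
            sol[i] = [(rhs[i][d] - sol[i + 1][d]) / diag[i] for d in range(dim)]
        P[1:n] = sol
    # Phantom end control points from P_{-1} - 2 P_0 + P_1 = 0 (and its mirror).
    first = [2 * P[0][d] - P[1][d] for d in range(dim)]
    last = [2 * P[n][d] - P[n - 1][d] for d in range(dim)]
    return [first] + P + [last]


control_points = interpolate_cubic_bspline([(0, 0), (1, 3), (4, 1), (5, 5), (7, 2)])
for cp in control_points:
    print(cp)
```

A quick sanity check is that (P[i-1] + 4 P[i] + P[i+1]) / 6 reproduces each input point. A chord-length parametrization with a non-uniform knot vector would need a different system.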

## First-half of the Semester

It has been half a semester! As a PhD student under the Alibaba-NTU Talent Program, my research direction is currently gravitating towards medical imaging. These 8 weeks have been hectic, with coursework and administrative matters on both the NTU and Alibaba sides. I shall update with content very soon.

The courses I am taking now:

CE7453. Numerical Algorithm. Cubic B-Splines Interpolation.
EE7403. Image Processing.
HP7001. Advanced Research Design and Data Analysis.
K6312. Information Mining and Analysis.

See the notes for the courses above here (except K6312).

[No longer Updated]

## update_wb()

Perform neural network training using the specified methods in the specified settings.

kero.multib.NeuralNetwork.py

class NetworkUpdater:
    def update_wb(self, input_set, Y0_set, weights, biases, AF,
                  mse_mode="compute_only",
                  verbose=False):
        return weights_next, biases_next, mse_list

Arguments/Return

| Argument/Return | Description |
|---|---|
| input_set | List of numpy matrix [x]. Each x is a column vector of size m x 1, where m is the number of neurons in the input layer. |
| Y0_set | List of numpy matrix [Y0]. Each Y0 is n x 1, where n is the number of neurons in layer l=L. These are the true/observed values in the output layer corresponding to the input set. In other words, for each k=1,…,N, Y0_set[k] = f(x[k]), where f is the true function that our neural network is modelling and N is the number of data points. |
| weights | The collection of weights in the neural network. weights is a list [w_l], where w_l is the collection of weights between the (l-1)-th and l-th layer for l=2,3,…,L, where l=1 is the input layer, l=2 the first hidden layer and l=L the output layer. w_l is a matrix (list of list) so that w_l[i][j] is the weight between neuron j at layer l-1 and neuron i at layer l. |
| biases | The collection of biases in the neural network. biases is a list [b_l], where b_l is the collection of biases in the l-th layer for l=2,3,…,L. |
| AF | activationFunction object. Assumed to be already initiated. |
| mse_mode | String. If mse_mode="compute_only", then mse_list will be returned, containing the cost function MSE (mean squared error) at each epoch of training. If mse_mode="compute_and_print", the MSE value at each epoch will also be printed. If mse_mode=None, mse_list is None, i.e. the MSE value is not computed. Default="compute_only". |
| verbose | Bool False or integer. The larger the integer, the more information is printed. Set it to a suitable integer for debugging. Default=False. |
| return weights_next | Same as weights, but has undergone 1 gradient descent iteration. |
| return biases_next | Same as biases, but has undergone 1 gradient descent iteration. |
| return mse_list | List of float [mse]. See mse_mode. |

Example Usage 1

kero version: 0.6.2

## set_neural_network()

Pass all the neural network parameters into the updater.

kero.multib.NeuralNetwork.py

class NetworkUpdater:
    def set_neural_network(self, NeuralNetwork):
        return

Arguments/Return

| Argument | Description |
|---|---|
| NeuralNetwork | NeuralNetwork object. |

Example Usage 1

kero version: 0.6.2

## set_training_data()

Feed the training data into the neural network training function.

kero.multib.NeuralNetwork.py

class NetworkUpdater:
    def set_training_data(self, input_set, Y0_set):
        return

Arguments/Return

| Argument | Description |
|---|---|
| input_set | List of numpy matrix [x]. Each x is a column vector of size m x 1, where m is the number of neurons in the input layer. |
| Y0_set | List of numpy matrix [Y0]. Each Y0 is n x 1, where n is the number of neurons in layer l=L. These are the true/observed values in the output layer corresponding to the input set. In other words, for each k=1,…,N, Y0_set[k] = f(x[k]), where f is the true function that our neural network is modelling and N is the number of data points. |

Example Usage 1

kero version: 0.6.2

## Deep Learning and Neural Network with kero PART 2

Deep Learning and Neural Network with kero
1. PART 1: preparing data
2. PART 2: train model and plot its progress

kero version 0.6.2

Note: this is still at an experimental phase. The user interface will not be very friendly yet 😦

Figure 1. (Top) The plot of MSE value over 1000 epochs of training. (Bottom) The horizontal axis labels each data point (we have 24 data points in this post). The vertical axis labels the norm value. The red dot is the norm value after the training, and the blue circle is the norm value before the training. Ideally, good training will cause the red dot to move to 0. At a data point with 0 norm, the neural network predicts the value exactly.

We continue from PART 1. Note that at the end of this post, the implementation is still not geared towards practical usage or ease of application. Indeed, this is still at the research stage: we will do, for example, speed testing.

Update: development is discontinued. It has been good practice, but it will certainly be more productive to use available frameworks such as TensorFlow and PyTorch. Do check them out!

Consider SECTION 1 of testNNupdater2C.py. Using prep() defined in testNNupdater2aux.py from PART 1, we load all the necessary items, including training data (input_set) and its corresponding true output (Y0_set) and the neural network object. Note that for every run, Y0_set is randomized.

### Training Neural Network Model

Now consider SECTION 2 of testNNupdater2C.py. We initiate the object that trains the model, called NetworkUpdater(). Three functions are called:

1. set_settings(). This sets the kind of gradient descent scheme we want. For now, we follow regular stochastic gradient descent.
2. set_training_data(). We will feed our training data into this function.
3. set_neural_network(). We put in the neural network that we want to train into our updater object via this function.

Some concepts here, including number of epochs, batch size, regular stochastic gradient descent can be found in the note I wrote, here. Cheers!
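To make the roles of batch size, epochs and shuffling concrete, here is a minimal sketch in plain Python. The helper `make_batches` is hypothetical and is not part of kero; it only illustrates how 24 data points with batch_size=4 give 6 batches (hence 6 parameter updates) per epoch.

```python
import random


def make_batches(items, batch_size, shuffle=True, rng=None):
    """Split `items` into consecutive batches of at most `batch_size` items.
    Hypothetical helper for illustration only."""
    items = list(items)
    if shuffle:
        (rng or random).shuffle(items)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


no_of_epoch = 3
data = list(range(24))  # stand-in for 24 (input, true output) pairs
for epoch in range(no_of_epoch):
    # Reshuffle at every epoch so that batches differ between epochs.
    batches = make_batches(data, batch_size=4, shuffle=True, rng=random.Random(epoch))
    # In stochastic gradient descent, weights and biases are updated once per batch.
    print("epoch", epoch, "->", len(batches), "batches of size", len(batches[0]))
```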

We are in SECTION 3 now. Basically, start training! This is done using update_wb() function, which is the heart of our neural network training.
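As a rough picture of what a training loop like this does, here is a toy sketch for a single sigmoid neuron. It is not kero's implementation: it uses full-batch gradient descent rather than mini-batches, and the names `train_single_neuron`, `learning_rate` and the toy data are assumptions made for illustration. It records the MSE at each epoch, in the spirit of mse_list.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_single_neuron(xs, ys, w, b, learning_rate=0.5, no_of_epoch=200):
    """Gradient descent on the MSE of y = sigmoid(w*x + b).
    Returns (w, b, mse_list), with one MSE value per epoch."""
    n = len(xs)
    mse_list = []
    for _ in range(no_of_epoch):
        grad_w = grad_b = 0.0
        mse = 0.0
        for x, y0 in zip(xs, ys):
            y = sigmoid(w * x + b)
            err = y - y0
            mse += err * err / n
            # Chain rule: d(err^2)/dz = 2 * err * sigmoid'(z), sigmoid'(z) = y * (1 - y)
            delta = 2.0 * err * y * (1.0 - y) / n
            grad_w += delta * x
            grad_b += delta
        mse_list.append(mse)  # MSE measured before this epoch's update
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, mse_list


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0.1, 0.2, 0.5, 0.8, 0.9]
w_next, b_next, mse_list = train_single_neuron(xs, ys, w=0.0, b=0.0)
print("first mse:", mse_list[0], "last mse:", mse_list[-1])
```

A full network applies the same idea layer by layer through back propagation.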

### Evaluating Performance

SECTION 4 prints the mean squared error (MSE) at roughly every 10% of the epochs. Of course, we would like the MSE to decrease with every epoch. SECTION 5 prints the time taken, in seconds, to perform update_wb(), i.e. the training. It then prints the time in seconds, minutes, hours and days if this process were done 10,000 times. This factor of 10,000 is just a rough estimate of the time taken if the whole process were 10,000 times larger, perhaps by having 10,000 times as many data points, or perhaps 10,000 times as many neurons.
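The extrapolation in SECTION 5 is plain unit conversion. A small sketch, where the helper name `extrapolate_time` is made up for illustration and the elapsed time is the value from the run reported in this post:

```python
def extrapolate_time(elapsed_seconds, factor=1e4):
    """Scale a measured time by `factor` and express it in several units."""
    total = elapsed_seconds * factor
    return {
        "s": total,
        "min": total / 60,
        "hr": total / 3600,
        "day": total / (3600 * 24),
    }


estimate = extrapolate_time(10.765251398086548)
for unit, value in estimate.items():
    print("time taken at 10k x [%s] = %.4f" % (unit, value))
```

This assumes linear scaling, which is only a crude guide: adding neurons, for instance, grows the weight matrices faster than linearly.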

Let us see the output. The time taken for 10k is about 1.25 days. I will keep this in mind and see if anything can be done faster. Note that the MSE values show a decrease, which is desirable. In case anything does not look right, I will spend extra time checking the details of implementation.

---aux---
Initializing a Neural Network object.
--- test2C ---
Initializing a Neural Network object.
+ update_wb().
epoch | mse value
+ epoch { 100 }  0.46130921580291995
+ epoch { 201 }  0.4355913582596121
+ epoch { 302 }  0.4130500519336926
+ epoch { 403 }  0.3934130717616613
+ epoch { 504 }  0.37638130123898367
+ epoch { 605 }  0.36161320734305336
+ epoch { 706 }  0.3488187540253349
+ epoch { 807 }  0.3377184625080933
+ epoch { 908 }  0.3280573038928975

n (no of data points) =  24

time taken [s] =  10.765251398086548
time taken at 10k x [s] =  107652.51398086548
time taken at 10k x [min] =  1794.208566347758
time taken at 10k x [hr] =  29.903476105795967
time taken at 10k x [day] =  1.2459781710748319
--------------TEST PREDICTION----------------------
completed...

Finally, the section TEST PREDICTION plots Figure 1. The MSE indeed decreases over 1000 epochs of training, which is good. The norm value for each data point is the Euclidean distance between the true output and the output predicted by the neural network, divided by the output size. The nearer to 0 the norm value is, the more accurate the prediction. The figure also shows that training generally moves the norm values towards zero. Do play around with different parameters to see the different outcomes.
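The norm value can be computed without numpy as well. A minimal sketch, with the helper name `normalized_error` and the sample numbers chosen purely for illustration:

```python
import math


def normalized_error(y_pred, y_true):
    """Euclidean distance between prediction and true output,
    divided by the number of output neurons."""
    distance = math.sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_true)))
    return distance / len(y_true)


y_true = [0.3, 0.5]
norm_before = normalized_error([0.9, 0.1], y_true)    # untrained guess
norm_after = normalized_error([0.35, 0.45], y_true)   # closer prediction
print(norm_before, norm_after)
```

A smaller value after training means the predicted point has moved closer to the true one.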

That is all for now!

testNNupdater2C.py

import testNNupdater2aux as taux
import kero.multib.NeuralNetwork as nn
import kero.utils.utils as ut
import numpy as np
import time
import matplotlib.pyplot as plt

print("--- test2C ---")
# ---------------- SECTION 1 --------------------
# input_set : list of numpy matrix.
# Y_set : list of numpy matrix. Output computed by NN
# Y0_set : list of numpy matrix. True/observed output
#  the grand objective is to train NN so that Y_set is equal to Y0_set
# -------------------------------------------
# this is a collection of a_l_set and z_l_set over all data points
#   z_l_set is the collection of z values over all layers, l=2,3,...L
#   and a_l_set is the corresponding activated values
#   Recall: a_l_set and z_l_set each is a list of numpy matrices

out = taux.prep(print_data=False)
input_set=out["input_set"]
Y_set=out["Y_set"]
Y0_set=out["Y0_set"]
collection_of_fed_forward_a_l=out["collection_of_fed_forward_a_l"]
collection_of_fed_forward_z_l=out["collection_of_fed_forward_z_l"]
weights=out["weights"]
biases=out["biases"]
NeuralNetwork=out["NeuralNetwork"]

a_L_set = Y_set

# ---------------- SECTION 2 --------------------
nu = nn.NetworkUpdater()
nu.set_settings(method="RegularStochastic",
    method_specific_settings={
        "batch_size":4,
        "no_of_epoch":1000,
        "shuffle_batch":True,
    })
nu.set_training_data(input_set,Y0_set)
nu.set_neural_network(NeuralNetwork)

# ---------------- SECTION 3 --------------------
L = len(weights) + 1
n = len(input_set)
AF = nn.activationFunction(func = "Sigmoid")
start = time.time()

# print("input_set:",input_set)
# collection_of_batches = ut.partition_list(input_set, 6, do_shuffle=False )
# for batch in collection_of_batches:
# 	print(" > batch item: ")
# 	for x in batch:
# 		ut.print_numpy_matrix(x,formatting="%6.2f",no_of_space=5)
# 		print("    ----------")
weights_next, biases_next, mse_list = nu.update_wb(input_set, Y0_set, weights, biases, AF,
    mse_mode="compute_only", verbose=11)
end = time.time()
elapsed = end - start

# ---------------- SECTION 4 --------------------
print("epoch | mse value ")
mark = 1
for i in range(len(mse_list)):
    if mark >= 0.1*len(mse_list) or i==0:
        print(" + epoch {",i ,"} ", mse_list[i])
        mark = 1
    else:
        mark = mark + 1

fig = plt.figure()
# ax1/ax2 were used but never defined; two stacked panels assumed, matching Figure 1
ax1 = fig.add_subplot(2, 1, 1)  # top panel: MSE per epoch
ax1.plot(range(len(mse_list)), mse_list)

# ---------------- SECTION 5 --------------------
print("")
print("n (no of data points) = ",n)
print("")
print("time taken [s] = ", elapsed)
print("time taken at 10k x [s] = ", elapsed*1e4)
print("time taken at 10k x [min] = ", elapsed*1e4/(60))
print("time taken at 10k x [hr] = ", elapsed*1e4/(3600))
print("time taken at 10k x [day] = ", elapsed*1e4/(3600*24))

print("--------------TEST PREDICTION----------------------")
# this is another rough measure of accuracy
#
#   1. norm_before : normalized euclidean distance between the true point and the initial guess point (value predicted by untrained model)
#   2. norm_after : normalized euclidean distance between the true point and the value predicted by trained model
# A trained model is supposed to give a smaller norm_after (nearer to zero).
# In other words, the predicted points should be closer to the true value after training

count = 1
norm_before_collection = []
norm_after_collection = []

for one_input, one_Y0 in zip(input_set, Y0_set):
    # a_l_set : list of numpy matrix
    output_size = len(one_Y0)
    a_1 = one_input
    # Before training
    test0_a_l_set, _ = nu.feed_forward(weights, biases, a_1, AF,
        verbose=False,
        matrix_formatting="%6.2f")
    Y_before = test0_a_l_set[-1]
    # After training
    test_a_l_set, _ = nu.feed_forward(weights_next, biases_next, a_1, AF,
        verbose=False,
        matrix_formatting="%6.2f")
    Y_after = test_a_l_set[-1]
    norm_before = np.linalg.norm(Y_before-one_Y0,ord="fro")/output_size
    norm_after = np.linalg.norm(Y_after-one_Y0,ord="fro")/output_size
    norm_before_collection.append(norm_before)
    norm_after_collection.append(norm_after)
    # print(" one_input: ")
    # ut.print_numpy_matrix(one_input,formatting="%6.2f",no_of_space=10)
    # print(" one_Y0: ")
    # ut.print_numpy_matrix(one_Y0,formatting="%6.5f",no_of_space=10)
    # print(" Y_before: ")
    # ut.print_numpy_matrix(Y_before,formatting="%6.5f",no_of_space=10)
    # print(" Y_after: ")
    # ut.print_numpy_matrix(Y_after,formatting="%6.5f",no_of_space=10)
    count = count + 1

ax2 = fig.add_subplot(2, 1, 2)  # bottom panel: norms before and after training
ax2.scatter(range(1,len(norm_before_collection)+1),norm_before_collection,facecolors='none',edgecolor="b",label="before")
ax2.scatter(range(1,len(norm_after_collection)+1),norm_after_collection,label="after",facecolors='r',edgecolor="r")
# red reference line at zero norm
ax2.plot(np.linspace(1,len(norm_after_collection)+1,len(norm_after_collection)+1),[0]*(len(norm_after_collection)+1),"r")

ax1.set_xlabel("epoch")
ax1.set_ylabel("MSE")
ax2.set_xlabel("data points")
ax2.set_ylabel("norms")
ax2.legend()

print("completed...")
plt.show()

## initiate_neural_network()

Initiate a neural network with the desired number of layers and neurons.

kero.multib.NeuralNetwork.py

class NeuralNetwork:
    def initiate_neural_network(self, bulk, mode=None,
                                verbose=False,
                                verbose_init_mode=False,
                                verbose_consistency=False):
        return

Arguments/Return

| Argument | Description |
|---|---|
| bulk, with mode=None (default) | bulk is a dictionary. bulk["weights"]: the collection of weights in the neural network. weights is a list [w_l], where w_l is the collection of weights between the (l-1)-th and l-th layer for l=2,3,…,L, where l=1 is the input layer, l=2 the first hidden layer and l=L the output layer. w_l is a matrix (list of list) so that w_l[i][j] is the weight between neuron j at layer l-1 and neuron i at layer l. bulk["biases"]: the collection of biases in the neural network. biases is a list [b_l], where b_l is the collection of biases in the l-th layer for l=2,3,…,L. |
| bulk, with mode="UniformRandom" | bulk is a dictionary. bulk["number_of_neurons"]: list of integers [n1, n2, …, nL] specifying the number of neurons in each layer. bulk["bounds"]: list of integers [lower_bound, upper_bound]. Weights will be initiated uniformly at random between these bounds. bulk["layerwise_normalization"]: Bool. If set to True, the initiated weights at a layer will be divided by the number of neurons in the previous layer. More modes are to be added. |
| verbose (default=False), verbose_init_mode (default=False), verbose_consistency (default=False) | Bool False or integer. The larger the integer, the more information is printed. Set them to suitable integers for debugging. |

Example Usage 1

In this example, we initiate a neural network with 4 layers of 3, 3, 2 and 2 neurons respectively.

testNN1.py

import kero.multib.NeuralNetwork as nn

#----------------------------------------
# weights : the collection of weights in the neural network
#   weights is a list [w_l], where w_l is the collection of weights between
#     the (l-1)-th and l-th layer
#     for l=2,3,...,L where l=1 is the input layer, l=2 the first hidden layer
#     and l=L is the output layer
#   w_l is a matrix (list of list)
#     so that w_l[i][j] is the weight between neuron j at layer l-1 and
#     neuron i at layer l
# biases : the collection of biases in the neural network
#   biases is a list [b_l], where b_l is the collection of biases in the l-th layer
#     for l=2,3,...,L

# arbitrary choices
arb0 = [0.1,0.1,0.1]
arb1 = [0.1,0.2,0.3]
arb2 = [0.4,0.5,0.6]
arb3 = [-0.1,-0.1,-0.1] # just to show negative weight is okay

# Weights and biases
# Input layer - Hidden layer 1
# (layer 1 - layer 2)
# -------------------------------------
# An Example: w_2[0][1] == 0.2. This means the weight
#   between neuron 2 of the input layer and neuron 1 of hidden layer 1 is 0.2
# Note that w_2 is a 3x3 matrix. Input layer and hidden layer 1 both have 3 neurons
w_2 = [arb1, arb2, arb3]
b_2 = [0, 0, 0]
# Hidden layer 1 - Hidden layer 2
# (layer 2 - layer 3)
# --------------------------------------
# w_3 is a 2x3 matrix.
#   Hidden layer 1 (layer 2) has 3 neurons
#   Hidden layer 2 (layer 3) has 2 neurons
w_3 = [arb0, arb0]
b_3 = [0,0]
# Hidden layer 2 - Output layer
# (layer 3 - layer 4)
w_4 = [[0.1,0.1],[0.1,0.1]]
b_4 = [0,0.1]

net1=nn.NeuralNetwork()
bulk={
    "weights" : [w_2,w_3,w_4],
    "biases" : [b_2,b_3,b_4]
}
net1.initiate_neural_network(bulk,
    verbose=11,
    verbose_init_mode=11,
    verbose_consistency=11)
net1.print_neural_network_information()

The output is the following.

Initializing a Neural Network object.
+ Initiate_neural_network(). Mode =  None
-+ initiate_neural_network_general().
--+ neuron_number_consistency_check().
layer:  1  to  2
is_matrix/row/col =  True / 3 / 3
layer:  2  to  3
is_matrix/row/col =  True / 2 / 3
layer:  3  to  4
is_matrix/row/col =  True / 2 / 2
> print_neural_network_verbose_information().
Dimension consistent? : True
Layers? [Input, 2,3,..., Output] : [3, 3, 2, 2]
Learning rate :  1e-05
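The consistency check reported above boils down to verifying that the matrix shapes chain together. Here is a minimal sketch of such a check. It is not kero's neuron_number_consistency_check(); the function name `layer_sizes_if_consistent` is hypothetical.

```python
def layer_sizes_if_consistent(weights, biases):
    """Return the layer sizes [n_1, ..., n_L] if every w_l has as many columns
    as layer l-1 has neurons and every b_l has one entry per row of w_l.
    Return None otherwise. Hypothetical helper for illustration."""
    if len(weights) != len(biases) or not weights:
        return None
    sizes = [len(weights[0][0])]  # number of neurons in the input layer
    for w_l, b_l in zip(weights, biases):
        rows = len(w_l)
        if any(len(row) != sizes[-1] for row in w_l):
            return None  # column count must equal neurons in layer l-1
        if len(b_l) != rows:
            return None  # one bias per neuron in layer l
        sizes.append(rows)
    return sizes


w_2 = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [-0.1, -0.1, -0.1]]
w_3 = [[0.1, 0.1, 0.1], [0.1, 0.1, 0.1]]
w_4 = [[0.1, 0.1], [0.1, 0.1]]
biases = [[0, 0, 0], [0, 0], [0, 0.1]]
print(layer_sizes_if_consistent([w_2, w_3, w_4], biases))
```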

Example Usage 2

We use "UniformRandom" mode.

import numpy as np
import kero.utils.utils as ut
import kero.multib.NeuralNetwork as nn

number_of_neurons = [2,2,2]
lower_bound, upper_bound = 0, 1
bounds = [lower_bound, upper_bound]
bulk = {
    "number_of_neurons" : number_of_neurons,
    "bounds": bounds,
    "layerwise_normalization": False,
}

NeuralNetwork = nn.NeuralNetwork()
NeuralNetwork.initiate_neural_network(bulk, mode="UniformRandom",
    verbose = 31,
    verbose_init_mode=31,
    verbose_consistency=False)

nu = nn.NetworkUpdater()
weights = NeuralNetwork.weights
biases = NeuralNetwork.biases

AF = nn.activationFunction(func = "Sigmoid")
a_1 = np.transpose(np.matrix([2,1]))
a_l_set, z_l_set = nu.feed_forward( weights, biases, a_1, AF,
    verbose=31,matrix_formatting="%6.2f")

See that the weights are initiated within the given bounds. The computation during feed forward can be verified as well.

Initializing a Neural Network object.
+ Initiate_neural_network(). Mode =  UniformRandom
-+ initiate_neural_network_general().
self.weights:
0.8320     0.4613
0.2730     0.7667
-------------------------
0.5041     0.9887
0.9356     0.3219
-------------------------
self.biases:
0.0000
0.0000
-------------------------
0.0000
0.0000
-------------------------
--+ feed_forward()
------------------------------------
layer  0 to layer 1
w_l =
0.83   0.46
0.27   0.77
a_l_minus_1 =
2.00
1.00
b_l =
0.00
0.00
->  0  :  2.1253304868524494  :  0.893340899905837
->  1  :  1.3126242707556344  :  0.7879519599996674
a_l_act =
0.89
0.79
------------------------------------
layer  1 to layer 2
w_l =
0.50   0.99
0.94   0.32
a_l_minus_1 =
0.89
0.79
b_l =
0.00
0.00
->  0  :  1.2294228612345892  :  0.7737175455242622
->  1  :  1.0894171526790268  :  0.7482719517680232
a_l_act =
0.77
0.75
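To verify the feed forward computation by hand, the following plain-Python sketch redoes it with the weights printed above (rounded to 4 decimals, so the results agree only to about 3 decimals). The function here is a stand-in written for this check and is not kero's feed_forward().

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def feed_forward(weights, biases, a_1):
    """Compute z_l = w_l a_(l-1) + b_l and a_l = sigmoid(z_l) for each layer.
    Returns (a_l_set, z_l_set); a_l_set[0] is the input itself."""
    a = list(a_1)
    a_l_set, z_l_set = [a], []
    for w_l, b_l in zip(weights, biases):
        z = [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i
             for row, b_i in zip(w_l, b_l)]
        a = [sigmoid(z_i) for z_i in z]
        z_l_set.append(z)
        a_l_set.append(a)
    return a_l_set, z_l_set


weights = [[[0.8320, 0.4613], [0.2730, 0.7667]],
           [[0.5041, 0.9887], [0.9356, 0.3219]]]
biases = [[0.0, 0.0], [0.0, 0.0]]
a_l_set, z_l_set = feed_forward(weights, biases, [2.0, 1.0])
print("z at first layer :", z_l_set[0])   # compare with 2.1253..., 1.3126...
print("final activation :", a_l_set[-1])  # compare with 0.7737..., 0.7482...
```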

Example Usage 2B

Compare the use of "UniformRandom" mode with and without layerwise_normalization.

import numpy as np
import kero.multib.NeuralNetwork as nn
import kero.utils.utils as ut

number_of_neurons = [3,3,2,2]
lower_bound, upper_bound = -1, 1
bounds = [lower_bound, upper_bound]

print(" Without optional setting -layerwise_normalization-")
bulk = {
    "number_of_neurons" : number_of_neurons,
    "bounds": bounds
}

net1 = nn.NeuralNetwork()
net1.initiate_neural_network(bulk, mode="UniformRandom",
    verbose = 31,
    verbose_init_mode=31,
    verbose_consistency=31)

print(" With optional setting -layerwise_normalization-")
bulk["layerwise_normalization"] = True

net2 = nn.NeuralNetwork()
net2.initiate_neural_network(bulk, mode="UniformRandom",
    verbose = 41,
    verbose_init_mode=31,
    verbose_consistency=31)

See that the weights with layerwise normalization activated have been divided by a factor, namely the number of neurons in the previous layer.

Without optional setting -layerwise_normalization-
Initializing a Neural Network object.
+ Initiate_neural_network(). Mode =  UniformRandom
-+ initiate_neural_network_general().
--+ neuron_number_consistency_check().
layer:  1  to  2
is_matrix/row/col =  True / 3 / 3
layer:  2  to  3
is_matrix/row/col =  True / 2 / 3
layer:  3  to  4
is_matrix/row/col =  True / 2 / 2
self.weights:
0.2876    -0.9968     0.9729
-0.1648    -0.0441     0.4335
0.3545    -0.8691    -0.0322
-------------------------
-0.5176    -0.6738     0.2846
0.2911    -0.7449     0.1738
-------------------------
-0.4232    -0.5835
0.2598     0.4446
-------------------------
self.biases:
0.0000
0.0000
0.0000
-------------------------
0.0000
0.0000
-------------------------
0.0000
0.0000
-------------------------
With optional setting -layerwise_normalization-
Initializing a Neural Network object.
+ Initiate_neural_network(). Mode =  UniformRandom
layerwise_normalization activated.
layer:  0 to 1 layerwise_NORM =  0.3333333333333333
(-) no of neurons at layer i-1 =  3
layer:  1 to 2 layerwise_NORM =  0.3333333333333333
(-) no of neurons at layer i-1 =  3
layer:  2 to 3 layerwise_NORM =  0.5
(-) no of neurons at layer i-1 =  2
-+ initiate_neural_network_general().
--+ neuron_number_consistency_check().
layer:  1  to  2
is_matrix/row/col =  True / 3 / 3
layer:  2  to  3
is_matrix/row/col =  True / 2 / 3
layer:  3  to  4
is_matrix/row/col =  True / 2 / 2
self.weights:
0.0405     0.2528    -0.1187
0.2457    -0.1119    -0.1885
0.2457     0.1292    -0.0500
-------------------------
-0.0970    -0.2860    -0.2685
-0.0102     0.0269     0.0070
-------------------------
0.3501     0.3493
-0.0006     0.0379
-------------------------
self.biases:
0.0000
0.0000
0.0000
-------------------------
0.0000
0.0000
-------------------------
0.0000
0.0000
-------------------------
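The effect of layerwise normalization can be sketched in a few lines of plain Python. This is an illustration under assumptions rather than kero's code: the function name `uniform_random_weights` is made up, and biases are set to zero only because that is what the printed output shows.

```python
import random


def uniform_random_weights(number_of_neurons, bounds,
                           layerwise_normalization=False, rng=None):
    """Draw each weight uniformly from [lower, upper]; optionally divide the
    weights feeding a layer by the number of neurons in the previous layer."""
    rng = rng or random.Random()
    lower, upper = bounds
    weights, biases = [], []
    for n_prev, n_curr in zip(number_of_neurons[:-1], number_of_neurons[1:]):
        factor = 1.0 / n_prev if layerwise_normalization else 1.0
        w_l = [[factor * rng.uniform(lower, upper) for _ in range(n_prev)]
               for _ in range(n_curr)]
        weights.append(w_l)
        biases.append([0.0] * n_curr)  # assumed zero, as in the output above
    return weights, biases


weights, biases = uniform_random_weights([3, 3, 2, 2], [-1, 1],
                                         layerwise_normalization=True,
                                         rng=random.Random(0))
for w_l in weights:
    for row in w_l:
        print(" ".join("%8.4f" % w for w in row))
    print("-" * 25)
```

With bounds [-1, 1], the normalized weights into a layer fed by 3 neurons stay within 1/3 in magnitude, in line with the smaller values printed above.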

kero version: 0.6.3 and above

## Neural Network

IN PROGRESS. We are updating individual functions before aggregating them into this class of object.

This class of object creates a neural network and implements the gradient descent and back propagation.

kero.multib.NeuralNetwork.py

class NeuralNetwork:
    def __init__(self):
        return
    def initiate_neural_network():
        return
    def neuron_number_consistency_check():
        return

kero version: 0.6.0 and above

## SpecialGallery

This class of object is reserved for visualization of data for Network object processed by Network_processor object.

kero.multib.nDnet.py

class SpecialGallery:
    def __init__(self):
        self.quiver3D_lattice3D_settings={}
        return
    def quiver3D_lattice3D(self,
        fig,
        ax,
        no_of_steps_to_animate,
        grid_lattice_points,
        field_history,
        field_name_to_plot,
        field_index_to_plot=[0,1,2],
        field_name_label=["X","Y","Z"],
        grid_shape=None,
        time_between_steps_in_ms=1,
        gif_filename='quiver.gif',
        show_real_time=True,
        view_angle_elevation=90,
        view_angle_azimuth=90,
        xlim=None,
        ylim=None,
        zlim=None,
        arrow_length_scale=None,
        ):
        return

| Properties | Description |
|---|---|
| quiver3D_lattice3D_settings | TO BE UPDATED |

kero version: 0.5.1 and above

## build_conj_dataframe()

kero.DataHandler.DataTransform.py

class clean_data:
    def build_conj_dataframe(self, conj_command_set, conj_command_setting_set = None):
        return

This function builds the conjugate data frame. What do we mean by conjugate data frame? If we have columns A and B in our original data frame, and we scale column A to A’ and binarize column B to B1 and B2, then we have a new data frame consisting of columns A’, B1 and B2. This new data frame is the conjugate data frame. The example below shows how a data frame is transformed to its conjugate.

Example Usage

Let us start by creating some random data table, make some of its data points defective, and then extract the non-defective part of the data table.

import kero.DataHandler.RandomDataFrame as RDF
import pandas as pd
import kero.DataHandler.DataTransform as dt
import numpy as np

rdf = RDF.RandomDataFrame()
col1 = {"column_name": "first", "items": [1, 2, 3]}
itemlist = list(np.linspace(10, 20, 48))
col2 = {"column_name": "second", "items": itemlist}
col3 = {"column_name": "third", "items": ["gg", "not"]}
col4 = {"column_name": "fourth", "items": ["my", "sg", "id", "jp", "us", "bf"]}
col_out={"column_name": "result", "items": ["classA","classB","classC"]}
rdf.initiate_random_table(20, col1, col2, col3, col4,col_out, panda=True)
rdf.crepify_table(rdf.clean_df, rate=0.08)
rdf.crepified_df.to_csv("testing_conj_df.csv", index=False)

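# reload the saved table (assumed step: df is otherwise undefined in this snippet)
df = pd.read_csv("testing_conj_df.csv")
cleanD, _, _ = dt.data_sieve(df)  # cleanD, crippD, origD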
cleanD.get_list_from_df()
colname_set = df.columns

Up to here, we have only created the random data frame and removed its defects. The table looks like this.

   first     second third fourth  result
0      1  10.851064   not     us  classA
1      1  12.127660   not     my  classC
2      3  16.808511    gg     bf  classB
3      2  18.085106   not     jp  classC
4      3  20.000000    gg     us  classB

Now we do the real work. Notice that cleanD is a clean_data object. The data frame cleanD.clean_df, a property of the clean_data object, must have been initiated for the following to work; data_sieve() does this for you. This is natural, since we want to build the conjugate data frame of clean_df, a data frame that does not contain defects. In other words, we do not deal with the defective parts in this example.

# conversion choices
# - 1. "discrete_to_bool"
# - 2. "cont_to_scale"
# - 3. "discrete_to_int"
# - 4. ""
# colname_set[k] is the name of the k-th column: first, second, third, fourth, result
conj_command_set = {colname_set[0]: "discrete_to_bool",
                    colname_set[1]: "cont_to_scale",
                    colname_set[2]: "discrete_to_bool",
                    colname_set[3]: "discrete_to_bool",
                    colname_set[4]: "discrete_to_int"}
discrete_to_int_settings={"classA":0,"classB":1,"classC":2}
cont_to_scale_settings = {"scale": [-1, 1], "mode": "uniform", "original_scale":[10,20]}
conj_command_setting_set= {colname_set[0]: True,
                           colname_set[1]: cont_to_scale_settings,
                           colname_set[2]: True,
                           colname_set[3]: True,
                           colname_set[4]: discrete_to_int_settings}
cleanD.build_conj_dataframe(conj_command_set,conj_command_setting_set=conj_command_setting_set)

print(df)
print("\n\nCOMPARE : CLEANED\n\n")
print(cleanD.clean_df)
print("\n\nCOMPARE : CONJUGATED\n\n")

print(cleanD.clean_df_conj)

conj_command_set and conj_command_setting_set

Notice that in the code above we transform the data frame according to a set of rules. Some columns are binarized, and one of them is scaled. This is expressed by the conjugate commands, which come with settings and options.

Format:

conj_command_set = {column_name_1: mode_1, …}

conj_command_setting_set = {column_name_1: option_1, …}

| Command | Description |
|---|---|
| cont_to_scale | Function invoked: def conj_from_cont_to_scaled(col, scale=None, original_scale=None, mode="uniform"). scale (list of double): [min, max]. original_scale (list of double): [o_min, o_max]. mode (string). conj_command_setting: {"scale": scale, "mode": mode, "original_scale": o_scale}. With mode="uniform" and "original_scale": [o_min, o_max], a column of doubles col=[x1,…,xN] is scaled as $x_k \rightarrow \text{min} + (\text{max}-\text{min}) \times \frac{x_k - o_{\text{min}}}{o_{\text{max}} - o_{\text{min}}}$. If "original_scale" is None or not specified, the column is scaled similarly, but with o_min = min(col) and o_max = max(col). Other modes are to be added. |
| discrete_to_bool | Function invoked: def conj_from_discrete_to_bool(col, drop_one_column=False). drop_one_column (Bool). conj_command_setting: drop_one_column. Given a column with header "somecolumn" and entries [x1,x2,…,xN], where each xk is an element of {class1, class2,…, classN}, replace this column with Boolean columns "somecolumn_class1",…,"somecolumn_classN". If xk is classj, then the k-th entry of column somecolumn_classj is 1 and the k-th entries of the other Boolean columns are 0. If drop_one_column is True, the column for classN is discarded, to avoid the dummy variable trap. Otherwise, nothing further happens. |
| discrete_to_int | Function invoked: def conj_from_discrete_to_int(col, rule=None). rule (dictionary) = {class1: int1, …, classN: intN}. conj_command_setting: {class1: int1, …, classN: intN}. Given a column [x1, …, xN], where each xk is an element of {class1, …, classN}, convert every element xk=classj to xk=intj. Note: technically intj can be of any data type, i.e. we can use this function to convert the content to anything, as long as the rule is consistent. |
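The three commands can be mimicked on plain Python lists to see what they do to a column. This is a sketch under assumptions, not kero's implementation: the function names below are shortened stand-ins, and the ordering of the Boolean columns (sorted class names) is a choice made here.

```python
def cont_to_scale(col, scale, original_scale=None):
    """Map x from [o_min, o_max] to [min, max] linearly (mode "uniform")."""
    lo, hi = scale
    if original_scale is not None:
        o_min, o_max = original_scale
    else:
        o_min, o_max = min(col), max(col)
    return [lo + (hi - lo) * (x - o_min) / (o_max - o_min) for x in col]


def discrete_to_bool(col, name, drop_one_column=False):
    """Replace a discrete column by one 0/1 column per class."""
    classes = sorted(set(col))
    if drop_one_column:
        classes = classes[:-1]  # drop the last class to avoid the dummy variable trap
    return {"%s_%s" % (name, c): [1 if x == c else 0 for x in col] for c in classes}


def discrete_to_int(col, rule):
    """Convert each class label to the value given by `rule`."""
    return [rule[x] for x in col]


print(cont_to_scale([10.0, 15.0, 20.0], scale=[-1, 1], original_scale=[10, 20]))
print(discrete_to_bool(["gg", "not", "gg"], "third"))
print(discrete_to_int(["classA", "classC"], {"classA": 0, "classB": 1, "classC": 2}))
```

With scale [-1, 1] and original_scale [10, 20], the midpoint 15 maps to 0, as the formula in the table predicts.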

See another example here.

See a more complete example in the loan problem post here.

kero version: 0.1 and above