MNIST Neural Network test 1


To test MNIST using kero 0.6.3, I will use a Jupyter notebook in a virtual environment. In the same folder, place adhoc_utils.py containing the function read_csv() from here. I will use virtualenv as usual (see here). Then, after activating the virtual environment, simply:

pip install jupyter
pip install kero
pip install matplotlib
pip install opencv-python
jupyter notebook

Download the MNIST dataset that has been converted into CSV form; I got it from this link. Now, create the Python notebook mnist_dnn.ipynb (see below) and run all the cells. You can find this test run and similar test runs here.
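In case the link changes, the following is a minimal sketch of what read_csv() is assumed to do: return the rows of the CSV file as a list of lists of strings. The argument names are taken from how the function is called in the notebook below; the ".csv" suffix handling is my assumption.

import csv

def read_csv(filename, header_skip=0, get_the_first_N_rows=0):
    # Returns the csv rows as a list of lists of strings.
    # filename is given without the ".csv" suffix (an assumption).
    rows = []
    with open(filename + ".csv", "r") as f:
        for i, row in enumerate(csv.reader(f)):
            if i < header_skip:
                continue
            rows.append(row)
            if get_the_first_N_rows and len(rows) >= get_the_first_N_rows:
                break
    return rows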

Unfortunately, it appears that the trained model predicts one single output for every input (in one of the attempts it predicted 6 for every image, which is bad). Several possible issues and remarks include the following.

  1. There might be defective data points. Update: not likely; this is easy to check with a tested machine learning algorithm. I tried using keras on the same data here, and both training and prediction were successful.
  2. A different loss function might be more suitable; check out, for example, KL divergence. Update: there is certainly more to this than meets the eye; see a tutorial from Stanford here. MSE, an L2 loss, appears to be harder to optimize for classification; a loss such as cross-entropy is the usual choice (see the sketch after this list).
  3. This example uses no softmax layer at the end; in fact, using the default neural network from kero, the final layer is activated by the same activation function as the other layers (in this example, the sigmoid function). The index of the maximum value at the output layer is taken as the predicted label; softmax is also illustrated in the sketch below.
  4. The DNN has been treated like a black box; nobody quite knows what happens throughout the training process in a coherent manner. In fact, it could simply be that the randomly initialized weights were not drawn from the right range before training. This might be interesting to study in the future (hopefully the experts come up with new insights soon).
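To make items 2 and 3 concrete, here is a minimal plain-numpy sketch (not kero's API) of a softmax output layer and the cross-entropy loss, compared against MSE on the same prediction:

import numpy as np

def softmax(z):
    # subtract the max for numerical stability
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def cross_entropy(y_pred, y_true):
    # y_true is one-hot; clip to avoid log(0)
    return -np.sum(y_true * np.log(np.clip(y_pred, 1e-12, 1.0)))

def mse(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

z = np.array([2.0, 1.0, 0.1])       # raw output-layer values
y_true = np.array([1.0, 0.0, 0.0])  # one-hot label
y_pred = softmax(z)
print("softmax output =", y_pred)
print("cross-entropy  =", cross_entropy(y_pred, y_true))
print("mse            =", mse(y_pred, y_true))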

All the above said, the small modifications I made (before adding softmax) include initializing all biases to zero instead of randomly, and allowing options to generate random weights in a normalized manner that depends on the number of neurons; see the sketch below. I might change the interface a little, but in any case, it seems there is more work to do! That's all for now, happy new year!
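The sketch below shows the kind of normalized initialization I mean, assuming the common rule of scaling each layer's weights by 1/sqrt(n_in), where n_in is the number of neurons feeding into the layer. The exact scaling rule here is my assumption for illustration, not necessarily what kero implements.

import numpy as np

def init_layers(number_of_neurons, seed=0):
    # Biases start at zero; weights are drawn uniformly and scaled
    # by 1/sqrt(n_in) so their spread depends on the layer size.
    rng = np.random.default_rng(seed)
    weights, biases = [], []
    for n_in, n_out in zip(number_of_neurons[:-1], number_of_neurons[1:]):
        weights.append(rng.uniform(-1, 1, size=(n_out, n_in)) / np.sqrt(n_in))
        biases.append(np.zeros((n_out, 1)))
    return weights, biases

weights, biases = init_layers([784, 28, 10])
print([W.shape for W in weights])  # [(28, 784), (10, 28)]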

mnist_dnn.ipynb

[1]

import numpy as np
import adhoc_utils as Aut
import matplotlib.pyplot as plt
import cv2, time

import kero.multib.NeuralNetwork as nn
import kero.utils.utils as ut

[2]

# Load the MNIST image data from the csv files
# and binarize the labels.

def simple_binarizer(mnist_label, bin_factor=1, bin_shift=0):
    # mnist_label: int, 0, 1, ..., or 9
    # Returns a 10 x 1 one-hot column vector (numpy matrix).
    out = [0 + bin_shift] * 10
    out[mnist_label] = 1 * bin_factor
    return np.transpose(np.matrix(out))
def convert_list_of_string_to_float(this_list):
    # csv fields are loaded as strings; cast each one to float
    return [float(v) for v in this_list]

bin_shift = 0
bin_factor = 1
img_width, img_height = 28, 28
pixel_normalizing_factor = 255
# read_csv returns list of list.
# good news is, the loaded data is already flattened.
mnist_train =  Aut.read_csv("mnist_train", header_skip=1,get_the_first_N_rows = 6400)
mnist_train_labels_binarized = [simple_binarizer(int(x[0]),bin_factor=bin_factor,bin_shift=bin_shift) for x in mnist_train]
mnist_train_data = [1/pixel_normalizing_factor*np.transpose(np.matrix(convert_list_of_string_to_float(x[1:]))) for x in mnist_train]
# 

# Print the first few labels and their binarized forms.
#
for i in range(5):
    print(mnist_train[i][0] ,":",ut.numpy_matrix_to_list(mnist_train_labels_binarized[i]))

[3]

# Uncomment this to see the flattened image profile
#
# temp = mnist_train_data[0]
# print("max = ", np.max(temp))
# print("min = ", np.min(temp))
# mean_val = np.mean(temp)
# print("mean = ", mean_val)
# fig0 = plt.figure()
# ax0 = fig0.add_subplot(111)
# ax0.plot(range(len(temp)),temp)
# ax0.plot(range(len(temp)),[mean_val]*len(temp))

[4]

# To visualize the loaded data, uncomment and run this section.
#
# 

# mnist_train_labels = [x[0] for x in mnist_train]
# mnist_train_data_image_form = [np.array(x[1:]).reshape(img_height,img_width).astype(np.uint8) for x in mnist_train]

# data_length = len(mnist_train_data)
# for i in range(10):
#     if i < data_length:
#         print(mnist_train_data_image_form[i].shape,end=",")

# #  
# count=0
# title_set = []
# for label,img_data in zip(mnist_train_labels,mnist_train_data_image_form):
#     title = "count: "+str(count)+"| label: "+str(label)
#     title_set.append(title)
#     cv2.imshow(title, img_data)
#     cv2.resizeWindow(title, 300,300)
#     count = count + 1
#     if count == 5:
#         break
# cv2.waitKey(0)
# for title in title_set:
#     cv2.destroyWindow(title)

[5]

# input_set: list of numpy matrix [x], 
#   where each x is a column vector m by 1, m the size of input layer.
# Y0_set: list of numpy matrix [Y0],
#   where each Y0 is a column vector N by 1, N the size of output layer.
#   This is equal to 10, since it corresponds to labels 0,1,...,9.
#
#

input_set = mnist_train_data
Y0_set = mnist_train_labels_binarized
number_of_neurons = [784,28,10]
lower_bound, upper_bound = 0 ,1
bounds = [lower_bound, upper_bound]
bulk = {
    "number_of_neurons" : number_of_neurons,
    "bounds": bounds,
    "layerwise_normalization": True,
}
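# Sanity check (added here, not part of the original run): each input
# should be a 784 x 1 column vector and each binarized label 10 x 1.
assert input_set[0].shape == (number_of_neurons[0], 1)
assert Y0_set[0].shape == (number_of_neurons[-1], 1)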

NeuralNetwork = nn.NeuralNetwork()
NeuralNetwork.learning_rate = 1
NeuralNetwork.initiate_neural_network(bulk, mode="UniformRandom",
    verbose = False,
    verbose_init_mode=False,
    verbose_consistency=False)

nu = nn.NetworkUpdater()
nu.set_settings(method="RegularStochastic",
    method_specific_settings={
        "batch_size": 8,
        "no_of_epoch": 32,
        "shuffle_batch": True,
    })
nu.set_training_data(input_set,Y0_set)
nu.set_neural_network(NeuralNetwork)

[6]

AF = nn.activationFunction(func = "Sigmoid")
start = time.time()
weights_next, biases_next, mse_list = nu.update_wb(input_set, Y0_set, 
                NeuralNetwork.weights, NeuralNetwork.biases, AF,
                mse_mode="compute_and_print", verbose=11)
end = time.time()
elapsed = end - start

[7]

print("epoch | mse value ")
mark = 1
for i in range(len(mse_list)):
    if mark >= 0.1*len(mse_list) or i==0:
        print(" + epoch {",i ,"} ", mse_list[i])
        mark = 1
    else:
        mark = mark + 1

fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.plot(range(len(mse_list)), mse_list)

[8]

print("time taken [s] = ", elapsed)
print("time taken [min] = ", elapsed/60)
print("time taken [hr] = ", elapsed/3600)
print("time taken at 10k x [s] = ", elapsed*1e4)
print("time taken at 10k x [min] = ", elapsed*1e4/(60))
print("time taken at 10k x [hr] = ", elapsed*1e4/(3600))
print("time taken at 10k x [day] = ", elapsed*1e4/(3600*24))

[9]

no_of_images_to_test=500
mnist_test =  Aut.read_csv("mnist_test", header_skip=1,get_the_first_N_rows = no_of_images_to_test)
mnist_test_labels = [int(x[0]) for x in mnist_test]
mnist_test_data = [1/pixel_normalizing_factor*np.transpose(np.matrix(convert_list_of_string_to_float(x[1:]))) for x in mnist_test]

hit_list = []
predict_list = []
predict_val = []
for i in range(no_of_images_to_test):
    a_1 = mnist_test_data[i]
    test_a_l_set, _ = nu.feed_forward(weights_next, biases_next, a_1, AF,
        verbose=False,
        matrix_formatting="%6.2f")
    Y_after = test_a_l_set[-1]
    predicted_label = int(np.argmax(Y_after))
    actual_label= mnist_test_labels[i]
    # print(Y_after)
    # print("predicted vs actual = ", predicted_label, "/", actual_label)
    predict_list.append(predicted_label)
    predict_val.append(Y_after)
    if actual_label==predicted_label:
        hit_list.append(1)
    else:
        hit_list.append(0)
print("predict list = ")
print(predict_list)
print("predict values = ")
for i in range(10):
#     print(ut.numpy_matrix_to_list(predict_val[i]))
    ut.print_numpy_matrix(np.transpose(predict_val[i]),formatting="%9.6f",no_of_space=20)
print("hit list = ")
print(hit_list)
print("percentage correct = ", 100* np.sum(hit_list)/len(hit_list))

 

kero version 0.6.3