The problem we try to solve here is the remainder problem. We train our neural network to find the remainder of a number randomly drawn from 0 to 99 inclusive when it is divided by 17. For example, given 20, the remainder is 3.
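For reference, the target the network has to learn is just the result of Python's `%` operator. The sketch below builds a few labelled samples in plain Python. The helper name `make_dataset` and its parameters are hypothetical and do not come from the notebook.

```python
import random

def make_dataset(n_samples, low=0, high=99, divisor=17, seed=0):
    """Return (number, remainder) pairs for numbers drawn uniformly from low..high inclusive."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    numbers = [rng.randint(low, high) for _ in range(n_samples)]
    return [(x, x % divisor) for x in numbers]

pairs = make_dataset(5)
for x, r in pairs:
    print(x, "->", r)

print(20 % 17)  # the example from the text: prints 3
```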
The code (in a Jupyter notebook) detailing the results of this post can be found here by the name keras_test1.ipynb. In all the tests, we use a single hidden layer of 64 neurons, and vary the input and output layers to take the context of the problem into account. With the context taken into account, we show that we can help the neural network model train better!
Test 1A and Test 1B
Note: See the corresponding sections in the Jupyter notebook.
We start with a much simpler problem: draw a random number from 0 to 10 inclusive and find its remainder when divided by 10, which is quite trivial. In test 1A, with 4 epochs, we see a steady improvement in prediction accuracy up to 82%. With 12 epochs in test 1B, the accuracy is approximately 100%. Good!
Test 2A and Test 2B
Now, we raise the hurdle. We draw from a wider range of random numbers, from 0 to 99 inclusive. To be fair, we give the neural network more data points for training. The outcome is poor: the trained model in test 2A collapses to a single prediction (it always predicts that the remainder is 0). In test 2B, we run the same training for more epochs, but the problem persists.
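A quick way to spot this kind of collapse is to count how often each class is predicted and compare the accuracy with chance level. The sketch below is not taken from the notebook. It assumes the divisor is still 10, as in tests 1A and 1B, and stands in for the trained model with a hypothetical list of constant predictions.

```python
from collections import Counter

def accuracy(labels, predictions):
    """Fraction of positions where the prediction equals the label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)

numbers = list(range(100))           # every number from 0 to 99 once
labels = [x % 10 for x in numbers]   # assumed divisor of 10
collapsed = [0] * len(numbers)       # stand-in for a model that always predicts 0

print(Counter(collapsed))            # a single key means a single predicted class
print(accuracy(labels, collapsed))   # 0.1, i.e. chance level for 10 classes
```

A histogram with only one key, together with an accuracy close to one over the number of classes, suggests the model has not learned anything beyond a constant output.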
Now we fix the failure seen in tests 2A and 2B by contextualizing the problem. Notice that in tests 1A, 1B, 2A and 2B, there is only 1 input (i.e. 1 neuron in the input layer), which is exactly the random number whose remainder is to be computed.
Now, in this test, we convert it into 2 inputs by splitting the number into its tens and units digits. For example, if the number is 64, the input to our neural network is now (6,4). If the number is 5, then it becomes (0,5). This is done using the extract_digit() function. The possible “concept” that the neural network can learn is the fact that for division by 10, only the last digit matters. That is to say, if our input is (a,b) after the conversion, then only b matters.
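The digit split and the "only b matters" observation can be checked in a few lines of plain Python. This is a minimal sketch: `split_digits` is a hypothetical stand-in written for illustration, not the notebook's own helper.

```python
def split_digits(x):
    """Split an integer in 0..99 into [tens digit, units digit]."""
    tens, units = divmod(x, 10)
    return [tens, units]

print(split_digits(64))  # [6, 4]
print(split_digits(5))   # [0, 5]

# For division by 10, the remainder is exactly the units digit b.
units_is_remainder = all(x % 10 == split_digits(x)[1] for x in range(100))
print(units_is_remainder)  # True
```

With this encoding, the correct answer is a function of the second input alone, which is a much simpler mapping than one defined over the raw number.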
What do we get? 100% accuracy! All is good.
Finally, we raise the complexity and solve our original problem. We draw from 0 to 99 inclusive, and find the remainder after division by 17. We use the extract_digit() function here as well. Running it over 24 epochs, we get an accuracy of 96% (and it does look like it can be improved)!
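Unlike division by 10, the remainder modulo 17 depends on both digits: for an input (a,b) it is (10a + b) mod 17. The sketch below illustrates this, along with the 17-way one-hot target that a softmax output layer is trained against. The function names are hypothetical and chosen for illustration only.

```python
def remainder_from_digits(tens, units, divisor=17):
    """Remainder of the number with the given digits, i.e. (10*tens + units) mod divisor."""
    return (10 * tens + units) % divisor

def one_hot(label, n_classes=17):
    """Encode a class index as a list with a single 1 at position label."""
    vec = [0] * n_classes
    vec[label] = 1
    return vec

# Numbers sharing the same units digit no longer share a remainder.
same_units = {x: remainder_from_digits(*divmod(x, 10)) for x in (4, 14, 24, 34)}
print(same_units)  # {4: 4, 14: 14, 24: 7, 34: 0}

print(one_hot(remainder_from_digits(2, 0)))  # 20 mod 17 = 3, so the 1 sits at index 3
```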
Conclusion? First things first, this is just a demonstration of a neural network using Keras. But more importantly, contextualizing the input does help!
The code for Test 3B is given below.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

N = 100
D = 17

def simple_binarizer17(y, bin_factor=1, bin_shift=0):
    # One-hot encode a remainder y (0..16) as a 17-element list.
    out = [0+bin_shift]*17
    out[y] = 1*bin_factor
    return out

def extract_digit(x):
    # Split x into its [tens, units] digits, e.g. 64 -> [6, 4].
    b = x%10
    a = (x-b)//10
    return [int(a),int(b)]

# randint's upper bound is exclusive, so N+1 draws integers from 0 to N inclusive
# (pass N instead to restrict the draw to 0..N-1).
X0_train = np.random.randint(N+1,size=(256000,1))
# Flatten to a plain list of ints so that x%D is computed one number at a time.
Y_train = np.array([simple_binarizer17(x%D) for x in X0_train.ravel().tolist()])
X0_test = np.random.randint(N+1,size=(100,1))
Y_test = np.array([simple_binarizer17(x%D) for x in X0_test.ravel().tolist()])

# Convert each number into its two-digit input representation.
X_train = np.array([extract_digit(x) for x in X0_train.ravel().tolist()])
X_test = np.array([extract_digit(x) for x in X0_test.ravel().tolist()])

# Show a few raw numbers next to their digit encoding.
for X0,X in zip(X0_train[:10],X_train[:10]):
    print(X0,"->",X)

# 2 inputs (tens, units) -> 64 hidden ReLU units -> 17-way softmax over remainders.
model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=2))
model.add(Dense(units=17, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

model.fit(X_train, Y_train, epochs=24, batch_size=32)

loss_and_metrics = model.evaluate(X_test, Y_test, batch_size=10)
print("--LOSS and METRIC--")
print(loss_and_metrics)

print("--PREDICT--")
classes = model.predict(X_test, batch_size=16)

# Compare the predicted class with the true class and print the first few cases.
count = 0
correct_count = 0
for y0,y in zip(Y_test,classes):
    count = count+1
    correct_pred = False
    if np.argmax(y0)==np.argmax(y):
        correct_pred = True
        correct_count = correct_count + 1
    if count<20:
        print(np.argmax(y0),"->",np.argmax(y), "(",correct_pred,")")

accuracy = correct_count/len(Y_test)
print("accuracy = ", accuracy)