An artificial neural network is made of layers, and each layer is made of many perceptrons (also called neurons). The perceptron is the basic computational unit of a neural network: it multiplies each input by a weight, adds a bias, and passes the result through an activation function to deliver the output to the next layer.
In this blog, we will first design a single-layer perceptron model for learning the logical AND and OR gates. Then we will design a multi-layer perceptron for learning the XOR gate's properties. While creating these perceptrons, we will see why we need multi-layer neural networks.
A single-layer perceptron contains an input layer with as many neurons as there are features in the dataset, followed by an output layer with as many neurons as there are output variables. Single-layer perceptrons can separate linearly separable datasets like the AND and OR gates. In contrast, a multi-layer perceptron is used when the dataset contains non-linearity. Apart from the input and output layers, an MLP (short for multi-layer perceptron) has hidden layers in between the input and output layers. These hidden layers help in learning the complex patterns in our data points.
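The computation described above can be sketched in a few lines of NumPy. The input, weight, and bias values below are made-up numbers chosen only for illustration, and `step` is a hypothetical helper name.

```python
import numpy as np

def step(z):
    # Unit step activation: 1 if the weighted sum is non-negative, else 0
    return 1 if z >= 0 else 0

# Illustrative values (assumptions, not taken from a trained model)
x = np.array([1.0, 0.0])   # two input features
w = np.array([0.5, 0.5])   # one weight per input
b = -0.7                   # bias

z = np.dot(w, x) + b       # weighted sum of the inputs plus the bias
output = step(z)           # value that would be passed to the next layer
print(z, output)
```

With these numbers the weighted sum is negative, so the neuron outputs 0.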
Logic gates are the basic building blocks of digital circuits. They decide which set of input signals will trigger the circuit using boolean operations. They commonly have two inputs in binary form (0,1) and produce one output (0 or 1) after performing a boolean operation.
Graph insights :
Let's understand the neural network training process and see how Perceptron maps these lines over the data points.
Writing a neural network entirely by hand can quickly become tedious, and the resulting code can be hard to read. To avoid this complexity, data professionals use Python libraries and frameworks to implement models. But since we are designing an elementary neural network, we will build it without a framework like TensorFlow or PyTorch. We will use NumPy, a Python library known for its mathematical operations and multidimensional arrays. Then we will switch to Keras to build the multi-layer perceptron.
So let's start!
# First run the "pip install numpy" command in a terminal (for example, on Windows) to install NumPy
import numpy as np
The input for the logic gate consists of two values (T, F). T is for true and F for false, similar to binary values (1, 0). Input is fed to the neural network in the form of a matrix, so we have to define the dimensions (a.k.a. shape) of the input and output matrices. Each input sample has shape (1, 2) because one input set has two values, and the corresponding output has shape (1, 1). With all four input combinations stacked together, X has shape (4, 2) and Y has shape (4, 1).
T = 1.0
F = 0.0

# creating data for logical OR operation
def get_OR_data():
    X = [
        [F, F],
        [F, T],
        [T, F],
        [T, T],
    ]
    Y = [
        [F],
        [T],
        [T],
        [T],
    ]
    # Return NumPy arrays so that fit() can read X.shape
    return np.array(X), np.array(Y)

X, Y = get_OR_data()
We have defined the get_OR_data function for fetching inputs and outputs. Similarly, we can define get_AND_data and get_XOR_data functions using the same set of inputs.
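As a sketch of how the other two data functions might look, the block below follows the same pattern as the OR data. Returning NumPy arrays and the shared `INPUTS` constant are assumptions made here for convenience.

```python
import numpy as np

T = 1.0
F = 0.0

# The four possible input pairs are shared by all three gates (hypothetical constant)
INPUTS = [[F, F], [F, T], [T, F], [T, T]]

def get_AND_data():
    # AND is true only when both inputs are true
    X = np.array(INPUTS)
    Y = np.array([[F], [F], [F], [T]])
    return X, Y

def get_XOR_data():
    # XOR is true when exactly one of the inputs is true
    X = np.array(INPUTS)
    Y = np.array([[F], [T], [T], [F]])
    return X, Y

X_and, Y_and = get_AND_data()
X_xor, Y_xor = get_XOR_data()
print(X_and.shape, Y_and.shape)
```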
Now, we will define a class MyPerceptron to include various functions which will help the model to train and test. The first function will be a constructor to initialize the parameters like learning rate, epochs, weight, and bias.
class MyPerceptron:
    def __init__(self, learning_rate=0.1, n_iterations=1000):
        self.lr = learning_rate
        self.epochs = n_iterations
        self.weights = None
        self.bias = None
The second function is divided into four stages:
    def fit(self, X, Y):
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float).ravel()

        # Defining the shape of weight and bias: one weight per input feature.
        self.weights = np.zeros(X.shape[1])
        self.bias = 0.0

        # training the model on X and Y
        for epoch in range(self.epochs):
            for i in range(X.shape[0]):
                # Deciding the activation function
                Y_pred = self.Step_activ_func(np.dot(self.weights, X[i]) + self.bias)
                # Deciding the loss function (error for the current sample)
                error = Y[i] - Y_pred
                # Updating the weight and bias using optimization algorithm
                self.weights = self.weights + self.lr * error * X[i]
                self.bias = self.bias + self.lr * error
The basic rule of matrix multiplication says that X and W can be multiplied only if the shape of X is (m, n) and the shape of W is (n, k); the shape of XW will then be (m, k). Keeping this in mind, for an input of shape (1, 2) the weight matrix W will have shape (2, 1). Similarly, the shape of the bias will be (1, 1).
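The shape rule can be checked quickly with NumPy. This is only an illustrative sketch; the zero-valued weights and the sample input are placeholders.

```python
import numpy as np

X = np.array([[0.0, 1.0]])   # one sample with two features: shape (1, 2)
W = np.zeros((2, 1))         # one weight per feature: shape (2, 1)
b = np.zeros((1, 1))         # bias: shape (1, 1)

Z = X @ W + b                # (1, 2) times (2, 1) gives (1, 1)
print(X.shape, W.shape, Z.shape)
```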
An activation function transforms the output produced by a neuron, often limiting it to a fixed range such as [0, 1], although not every activation function is bounded (some have the range [0, infinity)). Keeping outputs in a controlled range is intended to reduce the risk of exploding and vanishing gradients. The other role of the activation function is to introduce non-linearity so that the model becomes capable of learning complex patterns in the dataset. So let's look at some well-known activation functions.
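A few commonly used activation functions can be sketched as follows. The selection here (unit step, sigmoid, tanh, ReLU) is an assumption about which functions are worth showing, and the function names are illustrative.

```python
import numpy as np

def unit_step(z):
    # Outputs 1 for non-negative inputs and 0 otherwise
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes any real number into the open interval (-1, 1)
    return np.tanh(z)

def relu(z):
    # Keeps positive values and replaces negative values with 0
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
for fn in (unit_step, sigmoid, tanh, relu):
    print(fn.__name__, fn(z))
```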
We will use the Unit step activation function to keep our model simple and similar to traditional Perceptron.
    def Step_activ_func(self, activation):
        if activation >= 0:
            return 1
        else:
            return 0
After passing the neuron output through the activation function, we must calculate the error between the predicted and actual output. The functions used to calculate this error are called loss functions. While training the neural network, we try to minimize the summed value of the loss function over all the samples, which is called the cost function in machine learning. Some of the famous cost functions in neural networks are:
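Two widely used cost functions, mean absolute error and mean squared error, can be sketched as below. The predictions are invented for illustration (one wrong answer out of four), and the function names are not taken from any library.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    # Average of the absolute differences between actual and predicted values
    return float(np.mean(np.abs(y_true - y_pred)))

def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between actual and predicted values
    return float(np.mean((y_true - y_pred) ** 2))

y_true = np.array([0.0, 1.0, 1.0, 1.0])   # OR gate targets
y_pred = np.array([0.0, 1.0, 0.0, 1.0])   # illustrative predictions, one mistake

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(mae, mse)
```

For 0/1 predictions the two values coincide, because an absolute error of 1 squares to 1.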
We will use mean absolute error to implement a single-layer perceptron.
After everything is in place, our goal is to optimize the performance. To fulfil this goal, we need an optimization algorithm. It starts with initial weight and bias values (often random; our implementation starts from zeros) and updates them after every iteration to minimize the error. Some of the most famous optimization algorithms are:
We are using a simpler optimization technique here. We will update the parameters using the simple rule presented below.
Wnew = Wold + learning_rate * error * X
This is the final equation we arrive at when we work through the mathematics of gradient descent and calculate all the terms involved. To understand how we reached this final result, see this blog.
The learning rate determines how much weight and bias will be changed after every iteration so that the loss will be minimized, and we have set it to 0.1.
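To make the update rule concrete, here is one worked update step for a single OR sample. The starting weights and bias are assumed values picked for illustration.

```python
import numpy as np

learning_rate = 0.1

# Assumed state before the update (illustrative values)
w_old = np.array([0.0, 0.0])
b_old = -0.1

# One OR sample: input (0, 1) should produce 1
x = np.array([0.0, 1.0])
y_true = 1.0

# Prediction with the unit step activation
y_pred = 1.0 if np.dot(w_old, x) + b_old >= 0 else 0.0
error = y_true - y_pred          # the sample is misclassified, so error is 1

# Wnew = Wold + learning_rate * error * X, and the same idea for the bias
w_new = w_old + learning_rate * error * x
b_new = b_old + learning_rate * error
print(w_new, b_new)
```

Only the weight attached to the active input changes, because the other input is 0 and contributes nothing to the update.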
We have implemented all the functions of Perceptron, and now it's time to train. But before that, we have to define one more parameter: epoch. An epoch is a parameter that determines the number of times the model should be trained on the entire dataset. We have already initialized the epoch value in the constructor of the MyPerceptron class.
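The test code that follows calls a `predict` method that is not shown in this post. A minimal sketch of what such a method could compute is given below as a standalone function. The name `perceptron_predict` and the hand-picked weights and bias are assumptions for illustration, not the values learned during training.

```python
import numpy as np

def perceptron_predict(X, weights, bias):
    # Weighted sum for every row of X, followed by the unit step activation
    X = np.asarray(X, dtype=float)
    z = X @ weights + bias
    return np.where(z >= 0, 1, 0)

# Hand-picked parameters that happen to implement OR (assumed, not trained)
weights = np.array([1.0, 1.0])
bias = -0.5

X_demo = [[0, 0], [1, 0], [0, 1], [1, 1]]
predictions = perceptron_predict(X_demo, weights, bias)
print(predictions)
```

Inside the class, the same logic would use `self.weights` and `self.bias` instead of the function arguments.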
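from sklearn.metrics import accuracy_score

clf = MyPerceptron()
clf.fit(X, Y)

X_test = [
    [F, F],
    [T, F],
    [F, T],
    [F, F],
]
Y_test = [
    [F],
    [T],
    [T],
    [F],
]
X_test = np.array(X_test)
Y_test = np.array(Y_test)

# predict() applies the learned weights, bias, and step activation to each test row
Y_predicted = clf.predict(X_test)
print(Y_predicted)
print(accuracy_score(Y_test, Y_predicted))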
Testing Result OR: [0, 1, 1, 0]
As we can see, the perceptron predicted the correct output for logical OR. Similarly, we can train our perceptron on the AND and XOR operators. But there is a catch: while the perceptron learns the correct mapping for AND and OR, it fails to map the output for XOR because the data points are not linearly separable, so we need a model that can learn this complexity. Adding a hidden layer helps the perceptron learn that non-linearity. This is why the concept of the multi-layer perceptron came in, and now we are going to design one for XOR.
The design process will remain the same with one change. We will add one hidden layer between the input and output layers. For that, we also need to define the activation and loss functions and update the parameters using the gradient descent optimization algorithm. So let's start.
Neural networks are more complex to code than traditional machine learning models. If we put together the whole code of even a single-layer perceptron, it will exceed 100 lines. To reduce the effort and increase the efficiency of the code, we will take the help of Keras, an open-source Python library built on top of TensorFlow.
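The difference between the three gates can be checked with a compact, function-based version of the same training rule. This is a sketch under the same settings as above (zero initialization, learning rate 0.1, 1000 epochs); the helper names `train_perceptron`, `predict_one`, and `accuracy` are hypothetical.

```python
import numpy as np

def predict_one(weights, bias, x):
    # Unit step activation applied to the weighted sum
    return 1.0 if np.dot(weights, x) + bias >= 0 else 0.0

def train_perceptron(X, y, learning_rate=0.1, epochs=1000):
    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            error = target - predict_one(weights, bias, xi)
            weights = weights + learning_rate * error * xi
            bias = bias + learning_rate * error
    return weights, bias

def accuracy(X, y, weights, bias):
    preds = np.array([predict_one(weights, bias, xi) for xi in X])
    return float(np.mean(preds == y))

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
targets = {
    "AND": np.array([0.0, 0.0, 0.0, 1.0]),
    "OR":  np.array([0.0, 1.0, 1.0, 1.0]),
    "XOR": np.array([0.0, 1.0, 1.0, 0.0]),
}

results = {}
for name, y in targets.items():
    w, b = train_perceptron(X, y)
    results[name] = accuracy(X, y, w, b)
    print(name, results[name])
```

AND and OR reach an accuracy of 1.0, while XOR stays below 1.0 no matter how long the single-layer model is trained, because no single straight line separates its two classes.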
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
import tensorflow as tf
import numpy as np

T = 1.0
F = 0.0

def get_XOR_data():
    X = [
        [F, F],
        [F, T],
        [T, F],
        [T, T],
    ]
    Y = [
        [F],
        [T],
        [T],
        [F],
    ]
    # NumPy arrays are a safe input format for Keras
    return np.array(X), np.array(Y)

X, Y = get_XOR_data()

X_test = np.array([
    [T, F],
    [T, T],
    [F, T],
    [F, F],
])
Y_test = np.array([
    [T],
    [F],
    [T],
    [F],
])
The first step is to import all the modules and define training and testing data as we did for single-layer Perceptron.
There are three things that we need to decide for each layer:
The number of neurons: It will be 16 so that the layer will learn the complex distribution of data points better.
Activation Function: ReLU, a common choice for hidden layers that also works well with our binary inputs.
The Sequential model indicates that data flows sequentially from one layer to the next. Dense is used to define the layers of the neural network with parameters like the number of neurons, input_shape, and activation function.
The hidden layer performs non-linear transformations of the inputs and helps in learning complex relations. We will use 16 neurons and ReLU as the activation function for this layer.
To design the output layer, we need to define the key constituents again first.
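To see why a hidden layer with ReLU is enough for XOR, consider the tiny hand-built network below. It uses only two hidden neurons instead of 16, and all weights are picked by hand rather than learned, purely as an illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

# Hand-picked (not learned) parameters: 2 inputs -> 2 hidden ReLU neurons -> 1 output
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])      # shape (2 inputs, 2 hidden neurons)
b_hidden = np.array([0.0, -1.0])
w_out = np.array([1.0, -2.0])

hidden = relu(X @ W_hidden + b_hidden)   # non-linear transformation of the inputs
output = hidden @ w_out                  # linear combination of hidden activations
print(output)   # [0. 1. 1. 0.]
```

The first hidden neuron counts how many inputs are on, and the second one fires only when both are on. Subtracting twice the second from the first reproduces XOR, which no single linear unit can do.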
Number of neurons: The output layer has neurons equal to the number of the output variables. One in our case.
Activation Function: The sigmoid function, which squashes the output into the range [0, 1]. The result can be read as a probability and then converted to either 0 or 1 by comparing it with a threshold such as 0.5.
Loss Function: We commonly use binary cross-entropy as the loss function for binary classification problems.
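As a rough sketch of what binary cross-entropy measures, the function below compares two sets of made-up predictions against the XOR test targets. The `eps` clipping is a common safeguard against taking the log of 0 and is an assumption here, not something the text above specifies.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    # Clip probabilities so that log(0) never occurs
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)
    return float(-np.mean(losses))

y_true = np.array([1.0, 0.0, 1.0, 0.0])

# Illustrative predictions: both are on the correct side of 0.5
confident = np.array([0.9, 0.1, 0.9, 0.1])
unsure = np.array([0.6, 0.4, 0.6, 0.4])

loss_confident = binary_cross_entropy(y_true, confident)
loss_unsure = binary_cross_entropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

Both sets of predictions would be classified correctly, but the less confident one receives a higher loss, which is what pushes the network toward sharper outputs during training.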
Optimization Algorithm: To optimize the cost and reduce the error or loss, we need to update the parameters like weight and bias. So for this, we will use gradient descent.
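Putting the choices above together, the model definition might look like the sketch below. The post does not show this step, so this is an assumption about how the pieces fit: the input shape is declared with `tf.keras.Input`, and the `"sgd"` optimizer string (stochastic gradient descent) stands in for gradient descent.

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

model = Sequential([
    tf.keras.Input(shape=(2,)),        # two input features per sample
    Dense(16, activation="relu"),      # hidden layer: 16 neurons with ReLU
    Dense(1, activation="sigmoid"),    # output layer: 1 neuron with sigmoid
])

# Binary cross-entropy loss, gradient-descent-based optimizer, accuracy as the metric
model.compile(loss="binary_crossentropy", optimizer="sgd", metrics=["accuracy"])
model.summary()
```

The hidden layer has 2 * 16 + 16 = 48 parameters and the output layer has 16 + 1 = 17, for 65 trainable parameters in total.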
After compiling the model, it's time to fit the training data with an epoch value of 100. After training the model, we will calculate the accuracy score and print the predicted output on the test data.
model.fit(X, Y, epochs=100)
loss, accuracy = model.evaluate(X_test, Y_test, verbose=0)
print('Accuracy: %.2f' % (accuracy * 100))
print(loss)
Final Output: [[0.706279 ], [0.21512125], [0.70059645], [0.49652937]]
Expected Output: [1, 0, 1, 0]
We can say that the multi-layer perceptron performed well and can learn XOR properties. After the successful implementation of MLP, neural networks became very popular and opened vast opportunities to solve complex problems with great accuracy. At the end of this blog, there are two use cases that MLP can easily solve.
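The sigmoid outputs reported above can be turned into class labels by thresholding. The sketch below uses the final output values from this run and the XOR test labels defined earlier; the 0.5 threshold is the usual convention and is assumed here.

```python
import numpy as np

# Sigmoid outputs reported above for X_test
final_output = np.array([[0.706279], [0.21512125], [0.70059645], [0.49652937]])

# XOR labels for the test inputs [T,F], [T,T], [F,T], [F,F]
Y_test = np.array([[1.0], [0.0], [1.0], [0.0]])

# Values at or above 0.5 become 1, everything else becomes 0
predicted_labels = (final_output >= 0.5).astype(float)
accuracy = float(np.mean(predicted_labels == Y_test))

print(predicted_labels.ravel(), accuracy)
```

All four labels match, although the last output sits just below the threshold, so this particular prediction is correct by a very small margin.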
These are some basic steps one must follow to train a neural network.
These steps can be performed by writing a few lines of code in Keras or PyTorch using the inbuilt algorithms, but instead of using them as a black box, we should know the ins and outs of those algorithms. That was the purpose of coding the perceptron from scratch.
The MNIST dataset is the most famous dataset of handwritten digits used for character recognition. Almost every algorithm has been fitted on this dataset to evaluate its performance, with a highest reported accuracy of 99.91%. Although many algorithms perform better than MLP, this dataset is perfect for practising neural network implementation. To understand this dataset in detail and understand how a model can be built on it, look at this blog.
The Iris dataset is best for understanding which features are important to predict the flower species. Every machine learning or neural network curriculum takes this dataset as a reference to teach model building. This dataset is good for starting with neural networks. Unlike MNIST, it is tabular: each flower is already described by four numeric measurements, so no image flattening is needed before feeding it into a neural network. Please refer to this blog to learn more about this dataset and its implementation.
This blog is intended to familiarize you with the crux of neural networks and show how neurons work. The choice of parameters like the number of layers, neurons per layer, activation function, loss function, optimization algorithm, and epochs can be a game changer. With the support of Python libraries like TensorFlow, Keras, and PyTorch, setting these parameters becomes easier and can be done in a few lines of code. Stay with us and follow up on the next blogs for more content on neural networks.
If you have any queries/doubts/feedback, please write us at firstname.lastname@example.org. Enjoy machine learning!