Based on the nature of the algorithm used, we can classify machine learning models into three categories: Statistical Machine Learning, Artificial Neural Networks, and Deep Learning. Artificial Neural Networks (ANNs), also known as Neural Networks (NNs), are algorithms inspired by the human brain. Although the comparison between ANNs and the human brain is superficial, the analogy helps us understand ANNs in their simplest form. With this article, we start our journey toward understanding this branch of Machine Learning in greater detail.

Here, we will discuss the following topics:

- An analogy with the human brain and a definition of ANNs in plain English.
- Explanations of the constituent terms in the definition of ANNs.
- Key insights from the schematic diagram of an NN.
- Advantages and disadvantages of ANNs.
- Practical use cases of ANNs in real life.

The human brain learns over time from many experiences and circumstances. Neurons are the basic building blocks of the human brain. Billions of these neurons, along with their interconnections, store what we have learned from those experiences. When our sensory organs, like eyes, skin, and ears, perceive a similar situation again, the brain responds in a similar way. Let's take the example of learning to drive. Our brain experiences many situations in the environment and learns how to respond to each of them. After learning, based on signals from our eyes, nose, and ears, our brain sends responses to various body parts to keep the vehicle on the road.

Every new event we encounter modifies the learning stored in the brain's neurons. Schematically, what the human brain learns is a mapping from input signals to response signals. Neural networks mimic exactly this property: they have neurons and interconnections that store the learning gained from many data samples.

If we define the term ANN in plain English, we can say:

Neural Networks are user-defined nested mathematical functions with user-induced variables that can be modified on a systematic trial-and-error basis to arrive at the closest mathematical relationship between given pairs of input and output.

Let's look at the terms used in the above definition:

- **User-defined:** Machine Learning developers define an arbitrary mathematical function that includes some adjustable parameters. If users can define this function, why do we need ML at all? The user only fixes the general form of the function as a starting point; otherwise, the machine would keep trying infinitely many possible functions on the given dataset. So, users define the function to limit the machine's search.
- **Nested mathematical functions:** In the diagram above, there is just one neuron in the hidden layer, but we can have multiple neurons in multiple hidden layers between the input and output layers. Schematically, with two hidden layers, the input of the first hidden layer is the input layer, and the output of the first hidden layer is the input of the second hidden layer. Finally, the output of the second hidden layer is passed on to produce the actual output we want from the NN. The overall process is nested, so we can say that these are nested mathematical functions.
- **Mathematical functions:** Machines represent the relationship between input and output pairs as mathematical functions. Each mathematical function has two main components:
  1. **Aggregation function:** The inputs are multiplied by weights and shifted by a bias (the user-induced variables) to calculate a weighted sum.
  2. **Activation function:** An activation function is applied to the weighted sum produced by the aggregation function to introduce nonlinearity into the input-output relationship.

  This nested nonlinearity helps machines learn the complex patterns present in our dataset. In our later blogs, we will learn about the activation functions in greater detail.
- **User-induced variables:** Machines try to find the set of parameters that brings the final function as close as possible to the actual function behind the data. These parameters can be thought of as weights applied to the inputs to decide their importance. For example, if the function is Output = 2*Input + 3, then 2 is the weight applied to the input. If the input is multi-dimensional, the weight also becomes multi-dimensional. These weights are trainable parameters and get modified while learning from the input and output samples.
- **Trial-and-error basis:** The average of the differences between the function's output and the actual output is called the cost function. Machines change the values of the user-induced variables to make the cost function as low as possible. This change happens systematically so that the cost function decreases progressively.
- **Systematic:** Suppose the user-defined mathematical function cannot capture the complex patterns in the data. Then either some other form of mathematical function is defined, or the nestedness is increased so that machines can learn the complex patterns. All these changes happen systematically.
- **Closest mathematical relation:** Suppose the user-defined mathematical function is a*X + b, where X is the input and a and b are user-induced variables that machines can modify while fitting the most suitable mathematical relationship. Suppose the actual dataset was generated from the function 2*X + 3. From the given samples and the number of iterations the machine took to modify a and b, it may only be able to find the relationship 1.9*X + 3.2. This does not overlap entirely with the true function, but it is the closest mathematical relation that the machine found under the given conditions.
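To make the trial-and-error idea concrete, here is a minimal sketch that fits a*X + b to noisy samples of 2*X + 3. It assumes mean squared error as the cost function and plain gradient descent as the systematic update rule; neither is prescribed by the definition above. The names `fit_line` and `mean_squared_error`, the learning rate, the iteration count, and the noise level are all hypothetical choices made for illustration.

```python
import random

def mean_squared_error(a, b, xs, ys):
    """Cost function: average squared gap between prediction and target."""
    return sum(((a * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_line(xs, ys, learning_rate=0.01, iterations=2000):
    """Fit y ~ a*x + b by gradient descent on the mean squared error."""
    a, b = 0.0, 0.0  # user-induced variables, arbitrary starting values
    n = len(xs)
    for _ in range(iterations):
        # Error of the current guess on every sample
        errors = [(a * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the cost with respect to a and b
        grad_a = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Systematic update: step against the gradient to lower the cost
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b

random.seed(0)
xs = [i / 10 for i in range(50)]                       # inputs from 0.0 to 4.9
ys = [2 * x + 3 + random.gauss(0, 0.2) for x in xs]    # noisy samples of 2*X + 3

a, b = fit_line(xs, ys)
print(f"learned relation: {a:.2f}*X + {b:.2f}")
print(f"cost before: {mean_squared_error(0.0, 0.0, xs, ys):.3f}")
print(f"cost after:  {mean_squared_error(a, b, xs, ys):.3f}")
```

Because the samples are noisy and the number of iterations is finite, the learned a and b land near 2 and 3 without matching them exactly, which is the "closest mathematical relation" idea in practice.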

With all this, we should now understand what exactly is present in any Neural Network. Now let's learn how exactly this nestedness works.
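As a rough illustration of that nesting, the sketch below passes an input through two hidden layers and an output layer, where each neuron performs aggregation followed by activation. It assumes a sigmoid activation and uses hand-picked, untrained weights; the function names (`neuron`, `layer`, `network`) and all numeric values are hypothetical and chosen only to show the data flow.

```python
import math

def sigmoid(z):
    """Activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One neuron: aggregation (weighted sum plus bias), then activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def layer(inputs, weight_rows, biases):
    """A layer applies each of its neurons to the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

def network(inputs, layers):
    """Nesting: the output of each layer becomes the input of the next."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations

# Hypothetical parameters: 2 inputs -> 3 neurons -> 2 neurons -> 1 output
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.2, -0.5, 0.3], [0.7, 0.1, -0.4]], [0.05, -0.05]),
    ([[1.0, -1.0]], [0.0]),
]

output = network([1.0, 2.0], layers)
print(output)
```

Written out, the call is equivalent to `layer3(layer2(layer1(x)))`, which is why the definition describes a neural network as a nested mathematical function.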

While designing the structure of Neural networks, we need to keep these insights in mind.

- Each neuron in the input layer corresponds to one feature of the dataset. So, if we have 50 features in a dataset, the Input layer will have 50 neurons.
- The number of output categories is the total number of neurons in the Output layer. For example, if there are 10 categories in the output, then the Output layer will have 10 neurons.
- The number of hidden layers and the number of neurons in each hidden layer are pre-defined in the network and are not trainable. These non-trainable settings are called hyperparameters and are tuned through multiple experiments on the same dataset.
- Every neuron in a layer is connected to every neuron in the adjacent layers. For example, suppose hidden layer 1 has 20 neurons and hidden layer 2 has 60 neurons. Then each of the 20 neurons in hidden layer 1 is connected to all 50 neurons of the input layer and to all 60 neurons in hidden layer 2.
- Every neuron in the hidden and output layers has one trainable parameter called the bias, and every connection between neurons carries a trainable parameter called a weight. The weights between two adjacent layers are usually collected into a weight matrix, and the biases of a layer into a bias vector.
- The total number of trainable parameters in a Neural Network with 50 input features, 20 neurons in hidden layer 1, 60 neurons in hidden layer 2, and 10 output categories is biases + weights. Biases: 20 + 60 + 10 = 90, and weights: 50*20 + 20*60 + 60*10 = 2800, so the total number of trainable parameters is 2890.
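The parameter count above can be reproduced with a few lines of code. This is a small sketch for fully connected networks only; the helper name `count_parameters` is hypothetical, and it takes the layer sizes in order from input to output.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the neurons per layer, from input to output.
    """
    # One weight per connection between each pair of adjacent layers
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    # One bias per neuron in every layer except the input layer
    biases = sum(layer_sizes[1:])
    return weights, biases, weights + biases

weights, biases, total = count_parameters([50, 20, 60, 10])
print(weights, biases, total)  # 2800 90 2890
```

Changing the list, for example to `[50, 40, 10]`, is a quick way to see how the choice of hidden layers affects the number of parameters the network has to learn.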

Some of the key advantages of Neural Networks are:

- ANNs are capable of learning complex non-linear relationships between input and output data.
- When trained well, ANNs can generalize their learning to unseen data and predict with low error.
- There is no restriction on the distribution of the input data. ANNs work well even with data following heterogeneous distributions. This property makes them flexible enough to use with broad categories of data.
- ANNs are fairly robust to noise. Their predictions are not affected too much by noise in the data.

Although ANNs have numerous advantages, there are some drawbacks as well. Some of them are:

- Training ANNs is computationally expensive and requires better hardware support. If we increase the number of hidden layers or the number of nodes in any hidden layer, the need for more powerful processors increases.
- The non-explainable nature of ANNs is one of their key disadvantages. They do not explain why and how they arrived at their predictions.
- There is no fixed way of designing ANNs. We need to tune numerous hyperparameters like the number of layers, number of neurons, nature of activation function, and many more.

ANNs are advisable when there is a high nonlinearity in the dataset. Some of the most prominent areas where data distributions follow highly complex patterns are:

- **Optical Character Recognition:** OCR is a complex problem because the characters present in images involve highly complex non-linear relationships. We can feed the ANN a large input, e.g., the complete picture in matrix form, and it automatically processes the input and finds the complex relationships present. OCR, facial recognition, and handwritten document verification are some real-life applications of this kind.
- **Stock market price prediction:** Forecasting stock prices is a challenging task. Market behavior is hard to predict, but nowadays many companies use NNs to anticipate whether prices will go higher or lower. Better predictions of future prices from past prices can be worth a great deal of money to these companies.

In this article, we learned about the basics of Artificial Neural Networks by decoding the constituent terms in their definition. We learned about the learnable parameters present in any Neural Network and how to calculate their total number. Finally, we saw some practical, real-life applications of ANNs. We hope you enjoyed the article.
