Machine Learning is showing tremendous potential these days, but compared to human intelligence, it is still in its earliest stage and localized to the problem for which it is developed. For example, an ML model developed to predict cancer cells will not predict the presence of cats in those same images. At the current stage, it cannot observe multiple dimensions and gain intelligence from them. A Machine Learning model can learn only those specific things its algorithm is designed for, so we can say it is controlled by the humans who designed it.
A group of scientists is also researching Artificial General Intelligence (AGI), which concerns machines that can make decisions on their own. The concept of AGI is too complex to discuss here.
At the current stage, we can say that any Machine Learning approach amounts to finding interdependencies within data.
If we try to answer the question "What are the different types of Machine Learning?", we will get different answers, like Classification and Regression, Supervised and Unsupervised, Probabilistic and Non-probabilistic, and many more.
BUT! Have you ever wondered why there are different classification schemes for the same Machine Learning?
To find the answer, let’s quickly walk through the simplest pipeline that any machine learning approach follows to solve a given problem statement.
Looking at this pipeline, we can quickly identify five different bases on which Machine Learning can be classified:
- Nature of Input Data
- Nature of Problem
- Nature of Algorithm
- Nature of Solution
- Nature of Output Data
Let’s start classifying it on every basis one by one.
Classification Based on Nature of Inputs
Based on the type of input data used to train the algorithm, Machine Learning problems can be classified into four different categories.
- Supervised Learning:- Supervised learning is where we have input variables (X) and an output variable (Y), and we use a machine learning algorithm to learn the mapping function from the input to the output.
It is called supervised learning because the process of learning from training data can be thought of as a teacher supervising the learning process. In the figure below, the teacher explicitly provides information about the input data in the form of annotations.
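The supervised mapping from X to Y can be sketched in plain Python. This is a toy example with made-up numbers, assuming the simplest possible model (a straight line fitted by least squares); the "teacher" is the labeled output column.

```python
# A minimal sketch of supervised learning on toy data: the "teacher"
# supplies a label y for every input x, and the model learns the
# mapping y ≈ w*x + b by ordinary least squares.

def fit_line(xs, ys):
    """Learn slope w and intercept b from labeled pairs (x, y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x
    return w, b

# Labeled training data: the supervision signal is the output column.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]          # hidden rule the model should discover: y = 2x + 1

w, b = fit_line(xs, ys)
print(w, b)                # learned mapping: w ≈ 2.0, b ≈ 1.0
```

Once the mapping is learned, the model can answer for inputs it has never seen, which is exactly what the teacher was preparing it for.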
- Unsupervised Learning:- In unsupervised learning, we only have input data (X) and no corresponding output variables, so there is no teacher to supervise. This approach is mainly used to dive deeper into data analysis. If you compare the image below with the one above, we have no annotation information about the input data, yet our model can still segregate the items based on their color and size. This segregation, done by the machine on its own, is termed Unsupervised Learning.
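The size-based segregation described above can be sketched with a tiny two-cluster k-means loop. The data and starting centroids are made up for illustration; no labels appear anywhere.

```python
# A minimal sketch of unsupervised learning on toy 1-D data: no labels
# are given, yet a simple k-means loop still separates the points into
# two cohesive groups by size alone.

def kmeans_1d(points, c1, c2, iters=10):
    """Two-cluster 1-D k-means: assign each point to the nearest
    centroid, then move each centroid to its cluster mean."""
    for _ in range(iters):
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1 = sum(g1) / len(g1)
        c2 = sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

sizes = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]   # unlabeled "sizes"
small, large = kmeans_1d(sizes, c1=0.0, c2=10.0)
print(small, large)                      # small cluster, large cluster
```

The machine never saw a "small" or "large" label; the groups emerge purely from the structure of the inputs.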
- Semi-Supervised Learning:- Problems where we have a large amount of input data (X) but only some of it is annotated (Y) are called semi-supervised learning problems, e.g., a large dataset of images in which some are annotated (labeled) and some are not. In the above image, suppose the input data carries explicit annotations for the circle and triangle, but the annotations for the rectangle and hexagon are missing. This scenario falls under the category of Semi-Supervised Learning.
Most real-world data lies in this category, as labeling data is time-consuming and requires domain experts.
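One simple way the partially labeled scenario above can play out is label propagation: unlabeled points borrow the label of their nearest labeled neighbor. The 1-D data and shape labels here are invented for illustration.

```python
# A minimal sketch of semi-supervised learning on toy 1-D data: only
# some points are annotated (label None means unlabeled), and a
# nearest-labeled-neighbor rule propagates labels to the rest.

data = [(1.0, "circle"), (1.1, None), (5.0, "triangle"), (5.2, None)]

labeled = [(x, y) for x, y in data if y is not None]

def propagate(x):
    """Give an unlabeled point the label of its closest labeled point."""
    return min(labeled, key=lambda pair: abs(pair[0] - x))[1]

filled = [(x, y if y is not None else propagate(x)) for x, y in data]
print(filled)   # every point now carries a label
```

A few expert-provided labels were enough to annotate the whole dataset, which is why this setting is so attractive when labeling is expensive.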
- Reinforcement Learning (RL):- In supervised learning, we present our data to the algorithm as input-output pairs. In RL, the algorithm works as an agent in an environment with a set of possible actions to choose from. The agent selects the best action available in the current environment state and, based on that selection, receives a reward or risk. The algorithm learns by trying to maximize the reward or minimize the risk.
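The action-reward loop described above can be sketched with a toy two-armed bandit. The reward values, action names, and learning rate are all assumptions for illustration; the agent sees only the rewards its actions produce.

```python
# A minimal sketch of reinforcement learning: a two-armed bandit.
# The agent tries actions, receives rewards from the environment, and
# keeps a running value estimate per action; over repeated interaction
# it learns to prefer the better arm.

rewards = {"left": 0.2, "right": 1.0}   # environment; hidden from the agent
values = {"left": 0.0, "right": 0.0}    # the agent's reward estimates
alpha = 0.5                             # learning rate

actions = ["left", "right"] * 10        # simple fixed exploration schedule
for action in actions:
    reward = rewards[action]            # environment's feedback
    # incremental update: move the estimate toward the observed reward
    values[action] += alpha * (reward - values[action])

best = max(values, key=values.get)
print(best)                             # the agent now prefers "right"
```

Note there are no labeled input-output pairs anywhere: the only training signal is the reward received after each chosen action.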
Classification Based on Nature of the Problem
Based on the type of problem that we are trying to solve, we can classify Machine Learning problems into three different categories.
- Classification Problem:- Classification is a problem that requires machine learning algorithms to learn how to assign a class label to examples from the problem domain. A very intuitive example is classifying images into one of two labels, “Dog” or “Not A Dog.”
- Regression Problem:- Regression is a problem that requires machine learning algorithms to learn to predict continuous variables. An elementary example is predicting the temperature of a city. (Temperature can take any numeric value, say between -50 and +50 degrees Celsius.)
- Clustering Problem:- Clustering is a type of problem that requires machine learning algorithms to group the given data records into a specified number of cohesive units. A simple example is grouping lemons according to their sizes.
Note:- Clustering apparently resembles the classification problem, but the major difference is that classification is Supervised Learning while clustering is Unsupervised Learning.
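The note above can be made concrete on the lemon-size example. The sizes and labels below are made up, and both methods are deliberately simplistic: classification consults labeled examples, while clustering splits the very same sizes without ever seeing a label.

```python
# A toy contrast between classification (supervised) and clustering
# (unsupervised) on lemon sizes.

train = [(3.0, "small"), (3.5, "small"), (7.0, "large"), (7.5, "large")]

# Classification: 1-nearest-neighbor, which uses the labels in `train`.
def classify(size):
    return min(train, key=lambda pair: abs(pair[0] - size))[1]

# Clustering: split around the overall mean; the labels are never seen.
sizes = [s for s, _ in train] + [4.0, 6.8]
mean = sum(sizes) / len(sizes)
groups = ([s for s in sizes if s < mean], [s for s in sizes if s >= mean])

print(classify(4.0))   # assigns a class label: "small"
print(groups)          # unlabeled groups of similar sizes
```

Both produce a grouping of lemons, but only the classifier can name its groups, because only it was trained with labels.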
Classification Based on Nature of the Algorithm
Based on the Nature of the Algorithm used in the Machine Learning process, Machine learning can be classified into three categories.
- Classical Machine Learning:- Algorithms that use statistical and mathematical equations to derive the relations in training data come under this category. These algorithms are also called Statistical Machine Learning algorithms. They have the advantage of explainability (the ability to explain why a certain prediction was made for a given input), e.g., K-means, Decision Trees, Random Forest, Support Vector Machines (SVMs), etc.
- Neural Networks:- Algorithms inspired by the human brain. These algorithms build a complex mathematical model with a large number of trainable parameters, which are trained using the training data. Neural networks seem quite promising but face many limitations as model complexity increases. They also have trouble capturing complex dependencies like temporal/spatial dependencies. (Temporal dependencies depend on time, and spatial dependencies depend on space.)
- Deep-Learning:- The basic principle of Deep Learning is the same as Neural Networks, but advancements in the layer architecture are introduced to overcome Neural Networks' limitations. Deep-learning algorithms are capable of learning spatial/temporal relations in training data, but their major drawback is non-explainability.
Nowadays, Explainable AI is in high demand in industry. Examples of these algorithms are Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory (LSTM) networks.
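The trainable-parameter idea shared by Neural Networks and Deep Learning can be sketched in plain Python. This is a hand-built toy, assuming fixed, made-up weights rather than learned ones, just to show that a network is "a mathematical model controlled by its parameters."

```python
# A tiny two-layer network forward pass: the model's behavior is fully
# determined by its parameters (weights and biases), which real
# training would adjust from data.

def relu(v):
    """Elementwise ReLU non-linearity."""
    return [max(0.0, x) for x in v]

def dense(x, weights, bias):
    """One fully connected layer: out_j = sum_i x_i * w_ji + b_j."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b
            for row, b in zip(weights, bias)]

x = [1.0, 2.0]                                        # input features
hidden = relu(dense(x, weights=[[0.5, -1.0], [1.0, 1.0]], bias=[0.0, 0.0]))
output = dense(hidden, weights=[[1.0, 1.0]], bias=[0.5])
print(output)                                         # single output value
```

Stacking more such layers (convolutional, recurrent, etc.) is what turns this basic construction into the deep architectures named above.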
Classification Based on Nature of Solution
Based on solution nature, one can classify machine learning into two different categories.
Naturally, ML algorithms are designed to learn from historical input data, make inferences from that historical data, and predict outputs for future inputs.
To predict the output, a model can take one of two approaches:-
- Parametric Models:- Models that summarize the training data into a fixed set of parameters, so prediction needs only those parameters and the new input. Such models take the analogy from the training data and expect the same analogy to hold in the unseen testing data. Linear Regression and Neural Networks are examples of parametric models.
- Non-Parametric Models:- Models whose prediction depends on the input features together with the training examples the model has stored. In this approach, the predicted output value is derived from the output values of similar scenarios identified in the training data. KNN and Decision Trees are examples of non-parametric models.
Classification Based on Nature of Output Data
Based on the Nature of output, Machine learning can be classified into two different categories.
- Probabilistic Models:- These give the output in the form of probabilities, reflecting the prediction's confidence. Classification problems, for example, suit probabilistic models well, since the algorithm predicts each label with a certain confidence.
E.g., suppose our model looks at one picture and says it is 60% sure that a dog is present in that picture. CNNs are one example of this category.
- Non-Probabilistic Models:- These models make predictions but do not provide any measure of the prediction's quality, although external methods can still measure the error between the predicted and actual values. Decision Trees and SVMs are some examples that lie under this category.
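The two output styles can be contrasted with a toy "dog detector." The raw score and the use of the logistic (sigmoid) function to turn it into a probability are assumptions for illustration; the point is only the shape of the output.

```python
# Probabilistic vs non-probabilistic output for the same model score.

import math

def probabilistic_predict(score):
    """Return (label, confidence): the sigmoid maps the raw score to a
    probability, so the caller learns how sure the model is."""
    p_dog = 1.0 / (1.0 + math.exp(-score))
    return ("dog" if p_dog >= 0.5 else "not a dog", p_dog)

def non_probabilistic_predict(score):
    """Return only a hard label, with no confidence attached."""
    return "dog" if score >= 0.0 else "not a dog"

label, confidence = probabilistic_predict(0.4)
print(label, confidence)                # label plus ~0.6 confidence
print(non_probabilistic_predict(0.4))   # just "dog", nothing more
```

Both models agree on the label here, but only the probabilistic one tells you it is roughly a 60/40 call rather than a certainty.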
Critical questions to explore
Question 1: On what basis can the same model be classified as both a supervised model and a classification model?
Question 2: What is the difference between supervised and unsupervised learning?
Question 3: What is the difference between classification and clustering?
Question 4: What makes Reinforcement Learning different from supervised, unsupervised, and semi-supervised learning?
Question 5: How are Deep Learning and Neural Networks different?
Enjoy Thinking, Enjoy Machine Learning, Enjoy Algorithms!