Classification of Machine Learning Models into Five Categories

At the current stage, we can say that in any Machine Learning approach, we try to find interdependencies or relations between input and output data. Suppose we try to find the answer to Different types of Machine Learning. In that case, we will get different answers, like Classification and Regression, Supervised and Unsupervised, Probabilistic and Non-probabilistic, and many more. BUT! Have you ever thought? Why are there different classification categories for the same Machine Learning? To find this answer, let's quickly see the most straightforward pipeline for any machine learning approach to solve any given problem.

classification of machine learning models pipeline

Five major components play a vital role in the pipeline, and the classification of Machine Learning depends upon the Nature of all these five components. Looking into this pipeline, we can quickly identify the five different bases on which Machine Learning can be classified.

  1. Nature of Input Data
  2. Nature of Problem
  3. Nature of Algorithm
  4. Nature of Solution
  5. Nature of Output Data

Let's start classifying it on every basis one by one.

Classification Based on the Nature of Inputs

Based on the type of Input data that is used to train the algorithms, Machine Learning problems can be classified into four different categories,

classification of machine learning models categories

Let's explore each of these categories in detail.

  • Supervised Learning: Supervised Learning is where we have input variables (X) and an output variable (Y). We use a machine-learning algorithm to learn the mapping function from the input to the output. It is called Supervised Learning because training data can be thought of as a teacher who is supervising the learning process. In the below figure, the teacher explicitly provides information on the input data in the form of labels.

classification of machine learning models categories 1

  • Unsupervised Learning: Here, we only have input data (X) and no corresponding output variables. So there is no teacher to supervise. But now we must be thinking, how will machines learn mapping functions then? In this exciting field, machines try to form a pseudo output and then learn mapping functions. Pseudo output can be like the similarity among data samples. In simple terms, we can say that they try to treat underlying characteristics present in the input data samples as their output data. No explicit output data is required.

    This approach is mainly used to dive deeper into data analysis. If we compare the above image with the image below, we don't have any information about the shape type of the input data. But our model can still segregate them based on their shape and size. This segregation, done by machines, can be termed Unsupervised Learning.

classification of machine learning models categories 2

  • Semi-Supervised Learning: Problems where we have a large amount of input data (X), and only some part is labeled (Y), are called semi-supervised learning problems. e.g., A large dataset of images in which some are annotated (labeled) and some are not. In the above image, suppose in input data, explicit annotation about the circle and triangle is given, but for rectangle and hexagon, it is missing. This scenario will fall under the category of Semi-Supervised Learning. Most of the real-world data lie in this category, as labeling data is time-consuming and requires expert human resources.

classification of machine learning models categories 3

  1. Reinforcement Learning (RL): In supervised Learning, we present our data in pairs of input and output to our algorithms. In Reinforcement Learning, our machines work as agents in a virtual environment and perform possible actions in that environment. For example, suppose we asked our machines to play chess. Here Machine is the agent, and the chessboard is the environment.

    Our agent is trying to move King, and we know that the King can perform specific movements only. So all the movements a king can make at any stage of the game are the possible set of actions that our agent can take. The agent selects the best action from all the options available in that environmental state and, based on that selection, receives rewards/risks. Simply put, our agent will not take any such move to land our King in checkmate condition. The algorithm tries to maximize the reward with the best set of actions or minimize the risk (of losing the game), and in this way, it learns.

classification of machine learning models categories 4

Classification Based on the Nature of the Problem

Based on the type of problem that we are trying to solve, we can classify the Machine learning problem into three different categories.

classification of machine learning models categories 5

Let's dive deeper into each of these categories.

  1. Classification Problem: Classification is a problem that requires machine learning algorithms that learn how to assign a class label to examples from the problem domain. An intuitive example is classifying images into two labels "Dog" or "Not A Dog."
  2. Regression Problem: Regression is a problem that requires machine learning algorithms that learn to predict continuous variables. An elementary example will be to predict the temperature of the city. (Temperature can take any numeric value between -50 to +50 degrees Celsius.)
  3. Clustering Problem: Clustering is a type of problem that requires using Machine Learning algorithms to group the given data samples into a specified number of groups. A simple example will be to group the lemons according to their sizes. Note:- It resembles the classification problem, but the major difference between these algorithms is that the Classification problem is Supervised Learning, while Clustering is Unsupervised Learning.

Classification Based on the Nature of the Algorithm

Based on the Nature of the Algorithm used in the Machine Learning process, Machine learning can be classified into three categories.

classification of machine learning models categories 6

  1. Classical Machine Learning: Algorithms that use Statistical and Mathematical equations to derive the relations in input and output data come under this category. These algorithms are also called Statistical Machine Learning algorithms. It has the advantage of explainability (the ability to explain the reason for certain predictions for the given input). e.g., K-means, Decision Trees, Random Forest, Support Vector Machine (SVM), Linear Regression, etc.
  2. Neural Networks:- NN Algorithms that are inspired by human brains. In the process of these algorithms, a complex mathematical model with many trainable parameters ( entries of weight and bias matrices) is built. These parameters are learned by using training data. Neural networks seem quite promising but face many limitations when the model's complexity increases. It also limits capturing complex dependencies like temporal/spatial dependencies. Temporal dependencies depend on time (how the input sample at time step t1 depends upon the input sample at time step t2), and Spatial dependencies depend upon space (how input collected at one condition depends upon input at different conditions).
  3. Deep-Learning:- Basic principle of Deep-learning is the same as Neural Networks, but some advancement in terms of the hidden layers' placement (architecture) is introduced to tackle Neural Networks' limitations. Deep-learning algorithms are capable of learning Spatial/Temporal relations in training data. But the major drawback of these algorithms is non-explainability. Examples of these algorithms are Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Terms Memory (LSTM), etc.

Classification Based on the Nature of the Solution

One can classify machine learning into two categories based on the solution nature.

classification of machine learning models categories 7

Naturally, ML algorithms are designed to learn the historical input data, make inferences from that historical data, and predict the output for future inputs. 
To predict the output, the model can take two approaches:-

  1. Parametric Models: Models that consider only future inputs to predict the output. Such models take the analogy from the training data and expect the same analogy to be followed in the unseen testing data. Linear regression and Neural Networks are examples of parametric models.

    In parametric models, we have a fixed number of parameters that do not depend upon the number of input samples. For example, if we have to learn the straight-line function, then the parameters we need to learn are "slope" and "intercept". Values of these parameters can change, but the number is fixed to two.

  2. Non- Parametric Models: The prediction of output depends on the Input features and previous outputs that the model predicted earlier. In this approach, the predicted output value is derived from the output values in similar scenarios identified from training data. K-Nearest Neighbor and Decision Trees are examples of non-parametric algorithms.

    We generally get confused that in non-parametric Learning, we don't even have parameters. But it's not at all true. We have parameters in a non-parametric condition, but the number of parameters depends upon the input samples. If we increase or decrease samples, we will have different parameters to learn.

 Classification Based on the Nature of Output Data

Based on the Nature of the output, Machine learning can be classified into two categories.

classification of machine learning models categories 8

  1. Probabilistic Models:It gives the output in the form of probabilities, reflecting the prediction's confidence. For example, the classification problem is relevant to probabilistic models as algorithms predict the label with a certain confidence. 
    E.g., Suppose our model looks into one picture and says it is 60% sure that a dog is present in that picture. CNN Algorithm is one of the examples in this category.

    We define decision boundaries in the case of probabilistic models. Let's say, If the model's confidence is greater than 60%, then only we will say that model predicted that class; otherwise, we will say that it did not predict that class. The value of this 60% is as per the user's need and can be changed accordingly.

  2. Non-Probabilistic Models: This model predicts but does not give any measure to know prediction quality. But there will be external methods to determine the error between the predicted and actual values. Decision Trees and SVM are some examples that lie under this category.

    In Regression problems, we directly get the desired output in the continuous format, but our model does not provide some information like how it is confident in predicting that. For example, if our model says that tomorrow's temperature will be 24.7 degrees Celcius. But we don't know whether the model is confident in saying that.

Critical questions to explore

Question 1: On what basis the same model can be classified as a supervised model and a classification model?
Question 2: What is the difference between supervised and unsupervised Learning?
Question 3: What is the difference between classification and clustering?
Question 4: What differentiates Reinforcement Learning from supervised, unsupervised, and semi-supervised Learning?
Question 5: How Deep-Learning and Neural Networks different?

Conclusion

In this article, we classified machine learning models based on five different categories. This article aimed to make learners aware of all the classifications of machine learning. Later, detailed descriptions of "supervised and unsupervised learning algorithms" and "classification and regression" problems will be presented.

References

  1. Machine Intelligence by Suresh Samudrala

Next Blog: Supervised vs. Unsupervised Learning

Previous Blog: Gradient Descent Algorithm

Enjoy Machine Learning, Enjoy Algorithms!

Share feedback with us

More blogs to explore

Our weekly newsletter

Subscribe to get weekly content on data structure and algorithms, machine learning, system design and oops.

© 2022 Code Algorithms Pvt. Ltd.

All rights reserved.