Based on the nature of the input we provide to a machine learning algorithm, machine learning can be classified into four major categories - Supervised Learning, Unsupervised Learning, Semi-Supervised Learning, and Reinforcement Learning.

In today's era of Data Science, Machine Learning, and Artificial Intelligence, supporting libraries and frameworks are easily accessible. We can build an entire Machine Learning pipeline in just a few lines of code using frameworks like Scikit-Learn, TensorFlow, and Keras. But this convenience encourages us to treat ML algorithms as black boxes, which is not good from a learning point of view. From a learning perspective, it is of utmost importance to know exactly how these methods work.

That's why, in this article, we will implement the complete machine learning process without using any of those frameworks. We will use only the NumPy and Matplotlib libraries to build our Machine Learning model from scratch.

So let's first import the NumPy library.

```
import numpy as np
```

Let's take a simple problem so that we can follow each step thoroughly. For this article, we want our machine to learn the mapping function Y = -(X² + X - 2).

Please note that we are going to solve a Regression problem, and for that, we need our input data in the Supervised format. One thing is clear from the problem statement: this will be a parametric model, as we need to learn fixed values of the parameters. We will use a classical Machine Learning algorithm to build a non-probabilistic model.

Did you notice something?

We categorized our Machine Learning model on five different bases, which we discussed in this blog. Let's quickly generate our supervised data.

For this problem, we can choose values of X and generate the corresponding Y values using the above formula. Here, we will use NumPy's **arange** function to pick values from the range -5 to +5 at intervals of 0.2. For example, [-5, -4.8, -4.6, …, +4.6, +4.8]. Note: +5 is excluded; hence we will have 50 data samples.

```
X = np.arange(-5,5,0.2)
#generating the corresponding Y
Y = -1*(X**2 + X - 2)
```

Let's first visualize the data that we generated in the previous step. For plotting the curve, we will use the Matplotlib library.

```
from matplotlib import pyplot as plt
plt.figure("Data Figure")
plt.scatter(X, Y, color='red', label='data')
plt.xlabel('X', fontsize=16)
plt.ylabel('Y', fontsize=16)
plt.grid()
plt.show()
```

But this function is too simple to learn because we have perfect data. Why not add some impurity to the data so that it does not perfectly follow the quadratic equation? The noisy relationship can be written as Y = f(X) + ε.

Here, f is the function from the problem statement, and **ε** is the impurity or noise, having zero mean and σ² variance. To add this noise to the data, we will use the **random.normal** function from the NumPy library, which takes three input parameters: the mean, the standard deviation, and the number of samples to generate. We can control the magnitude of this noise by changing the value of `magnitude` in the code below.

```
magnitude = 2
Y = -1*(X**2 + X - 2) + magnitude*np.random.normal(0,1,len(X))
```

After this, the data will look something like the figure below.

Let's use this data as our final target variable. We want to fit a degree-2 polynomial to this dataset.

And for that, we need to learn the parametric values of θ0, θ1, and θ2 in Y = θ2·X² + θ1·X + θ0. If we convert our input into the form [X X²], then the weight and bias matrices will look like W = [θ1 θ2]ᵀ and b = θ0.

But we only have X in the dataset for now; we need to add another column, X², as a new feature. The function **make_polynomial_features** below will do that for us. It takes X and Y as input, calculates X², and concatenates it with the existing X values. One thing must be noted: the input we created earlier had a shape of (50,), which means the data is not in row-and-column format. First, we need to bring our data X into the space of rows and columns, which is done by the **reshape** function in the NumPy library. It takes the data (to be reshaped) and the new shape as input and returns the reshaped data.
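As a quick illustration of this reshape step (variable names here are just for demonstration):

```python
import numpy as np

# a flat array of 50 samples has shape (50,) -- no column axis yet
x = np.arange(-5, 5, 0.2)
print(x.shape)            # (50,)

# reshape into an explicit column vector of shape (50, 1)
x_col = np.reshape(x, (x.shape[0], 1))
print(x_col.shape)        # (50, 1)
```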

```
def make_polynomial_features(X, Y):
    X1 = np.reshape(X, (X.shape[0], 1))
    X2 = X**2
    X2 = np.reshape(X2, (X2.shape[0], 1))
    new_data = np.concatenate((X1, X2), axis=1)
    Y1 = np.reshape(Y, (Y.shape[0], 1))
    new_data = np.concatenate((new_data, Y1), axis=1)
    return new_data

X1 = make_polynomial_features(X, Y)
X = X1
np.random.shuffle(X)
```

Looking carefully, we also stacked the target variable into the final complete data. This ensures that the corresponding Y values stay in the same positions when we shuffle the input data X, which is done with the **shuffle** function of the NumPy library.
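A tiny check of this idea (with illustrative variable names): shuffling the stacked array permutes whole rows, so each Y stays attached to its X.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 10 * x                      # a known relationship: y = 10x
stacked = np.column_stack((x, y))

np.random.shuffle(stacked)      # shuffles rows in place

# every row still satisfies y == 10x after shuffling
print(np.allclose(stacked[:, 1], 10 * stacked[:, 0]))  # True
```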

Now, we have the data ready for our Machine Learning algorithm. We will use the famous **gradient descent** algorithm to guide our learning process. Our data is free from outliers, as we generated it from a fixed polynomial equation and added only slight noise. So we can use MSE (mean squared error) as our cost function.

We can also represent the relationship between X and Y as

Y = X_new · W + b, where X_new = [X X²], W = [θ1 θ2]ᵀ, and b = θ0.

We know that the updates in gradient descent happen in a specific way:

θ := θ − α · (∂J/∂θ), where α is the learning rate and J is the cost function.

And in the case of MSE, the gradients are

dW = (2/m) · X_newᵀ · (Ŷ − Y) and db = (2/m) · Σ(Ŷ − Y),

where m is the number of samples and Ŷ is the predicted value. **θ0** is the bias, and **θ1**, **θ2** form the weight matrix.

We are playing with matrices, so dimensional validity must always hold.

In the weight update, we need to multiply **error** and **X_new** such that the product is a (2 × 1) matrix; only then is addition or subtraction with the weight matrix possible. For that, we use the dot product between the error and X_new matrices. If we compute Transpose(X_new) · error, then a (2 × 50) matrix gets multiplied by a (50 × 1) matrix, resulting in the required (2 × 1) matrix.
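This shape bookkeeping can be verified directly with NumPy (illustrative arrays):

```python
import numpy as np

X_new = np.random.rand(50, 2)   # 50 samples, features [X, X^2]
error = np.random.rand(50, 1)   # (Y_predicted - Y) as a column vector

# (2 x 50) @ (50 x 1) -> (2 x 1), matching the weight matrix
grad = np.dot(X_new.T, error)
print(grad.shape)               # (2, 1)
```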

That was a lot of maths, but now we know everything we need. So, let's code our gradient descent algorithm from scratch.

```
def prediction(weight, bias, X):
    Y_predicted = np.dot(X, weight) + bias
    return Y_predicted
```

The **prediction** function above gives the predicted values for given weight, bias, and input matrices. We will need it to calculate predictions after every update of the weight and bias matrices. As we know, the learning rate is fixed for gradient descent, so let it be a small value of 0.001, and we will update the parameters 1000 times, assuming that many steps are sufficient to find the minimum of the cost function.

```
def fit_ml(X_train, Y_train, learning_rate=0.001, iterations=1000):
    num_samples, num_features = X_train.shape
    print("number of samples = ", num_samples)
    print("number of features = ", num_features)
    weight = np.zeros(num_features)
    bias = 0.1
    for i in range(iterations):
        Y_predicted = prediction(weight, bias, X_train)
        dw = (1/num_samples)*2*np.dot(X_train.T, (Y_predicted - Y_train))
        db = (1/num_samples)*2*np.sum(Y_predicted - Y_train)
        weight = weight - learning_rate*dw
        bias = bias - learning_rate*db
    return weight, bias
```

The function **fit_ml** is ready to be used.
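One optional refinement (not part of the original code): record the training MSE at each iteration to confirm that gradient descent is actually converging. A minimal sketch, reusing the same `prediction` helper and update rule as above:

```python
import numpy as np

def prediction(weight, bias, X):
    return np.dot(X, weight) + bias

def fit_ml_with_history(X_train, Y_train, learning_rate=0.001, iterations=1000):
    num_samples, num_features = X_train.shape
    weight = np.zeros(num_features)
    bias = 0.1
    history = []
    for _ in range(iterations):
        Y_predicted = prediction(weight, bias, X_train)
        history.append(np.mean((Y_predicted - Y_train) ** 2))  # track MSE
        dw = (2 / num_samples) * np.dot(X_train.T, (Y_predicted - Y_train))
        db = (2 / num_samples) * np.sum(Y_predicted - Y_train)
        weight -= learning_rate * dw
        bias -= learning_rate * db
    return weight, bias, history

# quick smoke test on noiseless quadratic data
X = np.arange(-5, 5, 0.2)
X_feat = np.column_stack((X, X ** 2))
Y = -1 * (X ** 2 + X - 2)
w, b, hist = fit_ml_with_history(X_feat, Y)
print(hist[0] > hist[-1])   # True: the loss decreased
```

Plotting `history` against the iteration index gives a quick visual check that the learning rate is neither too small (flat curve) nor too large (diverging curve).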

We cannot use 100% of the data for training. To check the model's performance, we also need to test it on unseen data. That's why we will split the available dataset into training and testing samples. As the samples are already shuffled, we can take the first 80% for training and the remaining 20% for testing.

```
def make_spliting(data, split_percentage=0.8):
    split_index = int(split_percentage*len(data))
    X_train = data[:split_index, :-1]
    Y_train = data[:split_index, -1]
    print(X_train.shape, Y_train.shape)
    X_test = data[split_index:, :-1]
    Y_test = data[split_index:, -1]
    return X_train, Y_train, X_test, Y_test

X_train, Y_train, X_test, Y_test = make_spliting(X)
plt.figure("Data")
plt.scatter(X_train[:,0], Y_train, color='red', label='train_data')
plt.scatter(X_test[:,0], Y_test, color='green', label='test_data')
plt.xlabel("X", fontsize=16)
plt.ylabel("Y", fontsize=16)
plt.legend(loc="upper right")
plt.grid()
plt.show()
```

Let's choose the evaluation metric to be MSE. The function **evaluation_mse** below will calculate it, taking the predicted and actual Y values as input.

```
def evaluation_mse(Y_predicted, Y_actual):
    m = len(Y_actual)
    mse = 0
    for i in range(m):
        mse = mse + (Y_predicted[i] - Y_actual[i])**2
    MSE = (1/m)*mse
    return MSE
```
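The same metric can also be computed in one vectorized line with NumPy; a quick check that both versions agree (illustrative arrays):

```python
import numpy as np

def evaluation_mse(Y_predicted, Y_actual):
    m = len(Y_actual)
    mse = 0
    for i in range(m):
        mse = mse + (Y_predicted[i] - Y_actual[i]) ** 2
    return (1 / m) * mse

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.5, 2.0, 2.0])
# the loop and the vectorized form give the same result
print(np.isclose(evaluation_mse(a, b), np.mean((a - b) ** 2)))  # True
```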

We are ready to train our Machine Learning algorithm from scratch. After calling the **fit_ml** function, we will have the learned parameter values, and we can say that our machine has learned.

```
#Below line will give us the weight and bias matrix values
weight, bias = fit_ml(X_train, Y_train)
print("Training Done")
#Below line is used to predict on train data
Y_train_pred = prediction(weight, bias,X_train)
print("Mean Squared Error = ",evaluation_mse(Y_train_pred, Y_train))
plt.figure("Training")
plt.scatter(X_train[:,0], Y_train, color='red', label='train_data')
plt.scatter(X_train[:,0], Y_train_pred, label='prediction')
plt.xlabel("X", fontsize=16 )
plt.ylabel("Y", fontsize=16 )
plt.legend(loc="upper right")
plt.grid()
plt.show()
###Mean Squared Error = 5.584458137818953
```

Now, we have the trained parameters (the weight and bias matrices). We can test how well these values perform on completely unseen data samples. So, let's do that.

```
Y_test_pred = prediction(weight, bias, X_test)
print("Mean Squared Error = ",evaluation_mse(Y_test_pred, Y_test))
plt.figure("Testing")
plt.scatter(X_test[:,0], Y_test, color='green', label='test_data')
plt.scatter(X_test[:,0], Y_test_pred,label='prediction')
plt.xlabel("X", fontsize=16 )
plt.ylabel("Y", fontsize=16 )
plt.legend(loc="upper right")
plt.grid()
plt.show()
#Mean Squared Error = 5.86669209809574
```

We cannot draw a solid line directly because the samples are shuffled; if we pass them to Matplotlib's plot function as-is, the line will be a mess. To fix this, we first sort the samples by their X values and then use the corresponding predicted Y values to plot the solid line.

```
Y_pred = np.reshape(prediction(weight, bias, X1[:,:-1]), (len(X1[:,:-1]), 1))
plot_x = np.concatenate((np.reshape(X1[:,0], (len(X1[:,0]),1)),Y_pred), axis=1)
plot_x = plot_x[plot_x[:, 0].argsort(kind='mergesort')]
plt.figure("All plot")
plt.scatter(plot_x[:,0],Y, color='red', label='Actual Values')
plt.plot(plot_x[:,0],plot_x[:,1], label='predicted function')
plt.xlabel("X", fontsize=16)
plt.ylabel("Y", fontsize=16)
plt.legend(loc="upper right")
plt.show()
```

The complete code can be seen below.

```
import numpy as np
from matplotlib import pyplot as plt

def prediction(weight, bias, X):
    Y_predicted = np.dot(X, weight) + bias
    return Y_predicted

def fit_ml(X_train, Y_train, learning_rate=0.001, iterations=1000):
    #initialize the weights
    num_samples, num_features = X_train.shape
    print(num_samples, num_features)
    weight = np.zeros(num_features)
    bias = 0.1
    for i in range(iterations):
        Y_predicted = prediction(weight, bias, X_train)
        dw = (1/num_samples)*2*np.dot(X_train.T, (Y_predicted - Y_train))
        db = (1/num_samples)*2*np.sum(Y_predicted - Y_train)
        weight = weight - learning_rate*dw
        bias = bias - learning_rate*db
    return weight, bias

def evaluation_mse(Y_predicted, Y_actual):
    m = len(Y_actual)
    mse = 0
    for i in range(m):
        mse = mse + (Y_predicted[i] - Y_actual[i])**2
    MSE = (1/m)*mse
    return MSE

def make_polynomial_features(X, Y):
    X1 = np.reshape(X, (X.shape[0], 1))
    X2 = X**2
    X2 = np.reshape(X2, (X2.shape[0], 1))
    new_data = np.concatenate((X1, X2), axis=1)
    Y1 = np.reshape(Y, (Y.shape[0], 1))
    new_data = np.concatenate((new_data, Y1), axis=1)
    return new_data

def make_spliting(data, split_percentage=0.8):
    split_index = int(split_percentage*len(data))
    X_train = data[:split_index, :-1]
    Y_train = data[:split_index, -1]
    print(X_train.shape, Y_train.shape)
    X_test = data[split_index:, :-1]
    Y_test = data[split_index:, -1]
    return X_train, Y_train, X_test, Y_test

#generate the noisy quadratic data
X = np.arange(-5, 5, 0.2)
Y = -1*(X**2 + X - 2) + 2*np.random.normal(0, 1, len(X))

plt.figure("Data Figure")
plt.scatter(X, Y, color='red', label='data')
plt.xlabel('X', fontsize=16)
plt.ylabel('Y', fontsize=16)
plt.grid()
plt.show()

X1 = make_polynomial_features(X, Y)
X = X1
np.random.shuffle(X)

X_train, Y_train, X_test, Y_test = make_spliting(X)
plt.figure("Data")
plt.scatter(X_train[:,0], Y_train, color='red', label='train_data')
plt.scatter(X_test[:,0], Y_test, color='green', label='test_data')
plt.xlabel("X", fontsize=16)
plt.ylabel("Y", fontsize=16)
plt.legend(loc="upper right")
plt.grid()
plt.show()

weight, bias = fit_ml(X_train, Y_train)
print("Training Done")
Y_train_pred = prediction(weight, bias, X_train)
print("Mean Squared Error = ", evaluation_mse(Y_train_pred, Y_train))
Y_test_pred = prediction(weight, bias, X_test)
print("Mean Squared Error = ", evaluation_mse(Y_test_pred, Y_test))

Y_pred = np.reshape(prediction(weight, bias, X1[:,:-1]), (len(X1), 1))
plot_x = np.concatenate((np.reshape(X1[:,0], (len(X1), 1)), Y_pred), axis=1)
plot_x = plot_x[plot_x[:, 0].argsort(kind='mergesort')]
plt.figure("All plot")
plt.scatter(plot_x[:,0], Y, color='red', label='Actual Values')
plt.plot(plot_x[:,0], plot_x[:,1], label='predicted function')
plt.xlabel("X", fontsize=16)
plt.ylabel("Y", fontsize=16)
plt.legend(loc="upper right")
plt.show()

plt.figure("Training")
plt.scatter(X_train[:,0], Y_train, color='red', label='train_data')
plt.scatter(X_train[:,0], Y_train_pred, label='prediction')
plt.xlabel("X", fontsize=16)
plt.ylabel("Y", fontsize=16)
plt.legend(loc="upper right")
plt.grid()
plt.show()

plt.figure("Testing")
plt.scatter(X_test[:,0], Y_test, color='green', label='test_data')
plt.scatter(X_test[:,0], Y_test_pred, label='prediction')
plt.xlabel("X", fontsize=16)
plt.ylabel("Y", fontsize=16)
plt.legend(loc="upper right")
plt.grid()
plt.show()
```

That is indeed a lot of code. But the same pipeline can be written in 10–15 lines using frameworks. Not only that, the code inside the frameworks is highly efficient, often using vectorized and multithreaded implementations. That's how framework code takes significantly less time and space compared to ours.
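For comparison, here is a sketch of the same pipeline using scikit-learn (assuming scikit-learn is installed). Note that `LinearRegression` solves the least-squares problem in closed form rather than with our gradient descent loop, so the fitted parameters may differ slightly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# same synthetic data as before
X = np.arange(-5, 5, 0.2).reshape(-1, 1)
Y = -1 * (X[:, 0] ** 2 + X[:, 0] - 2) + 2 * np.random.normal(0, 1, len(X))

# build the [X, X^2] feature matrix and split 80/20
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
X_train, X_test, Y_train, Y_test = train_test_split(X_poly, Y, test_size=0.2)

# closed-form least squares instead of our gradient descent loop
model = LinearRegression().fit(X_train, Y_train)
print("Coefficients:", model.coef_, "bias:", model.intercept_)
print("Test MSE =", mean_squared_error(Y_test, model.predict(X_test)))
```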

In this article, we successfully built a Machine Learning model by coding it from scratch, and demonstrated exactly how a machine learning algorithm works using gradient descent. We hope you enjoyed the article.
