Coding a Machine Learning Algorithm From Scratch


In today's era of Data Science, Machine Learning, and Artificial Intelligence, the support of libraries and frameworks is easily accessible. We can build an entire Machine Learning model pipeline with just a few lines of code using frameworks like Scikit-Learn, TensorFlow, Keras, etc. But this support forces us to treat ML algorithms as black boxes, which is not good from a learning point of view. From a learning perspective, it is of utmost importance to know exactly how these algorithms work.

That's why, in this article, we will implement the complete machine learning process without using any of these frameworks. We will only use the NumPy and Matplotlib libraries to build our Machine Learning model from scratch.

So let's first import the NumPy library.

import numpy as np

Problem Definition

Let's take a simple problem so that we can follow each step thoroughly. For this article, we want our machine to learn the mapping function:

Y = f(X) = -(X² + X - 2)

Please note that we are going to solve a Regression problem, and for that, we need our input data to be in the Supervised format. One thing is clear from the problem statement: this will be a parametric model, as we need to learn fixed values of the parameters. We will be using a classical Machine Learning algorithm to build a non-probabilistic model.

Did you notice something?

We categorized our Machine Learning model on five different bases, which we discussed in this blog. Let's quickly generate our supervised data.

Data Generation

For this problem, we can choose values of X and generate the corresponding Y values using the above formula. Here, we will use NumPy's arange function to pick values from the range of -5 to +5 at every 0.2 interval, for example, [-5, -4.8, -4.6, …, +4.6, +4.8]. Note: +5 is excluded; hence we will have 50 data samples.

X = np.arange(-5,5,0.2)

#generating the corresponding Y
Y = -1*(X**2 + X - 2)

Data Analysis and Pre-processing

Let's first visualize the data that we generated in the previous step. For plotting the curve, we will use the Matplotlib library.

from matplotlib import pyplot as plt

plt.figure("Data Figure")

plt.scatter(X, Y, color='red', label='data')

plt.xlabel('X', fontsize=16)
plt.ylabel('Y', fontsize=16)

plt.grid()
plt.show()

Data generated without noise

But it's too simple to learn this function as we have perfect data. Why not add some impurity (noise) to the data so that it no longer follows the quadratic equation perfectly?

Y = f(X) + ε

Here f is the given problem function, and ε is the impurity or noise having zero mean and σ² variance. To add this impurity to the data, we will use the random.normal function from the NumPy library, which takes three input parameters: the mean, the standard deviation, and the number of samples to generate. We can control the strength of this noise by changing the value of magnitude in the code below.

magnitude = 2
Y = -1*(X**2 + X - 2) + magnitude*np.random.normal(0,1,len(X))

After this, the data will look something like the figure shown below.

Data plotting with noise

Let's use this data as our final target variable. We want to fit a 2-degree polynomial to this dataset:

Y = θ0 + θ1·X + θ2·X²

And for that, we need to learn the parameter values θ0, θ1, and θ2. If we convert our input into the form [X X²], then the weight and bias matrices will look like:

X_new = [X  X²],   Weight = [θ1, θ2].T,   bias = θ0

But we only have X as the dataset for now; we need to make another column of X² as a new feature. The function make_polynomial_features below will do that for us. It takes X and Y as input, calculates X², and concatenates it with the existing X values. One thing must be noted: the input we created earlier had a shape of (50,), which means the data is not yet in row-and-column format. So we first need to bring our data X into the space of rows and columns, which is done by the reshape function of the NumPy library. It takes the data (to be reshaped) and the new shape as input and returns the reshaped data.

def make_polynomial_features(X, Y):

    #reshape X from shape (50,) to (50, 1)
    X1 = np.reshape(X, (X.shape[0], 1))

    #create the X^2 feature and reshape it the same way
    X2 = X**2
    X2 = np.reshape(X2, (X2.shape[0], 1))

    #stack the columns [X, X^2]
    new_data = np.concatenate((X1, X2), axis=1)

    #append Y as the last column so rows stay aligned during shuffling
    Y1 = np.reshape(Y, (Y.shape[0], 1))
    new_data = np.concatenate((new_data, Y1), axis=1)

    return new_data

X1 = make_polynomial_features(X, Y)
X = X1
np.random.shuffle(X)

Looking carefully, you can see that we also stacked the target variable into the final complete data. This makes sure that the corresponding Y value stays with its X row when we shuffle the input data, which is done with the shuffle function of the NumPy library.
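As a quick sanity check (assuming the code above has been executed), the stacked array should now have one row per sample and three columns:

#X now holds [X, X^2, Y] stacked column-wise; rows stay aligned while shuffling
print(X.shape)   #expected output: (50, 3)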

Model Building

Now, we have the data ready for our Machine Learning algorithm. We will use the famous gradient descent algorithm to guide our learning process. Our data is free from outliers, as we generated it from a fixed polynomial equation and added only slight noise, so we can use MSE (mean squared error) as our cost function.
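For reference, the MSE cost that gradient descent will minimize over our N training samples can be written as:

J(Weight, bias) = (1/N) · Σ (Y_predicted - Y)²,   where Y_predicted = X_new · Weight + bias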

We can also represent the relationship between X and Y as,

Y = X_new · Weight + bias,   where X_new = [X  X²]

We know that the parameter updates in gradient descent happen in a specific way:

repeat until convergence:
    θ := θ - α · (∂J/∂θ)
where α is the learning rate and J is the cost function.

And in the case of MSE, the gradients with respect to the parameters (over N training samples) are:

∂J/∂θ0 = (2/N) · Σ (Y_predicted - Y)

∂J/∂θ1 = (2/N) · Σ (Y_predicted - Y) · X,   ∂J/∂θ2 = (2/N) · Σ (Y_predicted - Y) · X²

Here, θ0 is the bias and [θ1, θ2].T is the Weight matrix for our use case. Hence, the update terms in gradient descent will look like:

Weight := Weight - α · (2/N) · Transpose(X_new) · (Y_predicted - Y)

bias := bias - α · (2/N) · Σ (Y_predicted - Y)

We are playing with matrices, so the dimensions must always stay consistent.

X_new: (50 X 2),   Weight: (2 X 1),   error = (Y_predicted - Y): (50 X 1),   bias: scalar

In the weight update term, we need to multiply the error and X_new so that the result is a (2 X 1) matrix; only then is addition or subtraction with the weight matrix possible. For that, we will use the dot product between the error and X_new matrices. If we perform the operation Transpose(X_new) · error, then a (2 X 50) matrix gets multiplied with a (50 X 1) matrix, resulting in the required (2 X 1) matrix.
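Below is a small sketch to verify these shapes with NumPy. The arrays here are just random placeholders used to check the dimensions, not the actual training data:

import numpy as np

X_new = np.random.rand(50, 2)   #input matrix with columns [X, X^2]
error = np.random.rand(50, 1)   #(Y_predicted - Y) as a column vector

dw = np.dot(X_new.T, error)     #(2 X 50) dot (50 X 1) -> (2 X 1)
print(dw.shape)                 #(2, 1), the same shape as the Weight matrix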

That was a lot of math, but now we know everything we need. So, let's code our gradient descent algorithm from scratch.

def prediction(weight, bias, X):

    Y_predicted = np.dot(X, weight) + bias

    return Y_predicted

The above function, prediction, gives us the predicted values for given weight, bias, and input matrices. This function will be needed to calculate the predicted values after every update of the weight and bias matrices. The learning rate is fixed for gradient descent, so let's keep it at a small value of 0.001, and we will update the parameters 1000 times, assuming that many steps will be sufficient to reach the minimum of the cost function.

def fit_ml(X_train, Y_train, learning_rate=0.001, iterations=1000):

    num_samples, num_features = X_train.shape

    print("number of samples = ",num_samples)
    print("number of features = ",num_features)
    
    weight = np.zeros(num_features)
    bias = 0.1

    for i in range(iterations):

        Y_predicted = prediction(weight, bias, X_train)

        #gradients of MSE with respect to the weight and bias
        dw = (1/num_samples)*2*np.dot(X_train.T, (Y_predicted - Y_train))
        db = (1/num_samples)*2*np.sum((Y_predicted - Y_train))

        #gradient descent update of the parameters
        weight = weight - learning_rate*dw
        bias = bias - learning_rate*db

    return weight, bias

The function fit_ml is ready to be used.

Data Splitting

We cannot use 100% of the data for training purposes. To check the model's performance, we also need to test it on some unseen data. That's why we will split the available dataset into training and testing samples. As the samples are already shuffled, we can choose the first 80% of data samples for training and the remaining 20% for testing the model's performance.

def make_spliting(data, split_percentage=0.8):

    #first 80% of the rows: every column except the last is a feature, the last is the target
    X_train = data[:int(split_percentage*(len(data))), :-1]
    Y_train = data[:int(split_percentage*(len(data))), -1]

    print(X_train.shape, Y_train.shape)

    #remaining 20% of the rows for testing
    X_test = data[int(split_percentage*(len(data))):, :-1]
    Y_test = data[int(split_percentage*(len(data))):, -1]

    return X_train, Y_train, X_test, Y_test

X_train, Y_train, X_test, Y_test = make_spliting(X)

plt.figure("Data")
plt.scatter(X_train[:,0], Y_train, color='red', label='train_data')
plt.scatter(X_test[:,0],Y_test,color='green',label='test_data')
plt.xlabel("Samples", fontsize=16 )
plt.ylabel("Y", fontsize=16 )
plt.legend(loc="upper right")
plt.grid()
plt.show()

Training and testing data

Evaluation Metric

Let's choose the evaluation metric to be MSE. The function evaluation_mse below will calculate it, taking Y_predicted and Y_actual as input parameters and returning the MSE.

def evaluation_mse(Y_predicted, Y_actual):

    m = len(Y_actual)
    mse = 0
    
    for i in range(m):

        mse = mse + (Y_predicted[i] - Y_actual[i])**2

    MSE = (1/m)*mse
    return MSE
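The loop above is written out for clarity. As a side note, the same metric can be computed in a single vectorized line with NumPy; an equivalent sketch (the function name here is just for illustration):

def evaluation_mse_vectorized(Y_predicted, Y_actual):

    #mean of the element-wise squared differences
    return np.mean((np.asarray(Y_predicted) - np.asarray(Y_actual))**2)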

Training

We are ready to train our Machine Learning algorithm from scratch. After calling the fit_ml function, we will have the learned parameter values, and we can say that our machine has learned.

#Below line will give us the weight and bias matrix values
weight, bias = fit_ml(X_train, Y_train)

print("Training Done")

#Below line is used to predict on train data
Y_train_pred = prediction(weight, bias,X_train)

print("Mean Squared Error = ",evaluation_mse(Y_train_pred, Y_train))

plt.figure("Training")
plt.scatter(X_train[:,0], Y_train, color='red', label='train_data')
plt.scatter(X_train[:,0], Y_train_pred, label='prediction')
plt.xlabel("X", fontsize=16 )
plt.ylabel("Y", fontsize=16 )
plt.legend(loc="upper right")
plt.grid()
plt.show()



#Mean Squared Error = 5.584458137818953

Model results on training data

Testing

Now, we have the trained parameters (weight and bias matrices). We can test how well these values work on completely unseen data samples. So, let's do that.

Y_test_pred = prediction(weight, bias, X_test)

print("Mean Squared Error = ",evaluation_mse(Y_test_pred, Y_test))

plt.figure("Testing")
plt.scatter(X_test[:,0], Y_test, color='green', label='test_data')
plt.scatter(X_test[:,0], Y_test_pred,label='prediction')
plt.xlabel("X", fontsize=16 )
plt.ylabel("Y", fontsize=16 )
plt.legend(loc="upper right")
plt.grid()
plt.show()



#Mean Squared Error = 5.86669209809574

Model results on testing data

Testing on all available data and drawing a solid line

We cannot draw a solid line directly because the samples are shuffled, and if we use Matplotlib's plot function on them as-is, the line will be a mess. To avoid this, we first sort the samples according to their X values and then use the corresponding predicted Y values to plot the solid line.

#predict on the full (shuffled) dataset
Y_pred = np.reshape(prediction(weight, bias, X1[:,:-1]), (len(X1[:,:-1]), 1))

#pair each X value with its prediction and sort the pairs by X for a clean line plot
plot_x = np.concatenate((np.reshape(X1[:,0], (len(X1[:,0]),1)), Y_pred), axis=1)
plot_x = plot_x[plot_x[:, 0].argsort(kind='mergesort')]


plt.figure("All plot")
plt.scatter(plot_x[:,0],Y, color='red', label='Actual Values')
plt.plot(plot_x[:,0],plot_x[:,1], label='predicted function')
plt.xlabel("X", fontsize=16)
plt.ylabel("Y", fontsize=16)
plt.legend(loc="upper right")
plt.show()

Model results on all data with solid line

Keeping everything in one place

The complete code can be seen below.

import numpy as np

def prediction(weight, bias, X):

    Y_predicted = np.dot(X, weight) + bias

    return Y_predicted


def fit_ml(X_train, Y_train, learning_rate=0.001, iterations=1000):

    #initialize the weights

    num_samples, num_features = X_train.shape

    print(num_samples, num_features)

    weight = np.zeros(num_features)
    bias = 0.1


    for i in range(iterations):

        #print("Iteration no.: ", i)

        Y_predicted = prediction(weight, bias, X_train)

        dw = (1/num_samples)*2*np.dot(X_train.T, (Y_predicted - Y_train))
        db = (1/num_samples)*2*np.sum((Y_predicted - Y_train))

        weight = weight - learning_rate*dw
        bias = bias - learning_rate*db

    return weight, bias



def evaluation_mse(Y_predicted, Y_actual):

    m = len(Y_actual)
    mse = 0
    for i in range(m):

        mse = mse + (Y_predicted[i] - Y_actual[i])**2

    MSE = (1/m)*mse
    return MSE

X = np.arange(-5,5,0.2)

Y = -1*(X**2 + X - 2) + 2*np.random.normal(0,1,len(X))

from matplotlib import pyplot as plt
plt.figure("Data Figure")

plt.scatter(X, Y, color='red', label='data')
plt.xlabel('X', fontsize=16)
plt.ylabel('Y', fontsize=16)

plt.grid()
plt.show()


def make_polynomial_features(X, Y):

    X1 = np.reshape(X, (X.shape[0], 1))

    X2 = X**2
    X2 = np.reshape(X2, (X2.shape[0], 1))
    new_data = np.concatenate((X1, X2), axis=1)

    Y1 = np.reshape(Y, (Y.shape[0], 1))
    new_data = np.concatenate((new_data, Y1), axis=1)

    return new_data

#print(len(X))

X1 = make_polynomial_features(X, Y)
X = X1
np.random.shuffle(X)

def make_spliting(data, split_percentage=0.8):

    X_train = data[:int(split_percentage*(len(data))), :-1]
    Y_train = data[:int(split_percentage*(len(data))), -1]

    print(X_train.shape, Y_train.shape)

    X_test = data[int(split_percentage*(len(data))):, :-1]
    Y_test = data[int(split_percentage*(len(data))):, -1]

    return X_train, Y_train, X_test, Y_test

X_train, Y_train, X_test, Y_test = make_spliting(X)

plt.figure("Data")
plt.scatter(X_train[:,0], Y_train, color='red', label='train_data')
plt.scatter(X_test[:,0],Y_test,color='green',label='test_data')
plt.xlabel("X", fontsize=16 )
plt.ylabel("Y", fontsize=16 )
plt.legend(loc="upper right")
plt.grid()
plt.show()

weight, bias = fit_ml(X_train, Y_train)

print("Training Done")

Y_train_pred = prediction(weight, bias,X_train)
print("Mean Squared Error = ",evaluation_mse(Y_train_pred, Y_train))

Y_test_pred = prediction(weight, bias, X_test)
print("Mean Squared Error = ",evaluation_mse(Y_test_pred, Y_test))

Y_pred = np.reshape(prediction(weight, bias, X1[:,:-1]), (len(X1[:,:-1]), 1))
plot_x = np.concatenate((np.reshape(X1[:,0], (len(X1[:,0]),1)),Y_pred), axis=1)

plot_x = plot_x[plot_x[:, 0].argsort(kind='mergesort')]


plt.figure("All plot")
plt.scatter(plot_x[:,0],Y, color='red', label='Actual Values')
plt.plot(plot_x[:,0],plot_x[:,1], label='predicted function')
plt.xlabel("X", fontsize=16)
plt.ylabel("Y", fontsize=16)
plt.legend(loc="upper right")
plt.show()

plt.figure("Training")
plt.scatter(X_train[:,0], Y_train, color='red', label='train_data')
plt.scatter(X_train[:,0], Y_train_pred, label='prediction')
plt.xlabel("X", fontsize=16 )
plt.ylabel("Y", fontsize=16 )
plt.legend(loc="upper right")
plt.grid()
plt.show()


plt.figure("Testing")
plt.scatter(X_test[:,0], Y_test, color='green', label='test_data')
plt.scatter(X_test[:,0], Y_test_pred,label='prediction')
plt.xlabel("X", fontsize=16 )
plt.ylabel("Y", fontsize=16 )
plt.legend(loc="upper right")
plt.grid()
plt.show()

It is indeed a long piece of code. But the same thing can be done within 10–15 lines using frameworks. Not only that, the code inside these frameworks is highly optimized (vectorized and often multithreaded), which is why it takes significantly less time and space compared to ours.
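For comparison, here is a rough sketch of the same pipeline using Scikit-Learn. It is illustrative only: LinearRegression solves the least-squares problem directly rather than via gradient descent, and the exact numbers will differ because of the random noise and split.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

#generate the same noisy quadratic data
X = np.arange(-5, 5, 0.2).reshape(-1, 1)
Y = -1*(X[:, 0]**2 + X[:, 0] - 2) + 2*np.random.normal(0, 1, len(X))

#build [X, X^2] features, split 80/20, fit, and evaluate
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
X_train, X_test, Y_train, Y_test = train_test_split(X_poly, Y, test_size=0.2)
model = LinearRegression().fit(X_train, Y_train)
print("Test MSE = ", mean_squared_error(Y_test, model.predict(X_test)))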

Conclusion

In this article, we successfully built a Machine Learning model by coding it from scratch and demonstrated exactly how a machine learning algorithm learns using gradient descent. We hope you enjoyed the article.

Enjoy Learning! Enjoy Coding! Enjoy Algorithms!
