Humans learn a great deal through vision: we see, we analyze, and we learn. Following the same pathway, machine learning models can now recognize the characters present in images. Recognizing characters with machine learning or computer vision, known as optical character recognition (OCR), has become very popular across industries. Google, Microsoft, Twitter, MathWorks, and many other industry giants use OCR techniques to solve a wide variety of tasks, including spam classification, automatic replies, and number-plate detection.
Character recognition is the primary step of determining whether any text is present in a given image and whether our algorithms can recognize it. From understanding text to identifying objects in an image to scene understanding, everything relies on the basic task of recognition. To understand it clearly, let’s take an example.
In the image shown below, character recognition can solve a simple problem such as identifying the characters on the number plate. This recognition can then be mapped to data association problems, such as identifying the vehicle with a particular ID and checking whether it was present at a certain location or violated any traffic law.
We all know what deep learning has accomplished so far. Many computer vision applications that were considered impossible a few years ago are now possible thanks to deep learning, which has attained state-of-the-art results on problems like OCR. Many deep-learning architectures have been developed over the years, focusing on the problems discussed above. We do need deeper neural networks to solve complex tasks, but for certain simple image classification problems, a linear model can do a decent job. The good things about linear models are:
The problem of simple character recognition can be solved using algorithms like Multi-Layer Perceptron (MLP), SVMs, Logistic Regression, etc.
In this article, we will describe the steps to implement a Logistic Regression classifier for character recognition. So let’s start without any further delay.
We will use the MNIST dataset, an open-source dataset of handwritten digit images, each 28x28 pixels in size. It is commonly available through scikit-learn and other machine-learning frameworks.
A sample image from the MNIST dataset
The above image is a sample from the MNIST handwritten digit dataset. We will use such samples to show how a linear model can perform the task of recognizing hand-written digits.
For cases where the image is already clean and human-readable, no restoration (cleaning of images), data augmentation (transformation of images), or domain adaptation (needed when the training and test data come from different distributions) is required. In such cases, a linear model can deliver decent accuracy.
We will use the sklearn library to import the data. The MNIST dataset has images of size 28x28 which, when flattened, give a vector of 784 values. Use the fetch_openml function from sklearn.datasets and load the flattened dataset using the name ‘mnist_784’. This dataset will be used to train the Logistic Regression classifier.
from sklearn.datasets import fetch_openml
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
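To make the flattening step concrete, here is a minimal sketch using NumPy only. The random `image` array is a hypothetical stand-in for a single MNIST sample (an assumption made so the snippet runs without downloading anything); the real pixel data comes from the `fetch_openml` call above.

```python
import numpy as np

# Hypothetical stand-in for one MNIST image: a 28x28 grid of pixel intensities (0-255).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Flatten row by row into a single 784-element feature vector.
flat = image.reshape(-1)

# Stacking such vectors gives an (n_samples, 784) matrix, the same layout as X above.
batch = np.stack([flat, flat])

# Flattening loses no information: reshaping recovers the original image.
restored = flat.reshape(28, 28)

print(flat.shape, batch.shape)
```

Reshaping a row of `X` back to 28x28 in the same way is also how a sample can be displayed as an image.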
The MNIST dataset has images of size 28x28, and we flatten them to give our classifier an input format on which it can be trained. Logistic Regression falls in the category of Generalized Linear Models (GLM) and is very much like linear regression, except that it predicts categorical target variables instead of continuous ones. This means that the output values of logistic regression are probabilities between 0 and 1, which are used to assign each observation to a particular category. The model can be imported from sklearn.linear_model, with tunable parameters such as the regularization strength and the penalty type.
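The following sketch shows one way to fit the classifier and inspect its probability outputs. As an assumption made to keep the example fast and offline, it uses scikit-learn's bundled `load_digits` dataset (8x8 digit images) as a stand-in for MNIST; the values chosen for `C` and `max_iter` are illustrative, not tuned.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

# Assumption: load_digits (8x8 images, 64 features) stands in for MNIST here.
X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values in this dataset range from 0 to 16

# C is the inverse of the regularization strength (smaller C = stronger penalty).
# The default penalty type is L2.
clf = LogisticRegression(C=1.0, max_iter=1000)
clf.fit(X, y)

# One row per sample, one column per class; each row sums to 1.
proba = clf.predict_proba(X[:5])

# The predicted class is the one with the highest probability.
predicted = clf.classes_[np.argmax(proba, axis=1)]
print(predicted)
```

With the MNIST arrays loaded earlier, the same calls should apply unchanged, only with 784 input features instead of 64.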
Three major steps that can be followed are:
First, we will see how the model performs under different scenarios. We will try different combinations by varying the pre-processing type and the strength of regularization. To evaluate the model, we will use accuracy as the metric.
Let us view what the weight matrix corresponding to each output class looks like.
When plotted, the weight matrix shows what parameters the model has learned and how it uses them to decide on a class when making a prediction.
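A comparison of this kind could be sketched as follows. The specific pre-processing options (no scaling, min-max scaling, standardization) and the `C` values are assumptions chosen for illustration, and `load_digits` is again used as a small offline stand-in for MNIST.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Assumption: load_digits stands in for MNIST to keep the run short.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Illustrative pre-processing choices; "passthrough" means no scaling.
preprocessors = {
    "none": "passthrough",
    "min-max": MinMaxScaler(),
    "standard": StandardScaler(),
}
# Smaller C means stronger regularization.
c_values = [0.01, 1.0]

results = {}
for name, scaler in preprocessors.items():
    for c in c_values:
        model = Pipeline([
            ("scale", scaler),
            ("clf", LogisticRegression(C=c, max_iter=2000)),
        ])
        model.fit(X_train, y_train)
        # Accuracy = fraction of test images classified correctly.
        results[(name, c)] = accuracy_score(y_test, model.predict(X_test))

for (name, c), acc in sorted(results.items()):
    print(f"preprocessing={name:<8} C={c:<5} accuracy={acc:.3f}")

best_setting = max(results, key=results.get)
```

The scaler is fitted on the training split only, inside the pipeline, so the test accuracy is not influenced by statistics of the test images.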
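One possible way to produce such a plot is sketched below. It assumes a fitted scikit-learn `LogisticRegression`, whose `coef_` attribute holds one weight vector per class; `plot_class_weights` is a hypothetical helper name, and `load_digits` (8x8 images) stands in for MNIST, where the reshape would be to 28x28 instead.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

# Assumption: load_digits (8x8 images) stands in for MNIST (28x28 images).
X, y = load_digits(return_X_y=True)
clf = LogisticRegression(max_iter=2000).fit(X / 16.0, y)

# coef_ has shape (n_classes, n_features); give each row back its image shape.
weight_images = clf.coef_.reshape(10, 8, 8)


def plot_class_weights(weight_images):
    """Hypothetical helper: draw one weight image per digit class."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(2, 5, figsize=(10, 4))
    # A symmetric color range keeps zero weights at the middle of the colormap.
    limit = np.abs(weight_images).max()
    for digit, ax in enumerate(axes.ravel()):
        ax.imshow(weight_images[digit], cmap="RdBu", vmin=-limit, vmax=limit)
        ax.set_title(f"Class {digit}")
        ax.axis("off")
    fig.tight_layout()
    return fig
```

Calling `plot_class_weights(weight_images)` followed by `plt.show()` displays the ten weight images. Pixels with large positive weights push the prediction toward that class, and pixels with large negative weights push it away.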
Character recognition has become an integral part of computer vision and analysis, and several corporations are actively working to improve upon the current state of the art, developing algorithms that can precisely categorize handwritten digits and notes. Optical Character Recognition (OCR) tools are now used in practice and achieve state-of-the-art results on such objectives.
Cloud Vision lets algorithm designers integrate vision detection features, including optical character recognition (OCR) and image labeling. These algorithms can be easily incorporated into the user’s overall pipeline. Check out the documentation to learn how to use these APIs and the overall package of services they offer.
Microsoft Azure’s Computer Vision API includes Optical Character Recognition (OCR) capabilities that extract printed or handwritten text from images. This API provides developers with access to advanced, state-of-the-art image processing algorithms. It works with several languages, making it one of the most used character recognition APIs.
Based on this project, interviewers can ask these questions in your machine learning interview:
In this article, we discussed one of the best machine learning applications: optical character recognition. We discussed the steps to implement a linear classifier on the MNIST dataset and evaluated its performance. After that, we discussed some use cases from companies that are currently using this technology. We hope you have enjoyed the article and seen how effective even linear machine learning models can be.