Introduction to APIs for Data Science and Machine Learning

An API (Application Programming Interface) is a mechanism that allows applications or servers to interact and share data. Developers and data scientists use the data obtained this way to extract meaning or find hidden, complex patterns. In data science and machine learning, APIs are mainly used for data gathering and model deployment.

Key points covered in this blog

  1. What is an API?
  2. Common terminology used in APIs.
  3. How does an API work?
  4. Why does a data scientist need to learn about APIs?
  5. Popular APIs used in data science and machine learning.

Let’s go through each point one by one.

What is an API?

An API is a set of protocols, routines, tools, and standards that enables software applications to communicate with each other. In other words, an API defines how software components should interact. It gives developers a way to access the functionality of an application or service without having to understand its underlying code.

APIs are widely used in web and mobile applications to integrate different software systems and exchange data between them. For example, suppose we open the followers tab in a social media app like Instagram, Facebook, or Snapchat. The app requests our followers list from its server through an API and then displays the result.

How does an API work in system design?

Let's understand this by looking at each word in the abbreviation API.

Application: It refers to the software, service, or code that a programmer wants to interact with or use in their own application.

Programming: This is the protocol established between the application and the interface. It is the set of rules both sides must follow to connect and perform actions. For example, APIs following the SOAP protocol return data in XML format, whereas RESTful APIs can return data in many formats, most commonly JSON.

Interface: The interface is an abstraction over the implementation. A User Interface (UI) lets users interact with an application. In the same way, an API lets programmers use an application from their own code. It provides a set of methods or functions that an application can call to perform specific actions or access specific data.
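The following minimal Python sketch makes the XML-versus-JSON difference concrete. It parses the same record twice, once as a SOAP-style XML payload and once as a REST-style JSON payload, using only the standard library. The payloads and field names are hypothetical and do not follow any real airline's format. A real SOAP message would also wrap its content in an Envelope element.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads describing the same seat-availability record
xml_payload = """
<flight>
  <number>XY123</number>
  <seatsAvailable>12</seatsAvailable>
</flight>
"""
json_payload = '{"number": "XY123", "seatsAvailable": 12}'

def parse_xml(payload: str) -> dict:
    """Read the record from XML; every value arrives as text and must be converted."""
    root = ET.fromstring(payload.strip())
    return {
        "number": root.findtext("number"),
        "seatsAvailable": int(root.findtext("seatsAvailable")),
    }

def parse_json(payload: str) -> dict:
    """Read the record from JSON; numbers are already typed."""
    return json.loads(payload)

print(parse_xml(xml_payload))
print(parse_json(json_payload))
```

Both calls produce the same dictionary. The JSON version needs less conversion code, which is one reason JSON is the usual choice for REST APIs.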

Let’s take another example. We book flight tickets through third-party apps or websites like ixigo, Paytm, MakeMyTrip, or Goibibo (Interface). These websites make an API call (Programming) to the system of the airline (Application) we wish to fly with. The call requests seat availability, and the website shows it to us. Once we book the ticket, another API call asks the airline to update its database.

Now, one question arises: how do airlines authenticate these requests and keep user data secure? To answer it, let’s first understand some common terminology.

Commonly used terminology in APIs

HTTP Methods

HTTP stands for HyperText Transfer Protocol, the protocol most web APIs use to send requests and responses. HTTP methods (also called verbs) tell the API what operation to perform. Four commonly used methods are:

  1. GET: Reads data. In the flight ticket example, the third-party app asks the airline for seat availability using a GET request.
  2. PATCH: Updates part of an existing record. When the booking is made, the third-party app sends a PATCH request to update the airline's database.
  3. POST: Creates a new resource on the server, just like when we post on Instagram or Facebook.
  4. DELETE: Removes a resource. When we cancel a booked ticket, a DELETE request tells the API to delete that record from the airline's database.
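The four methods can be sketched with Python's standard `urllib.request.Request`. The sketch builds the requests but never sends them. The base URL, paths, and payloads are hypothetical stand-ins for the flight example and are not a real airline API.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "https://api.example-airline.com"  # hypothetical; nothing is sent here

def build_request(method: str, path: str, payload: Optional[dict] = None) -> Request:
    """Build (but do not send) an HTTP request with the given method."""
    data = json.dumps(payload).encode() if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)

flight_requests = [
    build_request("GET", "/flights/XY123/seats"),                              # read seat availability
    build_request("PATCH", "/flights/XY123/seats/14C", {"status": "booked"}),  # update one field
    build_request("POST", "/passengers", {"name": "A. Traveller"}),            # create a new resource
    build_request("DELETE", "/bookings/987"),                                  # delete a record
]

for req in flight_requests:
    print(req.get_method(), req.full_url, req.data)
```

Only PATCH and POST carry a JSON body here. GET and DELETE identify their target through the URL alone.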

API requests and responses

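A request is the message a client sends to an API. It contains an HTTP method, an endpoint, and optionally headers and a body. The response is what the API returns after processing the request. It contains a status code (such as 200 for success or 404 for not found), headers, and usually a body holding the data, most often in JSON format.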
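To see one full request/response cycle, the following sketch starts a tiny local HTTP server with Python's standard `http.server` and queries it with `urllib.request.urlopen`. The `/flights/XY123/seats` endpoint and its JSON fields are hypothetical, modelled on the flight example. The server listens only on 127.0.0.1, on a port the operating system chooses.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class SeatHandler(BaseHTTPRequestHandler):
    """Hypothetical API: answers every GET with a fixed JSON body."""

    def do_GET(self):
        body = json.dumps({"flight": "XY123", "seats_available": 12}).encode()
        self.send_response(200)  # status line of the response
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)  # response body

    def log_message(self, format, *args):
        pass  # keep the console quiet

# Port 0 lets the operating system pick a free port
server = HTTPServer(("127.0.0.1", 0), SeatHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/flights/XY123/seats"

# Request: GET method + endpoint URL. Response: status code, headers, JSON body.
with urlopen(url) as response:
    status = response.status
    content_type = response.headers["Content-Type"]
    data = json.loads(response.read())

server.shutdown()
server.server_close()
print(status, content_type, data)
```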

API key

Here we get our answer. An organization gives each partner or developer who accesses its servers or resources a unique API key. The key is sent with every request, so the server can check whether the request comes from an authenticated partner. Airlines like Emirates, Qatar Airways, and Indigo provide API keys to third-party apps to control who can make requests and to keep user data safe.
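APIs usually accept the key in one of two places: in the URL as a query parameter, or in a request header. The TMDB example later in this blog uses the first style, with an `api_key` parameter. The sketch below builds both kinds of request with the Python standard library without sending them. The endpoint and key are placeholders, and the `Authorization: Bearer` header is only one common convention. Each API documents its own header or parameter name.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_KEY = "YOUR_API_KEY"  # placeholder; in practice load it from config, not source code
BASE_URL = "https://api.example-airline.com/flights/XY123/seats"  # hypothetical

# Style 1: key sent as a query parameter in the URL
query_request = Request(BASE_URL + "?" + urlencode({"api_key": API_KEY}))

# Style 2: key sent in a request header
header_request = Request(BASE_URL, headers={"Authorization": f"Bearer {API_KEY}"})

print(query_request.full_url)
print(header_request.get_header("Authorization"))
```

Header-based keys do not end up in URLs, so they are less likely to leak into browser history or server logs.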

Endpoints

An endpoint is the address at which an application or service exposes a particular resource or function. It works as a front door through which users and developers enter and use the service. When making API requests, we append the endpoint path to the base URL. For example:

/this-is-an-endpoint
/login
/website/about
https://this-is-the-url/this-is-an-endpoint
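Appending an endpoint to a base URL is simple string work, but doubled or missing slashes are a common source of bugs. The helper below is a minimal sketch; `build_url` is a hypothetical name, not a library function.

```python
def build_url(base_url: str, endpoint: str) -> str:
    """Append an endpoint path to a base URL with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + endpoint.lstrip("/")

print(build_url("https://this-is-the-url", "/this-is-an-endpoint"))
print(build_url("https://this-is-the-url/", "login"))
```

The sketch avoids `urllib.parse.urljoin` on purpose. With `urljoin`, an endpoint that starts with a slash replaces the whole path of the base URL instead of being appended to it.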

API Gateway

An API gateway is a gatekeeper that sits between clients and backend services. It receives API calls from clients, routes them to the right backend services, and handles tasks such as authentication and protection. It can also let a client get the results of many backend calls through a single API request. For example, on an e-commerce website, when we search for a smartwatch, the API gateway merges the search results with reviews, ratings, and recommendations and presents them to us together.
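The aggregation idea can be sketched in-process. Here, plain Python functions stand in for the separate search, review, and rating services that a real gateway would call over the network. All service names, data, and the `demo-key` value are hypothetical. The gateway checks the key once, fans out to the backends concurrently, and returns one merged response.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical backend services; in a real system each would be its own API call
def search_service(query):
    return [{"id": 1, "name": f"{query} A"}, {"id": 2, "name": f"{query} B"}]

def reviews_service(product_id):
    return {1: ["Great battery life"], 2: []}.get(product_id, [])

def ratings_service(product_id):
    return {1: 4.5, 2: 3.9}.get(product_id)

VALID_KEYS = {"demo-key"}  # hypothetical set of issued API keys

def gateway_search(query, api_key):
    """One client request in, several backend calls out, one merged response back."""
    if api_key not in VALID_KEYS:  # authentication is checked once, at the gateway
        return {"status": 401, "error": "invalid API key"}
    products = search_service(query)
    ids = [p["id"] for p in products]
    with ThreadPoolExecutor() as pool:  # call the backends concurrently
        reviews = list(pool.map(reviews_service, ids))
        ratings = list(pool.map(ratings_service, ids))
    results = [
        {**product, "reviews": r, "rating": s}
        for product, r, s in zip(products, reviews, ratings)
    ]
    return {"status": 200, "results": results}

print(gateway_search("smartwatch", "demo-key"))
```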

REST APIs

REST stands for REpresentational State Transfer. It is an architectural style for web services, defined by a set of constraints. A key reason for its popularity is statelessness: the server does not store any client session state between requests, so every request must carry all the information needed to process it.

Another advantage of REST is that it exposes data as resources (which can be thought of as objects). Every resource has a unique URL through which it can be accessed. APIs that satisfy the REST architectural constraints are called RESTful APIs.
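The following toy request handler illustrates both ideas: every resource has its own URL, and each call is handled using only the method, path, and body it receives. It is a minimal in-memory sketch with hypothetical movie data, not a web framework. The server keeps the resources themselves but no per-client session.

```python
import re
from typing import Optional

# Hypothetical in-memory store of movie resources
MOVIES = {1: {"id": 1, "title": "Movie A"}, 2: {"id": 2, "title": "Movie B"}}

def handle(method: str, path: str, body: Optional[dict] = None):
    """Route one request to a resource. No per-client session is kept:
    everything needed comes from the method, path, and body of this request."""
    if path == "/movies":  # the collection resource
        if method == "GET":
            return 200, list(MOVIES.values())
        if method == "POST":
            new_id = max(MOVIES, default=0) + 1
            MOVIES[new_id] = {"id": new_id, **(body or {})}
            return 201, MOVIES[new_id]
        return 405, {"error": "method not allowed"}
    match = re.fullmatch(r"/movies/(\d+)", path)  # a single resource, e.g. /movies/1
    if match:
        movie_id = int(match.group(1))
        if movie_id not in MOVIES:
            return 404, {"error": "movie not found"}
        if method == "GET":
            return 200, MOVIES[movie_id]
        if method == "DELETE":
            return 200, MOVIES.pop(movie_id)
        return 405, {"error": "method not allowed"}
    return 404, {"error": "unknown resource"}

print(handle("GET", "/movies/1"))
```

The same URL (`/movies/1`) supports different operations depending on the HTTP method. This is how RESTful APIs combine resources with the methods described earlier.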

Now that we have covered the basic concepts and terms used in APIs, let’s see how APIs help us build and deploy data science and machine learning models.

Why are APIs important in Data Science and ML?

When we think about data science, terms like AI, machine learning, and data preprocessing come to mind. So why should a data scientist know about APIs? Let’s go through the steps of building a machine learning model to answer this question.

Data gathering: The first step in developing any ML model is data collection. There are three common ways to get data: create our own, gather it from sensors, or use data that has already been collected. We mostly use the third option, and APIs are one of the main ways to access such data.

Suppose we are building a recommendation system for an over-the-top (OTT) platform. We need data on movies, web series, ratings, genres, reviews, and more to train our model. Rather than manually searching the internet for data on each movie, we can use the open API of TMDB (The Movie Database) to fetch it. The data is returned in JSON format, which can easily be converted to a DataFrame using the pandas pd.DataFrame() constructor.

Model deployment: When the model is ready, it can be exposed through an API for software developers to use. Developers send input data to the model's endpoints and receive predictions in the response, without needing to know how the model works internally.
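A minimal way to expose a model through an endpoint is a small WSGI application, built here with only the Python standard library. The sketch assumes a hypothetical "trained" model: a linear function with made-up weights. It serves the model at a hypothetical `POST /predict` endpoint. The `call_app` helper simulates a request without opening a network socket. Real deployments usually use a web framework and load a saved model instead.

```python
import io
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical trained model: a linear model with made-up weights
WEIGHTS = [0.5, 2.0]
BIAS = -1.0

def predict(features):
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS

def app(environ, start_response):
    """WSGI app exposing the model at the endpoint POST /predict."""
    if environ["PATH_INFO"] != "/predict" or environ["REQUEST_METHOD"] != "POST":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "use POST /predict"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length))
    body = json.dumps({"prediction": predict(payload["features"])}).encode()
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]

def call_app(method, path, payload=None):
    """Simulate one HTTP request against the app without opening a socket."""
    raw = json.dumps(payload).encode() if payload is not None else b""
    environ = {"REQUEST_METHOD": method, "PATH_INFO": path,
               "CONTENT_LENGTH": str(len(raw)), "wsgi.input": io.BytesIO(raw)}
    setup_testing_defaults(environ)  # fill in the remaining required WSGI keys
    status = []
    result = b"".join(app(environ, lambda s, headers: status.append(s)))
    return status[0], json.loads(result)

print(call_app("POST", "/predict", {"features": [2.0, 3.0]}))
```

To serve the app over HTTP during development, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` would listen on port 8000. Client applications could then POST JSON such as `{"features": [2.0, 3.0]}` to `/predict`.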

In the next section, we will learn how data gathering is done using APIs.

How to fetch data using an API?

Many open APIs allow us to fetch data. The Movie Database (TMDB) is one such website: it provides open APIs to access its data. We will use its API to fetch data on top-rated movies.

Step 1: Go to https://developers.themoviedb.org/ and log in to the website to get an API key.

Step 2: A list of APIs is shown on the left side. We will use the Top Rated Movies API. Click on it and select "Try it out". The URL of the API is shown at the bottom of the page; copy it.

Step 3: Open a Python code editor and import requests and pandas.

import pandas as pd
import requests

Step 4: Send a GET request to the API URL copied in Step 2 to fetch the top-rated movies. Put your own API key in the api_key field of the URL.

# Replace YOUR_API_KEY with the API key from your TMDB account
response_API = requests.get('https://api.themoviedb.org/3/movie/top_rated?api_key=YOUR_API_KEY&language=en-US&page=1').json()

Step 5: The raw JSON response is hard to analyze as it is. We convert the list of movies stored under its 'results' key into a pandas DataFrame for further analysis.

final_data = pd.DataFrame(response_API['results'])
print(final_data[['title', 'popularity', 'release_date', 'vote_average']].head())

The data looks like this.

(Figure: The output of data fetched using the API)

Often, the most challenging part of fetching data through an API is getting a working URL and key, because access is restricted for authentication and privacy reasons. Once we have them, the procedure stays the same.
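The same procedure generalizes to many pages of results. The sketch below assumes that each page is a JSON object with a `results` list and that the API accepts a `page` query parameter, as the TMDB top_rated URL above does. `fetch_pages` is a hypothetical helper, not part of any library. It takes the fetching function as a parameter, so it can be tested without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def get_json(url):
    """Send a GET request and decode the JSON response body."""
    with urlopen(url) as response:
        return json.loads(response.read())

def fetch_pages(base_url, params, pages, fetch=get_json):
    """Collect the 'results' lists from pages 1..pages of a paginated API.

    Assumes each page is a JSON object with a 'results' list and that the
    API accepts a 'page' query parameter.
    """
    rows = []
    for page in range(1, pages + 1):
        url = base_url + "?" + urlencode({**params, "page": page})
        results = fetch(url).get("results", [])
        if not results:  # stop early once the API returns an empty page
            break
        rows.extend(results)
    return rows
```

For example, `fetch_pages("https://api.themoviedb.org/3/movie/top_rated", {"api_key": "YOUR_API_KEY", "language": "en-US"}, pages=3)` would collect three pages of top-rated movies, and `pd.DataFrame(rows)` would turn them into a DataFrame as in Step 5. If an API signals the last page differently, for example with an error status, the stopping condition would need adjusting.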

Popular APIs used in ML and Data Science

Instead of building every part of an application from scratch, it is often better to delegate small portions of the work to existing APIs. Here are some of the most popular APIs for data science.

Google API

Google provides APIs in almost every domain, such as databases, machine learning, security and identity, networking, and many more. We can use them to add these capabilities to our applications without building them ourselves.

Amazon Machine Learning API

Developers can use this API to build applications like fraud detection, targeted marketing, and demand forecasting. It creates machine learning models by finding patterns in the provided data.

US Census Bureau API

This API allows us to make custom queries on census data and embed the results in mobile or web applications. A census usually contains data on the population and economy of a nation, and this API makes major demographic and economic statistics more accessible to users.

Spotify API

Spotify has one of the largest databases of songs and artists, and access to it is valuable for music applications. With this API, we can use Spotify's catalog instead of storing that data ourselves, which saves database space.

There are many other APIs available on the internet. We have listed some widely used ones here, and we suggest exploring them and using them to build great applications.

Conclusion

APIs are a constantly evolving field, with new innovations emerging all the time in areas such as microservices, machine learning, cloud integration, and security. APIs are popular because they support building software from components. Each component hides its implementation details behind an interface, which makes it reusable across different applications. Given these benefits, APIs have become an integral part of modern software, and many organizations provide access to their resources through them.

If you have any queries, doubts, or feedback, please write to us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy data science, Enjoy algorithms! Content Moderator: Shubham Gautam.
