How Does Uber Use Machine Learning to Facilitate Surge Pricing?

Data Science, Machine Learning, and Artificial Intelligence are transforming the world through their applicability in many domains. More and more tech companies are leveraging these technologies to provide better customer service. Uber is one such tech giant: it continuously explores new methods and runs dedicated experiments to improve the customer experience. It keeps optimizing its operations and services by deploying Machine Learning-based services to predict market demand, find optimal routes, detect possible fraud, provide more customer-centric services, and monitor and update data to deliver efficient real-time services.

Key takeaways from this blog

In this article, we will focus mainly on the following:

  1. How does Uber use Machine Learning to facilitate its business?
  2. How can we implement one of its use-cases of Surge-Pricing?
  3. How to visualize which features are contributing to the learning?
  4. How to evaluate the model performance?
  5. Possible interview questions on this project.

So, let’s start with our first question. 

How does Uber use Machine Learning to facilitate business?

You are probably familiar with how easy Uber's service is to use. We only need to open the app and book a cab; the cab arrives, takes us to our destination, and we pay after the ride is complete. Isn't that simple?
In reality, it is not as simple as it appears from the outside. Behind such a simple user experience, Uber runs many background services and complex algorithms. The key component that makes this possible is Machine Learning. Let's see how Uber uses Machine Learning to offer seamless services.

1. Balancing demand and supply

Uber deals with a large amount of data daily. It forecasts where and when demand will arise by exploiting both stored (historical) and real-time data. It uses these estimates to alert drivers so that more cabs are available to meet demand in a particular area. In this way, Uber balances demand and supply and offers customer-centric services.

Uber making customers happy

Source: Forbes

2. Fare Estimates

Machine Learning-enabled demand forecasting allows Uber to adjust prices during peak hours to increase profit, but this can come at the cost of customer retention. Uber calculates fares using real-time traffic data. It also analyses external factors that could affect fares, such as the availability and accessibility of public transport.

3. Customer Retention

A gap between demand and supply can leave riders with no available cabs, in which case they may book a ride with a different service. Uber's machine learning-based demand predictions therefore play a crucial role in customer retention: Uber uses both historical and real-time data to bridge the gap between demand and supply.

4. Accurate estimated time of arrival

Waiting for a cab to reach the pickup location can be very frustrating for users. Using Machine Learning-based approaches, Uber combines real-time traffic data, GPS data, and Map APIs to forecast the estimated time of arrival (ETA). Specific steps can then be taken to reduce the ETA when customers book rides. Uber focuses on providing a superior customer experience by reducing users' waiting time.

5. Route Optimization

Uber uses a Machine Learning-based system to predict the best routes and recommend them to drivers. Its route optimization system helps drivers avoid crowded areas. Traditionally, route selection was based on the driver's own assumptions and habits and did not account for real-time traffic, road blockages, or weather conditions. Machine Learning-based systems incorporate all of these parameters to offer better service.

6. Uber Pool

Uber introduced the Uber Pool service, which allows shared rides, to address the shortage of cabs during peak hours. Uber Pool lets riders heading in the same direction share a cab, giving customers an economical ride at a lower price. Uber uses Machine Learning-based algorithms to identify rides that can be matched and assign them to the same cab. The same system also decides the order in which riders are picked up and dropped off. Uber Pool also uses stored data to find hidden patterns and adjust prices accordingly, offering good service to customers while earning higher profits.

Uber Pool concept


7. Big Data and Uber

Uber continues to revolutionize transportation with its Machine Learning-based systems and data-driven business model. Its systems collect and maintain large amounts of data and use big data processing techniques to offer more personalized services. It relies on frameworks such as Hadoop and Spark to process this data at scale and to run large-scale Machine Learning algorithms.

Uber managing big data

Source: Uber Blog

It maintains a large, real-time database of drivers, which allows it to match a ride request to a driver in just 10–15 seconds. Uber closely observes each ride and its associated data to predict demand, supply, and prices more accurately and to allocate sufficient resources where they are needed. It also considers external data, such as the availability of public transport.

8. Surge Pricing

You may have noticed that Uber sometimes charges 1.5–2 times the usual price. This is the work of its Machine Learning-powered Surge Pricing algorithm, which aims to find reasonable prices based on an area's economic and current traffic conditions. It is designed so that passengers can always get a ride, even if at a higher price. The algorithm uses geo-location and demand forecasting data to position drivers efficiently, and it relies heavily on regression analysis to determine which locations will be busiest so that surge pricing can be activated there. The same predictions can be used to send more drivers to those locations, improving service, customer retention, and profit.

It is clear, then, how deeply Machine Learning is involved in the functioning of Uber. Now it's time to implement one of these use cases ourselves, as this is the best way to learn something thoroughly.

Surge Price of Uber

Problem Statement

In this section, we will implement a simplified version of Uber's Machine Learning-powered Surge Pricing: we will predict the surge_multiplier based on different weather conditions. Unlike public transport fares, Uber and Lyft ride prices are not constant; they are strongly affected by the demand for and supply of rides at a given time. Weather such as rain or snow, for example, can cause more people to take rides, which affects pricing.

Here, we will predict cab prices for Lyft and Uber rides from the weather, through the surge multiplier. The open-source dataset is available here.

Implementation Steps

Step 1: Data Description

The images below show the structure of the two datasets used here. The cab price data contains the details of each ride along with its price, while the weather data describes the weather at particular points in time.

import pandas as pd

cab_df = pd.read_csv("cab_rides.csv", delimiter=',')
weather_df = pd.read_csv("weather.csv", delimiter=',')

Cab data snippet

Cab price data

Weather data snippet

Weather data

Step 2: Data Preprocessing

The first step is data cleaning: removing null values, converting date_time to the desired format, and other preprocessing steps. After preprocessing and merging the cab and weather datasets, a snapshot of the final data looks like this:

Merged data snippet

Merged Dataset
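The preprocessing code itself is not shown above, so here is a minimal sketch of what the cleaning and merging step could look like. It assumes a schema in which cab_rides.csv has time_stamp (in milliseconds), source, and price columns, and weather.csv has time_stamp (in seconds), location, temp, clouds, pressure, rain, humidity, and wind columns; it then joins each ride to the average weather at its source location in the same hour. These column names, units, the hourly join, and the helper name preprocess_and_merge are all assumptions, and the tiny hand-made frames only stand in for the real files.

```python
import pandas as pd

WEATHER_COLS = ["temp", "clouds", "pressure", "rain", "humidity", "wind"]  # assumed schema


def preprocess_and_merge(cab_df: pd.DataFrame, weather_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: clean both tables and attach hourly weather to each ride."""
    # Drop rides with no recorded price.
    cab = cab_df.dropna(subset=["price"]).copy()
    # Align the weather key with the ride's pickup location; treat missing rain as no rain.
    weather = weather_df.rename(columns={"location": "source"})
    weather["rain"] = weather["rain"].fillna(0)

    # Assumed units: ride timestamps in milliseconds, weather timestamps in seconds.
    # Both are floored to the hour so that rides can be matched with weather readings.
    cab["date_time"] = pd.to_datetime(cab["time_stamp"], unit="ms").dt.floor("60min")
    weather["date_time"] = pd.to_datetime(weather["time_stamp"], unit="s").dt.floor("60min")

    # Average all weather readings taken at the same location within the same hour.
    hourly = weather.groupby(["source", "date_time"], as_index=False)[WEATHER_COLS].mean()

    merged = cab.merge(hourly, on=["source", "date_time"], how="inner")
    return merged.drop(columns=["time_stamp"])


# Tiny hand-made frames standing in for cab_rides.csv and weather.csv.
cab_df = pd.DataFrame({
    "distance": [0.44, 1.10, 2.00],
    "source": ["Back Bay", "Back Bay", "Fenway"],
    "price": [5.0, None, 11.0],
    "surge_multiplier": [1.0, 1.0, 1.25],
    "time_stamp": [1544952607890] * 3,
})
weather_df = pd.DataFrame({
    "location": ["Back Bay", "Back Bay", "Fenway"],
    "temp": [42.0, 44.0, 40.0],
    "clouds": [1.0, 1.0, 0.9],
    "pressure": [1012.0, 1012.0, 1013.0],
    "rain": [0.1, None, None],
    "humidity": [0.77, 0.77, 0.80],
    "wind": [11.0, 13.0, 10.0],
    "time_stamp": [1544952000, 1544953000, 1544952300],
})
merged = preprocess_and_merge(cab_df, weather_df)
print(merged[["source", "price", "surge_multiplier", "temp", "rain"]])
```

Averaging the readings within an hour is one simple choice; a nearest-timestamp join (for example with pandas.merge_asof) would be an alternative if the weather readings are sparse.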

Step 3: Balance the data

Looking at the frequency distribution of the surge multiplier (see the class counts below), it is clear that the data is highly imbalanced. We therefore apply over-sampling to balance the data, using the Synthetic Minority Over-sampling Technique (SMOTE) from the imblearn library.

Before SMOTE

import numpy as np

unique, counts = np.unique(train_labels, return_counts=True)
print(dict(zip(unique, counts)))

{1: 844981, 1.25: 15112, 1.5: 7042, 1.75: 3380, 2: 3029, 2.5: 187}
Train Data Before SMOTE = (873731, 9) Test Data = (291244, 9). NOTE: We removed the label 3 because it had very few samples.


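from imblearn.over_sampling import SMOTE
# Oversample only the training split. SMOTE expects discrete class labels, so float
# multipliers such as 1.25 should be encoded first (e.g. train_labels.astype(str)).
sm = SMOTE(random_state=42)
train_features, train_labels = sm.fit_resample(train_features, train_labels)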

{1: 844981, 1.25: 844981, 1.5: 844981, 1.75: 844981, 2: 844981, 2.5: 844981}
Train Data = (5069886, 9) Test Data = (291244, 9)
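To make the idea behind SMOTE concrete, here is a simplified NumPy sketch of its core step. It is not imblearn's implementation, which handles neighbour search and sampling strategies far more carefully: each synthetic row is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The helper name smote_minority and its parameters are hypothetical.

```python
import numpy as np


def smote_minority(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate n_new synthetic rows from the minority-class rows X_min (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other minority points

    # Pairwise Euclidean distances between minority samples (O(n^2) memory: small n only).
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # For each synthetic row, pick a base sample and one of its k nearest neighbours...
    base = rng.integers(0, n, size=n_new)
    nn = neighbours[base, rng.integers(0, k, size=n_new)]
    # ...and move a random fraction of the way from the base towards that neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nn] - X_min[base])


# Four minority points on the line y = x; every synthetic point stays on that line.
X_minority = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
synthetic = smote_minority(X_minority, n_new=6, k=2)
print(synthetic)
```

Because the synthetic rows interpolate between real ones, SMOTE assumes numeric features; categorical columns such as the cab type would need to be encoded first, or handled with a variant such as imblearn's SMOTENC.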

Step 4: Model Training

Since the price between a given source and destination is almost fixed, we need to predict the appropriate surge_multiplier to obtain the right price for the weather conditions. This problem can be framed as either a regression or a classification problem. We used a Random Forest Classifier, treating [1, 1.25, 1.5, 1.75, 2.0, 2.5] as six different classes. Other classifiers, such as an SVM or even a neural network, could also be used.

from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_jobs=-1, random_state=42, class_weight="balanced")
rf.fit(train_features, train_labels)  # after SMOTE, "balanced" gives every class weight 1
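Since the problem can also be framed as regression, the sketch below contrasts the two framings on synthetic data (nine random features and a label driven by the first one, a hypothetical stand-in for the merged dataset). The classifier is trained on the multipliers encoded as string class labels, because scikit-learn classifiers reject non-integer float targets such as 1.25 as continuous. The regressor predicts a number that a hypothetical helper, snap_to_levels, rounds to the nearest allowed surge level.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

SURGE_LEVELS = np.array([1.0, 1.25, 1.5, 1.75, 2.0, 2.5])


def snap_to_levels(values) -> np.ndarray:
    """Map continuous predictions to the nearest allowed surge multiplier (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    nearest = np.abs(values[:, None] - SURGE_LEVELS[None, :]).argmin(axis=1)
    return SURGE_LEVELS[nearest]


# Synthetic stand-in for the merged features: 9 random columns, label driven by column 0.
rng = np.random.default_rng(42)
X = rng.random((600, 9))
y = SURGE_LEVELS[(X[:, 0] * len(SURGE_LEVELS)).astype(int)]

# Classification: each multiplier becomes a string class label ("1.0", "1.25", ...).
clf = RandomForestClassifier(n_estimators=50, random_state=42, class_weight="balanced")
clf.fit(X, y.astype(str))
clf_pred = clf.predict(X[:5]).astype(float)

# Regression: predict the multiplier as a number, then snap it to the nearest level.
reg = RandomForestRegressor(n_estimators=50, random_state=42)
reg.fit(X, y)
reg_pred = snap_to_levels(reg.predict(X[:5]))

print("classifier:", clf_pred)
print("regressor: ", reg_pred)
```

The trade-off is that the classifier treats the six multipliers as unrelated categories and ignores their natural ordering, while the regressor respects the ordering but relies on a fixed halfway threshold between neighbouring levels.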

Step 5: Feature Importance

It is always good practice to check how much a model depends on each feature. The diagram below shows the importance of each feature for our Random Forest Classifier.

# Assumes rf has been fitted and feature_list holds the training column names,
# e.g. feature_list = list(train_features.columns)
importances = rf.feature_importances_
# Pair each feature with its importance, rounded to two decimals
feature_importances = [(feature, round(importance, 2)) for feature, importance in zip(feature_list, importances)]
# Sort the feature importances by most important first
feature_importances.sort(key=lambda pair: pair[1], reverse=True)
# Print out each feature and its importance
for feature, importance in feature_importances:
    print(f"Variable: {feature:20} Importance: {importance}")

Feature importance

The model depends most on the distance feature, followed by the other features in decreasing order of importance.
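The importances printed above come from feature_importances_, which scikit-learn computes from the mean decrease in impurity on the training data; this measure can favour continuous, high-cardinality features. A common cross-check is permutation importance on held-out data, available as sklearn.inspection.permutation_importance. The sketch below uses synthetic data whose target depends only on a column named distance; the feature names are hypothetical stand-ins, not the article's actual columns.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical feature names; the toy target depends only on "distance".
feature_list = ["distance", "temp", "clouds", "pressure", "rain", "humidity", "wind"]
rng = np.random.default_rng(0)
X = rng.random((800, len(feature_list)))
y = (X[:, 0] > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle one column at a time on held-out data and measure the drop in accuracy.
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
ranking = sorted(zip(feature_list, result.importances_mean), key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"Variable: {name:20} Permutation importance: {score:.3f}")
```

If both methods rank the same features at the top, the impurity-based ranking is less likely to be an artifact of feature cardinality.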

Step 6: Evaluation of the built model

from sklearn.metrics import accuracy_score, f1_score

predictions = rf.predict(test_features)
print(f1_score(test_labels, predictions, average='weighted'))
print(accuracy_score(test_labels, predictions))

For the Random Forest classifier we built, the weighted F1-score is 0.9616 and the accuracy is 95.77%, so the model seems to be doing a decent job here. The diagram below shows the complete confusion matrix.

Confusion Matrix
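Because the classes are heavily imbalanced, these headline numbers are worth reading against a majority-class baseline. In the training counts shown in Step 3, 844,981 of the 873,731 rows (about 96.7%) have a multiplier of 1, so if the test split has a similar distribution, a model that always predicts 1 would reach roughly that accuracy and a weighted F1-score of around 0.95. Per-class metrics and macro-averaged F1 expose this. The sketch below shows the check with scikit-learn's DummyClassifier on synthetic labels (a hypothetical 95/5 split between two surge levels), not on the article's data.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, f1_score

# Synthetic, heavily imbalanced labels (hypothetical 95/5 split between two surge levels).
rng = np.random.default_rng(0)
train_labels = np.where(rng.random(2000) < 0.95, "1.0", "1.25")
test_labels = np.where(rng.random(1000) < 0.95, "1.0", "1.25")

# DummyClassifier ignores the features, so placeholder zeros are enough.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(np.zeros((len(train_labels), 1)), train_labels)
predictions = baseline.predict(np.zeros((len(test_labels), 1)))

accuracy = accuracy_score(test_labels, predictions)
weighted_f1 = f1_score(test_labels, predictions, average="weighted", zero_division=0)
macro_f1 = f1_score(test_labels, predictions, average="macro", zero_division=0)
print(f"accuracy={accuracy:.3f} weighted F1={weighted_f1:.3f} macro F1={macro_f1:.3f}")

# Rows are true classes, columns are predicted classes; the "1.25" column stays empty.
cm = confusion_matrix(test_labels, predictions, labels=["1.0", "1.25"])
print(cm)
print(classification_report(test_labels, predictions, zero_division=0))
```

Running the same classification report on rf.predict(test_features) would show how well the model handles the rare multipliers, which the overall accuracy largely hides.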

Possible Interview Questions

Based on this project, an interviewer could ask the following questions in a data science or machine learning interview:

  1. What techniques does Uber use to retain customers?
  2. Here you solved this as a classification problem. Can it be solved as a regression problem? Which algorithms would you use for regression?
  3. What is ensemble learning? Is Random Forest an ensemble learning approach?
  4. Why not use a single decision tree directly? Why Random Forest?
  5. What methods would you propose to improve the accuracy further?


The rapid progress of Machine Learning tools and techniques continues to create opportunities to offer customer-oriented services and boost the productivity of many businesses. Uber has emerged as a leader by building machine learning-based systems and focusing on customer-oriented services. Systems backed by Artificial Intelligence and Machine Learning help optimize services and are highly useful for acquiring and retaining customers. In this article, we implemented one of Uber's use cases using a Machine Learning-based approach. We hope you enjoyed it.

Enjoy Learning! Enjoy Travelling! Enjoy Algorithms!
