Predicting Personality Using Machine Learning


We can observe many machine learning applications in our daily lives. One important application is classifying individuals by their personality traits. For example, the availability of large amounts of high-dimensional data has made marketing campaigns more effective by allowing them to target specific people. Such personality-based communication can make products and services more popular and attractive, which leads to increased usage, higher customer satisfaction, and broader acceptance among users.

Some common examples of how personality-based approaches are used in machine learning include:

  • Personalizing online advertisement campaigns can lead to more revenue and higher click-through rates.
  • Personality traits are closely associated with an individual's behavior and preferences, so incorporating a personality-based approach has made Recommender Systems significantly more attractive to users.
  • Personality-based adaptations can also provide personalized visualizations and make better music recommendations.
  • Personality traits can also be used to solve the "cold start" problem by using scientifically validated and relatively stable latent dimensions of an individual in personalized systems.
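To make the cold-start idea concrete, here is a minimal sketch under stated assumptions: each existing user has an OCEAN score vector scaled to [0, 1] and a set of liked items, and a brand-new user has no history, only questionnaire-based trait scores. The new user is matched to the most similar existing users by cosine similarity, and their liked items are recommended. All user names, scores, items, and the helper `recommend_for_new_user` are hypothetical, for illustration only.

```python
import numpy as np

# Hypothetical OCEAN profiles (Openness, Conscientiousness, Extroversion,
# Agreeableness, Neuroticism), each scaled to [0, 1]
existing_users = {
    "ana": np.array([0.9, 0.4, 0.8, 0.6, 0.3]),
    "bo": np.array([0.2, 0.9, 0.3, 0.7, 0.5]),
    "chen": np.array([0.8, 0.5, 0.9, 0.5, 0.2]),
}
# Hypothetical items each existing user liked
liked_items = {
    "ana": {"jazz playlist", "travel vlog"},
    "bo": {"productivity app"},
    "chen": {"travel vlog", "concert tickets"},
}

def cosine_similarity(a, b):
    """Cosine of the angle between two trait vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend_for_new_user(profile, k=2):
    """Recommend items liked by the k existing users closest in trait space."""
    ranked = sorted(
        existing_users,
        key=lambda name: cosine_similarity(profile, existing_users[name]),
        reverse=True,
    )
    recommendations = set()
    for name in ranked[:k]:
        recommendations |= liked_items[name]
    return recommendations

# A new user with no interaction history, only trait scores
new_user = np.array([0.85, 0.45, 0.85, 0.55, 0.25])
print(recommend_for_new_user(new_user))
```

Because trait scores come from a questionnaire rather than from past clicks, they are available on day one, which is what makes them useful when a user has no history yet.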

It is clear that a person's personality plays a significant role in their interactions. According to reports, companies often request social profiles from job candidates in their hiring forms to gain insight into their personalities and assign them tasks that they are well-suited for. This not only helps companies select suitable candidates but also increases their efficiency. Isn't this amazing?

Key takeaways from this blog

  1. What is the Big Five personality trait model?
  2. How do machine learning algorithms use behavior on Facebook or other social media to predict a person's personality?
  3. What are the steps to implement a personality predictor using machine learning?
  4. How do countries like Japan use artificial intelligence to find suitable matches for couples to address declining marriage and birth rates?

So let’s start without any further delay!

Big Five Personality Trait model

The Big Five Personality Trait model, also known as the OCEAN model, is a widely used framework for assessing personality in psychology. It summarizes a person's overall character along five broad traits:

  • Openness: This trait encompasses characteristics such as insight, imagination, sensitivity, attentiveness, and curiosity. People who score high in openness are typically curious, creative and open to new experiences.
  • Conscientiousness: This trait relates to a person's level of care, discipline, deliberation and diligence. People who score high in conscientiousness are typically goal-oriented and have good self-control and organizational skills.
  • Extroversion: This trait relates to a person's level of emotional expression and assertiveness. Extroverted people are outgoing and comfortable interacting with others and tend to be enthusiastic and excitable.
  • Agreeableness: This trait relates to a person's level of generosity and cooperativeness. People who score high in agreeableness are typically kind, trusting and considerate.
  • Neuroticism: This trait relates to a person's emotional stability and tendency to experience negative emotions such as anxiety and depression. People who score high in neuroticism are more prone to mood swings and may be more sensitive to stress.

[Figure: Which personality traits can be predicted using machine learning?]

Personality Prediction and Machine Learning

The availability of high-dimensional, fine-grained behavioral data has made it much easier to research and observe human behavior. For example, mobile sensing studies and data collected from daily activities have greatly changed how psychologists conduct research and administer personality assessments.

Machine learning models have the potential to transform research and assessment in personality psychology. Many ML algorithms can handle large datasets with thousands of attributes, and regularized models in particular cope well with collinearity among those attributes. ML algorithms are also good at recognizing patterns in data that humans may not be able to detect. Together, these properties can lead to more accurate, objective, and automated personality assessments.

Social media is another rich source of data, since people use it to express their likes, thoughts, feelings, and opinions. Machine learning models have used this data effectively to predict individuals' Big Five (OCEAN) personality traits. Supervised algorithms such as Naïve Bayes and Support Vector Machines are widely used in industry for this task, and researchers have recently started applying unsupervised learning methods to identify other psychological constructs in digital data.
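As a toy illustration of this supervised setup, the sketch below trains Naïve Bayes and a linear SVM with scikit-learn to label short posts as written by high- or low-extroversion users. The posts and labels are invented, and the feature choice (TF-IDF over words) is an assumption; real studies train on thousands of users whose labels come from personality questionnaires.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented status updates; label 1 = high extroversion, 0 = low extroversion
posts = [
    "had an amazing party with friends tonight",
    "cannot wait to meet everyone at the festival",
    "great night out dancing with the whole crew",
    "quiet evening reading a book at home",
    "spent the weekend alone working on my puzzle",
    "staying in tonight with tea and a novel",
]
labels = [1, 1, 1, 0, 0, 0]

# Same TF-IDF text features, two different supervised classifiers
models = {
    "naive_bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "linear_svm": make_pipeline(TfidfVectorizer(), LinearSVC()),
}

new_posts = ["party with friends", "reading alone at home"]
for name, model in models.items():
    model.fit(posts, labels)
    print(name, model.predict(new_posts))
```

With six training posts the predictions only demonstrate the API; they say nothing about how well either model would work on real data.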

Personality Predictions Based on User Behavior on Facebook

In recent years, social media platforms like Facebook, Twitter, Instagram, and LinkedIn have grown in popularity among internet users. These platforms provide a valuable opportunity for researchers to study and understand individuals' online behaviors, preferences, and personalities. Different personalities are associated with different social interactions and behavior patterns on social media, such as status updates and preferences.

The figure below describes how to predict the personality traits of Facebook users based on different features and measures of the Big Five model.

[Figure: Ensemble machine learning approach for predicting personality from Facebook user behavior]
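A minimal sketch of one kind of ensemble, under stated assumptions: each user is described by a few hypothetical numeric activity features (status updates per week, likes per week, friend count) and a binary high/low label for a single trait. scikit-learn's `VotingClassifier` combines Naïve Bayes, logistic regression, and a decision tree by majority vote. The data are randomly generated and the label is derived from the features, so the printed accuracy is only a sanity check, not a result.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical features per user: [status updates/week, likes/week, friend count]
n_users = 300
X = np.column_stack([
    rng.poisson(5, n_users),
    rng.poisson(30, n_users),
    rng.normal(300, 80, n_users),
])
# Synthetic label: "high extroversion" loosely tied to activity, plus noise
score = 0.3 * X[:, 0] + 0.05 * X[:, 1] + 0.01 * X[:, 2] + rng.normal(0, 1, n_users)
y = (score > np.median(score)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Hard voting: each base model casts one vote per user, the majority wins
ensemble = VotingClassifier(
    estimators=[
        ("nb", GaussianNB()),
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
accuracy = ensemble.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Ensembles like this tend to be more robust than any single base model because the models make different kinds of errors.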

Now that we have a basic understanding of personality traits and their use cases, let's dive deeper into predicting the Big Five personality traits.

Steps to implement Big Five Personality Test Prediction

In this section, we perform Big Five personality prediction on a dataset of 1,015,342 questionnaire answers collected online by Open Psychometrics. Let's first look at what the data actually looks like. Running the snippet below prints the number of participants: 1,015,341.

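import pandas as pd
# Assumption: the raw download is the tab-separated file 'data-final.csv'
data_raw = pd.read_csv('data-final.csv', sep='\t')
data = data_raw.copy()
pd.options.display.max_columns = 150
# Keep the 50 answer columns and 'country'; drop response times and other metadata
data.drop(data.columns[50:107], axis=1, inplace=True)
data.drop(data.columns[51:], axis=1, inplace=True)
print('Number of participants: ', len(data))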

[Figure: Open Psychometrics dataset used for personality prediction]

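The raw dataset consists of 110 columns: the answers to the 50 questions, the time spent on each question, and metadata such as the participant's country. The code above keeps only the 50 answer columns and 'country'.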

Step 1: Dataset Description

The questionnaire contains ten statements for each personality trait, and participants rate each statement on a scale from 1 to 5, where 1 = Disagree, 3 = Neutral, and 5 = Agree. The column names encode the trait: EXT corresponds to Extroversion, EST to Emotional Stability (the opposite pole of Neuroticism), AGR to Agreeableness, CSN to Conscientiousness, and OPN to Openness.
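Given these column prefixes, one simple way to turn the answers into five trait scores is to average each trait's items. The sketch below does this with pandas on two made-up respondents with two items per trait (the real data has ten). It assumes answers run from 1 to 5 and that reverse-keyed items (statements phrased in the opposite direction of the trait) are flipped with 6 - answer; the set of reversed items passed in here is hypothetical, not the official scoring key, and `trait_scores` is an illustrative helper.

```python
import pandas as pd

TRAITS = ["EXT", "EST", "AGR", "CSN", "OPN"]

def trait_scores(answers, reversed_items=()):
    """Average each trait's items after flipping reverse-keyed ones (1..5 -> 5..1)."""
    adjusted = answers.copy()
    for col in reversed_items:
        adjusted[col] = 6 - adjusted[col]
    scores = {}
    for trait in TRAITS:
        # Columns named like the trait prefix followed by a number, e.g. EXT1
        cols = [c for c in adjusted.columns
                if c.startswith(trait) and c[len(trait):].isdigit()]
        scores[trait] = adjusted[cols].mean(axis=1)
    return pd.DataFrame(scores)

# Two made-up respondents, two items per trait
answers = pd.DataFrame({
    "EXT1": [5, 1], "EXT2": [1, 5],
    "EST1": [2, 4], "EST2": [4, 2],
    "AGR1": [4, 3], "AGR2": [5, 3],
    "CSN1": [3, 5], "CSN2": [3, 4],
    "OPN1": [5, 2], "OPN2": [4, 1],
})

# Hypothetical: treat EXT2 and EST2 as reverse-keyed
scores = trait_scores(answers, reversed_items=["EXT2", "EST2"])
print(scores)
```

Flipping reverse-keyed items before averaging matters: without it, agreeing with a statement such as "I keep to myself" would raise an Extroversion score instead of lowering it.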

Step 2: Visualization

Let's look at how the answers to each trait's questions are distributed. Here we show the frequency distribution of answers for the Extroversion and Conscientiousness questions.
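A minimal sketch of how such frequency distributions can be computed and plotted with pandas and matplotlib. Random answers stand in for the real EXT1 to EXT10 columns, and the 2×5 grid of bar charts is a layout choice, not necessarily what the original figures used.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Stand-in data: 1,000 random answers (1 to 5) to the ten Extroversion questions
ext_cols = [f"EXT{i}" for i in range(1, 11)]
answers = pd.DataFrame(rng.integers(1, 6, size=(1000, 10)), columns=ext_cols)

# Count how often each answer (1 to 5) was chosen for every question
counts = answers.apply(lambda col: col.value_counts().reindex(range(1, 6), fill_value=0))
print(counts)

# One bar chart per question
fig, axes = plt.subplots(2, 5, figsize=(15, 5), sharey=True)
for ax, col in zip(axes.ravel(), ext_cols):
    ax.bar(counts.index, counts[col])
    ax.set_title(col)
    ax.set_xticks(range(1, 6))
fig.suptitle("Extroversion: frequency of each answer")
fig.tight_layout()
plt.show()
```

On real data the same code works after replacing `answers` with the EXT (or CSN) columns of the loaded DataFrame.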

Conscientiousness

[Figure: Distribution of answers to the Conscientiousness questions in the Open Psychometrics dataset]

Extroversion

[Figure: Distribution of answers to the Extroversion questions in the Open Psychometrics dataset]

Step 3: Data Preprocessing

We scale every feature to the range 0 to 1 using scikit-learn's MinMaxScaler. K-means groups points by Euclidean distance, so scaling keeps features with larger ranges from dominating the distance and helps the algorithm produce better results.
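The scaling code itself is not shown above; a minimal sketch with scikit-learn's `MinMaxScaler`, using a small made-up frame named `answers` in place of the real answer columns:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up answers standing in for the real questionnaire columns
answers = pd.DataFrame({
    "EXT1": [1, 3, 5, 4],
    "EST1": [2, 2, 4, 5],
    "AGR1": [5, 4, 3, 1],
})

# Rescale each column independently: x' = (x - min) / (max - min)
scaler = MinMaxScaler(feature_range=(0, 1))
answers_scaled = pd.DataFrame(scaler.fit_transform(answers), columns=answers.columns)
print(answers_scaled)
```

Each column is scaled using its own minimum and maximum, so EST1 (observed range 2 to 5) maps 2 to 0 and 5 to 1 even though the questionnaire scale starts at 1. If the data is later split into training and test sets, the scaler should be fit on the training part only.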

Step 4: Model Building

Now that our data is in the desired format, let's get our hands dirty and group the participants into five clusters using the K-means clustering algorithm. We choose five clusters to mirror the five OCEAN traits, but keep in mind that K-means groups participants with similar answer patterns; a cluster is not guaranteed to correspond to a single trait. After clustering, we get the following results:

[Figure: Cluster distribution]

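import pandas as pd
from sklearn.cluster import KMeans

# Features: the 50 answer columns; drop 'country' and incomplete responses (KMeans cannot handle NaN)
df_model = data.drop('country', axis=1).dropna()

# Define 5 clusters and fit the model (n_init and random_state make runs repeatable)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
k_fit = kmeans.fit(df_model)

# Predicting the clusters
pd.options.display.max_columns = 10

# labels_ holds the cluster index assigned to each participant
predictions = k_fit.labels_
df_model['Clusters'] = predictions
print(df_model.head())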

[Figure: K-means cluster assignments on the Open Psychometrics dataset]
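To show what `KMeans` does under the hood, here is a minimal NumPy sketch of Lloyd's algorithm, the standard K-means iteration: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until no assignment changes. For initialization it uses farthest-point selection (a simple stand-in; scikit-learn defaults to k-means++), and it runs on toy 2-D data, so treat it as an illustration rather than a replacement for `KMeans`.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: start from a random point, then repeatedly
    # add the point farthest from all centroids chosen so far
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2)
        centroids.append(X[d.min(axis=1).argmax()])
    centroids = np.array(centroids, dtype=float)

    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance to every centroid
        distances = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = distances.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Toy data: two well-separated blobs of 50 points each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(np.bincount(labels))
print(centroids)
```

Each iteration can only lower (or keep) the total within-cluster squared distance, which is why the loop terminates, but the result can depend on initialization; that is why `KMeans` runs several initializations (`n_init`) and keeps the best.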

Step 5: Results

To visualize the final results, we use PCA to reduce the answers to two dimensions and plot each participant colored by cluster:

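import matplotlib.pyplot as plt
import seaborn as sns
# Assumes df_pca holds the 2-D PCA projection ('PCA1', 'PCA2') and each participant's 'Clusters' label
sns.scatterplot(data=df_pca, x='PCA1', y='PCA2', hue='Clusters', palette='Set2', alpha=0.9)
plt.title('Personality Clusters after PCA')
plt.show()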

[Figure: Visualization of K-means clusters (k = 5) after PCA]
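The plotting snippet above expects a `df_pca` frame with 'PCA1', 'PCA2', and 'Clusters' columns. A minimal sketch of building it with scikit-learn's `PCA`, where random 1 to 5 answers in hypothetical columns Q1 to Q50 stand in for the real answer columns, and the cluster labels come from a `KMeans` fit on the same data:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in for the 50 answer columns (values 1 to 5); column names are hypothetical
answers = pd.DataFrame(rng.integers(1, 6, size=(500, 50)),
                       columns=[f"Q{i}" for i in range(1, 51)])

# Cluster in the full 50-dimensional space
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(answers)

# Project to 2-D only for plotting; the clustering is not redone in this space
pca = PCA(n_components=2)
df_pca = pd.DataFrame(pca.fit_transform(answers), columns=["PCA1", "PCA2"])
df_pca["Clusters"] = labels
print(df_pca.head())
print("Variance explained by 2 components:", pca.explained_variance_ratio_.sum())
```

Clusters that overlap in the 2-D plot may still be well separated in the original 50 dimensions, so the explained-variance figure is worth checking before reading too much into the picture.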

Step 6: Performance Evaluation

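To evaluate the PCA projection, we use reconstruction error. PCA projects the points into a low-dimensional space, and the original points are then reconstructed by projecting these low-dimensional representations back into the high-dimensional space. The distance between the reconstructions and the original points is inversely related to how well the model captures the structure present in the data: the smaller the error, the better. The reconstruction error can also be turned into an R² score, R² = 1 − (sum of squared reconstruction errors / sum of squared deviations from the mean), where values closer to 1 are better. Note that this evaluates the two-dimensional projection used for visualization, not the clustering itself; for K-means, internal metrics such as inertia or the silhouette score are commonly used.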
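A minimal sketch of the reconstruction-error idea with scikit-learn's `PCA`: project the data, map it back with `inverse_transform`, and measure the squared distance to the original points. R² is computed as 1 minus the ratio of squared reconstruction error to total squared deviation from the column means; for PCA this equals the fraction of variance explained by the kept components, which the last line checks. Random correlated data stand in for the questionnaire answers.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in data: 200 participants, 10 correlated features
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.3 * rng.normal(size=(200, 10))

pca = PCA(n_components=2, svd_solver="full")
X_low = pca.fit_transform(X)          # project to the low-dimensional space
X_rec = pca.inverse_transform(X_low)  # map back to the original space

# Reconstruction error: mean squared distance between original and reconstructed points
reconstruction_error = np.mean(np.sum((X - X_rec) ** 2, axis=1))

# R^2 = 1 - SSE / SST, with SST measured around the column means
sse = np.sum((X - X_rec) ** 2)
sst = np.sum((X - X.mean(axis=0)) ** 2)
r2 = 1 - sse / sst

print(f"Reconstruction error: {reconstruction_error:.3f}")
print(f"R^2: {r2:.3f}")
print(np.isclose(r2, pca.explained_variance_ratio_.sum()))
```

Increasing `n_components` lowers the reconstruction error and raises R², so the score is mainly useful for comparing projections of different dimensionality.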

The complete code for this personality prediction project can be found here.

Case Study: Japan’s AI-enabled dating service

Since the start of the century, Japan has seen a declining birth rate. One reason is the drop in the number of annual marriages, from about 800,000 in 2000 to about 600,000 in 2019. Finding the right partner has become difficult, even during the COVID-19 pandemic, when almost everything moved online. To counter the falling birth rate and help people find lasting love, the Japanese government is turning to Artificial Intelligence and Machine Learning to help people marry and start families.

However, Japan's Cabinet believes that current dating services, which rely on preferences that users fill in such as age, income, and educational level, are not advanced enough to find the best match. The government therefore sought Artificial Intelligence's help to find matches based on less obvious patterns.

New AI- and ML-based matchmaking systems focus on individuals' values and personalities and are reported to have shown excellent results. Couples matched with this more personalized approach, rather than on age, income, and education level alone, are considered more likely to marry. To support such services, the government also pays two-thirds of the operating costs of these AI matchmaking systems.

Japan's Cabinet Office is also seeking approval for two billion yen in the budget for these AI-enabled matchmaking services.

Possible Interview Questions

  • What is the k-Means algorithm? How do we decide the value of k?
  • Can hierarchical clustering be used in place of k-means? What’s the difference between them?
  • What are unsupervised learning approaches? Why are they called unsupervised?
  • How can we use the predicted personality of any individual?


The use of Machine Learning methods in psychological research is expected to increase sharply in the near future. Personalization is key for businesses that want to expand and offer customer-oriented services. It also gives individuals better options and opportunities suited to their personalities. Machine Learning has great potential for determining personality traits, which can then be used for self-monitoring and by businesses to hire employees who match their personality criteria.

Next Blog: Customer segmentation using hierarchical clustering

Enjoy Learning, Enjoy Algorithms!
