Predicting Personality Using Machine Learning


Data Science and Artificial Intelligence are transforming the world. We can observe many machine learning applications in our day-to-day lives, and one of the most interesting is classifying individuals based on their personality traits. Each person on this planet is unique and has a unique personality. The availability of large, high-dimensional datasets has made it possible to increase the effectiveness of marketing campaigns by targeting specific people. Such personality-based communication is highly effective at increasing the popularity and attractiveness of products and services, and it leads to increased usage, higher customer satisfaction, and broader acceptance among users. Some common examples are:

  1. Personalized online advertisement campaigns lead to higher revenue and click-through rates.
  2. Personality traits are closely associated with an individual’s behavior and preferences. Hence, incorporating personality into Recommender Systems has considerably increased their attractiveness.
  3. Personality-based adaptations are also used to provide personalized visualizations and can even suggest better music recommendations.
  4. Personality traits could also help personalized systems tackle the “cold start” problem, because they are scientifically validated and relatively stable latent dimensions of an individual.

Hence, it is evident that a person’s personality plays a significant role in their likings. According to some reports, companies ask candidates for their social profiles in hiring forms to learn more about their personalities and assign them the work they are good at. This not only helps companies select the right candidates but also increases their efficiency. Isn't this amazing?

Key takeaways from this blog

  1. An in-depth understanding of how personality prediction is useful.
  2. What is the Big Five personality trait model?
  3. How can machine learning algorithms predict your personality based on your behavior on Facebook or other social media channels?
  4. Steps to implement your own personality predictor.
  5. How is Japan using artificial intelligence techniques to help people find the perfect match and counter its declining birth rate?

So let's start without any further delay!

The Big Five Personality Trait model is a well-known model, grounded in psychological theory, that is used to measure personality. It summarizes a person's overall personality along five traits and is also known as the OCEAN model.

  • Openness: Insight, imagination, sensitivity, attentiveness, and curiosity are the significant characteristics of this trait. People high in openness have dynamic personalities and a wide range of interests. They are always willing to explore the world, are curious about new things, and want to learn more about other people. Such people tend to be more creative.
  • Conscientiousness: Conscientiousness describes a person’s carefulness, discipline, deliberation, and diligence. People with this trait are highly goal-oriented, have excellent impulse control, and are very thoughtful. They have exceptional organizational abilities and are incredibly mindful of deadlines.
  • Extroversion: Extroverted people are emotionally expressive and assertive in sharing their ideas. This trait describes how comfortable people are in social situations. Such people are characteristically sociable, enthusiastic, and excitable. They like to take action and see results without second thoughts.
  • Agreeableness: Agreeableness describes a person’s generosity and willingness to get along with others. People high in this trait are highly cooperative, affectionate, kind, and trustworthy. They are accommodating and considerate.
  • Neuroticism: Neuroticism deals with a person’s emotional instability, sulkiness, and tendency to have mood swings. People with high levels of neuroticism are easily irritated.

Personality trait

Personality Prediction and Machine Learning

With the availability of high-dimensional and fine-grained data about human behavior, it has become much easier to study and observe how people behave. Data collected from our day-to-day activities through mobile sensing studies has drastically changed how psychologists perform research and undertake personality assessments. Machine learning models are a boon to researchers: they can learn highly complex relationships, and their generalizability and robustness can be evaluated with resampling methods such as cross-validation. They have the potential to transform research and assessment in personality psychology. These algorithms can handle vast datasets with thousands of attributes, and many of them (for example, regularized models) are much less affected by collinearity than classical statistical methods. Moreover, ML algorithms are highly efficient at recognizing patterns in datasets that humans cannot even perceive. The use of these ML models can lead to better, more objective, and automated personality assessments.
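The resampling idea mentioned above is most often applied as k-fold cross-validation. The sketch below is illustrative only: the 20 "behavioural features" and the continuous trait score are synthetic, and Ridge regression is our own choice (its L2 penalty keeps coefficient estimates stable when features are correlated, which relates to the collinearity point).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data: 200 hypothetical people x 20 behavioural features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_weights = rng.normal(size=20)
# Synthetic continuous "trait score" = linear signal + noise
y = X @ true_weights + rng.normal(scale=0.5, size=200)

# 5-fold cross-validation: fit on 4/5 of the people, score R^2 on the held-out 1/5, five times
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=cv, scoring="r2")
print("R^2 per fold:", np.round(scores, 3))
print("mean R^2:", round(float(scores.mean()), 3))
```

The spread of the five fold scores gives a rough sense of how robust the model is; a large gap between folds would be a warning sign.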

People interact and express their likes, thoughts, feelings, and opinions on social media, and this content captures their personality traits. Machine learning models have been actively using such a wide range of data to predict individuals’ Big Five (OCEAN) personality traits. Supervised machine learning algorithms like Naïve Bayes and Support Vector Machines are widely used in industry to predict personality traits. More recently, researchers have also started to apply unsupervised learning methods to identify other psychological constructs in digital data.
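As a rough illustration of the supervised setup mentioned above, here is a minimal sketch that trains Naïve Bayes and a linear Support Vector Machine on a handful of hypothetical posts hand-labelled as high (1) or low (0) extroversion. The posts, the labels, and the choice of TF-IDF word features are our own assumptions for illustration; real studies typically use thousands of users with questionnaire-based trait scores as labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy posts, hand-labelled as high (1) or low (0) extroversion
posts = [
    "had an amazing party with friends tonight",
    "cannot wait to meet everyone at the festival",
    "love chatting with new people at events",
    "great night out dancing with the whole team",
    "spent a quiet evening reading alone",
    "staying home with a book and some tea",
    "prefer a calm weekend by myself",
    "quiet day, just me and my thoughts",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Same features (TF-IDF word weights), two classic supervised classifiers
models = {
    "naive_bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "linear_svm": make_pipeline(TfidfVectorizer(), LinearSVC()),
}

new_posts = ["party with friends this weekend", "quiet evening alone with a book"]
predictions = {}
for name, model in models.items():
    model.fit(posts, labels)
    predictions[name] = model.predict(new_posts).tolist()
print(predictions)
```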

Personality Predictions Based on User Behavior on Facebook

In recent years, social media platforms such as Facebook, Twitter, Instagram, and LinkedIn have become some of the most popular destinations for internet users. Social network activity provides an excellent platform for researchers to study and understand people's online behavior, preferences, and personality. People with different personalities tend to form different social relations and show different interaction behaviors and status-update preferences. With the growth of social networks, it has become convenient to infer users' personalities from their social activities. The figure below describes how to predict the personality traits of Facebook users based on different features and measures of the Big Five model.

Complete pipeline of Meta using ensemble approach to predict personality

Figure 2: Personality Predictions Based on User Behavior on the Facebook Social Media Platform, Source
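The pipeline in the figure combines several models into an ensemble. Below is a minimal sketch of that general idea using scikit-learn's VotingClassifier. The three activity features (friend count, status updates per week, photos per month), the synthetic "high extroversion" label, and the choice of base models are all hypothetical and are not taken from the figure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical per-user activity features (synthetic, for illustration only):
# column 0 = friend count, 1 = status updates per week, 2 = photos posted per month
rng = np.random.default_rng(42)
n_users = 500
X = np.column_stack([
    rng.poisson(300, n_users),
    rng.poisson(5, n_users),
    rng.poisson(10, n_users),
]).astype(float)

# Synthetic label: "high extroversion" when a noisy activity score is above its median
score = X[:, 0] / 300 + X[:, 1] / 5 + rng.normal(scale=0.3, size=n_users)
y = (score > np.median(score)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Soft voting averages the predicted class probabilities of the three base models
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
accuracy = ensemble.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Soft voting lets a confident model outweigh two uncertain ones; hard voting (`voting="hard"`) would simply take the majority label.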

By now, we should have a basic understanding of personality traits and their use cases. But hold on! That's enough talk for now. Let’s dive deep into actually predicting the Big Five personality traits.

Steps to implement 

Big Five Personality Test Implementation

This section performs Big Five Personality Test prediction using a dataset of 1,015,342 questionnaire answers collected online by Open Psychometrics. Let’s load the data and see what it looks like. Number of participants: 1,015,341.

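import pandas as pd

# Assumed file name and tab separator for the Open Psychometrics export
data_raw = pd.read_csv('data-final.csv', sep='\t')
data = data_raw.copy()
pd.options.display.max_columns = 150
# Keep only the 50 question columns and the 'country' column
data.drop(data.columns[50:107], axis=1, inplace=True)
data.drop(data.columns[51:], axis=1, inplace=True)
print('Number of participants: ', len(data))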

Dataset description

Dataset: Consists of 110 columns

Step 1: Dataset Description

For each personality trait, participants answer ten questions by choosing a value from 1 to 5 on a scale labeled 1=Disagree, 3=Neutral, 5=Agree. In the column names, EXT corresponds to Extroversion, EST to Emotional Stability (the opposite pole of Neuroticism), AGR to Agreeableness, CSN to Conscientiousness, and OPN to Openness.
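To work with one trait at a time, it helps to select its question columns by prefix. The sketch below assumes the question columns are named as the prefix plus the question number (EXT1 to EXT10, and so on); the helper `trait_columns`, the demo frame, and its timing column are hypothetical.

```python
import pandas as pd

# Column prefixes of the five traits (assumed naming: EXT1..EXT10, EST1..EST10, ...)
TRAIT_PREFIXES = {
    "EXT": "Extroversion",
    "EST": "Emotional Stability (opposite of Neuroticism)",
    "AGR": "Agreeableness",
    "CSN": "Conscientiousness",
    "OPN": "Openness",
}


def trait_columns(df: pd.DataFrame, prefix: str) -> list[str]:
    """Return the question columns of one trait, e.g. 'EXT' -> ['EXT1', 'EXT2', ...].

    Only columns whose name is exactly prefix + digits are kept, so extra
    columns such as a timing column 'EXT1_E' are skipped.
    """
    return [c for c in df.columns if c.startswith(prefix) and c[len(prefix):].isdigit()]


# Tiny hypothetical frame: two questions per trait, answers on the 1-5 scale
demo = pd.DataFrame({f"{p}{i}": [1, 3, 5] for p in TRAIT_PREFIXES for i in (1, 2)})
demo["EXT1_E"] = [2000, 3500, 1800]  # hypothetical per-question timing column
demo["country"] = ["US", "GB", "IN"]

for prefix, name in TRAIT_PREFIXES.items():
    print(prefix, name, trait_columns(demo, prefix))
```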

Step 2: Visualization

Let’s look at how the answers to each trait's questions are distributed. Here we show the frequency distribution of answers for the Extroversion and Conscientiousness questions.

Figure: Frequency distribution of answers to the Conscientiousness questions

Figure: Frequency distribution of answers to the Extroversion questions
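Frequency distributions like these can be computed with pandas' `value_counts`. This is a sketch on randomly generated answers (not the real data), assuming the ten Extroversion questions are named EXT1 to EXT10.

```python
import numpy as np
import pandas as pd

# Hypothetical answers of 1,000 participants to the ten Extroversion questions (1-5 scale)
rng = np.random.default_rng(7)
answers = pd.DataFrame(
    rng.integers(1, 6, size=(1000, 10)),
    columns=[f"EXT{i}" for i in range(1, 11)],
)

# Count how often each answer value 1..5 was chosen, per question.
# Rows = answer value, columns = question; values never chosen get a count of 0.
freq = answers.apply(lambda col: col.value_counts().reindex(range(1, 6), fill_value=0))
print(freq)
```

With matplotlib installed, `freq.plot(kind="bar")` draws the table as a grouped bar chart, one bar per question for each answer value.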

Step 3: Data Preprocessing and Clustering

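K-means groups points by Euclidean distance, so all features should be on a comparable scale. We scale the data with scikit-learn's MinMaxScaler, which maps every column to the range 0 to 1. Scaling keeps any single feature from dominating the distance computation and helps the model produce better results.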
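MinMaxScaler rescales each column x to (x - min(x)) / (max(x) - min(x)), so the smallest value in a column becomes 0 and the largest becomes 1. A small check on hypothetical answers confirms that the scaler and the hand-written formula agree:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical answers of 4 participants to 3 questions (1-5 scale)
X = np.array([[1, 5, 3],
              [2, 4, 3],
              [5, 1, 4],
              [3, 2, 5]], dtype=float)

scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X)

# The same result by hand: (x - column min) / (column max - column min)
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_scaled)
print(np.allclose(X_scaled, manual))
```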

Step 4: Model Building

Now we have our data in the desired format. So let’s take a step ahead and get our hands dirty by grouping participants into five clusters, a number chosen to match the five traits of the OCEAN model. For this problem, we are using the K-means clustering algorithm. Keep in mind that K-means groups participants with similar answer patterns, so a cluster is not guaranteed to correspond to a single trait. After performing clustering, we have our results:

Cluster Distribution

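from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# KMeans cannot handle missing values, so drop incomplete rows along with 'country'
df_model = data.drop('country', axis=1).dropna()

# Step 3: scale every answer column to the range 0-1
scaler = MinMaxScaler(feature_range=(0, 1))
df_model = pd.DataFrame(scaler.fit_transform(df_model), columns=df_model.columns)

# Define 5 clusters and fit the model (random_state makes the run reproducible)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
k_fit = kmeans.fit(df_model)

# Predicting the clusters
pd.options.display.max_columns = 10

# labels_ holds the cluster index assigned to each participant
predictions = k_fit.labels_
df_model['Clusters'] = predictions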

Clustering of dataset
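For intuition about what KMeans does internally, here is a minimal NumPy sketch of Lloyd's algorithm, the classic K-means procedure: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points, until the centroids stop changing. scikit-learn's implementation adds refinements such as k-means++ initialization and multiple restarts (`n_init`); this sketch uses plain random initialization, and the helper names are our own.

```python
import numpy as np


def assign_to_nearest(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for every row of X."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random initialization; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialization: k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid
        labels = assign_to_nearest(X, centroids)
        # Update step: move each centroid to the mean of its points (keep it if it has none)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return assign_to_nearest(X, centroids), centroids


# Usage on two well-separated synthetic blobs of 50 points each
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(50, 2)),
])
labels, centroids = kmeans_lloyd(X, k=2)
print(np.bincount(labels), centroids.round(2))
```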

Step 5: Results

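To visualize the final results, we can use PCA to reduce the 50 answer dimensions to two and plot the clusters in that plane. After performing PCA, we have:

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
# Project the scaled answers (without the 'Clusters' column) onto 2 principal components
pcs = PCA(n_components=2).fit_transform(df_model.drop('Clusters', axis=1))
df_pca = pd.DataFrame(pcs, columns=['PCA1', 'PCA2']).assign(Clusters=predictions)
sns.scatterplot(data=df_pca, x='PCA1', y='PCA2', hue='Clusters', palette='Set2', alpha=0.9)
plt.title('Personality Clusters after PCA');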

Cluster Results with k=5

Step 6: Performance Evaluation

For evaluating the model performance, reconstruction error is used. PCA projects the points into a low-dimensional space, and the original points are then reconstructed by projecting these low-dimensional representations back into the high-dimensional space. The smaller the distance between the reconstructions and the original points, the better the model captures the structure present in the data. The reconstruction error can also be used to compute an R² score (one minus the ratio of the reconstruction error to the total variance), which expresses the same performance on a 0 to 1 scale. Keep in mind that this evaluates how faithfully the two-dimensional PCA view represents the data, not the quality of the K-means clusters themselves.
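The reconstruction error and the R² score described above can be computed with PCA's `transform` and `inverse_transform`. Here is a sketch on synthetic data (300 hypothetical participants, 10 answers driven by 2 hidden factors). The R² defined as 1 minus reconstruction error over total variance equals the sum of `explained_variance_ratio_`, which makes a handy sanity check.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 300 hypothetical participants x 10 answers driven by 2 hidden factors
rng = np.random.default_rng(3)
latent = rng.normal(size=(300, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.3 * rng.normal(size=(300, 10))

pca = PCA(n_components=2).fit(X)
X_low = pca.transform(X)              # project into 2-D
X_rec = pca.inverse_transform(X_low)  # project back into 10-D

# Reconstruction error: total squared distance between original and reconstructed points
reconstruction_error = np.sum((X - X_rec) ** 2)
# R^2 = 1 - reconstruction error / total sum of squares around the column means
total_ss = np.sum((X - X.mean(axis=0)) ** 2)
r2 = 1 - reconstruction_error / total_ss
print(round(float(r2), 3), round(float(pca.explained_variance_ratio_.sum()), 3))
```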

In this way, we can group participants into Big Five personality clusters based on their questionnaire answers.

Case Study: Japan’s AI-enabled dating service

Since the start of the century, Japan has seen a decreasing trend in its birth rate. One reason is the falling number of annual marriages, from 800,000 in 2000 to 600,000 in 2019. Finding the perfect mate remains difficult, even in the COVID-19 era when almost everything has moved online. Hence, to counter Japan’s declining birth rate and help people find their eternal love, Japan’s government is leveraging Artificial Intelligence and Machine Learning so that more people get married and start their families.

However, Japan’s Cabinet believes that current dating services are not advanced enough at finding the perfect match, because they rely on preferences such as age, income, and educational level entered by the users. Hence, the Japanese government turned to Artificial Intelligence to find matches based on more hidden patterns.

The new AI- and ML-based dating systems have shown excellent results by focusing on individuals’ values and personalities. Pairs matched with this more personalized approach, rather than by age, income, and education level alone, have a higher probability of getting married. The government is also paying two-thirds of the operating costs of these new and improved AI dating systems to support such services.

Currently, Japan’s Cabinet Office is also seeking approval for two billion yen in the budget for the new and advanced AI-enabled dating services.

Possible Interview Questions

  1. What is the k-Means algorithm? How do we decide the value of k?
  2. Can hierarchical clustering be used in place of k-means? What's the difference between them?
  3. What are unsupervised learning approaches? Why is it called unsupervised?
  4. Is it ethical to use social media data without the consent of users? Isn't it a privacy breach?
  5. How can we use the predicted personality of any individual?


The use of Machine Learning methods in psychological research is expected to increase sharply in the near future. Personalization is key for businesses that want to expand and offer customer-oriented services. Similarly, personalization gives individuals better options based on their personality. Machine Learning has great potential for determining personality traits, which can then be used for self-monitoring and by businesses to hire employees based on their personality criteria.

Enjoy Learning, Enjoy Algorithms!


© 2022 Code Algorithms Pvt. Ltd.

All rights reserved.