Data science and artificial intelligence are transforming many industries. We encounter machine learning applications in our day-to-day lives, and one of the most interesting is classifying individuals based on their personality traits. Every person is unique and carries a unique personality. The availability of large, high-dimensional datasets has paved the way for more effective marketing campaigns that target specific groups of people. Such personality-based communication is highly effective in increasing the popularity and attractiveness of products and services: it increases usage, customer satisfaction, and acceptance among users. Some common examples are:
Hence, it is evident that a person's personality plays a significant role in their likings. According to reports, some companies ask candidates for their social media profiles in hiring forms to learn more about their personalities and assign them the work they are good at. This not only helps companies select the right candidates but also increases their efficiency. Isn't this amazing?
So let's start without any further delay!
The Big Five Personality Trait model is a well-known model based on psychological theories and used to measure personality. It summarizes a person's overall personality along five traits: Openness, Conscientiousness, Extroversion, Agreeableness, and Neuroticism. From these initials, it is also known as the OCEAN model.
With the availability of high-dimensional, fine-grained data about human behavior, researching and observing that behavior has become much easier. Data collected from our day-to-day activities through mobile sensing studies has drastically altered how psychologists perform research and undertake personality assessments. Machine learning models are a boon to researchers: they can learn highly complex relationships, and their generalizability and robustness can be evaluated using resampling methods. Machine learning therefore has the potential to transform research and assessment in personality psychology. Many algorithms can handle vast datasets with thousands of attributes and are less affected by collinearity than classical statistical models. Moreover, ML algorithms are highly efficient at recognizing patterns in datasets that humans cannot easily perceive. The use of these ML models can lead to better, more objective, and automated personality assessments.
People interact and express their likes, thoughts, feelings, and opinions on social media, and this activity reflects their personality traits. Machine learning models have been actively using this wide range of data to predict individuals' Big Five (OCEAN) personality traits. Supervised machine learning algorithms such as Naïve Bayes and Support Vector Machines are widely used to predict personality traits. More recently, researchers have also started to apply unsupervised learning methods to identify other psychological constructs in digital data.
In recent years, social media platforms such as Facebook, Twitter, Instagram, and LinkedIn have become some of the most popular destinations for internet users. Social network activity provides an excellent basis for researchers to study and understand someone's online behaviors, preferences, and personality. Different personalities are associated with different social relations, interaction behaviors, and status preferences. With the development of social networks, it has become convenient to determine users' personalities based on their social activities. The figure below describes how to predict the personality traits of Facebook users based on different features and measures of the Big Five model.
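As a rough illustration of the supervised approach mentioned above, the sketch below trains a Naïve Bayes text classifier with scikit-learn. The posts, the labels `high_extroversion` / `low_extroversion`, and the choice of TF-IDF features are all made up for this example; a real study would use labeled questionnaire scores and far more data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy posts with made-up labels, for illustration only.
posts = [
    "Had an amazing party with so many friends last night",
    "Love meeting new people at big events",
    "Great night out dancing with the whole team",
    "Stayed home reading a quiet book alone",
    "Prefer a calm evening alone with my thoughts",
    "Spent the weekend alone working on a quiet puzzle",
]
labels = ["high_extroversion"] * 3 + ["low_extroversion"] * 3

# TF-IDF turns each post into a word-weight vector; Naive Bayes classifies it.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(posts, labels)

new_post = ["Going to a party to meet friends"]
predicted = model.predict(new_post)[0]

# Class probabilities for the new post, keyed by class name.
classes = model.named_steps["multinomialnb"].classes_
probabilities = dict(zip(classes, model.predict_proba(new_post)[0]))
print(predicted, probabilities)
```

Swapping `MultinomialNB()` for `sklearn.svm.LinearSVC()` would give a Support Vector Machine variant of the same pipeline, although `predict_proba` is not available for that estimator.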
Figure 2: Personality Predictions Based on User Behavior on the Facebook Social Media Platform, Source
By now, we should have a basic understanding of personality traits and their use cases. But hold on! That's enough talk for now. Let's dive into actually predicting the Big Five personality traits.
This section performs Big Five personality prediction using a dataset described as containing 1,015,342 questionnaire answers collected online by Open Psychometrics. Let's look at what the dataset actually looks like. After loading, the code below reports 1,015,341 participants.
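import pandas as pd
# Load the tab-separated raw data and work on a copy
data_raw = pd.read_csv("data-final.csv", sep='\t')
data = data_raw.copy()
pd.options.display.max_columns = 150
# Drop columns 50-106, then everything after the column now at index 50.
# This keeps the first 50 answer columns plus one more column
# ('country', which is dropped later before clustering).
data.drop(data.columns[50:107], axis=1, inplace=True)
data.drop(data.columns[51:], axis=1, inplace=True)
print('Number of participants: ', len(data))
data.head()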
Dataset: Consists of 110 columns
For each personality trait, a set of questions is asked, and participants choose a value from 1 to 5 for each one. The scale is labeled 1=Disagree, 3=Neutral, 5=Agree. In the column names, EXT corresponds to Extroversion, EST to Emotional Stability (Neuroticism), AGR to Agreeableness, CSN to Conscientiousness, and OPN to Openness.
Let's look at how the answers to the questions for each personality trait are distributed. Here we show the frequency distribution of the questions for the Extroversion and Conscientiousness traits.
Conscientious Personality
Extroversion Personality
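The frequency distributions shown above can be computed with a few lines of pandas. The following is a minimal sketch on synthetic answers, not the real dataset; the frame `toy` and its three `EXT` columns are stand-ins chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in for three answer columns (values 1 to 5); not the real data.
toy = pd.DataFrame(
    rng.integers(1, 6, size=(200, 3)),
    columns=["EXT1", "EXT2", "EXT3"],
)

# Count how often each answer (1..5) was chosen for every question.
freq = (
    toy.apply(lambda col: col.value_counts())
    .reindex(range(1, 6))
    .fillna(0)
    .astype(int)
)
print(freq)
```

Each column of `freq` is one question and each row is one answer option, so a call such as `freq.plot(kind="bar", subplots=True)` would draw one bar chart per question.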
We scale the data with MinMaxScaler so that every feature lies between 0 and 1. Scaling puts all features on a common range, which helps distance-based algorithms such as K-means treat them equally.
Now, we have our data in the desired format. So let's take a step ahead and get our hands dirty by forming five clusters, one for each personality trait in the OCEAN model. For this problem, we use the K-means clustering algorithm. After performing clustering, we have our results:
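The scaling step is not shown in the text, so here is a minimal sketch of what it could look like with scikit-learn's `MinMaxScaler`. The small frame `toy` and its values are invented for illustration; only the column naming mimics the dataset.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy answers on the 1-5 scale; column names mimic the dataset.
toy = pd.DataFrame({"EXT1": [1, 3, 5, 4], "AGR1": [2, 2, 5, 1]})

# Rescale every column independently so its minimum maps to 0 and maximum to 1.
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = pd.DataFrame(scaler.fit_transform(toy), columns=toy.columns)
print(scaled)
```

Because each column is rescaled using its own minimum and maximum, an answer of 3 on a column that spans 1 to 5 becomes 0.5.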
Cluster Distribution
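from sklearn.cluster import KMeans
# Drop the non-numeric 'country' column and rows with missing answers (KMeans cannot handle NaN)
df_model = data.drop('country', axis=1).dropna()
# Define 5 clusters and fit the model
# (the original snippet fitted an undefined df_sample; df_model is assumed here)
kmeans = KMeans(n_clusters=5)
k_fit = kmeans.fit(df_model)
# Predicting the clusters
pd.options.display.max_columns = 10
# labels_ holds the cluster index assigned to each row
predictions = k_fit.labels_
df_model['Clusters'] = predictions
df_model.head(10)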
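To see how many respondents fall into each cluster, the labels can simply be counted. The sketch below runs the same K-means setup on synthetic data so that it is self-contained; the frame `toy`, its `Q1`..`Q10` columns, and the fixed `random_state` are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Synthetic stand-in for scaled answers: 500 respondents, 10 questions.
toy = pd.DataFrame(
    rng.random((500, 10)),
    columns=[f"Q{i}" for i in range(1, 11)],
)

# Fit K-means with 5 clusters and store the label of each respondent.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
toy["Clusters"] = kmeans.fit_predict(toy)

# Number of respondents per cluster, ordered by cluster label.
cluster_sizes = toy["Clusters"].value_counts().sort_index()
print(cluster_sizes)
```

Fixing `random_state` makes the cluster assignment reproducible between runs; without it, the cluster numbering can change each time the model is fitted.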
To visualize the final results, we can use PCA for dimensionality reduction and plot the clusters on the first two principal components:
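import matplotlib.pyplot as plt
import seaborn as sns
# df_pca is assumed to hold the two principal components ('PCA1', 'PCA2') and the 'Clusters' labels
plt.figure(figsize=(10,10))
sns.scatterplot(data=df_pca, x='PCA1', y='PCA2', hue='Clusters', palette='Set2', alpha=0.9)
plt.title('Personality Clusters after PCA');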
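The plot relies on a frame named `df_pca` whose construction is not shown in the text. One plausible way to build it, sketched here on synthetic data, is to project the features onto two principal components and attach the cluster labels. The `features` frame and its `Q` columns are stand-ins, and this is not necessarily how the original frame was produced.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for the scaled answer columns.
features = pd.DataFrame(
    rng.random((300, 10)),
    columns=[f"Q{i}" for i in range(1, 11)],
)
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features)

# Project the 10-dimensional answers onto the first two principal components.
pca = PCA(n_components=2)
components = pca.fit_transform(features)

# Assemble the frame expected by the scatter plot.
df_pca = pd.DataFrame(components, columns=["PCA1", "PCA2"])
df_pca["Clusters"] = clusters
print(df_pca.head())
```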
To evaluate the model's performance, reconstruction error is used. PCA projects the points into a low-dimensional space, and the original points are then reconstructed by projecting the low-dimensional representations back into the high-dimensional space. The distance between the reconstructions and the original points is inversely related to how well the model captures the structure present in the data. The reconstruction error can also be used to compute the R² score as a performance measure.
In this way, we can predict the Big Five personality traits.
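The reconstruction idea can be sketched as follows, assuming scikit-learn's `PCA.inverse_transform` for the back-projection and `r2_score` for the R² value. The data is synthetic (three hidden factors plus noise), and the helper `reconstruction_quality` is a hypothetical name introduced only for this example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
# Synthetic data with low-dimensional structure:
# 3 latent factors mixed into 10 features, plus a little noise.
latent = rng.normal(size=(400, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.1 * rng.normal(size=(400, 10))

def reconstruction_quality(X, n_components):
    """Return (mean squared reconstruction error, R2) for a PCA projection."""
    pca = PCA(n_components=n_components)
    X_low = pca.fit_transform(X)          # project to the low-dimensional space
    X_rec = pca.inverse_transform(X_low)  # project back to the original space
    mse = float(np.mean((X - X_rec) ** 2))
    r2 = float(r2_score(X, X_rec, multioutput="variance_weighted"))
    return mse, r2

mse_2, r2_2 = reconstruction_quality(X, 2)
mse_3, r2_3 = reconstruction_quality(X, 3)
print(mse_2, r2_2)
print(mse_3, r2_3)
```

Keeping more components lowers the reconstruction error and raises R², which matches the inverse relationship described above: the smaller the distance between the reconstructions and the original points, the more structure the projection has captured.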
Since the start of the century, Japan has seen a declining birth rate. This is attributed to a drop in the number of annual marriages, from 800,000 in 2000 to 600,000 in 2019. Finding the perfect partner is difficult, even during the COVID-19 period, when almost everything has moved online. Hence, to address the declining birth rate and help people find a partner, Japan's government is leveraging Artificial Intelligence and Machine Learning so that more people can get married and start families.
However, Japan's Cabinet believes that current dating services are not advanced enough at finding the perfect match. These services rely on preferences such as age, income, and educational level entered by users. Hence, the Japanese government has turned to Artificial Intelligence to find matches based on more hidden patterns.
The new AI and ML-based dating systems have shown excellent results by focusing on individuals' values and personalities. With this more personalized approach, rather than one based merely on age, income, and education level, a matched pair has a higher probability of getting married. The government is also paying two-thirds of the operating costs of the new and improved AI dating systems to support such services.
Currently, Japan’s Cabinet Office is also looking for approval of two billion yen for the new and advanced AI-enabled dating service in the budget.
The use of machine learning methods in psychological research is expected to increase sharply in the near future. Personalization is key for businesses that want to expand and offer customer-oriented services. Similarly, personalization gives individuals better options based on their personality. Machine learning has great potential for determining personality traits, which can be further used for self-monitoring and by businesses to hire employees based on their personality criteria.
Enjoy Learning, Enjoy Algorithms!