t-SNE is a non-linear dimensionality reduction algorithm used for exploring high-dimensional data. It stands for t-distributed Stochastic Neighbor Embedding, where the "t" refers to the Student's t-distribution.
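As a quick illustration, here is a minimal sketch of running t-SNE with scikit-learn's `TSNE`; the synthetic dataset and the perplexity value are illustrative assumptions, not from the article:

```python
# Hypothetical sketch: embed a small high-dimensional dataset in 2-D.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))  # 50 points in 10 dimensions

# t-SNE maps the data to 2-D while trying to preserve local neighborhoods.
embedding = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(X)
print(embedding.shape)  # (50, 2)
```

The perplexity roughly controls how many neighbors each point considers; it must be smaller than the number of samples.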
XGBoost, short for Extreme Gradient Boosting, is a supervised learning technique that uses an ensemble approach based on the gradient boosting algorithm. It is a scalable end-to-end system, widely used by data scientists to achieve state-of-the-art results on many machine learning challenges.
Random forest is a supervised learning algorithm that can be used to solve both classification and regression problems. It is popularly applied in data science competitions and practical, real-life situations, and provides very intuitive solutions.
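A minimal classification sketch with scikit-learn's `RandomForestClassifier`; the synthetic dataset and hyperparameters here are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem (illustrative only).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An ensemble of 100 decision trees, each trained on a bootstrap sample.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(acc)
```

The same class swaps for `RandomForestRegressor` when the target is continuous.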
The best machine learning model would involve the fewest features while keeping performance high. Therefore, determining the relevant features for the model-building phase is necessary. In this session, we will see some feature selection methods and discuss the pros and cons of each.
Boosting is an approach where we sequentially ensemble the predictions made by multiple decision trees. Every decision tree is grown using information from the previously grown trees, so the trees are not independent of each other.
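The sequential idea can be sketched with scikit-learn's `GradientBoostingClassifier` (used here as an illustrative stand-in; the dataset and hyperparameters are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Each of the 100 shallow trees is fit to the errors of the ensemble
# built so far, so the trees are grown sequentially, not independently.
model = GradientBoostingClassifier(
    n_estimators=100, max_depth=2, random_state=0
).fit(X, y)
print(model.n_estimators_)  # number of trees actually grown
```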
Many learners want to master Machine Learning and don’t know where to start. It seems like a formidable task, especially if one lacks a thorough background. This article will discuss some of the factors that can be obstacles in learning machine learning. Working around these obstacles can help us master and develop a long-term interest in this subject.
Classification problems are among the most common problem statements in Machine Learning. We compare our classification models against existing ones using standard evaluation metrics like the confusion matrix, accuracy, precision, recall, and ROC. In this article, we will discuss some of the popular evaluation metrics used to evaluate classification models.
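A small worked example of these metrics using `sklearn.metrics`; the label vectors are made up for illustration:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Hypothetical true vs. predicted labels for 8 samples.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)   # rows: actual, columns: predicted
acc = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 6/8 = 0.75
prec = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3/4 = 0.75
rec = recall_score(y_true, y_pred)      # TP / (TP + FN) = 3/4 = 0.75
print(cm, acc, prec, rec)
```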
When we build a solution for any regression problem, we compare its performance with existing work using standard measures, just as we measure distance in meters or plot size in square feet. Similarly, we need standard evaluation metrics to compare two regression models. Some of them are MAE, MSE, RMSE, and R-Squared.
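These four metrics can be computed in a few lines with `sklearn.metrics`; the target values below are made up for illustration:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true vs. predicted regression targets.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)  # mean of |error| = 0.5
mse = mean_squared_error(y_true, y_pred)   # mean of error^2 = 0.375
rmse = np.sqrt(mse)                        # same units as the target
r2 = r2_score(y_true, y_pred)              # fraction of variance explained
print(mae, mse, rmse, r2)
```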
Naive Bayes is a popular supervised machine learning algorithm that predicts categorical target variables. This algorithm makes some naive simplifying assumptions while making predictions. But the most exciting thing is: it still performs better than or comparably to the best algorithms. So let's learn about this algorithm in greater detail.
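A minimal sketch with scikit-learn's `GaussianNB` on the iris dataset (the dataset choice is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Gaussian Naive Bayes assumes features are conditionally independent
# given the class — the "naive" assumption mentioned above.
model = GaussianNB().fit(X, y)
acc = model.score(X, y)
print(acc)  # training accuracy; high despite the naive assumption
```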
K-Nearest Neighbor is a supervised learning algorithm that can be used to solve classification as well as regression problems. This algorithm learns without explicitly mapping input variables to the target variables. It is arguably one of the first "machine learning" algorithms, and due to its simplicity, it is still used to solve many industrial problems.
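A minimal sketch with scikit-learn's `KNeighborsClassifier` (iris data and k = 5 are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# k-NN simply stores the training set; a query point is classified by a
# majority vote among its k nearest stored neighbors.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
preds = knn.predict(X[:3])
print(preds)
```

There is no real "training" step, which is why k-NN is often called a lazy learner.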
The clustering technique is prevalent in many fields, so many algorithms exist to perform it. K-means is one of them! K-means is an unsupervised learning technique used to partition the data into K pre-defined, distinct, and non-overlapping partitions. These partitions are called clusters, and the value of K is chosen by the user.
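A minimal sketch with scikit-learn's `KMeans` on two synthetic blobs (the data and K = 2 are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated blobs of 30 points each; the user chooses K = 2.
X = np.vstack([rng.normal(0, 0.5, (30, 2)),
               rng.normal(5, 0.5, (30, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_.round(1))  # one centroid per cluster
```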
To detect whether a player is genuine or cheating, BGMI (PUBG) uses a state-of-the-art machine learning approach to predict the presence of cheaters. It collects players' data, draws meaningful conclusions, and sorts cheaters into separate categories. A supervised learning approach is used to flag in-game events that should be impossible for a genuine player.
Exploratory data analysis can be classified as Univariate, Bivariate, and Multivariate analysis. Univariate refers to the analysis involving a single variable; Bivariate refers to the analysis between two variables, and Multivariate refers to the statistical procedure for analyzing the data involving more than two variables.
Nowadays, data collection is ubiquitous, and every company collects data for various uses. Recorded data, in any form, comes with multiple impurities. So data preprocessing techniques are used to remove impurities from data and make it useful for training machine learning models.
Principal Component Analysis (PCA) is an unsupervised learning technique that reduces the dimensionality of data consisting of many inter-related attributes. The PCA algorithm transforms the data attributes into a new set of attributes called Principal Components (PCs).
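A minimal sketch with scikit-learn's `PCA`; the synthetic data below (two strongly correlated columns plus one independent one) is an illustrative assumption:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
# Column 2 is almost a copy of column 1; column 3 is independent.
X = np.column_stack([x1,
                     x1 + rng.normal(scale=0.1, size=100),
                     rng.normal(size=100)])

# The first PC captures the shared variance of the correlated pair.
pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)
```

Because two of the three attributes are nearly redundant, the first component alone explains most of the variance.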
A Decision Tree (DT) is a hierarchical breakdown of a dataset from the root node to the leaf node based on the attributes to solve a classification or regression problem. They are non-parametric supervised learning algorithms that predict a target variable's value by learning rules inferred from the data features.
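The root-to-leaf rule structure can be seen directly with scikit-learn's `DecisionTreeClassifier` and `export_text` (iris data and a depth limit of 2 are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text prints the learned if/else rules from root to leaves.
print(export_text(tree))
```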
In this blog, we will get hands-on with several data preprocessing techniques in machine learning, like Feature Selection, Feature Quality Assessment, Feature Sampling, and Feature Reduction. We will use different datasets for demonstration and briefly discuss the intuition behind the methods.
Gradient descent is one of the most basic optimization algorithms in Machine Learning, and every interviewer expects you to know about it. This article discusses how it helps us find the right set of parameters for a machine learning model.
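A minimal from-scratch sketch of gradient descent on a one-parameter least-squares problem; the data, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

# Fit w in y = w * x by repeatedly stepping against the gradient
# of the mean squared error.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                              # true weight is 2

w, lr = 0.0, 0.01
for _ in range(1000):
    grad = 2 * np.mean((w * x - y) * x)  # d(MSE)/dw
    w -= lr * grad                       # step downhill

print(round(w, 3))  # ≈ 2.0
```

Each iteration moves `w` a small step in the direction that decreases the error, which is the whole idea behind the algorithm.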
In the early stages, machines make mistakes and learn from repeated experience. But how? Let's find the answer to this question.
This article discusses classifying movie reviews into positive and negative categories using sentiment analysis and NLP.
Optimizing an error function is at the heart of every machine learning algorithm. But this error function differs between classification and regression problems.
Predicting wine quality is a challenging task for humans, but machine learning and data science techniques can make it easy.
Regularization is a technique used to solve the problem of overfitting in machine learning, where the gap between training and testing error becomes huge and the model loses generalizability.
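A minimal sketch of L2 regularization using scikit-learn's `Ridge`; the synthetic overparameterized dataset (more features than samples) and the penalty strength are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
# 40 features but only 20 samples: plain least squares can overfit.
X = rng.normal(size=(20, 40))
y = X[:, 0] + rng.normal(scale=0.1, size=20)

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty shrinks the weights

# The regularized model has a smaller coefficient norm.
print(np.linalg.norm(ridge.coef_) < np.linalg.norm(plain.coef_))  # True
```

Shrinking the weights trades a little training error for better generalization, which is exactly how regularization closes the train/test gap.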
In this article, we will discuss the 10 most common misconceptions, so widespread that every one of us must have come across at least one of them in our ML journey.
With ongoing advancements in Machine Learning and Data Science, we can estimate the remaining life span of a person given the essential parameters.
Machine learning models can now predict an individual's personality based on their social media usage. Personality-based communication is widely used in dating apps and recommendation systems.
In this article, we will give the 5 most important reasons that justify the need for knowledge of Data Structures & Algorithms in the field of Data Science, Machine Learning, or Deep Learning.
A Recommender System is a system that predicts a user's future preferences based on his or her previous behavior, or by focusing on the behavior of similar users.
This is a glossary of Machine Learning terms commonly used in the industry. We will add more terms related to machine learning, data science, and artificial intelligence in the coming future. Meanwhile, if you want to suggest adding more terms, please let us know.
K-NN implementation and a Gmail, Yahoo, and Outlook case study. In 2019, on average, every person received 130 emails each day, and overall, 296 billion emails were sent that year.
Machine Learning and Artificial Intelligence are some of the hottest topics that can ensure a brighter future for any individual in the upcoming decade.