A regular expression (RE) is a sequence of characters that defines a search pattern, used to find or extract the strings that match it. Today, REs are available in almost every high-level programming language, and as data scientists or NLP engineers, we should know the basics of regular expressions and when to use them.
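As a minimal sketch of the idea, the snippet below uses Python's built-in `re` module to extract every date-like string from a sentence. The pattern `\d{4}-\d{2}-\d{2}`, the helper name `extract_dates`, and the sample text are illustrative assumptions, not taken from the article.

```python
import re

# Hypothetical pattern chosen for illustration: ISO-style dates such as 2021-03-15.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def extract_dates(text: str) -> list[str]:
    """Return every substring of `text` that matches DATE_PATTERN, in order of appearance."""
    return DATE_PATTERN.findall(text)


sample = "The model was trained on 2021-03-15 and evaluated on 2021-04-02."
print(extract_dates(sample))  # ['2021-03-15', '2021-04-02']
```

Because the pattern has no capture groups, `findall` returns the full matches; adding groups such as `(\d{4})-(\d{2})-(\d{2})` would instead return tuples of the captured parts.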
Random forest is a supervised learning algorithm that can solve both classification and regression problems. It is popular in data science competitions and in practical, real-life applications, and it provides intuitive, heuristic solutions.
The best machine learning model uses as few features as possible while keeping its performance high. Therefore, identifying the relevant features before the model-building phase is necessary. In this session, we will see some feature selection methods and discuss the pros and cons of each.
Boosting is an ensemble approach that combines the predictions of multiple decision trees built sequentially. Each tree is grown using information from the previously grown trees, so the trees are not independent of each other.
Anomaly detection is the process of finding samples that behave abnormally compared to the majority of samples in a dataset. Anomaly detection algorithms have important use cases in Data Analytics and Data Science. For example, fraud analysts rely on anomaly detection algorithms to detect fraudulent transactions.
Scikit-learn is a free machine learning framework for Python that provides an interface for supervised and unsupervised learning. It is built on top of NumPy and SciPy and offers a wide range of tools for common ML tasks. In this blog, we will learn the essential concepts, tools, and features of Scikit-learn.
In this article, we will learn about two essential techniques for scaling attributes in machine learning: Normalization and Standardization. Both bring all features onto a comparable scale, which prevents features with larger magnitudes from dominating those with smaller magnitudes.
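A minimal sketch of both techniques in plain Python, assuming min-max normalization to the range [0, 1] and z-score standardization; the function names and the sample `ages` list are hypothetical.

```python
from statistics import mean, pstdev


def min_max_normalize(values: list[float]) -> list[float]:
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values are equal; map everything to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(x - lo) / span for x in values]


def standardize(values: list[float]) -> list[float]:
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


ages = [20.0, 30.0, 40.0, 50.0]
print(min_max_normalize(ages))  # roughly [0.0, 0.333, 0.667, 1.0]
print(standardize(ages))        # values centered on 0 with unit spread
```

This sketch uses the population standard deviation (`pstdev`); using the sample standard deviation instead would change the standardized values slightly.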
There are various ways to set up our computers for machine learning projects. In this blog, we will try one of the most preferred and easy-to-use setups: Python 3 with Sublime Text 3. Python is the most preferred language for ML tasks, and Sublime Text 3 is the code editor we will use to write ML code.
These days, machine learning libraries and frameworks are easily accessible. But in this article, we will implement a basic machine learning project without using frameworks like Scikit-learn, Keras, or PyTorch. We will use the NumPy library for numerical operations and Matplotlib to visualize the graphs.
Classification problems are among the most common problem statements in Machine Learning. We compare our classification models with existing models using standard evaluation metrics like the confusion matrix, accuracy, precision, recall, and the ROC curve. In this article, we will discuss some of the popular metrics used to evaluate classification models.
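For a binary problem, the confusion-matrix counts and the three threshold-based metrics can be sketched as below, assuming labels are 0/1 with 1 as the positive class; the function names and toy labels are hypothetical. The ROC curve needs predicted scores rather than hard labels, so it is left out of this sketch.

```python
def confusion_counts(y_true: list[int], y_pred: list[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Compute accuracy, precision, and recall from the confusion-matrix counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when the model never predicts / never sees a positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))        # (3, 1, 1, 3)
print(classification_metrics(y_true, y_pred))  # {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
```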
Just as we use standard units to measure quantities, like distance in meters or plot size in square feet, we need standard evaluation metrics to compare regression models. When we build a solution for a regression problem, we compare its performance with existing work using metrics such as MAE, MSE, RMSE, and R-Squared.
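The four metrics can be computed directly from their definitions, as in this sketch (the function name and toy values are hypothetical): MAE averages absolute errors, MSE averages squared errors, RMSE is the square root of MSE, and R-Squared is 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
import math


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """Compute MAE, MSE, RMSE, and R-Squared for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    if ss_tot == 0:
        raise ValueError("R-Squared is undefined when all true values are equal")
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 10.0])
print(metrics)  # MAE = 0.5, MSE = 0.375, RMSE and R-Squared as floats
```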
Naive Bayes is a popular supervised machine learning algorithm that predicts categorical target variables. It makes some strong simplifying assumptions while making predictions, most notably that the features are independent of each other given the class. But the most exciting thing is that it often performs as well as, or better than, far more sophisticated algorithms. So let's learn about this algorithm in greater detail.
K-Nearest Neighbor is a supervised learning algorithm that can solve classification as well as regression problems. It does not learn an explicit function mapping input variables to the target variable; instead, it makes predictions directly from the stored training samples. It is one of the earliest "machine learning" algorithms, and due to its simplicity, it is still used to solve many industrial problems.
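The sketch below shows the classification case with Euclidean distance and a majority vote; the distance metric, the value of `k`, the function name `knn_predict`, and the toy data are assumptions for illustration. Notice there is no training step: the function simply scans the stored samples at prediction time.

```python
import math
from collections import Counter


def knn_predict(train_X: list[tuple[float, float]], train_y: list[str],
                query: tuple[float, float], k: int = 3) -> str:
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # most_common(1) returns [(label, count)] for the most frequent label.
    return Counter(nearest_labels).most_common(1)[0][0]


train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (1.2, 1.4)))  # A
print(knn_predict(train_X, train_y, (6.2, 6.6)))  # B
```

For regression, the same neighbor search applies, but the prediction would be the mean of the neighbors' target values instead of a vote.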
Clustering is used in many fields, so many algorithms exist to perform it. K-means is one of them! K-means is an unsupervised learning technique that partitions the data into K pre-defined, distinct, non-overlapping partitions. These partitions are called clusters, and the value of K is chosen by the user.
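Here is a bare-bones sketch of Lloyd's algorithm, the standard iterative procedure behind K-means: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the centroids stop changing. Using the first K points as starting centroids is a simplifying assumption; practical implementations typically use random or k-means++ initialization. The function name and toy points are hypothetical.

```python
import math

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, iterations: int = 100):
    """Cluster `points` into k groups; returns (centroids, clusters)."""
    centroids = [points[i] for i in range(k)]  # simplistic init: first k points
    clusters: list[list[Point]] = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, clusters


points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (2.0, 1.5), (9.0, 8.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)  # [(1.5, 1.5), (8.5, 8.5)]
```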
Exploratory data analysis can be classified into Univariate, Bivariate, and Multivariate analysis. Univariate analysis involves a single variable, Bivariate analysis examines two variables together, and Multivariate analysis refers to statistical procedures for analyzing data involving more than two variables.
Principal Component Analysis (PCA) is an unsupervised learning technique for reducing the dimensionality of data that consists of many inter-related attributes. The PCA algorithm transforms the data attributes into a new set of attributes called Principal Components (PCs).
A Decision Tree (DT) is a hierarchical breakdown of a dataset, from the root node to the leaf nodes, based on its attributes, used to solve a classification or regression problem. Decision trees are non-parametric supervised learning algorithms that predict a target variable's value by learning decision rules inferred from the data features.
Customer Segmentation is the process of splitting an organization's customer base into smaller groups that reflect similarities in their behavior. It helps businesses develop customer-focused strategies, make segment-wise decisions, and maximize the value of each customer to the company. In this blog, we explore how clustering algorithms can accomplish this task.
SVM is one of the most popular algorithms in machine learning and data science, and it has been widely popular among experts since the 1990s. The idea behind this algorithm is very intuitive, and experts consider it one of the best "out-of-the-box" classifiers. In this article, we will build an understanding of SVMs from the beginner level to the expert level.
In this blog, we will get hands-on with several data preprocessing techniques in machine learning, such as Feature Selection, Feature Quality Assessment, Feature Sampling, and Feature Reduction. We will use different datasets for demonstration and briefly discuss the intuition behind each method.
Machine Learning solutions require close coordination between the technology and business verticals. Any Machine Learning project proposed by business experts mainly passes through seven different verticals or phases. All seven verticals are shown in the image above.
In Machine Learning, Time Series Forecasting refers to the use of statistical models to predict future values using the previously recorded observations.
Time series data is found everywhere, and before performing time series analysis, we must preprocess the data. Time series preprocessing techniques have a significant influence on modelling accuracy.
Gradient descent is one of the most basic cost-optimization algorithms in Machine Learning, and interviewers expect you to know it. This article discusses how it helps us find the right set of parameters for a machine learning model.
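To make the idea concrete, here is a minimal sketch of batch gradient descent fitting a straight line `y = w*x + b` by minimizing mean squared error. The learning rate, number of epochs, function name, and toy data (generated from y = 2x + 1) are assumptions chosen so that the example converges; the article's own setup may differ.

```python
def gradient_descent(xs: list[float], ys: list[float],
                     learning_rate: float = 0.05, epochs: int = 2000) -> tuple[float, float]:
    """Fit y = w*x + b by minimizing mean squared error with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current predictions.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Take a step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 4), round(b, 4))  # approximately 2.0 and 1.0
```

If the learning rate is too large, the updates overshoot and diverge; too small, and convergence needs many more epochs.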
Bias, variance, and the bias-variance tradeoff are among the most popular terms in machine learning and among the most frequently asked topics in machine learning interviews.
In the early stages, machines make mistakes and learn from experience. But how? Let’s move towards finding the answer to this question.
Computers understand numbers, not text, so we need to convert our text into numeric vectors using vector encoding.
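One simple encoding is a bag-of-words count vector: build a vocabulary from the corpus, then represent each document by how often each vocabulary word appears in it. This is only one of several possible schemes (the article may use another, such as one-hot or TF-IDF), and the whitespace tokenization, function names, and sample sentences below are assumptions.

```python
def build_vocabulary(documents: list[str]) -> dict[str, int]:
    """Map each distinct lowercase word to a column index, in sorted order."""
    vocab = sorted({word for doc in documents for word in doc.lower().split()})
    return {word: index for index, word in enumerate(vocab)}


def bag_of_words(document: str, vocabulary: dict[str, int]) -> list[int]:
    """Count how many times each vocabulary word occurs in `document`."""
    vector = [0] * len(vocabulary)
    for word in document.lower().split():
        if word in vocabulary:  # words outside the vocabulary are ignored
            vector[vocabulary[word]] += 1
    return vector


docs = ["the cat sat", "the dog sat on the mat"]
vocab = build_vocabulary(docs)
print(vocab)                         # {'cat': 0, 'dog': 1, 'mat': 2, 'on': 3, 'sat': 4, 'the': 5}
print(bag_of_words(docs[1], vocab))  # [0, 1, 1, 1, 1, 2]
```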
Machine learning is the science of getting computers to act without being explicitly programmed. Here, the computer takes data and the corresponding outputs as its inputs and tries to find the function that best maps the data to the outputs, learning this mapping from existing experience.
Text data preprocessing, when executed properly, substantially improves results. Fortunately, Python has excellent NLP libraries, such as NLTK, spaCy, and Gensim, to ease our text analysis.
Artificial Intelligence and Machine Learning are the most famous buzzwords in the technology industry. We often use them as synonyms, but they are not the same.
Logistic Regression is one of the most used machine learning algorithms in industry. It is a supervised learning algorithm in which the target variable is categorical, such as positive or negative, or Type A, B, or C. Although its name contains the term "regression", it is used only to solve classification problems.
Linear Regression is a supervised machine learning algorithm used to solve regression problems.
Optimizing an error function is the core process of machine learning algorithms. However, the error function differs between classification and regression problems.
Regularization is a technique used to solve the problem of overfitting in machine learning. Overfitting occurs when the gap between training and testing errors becomes large and the model loses its ability to generalize.
To learn a new subject, we should understand how it started. Every computer science field has a history that reflects the challenges earlier researchers faced, and knowing it makes our journey easier. This article discusses 10 of the most interesting historical facts, considered turning points in AI and Machine Learning history.
Both classification and regression deal with the problem of learning a function that maps input to output. In classification, the output is a discrete (non-continuous) class label or categorical output, whereas in regression, the output is continuous.
Based on the nature of the input we provide to a machine learning algorithm, machine learning can be classified into four major categories: Supervised Learning, Unsupervised Learning, Semi-Supervised Learning, and Reinforcement Learning.
In this article, we will try to answer another critical question in machine learning and artificial intelligence: How exactly does a machine learn?
This is a glossary of Machine Learning terms commonly used in the industry. We will add more terms related to machine learning, data science, and artificial intelligence in the future. Meanwhile, if you would like to suggest more terms, please let us know.
If we look for the different types of machine learning, we will find different answers, such as classification and regression, supervised and unsupervised, probabilistic and non-probabilistic, and many more.
We will answer three basic questions about the fundamentals of machine learning: 1) What is Machine Learning? 2) Why do we need Machine Learning? 3) Where can we use Machine Learning?