In this article, we will learn about methods for scaling the different attributes in our data. Normalization and standardization are the two most commonly used techniques for scaling features and bringing them onto a comparable scale. Scaling keeps a model from being biased toward features with higher or lower magnitudes.
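As a rough illustration of the two techniques, here is a minimal sketch in plain Python. It assumes min-max scaling for normalization and z-scores (using the population standard deviation) for standardization; the function names and the sample `incomes` and `ages` values are made up for this example.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift and scale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical features on very different scales.
incomes = [20000, 35000, 50000, 120000]
ages = [22, 35, 47, 61]

norm_incomes = min_max_normalize(incomes)
std_ages = standardize(ages)
```

After scaling, `norm_incomes` lies in the interval [0, 1] and `std_ages` is centered on zero, so neither feature dominates purely because of its units.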
In machine learning, anomaly detection is the process of finding samples that behave abnormally compared to the majority of samples in a dataset. Anomaly detection algorithms have important use cases in data analytics and data science. For example, fraud analysts use them to detect fraudulent transactions.
Companies are collecting tons of data, and the need for processed data is increasing. In this blog, we will work hands-on with several data preprocessing techniques in machine learning, such as feature selection, feature quality assessment, feature sampling, and feature reduction. We will use different datasets to demonstrate each preprocessing method.
Time series preprocessing techniques have a significant influence on data modeling accuracy. In this blog, we have discussed: 1) Definition of time-series data and its importance. 2) Preprocessing steps for time-series data. 3) Structuring time-series data, finding the missing values, denoising the features, and finding the outliers present in the dataset.
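The steps listed above (missing values, denoising, outliers) can be sketched in a few lines of plain Python. This is only one possible set of choices, not necessarily the ones the blog uses: forward fill for gaps, a trailing moving average for denoising, and a median-absolute-deviation rule with an arbitrary threshold of 5 for outliers. The `readings` series is invented, and the sketch assumes the first value is present.

```python
from statistics import median


def forward_fill(series):
    """Replace each missing value (None) with the last observed value."""
    filled, last = [], None
    for x in series:
        if x is None:
            x = last
        filled.append(x)
        last = x
    return filled


def moving_average(series, window=3):
    """Smooth with a trailing window; early points use the values available so far."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def flag_outliers(series, threshold=5.0):
    """Flag points whose distance from the median exceeds threshold * MAD."""
    med = median(series)
    mad = median([abs(x - med) for x in series])
    if mad == 0:
        return [False for _ in series]
    return [abs(x - med) / mad > threshold for x in series]


# Hypothetical sensor readings with two gaps and one spike.
readings = [10.0, None, 11.0, 10.5, 50.0, 10.2, None, 9.8]

filled = forward_fill(readings)
flags = flag_outliers(filled)
smoothed = moving_average(filled)
```

Flagging outliers before smoothing matters here: a moving average spreads a spike across its neighbors, which makes it harder to detect afterwards.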
Unlike humans, machines don't understand words and their semantic context. So, we convert processed text into a format that the machine can understand using vector encoding. In this blog, we will learn: 1) Word embedding. 2) Techniques to embed words (one-hot encoding, Word2Vec, TF-IDF, etc.). 3) Implementation of all these embeddings.
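To make the idea of vector encoding concrete, here is a small sketch of two of the techniques named above, one-hot encoding and TF-IDF, written in plain Python. The two-document corpus is invented, and the TF-IDF variant shown (term frequency times log(N / document frequency), with no smoothing) is one common formulation among several; library implementations often differ in the details.

```python
import math

# Hypothetical tokenized corpus.
docs = [["the", "cat", "sat"], ["the", "dog", "barked"]]

# Fixed vocabulary: one vector position per distinct word.
vocab = sorted({word for doc in docs for word in doc})
index = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """Vector of zeros with a single 1 at the word's vocabulary position."""
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec


def tf_idf(doc):
    """Weight each word by its frequency in doc and its rarity across docs."""
    vec = [0.0] * len(vocab)
    for word in set(doc):
        tf = doc.count(word) / len(doc)
        df = sum(1 for d in docs if word in d)
        vec[index[word]] = tf * math.log(len(docs) / df)
    return vec


cat_vector = one_hot("cat")
first_doc_vector = tf_idf(docs[0])
```

In this toy corpus "the" appears in every document, so its TF-IDF weight is zero, while "cat" gets a positive weight because it appears in only one document.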
We need to clean the text data before feeding it to machine learning algorithms. Fortunately, Python has excellent NLP libraries (NLTK, spaCy) to ease text analysis. In this blog, we will learn: 1) Working hands-on with a sentiment analysis dataset. 2) Techniques for cleaning text data. 3) Exploratory analysis of text data.
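A minimal sketch of what text cleaning can look like, using only the standard library rather than NLTK or spaCy. The steps chosen here (lowercasing, removing URLs and HTML tags, dropping non-letters, filtering stopwords) are typical but not exhaustive, and the tiny `STOPWORDS` set and the sample `review` are made up for illustration; a real pipeline would normally take its stopword list from a library.

```python
import re

# Hand-picked stopwords for the example only (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "is", "it", "this", "and", "to", "of"}


def clean_text(text):
    """Return a list of lowercase word tokens with noise and stopwords removed."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML tags
    text = re.sub(r"[^a-z\s]", " ", text)      # keep only letters and whitespace
    return [token for token in text.split() if token not in STOPWORDS]


# Hypothetical raw review text.
review = "This movie is GREAT!!! <br> Watch it: https://example.com/trailer"
tokens = clean_text(review)
```

The order of the substitutions matters: URLs and tags are removed before punctuation, otherwise their fragments would survive as stray tokens.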