The Naive Bayes classifier is a simple Bayesian classification algorithm. It is called naive because it assumes that the features in a dataset are mutually independent.
The Naive Bayes classifier assumes that the presence (or absence) of a particular feature (attribute) of a class is unrelated to the presence (or absence) of any other feature, given the class variable (Nikam, 2015).
Even if these features depend on each other or upon the existence of other features of a class, a naive Bayes classifier considers all of these properties to independently contribute to the probability
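To make the independence assumption concrete, here is a minimal sketch of a Naive Bayes classifier in plain Python. The toy data, the labels "spam" and "ham", and the function names are hypothetical and chosen only for illustration. Features are assumed to be binary (present or absent), and Laplace smoothing with `alpha=1.0` is an added assumption so that unseen values do not produce zero probabilities.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> number of occurrences
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
    return class_counts, feature_counts


def predict(row, class_counts, feature_counts, alpha=1.0, n_values=2):
    """Return the most probable label and the log score of every label.

    Each feature contributes its own factor P(value | label); the factors
    are multiplied (added in log space) as if they were independent.
    n_values is the number of distinct values a feature can take
    (2 for the binary features assumed here).
    """
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        log_score = math.log(count / total)  # class prior
        for i, value in enumerate(row):
            numerator = feature_counts[label][i][value] + alpha
            denominator = count + alpha * n_values
            log_score += math.log(numerator / denominator)
        scores[label] = log_score
    return max(scores, key=scores.get), scores


# Hypothetical toy data: two binary features per message.
rows = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0)]
labels = ["spam", "spam", "ham", "ham", "spam", "ham"]

class_counts, feature_counts = train_naive_bayes(rows, labels)
label, scores = predict((1, 1), class_counts, feature_counts)
print(label, {k: round(math.exp(v), 3) for k, v in scores.items()})
```

The scores are unnormalized joint probabilities, so only their relative size matters when picking the label.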
Naive Bayes has proven…
In classification tasks, evaluation metrics are used in two stages: training and testing. In the training stage we use them for model optimization, i.e. based on the results we can determine which model and model setup produces more accurate predictions.
In the second stage, model testing, the evaluation metric is used to assess the accuracy of predictions.
There are many evaluation metrics and each one has its benefits and drawbacks. Thus, selecting an appropriate evaluation metric that works for your problem can be difficult.
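As a small illustration of why the choice of metric matters, the sketch below computes accuracy, precision, and recall in plain Python for a hypothetical imbalanced dataset. The labels and the always-negative "model" are made up for this example, and returning 0.0 for an undefined ratio is a convention I assume here, not a universal rule.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the ratio is undefined (assumed convention)."""
    return numerator / denominator if denominator else 0.0


# Hypothetical imbalanced labels: only 2 positives out of 10.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
# A "model" that always predicts the negative class.
y_pred = [0] * 10

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = safe_div(tp + tn, len(y_true))
precision = safe_div(tp, tp + fp)
recall = safe_div(tp, tp + fn)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Accuracy looks reasonable here even though the model never finds a positive case, which is exactly the kind of drawback that recall exposes.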
In this post I will cover:
As most business decisions are data-driven, hypothesis testing has become a key tool in making the right decisions. That is why it is critical to understand and apply it in the right context.
This article will cover the following topics:
A hypothesis is a statement about the value of a population parameter developed to test a theory or belief.
The result of the test allows us to interpret whether the assumption holds or whether the assumption has been violated.
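The idea can be sketched with a two-sided one-sample z-test in plain Python. The sample values, the hypothesized mean `mu0 = 50`, and the known population standard deviation `sigma = 3` are all hypothetical numbers chosen for illustration. A z-test is only one possible choice and assumes the population standard deviation is known.

```python
import math
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma):
    """Two-sided z-test of H0: population mean == mu0, with known sigma."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    # Probability of a statistic at least this extreme if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical measurements.
sample = [52, 48, 51, 53, 50, 54, 49, 55, 52, 56]

z, p_value = one_sample_z_test(sample, mu0=50, sigma=3)
alpha = 0.05  # significance level, chosen before looking at the data
reject_null = p_value < alpha

print(f"z={z:.3f} p={p_value:.4f} reject H0: {reject_null}")
```

If the p-value is below the chosen significance level, the data are considered inconsistent with the hypothesized parameter value. Otherwise we fail to reject the hypothesis, which is not the same as proving it.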
The two common examples of tests are:
In this article, I would like to show you how to use the Lazy Predict library to quickly fit and compare 30 machine learning models.
This library allows you to quickly build machine learning models, either classification or regression models, in only a few lines of code. What is more, you will be able to compare 20 to 30 machine learning algorithms.
The Lazy predict library was authored by Shankar Rao Pandala. You can check the documentation here.
To install the Lazy Predict library you can use pip (pip install lazypredict).
Alternatively, Lazy Predict can be downloaded from the GitHub repo…
Social media platforms, online news portals, and other online media have become the main sources of news, through which interesting and breaking news is shared at a rapid pace (Khan, J. Y., 2019).
However, many news portals serve special interests by feeding readers distorted, partially correct, and sometimes imaginary news that is likely to attract the attention of a target group of people.
Fake news can be defined as a type of yellow journalism or propaganda that consists of deliberate misinformation or hoaxes spread via traditional print and broadcast news media or online social media (David, L., 2017).
There are…
Text pre-processing is an essential step of any NLP system, as the characters, words, and sentences identified at this stage are the fundamental units passed to all further processing stages.
Text pre-processing is a key part of text mining, the process of extracting useful information from textual data. It is also a necessary step in converting unstructured text data into a structured form.
In this article we will cover the following text pre-processing steps:
Text tokenization can be defined as the process of splitting textual data into smaller meaningful components called…
Twitter is a popular microblogging service that allows users to share, deliver, and interpret real-time, short, and simple messages called tweets. This makes Twitter a rich source of data for opinion mining and sentiment analysis.
In this article, I will show you how to perform Twitter sentiment analysis with GloVe and LSTM. I will demonstrate the end-to-end process, covering data collection, text preprocessing, and sentiment classification.
Sentiment analysis (also known as opinion mining or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics…
In machine learning, we sometimes have too many features on which the final classification is based. The higher the number of features, the harder the data is to work with. Often, some of these features are correlated and hence redundant.
Dimensionality reduction, which extracts a small number of features by removing irrelevant, redundant, and noisy information, can be an effective solution.
Commonly used dimensionality reduction methods include supervised approaches such as linear discriminant analysis (LDA) and unsupervised ones such as principal component analysis (PCA).
In this article, we will focus on LDA. Specifically, I will demonstrate how to use…
The term “ROC curve” is derived from the theory of signal detection, whose task is to distinguish an information signal (e.g. signals from electronic machinery/devices) from random patterns containing no information (noise, random activity).
The first use of the ROC curve dates back to the Second World War. After the attack on Pearl Harbor in 1941, the US began looking for a better method to analyze radar signals to increase the detectability of Japanese aircraft.
In this article I will explain:
Linear Discriminant Analysis (LDA) is most commonly used as a dimensionality reduction technique in the pre-processing step for pattern-classification and machine learning applications (Raschka, S., 2019).
However, LDA is not just a dimensionality reduction tool. It can also be used as a robust classification method.
In this article, I will focus on Linear Discriminant Analysis for classification. First, I will introduce LDA and explain why we use it for classification tasks instead of logistic regression.
After that, you will see how to use Linear Discriminant Analysis for classification in Python. …
Model Risk Manager @Nordea, Machine Learning Consultant, Connect: https://www.linkedin.com/in/kamil-polak/