Naive Bayes Classifier (with examples)

Lea Setruk
4 min read · Mar 24, 2021

Introduction:

In machine learning, the purpose is to find certain types of patterns. To do so, you train a model on a dataset using an algorithm so that the model can learn from the data. Once it is trained, the model can be used to make predictions on new data it hasn’t seen before.

Machine learning can be used for classification (Does the patient have diabetes? Is this email spam? etc.).

Naive Bayes Classifier is a machine learning model used for classification tasks.

The background you are required to have:

Probability distribution, density, events, independence between random variables, joint probability, conditional probability…

How does it work?

A classifier is a machine learning model that is used to classify different objects based on features. For example, we can classify an email by spam/not spam according to the words in it. Or, we can classify a document by its topic also according to its words.

Naive Bayes is a simple, yet important probabilistic model. It is based on the Bayes’ theorem.

Bayes’ Theorem:
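The formula was shown as an image in the original article; for a class y and a feature vector x = (x₁, …, xₙ), the theorem reads:

```latex
P(y \mid x_1, \dots, x_n) = \frac{P(x_1, \dots, x_n \mid y)\, P(y)}{P(x_1, \dots, x_n)}
```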

This model is called ‘naïve’ because we naively assume independence between features given the class variable, regardless of any possible correlations.

An advantage of naive Bayes is that it only requires a small amount of training data to estimate the parameters necessary for classification. It just needs enough data to capture the probabilistic relationship between each feature and the class variable. Indeed, in this model, interactions between features are ignored, so we don’t need examples of those interactions.
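To make the "small amount of data" point concrete, here is a quick back-of-the-envelope count (the numbers d and k are arbitrary, chosen for illustration): a model of the full joint distribution over d binary features needs 2^d − 1 probabilities per class, while naive Bayes only needs d per class (one P(feature | class) each) plus the class priors.

```python
# Hypothetical parameter count for d binary features and k classes.
d, k = 20, 2

# Full joint distribution over the features, per class.
full_joint = k * (2**d - 1)

# Naive Bayes: one Bernoulli parameter per feature per class,
# plus k - 1 free parameters for the class prior.
naive_bayes = k * d + (k - 1)

print(full_joint)   # over two million parameters
print(naive_bayes)  # a few dozen parameters
```

Far fewer parameters means far fewer examples are needed to estimate them reliably.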

Classification process

Different types of Naive Bayes exist:

  • Gaussian Naive Bayes: used when dealing with continuous data, with the assumption that the values associated with each class are distributed according to a normal (Gaussian) distribution.
  • Multinomial Naive Bayes: features represent the frequencies of events. This model is used for document classification, with events representing the occurrences of words in a single document.
  • Bernoulli Naive Bayes: features are independent Booleans (binary variables). Like the multinomial model, this one is mostly used for document classification, but it records whether a word occurs rather than how often.
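As a sketch of the multinomial variant, here is a from-scratch word-count classifier with Laplace smoothing (the toy documents and labels below are made up for illustration; a production version would use a library such as scikit-learn):

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (list_of_words, label) pairs.
    Returns class counts, per-class word counts, and the vocabulary."""
    priors = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return priors, word_counts, vocab

def predict(words, priors, word_counts, vocab):
    """Pick the class maximizing log P(class) + sum of log P(word | class)."""
    total_docs = sum(priors.values())
    best_label, best_score = None, float("-inf")
    for label in priors:
        score = math.log(priors[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in words:
            # Laplace smoothing: add 1 so unseen words never zero out a class.
            count = word_counts[label][w]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training set.
docs = [
    ("win money now".split(), "spam"),
    ("free money offer".split(), "spam"),
    ("meeting schedule today".split(), "ham"),
    ("project meeting notes".split(), "ham"),
]
model = train(docs)
print(predict("free money".split(), *model))     # spam
print(predict("meeting today".split(), *model))  # ham
```

Working in log space avoids the numerical underflow that comes from multiplying many small probabilities.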

Example:

John is a computer science student who loves listening to music. Sometimes he has homework to do; it can be programming homework or something else. We have some examples of the type of music he listens to, according to some features.

Data

Assume you see John in the morning and he has homework to do that doesn’t require programming. What kind of music would Naive Bayes predict?

We need to calculate P(MusicType | morning, non-programming homework).

Using Bayes’ theorem:

P(MusicType | morning, non-programming) = P(morning, non-programming | MusicType) · P(MusicType) / P(morning, non-programming)

With the independence assumption we obtain:

P(MusicType | morning, non-programming) ∝ P(morning | MusicType) · P(non-programming | MusicType) · P(MusicType)

Now, we calculate these probabilities for both of the music types.

For MusicType = classical:

For MusicType = pop:

Thus, we observe that it is more likely for John to listen to classical music in the morning when he has non-programming homework to do.
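The comparison above can be sketched in a few lines. Since the article’s data table is not reproduced here, the probabilities below are hypothetical, chosen only to show how the two unnormalized posteriors are computed and compared:

```python
# Hypothetical conditional probabilities estimated from a data table
# like the one in the example (not the article's actual numbers).
p = {
    "classical": {"prior": 0.5, "morning": 0.8, "non_programming": 0.7},
    "pop":       {"prior": 0.5, "morning": 0.3, "non_programming": 0.4},
}

# Unnormalized posterior for each class:
# P(class) * P(morning | class) * P(non-programming | class)
scores = {
    music: v["prior"] * v["morning"] * v["non_programming"]
    for music, v in p.items()
}
print(scores)
print(max(scores, key=scores.get))  # "classical" with these numbers
```

Note that the shared denominator P(morning, non-programming) cancels out, so comparing the numerators is enough to pick the most likely class.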

For another clear, well-explained example, you can watch this video:

Conclusion:

Naive Bayes is a simple classifier model that is often used for document classification, spam filtering, etc. Despite the fact that its independence assumptions are often inaccurate in reality, the naive Bayes classifier has properties that make it surprisingly useful in practice.

If you have further questions, you can contact me on LinkedIn (click here).
