Naive Bayes In Machine Learning ✔

Muhammad Taha
3 min read · 6 days ago

What is Naïve Bayes, and why do we use it? This post covers its advantages and disadvantages, how to tell when it is a good fit, and some code snippets.

Naïve Bayes is a probabilistic machine learning algorithm based on Bayes’ Theorem. It assumes that the features are independent of one another once the class is known (conditional independence), which is often not true in real-world data, hence “naïve”. Despite this unrealistic assumption, it works surprisingly well in many scenarios, especially in text classification tasks like spam detection and sentiment analysis.
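
Written out, the rule the classifier applies looks roughly like this. It is the standard textbook formulation; the symbols y (the class) and x_1 … x_n (the features) are notation chosen here for illustration:

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The product over individual features is exactly where the independence assumption enters: each feature contributes its own factor, with no interaction terms between features.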

Why Use Naïve Bayes?

  1. Fast & Efficient — Training is a single pass that collects per-class counts or averages, so it scales well even to large datasets.
  2. Requires Less Data — Performs well with limited training data.
  3. Performs Well with Categorical Data — Ideal for text classification, spam filtering, etc.
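  4. Handles Missing Data — In principle, a missing feature can simply be left out of the probability product, although library implementations (scikit-learn’s included) typically expect complete inputs.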
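
To make the missing-data point concrete, here is a minimal hand-rolled sketch of a Gaussian Naïve Bayes that skips any feature marked as NaN. It is written with plain NumPy, because scikit-learn's Naïve Bayes estimators generally reject NaN inputs. The function names `fit_gaussian_nb` and `predict_skip_missing`, and the tiny dataset, are made up for illustration.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means and feature variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Small constant keeps the variance strictly positive
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances

def predict_skip_missing(params, x):
    """Predict one sample, leaving NaN features out of the product."""
    classes, priors, means, variances = params
    seen = ~np.isnan(x)  # mask of features that are actually present
    log_post = np.log(priors)
    for k in range(len(classes)):
        m, v = means[k, seen], variances[k, seen]
        # Sum of Gaussian log-densities over the observed features only
        log_post[k] += np.sum(-0.5 * np.log(2 * np.pi * v) - (x[seen] - m) ** 2 / (2 * v))
    return classes[np.argmax(log_post)]

# Toy data (illustrative only): class 0 has small x0 / large x1, class 1 the reverse
X = np.array([[1.0, 5.0], [1.2, 5.5], [0.8, 4.8],
              [3.0, 1.0], [3.2, 1.2], [2.9, 0.7]])
y = np.array([0, 0, 0, 1, 1, 1])
params = fit_gaussian_nb(X, y)

pred_a = predict_skip_missing(params, np.array([3.1, np.nan]))  # second feature missing
pred_b = predict_skip_missing(params, np.array([np.nan, 5.1]))  # first feature missing
print(pred_a, pred_b)
```

Because each feature contributes an independent factor, dropping one only removes that factor; the remaining features still produce a usable prediction.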

Types of Naïve Bayes Classifiers

  1. Gaussian Naïve Bayes — Used when features are continuous and assumed to be normally distributed.
  2. Multinomial Naïve Bayes — Used for text classification and discrete features.
  3. Bernoulli Naïve Bayes — Used for binary data (0s and 1s).

Code Snippets with Outputs

Let’s go through different implementations of Naïve Bayes.

1️⃣ Gaussian Naïve Bayes (For Continuous Data)

from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Split into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Naïve Bayes model
model = GaussianNB()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))

🔹 Output:

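Accuracy: roughly 0.95 to 1.0 (printed as a fraction, not a percentage; random_state=42 fixes the split, so the value is the same on every run)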
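
To see how much the score depends on the split, one option is to repeat the experiment with several seeds. The sketch below does that; the seeds 0 to 4 are arbitrary choices for illustration, not values from the example above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Same model, same data; only the train/test split changes with the seed.
accuracies = []
for seed in range(5):  # arbitrary illustrative seeds
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = GaussianNB().fit(X_train, y_train)
    accuracies.append(accuracy_score(y_test, model.predict(X_test)))

for seed, acc in enumerate(accuracies):
    print(f"random_state={seed}: accuracy={acc:.3f}")
```

With only 30 test samples per split, a single misclassified flower moves the accuracy by about 0.033, so small differences between runs are expected.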

2️⃣ Multinomial Naïve Bayes (For Text Classification)

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Sample text data
X_text = ["This is a great movie", "Terrible film, I hate it", "Amazing storyline", "Horrible acting", "Loved it"]
y_labels = [1, 0, 1, 0, 1] # 1 = Positive, 0 = Negative

# Convert text to numerical data
vectorizer = CountVectorizer()
X_vec = vectorizer.fit_transform(X_text)

# Train model
model = MultinomialNB()
model.fit(X_vec, y_labels)

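# Test with new text
test_text = ["Worst movie ever", "Horrible film, I hate it"]
test_vec = vectorizer.transform(test_text)
# Words never seen in training ("worst", "ever") are ignored by transform(),
# so the first sentence is judged on the word "movie" alone.
print("Prediction:", model.predict(test_vec)) # 1 = Positive, 0 = Negative

🔹 Output:

Prediction: [1 0]  # "Worst movie ever" comes out positive because only "movie" is known; the second sentence is negative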
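
With a training set this small, it is worth checking how much of a new sentence the model can actually see, since `CountVectorizer.transform` silently drops any word that was not in the training vocabulary. The sketch below is one way to check; `split_known_unknown` is a hypothetical helper name, not part of scikit-learn.

```python
from sklearn.feature_extraction.text import CountVectorizer

train_texts = ["This is a great movie", "Terrible film, I hate it",
               "Amazing storyline", "Horrible acting", "Loved it"]

vectorizer = CountVectorizer()
vectorizer.fit(train_texts)

# build_analyzer() returns the same lowercasing + tokenizing function
# that the vectorizer applies internally.
analyze = vectorizer.build_analyzer()

def split_known_unknown(text):
    """Hypothetical helper: separate tokens by whether training saw them."""
    tokens = analyze(text)
    known = [t for t in tokens if t in vectorizer.vocabulary_]
    unknown = [t for t in tokens if t not in vectorizer.vocabulary_]
    return known, unknown

known, unknown = split_known_unknown("Worst movie ever")
print("known:", known)      # tokens the model can use
print("unknown:", unknown)  # tokens dropped by transform()
```

If the most informative words of a sentence land in the unknown list, the prediction rests on whatever is left, which is a sign that more training text is needed.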

3️⃣ Bernoulli Naïve Bayes (For Binary Features)

from sklearn.naive_bayes import BernoulliNB
import numpy as np

# Binary dataset (spam detection example)
X = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 0]])
y = np.array([1, 1, 0, 0]) # 1 = Spam, 0 = Not Spam

# Train model
model = BernoulliNB()
model.fit(X, y)

# Predict on new data
test_email = np.array([[1, 0, 0]])
print("Spam Prediction:", model.predict(test_email))

🔹 Output:

Spam Prediction: [1]  # The email is classified as spam
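
To show where that prediction comes from, the following sketch recomputes the spam probability by hand and compares it with scikit-learn. It assumes the default Laplace smoothing (`alpha=1.0`); the variable names are chosen here for illustration.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 0]])
y = np.array([1, 1, 0, 0])  # 1 = Spam, 0 = Not Spam
x_new = np.array([1, 0, 0])
alpha = 1.0  # Laplace smoothing, BernoulliNB's default

scores = {}
for c in (0, 1):
    Xc = X[y == c]
    prior = len(Xc) / len(X)
    # Smoothed P(feature = 1 | class); the factor 2 covers the two outcomes 0 and 1
    p = (Xc.sum(axis=0) + alpha) / (len(Xc) + 2 * alpha)
    # Use p where the feature is present and (1 - p) where it is absent
    likelihood = np.prod(np.where(x_new == 1, p, 1 - p))
    scores[c] = prior * likelihood

# Normalize the two class scores into a probability
p_spam_manual = scores[1] / (scores[0] + scores[1])
p_spam_sklearn = BernoulliNB().fit(X, y).predict_proba([x_new])[0, 1]
print(p_spam_manual, p_spam_sklearn)
```

Both values come out at about 0.75. Note that Bernoulli Naïve Bayes also uses the absent features (the two zeros) as evidence, which is what distinguishes it from the multinomial variant.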

Advantages & Disadvantages

✅ Advantages

  • Fast and efficient for large datasets.
  • Performs well with small datasets and high-dimensional data (like text).
  • Simple and easy to implement.
  • Works well with independent features.

❌ Disadvantages

  • Strong independence assumption (which is rarely true in real-world data).
  • Can be sensitive to noisy data.
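  • Zero probability issue (if a feature value never appears with a class in training, its estimated probability is 0 and wipes out the whole product, unless smoothing such as Laplace smoothing is applied).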
  • Not suitable for complex relationships in data.
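
The zero probability issue and its usual remedy can be sketched in a few lines of plain Python. The word counts below are made-up numbers for illustration, and `alpha` is the strength of additive (Laplace) smoothing.

```python
# Toy word counts for the "spam" class (made-up numbers for illustration only)
spam_counts = {"free": 3, "win": 2, "money": 1}
vocabulary = ["free", "win", "money", "meeting"]  # "meeting" never appears in spam

def word_probability(word, counts, vocab, alpha):
    """P(word | class) with additive smoothing of strength alpha."""
    total = sum(counts.values())
    return (counts.get(word, 0) + alpha) / (total + alpha * len(vocab))

message = ["free", "meeting"]

def message_likelihood(alpha):
    """Product of per-word probabilities, as Naive Bayes would compute it."""
    result = 1.0
    for word in message:
        result *= word_probability(word, spam_counts, vocabulary, alpha)
    return result

unsmoothed = message_likelihood(alpha=0.0)  # one unseen word zeroes the product
smoothed = message_likelihood(alpha=1.0)    # every word keeps a small probability
print("alpha=0:", unsmoothed)
print("alpha=1:", smoothed)
```

scikit-learn's `MultinomialNB` and `BernoulliNB` apply this kind of smoothing by default (`alpha=1.0`), so the problem mostly shows up in hand-written implementations or when `alpha` is set very close to zero.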

How to Know When to Use Naïve Bayes?

✅ When to Use It:

  • Text Classification: Spam filtering, sentiment analysis, topic categorization.
  • Medical Diagnosis: Probabilistic disease prediction.
  • Fraud Detection: Identifying fraudulent transactions.
  • Real-time Applications: Because it’s fast, it’s useful where quick responses are needed.

🚫 When NOT to Use It:

  • When features are heavily correlated (e.g., stock market prediction).
  • When complex relationships exist in the dataset.
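
The effect of correlated features can be shown with a small synthetic experiment: copying the same column several times gives the model no new information, yet Naïve Bayes counts it as fresh evidence each time and becomes overconfident. This is a sketch on made-up data; the seed, class means, and the query point 1.5 are arbitrary choices.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# One informative feature, two overlapping classes (synthetic data)
x0 = rng.normal(loc=0.0, scale=1.0, size=200)
x1 = rng.normal(loc=1.0, scale=1.0, size=200)
X_single = np.concatenate([x0, x1]).reshape(-1, 1)
y = np.array([0] * 200 + [1] * 200)

# Perfectly correlated "extra" features: the same column copied 5 times
X_copied = np.hstack([X_single] * 5)

point = 1.5  # a value that only mildly favours class 1
p_single = GaussianNB().fit(X_single, y).predict_proba([[point]])[0, 1]
p_copied = GaussianNB().fit(X_copied, y).predict_proba([[point] * 5])[0, 1]

print(f"P(class 1) with 1 feature:      {p_single:.3f}")
print(f"P(class 1) with 5 copies of it: {p_copied:.3f}")
```

The predicted label does not change here, but the reported probability moves much closer to 1. That matters whenever the probabilities themselves are used, for example to rank cases or set a decision threshold.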

Conclusion & Advice

💡 Naïve Bayes is a powerful, simple, and fast classification algorithm. It is especially useful when working with text classification tasks like spam filtering, sentiment analysis, and document classification.

However, due to its strong independence assumption, it may not work well when features are correlated. Always evaluate it on held-out data from your own problem before committing to it. If performance is poor, consider more advanced models like Random Forests or Neural Networks. 🚀
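
One simple way to run that check is to cross-validate Naïve Bayes next to a stronger candidate on the same data. The sketch below uses the Iris dataset and a Random Forest purely as stand-ins; the dataset, the 5 folds, and the forest settings are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

candidates = {
    "GaussianNB": GaussianNB(),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}

mean_scores = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X, y, cv=5)  # 5-fold accuracy
    mean_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the simpler model is within noise of the more complex one, its speed and simplicity usually make it the better choice.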
