Logistic Regression + Real Solved Examples 💻

Muhammad Taha
Feb 23, 2025


Its use cases and real, worked code examples for a better understanding…

Logistic Regression is a supervised learning algorithm used for classification tasks. Unlike Linear Regression, which predicts continuous values, Logistic Regression predicts probabilities and classifies data into two or more categories.

It uses the sigmoid function to map predicted values between 0 and 1:

P(Y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)}}

  • If P(Y=1) ≥ 0.5, classify as 1
  • If P(Y=1) < 0.5, classify as 0
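
To make this mapping concrete, here is a minimal sketch of the sigmoid and the 0.5 threshold; the coefficient values below are made up for illustration, not taken from any fitted model:

import numpy as np

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

# Hypothetical coefficients: beta_0 (intercept) and beta_1 (slope)
beta_0, beta_1 = -4.0, 1.0
x = 4.5  # a single feature value
p = sigmoid(beta_0 + beta_1 * x)
print(f"P(Y=1) = {p:.3f} -> class {int(p >= 0.5)}")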

For multi-class classification, it extends to softmax regression (also called multinomial logistic regression).
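
As a minimal sketch of the multi-class case, using scikit-learn's built-in iris dataset (three classes; recent versions of LogisticRegression handle the multinomial case automatically):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # 3 flower species as classes
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)
print(clf.predict_proba(X[:1]))  # one probability per class, summing to 1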

Why Use Logistic Regression in ML?

  • Used for binary (Yes/No, Spam/Not Spam) and multi-class classification.
  • Provides probability estimates, which are useful in decision-making (see the sketch after this list).
  • Efficient, interpretable, and works well with small datasets.
  • Requires less computational power than deep learning models.
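
To illustrate the probability-estimates point, here is a minimal sketch (on made-up toy data) of reading the predicted probability and applying a stricter, application-specific threshold instead of the default 0.5:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature, binary target
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)
proba = model.predict_proba([[3.5]])[0, 1]  # P(class = 1)
print(f"P(class=1) = {proba:.2f}")
# A cautious decision rule might only act on high-confidence cases:
print("Act" if proba >= 0.8 else "Do not act")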

When to Use Logistic Regression?

Use Logistic Regression when:
✔ The dependent variable is categorical (e.g., spam/not spam, pass/fail).
✔ The relationship between the features and the log-odds of the target is roughly linear (or the features can be transformed to make it so).
✔ The dataset is relatively small and interpretability matters.
✔ The output can be represented as probabilities.

Real-World Examples & Code Implementations

Example 1: Predicting if a Student Passes Based on Study Hours

Problem: Given study hours, predict if a student will pass (1) or fail (0).

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

# Sample Data (Study Hours vs. Pass/Fail)
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1]) # 0 = Fail, 1 = Pass

# Train Model
model = LogisticRegression()
model.fit(X, y)

# Predict for 4.5 hours of study
new_student = np.array([[4.5]])
prediction = model.predict(new_student)
print("Prediction for 4.5 hours of study:", "Pass" if prediction[0] == 1 else "Fail")

# Plot the data and a smooth probability curve
X_grid = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
plt.scatter(X, y, color="blue", label="Actual Data")
plt.plot(X_grid, model.predict_proba(X_grid)[:, 1], color="red", label="Probability Curve")
plt.xlabel("Study Hours")
plt.ylabel("Pass Probability")
plt.legend()
plt.show()

Output:
Prediction for 4.5 hours of study: Pass
(A probability curve will also be plotted.)
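
As a follow-up, the model's 0.5 decision boundary can be recovered directly from its fitted coefficients (continuing from the model above): the sigmoid crosses 0.5 exactly where the log-odds β₀ + β₁x equal zero.

# Solve beta_0 + beta_1 * x = 0 for x
boundary = -model.intercept_[0] / model.coef_[0, 0]
print(f"Estimated pass/fail cutoff: about {boundary:.2f} study hours")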

Example 2: Email Spam Detection

Problem: Predict if an email is spam (1) or not (0) based on word frequency.

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score

# Sample Data (Emails)
emails = ["Win a free iPhone", "Important meeting tomorrow",
          "Claim your lottery prize", "Lunch with boss", "Buy cheap meds online"]
labels = [1, 0, 1, 0, 1] # 1 = Spam, 0 = Not Spam

# Convert text to numerical data
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=42)

# Train Model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

Output:
Accuracy: 1.0
(With only one email in the test split, accuracy can only be 0.0 or 1.0, so treat this number with caution on such a tiny dataset.)
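
To classify an unseen email, transform it with the already-fitted vectorizer rather than fitting a new one. A short sketch, with made-up example text:

# Score a brand-new email using the same vocabulary as training
new_email = ["Free prize, claim now"]
X_new = vectorizer.transform(new_email)
print("Spam" if model.predict(X_new)[0] == 1 else "Not Spam")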

Example 3: Predicting Diabetes Based on BMI and Age

Using logistic regression to classify patients as diabetic (1) or not (0).

from sklearn.datasets import make_classification

# Generate synthetic data (n_redundant=0 is required here, since the
# informative and redundant features must fit within n_features=2).
# make_classification returns roughly standard-normal features, so
# think of them as scaled BMI and age rather than raw values.
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, n_classes=2, random_state=42)

# Train Model
model = LogisticRegression()
model.fit(X, y)

# Predict for a new patient (values on the same standardized scale)
new_patient = np.array([[1.0, -0.5]])  # [scaled BMI, scaled Age]
prediction = model.predict(new_patient)
print("Diabetes Prediction:", "Diabetic" if prediction[0] == 1 else "Not Diabetic")

Output:
Diabetes Prediction: Diabetic/Not Diabetic (varies)
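
Since interpretability is one of logistic regression's main selling points, the fitted coefficients can be read as odds ratios (continuing from the model above; exp(β) is the multiplicative change in the odds of class 1 per one-unit increase in a feature):

import numpy as np

print(np.exp(model.coef_))  # odds ratio for each feature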

Advantages & Disadvantages of Logistic Regression

Advantages:

Easy to interpret and implement.
Computationally efficient for small datasets.
Works well with linearly separable data.
Gives probability outputs for classification confidence.

Disadvantages:

Assumes a linear decision boundary in the features, which may not capture complex patterns.
Struggles with imbalanced data (e.g., 95% “No” and 5% “Yes” cases); class weighting can help, as sketched after this list.
Can underperform on large, complex datasets, where non-linear models (tree ensembles, deep learning) often do better.
Sensitive to outliers.
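
For the imbalance point, one common mitigation is class weighting. A minimal sketch on made-up imbalanced data (roughly 95% negatives):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical imbalanced data: the positive class is rare (~5%)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 2.3).astype(int)

# class_weight="balanced" reweights samples inversely to class frequency
model = LogisticRegression(class_weight="balanced")
model.fit(X, y)
print("Positive rate in data:", y.mean())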

Where is Logistic Regression Used in ML?

Healthcare: Disease prediction (diabetes, cancer).
Finance: Credit risk assessment (loan approval).
Marketing: Customer churn prediction.
Email Filtering: Spam detection.
HR & Recruitment: Employee attrition prediction.

More Example Code Snippets

Predicting Heart Disease

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[45, 1], [50, 0], [55, 1], [60, 1], [65, 0]])  # [Age, High BP (1 = yes)]
y = np.array([0, 1, 1, 1, 0])  # 0 = No Disease, 1 = Heart Disease

model = LogisticRegression()
model.fit(X, y)
print("Prediction for 58-year-old with high BP:", model.predict([[58, 1]])[0])

Predicting Loan Approval

X = np.array([[500, 2000], [600, 3000], [700, 4000], [800, 5000]])  # [Credit Score, Income]
y = np.array([0, 0, 1, 1])  # 0 = Denied, 1 = Approved

model = LogisticRegression()
model.fit(X, y)
print("Loan approval for 650 credit score & 3500 income:", model.predict([[650, 3500]])[0])

Final Thoughts

Logistic Regression is an excellent choice for binary and multi-class classification problems, especially when interpretability is crucial. However, for highly complex or large datasets, advanced models like decision trees, SVMs, or deep learning may be more effective. 🚀
