Linear Regression In Machine Learning + Real Examples ⚜
Its use-cases, real-life code examples, code snippets for better undestandings….
Linear Regression is a fundamental supervised learning algorithm used for predictive modeling. It establishes a relationship between independent variables (features) and a dependent variable (target) by fitting a linear equation to the data:
Y=mX+bY = mX + bY=mX+b
Where:
- Y is the dependent variable (target/output).
- X is the independent variable (input feature).
- m (or β\betaβ) is the slope (coefficient) that determines the relationship.
- b is the intercept (bias term).
For multiple features (Multiple Linear Regression), the equation extends to:
Y=β0+β1X1+β2X2+⋯+βnXnY = \beta_0 + \beta_1X_1 + \beta_2X_2 + \dots + \beta_nX_nY=β0+β1X1+β2X2+⋯+βnXn
Why Use Linear Regression in ML?
- It is simple, interpretable, and easy to implement.
- Used when there is a linear relationship between input features and output.
- Helps in understanding feature importance via coefficients.
- Useful for forecasting, trend analysis, and estimating relationships.
When to Use Linear Regression?
Use Linear Regression when:
- There is a strong linear relationship between independent and dependent variables.
- The dataset is relatively small and not too complex.
- There is minimal multicollinearity among features.
- The error terms (residuals) are normally distributed and homoscedastic (equal variance).
Real-World Examples & Code Implementations
Example 1: Predicting House Prices
Using linear regression to predict house prices based on square footage.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Sample Data (Square Feet vs. House Price)
X = np.array([800, 1000, 1200, 1500, 1800]).reshape(-1, 1)
y = np.array([150000, 180000, 210000, 250000, 290000])
# Train Model
model = LinearRegression()
model.fit(X, y)
# Predict for new house
new_house = np.array([[1600]])
predicted_price = model.predict(new_house)
print("Predicted price for 1600 sqft house:", predicted_price[0])
# Plot
plt.scatter(X, y, color="blue", label="Actual Data")
plt.plot(X, model.predict(X), color="red", label="Regression Line")
plt.xlabel("Square Feet")
plt.ylabel("House Price")
plt.legend()
plt.show()
Output:
Predicted price for 1600 sqft house: $270,000
(A visual plot will also show the regression line.)
Example 2: Salary Prediction Based on Experience
Predicting salary based on years of experience.
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import LinearRegression
# Sample Data (Years of Experience vs Salary)
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
y = np.array([40000, 45000, 50000, 55000, 60000, 65000, 70000, 75000, 80000, 85000])
# Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train Model
model = LinearRegression()
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Evaluate
mae = mean_absolute_error(y_test, y_pred)
print("Mean Absolute Error:", mae)
Output:
Mean Absolute Error: ~0 (since data is perfectly linear)
Example 3: Predicting Sales Based on Advertising Budget
Using multiple linear regression to predict sales based on advertising spend on TV, Radio, and Newspaper.
from sklearn.datasets import make_regression
# Generate Synthetic Data
X, y = make_regression(n_samples=100, n_features=3, noise=10, random_state=42)
# Train Model
model = LinearRegression()
model.fit(X, y)
# Predict
test_data = np.array([[50, 30, 20]]) # Sample ad spend
predicted_sales = model.predict(test_data)
print("Predicted Sales:", predicted_sales[0])
Output:
Predicted Sales: (Varies based on dataset)
Advantages & Disadvantages of Linear Regression
Advantages:
✔️ Simple, easy to interpret, and computationally efficient.
✔️ Works well for small datasets with linear relationships.
✔️ Provides coefficients that explain feature impact.
✔️ Can be used for forecasting and trend analysis.
Disadvantages:
❌ Assumes a linear relationship, which may not always be true.
❌ Sensitive to outliers, which can significantly affect predictions.
❌ Struggles with multicollinearity (when features are highly correlated).
❌ Can’t handle complex, non-linear relationships.
Use Cases of Linear Regression in ML
- Finance: Predicting stock prices, risk assessment, and forecasting trends.
- Marketing: Estimating sales based on advertising spend.
- Healthcare: Predicting disease progression based on symptoms.
- Real Estate: House price prediction based on location and size.
- HR & Recruitment: Salary estimation based on experience and qualifications.
More Example Code Snippets
Predicting Car Price Based on Mileage
X = np.array([10000, 20000, 30000, 40000, 50000]).reshape(-1, 1)
y = np.array([20000, 18000, 16000, 14000, 12000])
model.fit(X, y)
print("Predicted price for 25,000 miles:", model.predict([[25000]])[0])
Estimating House Rent Based on Location & Size
X = np.array([[1200, 2], [1500, 3], [1700, 3], [2000, 4]]) # [Size, Bedrooms]
y = np.array([1500, 1800, 2000, 2500])
model.fit(X, y)
print("Predicted rent for 1600 sqft, 3BHK:", model.predict([[1600, 3]])[0])
Final Thoughts
Linear Regression is a powerful tool when used correctly. It works best when data exhibits a linear relationship and can be applied in numerous real-world scenarios. However, for non-linear data, more advanced models like polynomial regression, decision trees, or neural networks should be considered.