Linear Regression is one of the most fundamental and widely used algorithms in Machine Learning and Data Science. It models the relationship between variables and is used to make predictions.
In this tutorial, we will cover:
✅ What is Linear Regression?
✅ Understanding the Mathematical Formula
✅ Python Implementation (Using sklearn)
✅ Evaluating Model Performance
1. What is Linear Regression?
Linear Regression is a Supervised Learning Algorithm used for predicting continuous values. It assumes a linear relationship between the input (X) and output (Y).
Examples of Linear Regression Applications:
- Predicting house prices based on size.
- Forecasting sales revenue based on past data.
- Estimating salary based on years of experience.
2. Mathematical Formula of Linear Regression
The equation of a straight line is \( Y = mX + c \); a short worked example follows the definitions below.
Where:
- \( Y \) → Dependent variable (Prediction)
- \( X \) → Independent variable (Feature)
- \( m \) → Slope (Coefficient)
- \( c \) → Intercept (Constant)
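For example, with a slope of \( m = 0.15 \) and an intercept of \( c = 75 \) (values chosen here purely for illustration, in $1000s per sq ft and $1000s respectively), a 2,000 sq ft house would be priced at \( Y = 0.15 \times 2000 + 75 = 375 \), i.e. $375,000. The same calculation in Python:

# Illustrative slope and intercept (assumed values, not fitted from real data)
m = 0.15  # price increase ($1000s) per additional sq ft
c = 75    # base price ($1000s)
X = 2000  # house size in sq ft

Y = m * X + c
print(Y)  # 375.0, i.e. $375,000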
Cost Function (Mean Squared Error – MSE)
We minimize the error between predicted and actual values using the following formula (a quick NumPy check of this calculation follows the symbol definitions below):
\( \text{MSE} = \frac{1}{n} \sum (Y_{\text{actual}} - Y_{\text{predicted}})^2 \)
- \( \text{MSE} \): Mean Squared Error
- \( n \): Number of observations
- \( Y_{\text{actual}} \): Actual observed values
- \( Y_{\text{predicted}} \): Predicted values from the model
- \( \sum \): Summation over all observations
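With these symbols defined, the MSE can be verified by hand in a few lines of NumPy; the actual and predicted values below are made up purely to illustrate the formula:

import numpy as np

# Made-up actual and predicted prices ($1000s), for illustration only
y_actual = np.array([150, 200, 250, 300])
y_predicted = np.array([160, 190, 260, 290])

# MSE = average of the squared differences
mse = np.mean((y_actual - y_predicted) ** 2)
print(mse)  # 100.0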
3. Linear Regression Implementation in Python
Step 1: Install and Import Required Libraries
# Install the libraries first if needed: pip install numpy pandas matplotlib scikit-learn
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
Step 2: Create a Sample Dataset
# Sample dataset: house sizes (sq ft) and corresponding prices ($1000s)
X = np.array([500, 800, 1000, 1500, 1800, 2000, 2500, 3000, 3500, 4000]).reshape(-1, 1)
y = np.array([150, 200, 250, 300, 350, 400, 450, 500, 550, 600])
Step 3: Split Data into Training and Testing Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
- 80% of the data is used for training.
- 20% of the data is used for testing (a quick shape check is shown below).
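To confirm the split on this 10-sample dataset, you can print the array shapes; this check is optional and not part of the original walkthrough:

# Verify the 80/20 split: 8 training samples, 2 test samples
print(X_train.shape, X_test.shape)  # (8, 1) (2, 1)
print(y_train.shape, y_test.shape)  # (8,) (2,)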
Step 4: Train the Linear Regression Model
# Create a Linear Regression model
model = LinearRegression()

# Train the model using the training data
model.fit(X_train, y_train)
Step 5: Model Predictions
y_pred = model.predict(X_test)
The array y_pred contains the predicted house prices for the test set; a side-by-side comparison with the actual prices is shown below.
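To see how close the predictions are, it helps to print them next to the actual test prices; the exact numbers depend on the random split, so treat this as an illustrative check rather than expected output:

# Compare predicted and actual prices for the test samples
for size, actual, predicted in zip(X_test.ravel(), y_test, y_pred):
    print(f"{size} sq ft: actual {actual}, predicted {predicted:.1f}")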
Step 6: Evaluate Model Performance
# Print the model parameters
print("Slope (m):", model.coef_[0])
print("Intercept (c):", model.intercept_)

# Calculate the Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

# Calculate the R-squared score
r2 = r2_score(y_test, y_pred)
print("R-Squared Score:", r2)
- An R² score close to 1 means a good fit (a manual cross-check of R² is sketched below).
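For intuition, the R² score can also be computed by hand as \( 1 - \frac{\sum (Y_{\text{actual}} - Y_{\text{predicted}})^2}{\sum (Y_{\text{actual}} - \bar{Y})^2} \). The snippet below is an optional cross-check of sklearn's r2_score, not part of the original steps:

# Manual R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)
r2_manual = 1 - ss_res / ss_tot
print(r2_manual)  # should match r2_score(y_test, y_pred)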
Step 7: Visualizing the Regression Line
plt.scatter(X, y, color="blue", label="Actual Prices")
plt.plot(X, model.predict(X), color="red", label="Regression Line")
plt.xlabel("House Size (sq ft)")
plt.ylabel("Price ($1000s)")
plt.title("House Price Prediction using Linear Regression")
plt.legend()
plt.show()
4. Understanding the Output
- Regression Line: Shows the best-fit line for the data.
- MSE (Mean Squared Error): Measures how well the model predicts (lower is better).
- R² Score: Measures how much variance is explained by the model (higher is better).
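Once trained, the model can also price a house size that was not in the dataset. The 2,200 sq ft value below is just an example input; the predicted price depends on the fitted slope and intercept:

# Predict the price of a new, unseen house size (example: 2,200 sq ft)
new_size = np.array([[2200]])
predicted_price = model.predict(new_size)
print(f"Predicted price: ${predicted_price[0] * 1000:,.0f}")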
Summary
✔ Linear Regression predicts continuous values using a straight-line equation.
✔ Training involves minimizing the error between actual and predicted values.
✔ Python’s sklearn library makes implementation easy.
✔ Visualizing results helps understand the model’s accuracy.