- SVM (Support Vector Machine) is a supervised learning algorithm used for classification and regression tasks.
- It finds the optimal decision boundary (hyperplane), the one that separates the classes with the widest possible margin.
- SVM is particularly effective in high-dimensional spaces and works well on small-to-medium datasets.
Examples of SVM Applications
- Spam Detection (Spam / Not Spam)
- Face Recognition
- Medical Diagnosis (Cancer Detection)
- Stock Market Prediction
1. How Does SVM Work?
1. Finding the Optimal Hyperplane
- A hyperplane is a decision boundary that separates different classes.
- The best hyperplane is the one that maximizes the margin (distance between the nearest data points of different classes).
- The nearest data points to the hyperplane are called Support Vectors (the sketch below shows how to inspect them with scikit-learn).
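As a quick illustration (not part of the original walkthrough, and using made-up toy data), a fitted scikit-learn classifier exposes its support vectors through the support_vectors_ attribute:
import numpy as np
from sklearn.svm import SVC
# Toy, linearly separable 2-D data (illustrative values only)
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([-1, -1, -1, 1, 1, 1])
# A linear SVM with a very large C approximates the hard-margin case
clf = SVC(kernel='linear', C=1e6)
clf.fit(X, y)
print("Support vectors:\n", clf.support_vectors_)
print("w =", clf.coef_[0], " b =", clf.intercept_[0])
print("Margin width = 2/||w|| =", 2 / np.linalg.norm(clf.coef_[0]))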
2. Handling Non-Linearly Separable Data
- If data is not linearly separable, SVM uses kernel functions to map data into a higher-dimensional space where it becomes linearly separable.
- Common kernel functions (compared in a short sketch after this list):
- Linear Kernel: Used when data is linearly separable.
- Polynomial Kernel: Maps data to a higher degree polynomial space.
- Radial Basis Function (RBF) Kernel: Commonly used for complex datasets.
- Sigmoid Kernel: Similar to a neural network activation function.
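As a rough sketch (the dataset and numbers are illustrative, not from the original text), each of these kernels can be tried in scikit-learn simply by changing the kernel argument:
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
# Non-linearly separable toy data
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
# Compare cross-validated accuracy for each kernel
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    print(f"{kernel:8s} mean accuracy: {scores.mean():.3f}")
On curved data like this, the RBF kernel typically scores highest, matching its reputation for complex datasets.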
2. Mathematical Formulation of SVM
1. Hard Margin SVM (Linearly Separable Case)
1.1. Given a Dataset
Consider a binary classification problem with a dataset:
\( D = \{(x_i, y_i)\}_{i=1}^{n}, \quad y_i \in \{-1, +1\}, \quad x_i \in \mathbb{R}^d \)
1.2. Hyperplane Equation
A hyperplane is defined as:
\( w \cdot x + b = 0 \)
1.3. Margin Calculation
For a given data point \( (x_i, y_i) \), the functional margin is:
\( \gamma_i = y_i (w \cdot x_i + b) \)
To ensure correct classification, we require:
\( y_i (w \cdot x_i + b) \geq 1, \quad \forall i \)
The geometric margin of a point is \( \gamma_i / \|w\| \). For the support vectors, where \( \gamma_i = 1 \), this equals:
\( \frac{1}{\|w\|} \)
so the full margin between the two classes is \( \frac{2}{\|w\|} \).
1.4. Optimization Problem
Maximizing the margin \( \frac{2}{\|w\|} \) is equivalent to minimizing \( \frac{1}{2} \|w\|^2 \) (squaring gives a convex, differentiable objective), while ensuring all data points are correctly classified:
\( \min_{w, b} \frac{1}{2} \|w\|^2 \)
subject to:
\( y_i (w \cdot x_i + b) \geq 1, \quad \forall i \)
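For a quick numeric check (the weight vector here is hypothetical, not a real solution): if the optimizer returned \( w = (3, 4) \), then \( \|w\| = 5 \) and the margin between the classes would be \( 2/\|w\| = 0.4 \):
import numpy as np
w = np.array([3.0, 4.0])        # hypothetical solution vector
print(2 / np.linalg.norm(w))    # margin width: 0.4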
2. Soft Margin SVM (Linearly Non-Separable Case)
When data is not perfectly separable, we introduce slack variables \( \xi_i \) that allow some points to violate the margin:
\( y_i (w \cdot x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0 \)
The objective then penalizes the total slack, with the hyperparameter \( C \) controlling the trade-off between a wide margin and few violations:
\( \min_{w, b, \xi} \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i \)
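A small experiment (toy overlapping data, illustrative only) shows the role of \( C \): a smaller \( C \) tolerates more margin violations and typically keeps more support vectors, while a larger \( C \) penalizes slack harder:
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
# Two overlapping toy classes
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)
for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel='linear', C=C).fit(X, y)
    print(f"C={C:>6}: {len(clf.support_vectors_)} support vectors")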
3. Dual Formulation (Using Lagrange Multipliers)
Introducing Lagrange multipliers \( \alpha_i \geq 0 \) for the constraints and eliminating \( w \) and \( b \) gives the dual problem:
\( \max_{\alpha} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \)
subject to:
\( \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C \)
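As a sanity check (a sketch, not part of the original), scikit-learn exposes \( y_i \alpha_i \) for the support vectors through the dual_coef_ attribute, so both dual constraints can be verified numerically:
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
X, y = make_blobs(n_samples=100, centers=2, random_state=0)
C = 1.0
clf = SVC(kernel='rbf', C=C).fit(X, y)
coef = clf.dual_coef_[0]                    # entries are y_i * alpha_i
print("sum alpha_i y_i =", coef.sum())      # should be close to 0
print("alphas within [0, C]:", np.all(np.abs(coef) <= C + 1e-9))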
For non-linearly separable data, the kernel function \( K(x_i, x_j) \) computes inner products in a higher-dimensional feature space without ever constructing that space explicitly (the kernel trick), so the data can become linearly separable there.
Common Kernel Functions
- Linear Kernel: \( K(x_i, x_j) = x_i \cdot x_j \)
- Polynomial Kernel: \( K(x_i, x_j) = (x_i \cdot x_j + c)^d \)
- Radial Basis Function (RBF) Kernel: \( K(x_i, x_j) = \exp \left(-\frac{\|x_i - x_j\|^2}{2\sigma^2} \right) \)
- Sigmoid Kernel: \( K(x_i, x_j) = \tanh (\beta x_i \cdot x_j + c) \)
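To make the RBF formula concrete, the sketch below evaluates it directly with NumPy and checks the result against scikit-learn's rbf_kernel (note that scikit-learn parameterizes it as \( \gamma = \frac{1}{2\sigma^2} \)); the input vectors are made up:
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
x_i = np.array([1.0, 2.0])
x_j = np.array([2.0, 0.5])
sigma = 1.5
# Direct evaluation of the formula above
k_manual = np.exp(-np.linalg.norm(x_i - x_j) ** 2 / (2 * sigma ** 2))
# scikit-learn's equivalent, with gamma = 1 / (2 * sigma^2)
k_sklearn = rbf_kernel(x_i.reshape(1, -1), x_j.reshape(1, -1),
                       gamma=1 / (2 * sigma ** 2))[0, 0]
print(k_manual, k_sklearn)  # the two values agree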
3. Implementing SVM in Python
Step 1: Install and Import Required Libraries
# If not already installed: pip install numpy pandas matplotlib scikit-learn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, SVR
from sklearn.metrics import accuracy_score, classification_report
from sklearn.datasets import load_iris
4. SVM for Classification (Iris Dataset Example)
Step 1: Load Dataset
# Load dataset
iris = load_iris()
X = iris.data # Features
y = iris.target # Target classes
# Convert to DataFrame
df = pd.DataFrame(X, columns=iris.feature_names)
df['Target'] = y
# Display first 5 rows
print(df.head())
Step 2: Split Data into Training and Testing Sets
# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Step 3: Train the SVM Classifier
# Create SVM model with RBF kernel
svm_model = SVC(kernel='rbf', C=1.0, gamma='scale')
# Train the model
svm_model.fit(X_train, y_train)
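The values C=1.0 and gamma='scale' are sensible defaults, but in practice they are usually tuned. A minimal tuning sketch (the grid values below are arbitrary choices, not from the original walkthrough):
from sklearn.model_selection import GridSearchCV
# Hypothetical search grid; widen or narrow the ranges for your data
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': ['scale', 0.01, 0.1, 1]}
grid = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_)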
Step 4: Make Predictions
# Predict on test data
y_pred = svm_model.predict(X_test)
Step 5: Evaluate Model Performance
# Accuracy Score
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
# Classification Report
print("Classification Report:\n", classification_report(y_test, y_pred))
5. SVM for Regression (California Housing Dataset Example)
Step 1: Load Dataset
from sklearn.datasets import fetch_california_housing
# Load dataset
data = fetch_california_housing()
X = data.data
y = data.target
# Convert to DataFrame
df = pd.DataFrame(X, columns=data.feature_names)
df['Target'] = y
# Display first 5 rows
print(df.head())
Step 2: Split Data into Training and Testing Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Step 3: Train the SVM Regressor
# Create SVM Regressor model with RBF kernel
svm_regressor = SVR(kernel='rbf', C=1.0, gamma='scale')
# Train the model
svm_regressor.fit(X_train, y_train)
Step 4: Make Predictions
# Predict on test data
y_pred = svm_regressor.predict(X_test)
Step 5: Evaluate Model Performance
from sklearn.metrics import mean_squared_error
# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
6. Understanding the Output