Data Science: Support Vector Machine (SVM)

🔹 SVM is a supervised learning algorithm used for classification and regression tasks.
🔹 It finds the optimal decision boundary (hyperplane) that separates the classes in the best possible way.
🔹 SVM is particularly effective in high-dimensional spaces and works well for small datasets.

Examples of SVM Applications

✅ Spam Detection (Spam / Not Spam)
✅ Face Recognition
✅ Medical Diagnosis (Cancer Detection)
✅ Stock Market Prediction

1. How Does SVM Work?

1๏ธโƒฃ Finding the Optimal Hyperplane

  • A hyperplane is a decision boundary that separates different classes.
  • The best hyperplane is the one that maximizes the margin (distance between the nearest data points of different classes).
  • The data points nearest to the hyperplane are called Support Vectors (illustrated in the sketch after this list).
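
To make this concrete, here is a minimal sketch using scikit-learn's SVC on a tiny, hand-made toy dataset (the data values are purely illustrative). After fitting, the support vectors are exposed through the support_vectors_ attribute.

import numpy as np
from sklearn.svm import SVC

# Tiny, linearly separable toy dataset (illustrative values only)
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

# Fit a linear SVM; a very large C approximates the hard-margin case
clf = SVC(kernel='linear', C=1e6)
clf.fit(X, y)

# The points closest to the hyperplane are the support vectors
print("Support vectors:\n", clf.support_vectors_)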

2๏ธโƒฃ Handling Non-Linearly Separable Data

  • If data is not linearly separable, SVM uses kernel functions to map it into a higher-dimensional space where it becomes linearly separable (see the comparison sketch after this list).
  • Common kernel functions:
    • Linear Kernel: Used when data is linearly separable.
    • Polynomial Kernel: Maps data to a higher degree polynomial space.
    • Radial Basis Function (RBF) Kernel: Commonly used for complex datasets.
    • Sigmoid Kernel: Similar to a neural network activation function.
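
As a rough illustration of why the kernel choice matters, the sketch below compares a linear and an RBF kernel on scikit-learn's synthetic make_circles data, which is not linearly separable. The exact scores will vary, but the RBF kernel typically does far better here.

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: impossible to separate with a straight line
X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0, gamma='scale').fit(X_train, y_train)
    print(kernel, "accuracy:", clf.score(X_test, y_test))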

2. Mathematical Formulation of SVM

1. Hard Margin SVM (Linearly Separable Case)

1.1. Given a Dataset

Consider a binary classification problem with a dataset:

\( D = \{(x_i, y_i)\}_{i=1}^{n}, \quad y_i \in \{-1, +1\}, \quad x_i \in \mathbb{R}^d \)

1.2. Hyperplane Equation

A hyperplane is defined as:

\( w \cdot x + b = 0 \)
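
Once \( w \) and \( b \) have been learned, a new point \( x \) is classified by which side of the hyperplane it falls on, i.e. by the sign of the decision function:

\( \hat{y} = \operatorname{sign}(w \cdot x + b) \)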

1.3. Margin Calculation

For a given data point \( (x_i, y_i) \), the functional margin is:

\( \gamma_i = y_i (w \cdot x_i + b) \)

To ensure correct classification, we require:

\( y_i (w \cdot x_i + b) \geq 1, \quad \forall i \)

For the support vectors, whose functional margin equals 1, the geometric margin (the perpendicular distance from the point to the hyperplane) is:

\( \frac{1}{\|w\|} \)

so the total margin between the two classes has width \( \frac{2}{\|w\|} \).

1.4. Optimization Problem

To maximize the margin, we minimize \( \frac{1}{2} \|w\|^2 \) while ensuring all data points are correctly classified:

\( \min_{w, b} \frac{1}{2} \|w\|^2 \)

subject to:

\( y_i (w \cdot x_i + b) \geq 1, \quad \forall i \)
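
As a quick numerical check of this formulation (a sketch, not part of the derivation): fitting scikit-learn's linear SVC with a very large C approximates the hard-margin problem, the learned \( w \) is available as coef_, and the resulting margin width is \( 2 / \|w\| \). The toy data below is purely illustrative.

import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel='linear', C=1e6).fit(X, y)   # large C ~ hard margin
w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, " b =", b)
print("Margin width 2/||w|| =", 2 / np.linalg.norm(w))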

2. Soft Margin SVM (Linearly Non-Separable Case)

When the data is not perfectly separable, we introduce slack variables \( \xi_i \geq 0 \) that allow some points to violate the margin:

\( y_i (w \cdot x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0 \)

The objective then becomes

\( \min_{w, b, \xi} \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i \)

where the regularization parameter \( C \) controls the trade-off between a wide margin and few margin violations.
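
In scikit-learn, this trade-off is exposed as the C parameter of SVC. A small sketch on synthetic, overlapping blobs (the exact counts will vary) shows that a smaller C tolerates more slack and therefore keeps more support vectors:

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so no perfect separation exists
X, y = make_blobs(n_samples=200, centers=2, cluster_std=3.0, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    # Smaller C -> more margin violations tolerated -> more support vectors
    print(f"C={C}: {clf.n_support_.sum()} support vectors")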

3. Dual Formulation (Using Lagrange Multipliers)

Introducing a Lagrange multiplier \( \alpha_i \geq 0 \) for each constraint and eliminating \( w \) and \( b \) yields the dual problem:

\( \max_{\alpha} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \)

subject to:

\( \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C \)
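
At the optimum, only the support vectors have \( \alpha_i > 0 \); all other training points drop out of the solution. After fitting, scikit-learn exposes these quantities: support_ holds the indices of the support vectors and dual_coef_ holds the products \( \alpha_i y_i \). A minimal sketch on two classes of the Iris data (also used in Section 4):

from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Keep two of the three Iris classes so this is a binary problem
X, y = load_iris(return_X_y=True)
X, y = X[y < 2], y[y < 2]

clf = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X, y)

print("Indices of the support vectors:", clf.support_)
print("alpha_i * y_i for each support vector:\n", clf.dual_coef_)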

4. Kernel Trick for Non-Linear SVM

For non-linearly separable data, we use a kernel function \( K(x_i, x_j) \) to transform the data into a higher-dimensional space where it becomes linearly separable. The common kernels are listed below and implemented in a short sketch that follows.

Common Kernel Functions

  • Linear Kernel: \( K(x_i, x_j) = x_i \cdot x_j \)
  • Polynomial Kernel: \( K(x_i, x_j) = (x_i \cdot x_j + c)^d \)
  • Radial Basis Function (RBF) Kernel: \( K(x_i, x_j) = \exp \left(-\frac{\|x_i - x_j\|^2}{2\sigma^2} \right) \)
  • Sigmoid Kernel: \( K(x_i, x_j) = \tanh (\beta x_i \cdot x_j + c) \)
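
For reference, here is a minimal NumPy sketch of these kernels as plain functions of two vectors (the parameter values are arbitrary). Note that scikit-learn parameterizes the RBF kernel with gamma rather than \( \sigma \), where \( \gamma = 1 / (2\sigma^2) \).

import numpy as np

def linear_kernel(x, z):
    return x @ z

def polynomial_kernel(x, z, c=1.0, d=3):
    return (x @ z + c) ** d

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def sigmoid_kernel(x, z, beta=0.1, c=0.0):
    return np.tanh(beta * (x @ z) + c)

x, z = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(linear_kernel(x, z), polynomial_kernel(x, z), rbf_kernel(x, z), sigmoid_kernel(x, z))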

3. Implementing SVM in Python

Step 1: Import Required Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, SVR
from sklearn.metrics import accuracy_score, classification_report
from sklearn.datasets import load_iris

4. SVM for Classification (Iris Dataset Example)

Step 1: Load Dataset

# Load dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Target classes

# Convert to DataFrame
df = pd.DataFrame(X, columns=iris.feature_names)
df['Target'] = y

# Display first 5 rows
print(df.head())

Step 2: Split Data into Training and Testing Sets

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Step 3: Train the SVM Classifier

# Create SVM model with RBF kernel
svm_model = SVC(kernel='rbf', C=1.0, gamma='scale')

# Train the model
svm_model.fit(X_train, y_train)

Step 4: Make Predictions

# Predict on test data
y_pred = svm_model.predict(X_test)

Step 5: Evaluate Model Performance

# Accuracy Score
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Classification Report
print("Classification Report:\n", classification_report(y_test, y_pred))

5. SVM for Regression (California Housing Dataset Example)

Step 1: Load Dataset

from sklearn.datasets import fetch_california_housing

# Load dataset
data = fetch_california_housing()
X = data.data
y = data.target

# Convert to DataFrame
df = pd.DataFrame(X, columns=data.feature_names)
df['Target'] = y

# Display first 5 rows
print(df.head())

Step 2: Split Data into Training and Testing Sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Step 3: Train the SVM Regressor

# Create SVM Regressor model with RBF kernel
svm_regressor = SVR(kernel='rbf', C=1.0, gamma='scale')

# Train the model
svm_regressor.fit(X_train, y_train)

Step 4: Make Predictions

# Predict on test data
y_pred = svm_regressor.predict(X_test)

Step 5: Evaluate Model Performance

from sklearn.metrics import mean_squared_error

# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

6. Understanding the Output

🔹 Accuracy Score → Percentage of correctly classified instances.
🔹 Classification Report → Precision, Recall, F1-score for each class.
🔹 Mean Squared Error (Regression) → Measures how well the model predicts continuous values.

7. Choosing the Right Kernel

Kernel               When to Use
Linear Kernel        When data is linearly separable
Polynomial Kernel    When data has polynomial relationships
RBF Kernel           When data is not linearly separable
Sigmoid Kernel       When data has similarity-based relationships
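
In practice, the kernel (and C) are usually chosen by cross-validation rather than by rule of thumb. A minimal sketch with scikit-learn's GridSearchCV on the Iris data (the grid values here are just examples):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'C': [0.1, 1, 10],
}
search = GridSearchCV(SVC(gamma='scale'), param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)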

8. Advantages & Disadvantages of SVM

✅ Advantages

✔ Works well for small datasets.
✔ Effective in high-dimensional spaces.
✔ Robust to outliers (with soft margin tuning).

โŒ Disadvantages

โŒ Slow for large datasets.
โŒ Choosing the right kernel is tricky.
โŒ Difficult to interpret results compared to Decision Trees.

Summary

✔ SVM finds the best hyperplane that separates classes with the maximum margin.
✔ It uses kernel functions (Linear, RBF, Polynomial) to handle non-linear data.
✔ SVM can be used for both classification (SVC) and regression (SVR).
✔ It works best for small-to-medium-sized datasets.