Data Science: Introduction to ML

Machine Learning (ML) is a core part of Data Science, enabling computers to learn from data and make predictions without being explicitly programmed. This tutorial introduces ML concepts and includes a simple Python example.

1. What is Machine Learning?

Machine Learning is a branch of Artificial Intelligence (AI) that focuses on creating systems that can learn from data and improve performance over time.

Types of Machine Learning

  1. Supervised Learning
    • The model learns from labeled data (input-output pairs).
    • Example: Spam detection (emails labeled as spam or not).
    • Algorithms: Linear Regression, Decision Trees, Random Forest, SVM, Neural Networks.
  2. Unsupervised Learning
    • The model learns from unlabeled data (finds patterns).
    • Example: Customer segmentation (grouping similar customers).
    • Algorithms: K-Means, DBSCAN, PCA, Hierarchical Clustering.
  3. Reinforcement Learning
    • The model learns by trial and error to maximize rewards.
    • Example: Self-driving cars, AlphaGo playing the board game Go.
    • Algorithms: Q-Learning, Deep Q-Networks (DQN), Policy Gradient.
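
To make the contrast with supervised learning concrete, here is a minimal sketch of unsupervised learning using scikit-learn's KMeans for customer segmentation. The customer figures are invented for illustration and do not come from a real dataset; the choice of two clusters is also an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [annual spend in $1000s, store visits per year]
customers = np.array([
    [2, 5], [3, 6], [2.5, 4],      # low spend, few visits
    [20, 40], [22, 45], [19, 38],  # high spend, many visits
])

# No labels are provided; KMeans groups the rows by similarity alone
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)

print("Cluster labels:", labels)
```

The first three customers should land in one cluster and the last three in the other. Which cluster is numbered 0 and which is 1 is arbitrary.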

2. Machine Learning Workflow

  1. Data Collection → Gather data from sources (CSV files, databases, APIs).
  2. Data Preprocessing → Handle missing values, normalize numeric features, and encode categorical ones.
  3. Exploratory Data Analysis (EDA) → Understand trends and distributions.
  4. Model Selection → Choose a suitable algorithm (Linear Regression, Decision Trees, etc.).
  5. Training the Model → Fit the model to the training data so it learns patterns.
  6. Model Evaluation → Test the model on unseen data.
  7. Model Deployment → Deploy the model for real-world predictions.
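
The workflow above can be sketched in a few lines with a scikit-learn pipeline. This is only an illustration under stated assumptions: the data is synthetic (generated from a made-up linear relationship plus noise), EDA and deployment are skipped, and StandardScaler stands in for the preprocessing step.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Step 1 (collection): synthetic data standing in for a real source
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=50)

# Steps 2 and 4 (preprocessing, model selection): scale, then linear regression
pipeline = make_pipeline(StandardScaler(), LinearRegression())

# Step 5 (training) on 80% of the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
pipeline.fit(X_train, y_train)

# Step 6 (evaluation) on the held-out 20%
test_r2 = pipeline.score(X_test, y_test)
print("Test R-squared:", test_r2)
```

Bundling the scaler and the model in one pipeline means the scaling statistics are learned from the training rows only, so no information from the test rows leaks into training.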

3. Simple Machine Learning Example in Python

Predict House Prices Using Linear Regression

Problem Statement:
Predict house prices based on the size of the house (sq ft).

Step 1: Import Required Libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

Step 2: Create Sample Data

# House sizes (square feet)
X = np.array([500, 800, 1000, 1500, 1800, 2000, 2500, 3000, 3500, 4000]).reshape(-1, 1)

# House prices (in $1000s)
y = np.array([150, 200, 250, 300, 350, 400, 450, 500, 550, 600])

Step 3: Split Data into Training & Testing Sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
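
As a quick check on what the split produces, the sketch below repeats the sample data and prints the size of each set. With 10 samples and test_size=0.2, 8 rows go to training and 2 to testing.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same sample data as in Step 2
X = np.array([500, 800, 1000, 1500, 1800, 2000, 2500, 3000, 3500, 4000]).reshape(-1, 1)
y = np.array([150, 200, 250, 300, 350, 400, 450, 500, 550, 600])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training rows:", X_train.shape[0])  # 8 of the 10 samples
print("Testing rows:", X_test.shape[0])    # 2 of the 10 samples
```

Fixing random_state makes the shuffle reproducible, so the same rows end up in the test set on every run.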

Step 4: Train the Machine Learning Model

# Create a Linear Regression model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)
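
Training a linear regression amounts to finding a slope and an intercept. The sketch below shows where scikit-learn stores them after fitting. To keep the block self-contained it fits on all ten sample points rather than the training split, so the numbers will differ slightly from the tutorial's model; the 2200 sq ft query is a made-up value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([500, 800, 1000, 1500, 1800, 2000, 2500, 3000, 3500, 4000]).reshape(-1, 1)
y = np.array([150, 200, 250, 300, 350, 400, 450, 500, 550, 600])

# Simplification: fit on the full sample instead of X_train, y_train
model = LinearRegression().fit(X, y)

slope = model.coef_[0]        # price change (in $1000s) per extra sq ft
intercept = model.intercept_  # predicted price at 0 sq ft
print("Slope:", slope)
print("Intercept:", intercept)

# The learned line is: price = slope * size + intercept
manual = slope * 2200 + intercept
print("Manual prediction for 2200 sq ft:", manual)
```

Computing slope * size + intercept by hand gives the same value as model.predict for that size.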

Step 5: Make Predictions

y_pred = model.predict(X_test)

# Print predicted values
print("Predicted House Prices:", y_pred)
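
The same predict call works for house sizes that are not in the dataset. The sketch below assumes two hypothetical sizes and, to stay self-contained, fits on all ten sample points rather than the training split. Note that predict expects a 2D array with one row per house.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([500, 800, 1000, 1500, 1800, 2000, 2500, 3000, 3500, 4000]).reshape(-1, 1)
y = np.array([150, 200, 250, 300, 350, 400, 450, 500, 550, 600])

# Simplification: fit on the full sample instead of X_train, y_train
model = LinearRegression().fit(X, y)

# Hypothetical new houses, shaped (n_samples, n_features)
new_sizes = np.array([[1200], [2750]])
predicted = model.predict(new_sizes)

for size, price in zip(new_sizes[:, 0], predicted):
    print(f"{size} sq ft -> about ${price:.0f}k")
```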

Step 6: Evaluate the Model

mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

# R-squared on the test set (closer to 1 is better)
r2 = model.score(X_test, y_test)
print("R-squared Score:", r2)

Step 7: Visualize the Regression Line

plt.scatter(X, y, color="blue", label="Actual Prices")
plt.plot(X, model.predict(X), color="red", label="Regression Line")
plt.xlabel("House Size (sq ft)")
plt.ylabel("Price ($1000s)")
plt.title("House Price Prediction")
plt.legend()
plt.show()

4. Understanding the Output

  • Mean Squared Error (MSE) → Average of the squared differences between predicted and actual prices (lower is better).
  • R-squared Score → Share of the variation in prices that the model explains (closer to 1 is better).
  • Graph → Shows the actual prices as blue points and the fitted regression line in red.
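
To show what the two metrics measure, the sketch below computes them by hand with NumPy and compares the results with scikit-learn's own functions. For brevity it fits and evaluates on the same ten sample points, which is a simplification; the tutorial evaluates on a held-out test set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

X = np.array([500, 800, 1000, 1500, 1800, 2000, 2500, 3000, 3500, 4000]).reshape(-1, 1)
y = np.array([150, 200, 250, 300, 350, 400, 450, 500, 550, 600])

# Simplification: fit and evaluate on the full sample
model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# MSE: mean of the squared residuals
mse_manual = np.mean((y - y_pred) ** 2)

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print("Manual MSE:", mse_manual, "| sklearn MSE:", mean_squared_error(y, y_pred))
print("Manual R2:", r2_manual, "| sklearn R2:", r2_score(y, y_pred))
```

Because MSE is in squared units (here, squared thousands of dollars), its square root is often easier to read as a typical error size.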

Summary

✔ Machine Learning helps computers learn patterns from data.
✔ Supervised Learning is the most common type (predicting prices, spam detection).
✔ Linear Regression is a simple model for predicting continuous values.
✔ Scikit-learn (sklearn) is a popular Python library for ML.