Machine Learning (ML) is a core part of Data Science, enabling computers to learn from data and make predictions without being explicitly programmed. This tutorial introduces ML concepts and includes a simple Python example.
1. What is Machine Learning?
Machine Learning is a branch of Artificial Intelligence (AI) that focuses on creating systems that can learn from data and improve performance over time.
Types of Machine Learning
- Supervised Learning
  - The model learns from labeled data (input-output pairs).
  - Example: Spam detection (emails labeled as spam or not spam).
  - Algorithms: Linear Regression, Decision Trees, Random Forest, SVM, Neural Networks.
- Unsupervised Learning
  - The model learns from unlabeled data by finding patterns on its own.
  - Example: Customer segmentation (grouping similar customers).
  - Algorithms: K-Means, DBSCAN, PCA, Hierarchical Clustering.
- Reinforcement Learning
  - The model learns by trial and error to maximize rewards.
  - Example: Self-driving cars, AlphaGo playing the board game Go.
  - Algorithms: Q-Learning, Deep Q-Networks (DQN), Policy Gradient.
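To make the unsupervised case concrete, here is a minimal sketch of customer segmentation with K-Means. The customer values and the choice of two clusters are made up for illustration; they are assumptions, not data from this tutorial.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [annual spend in $1000s, store visits per month]
customers = np.array([
    [2.0, 1.0], [3.0, 2.0], [2.5, 1.0],        # low spend, infrequent visits
    [20.0, 10.0], [22.0, 12.0], [21.0, 11.0],  # high spend, frequent visits
])

# No labels are passed in: the algorithm groups similar rows on its own.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)

print("Cluster assignments:", labels)
print("Cluster centers:")
print(kmeans.cluster_centers_)
```

The cluster numbers themselves (0 or 1) are arbitrary. What matters is which customers end up sharing a cluster.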
2. Machine Learning Workflow
- Data Collection: Gather data from sources (CSV, database, APIs).
- Data Preprocessing: Clean missing values, normalize, and encode.
- Exploratory Data Analysis (EDA): Understand trends and distributions.
- Model Selection: Choose the best algorithm (Linear Regression, Decision Trees, etc.).
- Training the Model: Feed the data to learn patterns.
- Model Evaluation: Test the model with unseen data.
- Model Deployment: Deploy the model for real-world predictions.
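The preprocessing, training, and evaluation steps above can be sketched as a single scikit-learn pipeline. This is one possible arrangement, not the only one. The data (including the deliberately missing value) is hypothetical, and mean imputation plus standard scaling are assumed choices for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data collection: hypothetical house sizes (sq ft), one value missing
X = np.array([[500], [800], [np.nan], [1500], [1800],
              [2000], [2500], [3000], [3500], [4000]], dtype=float)
y = np.array([150, 200, 250, 300, 350, 400, 450, 500, 550, 600], dtype=float)

# Hold out unseen data before fitting anything
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Preprocessing (fill missing values, normalize) and the model in one object
pipeline = make_pipeline(
    SimpleImputer(strategy="mean"),
    StandardScaler(),
    LinearRegression(),
)

# Training: the imputer and scaler are fitted on the training data only
pipeline.fit(X_train, y_train)

# Evaluation on the held-out data
test_r2 = pipeline.score(X_test, y_test)
print("R-squared on test data:", test_r2)
```

Bundling the preprocessing steps with the model means the same transformations are applied automatically at prediction time, which helps when the model is later deployed.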
3. Simple Machine Learning Example in Python
Predict House Prices Using Linear Regression
Problem Statement:
Predict house prices based on the size of the house (sq ft).
Step 1: Import Required Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
Step 2: Create Sample Data
# House sizes (square feet), reshaped to a 2D column as scikit-learn expects
X = np.array([500, 800, 1000, 1500, 1800, 2000, 2500, 3000, 3500, 4000]).reshape(-1, 1)
# House prices (in $1000s)
y = np.array([150, 200, 250, 300, 350, 400, 450, 500, 550, 600])
Step 3: Split Data into Training & Testing Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
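As a quick check of what the split produces, the sketch below reuses the sample data from Step 2 and prints the size of each set. It also shows that a fixed random_state reproduces the same split on every run.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.array([500, 800, 1000, 1500, 1800, 2000, 2500, 3000, 3500, 4000]).reshape(-1, 1)
y = np.array([150, 200, 250, 300, 350, 400, 450, 500, 550, 600])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# test_size=0.2 on 10 samples holds out 2 samples and keeps 8 for training
print("Training samples:", len(X_train))
print("Testing samples:", len(X_test))

# Splitting again with the same random_state gives the same test set
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
same_split = np.array_equal(X_test, X_test_again)
print("Same split with same seed:", same_split)
```

With only two test samples, any evaluation score on this toy dataset is a rough indication at best.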
Step 4: Train the Machine Learning Model
# Create a Linear Regression model
model = LinearRegression()
# Train the model
model.fit(X_train, y_train)
Step 5: Make Predictions
y_pred = model.predict(X_test)
# Print predicted values
print("Predicted House Prices:", y_pred)
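It can help to see what the trained model actually learned. The sketch below repeats Steps 2 to 4, reads the fitted slope and intercept, and predicts the price of a house size that is not in the data. The 2200 sq ft value is a hypothetical input chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X = np.array([500, 800, 1000, 1500, 1800, 2000, 2500, 3000, 3500, 4000]).reshape(-1, 1)
y = np.array([150, 200, 250, 300, 350, 400, 450, 500, 550, 600])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# A fitted linear model is just: price = slope * size + intercept
slope = model.coef_[0]
intercept = model.intercept_
print("Slope ($1000s per sq ft):", slope)
print("Intercept ($1000s):", intercept)

# Predict for a new, hypothetical house; input must be 2D (n_samples, n_features)
new_size = np.array([[2200]])
predicted_price = model.predict(new_size)[0]
print("Predicted price for 2200 sq ft ($1000s):", predicted_price)
```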
Step 6: Evaluate the Model
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
# R-squared on the test set (closer to 1 is better)
r2 = model.score(X_test, y_test)
print("R-squared Score:", r2)
Step 7: Visualize the Regression Line
plt.scatter(X, y, color="blue", label="Actual Prices")
plt.plot(X, model.predict(X), color="red", label="Regression Line")
plt.xlabel("House Size (sq ft)")
plt.ylabel("Price ($1000s)")
plt.title("House Price Prediction")
plt.legend()
plt.show()
4. Understanding the Output
- Mean Squared Error (MSE): The average squared difference between predicted and actual prices (lower is better).
- R-squared Score: Measures how well the model fits the data (closer to 1 is better).
- Graph: Shows the actual prices as blue points and the red best-fit line used to predict house prices.
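To show where the two metrics come from, the sketch below recomputes them by hand with NumPy and compares the results to scikit-learn's values. It repeats the data and model from the earlier steps so it can run on its own. The variable names mse_manual and r2_manual are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X = np.array([500, 800, 1000, 1500, 1800, 2000, 2500, 3000, 3500, 4000]).reshape(-1, 1)
y = np.array([150, 200, 250, 300, 350, 400, 450, 500, 550, 600])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# MSE: mean of the squared prediction errors
mse_manual = np.mean((y_test - y_pred) ** 2)

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print("Manual MSE:", mse_manual, "| sklearn MSE:", mean_squared_error(y_test, y_pred))
print("Manual R-squared:", r2_manual, "| sklearn R-squared:", model.score(X_test, y_test))
```

Because the test set here contains only two houses, both numbers can change noticeably with a different split, so treat them as a demonstration of the calculation and not as a reliable measure of model quality.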
Summary
- Machine Learning helps computers learn patterns from data.
- Supervised Learning is the most common type (predicting prices, spam detection).
- Linear Regression is a simple model for predicting continuous values.
- Scikit-learn (sklearn) is a popular Python library for ML.