Python Multiprocessing

Multiprocessing in Python allows you to create multiple processes that run concurrently, enabling your programs to take full advantage of multiple CPU cores. It is particularly useful for CPU-bound tasks like data processing, mathematical computations, and machine learning workloads.

1. What is Multiprocessing?

Multiprocessing is a parallel execution technique in which multiple processes run independently, each with its own memory space. Unlike multithreading, multiprocessing sidesteps Python’s Global Interpreter Lock (GIL): every process runs its own interpreter with its own GIL, so processes can execute Python code in true parallel on multiple cores.
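
To make "its own memory space" concrete, here is a minimal sketch. The names bump and run_demo are hypothetical, chosen only for this illustration. The child process increments a module-level variable, and the parent then checks its own copy, which should be unchanged.

```python
from multiprocessing import Process, Queue

counter = 0  # module-level variable; every process works on its own copy


def bump(result_queue):
    """Increment the child's copy of `counter` and report the new value."""
    global counter
    counter += 1
    result_queue.put(counter)


def run_demo():
    """Return (value seen in the child, value seen in the parent)."""
    result_queue = Queue()
    child = Process(target=bump, args=(result_queue,))
    child.start()
    child_value = result_queue.get()  # read the result before joining
    child.join()
    return child_value, counter


if __name__ == "__main__":
    child_value, parent_value = run_demo()
    print(f"Child saw counter = {child_value}")
    print(f"Parent still sees counter = {parent_value}")
```

The child's change never reaches the parent, which is why data has to be passed explicitly (for example through a Queue, as here) or placed in shared memory.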

2. The multiprocessing Module

Python provides the multiprocessing module to create and manage processes. You can use it to execute functions in parallel.

Basic Syntax for Creating a Process:

from multiprocessing import Process

def print_numbers():
    for i in range(5):
        print(f"Number: {i}")

if __name__ == "__main__":
    process = Process(target=print_numbers)
    process.start()
    process.join()

In this example, a new process is created to run the print_numbers function. The main process launches it with start() and then waits at join() until the child process has finished.
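
One way to confirm that the target function really runs in a separate process is to compare process IDs. The sketch below assumes hypothetical helper names (report_pid, run_child). It uses only the pid and exitcode attributes of Process together with os.getpid().

```python
import os
from multiprocessing import Process, Queue


def report_pid(q):
    """Send this process's ID back to the parent."""
    q.put(os.getpid())


def run_child():
    """Start one child and return (pid reported by child, Process.pid, exit code)."""
    q = Queue()
    child = Process(target=report_pid, args=(q,))
    child.start()
    reported_pid = q.get()  # read the result before joining
    child.join()
    return reported_pid, child.pid, child.exitcode


if __name__ == "__main__":
    reported_pid, child_pid, exitcode = run_child()
    print(f"Parent PID: {os.getpid()}")
    print(f"Child PID:  {reported_pid} (Process.pid says {child_pid})")
    print(f"Child exit code: {exitcode}")
```

An exit code of 0 after join() indicates the child finished without an unhandled exception.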

3. Multiprocessing vs Multithreading

Feature      | Multiprocessing                         | Multithreading
Execution    | True parallelism on multiple CPU cores. | Concurrent execution, but limited by the GIL.
Memory Usage | Separate memory for each process.       | Shared memory space.
Use Case     | CPU-bound tasks (data processing).      | I/O-bound tasks (file I/O, network requests).
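
The difference in the Execution row can be observed with a rough timing sketch. It runs the same CPU-bound function through multiprocessing.pool.ThreadPool and multiprocessing.Pool, which share an interface. The function count_primes and the helper timed_map are hypothetical names for this illustration, and the timings depend on the machine: on a multi-core system the process pool will usually finish sooner, but that is not guaranteed.

```python
import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed_map(pool_cls, inputs, workers=4):
    """Map count_primes over inputs with the given pool class.

    Returns (results, elapsed seconds).
    """
    start = time.perf_counter()
    with pool_cls(workers) as pool:
        results = pool.map(count_primes, inputs)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    inputs = [50_000] * 4
    for label, pool_cls in (("threads", ThreadPool), ("processes", Pool)):
        results, seconds = timed_map(pool_cls, inputs)
        print(f"{label:9s} {seconds:.2f}s  results={results}")
```

Both pools return identical results; only the elapsed time differs.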

4. Example: Multiprocessing with Multiple Processes

from multiprocessing import Process
import time

def task(name):
    print(f"Task {name} is starting...")
    time.sleep(2)
    print(f"Task {name} is complete.")

if __name__ == "__main__":
    processes = []
    for i in range(3):
        process = Process(target=task, args=(f"Process-{i}",))
        processes.append(process)
        process.start()

    for process in processes:
        process.join()

    print("All processes are complete.")

5. Sharing Data Between Processes

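Since each process has its own memory space, ordinary variables are not shared between processes. To share data, you can use multiprocessing.Value or multiprocessing.Array, which place a single value or a fixed-size array in shared memory that several processes can read and write.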

Example Using Value:

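from multiprocessing import Process, Value

def increment(counter):
    for _ in range(100000):
        # "+=" is a read-modify-write, not an atomic operation, so hold the
        # Value's lock to keep the two processes from losing updates.
        with counter.get_lock():
            counter.value += 1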

if __name__ == "__main__":
    counter = Value('i', 0)  # 'i' is the typecode for a signed C int
    process1 = Process(target=increment, args=(counter,))
    process2 = Process(target=increment, args=(counter,))

    process1.start()
    process2.start()
    process1.join()
    process2.join()

    print(f"Final Counter Value: {counter.value}")
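
multiprocessing.Array works along the same lines for a fixed-size sequence. In the sketch below, two processes each square one half of a shared integer array in place. The function names square_slice and square_in_parallel are hypothetical, and splitting the array into two halves is just one simple way to divide the work.

```python
from multiprocessing import Process, Array


def square_slice(shared, start, stop):
    """Square the elements shared[start:stop] in place."""
    for i in range(start, stop):
        shared[i] = shared[i] * shared[i]


def square_in_parallel(values):
    """Square a list of ints using two processes and a shared Array."""
    shared = Array('i', values)  # 'i' = signed C int, initialised from values
    mid = len(values) // 2
    workers = [
        Process(target=square_slice, args=(shared, 0, mid)),
        Process(target=square_slice, args=(shared, mid, len(values))),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return shared[:]  # slicing copies the contents into a regular list


if __name__ == "__main__":
    print(square_in_parallel([1, 2, 3, 4, 5]))
```

Because each process writes to a disjoint range of indices, no extra locking is needed beyond what Array already does for each element access.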

6. Using multiprocessing.Pool for Task Distribution

The Pool class in the multiprocessing module allows you to manage a pool of worker processes, making it easier to parallelize a function across multiple inputs.

Example Using Pool:

from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    with Pool(4) as pool:
        numbers = [1, 2, 3, 4, 5]
        results = pool.map(square, numbers)
    print(f"Squared Results: {results}")
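
Besides map, Pool offers a few related methods. The sketch below shows two of them: starmap, which unpacks each input tuple into several arguments, and imap_unordered, which yields results as workers finish rather than in input order. The function power and the wrapper run_pool_variants are hypothetical names for this illustration.

```python
from multiprocessing import Pool


def square(n):
    return n * n


def power(base, exponent):
    return base ** exponent


def run_pool_variants():
    """Return (starmap results, sorted imap_unordered results)."""
    with Pool(2) as pool:
        # starmap calls power(2, 3), power(3, 2), power(10, 2) and keeps input order.
        powers = pool.starmap(power, [(2, 3), (3, 2), (10, 2)])
        # imap_unordered yields results in completion order, so sort them
        # here to get a stable result; consume it inside the with block.
        squares = sorted(pool.imap_unordered(square, [4, 3, 2, 1]))
    return powers, squares


if __name__ == "__main__":
    powers, squares = run_pool_variants()
    print(f"Powers: {powers}")
    print(f"Squares (sorted): {squares}")
```

imap_unordered can be a reasonable choice when the order of results does not matter and you want to start processing them as soon as they arrive.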

7. Handling Process Communication with Queue

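Use the multiprocessing.Queue class for safe communication between processes. It is a first-in, first-out queue: objects that one process adds with put() can be retrieved by another process with get().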

Example:

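from multiprocessing import Process, Queue

SENTINEL = None  # marks the end of the stream

def producer(q):
    for i in range(5):
        q.put(i)
        print(f"Produced: {i}")
    q.put(SENTINEL)  # tell the consumer there is nothing more to read

def consumer(q):
    # q.empty() is not reliable across processes, so block on get()
    # and stop only when the sentinel arrives.
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        print(f"Consumed: {item}")

if __name__ == "__main__":
    q = Queue()
    producer_process = Process(target=producer, args=(q,))
    consumer_process = Process(target=consumer, args=(q,))

    # Start both sides before joining: joining a producer whose queued
    # items have not been consumed yet can block.
    producer_process.start()
    consumer_process.start()

    producer_process.join()
    consumer_process.join()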
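
The same producer/consumer idea extends to several workers. The sketch below is one possible arrangement, not the only one: a task queue feeds the workers, a result queue carries answers back, and a sentinel value tells each worker to stop. STOP, worker, and square_with_workers are hypothetical names, and the result is returned as a dict, so the inputs are assumed to be distinct.

```python
from multiprocessing import Process, Queue

STOP = None  # sentinel: one per worker signals "no more tasks"


def worker(tasks, results):
    """Take numbers from `tasks` until STOP arrives; put (n, n*n) on `results`."""
    while True:
        n = tasks.get()
        if n is STOP:
            break
        results.put((n, n * n))


def square_with_workers(numbers, num_workers=2):
    """Square distinct numbers using a task queue and a result queue."""
    tasks, results = Queue(), Queue()
    workers = [
        Process(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for n in numbers:
        tasks.put(n)
    for _ in workers:
        tasks.put(STOP)
    # Collect exactly one result per input before joining the workers.
    collected = dict(results.get() for _ in numbers)
    for w in workers:
        w.join()
    return collected


if __name__ == "__main__":
    print(square_with_workers([1, 2, 3, 4]))
```

Results arrive in whatever order the workers finish, which is why each result carries its input alongside the answer.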

Conclusion

Python’s multiprocessing module is a powerful tool for parallelizing CPU-bound tasks and improving performance. When you need to perform complex computations or distribute work across multiple cores, it is usually the right choice; for I/O-bound work, multithreading is often the better fit.