Multiprocessing in Python allows you to create multiple processes that run concurrently, enabling your programs to take full advantage of multiple CPU cores. It is particularly useful for CPU-bound tasks like data processing, mathematical computations, and machine learning workloads.
1. What is Multiprocessing?
Multiprocessing is a parallel execution technique in which multiple processes run independently, each with its own memory space. Unlike multithreading, multiprocessing avoids Python’s Global Interpreter Lock (GIL), allowing true parallel execution on multiple cores.
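To see this difference in practice, here is a rough sketch of timing the same CPU-bound work serially and across processes (the burn_cpu function and loop sizes are illustrative, and the exact numbers depend on your machine):

```python
import time
from multiprocessing import Process

def burn_cpu(n):
    # A tight pure-Python loop: CPU-bound work that the GIL would serialize
    # across threads, but separate processes can run in true parallel.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    # Run the workload four times serially...
    start = time.perf_counter()
    for _ in range(4):
        burn_cpu(5_000_000)
    print(f"Serial:      {time.perf_counter() - start:.2f}s")

    # ...and then once in each of four separate processes.
    start = time.perf_counter()
    workers = [Process(target=burn_cpu, args=(5_000_000,)) for _ in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    print(f"4 processes: {time.perf_counter() - start:.2f}s")
```

On a multi-core machine the second figure should be noticeably smaller, because each process runs on its own core with its own interpreter.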
2. The multiprocessing Module
Python provides the multiprocessing module to create and manage processes. You can use it to execute functions in parallel.
Basic Syntax for Creating a Process:
```python
from multiprocessing import Process

def print_numbers():
    for i in range(5):
        print(f"Number: {i}")

if __name__ == "__main__":
    process = Process(target=print_numbers)
    process.start()
    process.join()
```
In this example, a new process is created to run the print_numbers function in parallel with the main process.
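Note the `if __name__ == "__main__":` guard. On platforms that start processes with the spawn method (Windows, and macOS by default since Python 3.8), the child process re-imports the main module, and the guard prevents it from recursively spawning new processes.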
3. Multiprocessing vs Multithreading
| Feature | Multiprocessing | Multithreading |
|---|---|---|
| Execution | True parallelism on multiple CPU cores. | Concurrent execution, but limited by the GIL. |
| Memory Usage | Separate memory space for each process. | Shared memory space. |
| Use Case | CPU-bound tasks (e.g., data processing). | I/O-bound tasks (e.g., file I/O, network requests). |
4. Example: Multiprocessing with Multiple Processes
```python
from multiprocessing import Process
import time

def task(name):
    print(f"Task {name} is starting...")
    time.sleep(2)
    print(f"Task {name} is complete.")

if __name__ == "__main__":
    processes = []
    for i in range(3):
        process = Process(target=task, args=(f"Process-{i}",))
        processes.append(process)
        process.start()

    for process in processes:
        process.join()

    print("All processes are complete.")
```
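Because all three processes sleep at the same time, the script finishes in roughly two seconds rather than six, and the start/complete messages from the different processes may interleave in any order.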
5. Sharing Data Between Processes
Since each process has its own memory space, you can use multiprocessing.Value or multiprocessing.Array to share data between processes.
Example Using Value:
```python
from multiprocessing import Process, Value

def increment(counter):
    for _ in range(100000):
        # += on a shared Value is not atomic, so take the Value's lock
        with counter.get_lock():
            counter.value += 1

if __name__ == "__main__":
    counter = Value('i', 0)  # 'i' stands for a signed integer
    process1 = Process(target=increment, args=(counter,))
    process2 = Process(target=increment, args=(counter,))
    process1.start()
    process2.start()
    process1.join()
    process2.join()
    print(f"Final Counter Value: {counter.value}")  # 200000
```

Without the get_lock() call, the two processes would race on the shared counter and the final value would usually fall short of 200000.
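The same idea works with multiprocessing.Array when you need to share a sequence of values. Here is a minimal sketch (the double_all function is illustrative):

```python
from multiprocessing import Process, Array

def double_all(arr):
    # Double every element of the shared array in place.
    with arr.get_lock():
        for i in range(len(arr)):
            arr[i] *= 2

if __name__ == "__main__":
    shared = Array('i', [1, 2, 3, 4, 5])  # shared array of C integers
    worker = Process(target=double_all, args=(shared,))
    worker.start()
    worker.join()
    print(f"Doubled Values: {shared[:]}")  # [2, 4, 6, 8, 10]
```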
6. Using multiprocessing.Pool for Task Distribution
The Pool class in the multiprocessing module allows you to manage a pool of worker processes, making it easier to parallelize a function across multiple inputs.
Example Using Pool:
```python
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    with Pool(4) as pool:
        results = pool.map(square, numbers)
    print(f"Squared Results: {results}")
```
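Here Pool(4) caps the pool at four worker processes; calling Pool() with no argument creates one worker per CPU (os.cpu_count()). pool.map blocks until every result is ready and returns the results in the same order as the inputs.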
7. Handling Process Communication with Queue
Use the multiprocessing.Queue class for safe communication between processes.
Example:
```python
from multiprocessing import Process, Queue

def producer(q):
    for i in range(5):
        q.put(i)
        print(f"Produced: {i}")

def consumer(q):
    while not q.empty():
        item = q.get()
        print(f"Consumed: {item}")

if __name__ == "__main__":
    q = Queue()
    producer_process = Process(target=producer, args=(q,))
    consumer_process = Process(target=consumer, args=(q,))

    producer_process.start()
    producer_process.join()

    consumer_process.start()
    consumer_process.join()
```
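This works because the producer is joined before the consumer starts, so the queue is already fully populated; in general, q.empty() is not a reliable stopping condition when both processes run at the same time. A common alternative is to send an end-of-stream marker, sketched here with an illustrative SENTINEL value:

```python
from multiprocessing import Process, Queue

SENTINEL = None  # marks the end of the stream

def producer(q):
    for i in range(5):
        q.put(i)
    q.put(SENTINEL)  # tell the consumer there is nothing more to read

def consumer(q):
    while True:
        item = q.get()  # blocks until an item is available
        if item is SENTINEL:
            break
        print(f"Consumed: {item}")

if __name__ == "__main__":
    q = Queue()
    p = Process(target=producer, args=(q,))
    c = Process(target=consumer, args=(q,))
    p.start()
    c.start()  # producer and consumer run concurrently
    p.join()
    c.join()
```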
Conclusion
Python’s multiprocessing module is a powerful tool for parallelizing CPU-bound tasks and improving performance. Whether you need to run heavy computations or distribute work across multiple cores, multiprocessing gives you the true parallelism that threads, constrained by the GIL, cannot.