Multiprocessing in Python allows you to create multiple processes that run concurrently, enabling your programs to take full advantage of multiple CPU cores. It is particularly useful for CPU-bound tasks like data processing, mathematical computations, and machine learning workloads.
1. What is Multiprocessing?
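Multiprocessing is a parallel execution technique in which multiple processes run independently, each with its own memory space and its own Python interpreter. In CPython, the Global Interpreter Lock (GIL) lets only one thread per process execute Python bytecode at a time. Because every process has its own GIL, multiprocessing sidesteps that limit and allows true parallel execution on multiple cores.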
2. The multiprocessing Module
Python provides the multiprocessing module to create and manage processes. You can use it to execute functions in parallel.
Basic Syntax for Creating a Process:
from multiprocessing import Process

def print_numbers():
    for i in range(5):
        print(f"Number: {i}")

if __name__ == "__main__":
    process = Process(target=print_numbers)
    process.start()  # launch the child process
    process.join()   # wait until the child has finished
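In this example, a new process is created to run the print_numbers function. The main process launches it with start() and then blocks in join() until it finishes. The if __name__ == "__main__": guard keeps a child from re-running the process-creation code when it imports the main module, which matters on platforms that start processes with the spawn method (the default on Windows and macOS).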
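One way to check that the target really runs in a separate process is to print process IDs and inspect the Process object after it exits. This is a small sketch. The helpers `label` and `report` and the name "demo-child" are made up for illustration.

```python
import os
from multiprocessing import Process


def label(role):
    """Describe which OS process is executing this call."""
    return f"{role} running in PID {os.getpid()}"


def report():
    print(label("child"))


if __name__ == "__main__":
    print(label("parent"))
    child = Process(target=report, name="demo-child")
    child.start()
    child.join()
    # exitcode is 0 when the target returned normally.
    print(f"{child.name} exited with code {child.exitcode}")
```

The two PIDs differ, and a non-zero exitcode would indicate that the child raised an exception or was terminated.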
3. Multiprocessing vs Multithreading
| Feature | Multiprocessing | Multithreading |
|---|---|---|
| Execution | True parallelism on multiple CPU cores. | Concurrent execution, but in CPython the GIL lets only one thread run Python bytecode at a time. |
| Memory Usage | Separate memory for each process. | Shared memory space. |
| Use Case | CPU-bound tasks (data processing). | I/O-bound tasks (file I/O, network requests). |
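The difference in the table can be observed with a rough timing sketch that runs the same CPU-bound loop in four threads and then in four processes. The function names and the loop size are arbitrary choices. The actual numbers depend on the machine, the Python build, and process start-up cost, so no expected timings are given.

```python
import time
from multiprocessing import Process
from threading import Thread


def count_down(n):
    """Pure-Python busy loop: CPU-bound work with no I/O."""
    while n > 0:
        n -= 1
    return n


def run_all(worker_cls, n, workers=4):
    """Run count_down in `workers` threads or processes; return elapsed seconds."""
    jobs = [worker_cls(target=count_down, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000  # arbitrary workload size, chosen only for the demo
    print(f"threads:   {run_all(Thread, n):.2f}s")
    print(f"processes: {run_all(Process, n):.2f}s")
```

On a multi-core machine with a standard CPython build, the process version is usually faster for a loop like this, although for very small workloads the cost of starting processes can dominate.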
4. Example: Multiprocessing with Multiple Processes
from multiprocessing import Process
import time

def task(name):
    print(f"Task {name} is starting...")
    time.sleep(2)  # stand-in for real work
    print(f"Task {name} is complete.")

if __name__ == "__main__":
    processes = []
    # Start all processes first so they run at the same time
    for i in range(3):
        process = Process(target=task, args=(f"Process-{i}",))
        processes.append(process)
        process.start()
    # Then wait for every one of them to finish
    for process in processes:
        process.join()
    print("All processes are complete.")
5. Sharing Data Between Processes
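Since each process has its own memory space, ordinary variables are not shared between processes. To share simple data, use multiprocessing.Value (a single value) or multiprocessing.Array (a fixed-size array), both of which are allocated in shared memory.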
Example Using Value:
from multiprocessing import Process, Value

def increment(counter):
    for _ in range(100000):
        # "+=" is a read-modify-write, so hold the Value's lock to avoid lost updates
        with counter.get_lock():
            counter.value += 1

if __name__ == "__main__":
    counter = Value('i', 0)  # 'i' is the typecode for a signed C int
    process1 = Process(target=increment, args=(counter,))
    process2 = Process(target=increment, args=(counter,))
    process1.start()
    process2.start()
    process1.join()
    process2.join()
    print(f"Final Counter Value: {counter.value}")  # 200000
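The section also mentions multiprocessing.Array, so here is a comparable sketch for it. Two processes each double a different half of a shared array of doubles. The function name `double_in_place` and the sample data are illustrative. Because the two workers touch disjoint indices, this sketch takes no extra lock beyond the one the Array wrapper applies to each element access.

```python
from multiprocessing import Array, Process


def double_in_place(shared, start, stop):
    """Double the elements shared[start:stop] in place."""
    for i in range(start, stop):
        shared[i] *= 2


if __name__ == "__main__":
    data = Array('d', [1.0, 2.0, 3.0, 4.0])  # 'd' is the typecode for a C double
    mid = len(data) // 2
    workers = [
        Process(target=double_in_place, args=(data, 0, mid)),
        Process(target=double_in_place, args=(data, mid, len(data))),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(data[:])  # [2.0, 4.0, 6.0, 8.0]
```

If several processes could write to the same index, the updates would need to be wrapped in `with data.get_lock():`.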
6. Using multiprocessing.Pool for Task Distribution
The Pool class in the multiprocessing module allows you to manage a pool of worker processes, making it easier to parallelize a function across multiple inputs.
Example Using Pool:
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    with Pool(4) as pool:
        numbers = [1, 2, 3, 4, 5]
        results = pool.map(square, numbers)
    print(f"Squared Results: {results}")
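Pool.map passes exactly one argument to the function per input item. When each task needs several arguments, Pool.starmap unpacks a tuple per call instead. The sketch below assumes a hypothetical two-argument function `power`.

```python
from multiprocessing import Pool


def power(base, exponent):
    return base ** exponent


if __name__ == "__main__":
    pairs = [(2, 3), (3, 2), (10, 0)]
    with Pool(processes=2) as pool:
        # Each tuple is unpacked into power(base, exponent).
        results = pool.starmap(power, pairs)
    print(results)  # [8, 9, 1]
```

As with map, the results come back in the same order as the inputs, regardless of which worker finished first.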
7. Handling Process Communication with Queue
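Use the multiprocessing.Queue class for safe communication between processes. It is a first-in, first-out queue that one process can put() objects onto while another process get()s them.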
Example:
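from multiprocessing import Process, Queue

SENTINEL = None  # marks the end of the stream

def producer(q):
    for i in range(5):
        q.put(i)
        print(f"Produced: {i}")
    q.put(SENTINEL)  # tell the consumer there is nothing more to read

def consumer(q):
    # Block on get() until the sentinel arrives; q.empty() is not reliable across processes
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        print(f"Consumed: {item}")

if __name__ == "__main__":
    q = Queue()
    producer_process = Process(target=producer, args=(q,))
    consumer_process = Process(target=consumer, args=(q,))
    producer_process.start()
    consumer_process.start()  # runs concurrently with the producer
    producer_process.join()
    consumer_process.join()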
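A Queue is also a common way to send results back to the parent. The sketch below uses hypothetical helpers `compute` and `worker`. Each worker puts one result on the queue, and the parent reads exactly as many results as it started workers before joining them. Reading first avoids the documented pitfall of joining a process that still has unflushed items on a queue.

```python
from multiprocessing import Process, Queue


def compute(task_id):
    """Stand-in for real work: pair the task id with its square."""
    return (task_id, task_id * task_id)


def worker(task_id, results):
    results.put(compute(task_id))


if __name__ == "__main__":
    results = Queue()
    workers = [Process(target=worker, args=(i, results)) for i in range(3)]
    for w in workers:
        w.start()
    # Drain one result per worker before joining.
    collected = [results.get() for _ in workers]
    for w in workers:
        w.join()
    # Arrival order is not guaranteed, so sort before printing.
    print(sorted(collected))  # [(0, 0), (1, 1), (2, 4)]
```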
Conclusion
Python's multiprocessing module is a powerful tool for parallelizing CPU-bound tasks and improving performance. Whether you need to perform complex computations or distribute work across multiple cores, it is a strong choice, provided each task does enough work to outweigh the cost of starting processes and passing data between them.