Multiprocess N files: Pool
- multiprocess
- Pool
- map
In this example we "analyze" files by counting their characters: the total, and how many of them are digits, letters, spaces, or anything else.
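To make the counting rule concrete, here is a minimal sketch of the per-character classification applied to a plain string, with no files or processes involved. The helper name `classify` and the sample string are made up for this illustration. The point is that every character lands in exactly one category.

```python
def classify(text):
    """Count each character of text in exactly one category."""
    counts = {'total': 0, 'digits': 0, 'letters': 0, 'spaces': 0, 'other': 0}
    for char in text:
        counts['total'] += 1
        if char.isdigit():
            counts['digits'] += 1
        elif char.isalpha():
            counts['letters'] += 1
        elif char == ' ':
            counts['spaces'] += 1
        else:
            # punctuation, newlines, tabs, and so on
            counts['other'] += 1
    return counts


# Hypothetical sample input: 2 letters, 1 space, 2 digits, "!" and a newline
print(classify("ab 12!\n"))
# {'total': 7, 'digits': 2, 'letters': 2, 'spaces': 1, 'other': 2}
```

The four category counts always add up to `total`, which is a handy sanity check on the results.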
Analyze N files in parallel.
import multiprocessing as mp
import os
import sys


def analyze(filename):
    print("Process {:>5} analyzing {}".format(os.getpid(), filename))
    digits = 0
    letters = 0
    spaces = 0
    other = 0
    total = 0
    with open(filename) as fh:
        for line in fh:
            for char in line:
                total += 1
                # Count each character in exactly one category and
                # keep going to the next character of the line.
                if char.isdigit():
                    digits += 1
                elif char.isalpha():
                    letters += 1
                elif char == ' ':
                    spaces += 1
                else:
                    other += 1
    return {
        'filename': filename,
        'total': total,
        'digits': digits,
        'spaces': spaces,
        'letters': letters,
        'other': other,
    }


def main():
    if len(sys.argv) < 3:
        sys.exit(f"Usage: {sys.argv[0]} POOL_SIZE FILEs")
    size = int(sys.argv[1])
    files = sys.argv[2:]

    # Run analyze() on every file using a pool of `size` worker processes.
    with mp.Pool(size) as pool:
        results = pool.map(analyze, files)
    for res in results:
        print(res)


if __name__ == '__main__':
    main()
$ python multiprocess_files.py 3 multiprocess*.py
Process 12093 analyzing multiprocess_files.py
Process 12093 analyzing multiprocess_pool_async.py
Process 12095 analyzing multiprocess_load.py
Process 12094 analyzing multiprocessing_and_logging.py
Process 12094 analyzing multiprocess_pool.py
{'filename': 'multiprocess_files.py', 'total': 47, 'digits': 0, 'spaces': 37, 'letters': 6, 'other': 4}
{'filename': 'multiprocessing_and_logging.py', 'total': 45, 'digits': 0, 'spaces': 27, 'letters': 11, 'other': 7}
{'filename': 'multiprocess_load.py', 'total': 32, 'digits': 0, 'spaces': 20, 'letters': 7, 'other': 5}
{'filename': 'multiprocess_pool_async.py', 'total': 30, 'digits': 0, 'spaces': 16, 'letters': 6, 'other': 8}
{'filename': 'multiprocess_pool.py', 'total': 21, 'digits': 0, 'spaces': 11, 'letters': 6, 'other': 4}
We asked for 3 processes and passed 5 files, so the process IDs show that two of the workers each handled two files. The returned results can be any picklable Python data structure. A dictionary is usually a good idea.
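One reason a dictionary is convenient: the per-file results are easy to combine afterwards. The following is a minimal sketch, not part of the script above. The `combine` helper and the sample values are hypothetical, and it assumes every result has a `filename` key plus numeric counters.

```python
from collections import Counter


def combine(results):
    """Sum the numeric fields of several result dictionaries."""
    totals = Counter()
    for res in results:
        for key, value in res.items():
            if key != 'filename':   # skip the only non-numeric field
                totals[key] += value
    return dict(totals)


# Made-up sample data in the same shape as the dictionaries returned by analyze()
sample = [
    {'filename': 'a.txt', 'total': 10, 'digits': 2, 'spaces': 3, 'letters': 4, 'other': 1},
    {'filename': 'b.txt', 'total': 5, 'digits': 0, 'spaces': 1, 'letters': 4, 'other': 0},
]
print(combine(sample))
# {'total': 15, 'digits': 2, 'spaces': 4, 'letters': 8, 'other': 1}
```

In the script, the list returned by `pool.map(analyze, files)` could be passed to such a helper directly.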