Does it make sense to use multiprocessing to read multiple files with Python?

I intend to use Python's multiprocessing capabilities to read a set of small files. However, I suspect this will not help much: if the disk is a spinning disk, the bottleneck is the seek and rotation time, so even though I use several processes, the total read time should be similar to that of a single reading process. Am I wrong? Any comments?
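
A minimal sketch of the kind of setup I mean (the `data` directory and the `read_file` helper are just placeholders):

```python
import multiprocessing as mp
from pathlib import Path

def read_file(path):
    # Read one small file entirely into memory.
    return Path(path).read_bytes()

if __name__ == "__main__":
    # Hypothetical directory of small files; adjust to your layout.
    files = sorted(Path("data").glob("*.txt"))
    with mp.Pool(processes=4) as pool:
        contents = pool.map(read_file, files)
```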

Also, do you think that using multiprocessing can lead to interleaved reading of the files, so that their contents somehow end up garbled?

2 answers


Your reasoning sounds right, but the only way to be sure is to benchmark it (and it is unlikely that reading several small files in parallel will perform better than reading them sequentially).



I'm not sure what you mean by "interleaved reading", but unless there are bugs in your code or the files change while they are being read, you will get exactly the same content no matter how you read them.
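
For example, a quick benchmark might look something like the sketch below (the `data` directory and the worker count are placeholders, not something from your setup):

```python
import time
import multiprocessing as mp
from pathlib import Path

def read_file(path):
    return Path(path).read_bytes()

def read_sequential(files):
    # Read the files one after another in a single process.
    return [read_file(f) for f in files]

def read_parallel(files, workers=4):
    # Read the files with a pool of worker processes.
    with mp.Pool(processes=workers) as pool:
        return pool.map(read_file, files)

if __name__ == "__main__":
    files = sorted(Path("data").glob("*.txt"))  # placeholder path

    start = time.perf_counter()
    read_sequential(files)
    print("sequential:", time.perf_counter() - start)

    start = time.perf_counter()
    read_parallel(files)
    print("parallel:  ", time.perf_counter() - start)
```

Keep in mind that the second run benefits from the OS page cache, so repeat the measurement several times and alternate the order (or drop the caches between runs) before drawing conclusions.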



You are right, the bottleneck will be disk I/O.

However, the only way to know is to measure both approaches.



If you have control over the files, you could combine them into one larger file rather than keeping many smaller ones.
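
For instance, a one-off concatenation step could look like the sketch below (paths are placeholders, and if you later need the original file boundaries you would also have to record them somewhere):

```python
import shutil
from pathlib import Path

small_files = sorted(Path("data").glob("*.txt"))  # placeholder path

# Concatenate the small files into a single larger file once,
# then read that one file sequentially afterwards.
with open("combined.dat", "wb") as out:
    for f in small_files:
        with open(f, "rb") as src:
            shutil.copyfileobj(src, out)
```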







