Does it make sense to use multiprocessing to read multiple files with Python?

I intend to use Python's multiprocessing capabilities to read a set of small files. However, I suspect this will not help much: if the disk is a spinning disk, the bottleneck is the seek and rotation time, so even though I use several processes, the total read time should be similar to that of a single reading process. Am I wrong? Any comments?
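
A minimal sketch of the kind of setup I mean (the `data` directory and the `read_file` helper are just placeholders):

```python
import multiprocessing as mp
from pathlib import Path

def read_file(path):
    # Read one small file entirely into memory.
    return Path(path).read_bytes()

if __name__ == "__main__":
    # Hypothetical directory of small files; adjust to your layout.
    files = sorted(Path("data").glob("*.txt"))
    with mp.Pool(processes=4) as pool:
        contents = pool.map(read_file, files)
```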

Also, do you think that using multiprocessing can lead to interleaved reading of the files, so that their contents somehow end up garbled?

2 answers


Your reasoning sounds right, but the only way to be sure is to benchmark it (and it is unlikely that reading several small files in parallel will perform better than reading them sequentially).



I'm not sure what you mean by "interleaved reading", but unless there are bugs in your code or the files change while they are being read, you will get exactly the same content no matter how you read them.
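
For example, a quick benchmark might look something like the sketch below (the `data` directory and the worker count are placeholders, not something from your setup):

```python
import time
import multiprocessing as mp
from pathlib import Path

def read_file(path):
    return Path(path).read_bytes()

def read_sequential(files):
    # Read the files one after another in a single process.
    return [read_file(f) for f in files]

def read_parallel(files, workers=4):
    # Read the files with a pool of worker processes.
    with mp.Pool(processes=workers) as pool:
        return pool.map(read_file, files)

if __name__ == "__main__":
    files = sorted(Path("data").glob("*.txt"))  # placeholder path

    start = time.perf_counter()
    read_sequential(files)
    print("sequential:", time.perf_counter() - start)

    start = time.perf_counter()
    read_parallel(files)
    print("parallel:  ", time.perf_counter() - start)
```

Keep in mind that the second run benefits from the OS page cache, so repeat the measurement several times and alternate the order (or drop the caches between runs) before drawing conclusions.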



You are right, the bottleneck will be disk I/O.

However, the only way to know is to measure both approaches.



If you have control over the files, you could combine them into one larger file rather than keeping many smaller ones.
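
For instance, a one-off concatenation step could look like the sketch below (paths are placeholders, and if you later need the original file boundaries you would also have to record them somewhere):

```python
import shutil
from pathlib import Path

small_files = sorted(Path("data").glob("*.txt"))  # placeholder path

# Concatenate the small files into a single larger file once,
# then read that one file sequentially afterwards.
with open("combined.dat", "wb") as out:
    for f in small_files:
        with open(f, "rb") as src:
            shutil.copyfileobj(src, out)
```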







