Does it make sense to use multiprocessing to read multiple files with Python?
I intend to read a set of small files using Python's multiprocessing capabilities. However, this seems somewhat pointless to me: if the disk is a spinning platter, the bottleneck is the seek and rotation time, so even though I use several processes, the total read time should be about the same as with a single reading process. Am I wrong? Any comments?
Also, do you think that using multiprocessing could lead to interleaved reads of the files, so that their contents somehow end up garbled?
Your reasoning sounds right, but the only way to be sure is to benchmark it (and it is unlikely that reading several small files in parallel will be faster than reading them sequentially).
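A minimal benchmark sketch might look like the following. It assumes a hypothetical `data/*.txt` file set; substitute your own paths.

```python
import glob
import time
from multiprocessing import Pool

# Hypothetical file set for illustration; adjust to your data.
FILES = glob.glob("data/*.txt")

def read_file(path):
    # Each worker opens its own handle and reads the whole file.
    with open(path, "rb") as f:
        return f.read()

def sequential():
    return [read_file(p) for p in FILES]

def parallel(workers=4):
    with Pool(workers) as pool:
        return pool.map(read_file, FILES)

if __name__ == "__main__":  # guard required for multiprocessing on spawn platforms
    t0 = time.perf_counter()
    sequential()
    t1 = time.perf_counter()
    parallel()
    t2 = time.perf_counter()
    print(f"sequential: {t1 - t0:.3f}s, parallel: {t2 - t1:.3f}s")
```

Run it against your actual files and disk; on a spinning disk the parallel version may even be slower because the heads have to seek between files.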
I'm not entirely sure what you mean by "interleaved reading", but as long as there are no bugs in your code and the files don't change while they are being read, you will get exactly the same contents no matter how you read them.
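To see why the contents can't get garbled: each worker process opens its own file descriptor and reads the whole file by itself, and `Pool.map` returns results in input order. A quick sanity check, again assuming the hypothetical `data/*.txt` files:

```python
import glob
from multiprocessing import Pool

FILES = sorted(glob.glob("data/*.txt"))  # hypothetical pattern

def read_file(path):
    # Whole-file read on a private handle: nothing is shared between workers.
    with open(path, "rb") as f:
        return f.read()

if __name__ == "__main__":
    one_process = [read_file(p) for p in FILES]
    with Pool(4) as pool:
        many_processes = pool.map(read_file, FILES)  # results keep input order
    assert one_process == many_processes  # byte-identical either way
```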