Erlang consumer queue
I have a problem where I want to pull discrete chunks of data from disk into a queue and delete them in another process. This data is randomly located on disk, so it won't benefit much from sequential reads. There is a lot of data, so I can't load everything at once, and pulling only one chunk at a time is not efficient either.
I'd like the consumer to run at its own speed, but to keep a healthy queue of data ready for it so I'm not constantly waiting on disk reads while I process chunks.
Is there an established way to do this, e.g. with the jobs or safetyvalve frameworks? Implementing this myself feels like reinventing the wheel, since a slow consumer working through on-disk data is a common problem.
Any suggestions on how best to tackle this in Erlang?
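To make the shape of what I'm after concrete, here is a rough sketch: a reader process that keeps up to Depth chunks buffered ahead of a consumer pulling at its own pace. The module name, message protocol, and read_chunk/1 are all placeholders I made up; the real read would be a random-access disk fetch.

```erlang
%% Rough sketch of the desired setup: a reader process keeps up to
%% Depth chunks in flight ahead of the consumer, using a simple
%% credit scheme so a slow consumer never causes unbounded buffering.
-module(prefetch).
-export([start/2, next/1]).

%% Spawn a reader that will feed the calling process.
start(ChunkIds, Depth) ->
    Consumer = self(),
    spawn_link(fun() -> reader(Consumer, ChunkIds, Depth) end).

%% Consumer side: block until the next chunk (or eof) arrives,
%% then grant the reader one more credit.
next(Reader) ->
    receive
        {chunk, Reader, Chunk} ->
            Reader ! {credit, self()},
            {ok, Chunk};
        {eof, Reader} ->
            eof
    end.

reader(Consumer, [], _Credit) ->
    Consumer ! {eof, self()};
reader(Consumer, Ids, 0) ->
    %% Buffer is full: wait until the consumer has taken a chunk.
    receive {credit, Consumer} -> reader(Consumer, Ids, 1) end;
reader(Consumer, [Id | Rest], Credit) ->
    Consumer ! {chunk, self(), read_chunk(Id)},
    reader(Consumer, Rest, Credit - 1).

%% Placeholder for the real random-access disk read.
read_chunk(Id) ->
    {chunk_data, Id}.
```

The reader eagerly sends Depth chunks, then sends one more each time the consumer acknowledges one, so disk reads overlap with processing but memory use stays bounded.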
You can use the {read_ahead, Size} option to file:open/2:

{read_ahead, Size}
This option activates buffering of read data. If read/2 calls are for significantly less than Size bytes, read operations toward the operating system are still performed for blocks of Size bytes. The extra data is buffered and returned in subsequent read/2 calls, giving a performance gain as the number of operating system calls decreases.
The read_ahead buffer is also heavily used by the read_line/1 function in raw mode, which is why this option is recommended (for performance reasons) when accessing raw files using that function.
If read/2 calls are for sizes not significantly less than, or even greater than, Size bytes, no performance gain can be expected.
You were vague about the sizes involved, but tuning this buffer size should give you a decent start on implementing what you need.
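A minimal sketch of a chunk-reading loop using that option. The 1 MB buffer and the chunk size are assumptions to tune against your actual chunk sizes, and process_chunk/1 is a stand-in for your consumer logic:

```erlang
%% Read fixed-size chunks from a file opened with {read_ahead, Size}.
-module(chunk_reader).
-export([consume/2]).

consume(Path, ChunkSize) ->
    %% raw + binary bypasses the intermediate file-server process and
    %% list conversion; read_ahead makes the OS reads 1 MB at a time
    %% even though we consume smaller chunks.
    {ok, Fd} = file:open(Path, [read, raw, binary,
                                {read_ahead, 1024 * 1024}]),
    try loop(Fd, ChunkSize, 0)
    after file:close(Fd)
    end.

loop(Fd, ChunkSize, N) ->
    case file:read(Fd, ChunkSize) of
        {ok, Chunk} ->
            process_chunk(Chunk),      %% your consumer logic here
            loop(Fd, ChunkSize, N + 1);
        eof ->
            {ok, N};                   %% number of chunks consumed
        {error, Reason} ->
            {error, Reason}
    end.

process_chunk(_Chunk) -> ok.
```

Note that read_ahead only helps sequential access within one file; for chunks scattered across the disk it reduces syscall overhead per file, not seek time.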