Reading parallel streams from Flume spoolDir

Since I am not allowed to install Flume on the prod servers, I have to download the logs, put them in a Flume spoolDir, and have a sink consume from the channel and write to Cassandra. Everything works fine.

However, since I have a lot of log files in the spoolDir, and the current setup only processes one file at a time, it takes a while. I want to be able to process many files at the same time. One approach I considered was to keep using spoolDir but distribute the files across 5-10 different directories and define multiple sources / channels / sinks, but this is a bit clunky. Is there a better way to achieve this?




1 answer

For the record, this was answered on the Flume mailing list:

Hari Shreedharan wrote:

Unfortunately, no. The spoolDir source was intentionally made single-threaded so that deserializer implementations can be simple. The multiple-spoolDir-source approach is the right one, though the sources can all write to the same channel(s): you just need more sources, they can all use the same channel, and you don't need more sinks unless you want to pull the data out faster.
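As a sketch of the layout the answer suggests, several spoolDir sources can fan into a single channel drained by one sink. This is a minimal, hypothetical Flume agent config; the agent/component names and directory paths are illustrative, and the `logger` sink stands in for whatever Cassandra sink you actually use:

```properties
# agent1: three spoolDir sources -> one channel -> one sink
agent1.sources = src1 src2 src3
agent1.channels = ch1
agent1.sinks = snk1

# each source watches its own directory, all write to the same channel
agent1.sources.src1.type = spooldir
agent1.sources.src1.spoolDir = /data/spool/dir1
agent1.sources.src1.channels = ch1

agent1.sources.src2.type = spooldir
agent1.sources.src2.spoolDir = /data/spool/dir2
agent1.sources.src2.channels = ch1

agent1.sources.src3.type = spooldir
agent1.sources.src3.spoolDir = /data/spool/dir3
agent1.sources.src3.channels = ch1

# shared channel
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# placeholder sink; replace with your Cassandra sink configuration
agent1.sinks.snk1.type = logger
agent1.sinks.snk1.channel = ch1
```

Each source runs its own single-threaded file reader, so splitting the files across the watched directories lets them be consumed in parallel while keeping one channel and one sink.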


