MCE chunk size when reading from STDIN

I am writing a Perl program that processes a large number of log entries. To speed things up, I am using MCE to spawn multiple workers for parallel processing. This is great so far, but I've found myself trying different chunk sizes in a very unscientific way. Here is some background before I get to my question.

We get logs from several syslog sources and collect them in a central location. The central syslog server writes these records to a single text file. The program in question takes the logs from this text file, does some manipulation, and sends them elsewhere. The raw log files are then archived.

To read the log file, I do this:

# -q suppresses file-name headers, -F keeps following across log rotation
my $tail = 'tail -q -F '.$logdir.$logFile;
open my $tail_fh, "-|", $tail or die "Can't open tail: $!\n";

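For what it's worth, I understand the list form of open would keep the shell out of the picture entirely; a minimal sketch of that variant, with placeholder paths:

use strict;
use warnings;

my $logdir  = '/var/log/central/';   # placeholder
my $logFile = 'syslog.log';          # placeholder

# -q suppresses file-name headers, -F keeps following across rotation
open my $tail_fh, '-|', 'tail', '-q', '-F', $logdir . $logFile
    or die "Can't start tail: $!\n";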

Then I use mce_loop_f to iterate over the file handle:

mce_loop_f { my $hr = process($_); MCE->gather($hr); } $tail_fh;

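For reference, I know chunk_size and max_workers can be set explicitly through MCE::Loop::init; this is a sketch of what I've been experimenting with (the numbers are guesses, and process() is my own sub). My reading of the MCE::Loop docs is that $_ points to the chunk array ref once chunk_size is greater than 1, so the block iterates over the chunk explicitly:

use MCE::Loop;

MCE::Loop::init {
    max_workers => 4,     # a guess, not a tuned value
    chunk_size  => 500,   # below 8192 this means records, above it bytes
};

mce_loop_f {
    my ($mce, $chunk_ref, $chunk_id) = @_;
    for my $line (@{ $chunk_ref }) {
        my $hr = process($line);   # process() is my own sub
        MCE->gather($hr);
    }
} $tail_fh;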

This works well for the most part, but when there is a spike in log volume, the program starts to get bogged down. While there are a number of factors that could make things go faster, the one I'm unsure about is MCE's chunk_size.

I understand that chunk_size defaults to "auto", but how does that work with a file handle that is a pipe from tail? Is "auto" even appropriate here?

What factors should be considered when tuning chunk_size? The log records arrive at a rate of 1,000-2,000 events per second, depending on the time of day (roughly 1,000 at night and 2,000 during the day).
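In case it matters, this is the sort of crude harness I've been using to compare settings against an archived log (sample.log is a placeholder name; MCE::Loop::finish reaps the workers between runs so init can be called again):

use MCE::Loop;
use Time::HiRes qw(time);

for my $size (1, 50, 500, 5000) {
    MCE::Loop::init { max_workers => 4, chunk_size => $size };

    my $start = time;
    mce_loop_f {
        my ($mce, $chunk_ref, $chunk_id) = @_;
        process($_) for @{ $chunk_ref };   # process() is my own sub
    } 'sample.log';
    MCE::Loop::finish;

    printf "chunk_size %5d: %.2f s\n", $size, time - $start;
}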

I'm also a neophyte when it comes to MCE, so if mce_loop_f is a bad fit for this use case, please let me know.
