Improving the performance of Perl code

The following program works fine, but on big data it takes far too long.

INPUT.txt contains up to 1000 lines, with 1 to 100 items per line:

10  
6  
9  
7  
9 11  
3 4  
1 9  
5 12  
1 11  
5 11  
9 12  
10 5 8  
7 4 1
and so on...
last line: 1 2 3 4 5 6 7 . . . any number of elements (100 in my case).

matrix.txt (tab-delimited):

1   1   1   1   1   1   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   1   1   1   
1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   
1   1   1   1   1   1   0   1   1   1   1   1   1   1   1   0   1   1   1   1   0   0   1   1   1   1   1   1   
1   1   1   1   1   1   0   1   1   1   1   1   1   1   0   1   1   0   1   1   1   1   0   1   0   0   1   1   
1   1   1   1   1   1   0   1   1   1   1   1   1   1   1   1   1   0   1   1   1   1   1   1   1   1   1   0   
1   0   1   1   1   1   0   1   1   1   1   0   1   1   0   1   1   0   1   1   1   1   0   1   0   1   1   1   
1   1   1   1   1   1   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   1   1   1   
1   0   1   1   1   1   0   1   1   1   1   0   1   1   0   0   1   0   1   1   1   1   1   1   0   0   1   1   
1   1   1   1   1   1   0   1   1   1   1   1   1   1   1   0   1   0   1   0   1   1   1   1   1   1   1   0   
and so on... up to 25000 lines.

output.txt: each cell is the sum of the elements of a matrix.txt row at the index positions given by one INPUT.txt line, so there is one output row per matrix.txt line and one output column per INPUT.txt line. (The actual values may differ from this hypothetical output.)

1   1   1   1   1   0   1   1   1   2   2   2   2   2 . . . columns up to the number of lines in INPUT.txt
1   1   1   1   1   1   1   1   1   2   2   2   2   2
1   0   0   1   1   1   1   1   1   2   2   2   2   2
1   1   1   0   1   0   0   1   1   2   2   2   2   2
1   1   1   1   1   1   1   1   0   2   2   2   2   2
1   1   1   0   1   0   1   1   1   1   2   2   2   2
1   1   1   1   1   0   1   1   1   2   2   2   2   2
1   1   1   1   1   0   0   1   1   1   2   2   2   2
0   1   1   1   1   1   1   1   0   2   2   2   2   2
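
For example, the first INPUT.txt line above is "10", so column 1 of each output row is just element 10 of that matrix row; the fifth line is "9 11", so column 5 is element 9 plus element 11 (for the first matrix row above, 1 + 1 = 2). As noted, the sample output is hypothetical, so its numbers do not line up with the sample matrix.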


Code (have a look to see what is going on):

use strict;
use warnings;
use List::Util 'sum';

# Read every INPUT.txt line into an array of 0-based index lists.
my @indexes = do {
    open my $fh, '<', "INPUT.txt" or die "INPUT.txt: $!";
    map { [ map { $_ - 1 } split ' ' ] } <$fh>;
};

open my $infh, '<', "matrix.txt" or die "matrix.txt: $!";
open my $out,  '>', "output.txt" or die "output.txt: $!";

# For each matrix row, emit one sum per INPUT.txt line.
while (<$infh>) {
    my @vals = split ' ';
    print {$out} join( '    ', map { sum( @vals[@$_] ) } @indexes ), "\n";
}
close $out;


Is there another way to complete this task in less time?

File availability:
INPUT: https://www.dropbox.com/s/48ikhnfs7gzk8vm/input.txt?dl=0
MATRIX: https://www.dropbox.com/s/ebxi608eday9z1e/matrix.txt?dl=0





3 answers


One thing to look at is the math and matrix modules on CPAN. Some of them use native code (C-based Perl extensions), which should be faster. Here's a (dated) primer on them:



http://www.perlmonks.org/?node_id=284324





I made a PDL version that takes advantage of the fact that you are essentially doing matrix multiplication with a selection vector. This version assumes there will always be 100 elements per matrix row; if not, you need to change the zeroes(100) call accordingly.

It runs about twice as fast as the original for input of this size (1,000 x 100 INPUT.txt, 25,000 x 100 matrix.txt). Reading the entire matrix into memory first and then processing it gave the same runtime, although that might become faster if you enable parallelism. In case you are wondering where the remaining time goes: an optimized C version is about 4x faster than this (8x faster than the original). All timings are, of course, specific to my machine, but I would expect similar ratios on most computers. Nor am I claiming that my PDL is optimal, since I used this as an excuse to study it.



use strict;
use warnings;

use PDL;

# Build one 0/1 selection vector per INPUT.txt row, then stack them.
# split ' ' (rather than /\t/) copes with space- or tab-separated input
# and with trailing whitespace.
my $indexes = long(do {
    open(my $fh, '<', 'INPUT.txt') or die "INPUT.txt: $!";
    # Use this map instead if you allow duplicates in an index list
    # (i.e. "2 2" is a valid row):
    # map { my $p = zeroes(100); $p->slice("$_") += 1 foreach (map {$_ - 1} split ' '); $p } <$fh>
    map {
        my $p = zeroes(100);
        $p->dice([map {$_ - 1} split ' ']) .= 1;    # set the selected positions to 1
        $p;
    } <$fh>
})->xchg(0, 1);

open(my $input, '<', 'matrix.txt') or die "matrix.txt: $!";
open(my $output, '>', 'output.txt') or die "output.txt: $!";

while(<$input>) {
    my $vals = long(split ' ');
    # One row of sums = row vector times the stacked selection matrix.
    print $output join("\t", ($vals x $indexes)->list) . "\n";
}
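
The "read the entire matrix into memory" variant mentioned above might look something like this. This is a sketch of mine, not the answer's tested code; it assumes whitespace-separated files and enough RAM for the roughly 25,000 x 100 matrix (about 10 MB of longs):

use strict;
use warnings;

use PDL;

# Selection matrix as before: one 0/1 column vector per INPUT.txt row.
my $indexes = long(do {
    open(my $fh, '<', 'INPUT.txt') or die "INPUT.txt: $!";
    map { my $p = zeroes(100); $p->dice([map {$_ - 1} split ' ']) .= 1; $p } <$fh>
})->xchg(0, 1);

# Slurp matrix.txt into a single pdl, then do one matrix multiply
# instead of one per line.
my $matrix = pdl(map { [split ' '] } do {
    open(my $fh, '<', 'matrix.txt') or die "matrix.txt: $!";
    <$fh>;
})->long;

my $out = $matrix x $indexes;    # one row of sums per matrix.txt line

open(my $output, '>', 'output.txt') or die "output.txt: $!";
print $output join("\t", $_->list), "\n" for $out->dog;

Whether this beats the streaming version depends on your IO; the answer above reports the same runtime for both.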






Do you have any idea which part is costing you the performance?

The reason I ask is that there is a holy trinity of performance bottlenecks:

  • CPU - the actual computation performed on the processor.
  • Memory - the size of your working set versus available RAM, and how much data you are shuffling around.
  • IO - transferring data to/from disk.

You can often trade one off against another - improving CPU efficiency by building lookup tables, that sort of thing.

Operations like map are where I would start looking - things like map/sort/grep are quite efficient, but they may be hiding a less-than-optimal algorithm; see the toy sketch below.
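
As a toy illustration of the lookup-table idea (mine, not part of the answer): if INPUT.txt contains repeated index lists, the per-row sums can be memoised rather than recomputed:

use strict;
use warnings;
use List::Util 'sum';

# CPU-for-memory trade: cache each index list's sum for the current row.
# This only pays off if INPUT.txt actually contains repeated lines.
# $indexes is the same structure as @indexes in the question's code.
sub row_sums {
    my ( $vals, $indexes ) = @_;
    my %memo;
    return map {
        my $key = join( ',', @$_ );
        $memo{$key} //= sum( @{$vals}[@$_] );
    } @$indexes;
}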

If you are CPU bound, you can try multithreading or forking to get access to more CPU. At first glance it looks like the processing of matrix.txt has no cross-row dependencies (each row is self-contained), so it is a good candidate for parallelism.

I would consider using Parallel::ForkManager to wrap the while loop. The downside of this is that the order of your output will be non-deterministic.

So a starter for 10 could be:

use strict;
use warnings;

use List::Util 'sum';
use Fcntl qw(:flock);

use Parallel::ForkManager;

my $mgr = Parallel::ForkManager->new(10);

my @indexes = do {
    open my $fh, '<', "INPUT.txt" or die "INPUT.txt: $!";
    map {
        [ map { $_ - 1 } split ' ' ]
    } <$fh>;
};
open my $infh,   '<', "matrix.txt" or die "matrix.txt: $!";
open my $out_fh, '>', "output.txt" or die "output.txt: $!";
$out_fh->autoflush(1);    # make each print hit the file inside the lock

while (<$infh>) {
    $mgr->start and next;    # parent: read next line; child: process this one
    my @vals = split ' ';
    my $output_line = join( '    ', map { sum( @vals[@$_] ) } @indexes ) . "\n";
    {
        # Serialise writes so lines from different children don't interleave.
        flock( $out_fh, LOCK_EX );
        print {$out_fh} $output_line;
        flock( $out_fh, LOCK_UN );
    }
    $mgr->finish;    # child exits here
}
$mgr->wait_all_children;
close $out_fh;


Note: this will work, but you will end up with a random ordering of the output, which is almost certainly not what you want. But it will use 10 processes at once to perform the join/map/sum work.

(Of course, if you're tied to IO, this won't help.)
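
If the ordering matters, one workaround (a sketch of mine, not from the answer) is to have each child prefix its output line with the input line number - i.e. write "$.\t" . join(...) . "\n" in the loop above - and sort the file afterwards. "output.sorted.txt" is a made-up name:

use strict;
use warnings;

# Restore line order after the forked run, assuming each child wrote
# lines of the form "LINENUM\t<sums>\n".
open my $in,  '<', 'output.txt'        or die $!;
open my $out, '>', 'output.sorted.txt' or die $!;
print {$out}
    map  { ( split /\t/, $_, 2 )[1] }                                   # strip the prefix
    sort { ( split /\t/, $a, 2 )[0] <=> ( split /\t/, $b, 2 )[0] }      # numeric by line number
    <$in>;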

But for keeping the IO in order, I think threading is not a bad option:

use warnings;
use strict;

use List::Util 'sum';

use threads;
use Thread::Queue;

my $line_q   = Thread::Queue->new();
my $output_q = Thread::Queue->new();

my @indexes = do {
    open my $fh, '<', "INPUT.txt" or die "INPUT.txt: $!";
    map {
        [ map { $_ - 1 } split ' ' ]
    } <$fh>;
};

# Workers: take "LINENUM vals..." items, emit "LINENUM:sums" items.
sub generate_output {
    while ( my $item = $line_q->dequeue() ) {
        print "processing $item \n";
        my ( $line_num, @vals ) = split( ' ', $item );
        $output_q->enqueue( $line_num . ":"
                . join( '    ', map { sum( @vals[@$_] ) } @indexes )
                . "\n" );
    }
}

# Single writer: buffer out-of-order results and print them in line order.
sub coalesce_output {
    open my $out_fh, '>', "output.txt" or die "output.txt: $!";
    my $current_line = 1;    # $. starts at 1
    my %lines;
    while ( my $item = $output_q->dequeue ) {
        my ( $line_num, $output_line ) = split( ":", $item, 2 );
        if ( $line_num == $current_line ) {
            print {$out_fh} $output_line;
            $current_line++;
        }
        else {
            $lines{$line_num} = $output_line;
        }
        while ( defined $lines{$current_line} ) {
            print {$out_fh} $lines{$current_line};
            delete $lines{$current_line};
            $current_line++;
        }
    }
}


open my $infh, '<', "matrix.txt" or die "matrix.txt: $!";

my @workers;
for ( 1 .. 10 ) {
    push( @workers, threads->create( \&generate_output ) );
}

threads->create( \&coalesce_output );

while ( my $line = <$infh> ) {
    $line_q->enqueue("$. $line");
}

$line_q->end();
foreach my $thr (@workers) {
    $thr->join();
}

$output_q->end();
foreach my $thr ( threads->list ) { $thr->join(); }    # reap the writer

That is: it spins up a pool of "worker" threads to do the summing in parallel, and a single "output" thread to write the data in the correct order.

So something like:

use warnings;
use strict;

use List::Util 'sum';

use threads;
use Thread::Queue;

my $line_q   = Thread::Queue->new();
my $output_q = Thread::Queue->new();

my @indexes = do {
    open my $fh, '<', "INPUT.txt" or die "INPUT.txt: $!";
    map {
        [ map { $_ - 1 } split ' ' ]
    } <$fh>;
};


sub generate_output {
    while ( my $item = $line_q->dequeue() ) {

        #print "processing $item \n";
        my ( $line_num, @vals ) = split( ' ', $item );
        $output_q->enqueue( $line_num . ":"
                . join( '    ', map { sum( @vals[@$_] ) } @indexes )
                . "\n" );
    }
}

sub coalesce_output {
    open my $out_fh, '>', "output.txt" or die "output.txt: $!";
    my $current_line = 1;
    my %lines;
    while ( my $item = $output_q->dequeue ) {

        my ( $line_num, $output_line ) = split( ":", $item, 2 );

        #     print "Got $line_num ($current_line) $item\n";
        if ( $line_num == $current_line ) {

            #   print "printing $current_line = $output_line\n";
            print {$out_fh} $output_line;
            $current_line++;
        }
        else {
            $lines{$line_num} = $output_line;
        }
        while ( defined $lines{$current_line} ) {

    #   print "printing  (while) $current_line = $lines{$current_line}\n";
            print {$out_fh} $lines{$current_line};
            delete $lines{$current_line};
            $current_line++;
        }
    }
}


open my $infh, '<', "matrix.txt" or die "matrix.txt: $!";

my @workers;
for ( 1 .. 40 ) {
    push( @workers, threads->create( \&generate_output ) );
}

threads->create( \&coalesce_output );

while ( my $line = <$infh> ) {
    $line_q->enqueue("$. $line");
}

$line_q->end();
foreach my $thr (@workers) {
    $thr->join();
}

$output_q->end();
foreach my $thr ( threads -> list ) { $thr -> join(); }


This produces (truncated):

 1    1    1    1    1    1    1    1    1    1    1    1    2    2    2    2
 2    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2
 2    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2
 2    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2
 2    2    2    2    2    2    2    2    2    2    2    2    2    2    3    3
 3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3
 3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3


At the end of the day, though, it rather depends on what your limiting factor is.

Running a quick and dirty test gives

Started at 1417007048,
finished at 1417007064
Took:16s


Vs.

Started at 1417007118
finished at 1417007161
Took:43s


(I have not confirmed the outputs of both exhaustively)
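
For what it's worth, that sort of quick-and-dirty timing can be produced with a trivial wrapper like this (my guess at the harness, not the author's actual code; "summer.pl" is a placeholder for the script under test):

use strict;
use warnings;

# Crude wall-clock timing of a child script; 'summer.pl' is hypothetical.
my $start = time;
system( $^X, 'summer.pl' ) == 0 or die "run failed: $?";
my $end = time;
print "Started at $start,\nfinished at $end\nTook:", $end - $start, "s\n";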









