Perl creates objects very slowly

I have a Perl script that reads ~50,000 rows from a database and stores them in an array of hashes, using standard DBI code. Instead of working directly with hashes, I prefer to put the data into objects that I can pass to other code units in a very clean way. The table I am reading has 15+ columns. My code basically looks like this:

my $db = DBI->connect(); # Just pretend you see a proper DBI connect here
my $resultSet = $db->selectall_arrayref($sql);
$db->disconnect();

# Here where the problem starts.
my %objects;
for my $row (@{$resultSet}) {
    my ($col1, $col2, ..., $col15) = @{$row};
    my %inputHash;
    $inputHash{col1} = $col1 if $col1;
    ...
    $inputHash{col15} = $col15 if $col15;
    my $obj = Model::Object->new(%inputHash);
    $objects{$col1} = $obj;
}
return values %objects;


The hash is used to eliminate duplicate rows from the result set. The problem starts in the loop below the comment. I added a message to the loop that prints a line for every 100 objects created. The first 100 objects were created in 5 seconds. The next 100 took 16 seconds. Reaching 300 took another 30 seconds. By the time it's up to 9,000 objects, it takes 12+ minutes to create each batch of 100. I didn't think 50,000 objects was a big enough number to cause such problems.

The generated Model::Object is a class with getters and setters for each of the properties. It has a new method and a serialization method (essentially toString), and that's it. There is no logic in it.

I am running ActiveState Perl 5.16 on a Windows laptop with 8GB of RAM, an i7 processor (3 years old) and an SSD with reasonable free space. I've seen the same behavior on a Linux machine with the same Perl version, so I don't think it is a hardware issue. I need to stay on ActiveState Perl 5.16. Any advice on how to improve performance would be appreciated. Thanks.

+3




2 answers


First of all: profile your program! You've already narrowed it down to one sub; with Devel::NYTProf (for example) you can narrow it down to the exact line that is the culprit.
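A typical Devel::NYTProf run looks something like this (assuming the module is installed; `load_objects.pl` is a made-up name for your script):

```shell
# Profile the script; NYTProf writes its raw data to ./nytprof.out
perl -d:NYTProf load_objects.pl

# Convert the data into a browsable HTML report under ./nytprof/
nytprofhtml
```

The report shows inclusive and exclusive time per line, which is exactly what you need here.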

Here are some general considerations on my part. Just by looking at the code, there are some likely slowdown factors right off the bat, but you cannot be sure without profiling your program:

Maybe the hash allocation is taking too long. As %objects grows, perl will continually allocate more memory for it. You can presize the hash by assigning to keys(%objects) as an lvalue; this feature is documented in perldata. Since this is a memory allocation problem, you won't see it if you profile with too small a dataset.

# somewhere outside of the loop
keys(%objects) = $number_of_rows * 1.2;
# the hash should be a little bigger than the objects to be stored in it

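The presizing above can be sketched as a self-contained example; the 50,000-row result set here is synthetic, standing in for the selectall_arrayref output:

```perl
use strict;
use warnings;

# Synthetic result set: 50,000 rows of [id, value]
my $result_set = [ map { [ $_, "data$_" ] } 1 .. 50_000 ];

my %objects;
keys(%objects) = int( @$result_set * 1.2 );   # allocate buckets up front

# Load the rows; no bucket reallocation is needed during the loop
$objects{ $_->[0] } = $_->[1] for @$result_set;
```

Whether this helps measurably depends on your perl build and data size, so profile before and after.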



Secondly, it may be that the creation of the objects is taking too long. Take a look at Model::Object. I don't know what's in there, so I can't comment on it. But you should definitely consider passing %inputHash as a reference. With Model::Object->new(%inputHash) you push all the keys and values onto the stack and then pop them off again, in the worst case into something like my %options = @_;. With a copy like that, you recompute the hash for every key.
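As a sketch, a constructor that accepts a hash reference might look like this. Model::Object's real internals are unknown, so the class body and getter here are illustrative only:

```perl
use strict;
use warnings;

package Model::Object;

# Hypothetical constructor taking a hash reference: only one scalar
# (the reference) crosses the stack, and the hash is copied once.
sub new {
    my ( $class, $args ) = @_;
    my $self = { %$args };          # single shallow copy
    return bless $self, $class;
}

sub col1 { $_[0]->{col1} }          # illustrative getter

package main;

my %input_hash = ( col1 => 42, col2 => 'foo' );
my $obj = Model::Object->new( \%input_hash );   # pass by reference
```

The caller-side change is just \%inputHash instead of %inputHash; the constructor has to agree on receiving a reference.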

Maybe you can think of a way to get rid of the little %inputHash entirely. I could quickly come up with several ways based on defined-ness, but yours is a truthiness check (are you sure that's what you want, by the way? "0" is false, for example).
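For example, the fifteen per-column assignments could be collapsed into one defined-based pass (column names shortened here for illustration). Note that this keeps a value of 0, which the truthiness check in the question would silently drop:

```perl
use strict;
use warnings;

# Build the input hash in one pass, keeping defined values only.
my @column_names = qw( col1 col2 col3 );      # shortened for the example
my @row          = ( 'key1', undef, 0 );      # a sample database row

my %input_hash = map { defined $row[$_] ? ( $column_names[$_] => $row[$_] ) : () }
                 0 .. $#row;
```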

But again, the most important thing: profile your program. Perhaps use a smaller dataset, although then you won't see memory allocation problems as clearly. With profiling you will definitely see where your program spends its time.

The perldoc page perlperf has something to say about speeding up your program, and it has a good chapter on profiling.

+5




As you've been told, it is imperative that you use a profiler to determine where the bottlenecks in your code are before you go ahead with optimizations. However, as I described in my comment, it is possible to rewrite your loop so that unused hashes are not unnecessarily created and discarded.

You should also see an improvement from passing the hash to the constructor by reference instead of as a flat list of keys and values.



Here is a modified version of your code, which should give you some ideas.

use constant COLUMN_NAMES => [ qw/
  col1  col2  col3  col4  col5
  col6  col7  col8  col9  col10
  col11 col12 col13 col14 col15 
/ ];

sub object_results {

    my $dbh = DBI->connect($dsn, $user, $pass);
    my $result_set = $dbh->selectall_arrayref($sql);
    $dbh->disconnect;

    my %objects;
    for ( my $i = $#$result_set; $i >= 0; --$i ) {
        my $row = $result_set->[$i];
        next if exists $objects{$row->[0]};

        my %input_hash;
        for my $j ( 0 .. $#$row ) {
            my $v = $row->[$j];
            next unless defined $v;
            $input_hash{COLUMN_NAMES->[$j]} = $v;
        }

        $objects{$row->[0]} = Model::Object->new(\%input_hash);
    }

    values %objects;
}


+1




