Why is using a regex precompiled with qr slower than using a constant regex?
I just saw this question about optimizing a specific regex in Perl. I wondered how many matches my machine could do, so I tried the following simple test:
- case 1 - using a regular expression precompiled with qr
- case 2 - plain /regex/ matches
use 5.014;
use warnings;
use Benchmark qw(:all);
my $str = "SDZ";
my $qr = qr/S?T?K?P?W?H?R?A?O?\*?E?U?F?R?P?B?L?G?T?S?D?Z?/;
say "match [$&]" if( $str =~ $qr );
my $res = timethese(-10, {
stdrx => sub { $str =~ /S?T?K?P?W?H?R?A?O?\*?E?U?F?R?P?B?L?G?T?S?D?Z?/ },
qr_rx => sub { $str =~ $qr },
});
cmpthese $res;
To my surprise, it gave the following output:
match [SDZ]
Benchmark: running qr_rx, stdrx for at least 10 CPU seconds...
     qr_rx: 10 wallclock secs ( 9.99 usr +  0.01 sys = 10.00 CPU) @ 1089794.90/s (n=10897949)
     stdrx: 11 wallclock secs (10.58 usr +  0.04 sys = 10.62 CPU) @ 1651340.11/s (n=17537232)
           Rate qr_rx stdrx
qr_rx 1089795/s    --  -34%
stdrx 1651340/s   52%    --
i.e. the plain $str =~ /regex/ match is about 50% faster than $str =~ $qr. I expected the opposite result.
Am I doing something wrong? Why am I getting this result?
EDIT:
Just downloaded said book, I have a lot to learn :). But the book cited also says:
If a regex literal has no variable interpolation, Perl knows that the regex cannot change from use to use, so after the regex is compiled once, that compiled form is saved ("cached") for use whenever execution reaches the same code again. The regex is examined and compiled only once, no matter how often it is used during the program's execution.
So, in the above, both regexes are literals with no variable interpolation, and a "precompiled" regex should be at least as fast as a plain regex literal. In this example, it is about a third slower.
Ikegami explained why $str =~ $qr is slower (and frankly, "slower" is not quite the right term, because we are talking about a fraction of a microsecond per match... :)),
but the Perl docs say:
Precompilation of the pattern into an internal representation at the moment of qr() avoids a need to recompile the pattern every time a match /$pat/ is attempted.
From the point of view of an ordinary Perl user (not some high-level Perl monk), this reads as: precompile your pattern and it will be faster. In truth, it only helps if the regex contains some "non-static" (interpolated) parts...
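To make the "non-static parts" case concrete, here is a minimal sketch of a situation where qr should pay off: several different patterns run through the same match operator, so an interpolated string pattern has to be recompiled whenever it differs from the one used on the previous pass. The pattern list and the names count_with_strings and count_with_qr are made up for illustration, and the timings will depend on the Perl version and the machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str = "SDZ";

# Hypothetical data: ten distinct pattern strings that differ in the last atom.
my @strings;
push @strings, "S?T?K?P?W?H?R?A?O?$_?" for 'A' .. 'J';

# The same patterns, compiled once up front.
my @regexes = map { qr/$_/ } @strings;

sub count_with_strings {
    my $n = 0;
    for my $pat (@strings) {
        # $pat differs from the previous iteration, so the pattern is recompiled.
        $n++ if $str =~ /$pat/;
    }
    return $n;
}

sub count_with_qr {
    my $n = 0;
    for my $re (@regexes) {
        # Already compiled by qr// above; the match op only has to adopt it.
        $n++ if $str =~ $re;
    }
    return $n;
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    strings => \&count_with_strings,
    qr_objs => \&count_with_qr,
});
```

Every atom in these patterns is optional, so each pattern matches and both subs count the same number of matches. Only the cost of getting from pattern to compiled regex differs.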
To be honest, I still don't understand it completely, but I got the book and I'm going to study. :) Perhaps one more sentence in the docs would help newcomers avoid misunderstanding qr when they start learning.
Thanks everyone!
Regex patterns are compiled at compile time unless they interpolate. Neither the pattern in the qr// operator nor the one in the match operator in stdrx interpolates, so both are compiled at compile time.
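One way to observe the compile-time versus run-time distinction is to use a deliberately broken pattern inside a string eval. This is only an illustrative sketch, and the variable names are invented. With a literal pattern the error is raised while the eval'd code is being compiled, so none of it runs. With an interpolated pattern the code starts running and fails only when the match is reached.

```perl
use strict;
use warnings;

our $ran_literal = 0;
our $ran_interp  = 0;

# Literal pattern with a syntax error: the regex is compiled while the
# eval'd code is being compiled, so the assignment never executes.
eval q{ $main::ran_literal = 1; "x" =~ /(/; 1 };
my $err_literal = $@;

# Same broken pattern, but interpolated: the code compiles fine, the
# assignment runs, and the regex error is raised at the match itself.
eval q{ $main::ran_interp = 1; my $pat = "("; "x" =~ /$pat/; 1 };
my $err_interp = $@;

print "literal: ran=$ran_literal\n";   # ran=0, failed before running
print "interp:  ran=$ran_interp\n";    # ran=1, failed while running
```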
The extra time spent in the qr_rx test (roughly 0.3 µs per match, going by the rates above) goes into "compiling" a third regular expression: the one in the match operator in qr_rx. Don't forget that $_ =~ $re is short for $_ =~ m/$re/. No compilation actually occurs when the entire pattern consists of an interpolated pre-compiled regex, because that case is handled specially, but it appears to take a little time to coax a match op into using a pre-compiled regex. (Maybe it needs to be cloned?)
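To check the "short for m/$re/" point on your own machine, a sketch like the following adds a third case, here called qr_in_m (a name made up for this example), that spells out the match operator explicitly. If the two forms really are the same operation, qr_rx and qr_in_m should land close together, with stdrx ahead of both. I am not quoting numbers, since they vary with the Perl version and hardware.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str = "SDZ";
my $qr  = qr/S?T?K?P?W?H?R?A?O?\*?E?U?F?R?P?B?L?G?T?S?D?Z?/;

my %cases = (
    # Literal pattern, compiled once at compile time.
    stdrx   => sub { $str =~ /S?T?K?P?W?H?R?A?O?\*?E?U?F?R?P?B?L?G?T?S?D?Z?/ },
    # Precompiled regex used directly as the right-hand side.
    qr_rx   => sub { $str =~ $qr },
    # The same thing with the implicit match operator written out.
    qr_in_m => sub { $str =~ m/$qr/ },
);

# Run each case for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%cases);
```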