Perl regex puts all matches into an array, including the full match
I have the following Perl code:
use Data::Dumper;
$key = 'foobar:foo:bar';
$pattern = '^[^:]+:([a-z]{3}):(.+)$';
my @matches = $key =~ /$pattern/i;
print Dumper(@matches);
Output:
$VAR1 = 'foo';
$VAR2 = 'bar';
Alternatively, I can print $1 for the first capture group and $2 for the second.
What I want to know is how to get the complete pattern match. For example, in PHP, if I used preg_match, I would get this:
Array
(
[0] => foobar:foo:bar
[1] => foo
[2] => bar
)
Here the first element (or $0 or \0) is the complete match. How can I get this in Perl?
You can use the variable $& or ${^MATCH} for this, although there is a performance penalty in Perl versions before 5.20 (which is significant for $&). From perldoc perlvar:
- $MATCH
- $&
The string matched by the last successful pattern match (not counting any matches hidden within a BLOCK or eval() enclosed by the current BLOCK). [...]
- ${^MATCH}
This is similar to $& ($MATCH), except that it does not incur the performance penalty associated with that variable. [...]
In Perl v5.18 and earlier, it is only guaranteed to return a defined value when the pattern was compiled or executed with the /p modifier. In Perl v5.20, the /p modifier does nothing, so ${^MATCH} does the same thing as $MATCH.
This variable was added in Perl v5.10.0.
Performance issues
Again from perldoc perlvar:
Traditionally in Perl, any use of any of the three variables $`, $& or $' (or their use English equivalents) anywhere in the code caused all subsequent successful pattern matches to make a copy of the matched string, in case the code might subsequently access one of those variables. This imposed a considerable performance penalty across the whole program, so the use of these variables was generally discouraged. [...]
Perl 5.10.0 introduced the /p match operator flag and the ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} variables, which let you suffer the penalties only on patterns marked with /p.
In Perl 5.18.0 onwards, perl started noting the presence of each of the three variables separately and copied only the required part of the string; so in
$`; $&; "abcdefgh" =~ /d/
perl will only copy the "abcd" part of the string. It can make a big difference in something like
$str = 'x' x 1_000_000;
$&; # whoops
$str =~ /x/g # one char copied a million times, not a million chars
In Perl 5.20.0 a new copy-on-write system was enabled by default, which finally fixes all performance issues with these three variables and makes them safe to use anywhere.
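If the code has to run on a Perl older than 5.20 and the penalty matters, one alternative not covered in the quoted passage is to rebuild the full match from the offset arrays @- and @+, which perlvar documents as equivalent to $&. The following is only a minimal sketch that reuses the key and pattern from the question; the variable names $full and @captures are illustrative.

```perl
use strict;
use warnings;

my $key = 'foobar:foo:bar';
my @matches;

if ( my @captures = $key =~ /^[^:]+:([a-z]{3}):(.+)$/i ) {
    # $-[0] and $+[0] hold the start and end offsets of the whole match,
    # so this substr() yields the same text as $& without using $&.
    my $full = substr( $key, $-[0], $+[0] - $-[0] );
    @matches = ( $full, @captures );
}

print join( "\n", @matches ), "\n";
```

This prints the full match first and then the two capture groups, which mirrors the PHP array layout from the question.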
Example:
perl -wE 'say for "foo:bar" =~ /^(\w+):(\w+)$/p; say ${^MATCH}'
Output:
foo
bar
foo:bar
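The same idea as a script, applied to the key and pattern from the question, might look like the sketch below. It assumes the goal is a PHP-style array with the full match at index 0; the name @captures is illustrative.

```perl
use strict;
use warnings;

my $key     = 'foobar:foo:bar';
my $pattern = '^[^:]+:([a-z]{3}):(.+)$';

my @matches;

# /p makes ${^MATCH} reliable on Perl 5.10 through 5.18; it is harmless on 5.20+.
if ( my @captures = $key =~ /$pattern/ip ) {
    # Full match first, then the capture groups, as preg_match returns them.
    @matches = ( ${^MATCH}, @captures );
}

print join( "\n", @matches ), "\n";
```

One caveat: when a pattern has no capture groups, a successful match in list context returns the list (1), so @captures would need to be discarded in that case.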
Wrap your regex in parentheses so that the whole expression becomes another capturing group.
my @matches = $key =~ /($pattern)/i;
print Dumper( ["foobar:foo:bar"=~/$pattern/i] );
$VAR1 = [
'foo',
'bar'
];
print Dumper( ["foobar:foo:bar"=~/($pattern)/i] );
$VAR1 = [
'foobar:foo:bar',
'foo',
'bar'
];
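Wrapped up for reuse, the approach could be sketched as a small helper. The name match_with_full is hypothetical, and the /i flag is carried over from the question rather than being required.

```perl
use strict;
use warnings;

# Hypothetical helper: in list context it returns the full match followed by
# the capture groups, or an empty list when the pattern does not match.
sub match_with_full {
    my ( $string, $pattern ) = @_;
    return $string =~ /($pattern)/i;
}

my @matches = match_with_full( 'foobar:foo:bar', '^[^:]+:([a-z]{3}):(.+)$' );
print join( "\n", @matches ), "\n";
```

Note that the extra outer group shifts the numbering of every group inside the pattern by one, so a pattern that uses absolute backreferences such as \1 would need adjusting (relative or named backreferences avoid this).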