Perl regex: put all matches into an array, including the full match

I have the following Perl code:

use strict;
use warnings;
use Data::Dumper;

my $key     = 'foobar:foo:bar';
my $pattern = '^[^:]+:([a-z]{3}):(.+)$';
my @matches = $key =~ /$pattern/i;
print Dumper(@matches);

Output:

$VAR1 = 'foo';
$VAR2 = 'bar';

Alternatively, I can print $1 for the first capture group and $2 for the second.

What I want to know is how to get the complete pattern match. For example, in PHP, if I used preg_match, I would get this:

Array
(
    [0] => foobar:foo:bar
    [1] => foo
    [2] => bar
)

Here, the first element (the equivalent of $0 or \0) is the complete match. How can I get this in Perl?

2 answers


You can use the variable $& or, for this purpose, ${^MATCH}, although there is a performance penalty in Perl versions before 5.20 (a significant one in the case of $&). From perldoc perlvar:

  • $MATCH
  • $&

The string matched by the last successful pattern match (not counting any matches hidden within a BLOCK or eval() enclosed by the current BLOCK).

[...]

  • ${^MATCH}

This is similar to $& ($MATCH), except that it does not incur the performance penalty associated with that variable.

[...]

In Perl v5.18 and earlier, it is only guaranteed to return a defined value when the pattern was compiled or executed with the /p modifier. In Perl v5.20, the /p modifier does nothing, so ${^MATCH} is the same as $MATCH.

This variable was added in Perl v5.10.0.
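As a minimal sketch (assuming Perl 5.10 or later, and reusing the question's $key and $pattern), ${^MATCH} can be combined with the list-context match to build the PHP-style array, with the full match as element 0. The name @captures is only illustrative.

```perl
use strict;
use warnings;

my $key     = 'foobar:foo:bar';
my $pattern = '^[^:]+:([a-z]{3}):(.+)$';

my @matches;
# /p makes ${^MATCH} reliable on Perl 5.10 to 5.18; from 5.20 on it is a no-op.
if (my @captures = $key =~ /$pattern/pi) {
    # ${^MATCH} is the whole matched text; put it first, like PHP's [0].
    @matches = (${^MATCH}, @captures);
}

print "$_\n" for @matches;
```

Note that a successful list-context match on a pattern with no capture groups returns (1), so @captures would hold 1 rather than an empty list; the sketch assumes at least one group, as in the question.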

Performance issues

Again from perldoc perlvar:

Traditionally in Perl, any use of any of the three variables $`, $& or $' (or their use English equivalents) anywhere in the code caused all subsequent successful pattern matches to make a copy of the matched string, in case the code might subsequently access one of those variables. This imposed a considerable performance penalty across the whole program, so the use of these variables has generally been discouraged.

[...]

Perl 5.10.0 introduced the /p operator flag and the ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} variables, which allowed you to suffer the penalties only on patterns marked with /p.

In Perl 5.18.0 onwards, perl started noting the presence of each of the three variables separately, and only copied the part of the string that was required; so in

$`; $&; "abcdefgh" =~ /d/

perl would only copy the "abcd" part of the string. That could make a big difference in something like

$str = 'x' x 1_000_000;
$&; # whoops
$str =~ /x/g # one char copied a million times, not a million chars

Perl 5.20.0 introduced a new copy-on-write system that is enabled by default, which finally fixes all performance issues with these three variables and makes them safe to use anywhere.



Example:

perl -wE 'say for "foo:bar" =~ /^(\w+):(\w+)$/p; say ${^MATCH}'

Output:

foo
bar
foo:bar
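Another option, which neither answer mentions but which perlvar also documents, is the @- and @+ arrays: $-[n] and $+[n] hold the start and end offsets of group n, with index 0 describing the whole match, so every element of the PHP-style array can be cut out of the original string with substr. Below is a sketch of a hypothetical helper, match_all_groups (the name is illustrative, not a library function), built on them:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (whole match, group 1, group 2, ...),
# like PHP's preg_match array, or an empty list if there is no match.
sub match_all_groups {
    my ($string, $regex) = @_;
    return () unless $string =~ $regex;

    # $-[$i] / $+[$i] are the start / end offsets of group $i (0 = whole match).
    # $#+ is the number of capture groups in the last successful match.
    return map {
        my $i = $_;
        defined $-[$i]
            ? substr($string, $-[$i], $+[$i] - $-[$i])
            : undef;    # group did not participate in the match
    } 0 .. $#+;
}

my @matches = match_all_groups('foobar:foo:bar', qr/^[^:]+:([a-z]{3}):(.+)$/i);
print "$_\n" for @matches;
```

Because the helper takes a compiled qr// object, flags such as /i travel with the pattern, and groups that did not take part in the match come back as undef instead of shifting the positions of later groups.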



Wrap your whole regex in parentheses so that the entire expression becomes one more capturing group, which will be the first one.



my @matches = $key =~ /($pattern)/i;

print Dumper( ["foobar:foo:bar"=~/$pattern/i] );
$VAR1 = [
      'foo',
      'bar'
    ];

print Dumper( ["foobar:foo:bar"=~/($pattern)/i] );
$VAR1 = [
      'foobar:foo:bar',
      'foo',
      'bar'
    ];
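One side effect to keep in mind: the new outer group becomes $1, so the original groups shift to $2 and $3. A short sketch, reusing the question's $key and $pattern (the variable names $full, $three_letters and $rest are illustrative):

```perl
use strict;
use warnings;

my $key     = 'foobar:foo:bar';
my $pattern = '^[^:]+:([a-z]{3}):(.+)$';

# The added outer parentheses form capture group 1 (the whole match).
my @matches = $key =~ /($pattern)/i;

# The original groups have moved up by one: $2 and $3, not $1 and $2.
my ($full, $three_letters, $rest) = ($1, $2, $3);

print "full=$full first=$three_letters second=$rest\n";
```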
