Perl regex to capture a repeating group
I want a regex that matches something at the beginning of a line and then matches (and returns) all the other words. For example, given this line:
$line = "one two three etc";
I need something like this (it doesn't work):
@matches= $line=~ /^one(?:\s+(\S+))$/;
to get the words "two", "three", "etc" back in @matches.
I'm not asking how to get the words some other way. I want to do this with a regex. It seems so simple, but I couldn't find a solution.
You cannot have an unknown number of capture groups. If you try to repeat the capture group, the last instance will override the contents of the capture group:
- Expression:
^one(?:\s+(\S+))+$
- Capture # 1:
etc
Or:
- Expression:
^one\s+(\S+)\s+(\S+)\s+(\S+)$
- Capture # 1:
two
- Capture # 2:
three
- Capture # 3:
etc
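Both behaviours can be checked directly in Perl. This is a minimal sketch using the sample line from the question; the variable names @repeated and @fixed are only illustrative:

```perl
use strict;
use warnings;

my $line = "one two three etc";

# A repeated capture group keeps only its final iteration.
my @repeated = $line =~ /^one(?:\s+(\S+))+$/;

# One explicit group per word returns every word,
# but only when the line has exactly three words after "one".
my @fixed = $line =~ /^one\s+(\S+)\s+(\S+)\s+(\S+)$/;

print "repeated: @repeated\n";   # repeated: etc
print "fixed: @fixed\n";         # fixed: two three etc
```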
I suggest either capturing the rest of the line as a single group and then splitting it on whitespace:
- Expression:
^one\s+((?:\S+\s*)+)$
- Capture # 1:
two three etc
Or you can do a global match and use \G and \K:
- Expression:
(?:^one|(?<!\A)\G).*?\K\S+
- Match number 1:
two
- Match number 2:
three
- Match number 3:
etc
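In Perl, a global match in list context returns every match at once, so the expression above can fill an array in one statement. This is a sketch that assumes Perl 5.10 or later (needed for \K); the array name @words is illustrative:

```perl
use strict;
use warnings;

my $line = "one two three etc";

# The pattern has no capture groups, so in list context /g returns
# each whole match, and \K drops everything before it from that match.
my @words = $line =~ /(?:^one|(?<!\A)\G).*?\K\S+/g;

print join(",", @words), "\n";   # two,three,etc
```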
The simplest solution is probably to split after the fact:
use strict;
use warnings;
my $line = "one two three etc";
my @matches = $line =~ /^one\s+(.*)/ ? split(' ', $1) : ();
use Data::Dump;
dd @matches;
Outputs:
("two", "three", "etc")
However, you can also use \G to continue from where the previous match left off, and so find every run of non-whitespace characters with the /g modifier.
The only trick is to keep \G from matching at the very start of the string, so that the word one has to match first:
my @matches = $line =~ /(?:^one|(?<!\A)\G)\s+(\S+)/g;
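To see why the (?<!\A) guard matters, the same pattern can be run against a line that does not start with one. In this sketch, words_after_one is a hypothetical helper name and the second input line is made up for the comparison:

```perl
use strict;
use warnings;

# words_after_one is a hypothetical helper used only for this sketch.
sub words_after_one {
    my ($line) = @_;
    # In list context, /g returns the captured word from every match.
    return $line =~ /(?:^one|(?<!\A)\G)\s+(\S+)/g;
}

my @hit  = words_after_one("one two three etc");
my @miss = words_after_one("zero two three etc");   # does not start with "one"

print scalar(@hit), " words: @hit\n";   # 3 words: two three etc
print scalar(@miss), " words\n";        # 0 words
```

Without the guard, \G would match at position 0 on the first attempt, so the second line could start yielding words even though one never matched.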