Perl's regex \d+ and [0-9]+ capture only one digit in an alphanumeric string
I ran into the following problem: if I use the code in the first example, the variable $1 contains only the last digit of the number on each line. However, if I use the third example, where the pattern is just (\d+), $1 contains the complete number with all its digits. It seems to me that \d+ works differently in an alphanumeric context than in a purely numeric one.
Here are my questions: Can you reproduce this? Is this behavior expected? How can I capture the full number in an alphanumeric context with a regex in Perl? If \d is inherently lazy, can I make it greedier, and if so, how?
Example 1:
perl -e 'for ($i = 199; $i < 201; $i ++) { print "words".$i."words\n"}' | perl -ne 'if (/\A\w+(\d+)\w+/) {$num = $1; print $num,"\n";}'
Output:
9
0
Example 2:
perl -e 'for ($i = 199; $i < 201; $i ++) { print "words".$i."words\n"}' | perl -ne 'if (/\A\w+([0-9]+)\w+/) {$num = $1; print $num,"\n";}'
Output:
9
0
Example 3:
perl -e 'for ($i = 199; $i < 201; $i ++) { print "words".$i."words\n"}' | perl -ne 'if (/(\d+)/) {$num = $1; print $num,"\n";}'
Output:
199
200
Thanks in advance. Any help is appreciated.
Best, Chris
This is the expected result. In /\A\w+(\d+)\w+/ the first \w+ is greedy and matches as many characters as it can, and \w also matches digits. It therefore consumes the letters plus all but the last digit, giving back only the single digit that (\d+) needs in order to match.
Use a lazy quantifier, /\A\w+?(\d+)\w+/, or subtract the digits from \w, as in /\A[^\W\d]+(\d+)\w+/. \w+? matches one or more word characters (letters, digits, _) but as few as possible, whereas [^\W\d] matches only letters and _, so there is no need for a lazy quantifier with that pattern.
The problem is that digits also match \w.
You must replace \w with \D ("not a digit"). For example:
perl -e 'for ($i = 199; $i < 201; $i ++) { print "words".$i."words\n"}' | perl -ne 'if (/\A\D+(\d+)\D+/) {$num = $1; print $num,"\n";}'
Output:
199
200
Of course, if your data can contain more than one run of digits on a single line, you need a more precise regexp.