Perl regex: \d+ and [0-9]+ capture only one digit in an alphanumeric string

I ran into the following problem: if I use the code in the first example, the variable $1 contains only the last digit of the number on each line. However, if I use the third example, where the pattern is just (\d+), the variable $1 holds the complete number with all its digits. It seems to me that \d+ works differently in an alphanumeric context than in a purely numeric one.

Here are my questions: Can you reproduce this? Is this behavior expected? How can I grab the full number in an alphanumeric context using a regex in Perl? If the operator \d is inherently lazy, can I make it greedier, and if so, how?

Example 1:

perl -e 'for ($i = 199; $i < 201; $i ++) { print "words".$i."words\n"}' | perl -ne 'if (/\A\w+(\d+)\w+/) {$num = $1; print $num,"\n";}'

      

Output:

9
0

      

Example 2:

perl -e 'for ($i = 199; $i < 201; $i ++) { print "words".$i."words\n"}' | perl -ne 'if (/\A\w+([0-9]+)\w+/) {$num = $1; print $num,"\n";}'

      

Output:

9
0

      

Example 3:

perl -e 'for ($i = 199; $i < 201; $i ++) { print "words".$i."words\n"}' | perl -ne 'if (/(\d+)/) {$num = $1; print $num,"\n";}'

      

Output:

199
200

      

Thanks in advance. Any help is appreciated.

Best, Chris



2 answers


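This is the expected result. In /\A\w+(\d+)\w+/, the first \w+ is greedy: it matches as many characters as it can, and because \w also matches digits, it swallows the leading digits of the number. It then backtracks only far enough to leave a single digit for (\d+), so $1 ends up holding just the last digit.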



Use a lazy quantifier, /\A\w+?(\d+)\w+/, or exclude digits from \w, as in /\A[^\W\d]+(\d+)\w+/. \w+? matches one or more word characters (letters, digits, _) but as few as possible, whereas [^\W\d] matches only letters and _, so that pattern does not need a lazy quantifier.



The problem is that digits also match \w.

You can replace \w with \D ("not a digit"). For example:

perl -e 'for ($i = 199; $i < 201; $i ++) { print "words".$i."words\n"}' | perl -ne 'if (/\A\D+(\d+)\D+/) {$num = $1; print $num,"\n";}'

      



Output:

199
200

      

Of course, if your data can contain more than one run of digits on a single line, you will need a more precise regexp.
