Why doesn't my regex match when I put a single character in a character class?

I am trying to match monetary values in Perl. Since I'm in the UK, I'm going to start by matching £s and later branch out into other currencies, so to allow for that, I put the £ symbol in a character class. The code looks like this:

my $re = qr/ Spent \s+ [£] (?<amount> \d+) /x;
if ( $input =~ $re ) {
    print $+{amount};
}

      

And here's an example of the input file:

- Spent £6 on beer
- Spent £4 on sobriety pills

      

And yet, when I run this script, nothing matches! However, if I take the £ out of the character class:

my $re = qr/ Spent \s+ £ (?<amount> \d+) /x;

      

Now the numbers are printed. Note that all I have removed is the [] from the regex. Aren't character classes supposed to match the characters they enclose? What's even weirder is that if I replace the £ in both the regex and the input file with something in ASCII, for example E, it works fine even though it's enclosed in a character class.

Both the script and the input file are UTF-8, I'm on Perl 5.18.2, and the only module I'm importing is Moose.



2 answers


What encoding are you using? UTF-8? Did you tell Perl that?

use utf8; # The source is in UTF-8.
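
To illustrate why the character class behaves differently without `use utf8`, here is a minimal sketch. It assumes Perl reads the UTF-8 source `£` as the two bytes 0xC2 0xA3, and spells those bytes out explicitly; the variable names are made up for the illustration.

```perl
use strict;
use warnings;

# Assumption: without `use utf8`, Perl sees the UTF-8 source "£" as the
# two bytes 0xC2 0xA3; they are written out here to make that explicit.
my $pound_bytes = "\xC2\xA3";
my $line        = "Spent ${pound_bytes}6 on beer";

# As a bare sequence, the two bytes match the same two bytes in the input.
my $bare_matches = $line =~ / Spent \s+ \xC2\xA3 (?<amount> \d+) /x ? 1 : 0;

# Inside a class they mean "one byte that is either 0xC2 or 0xA3",
# so only 0xC2 is consumed and \d+ then fails against 0xA3.
my $class_matches = $line =~ / Spent \s+ [\xC2\xA3] (?<amount> \d+) /x ? 1 : 0;

print "bare: $bare_matches, class: $class_matches\n";
```

An ASCII character such as E is a single byte in UTF-8, which would explain why it works inside the class either way.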

      



Also, if $input comes from a file, did you tell Perl what encoding the file uses?

open my $HANDLE, '<:encoding(utf-8)', 'input.txt' or die $!;
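
Putting both pieces together, a sketch of the whole script might look like the following. An in-memory filehandle stands in for input.txt here (an assumption made only so the example is self-contained), and `@amounts` is a hypothetical name for collecting the results.

```perl
use strict;
use warnings;
use utf8;                 # the £ literals below are UTF-8 in the source
use Encode qw(encode);

# Stand-in for input.txt: an in-memory "file" holding UTF-8 encoded bytes.
my $bytes = encode( 'UTF-8',
    "- Spent £6 on beer\n- Spent £4 on sobriety pills\n" );

# Decode the bytes into characters as they are read.
open my $fh, '<:encoding(UTF-8)', \$bytes or die $!;

# With `use utf8`, [£] is a class containing the single character U+00A3.
my $re = qr/ Spent \s+ [£] (?<amount> \d+) /x;

my @amounts;
while ( my $input = <$fh> ) {
    push @amounts, $+{amount} if $input =~ $re;
}
close $fh;

print "@amounts\n";
```

With the source and the input both decoded, the pattern and the data agree on what one character is, so the class form should match just like the bare form.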

      



Replace the £ with its Unicode code point escape (the input still has to be decoded from UTF-8 for this to match):

my $re = qr/ Spent \s+ [\x{00A3}] (?<amount> \d+) /x;

      
