Why doesn't my regex match when I use a single char set?

Question

I am trying to match monetary values in Perl. Since I'm in the UK, I'm going to start by matching £s and later branch out into other currencies, so to allow for that, I put the £ symbol in a character set. The code looks like this:

my $re = qr/ Spent \s+ [£] (?<amount> \d+) /x;
if ( $input =~ $re ) {
    print $+{amount};
}

And here's an example of the input file:

- Spent £6 on beer
- Spent £4 on sobriety pills

And yet, when I run this, it doesn't match anything! However, if I take the £ out of the character set:

my $re = qr/ Spent \s+ £ (?<amount> \d+) /x;

Now the numbers are printed. Note that the only change was removing the [] from the regex. Does the £ symbol have some special meaning inside a character set? What's even weirder is that if I replace the £ character in both the regex and the input file with something in ASCII, for example E, it works fine even though it's enclosed in a character set.

Both the script and the input file are UTF-8, I'm on Perl 5.18.2, and the only module I'm importing is Moose.

regex perl

Ben S 11 Sep 14 at 10:14

2 answers

choroba · Answer 1 · 2014-09-11T10:33:58+0000

What encoding are you using? UTF-8? Did you tell Perl about it?

use utf8; # The source is in UTF-8.

Also, if $input comes from a file, did you tell Perl what encoding the file uses?

open my $HANDLE, '<:encoding(utf-8)', 'input.txt' or die $!;
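
To see why the character set matters, here is a minimal sketch of what probably happens when neither the source nor the input is decoded. It assumes the £ reaches Perl as the two UTF-8 bytes 0xC2 0xA3, and it spells those bytes out with escapes so the result does not depend on how this snippet itself is saved. The helper amount_from and all variable names are made up for the illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the captured amount, or undef if there is no match.
sub amount_from {
    my ($input, $re) = @_;
    return $input =~ $re ? $+{amount} : undef;
}

# Undecoded input: the pound sign is two separate bytes, 0xC2 then 0xA3.
my $bytes_input = "Spent \xC2\xA36 on beer";

# Without "use utf8", a literal £ in the pattern is also those two bytes.
# Inside [...] they become two alternatives, and the class consumes only ONE of them.
my $class_on_bytes = qr/ Spent \s+ [\xC2\xA3] (?<amount> \d+) /x;

# Outside a class the two bytes form a sequence, which matches byte for byte.
my $seq_on_bytes = qr/ Spent \s+ \xC2\xA3 (?<amount> \d+) /x;

# Decoded input: the pound sign is one character, U+00A3.
my $chars_input    = decode('UTF-8', $bytes_input);
my $class_on_chars = qr/ Spent \s+ [\x{A3}] (?<amount> \d+) /x;

print "class on bytes:    ", amount_from($bytes_input, $class_on_bytes) // 'no match', "\n"; # no match
print "sequence on bytes: ", amount_from($bytes_input, $seq_on_bytes)   // 'no match', "\n"; # 6
print "class on chars:    ", amount_from($chars_input, $class_on_chars) // 'no match', "\n"; # 6
```

In the first case the class matches 0xC2 and then \d+ runs into 0xA3, so the match fails. That would also explain why an ASCII character such as E works: it is a single byte either way.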

JonB · Answer 2 · 2014-09-11T10:37:54+0000

Replace £ with its Unicode code point escape:

my $re = qr/ Spent \s+ [\x{00A3}] (?<amount> \d+) /x;
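
As a rough sketch of the escape approach: in Perl a code point inside a pattern is written \x{...} or \N{U+...}, whereas \u means "uppercase the next character" and is not a code point escape. The example below assumes the input has already been decoded into characters, so £ is the single character U+00A3. The euro sign in the last pattern is only an assumed example of a second currency, and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Assumption: these strings are already decoded, so each currency sign is one character.
my $pound_input = "Spent \x{A3}4 on sobriety pills";
my $euro_input  = "Spent \x{20AC}9 on coffee";    # hypothetical extra input

# Two equivalent ways to name U+00A3 (POUND SIGN) inside a character set.
my $re_hex   = qr/ Spent \s+ [\x{A3}]    (?<amount> \d+) /x;
my $re_named = qr/ Spent \s+ [\N{U+00A3}] (?<amount> \d+) /x;

# The set can later grow to cover other currencies, e.g. U+20AC (EURO SIGN).
my $re_multi = qr/ Spent \s+ [\x{A3}\x{20AC}] (?<amount> \d+) /x;

my $hex_amount   = $pound_input =~ $re_hex   ? $+{amount} : undef;
my $named_amount = $pound_input =~ $re_named ? $+{amount} : undef;
my $euro_amount  = $euro_input  =~ $re_multi ? $+{amount} : undef;

print "hex: $hex_amount, named: $named_amount, euro: $euro_amount\n";
```

The escapes only remove the dependency on the source file's encoding. The input still has to be decoded for the single-character match to work.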
