Generating a hash with regular expressions in perl

Question

Let's say I have a file like below:

hello world 10 20
world 10 10 10 10 hello 20
hello 30 20 10 world 10

And I want to store all decimal numbers in a hash.

I was looking at this, and the following worked fine:

> perl -lne 'push @a,/\d+/g;END{print "@a"}' temp
10 20 10 10 10 10 20 30 20 10 10

Then I needed to count the number of occurrences of each match.

For this I think it would be better to store all matches in a hash and increment a counter for each key.

So I tried:

perl -lne '$a{$1}++ for ($_=~/(\d+)/g);END{foreach(keys %a){print "$_.$a{$_}"}}' temp

which gives me the output:

> perl -lne '$a{$1}++ for ($_=~/(\d+)/g);END{foreach(keys %a){print "$_.$a{$_}"}}' temp
10.4
20.7

Can someone please tell me where I went wrong?

The output I'm expecting is the following:

10.7
20.3
30.1

Although I can do it in awk, I would like to do it only in Perl.

Also the order of the output doesn't bother me.

+3

perl

user1939168 Jan 31 '13 at 10:51

2 answers

Another option would be the following:

$a{$1}++ while ($_=~/(\d+)/g);

This does what I think you expected your code to do: it iterates over each successful match as it is found. Thus, $1 will be what you expect.
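As a minimal sketch of the while form outside a one-liner (the %count hash and the @lines array are illustrative names, and the sample lines are copied from the question rather than read from the temp file):

```perl
use strict;
use warnings;

# Sample lines copied from the question (assumed stand-in for the temp file).
my @lines = (
    'hello world 10 20',
    'world 10 10 10 10 hello 20',
    'hello 30 20 10 world 10',
);

my %count;
for my $line (@lines) {
    # In scalar context, //g resumes after the previous match each time
    # the condition is evaluated, so $1 holds the current number here.
    $count{$1}++ while $line =~ /(\d+)/g;
}

# Sort numerically so the output order is predictable.
print "$_.$count{$_}\n" for sort { $a <=> $b } keys %count;
```

With the sample lines this should print 10.7, 20.3 and 30.1, which is the output the question asks for.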

Just to understand the difference:

A for loop in Perl means "do something for each item in the list":

for (@array)
{
    #do something to each array element
}

So, in your code, a list of all matches was created first, and only after the entire list had been built did you get the chance to do something with the results. $1 was reset on every match while the list was being built, but by the time your loop body ran, it held the last match on that line. That's why your results didn't make sense.
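To see the stale $1 directly, here is a small sketch using the third sample line from the question (the @seen array is just an illustrative name). It records both the loop variable and $1 on each pass:

```perl
use strict;
use warnings;

my $line = 'hello 30 20 10 world 10';   # third sample line from the question

my @seen;
# The match runs in list context, so all four numbers are collected
# before the loop body runs for the first time. $_ is the current
# match; $1 is whatever the last successful match left behind.
push @seen, "$_:$1" for $line =~ /(\d+)/g;

print "$_\n" for @seen;
```

Each printed pair has the form match:$1. The left side should step through 30, 20, 10, 10, while the right side stays at 10 on every pass, which is why all four increments in the question landed on the key 10 for that line.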

On the other hand, a while loop means "check whether this condition is true each time, and keep going until it is false." Therefore, the code in the while loop executes once per match of the regular expression, and $1 holds the value for that match.

Another case where the difference matters in Perl is file handling. for (<FILE>) { ... } reads the entire file into memory first, which is wasteful. It is recommended to use while (<FILE>) instead, because then you go through the file line by line and keep only the information you want.
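A rough sketch of the line-by-line approach, assuming an in-memory filehandle in place of a real file so the example is self-contained (the $data string and %count hash are illustrative names):

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for a real file here (assumption).
my $data = "hello world 10 20\n"
         . "world 10 10 10 10 hello 20\n"
         . "hello 30 20 10 world 10\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my %count;
# while reads one line per iteration; for (<$fh>) would evaluate the
# readline in list context and load every line before looping.
while (my $line = <$fh>) {
    $count{$_}++ for $line =~ /\d+/g;
}
close $fh;

print "$_.$count{$_}\n" for sort { $a <=> $b } keys %count;
```

Only the current line and the running counts are held in memory at any point, which is the saving described above.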

+4

user1919238 Jan 31 at 11:09 am

melpomene · Accepted Answer · 2013-01-31T10:55:34+0000

$a{$1}++ for ($_=~/(\d+)/g);

It should be

$a{$_}++ for ($_=~/(\d+)/g);

and can be simplified to

$a{$_}++ for /\d+/g;

The reason for this is that /\d+/g creates a list of matches, which is then iterated over with for. The current item is in $_. I guess $1 will contain whatever was left there by the last match, but that's definitely not what you want to use in this case.
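For reference, the same fix written as a small script might look like the sketch below. The count_numbers helper is a hypothetical name used only for illustration, and the input lines are copied from the question:

```perl
use strict;
use warnings;

# Hypothetical helper: counts every run of digits in the given lines.
sub count_numbers {
    my @lines = @_;
    my %count;
    for (@lines) {
        # /\d+/g matches against the current line in $_ and returns all
        # matches; the statement-modifier for then aliases $_ to each
        # match in turn, so $_ is the number being counted.
        $count{$_}++ for /\d+/g;
    }
    return \%count;
}

my $count = count_numbers(
    'hello world 10 20',
    'world 10 10 10 10 hello 20',
    'hello 30 20 10 world 10',
);
print "$_.$count->{$_}\n" for sort { $a <=> $b } keys %$count;
```

No capture group is needed here, because a //g match without parentheses returns the whole matched text in list context.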
