How to keep hash order in Perl?
I have a .sql file from which I am reading my input. Suppose the file contains the following input ...
Message Fruits Fruit="Apple",Color="Red",Taste="Sweet";
Message Flowers Flower="Rose",Color="Red";
I have written a Perl script to generate a hash from this file.
use strict;
use Data::Dumper;
if(open(MYFILE,"file.sql")){
    my @stack;
    my %hash;
    push @stack,\%hash;
    my @file = <MYFILE>;
    foreach my $row(@file){
        if($row =~ /Message /){
            my %my_hash;
            my @words = split(" ",$row);
            my @sep_words = split(",",$words[2]);
            foreach my $x(@sep_words){
                my($key,$value) = split("=",$x);
                $my_hash{$key} = $value;
            }
            push @stack,$stack[$#stack]->{$words[1]} = {%my_hash};
            pop @stack;
        }
    }
    print Dumper(\%hash);
}
I am getting the following output.
$VAR1 = {
'Flowers' => {
'Flower' => '"Rose"',
'Color' => '"Red";'
},
'Fruits' => {
'Taste' => '"Sweet";',
'Fruit' => '"Apple"',
'Color' => '"Red"'
}
};
The hash does not preserve the order in which the input was read. I want my hash to be in the same order as the input file. I found some libraries like Tie::IxHash, but I want to avoid using any libraries. Can anyone help me?
For a low-key approach, you can always store the keys in a separate array, in the order you read them.
my @list_keys;    # keys in the order they were read
foreach my $x (@sep_words) {
    my ($key, $value) = split("=", $x);
    $my_hash{$key} = $value;
    push(@list_keys, $key);
}
Then, to read the values back in order, loop over the keys:
foreach my $this_key (@list_keys) {
# do something with $my_hash{$this_key}
}
This has a drawback: you rely on the array of keys and the hash staying in sync. You can also accidentally add the same key to the array more than once if you're not careful.
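One way to reduce that risk is to route every write through a single helper that records a key only the first time it is seen. The following is a minimal sketch under that assumption; the helper name set_ordered is hypothetical and not part of the original answer.

```perl
use strict;
use warnings;

my %my_hash;      # key => value lookup
my @list_keys;    # keys in first-insertion order

# Hypothetical helper: record the key only the first time it is seen,
# so the key array never holds duplicates and stays in sync with the hash.
sub set_ordered {
    my ($key, $value) = @_;
    push @list_keys, $key unless exists $my_hash{$key};
    $my_hash{$key} = $value;
    return;
}

set_ordered(Fruit => 'Apple');
set_ordered(Color => 'Red');
set_ordered(Taste => 'Sweet');
set_ordered(Color => 'Green');    # updates the value, keeps the position

# Walk the keys in the order they were first inserted.
print "$_ = $my_hash{$_}\n" for @list_keys;
```

Deleting a key would need the same care: remove it from both the hash and the array in one place.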
A hash is a collection of key-value pairs with unique keys. A hash has no inherent order.
An array is a sequence of any number of scalars. The array is ordered on its own, but uniqueness must be enforced externally.
Here is my solution to your problem:
#!/usr/bin/perl
use strict; use warnings;
use Data::Dumper;
local $/ = ";\n";
my @messages;
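while (<DATA>) {
    chomp;    # removes the ";\n" record terminator set in $/
    my ($msg, $to, $what) = split ' ', $_, 3; # limit number of fragments.
    my %options;
    # A value is a quoted string: runs of non-quote, non-backslash
    # characters, or a backslash followed by any character.
    while($what =~ /(\w+) = "((?:[^"\\]++|\\.)*)" (?:,|$)/xg) {
        $options{$1} = $2;
    }
    push @messages, [$to => \%options];
}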
print Dumper \@messages;
__DATA__
Message Fruits Fruit="Apple",Color="Red",Taste="Sweet";
Message Flowers Flower="Rose",Color="Red";
I am putting the messages in an array because their order has to be preserved. Also, I don't do weird gymnastics with a stack that isn't needed.
I don't split on every newline, because a value could contain newlines. For the same reason, I don't blindly split on "," or "=", and use a sane regex instead. It might be worth adding error detection, for example die if not defined pos $what or pos($what) != length($what); after the loop (this requires the /c flag on the regex), to see whether everything was actually processed or the loop ended prematurely.
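The error check described above could look roughly like this. It is a sketch, not the answer's own code: the helper name parse_options is hypothetical, and the \G anchor is an addition that forces each match to start exactly where the previous one ended, so garbage between pairs is not silently skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: parse 'Key="Value",Key="Value"' and die if any
# part of the input was left unprocessed.
sub parse_options {
    my ($what) = @_;
    my %options;
    # /c keeps pos() at the end of the last successful match when the
    # next attempt fails, instead of resetting it to undef.
    while ($what =~ /\G (\w+) = "((?:[^"\\]++|\\.)*)" (?:,|$)/xgc) {
        $options{$1} = $2;
    }
    # pos() is undef if nothing matched at all, and short of the string
    # length if the loop stopped before the end of the input.
    die "Could not parse options: $what\n"
        if not defined pos($what) or pos($what) != length($what);
    return \%options;
}

my $ok = parse_options('Fruit="Apple",Color="Red"');
print "$_ => $ok->{$_}\n" for sort keys %$ok;

# A value without quotes is rejected instead of being dropped silently.
my $bad = eval { parse_options('Fruit="Apple",Color=Red') };
print "rejected: $@" unless defined $bad;
```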
This gives:
$VAR1 = [
[ 'Fruits',
{
'Taste' => 'Sweet',
'Fruit' => 'Apple',
'Color' => 'Red'
}
],
[ 'Flowers',
{
'Flower' => 'Rose',
'Color' => 'Red'
}
]
];
(with a different indentation, but it doesn't matter).
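Reading the result back in input order is then a plain loop over the outer array. The sketch below hard-codes the same structure as the dump above rather than parsing it; the variable @names is only there for illustration.

```perl
use strict;
use warnings;

# Same shape as the dump above: an array of [name, options] pairs.
my @messages = (
    [ Fruits  => { Fruit => 'Apple', Color => 'Red', Taste => 'Sweet' } ],
    [ Flowers => { Flower => 'Rose', Color => 'Red' } ],
);

my @names;    # message names in the order they were stored
for my $message (@messages) {
    my ($name, $options) = @$message;
    push @names, $name;
    # The inner options are still a plain hash, so sort the keys to get
    # a stable (alphabetical, not input) order when printing.
    my $pairs = join ', ', map { "$_=$options->{$_}" } sort keys %$options;
    print "$name: $pairs\n";
}
```

Note that only the order of the messages is preserved here. If the order of the options inside each message matters too, they would have to be stored in an array as well.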
One caveat: the file must end with a newline; otherwise chomp does not strip the final semicolon.
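If the trailing newline cannot be guaranteed, one possible workaround is to strip the terminator with a substitution instead of chomp. This is a sketch under that assumption; the helper name strip_terminator is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a trailing ";" whether or not a final
# newline follows it. Intended as a replacement for the plain chomp.
sub strip_terminator {
    my ($record) = @_;
    $record =~ s/;\n?\z//;
    return $record;
}

# Both calls print the record without the trailing semicolon.
print strip_terminator(qq{Message Flowers Flower="Rose",Color="Red";\n}), "\n";
print strip_terminator(qq{Message Flowers Flower="Rose",Color="Red";}), "\n";
```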