How to keep hash order in Perl?
I have a .sql file from which I am reading my input. Suppose the file contains the following input ...
Message Fruits Fruit="Apple",Color="Red",Taste="Sweet";
Message Flowers Flower="Rose",Color="Red";
I have now written a Perl script to generate a hash from this file.
use strict;
use Data::Dumper;
if(open(MYFILE,"file.sql")){
    my @stack;
    my %hash;
    push @stack,\%hash;
    my @file = <MYFILE>;
    foreach my $row(@file){
        if($row =~ /Message /){
            my %my_hash;
            my @words = split(" ",$row);
            my @sep_words = split(",",$words[2]);
            foreach my $x(@sep_words){
                my($key,$value) = split("=",$x);
                $my_hash{$key} = $value;
            }
            push @stack,$stack[$#stack]->{$words[1]} = {%my_hash};
            pop @stack;
        }
    }
    print Dumper(\%hash);
}
I am getting the following output.
$VAR1 = {
    'Flowers' => {
        'Flower' => '"Rose"',
        'Color' => '"Red";'
    },
    'Fruits' => {
        'Taste' => '"Sweet";',
        'Fruit' => '"Apple"',
        'Color' => '"Red"'
    }
};
The hash does not preserve the order in which the input was read. I want my hash to be in the same order as the input file. I found some libraries like Tie::IxHash, but I want to avoid using any libraries. Can anyone help me?
For a low-key approach, you can always store the keys, in order, in an array.
my @list_keys; # declared once, before the loop
foreach my $x(@sep_words){
    my($key,$value) = split("=",$x);
    $my_hash{$key} = $value;
    push(@list_keys,$key); # remember the order in which keys were seen
}
And then, to read the values back in order, iterate over the keys:
foreach my $this_key (@list_keys) {
# do something with $my_hash{$this_key}
}
But this has a problem: you rely on the array of keys and the hash staying in sync. You can also accidentally add the same key multiple times if you're not careful.
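One way to reduce both risks is to route every write through a single helper that records a key only the first time it is seen. The following is a minimal sketch, not part of the original answer; the helper name set_ordered is made up for illustration.

```perl
use strict;
use warnings;

my %my_hash;
my @list_keys;

# Hypothetical helper: store a value and record the key only on first insertion,
# so the key array never contains duplicates and never lags behind the hash.
sub set_ordered {
    my ($key, $value) = @_;
    push @list_keys, $key unless exists $my_hash{$key};
    $my_hash{$key} = $value;
    return;
}

set_ordered(Fruit => 'Apple');
set_ordered(Color => 'Red');
set_ordered(Taste => 'Sweet');
set_ordered(Color => 'Green');   # updates the value, keeps the original position

foreach my $this_key (@list_keys) {
    print "$this_key=$my_hash{$this_key}\n";
}
```

Deleting a key would need a matching helper that also removes it from @list_keys; that part is left out here.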
Joel has it right - you can't reliably trust the hash order in Perl. If you need a specific order, you will need to store the information in an array.
A hash is a collection of key-value pairs with unique keys. The pairs are never kept in any particular order.
An array is a sequence of any number of scalars. The array is ordered on its own, but uniqueness must be enforced externally.
Here is my solution to your problem:
#!/usr/bin/perl
use strict; use warnings;
use Data::Dumper;
local $/ = ";\n"; # a record ends with a semicolon followed by a newline
my @messages;
while (<DATA>) {
    chomp;
    my ($msg, $to, $what) = split ' ', $_, 3; # limit number of fragments.
    my %options;
    # A value is any run of characters other than quotes and backslashes, or backslash escapes.
    while($what =~ /(\w+) = "((?:[^"\\]++|\\.)*)" (?:,|$)/xg) {
        $options{$1} = $2;
    }
    push @messages, [$to => \%options];
}
print Dumper \@messages;
__DATA__
Message Fruits Fruit="Apple",Color="Red",Taste="Sweet";
Message Flowers Flower="Rose",Color="Red";
I am putting the messages in an array because their order needs to be preserved. Also, I don't do weird gymnastics with a stack that I don't need.
I am not splitting on every newline because a value could contain newlines. For the same reason, I don't blindly split on "," or "=" and use a sane regex instead. It might be worth adding error detection, for example
die if not defined pos $what or pos($what) != length($what);
after the inner loop (this requires the /c flag on the regex), to see whether we actually processed everything or were thrown out of the loop prematurely.
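The check described above could look roughly like this. It is a sketch under a few assumptions that are not in the original answer: the helper name parse_options is invented, a \G anchor is added so that every match must start where the previous one ended, and the character class excludes backslashes so that the escape branch can match.

```perl
use strict;
use warnings;

# Hypothetical helper: parse key="value" pairs and die if anything is left over.
sub parse_options {
    my ($what) = @_;
    my %options;
    # /c keeps pos() intact when the match finally fails, so it can be inspected.
    while ($what =~ /\G (\w+) = "((?:[^"\\]++|\\.)*)" (?:,|$)/xgc) {
        $options{$1} = $2;
    }
    die "Could not parse all of: $what\n"
        unless defined pos($what) && pos($what) == length($what);
    return \%options;
}

my $good = parse_options('Fruit="Apple",Color="Red"');
print join(',', map { "$_=$good->{$_}" } sort keys %$good), "\n";

# A stray semicolon between the pairs is reported instead of silently skipped.
my $bad = eval { parse_options('Fruit="Apple";Color="Red"') };
print "rejected: $@" unless defined $bad;
```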
This gives:
$VAR1 = [
    [ 'Fruits',
        {
            'Taste' => 'Sweet',
            'Fruit' => 'Apple',
            'Color' => 'Red'
        }
    ],
    [ 'Flowers',
        {
            'Flower' => 'Rose',
            'Color' => 'Red'
        }
    ]
];
(with different indentation, but that doesn't matter).
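To read the data back in input order, walk the outer array and unpack each pair. A small sketch that assumes a structure of the same shape as the dump above; the @names array is only there to make the order visible.

```perl
use strict;
use warnings;

# Same shape as the dumped structure: an array of [name, \%options] pairs.
my @messages = (
    [ Fruits  => { Fruit  => 'Apple', Color => 'Red', Taste => 'Sweet' } ],
    [ Flowers => { Flower => 'Rose',  Color => 'Red' } ],
);

my @names;
for my $message (@messages) {
    my ($name, $options) = @$message;
    push @names, $name;
    # The options still live in a plain hash, so sort the keys for stable output.
    print "$name: ",
        join(', ', map { "$_=$options->{$_}" } sort keys %$options), "\n";
}
```

The messages come out in file order, but the options within each message do not. If their order matters as well, the same trick applies one level down: store them as an array of pairs instead of a hash.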
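There is one gotcha: the file must end with a newline character, or else the final semicolon must be omitted. Otherwise chomp will not strip the trailing ";" (the record separator is ";\n"), and the last option of the last message will not match.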
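The effect is easy to reproduce in isolation. This sketch uses two made-up strings to stand in for the last record of a file with and without a final newline:

```perl
use strict;
use warnings;

local $/ = ";\n";   # same record separator as in the script above

my $with_newline    = qq{Color="Red";\n};
my $without_newline = qq{Color="Red";};

# chomp removes only a complete trailing copy of $/.
chomp $with_newline;
chomp $without_newline;

print "with newline:    $with_newline\n";
print "without newline: $without_newline\n";

# One possible workaround (an assumption, not from the original answer):
# strip an optional trailing semicolon and whitespace by hand.
(my $tolerant = $without_newline) =~ s/;\s*\z//;
print "after cleanup:   $tolerant\n";
```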