How to keep hash order in Perl?

I am reading my input from a .sql file. Suppose the file contains the following:

Message Fruits Fruit="Apple",Color="Red",Taste="Sweet";

Message Flowers Flower="Rose",Color="Red";


I have written a Perl script to build a hash from this file:

use strict;
use Data::Dumper;

if(open(MYFILE,"file.sql")){
    my @stack;
    my %hash;
    push @stack,\%hash;
    my @file = <MYFILE>;
    foreach my $row(@file){
        if($row =~ /Message /){
            my %my_hash;
            my @words = split(" ",$row);
            my @sep_words = split(",",$words[2]);

            foreach my $x(@sep_words){
                my($key,$value) = split("=",$x);
                $my_hash{$key} = $value;
            }
            push @stack,$stack[$#stack]->{$words[1]} = {%my_hash};
            pop @stack;
        }
    }
    print Dumper(\%hash);
}


I am getting the following output.

$VAR1 = {
          'Flowers' => {
                         'Flower' => '"Rose"',
                         'Color' => '"Red";'
                       },
          'Fruits' => {
                        'Taste' => '"Sweet";',
                        'Fruit' => '"Apple"',
                        'Color' => '"Red"'
                      }
        };


The hash does not preserve the order in which the input was read. I want the hash to keep the same order as the input file. I found some modules such as Tie::IxHash, but I want to avoid using any modules. Can anyone help?



3 answers


For a low-tech approach, you can store the keys in a separate array, in the order you add them to the hash.

my @list_keys;    # keys of %my_hash, in the order they were added
foreach my $x (@sep_words){
    my ($key, $value) = split("=", $x);
    $my_hash{$key} = $value;
    push(@list_keys, $key);
}


Then, to read the values back in order, iterate over the array of keys:



foreach my $this_key (@list_keys) {
    # do something with $my_hash{$this_key}
}


The drawback is that you have to keep the array of keys and the hash in sync yourself. If you're not careful, you can also add the same key to the array more than once.
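One way to address both problems is to route every insertion through a single helper that updates the hash and the key list together and records a key only the first time it appears. This is a minimal sketch; the sub name ordered_store is hypothetical, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: store $value under $key in %$hash, and append $key
# to @$keys only if the hash did not already contain that key.
sub ordered_store {
    my ($hash, $keys, $key, $value) = @_;
    push @$keys, $key unless exists $hash->{$key};
    $hash->{$key} = $value;
    return;
}

my (%my_hash, @list_keys);
ordered_store(\%my_hash, \@list_keys, Fruit => 'Apple');
ordered_store(\%my_hash, \@list_keys, Color => 'Red');
ordered_store(\%my_hash, \@list_keys, Fruit => 'Pear');   # overwrites the value, no duplicate key

# Iterate in first-insertion order.
print join(',', map { "$_=$my_hash{$_}" } @list_keys), "\n";   # prints: Fruit=Pear,Color=Red
```

Deleting a key would still need a matching splice on @list_keys; that bookkeeping is what Tie::IxHash would otherwise do for you.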



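Joel has it right: you can't rely on hash order in Perl. The language makes no guarantee about the order in which keys come back, and since Perl 5.18 that order can even differ between runs of the same program. If you need a specific order, you will need to store the information in an array.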
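For example, a flat list of alternating keys and values keeps the input order, and Perl can copy it into a hash whenever you need lookup by key. A minimal sketch, with the fruit fields hard-coded (an assumption; in practice they would come from the file):

```perl
use strict;
use warnings;

# A flat list of alternating keys and values keeps the input order ...
my @fields = (Fruit => 'Apple', Color => 'Red', Taste => 'Sweet');

# ... and can be copied into a hash whenever lookup by key is needed.
my %lookup = @fields;

# The even-indexed elements are the keys, in their original order.
my @ordered_keys = @fields[grep { $_ % 2 == 0 } 0 .. $#fields];

print "@ordered_keys\n";   # prints: Fruit Color Taste
print "$lookup{Taste}\n";  # prints: Sweet
```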





A hash is a collection of key-value pairs with unique keys. It has no inherent order.

An array is a sequence of any number of scalars. It is ordered by itself, but uniqueness of its elements must be enforced by your own code.
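A common way to enforce that uniqueness is a companion %seen hash, which keeps only the first occurrence of each element while preserving the array's order. A short sketch with made-up keys:

```perl
use strict;
use warnings;

my @keys = qw(Fruit Color Taste Color Fruit);

# $seen{$_}++ returns 0 (false) the first time an element is seen, so
# grep keeps only the first occurrence of each element, in original order.
my %seen;
my @unique = grep { !$seen{$_}++ } @keys;

print "@unique\n";   # prints: Fruit Color Taste
```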

Here is my solution to your problem:

#!/usr/bin/perl

use strict; use warnings;
use Data::Dumper;

local $/ = ";\n";

my @messages;

while (<DATA>) {
    chomp;
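    my ($msg, $to, $what) = split ' ', $_, 3; # at most 3 fields, so $what keeps any inner spaces
    my %options;
    # [^"\\] excludes backslashes so that \\. can match escaped characters such as \"
    while($what =~ /(\w+) = "((?:[^"\\]++|\\.)*)" (?:,|$)/xg) {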
        $options{$1} = $2;
    }
    push @messages, [$to => \%options];
}

print Dumper \@messages;

__DATA__
Message Fruits Fruit="Apple",Color="Red",Taste="Sweet";
Message Flowers Flower="Rose",Color="Red";


I put the messages in an array because they need to stay in order. I also skip the odd gymnastics with a stack, which you don't need.

I don't split on every newline, because a value could contain a newline. For the same reason, I don't blindly split on , or = but use a regex that understands quoted values. It might be worth adding error detection after the loop, for example:

die if not defined pos $what or pos($what) != length($what);

This requires the /c flag on the regex, and it checks whether we actually processed the whole string or dropped out of the loop early.
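A sketch of that check, wrapped in a hypothetical parse_options sub that returns [key, value] pairs. One change goes beyond the answer: the pattern is anchored with \G. Without it, the regex can silently skip unparseable text between fields, and the pos check only notices text left over at the end.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse Key="value",Key="value" into [key, value] pairs,
# dying if any part of the string is not consumed by the regex.
sub parse_options {
    my ($what) = @_;
    my @pairs;
    # \G anchors each match where the previous one ended; /c keeps pos()
    # at that point after the final, failing match attempt.
    while ($what =~ /\G (\w+) = "((?:[^"\\]++|\\.)*)" (?:,|$)/xgc) {
        push @pairs, [$1, $2];
    }
    die "could not parse: $what\n"
        if not defined pos $what or pos($what) != length($what);
    return @pairs;
}

my @pairs = parse_options('Fruit="Apple",Color="Red"');
print scalar(@pairs), " fields\n";          # prints: 2 fields

my $ok = eval { parse_options('Fruit="Apple";'); 1 };
print $ok ? "accepted\n" : "rejected\n";   # prints: rejected
```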

This gives:

$VAR1 = [
      [ 'Fruits',
        {
          'Taste' => 'Sweet',
          'Fruit' => 'Apple',
          'Color' => 'Red'
        }
      ],
      [ 'Flowers',
        {
          'Flower' => 'Rose',
          'Color' => 'Red'
        }
      ]
];


(The actual indentation differs, but that doesn't matter.)

There is one caveat: the file must end with a newline character. Otherwise the trailing semicolon is not removed from the last record, and that record's final field is not matched by the regex.
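In the output above, each message's options are still stored in a hash, so the fields inside a message are not listed in input order (Taste appears before Fruit, for example). If that order matters too, one option is to collect the options as an array of [key, value] pairs as well. A sketch, assuming the input comes from an in-memory string rather than file.sql:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hard-coded input standing in for file.sql (an assumption for this sketch).
my $input = <<'END';
Message Fruits Fruit="Apple",Color="Red",Taste="Sweet";
Message Flowers Flower="Rose",Color="Red";
END

open my $fh, '<', \$input or die "cannot open in-memory file: $!";

my @messages;
{
    local $/ = ";\n";    # one record per message
    while (my $record = <$fh>) {
        chomp $record;
        my (undef, $to, $what) = split ' ', $record, 3;
        # Collect [key, value] pairs in an array instead of a hash,
        # so the fields keep their input order as well.
        my @options;
        while ($what =~ /(\w+) = "((?:[^"\\]++|\\.)*)" (?:,|$)/xg) {
            push @options, [$1, $2];
        }
        push @messages, [$to => \@options];
    }
}
close $fh;

for my $message (@messages) {
    my ($to, $options) = @$message;
    print "$to: ", join(', ', map { "$_->[0]=$_->[1]" } @$options), "\n";
}
# prints:
# Fruits: Fruit=Apple, Color=Red, Taste=Sweet
# Flowers: Flower=Rose, Color=Red
```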
