Parsing a text file in Perl

Question

I am new to Perl and am struggling to write a script that will successfully parse a structured text file.

I have a bunch of files that look like this:

name:
    John Smith
occupation:
    Electrician
date of birth:
    2/6/1961
hobbies:
    Boating
    Camping
    Fishing

Etc. The field name is always followed by a colon, and all data associated with a field is always indented with a single tab (\t).

I would like to create a hash that maps each field name directly to its content, for example:

 $contents{name} = "John Smith";
 $contents{hobbies} = "Boating, Camping, Fishing";

Or something like that.

So far I've managed to get all the field names into a hash on my own, but I haven't been able to wrangle the field data into a form that can be stored neatly in a hash. Obviously, substituting or splitting on newlines followed by tabs won't work (I tried, somewhat naively). I also tried a crude approach where I build a duplicate array of lines from the file and use that to figure out where the field boundaries are, but that is far from ideal in terms of memory consumption.

FWIW, I'm currently reading the file line by line, but I'm not really sure that's the best approach. Is there a simple way to do this parsing?

+3

perl

MARS 11 oct. 14 at 14:59

2 answers

This text file is actually pretty close to YAML, and it's not hard to convert it into a valid YAML file: turn each indented data line into a list item by prefixing it with "- ".

Once you have the YAML file, you can use YAML::Tiny or another module to parse it, which results in cleaner code:

#!/usr/bin/perl
use strict;
use warnings;

use YAML::Tiny;
use Data::Dumper;

convert( './data.yaml', 'output.yaml' );
parse('output.yaml');

sub parse {
    my $yaml    = shift;
    my $yamlobj = YAML::Tiny->read($yaml);

    my $name    = $yamlobj->[0]->{name}[0];
    my $occ     = $yamlobj->[0]{occupation}[0];
    my $birth   = $yamlobj->[0]{'date of birth'}[0];
    my $hobbies = $yamlobj->[0]{hobbies};

    my $hobbiestring = join ", ", @$hobbies;

    my $contents = {
        name       => $name,
        occupation => $occ,
        birth      => $birth,
        hobbies    => $hobbiestring,
    };

    print "#RESULT:\n\n";
    print Dumper($contents);
}

sub convert {
    my ( $input, $output ) = @_;

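    open my $infh,  '<', $input  or die "Cannot read $input: $!";
    open my $outfh, '>', $output or die "Cannot write $output: $!";

    while ( my $line = <$infh> ) {
        # Turn each indented data line into a YAML list item. YAML does not
        # allow tabs for indentation, so replace the leading whitespace.
        $line =~ s/^\s+(?=\S)/  - /;
        print $outfh $line;
    }
    close $outfh or die "Cannot close $output: $!";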
}
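
To make the conversion step easier to check without writing files or installing a module, here is a minimal sketch that applies the same idea to a few in-memory lines. The helper name to_yaml_line and the exact regex are assumptions made for illustration; it assumes every data line starts with whitespace and should become a YAML list item.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input held in memory; "\t" is the tab indentation described in the question.
my @lines = (
    "name:\n",
    "\tJohn Smith\n",
    "hobbies:\n",
    "\tBoating\n",
    "\tCamping\n",
);

# to_yaml_line is a hypothetical helper name used only in this sketch.
sub to_yaml_line {
    my ($line) = @_;
    # On a non-blank line, replace the leading whitespace with "  - ",
    # which YAML reads as a list item under the preceding key.
    $line =~ s/^\s+(?=\S)/  - /;
    return $line;
}

my $yaml = join '', map { to_yaml_line($_) } @lines;
print $yaml;
```

Field-name lines start in column one, so the substitution leaves them untouched, and only the indented data lines gain the "- " prefix.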

+2

Luke 11 oct. 14 at 16:22

chilemagic · Accepted Answer · 2014-10-11T15:12:40+0000

Reading line by line is a good approach. Here I am creating a hash of array references. This is how you would read just one file. You can read each file this way and store each resulting hash of arrays in an outer hash (keyed by file name, for example), giving you a hash of hashes of arrays.

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my %contents;
my $key;
while (<DATA>) {
    chomp;
    next unless /\S/;          # skip blank lines
    if ( s/:\s*$// ) {         # a trailing colon marks a field name
        $key = $_;
    } else {
        s/^\s+//;              # strip the leading indentation
        push @{ $contents{$key} }, $_;
    }
}
print Dumper \%contents;

__DATA__
name:
    John Smith
occupation:
    Electrician
date of birth:
    2/6/1961
hobbies:
    Boating
    Camping
    Fishing

Output:

$VAR1 = {
          'occupation' => [
                             'Electrician'
                           ],
          'hobbies' => [
                          'Boating',
                          'Camping',
                          'Fishing'
                        ],
          'name' => [
                       'John Smith'
                     ],
          'date of birth' => [
                                '2/6/1961'
                              ]
        };
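
The question asked for comma-separated strings and mentioned several files, so here is one possible sketch that extends the same line-by-line idea to both. The helper name parse_fh, the file names, and the second sample record are all hypothetical; in-memory strings stand in for real files so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_fh is a hypothetical helper: it reads one record from a filehandle
# and returns a reference to a hash of array references.
sub parse_fh {
    my ($fh) = @_;
    my ( %fields, $key );
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /\S/;        # skip blank lines
        if ( $line =~ s/:\s*$// ) {
            $key = $line;                 # a trailing colon marks a field name
        }
        elsif ( defined $key ) {
            $line =~ s/^\s+//;            # strip the leading indentation
            push @{ $fields{$key} }, $line;
        }
    }
    return \%fields;
}

# Made-up sample data: strings stand in for the contents of real files.
my %sources = (
    'john.txt' => "name:\n\tJohn Smith\nhobbies:\n\tBoating\n\tCamping\n\tFishing\n",
    'jane.txt' => "name:\n\tJane Doe\noccupation:\n\tPlumber\n",
);

my %people;    # file name => { field name => comma-separated string }
for my $file ( sort keys %sources ) {
    # Opening a reference to a scalar gives an in-memory filehandle;
    # with real files, pass the path instead of \$sources{$file}.
    open my $fh, '<', \$sources{$file} or die "Cannot open $file: $!";
    my $fields = parse_fh($fh);
    close $fh;

    my %flat;
    for my $field ( keys %$fields ) {
        $flat{$field} = join ', ', @{ $fields->{$field} };
    }
    $people{$file} = \%flat;
}

print "$people{'john.txt'}{hobbies}\n";    # Boating, Camping, Fishing
```

Keeping the values as arrays until the last step means the joining rule (here ", ") can be changed in one place, or dropped entirely if the lists are more useful than the strings.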
