Use Regex to change a specific column in CSV

I want to convert some strings in a CSV file from the format 0000-2400 hours to the format 00-24 hours. For example:

2011-01-01,"AA",12478,31703,12892,32575,"0906",-4.00,"1209",-26.00,2475.00
2011-01-02,"AA",12478,31703,12892,32575,"0908",-2.00,"1236",1.00,2475.00
2011-01-03,"AA",12478,31703,12892,32575,"0907",-3.00,"1239",4.00,2475.00


The 7th and 9th columns are the departure and arrival times, respectively. Ideally, the lines should look like this when I'm done:

2011-01-01,"AA",12478,31703,12892,32575,"09",-4.00,"12",-26.00,2475.00


The entire CSV file will eventually be imported into R, and because it will be very large I want to handle some of the processing ahead of time. I first tried to do it in Perl, but I am having trouble matching more than one digit with a regex. With a lookbehind expression I can match one digit before a given comma, but not more than one.
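Perl's lookbehind has traditionally had to be fixed-width (newer versions relax this only experimentally), but fixed width does not mean a single character. A minimal sketch, assuming every time value is a quoted four-digit field as in the sample rows: the lookbehind `(?<="\d\d)` checks for a quote plus the two hour digits, the lookahead `(?=")` checks for the closing quote, and only the two minute digits between them are matched and deleted.

```perl
use strict;
use warnings;

# Sample row copied from the question.
my $line = '2011-01-01,"AA",12478,31703,12892,32575,"0906",-4.00,"1209",-26.00,2475.00';

# (?<="\d\d) is a fixed-width, 3-character lookbehind: a quote and the two hour digits.
# (?=")      is a lookahead for the closing quote.
# Only the two minute digits between them are consumed, and they are replaced with nothing.
(my $converted = $line) =~ s/(?<="\d\d)\d\d(?=")//g;

print "$converted\n";
```

Because the lookarounds do not consume characters, the quotes and hour digits stay in place and no capture group is needed.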

I am also open to being told that doing this in Perl is pointless and I should stick with R. :)



2 answers


As I mentioned in the comments, using a CSV module such as Text::CSV is the safe option. Here is a quick example script showing how to use it. You will notice that it does not preserve the quotes, even though I expected it to because I set keep_meta_info. If that matters to you, I'm sure there is a way to fix it.

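use strict;
use warnings;

use Text::CSV;

my $csv = Text::CSV->new({
        binary         => 1,
        eol            => $/,   # end each printed row with a newline
        keep_meta_info => 1,
});

while (my $row = $csv->getline(\*DATA)) {
    # Columns 7 and 9 (0-based indices 6 and 8) hold the HHMM times.
    # \K keeps the first two digits (the hours) out of the match,
    # so only the last two digits (the minutes) are deleted.
    for ($row->[6], $row->[8]) {
        s/\d\d\K\d\d//;
    }
    $csv->print(\*STDOUT, $row);
}
$csv->eof or $csv->error_diag();   # report a parse error if the loop stopped early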

__DATA__
2011-01-01,"AA",12478,31703,12892,32575,"0906",-4.00,"1209",-26.00,2475.00
2011-01-02,"AA",12478,31703,12892,32575,"0908",-2.00,"1236",1.00,2475.00
2011-01-03,"AA",12478,31703,12892,32575,"0907",-3.00,"1239",4.00,2475.00




Output:

2011-01-01,AA,12478,31703,12892,32575,09,-4.00,12,-26.00,2475.00
2011-01-02,AA,12478,31703,12892,32575,09,-2.00,12,1.00,2475.00
2011-01-03,AA,12478,31703,12892,32575,09,-3.00,12,4.00,2475.00
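If keeping the quotes matters, and if no field in the file contains an embedded comma (true of the sample rows, but an assumption about the full file), a plain `split` on commas is enough to reach a specific column without a CSV module. A minimal sketch that edits only columns 7 and 9 (indices 6 and 8) and leaves every other field, quotes included, untouched; the helper name `convert_line` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite columns 7 and 9 (0-based 6 and 8) of one line from "HHMM" to "HH".
# Assumption: no field contains an embedded comma, so split /,/ finds the real columns.
sub convert_line {
    my ($line) = @_;
    # A limit of -1 keeps trailing empty fields instead of silently dropping them.
    my @fields = split /,/, $line, -1;
    for my $time (@fields[6, 8]) {    # $time is an alias, so edits change @fields
        # Keep the opening quote and hour digits, drop the minutes, restore the closing quote.
        $time =~ s/^("\d\d)\d\d"$/$1"/;
    }
    return join ',', @fields;
}

# Sample rows copied from the question.
my @rows = (
    '2011-01-01,"AA",12478,31703,12892,32575,"0906",-4.00,"1209",-26.00,2475.00',
    '2011-01-02,"AA",12478,31703,12892,32575,"0908",-2.00,"1236",1.00,2475.00',
);
my @converted = map { convert_line($_) } @rows;
print "$_\n" for @converted;
```

To filter a whole file instead of the sample array, the same helper can be called in a loop such as `while (my $line = <>) { chomp $line; print convert_line($line), "\n" }`. The trade-off against Text::CSV is robustness: this version breaks on quoted fields that contain commas, which a real CSV parser handles.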



I can also suggest my own solution:



s/"(\d\d)\d\d"/"$1"/g
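The substitution above can be checked on the sample rows with a short script. A minimal sketch; note that the pattern is not tied to columns 7 and 9. It rewrites every field that is exactly a quote, four digits, and a quote, which in the sample rows happens to be only the departure and arrival times.

```perl
use strict;
use warnings;

# Sample rows copied from the question.
my @rows = (
    '2011-01-01,"AA",12478,31703,12892,32575,"0906",-4.00,"1209",-26.00,2475.00',
    '2011-01-03,"AA",12478,31703,12892,32575,"0907",-3.00,"1239",4.00,2475.00',
);

my @converted;
for my $row (@rows) {
    # "(\d\d)\d\d" matches a quoted four-digit field; $1 captures the two hour digits.
    # /g applies the substitution to every such field on the line.
    (my $out = $row) =~ s/"(\d\d)\d\d"/"$1"/g;
    push @converted, $out;
}
print "$_\n" for @converted;
```

On a whole file, the same substitution can be run as a one-liner, `perl -pe 's/"(\d\d)\d\d"/"$1"/g' in.csv > out.csv`, where `in.csv` and `out.csv` are placeholder file names; `-p` reads each line, applies the code, and prints the result.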
