Preserve quotes on CSV fields that were quoted in the input

I have a CSV file in which some of the fields are quoted regardless of whether they need to be. I want to load this file, change some of the values, and write out a modified CSV in which the originally quoted fields are still quoted.

I am currently using the Perl Text::CSV package for this, but I have run into a small obstacle. Below is a small test script that demonstrates the problem:

use Text::CSV;

my $csv = Text::CSV->new ({'binary' => 1, 'allow_loose_quotes' => 1, 'keep_meta_info' => 1});
my $line = q^hello,"world"^;

print qq^input:  $line\n^;

$csv->parse($line);
my @flds = $csv->fields();
$csv->combine(@flds);

print 'output:  ', $csv->string(), "\n";

      

This gives:

input:  hello,"world"
output:  hello,world

      

According to the Text::CSV documentation there is an is_quoted() method to check whether a field was quoted in the input, but if I use that to add surrounding quotes to the field, I get unexpected results:

use Text::CSV;

my $csv = Text::CSV->new ({'binary' => 1, 'allow_loose_quotes' => 1, 'keep_meta_info' => 1});
my $line = q^hello,"world"^;

print qq^input:  $line\n^;

$csv->parse($line);
my @flds = $csv->fields();

for my $idx (0..$#flds) {
    if ($csv->is_quoted($idx)) {
        $flds[$idx] = qq^"$flds[$idx]"^;
    }
}

$csv->combine(@flds);

print 'output:  ', $csv->string(), "\n";

      

This produces:

input:  hello,"world"
output:  hello,"""world"""

      

where I believe the quotes I added before combine() are treated as part of the field and are therefore escaped with a second double quote when combine() processes them.
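The tripled quotes follow from the usual CSV quoting rule: every quote character inside a field is doubled, and the field is then wrapped in one more pair of quotes. A minimal sketch of that rule, using a hypothetical helper csv_quote (not part of Text::CSV, and only an approximation of what the module does internally):

```perl
use strict;
use warnings;

# Hypothetical helper mimicking standard CSV quoting: double every embedded
# quote character, then wrap the whole field in quotes.
sub csv_quote {
    my ($value) = @_;
    (my $escaped = $value) =~ s/"/""/g;
    return qq{"$escaped"};
}

print csv_quote('world'), "\n";     # "world"
print csv_quote('"world"'), "\n";   # """world"""
```

The second call matches the output above: the quotes added by hand are data as far as the writer is concerned, so they get escaped like any other quote character.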

What would be the best way to ensure that quoted fields stay quoted from input to output? I'm not sure whether the application will accept always_quote'd fields. Is there some combination of attributes on the Text::CSV object that will keep the quotes intact? Or am I perhaps left with adjusting the record after << 27>?

1 answer


It's a shame, but it seems that while keep_meta_info gives you access to the metadata, there is no way to tell Text::CSV to reapply the is_quoted state on output.

Depending on how complex your records are, you could simply assemble the line yourself. But then you have to deal with fields that were safe to leave unquoted before your changes and require quotes afterwards. Whether that matters depends on the kinds of changes you make, i.e. do you expect a previously "safe" string value to become unsafe? If the answer is never (i.e. 0.00000% probability), you should simply build the line yourself and document what you did.

Post-processing the combined string would require parsing it as CSV again to handle the possibility of commas and other unsafe characters within fields, so this might not be an option.
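As a rough sketch of the "build it yourself" route: the helper below, join_preserving_quotes, is a hypothetical name and not part of Text::CSV. It quotes exactly the fields flagged as quoted and leaves the rest untouched, so it relies on the assumption that unflagged fields never contain a comma, quote, or newline. The sample arrays are hard-coded here; in the question's script they would come from $csv->fields() and $csv->is_quoted($idx) after parsing.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::CSV): join fields into one CSV line,
# quoting exactly those fields whose flag is true.
# Assumption: unflagged fields contain no comma, quote, or newline.
sub join_preserving_quotes {
    my ($fields, $was_quoted) = @_;
    my @out;
    for my $idx (0 .. $#{$fields}) {
        my $value = $fields->[$idx] // '';
        if ($was_quoted->[$idx]) {
            (my $escaped = $value) =~ s/"/""/g;   # double embedded quotes
            push @out, qq{"$escaped"};
        }
        else {
            push @out, $value;                    # emitted as-is
        }
    }
    return join ',', @out;
}

# Stand-ins for $csv->fields() and the per-field $csv->is_quoted($idx) results.
my @flds   = ('hello', 'world');
my @quoted = (0, 1);

print join_preserving_quotes(\@flds, \@quoted), "\n";   # hello,"world"
```

If a changed value might become unsafe, one option is to also force the flag on whenever the new value matches /[",\r\n]/, at the cost of no longer mirroring the input exactly.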

Or you can dive into the code for Text::CSV and implement the functionality you want, that is, allow the user to force quoting of a specific field in the output. I've been playing around with it and it looks like some of the required mechanism might already be in place, but unfortunately I only have access to the XS version, which delegates to native code, so I can't go deeper at this time. This is as far as I got:



The original combine method. Note that it sets _FFLAGS to undef.

sub combine
{
    my $self = shift;
    my $str  = "";
    $self->{_FIELDS} = \@_;
    $self->{_FFLAGS} = undef;
    $self->{_STATUS} = (@_ > 0) && $self->Combine (\$str, \@_, 0);
    $self->{_STRING} = \$str;
    $self->{_STATUS};
    } # combine

      

My attempt. I assumed the final argument to Combine could be the flags, but since the (string) combine API is based on receiving an array, not an arrayref, there is no way to pass two arrays. I changed it to expect two arrayrefs and tried to pass the second one to Combine, but that failed with the error: Can't call method "print" on unblessed reference.

sub combine2
{
    my $self = shift;
    my $str  = "";
    my $f    = shift;
    my $g    = shift;
    $self->{_FIELDS} = $f;
    $self->{_FFLAGS} = $g;
    $self->{_STATUS} = (@$f > 0) && $self->Combine (\$str, $f, $g);
    $self->{_STRING} = \$str;
    $self->{_STATUS};
    } # combine2

      
