How can I filter a specific column from a CSV file in Perl?

I am a beginner in Perl and need some help filtering columns with a Perl script. I have a file with about 10 comma-separated columns, and I need to keep 5 of them and drop the rest. How can I do this?

Thanks a lot to anyone who can help.

cheers, Nile

0




10 replies


Have a look at Text::CSV (or Text::CSV_XS) for parsing CSV files in Perl. It is available on CPAN, or you can get it through your package manager if you are using Linux or another Unix-like OS. On Ubuntu, the package is named libtext-csv-perl.



It can handle cases such as fields that are quoted because they contain a comma, something that a simple split command cannot handle.
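As a rough illustration of the Text::CSV approach, the sketch below keeps selected columns by position. The helper name `filter_columns`, the sample data, and the in-memory filehandles are assumptions made for the demo; real use would open the input and output files instead.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical helper: copy only the columns listed in @$keep (0-based)
# from the input handle to the output handle.
sub filter_columns {
    my ($in, $out, $keep) = @_;
    my $parser = Text::CSV->new({ binary => 1, auto_diag => 1 });
    my $writer = Text::CSV->new({ binary => 1, eol => "\n" });
    while (my $row = $parser->getline($in)) {
        $writer->print($out, [ @{$row}[@$keep] ]);
    }
    return;
}

# Demo on in-memory data (assumed sample, not from the question).
my $data = qq{id,name,city\n1,"Smith, John",Boston\n};
open my $in, '<', \$data or die "input: $!";
my $result = '';
open my $out, '>', \$result or die "output: $!";
filter_columns($in, $out, [0, 1]);
close $out;
print $result;
```

The quoted field containing a comma survives as a single column, which is the case a plain split gets wrong.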

+19




CSV is a bad, complex format (weird quoting, comma and space issues). Find a library that can handle the nuances for you, and also provide you with conveniences like indexing on column names.
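Selecting columns by header name could look something like the following sketch, which reads the first row with Text::CSV and maps titles to positions by hand. The helper name `filter_by_name` and the sample data are hypothetical.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical helper: keep only the columns whose header titles appear
# in @$wanted, in the order given.
sub filter_by_name {
    my ($in, $out, $wanted) = @_;
    my $parser = Text::CSV->new({ binary => 1, auto_diag => 1 });
    my $writer = Text::CSV->new({ binary => 1, eol => "\n" });

    # The first row is assumed to hold the column titles.
    my $header = $parser->getline($in) or die "no header row\n";
    my %index_of;
    @index_of{@$header} = (0 .. $#$header);

    my @keep;
    for my $name (@$wanted) {
        die "unknown column '$name'\n" unless exists $index_of{$name};
        push @keep, $index_of{$name};
    }

    $writer->print($out, [ @{$header}[@keep] ]);
    while (my $row = $parser->getline($in)) {
        $writer->print($out, [ @{$row}[@keep] ]);
    }
    return;
}

# Demo on in-memory data (assumed sample).
my $data = qq{name,age,city\n"Smith, John",42,Boston\n};
open my $in, '<', \$data or die "input: $!";
my $by_name = '';
open my $out, '>', \$by_name or die "output: $!";
filter_by_name($in, $out, [ 'city', 'name' ]);
close $out;
print $by_name;
```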



Of course, if you just want to split the text file with commas, look no further than @Pax's solution.

+6




Use split to break the line into fields and then output the ones you want (say, every second column). Create the following xx.pl file:

while (<STDIN>) {
    chomp;
    my @fields = split /,/, $_;
    print "$fields[1],$fields[3],$fields[5],$fields[7],$fields[9]\n";
}

      

then do:

$ echo 1,2,3,4,5,6,7,8,9,10 | perl xx.pl
2,4,6,8,10

      

+5




If you are talking about CSV files on Windows (generated from Excel, for example), you need to handle fields that contain commas but are enclosed in quotes.

In this case, a simple split will not work.
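A small demonstration of the problem, using an assumed sample line with one quoted field:

```perl
use strict;
use warnings;

# Three logical fields, but the second one contains a comma inside quotes.
my $line = '1,"Smith, John",Boston';

# split knows nothing about quoting, so it cuts the quoted field in two.
my @fields = split /,/, $line;

print scalar(@fields), " fields\n";
print "[$_]\n" for @fields;
```

split returns four pieces here instead of three, and the quote characters end up attached to the broken halves.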

+3




Alternatively, you can use Text::ParseWords, which is in the standard library. Add

use Text::ParseWords;

      

at the top of Pax's example above, and then substitute

  my @fields = parse_line(q{,}, 0, $_);

for the split line.
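Put together, that might look like the sketch below. The helper name `keep_columns` is hypothetical, and it passes 1 rather than 0 as the keep argument so that the surrounding quotes are preserved and the output is still valid CSV. One caveat: parse_line also treats single quotes and backslashes as special, so data containing apostrophes may not parse as expected.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper: return the selected 0-based columns of one CSV line.
sub keep_columns {
    my ($line, @keep) = @_;
    chomp $line;
    # keep = 1 leaves quotes in place, so quoted fields stay quoted on output.
    my @fields = parse_line(q{,}, 1, $line);
    return join q{,}, @fields[@keep];
}

# Assumed sample line: pick the second and fourth columns.
print keep_columns(qq{1,"Smith, John",Boston,MA\n}, 1, 3), "\n";
```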

+2




You can use some of Perl's built-in options to do this on the command line:

$ echo "1,2,3,4,5" | perl -a -F, -n -e 'print join(q{,}, $F[0], $F[3]).qq{\n}'

1,4

The above uses -a (autosplit) with -F (field separator) set to a comma. It then joins the fields you are interested in and prints them back out (with a line separator). This assumes simple data with no commas nested inside fields. I did this with a non-printable field separator (\x1d), so that was not a problem for me.

See http://perldoc.perl.org/perlrun.html#Command-Switches for details.

+2




My search did not turn up a good CSV-aware filtering program flexible enough to be useful for more than one case, so I wrote one. Enjoy.

Basic usage:

bash$ csvfilter [-r <columnTitle>]* [-quote] <csv.file>

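#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Long;

use Text::CSV;

my $always_quote = 0;

my @remove;
if (!GetOptions('remove:s'     => \@remove,
                'quote-always' => sub { $always_quote = 1; })) {
    die "$0: invalid option (use --remove [--quote-always])";
}

my @cols2remove;

# Return the fields with the columns listed in @cols2remove taken out.
# NOTE: the loop body was damaged in the original post; this version is a
# reconstruction that assumes @cols2remove is sorted ascending and unique.
sub filter
{
    my @fields = @_;
    my $i = 0;
    for my $c (@cols2remove) {
        # Each earlier removal shifts the later indices down by one.
        splice(@fields, $c - $i, 1);
        $i++;
    }
    return @fields;
}

# create just one of these
my $csvOut = Text::CSV->new({always_quote => $always_quote});

sub printLine
{
    my @fields = @_;
    $csvOut->combine(filter(@fields));
    my $str = $csvOut->string();
    if (length($str)) {
        print "$str\n";
    }
}

my $csv = Text::CSV->new();

while (<>) {
    $csv->parse($_);
    if ($. == 1) {
        # Header row: map each --remove title to its column index.
        # NOTE: this loop was truncated in the original post; the matching
        # and error handling below reconstruct its apparent intent.
        my $failures = 0;
        my @cols = $csv->fields;
        for my $rm (@remove) {
            my $found = 0;
            for (my $c = 0; $c < @cols; $c++) {
                if ($cols[$c] eq $rm) {
                    push(@cols2remove, $c);
                    $found = 1;
                }
            }
            if (!$found) {
                warn "$0: column '$rm' not found\n";
                $failures++;
            }
        }
        die "$0: $failures column(s) not found\n" if $failures;
        my %seen;
        @cols2remove = sort { $a <=> $b } grep { !$seen{$_}++ } @cols2remove;
    }
    printLine($csv->fields);
}

exit(0);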
+1




In addition to what people here have said about handling comma-separated files, I would like to point out that you can extract the even (or odd) array elements using an array slice and map:

@myarray[map { $_ * 2 } (0 .. 4)]
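Applied to a split line, the slice might be used as in this sketch (the sample data and the variable names other than @myarray are assumptions):

```perl
use strict;
use warnings;

my @myarray = split /,/, '1,2,3,4,5,6,7,8,9,10';

# Indices 0, 2, 4, 6, 8: the first, third, fifth, ... columns.
my @even_indices = @myarray[ map { $_ * 2 } (0 .. 4) ];

# Indices 1, 3, 5, 7, 9: the second, fourth, sixth, ... columns.
my @odd_indices = @myarray[ map { $_ * 2 + 1 } (0 .. 4) ];

print join(',', @even_indices), "\n";   # 1,3,5,7,9
print join(',', @odd_indices), "\n";    # 2,4,6,8,10
```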

      

Hope it helps.

0




My personal favorite way to do CSV is the AnyData module. It seems to make things pretty straightforward, and dropping a named column can be done quite easily. Take a look at it on CPAN.

0




This answers a broader question, but it seems like a relevant bit of information.

The Unix cut command can do what you want (and more). It has also been reimplemented in Perl.

-2








