How can I filter a specific column from a CSV file in Perl?

Question

I am a beginner in Perl and need some help filtering columns with a Perl script. I have about 10 comma-separated columns in a file, and I need to keep 5 of them and get rid of all the others. How can I achieve this?

Thanks a lot to anyone who can help.

cheers, Nile

perl csv

Neel 09 jan. '09 at 1:48

10 replies

Paul tomblin · Answer 1 · 2009-01-09T02:10:39+0000

Have a look at Text::CSV (or Text::CSV_XS) for parsing CSV files in Perl. It is available on CPAN, or you can get it through your package manager if you are using Linux or another Unix-like OS. On Ubuntu, the package is named libtext-csv-perl.

It can handle cases such as fields that are quoted because they contain a comma, something that a simple split command cannot handle.
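As a rough sketch of that approach, the script below keeps five columns from each line using Text::CSV. The keep_columns helper, the sample lines, and the chosen indexes (0, 2, 4, 6, 8) are made up for illustration; only new, parse, fields, combine, and string come from the module. It assumes every record sits on a single line.

```perl
use strict;
use warnings;
use Text::CSV;    # from CPAN; not part of core Perl

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

# Hypothetical helper: parse one CSV line and return a CSV line
# that holds only the fields at the given zero-based indexes.
sub keep_columns {
    my ($line, @wanted) = @_;
    $csv->parse($line)
        or die "Cannot parse line: " . $csv->error_input();
    my @fields = $csv->fields();
    $csv->combine(@fields[@wanted])
        or die "Cannot combine fields";
    return $csv->string();
}

# Sample input; the second line has a quoted field containing a comma.
my @lines = (
    '1,2,3,4,5,6,7,8,9,10',
    'a,"b,c",d,e,f,g,h,i,j,k',
);

print keep_columns($_, 0, 2, 4, 6, 8), "\n" for @lines;
```

To filter a real file, the loop over @lines could be swapped for a while loop over STDIN that chomps each line before passing it to keep_columns.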

Josh lee · Answer 2 · 2009-01-09T01:57:55+0000

CSV is a bad, complex format (weird quoting, comma and space issues). Find a library that can handle the nuances for you, and also provide you with conveniences like indexing on column names.

Of course, if you just want to split the text file with commas, look no further than @Pax's solution.
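The "indexing on column names" convenience mentioned above might look something like this sketch, which reads the header row once and then selects columns by title. It uses Text::CSV from CPAN; the make_name_filter helper and the sample column titles are assumptions made for the example, not part of any library.

```perl
use strict;
use warnings;
use Text::CSV;    # from CPAN; not part of core Perl

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

# Hypothetical helper: given the header line and the titles to keep,
# return a closure that filters any later line down to those columns.
sub make_name_filter {
    my ($header_line, @names) = @_;
    $csv->parse($header_line) or die "Cannot parse header\n";
    my @header = $csv->fields();

    # Map each column title to its zero-based position.
    my %index_of;
    @index_of{@header} = (0 .. $#header);

    my @wanted;
    for my $name (@names) {
        die "No column named '$name'\n" unless exists $index_of{$name};
        push @wanted, $index_of{$name};
    }

    return sub {
        my ($line) = @_;
        $csv->parse($line) or die "Cannot parse line\n";
        my @fields = $csv->fields();
        $csv->combine(@fields[@wanted]) or die "Cannot combine fields\n";
        return $csv->string();
    };
}

my $header = 'id,name,city,age';
my $filter = make_name_filter($header, 'name', 'age');

print $filter->($_), "\n" for ($header, '1,Ann,Oslo,30', '2,Bob,Rome,41');
```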

paxdiablo · Answer 3 · 2009-01-09T01:56:36+0000

Use split to break each line into fields and then output the ones you want (say, every second column). Create the following xx.pl file:

while(<STDIN>) {
    chomp;
    @fields = split (",",$_);
    print "$fields[1],$fields[3],$fields[5],$fields[7],$fields[9]\n"
}

then do:

$ echo 1,2,3,4,5,6,7,8,9,10 | perl xx.pl
2,4,6,8,10

PolyThinker · Answer 4 · 2009-01-09T02:02:22+0000

If you are talking about CSV files on Windows (generated by Excel, for example), you need to take care of fields that contain commas but are enclosed in quotes.

In this case, a simple split will not work.
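A small demonstration of the problem, using a made-up line with three logical fields (the naive_fields helper name is just for this example):

```perl
use strict;
use warnings;

# Hypothetical helper: split a line on every comma, quotes or not.
sub naive_fields {
    my ($line) = @_;
    return split /,/, $line;
}

# Three logical fields, but the middle one contains a comma.
my @fields = naive_fields('Smith,"Portland, OR",42');

print scalar(@fields), " fields\n";    # prints "4 fields"
print "[$_]\n" for @fields;
```

The quoted field is cut in two, so every column after it shifts one position to the right.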

oylenshpeegul · Answer 5 · 2009-01-09T02:24:24+0000

Alternatively, you can use Text::ParseWords, which is in the standard library. Add

use Text::ParseWords;

to the top of Pax's example above and then substitute

  my @fields = parse_line(q{,}, 0, $_);

for the split.
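Put together, that might look like the sketch below. The pick_fields helper and the sample line are invented for the example. Note that it passes 1 rather than 0 as the second argument to parse_line, which is a deliberate departure from the snippet above: with 1 the quotes are kept on each field, so a field containing a comma is still quoted when it is printed back out.

```perl
use strict;
use warnings;
use Text::ParseWords;    # core module; exports parse_line

# Hypothetical helper: return the fields at the given zero-based
# indexes, joined with commas. Quotes are preserved (keep = 1).
sub pick_fields {
    my ($line, @wanted) = @_;
    my @fields = parse_line(q{,}, 1, $line);
    return join q{,}, @fields[@wanted];
}

my $line = 'Smith,"Portland, OR",42';

print pick_fields($line, 0, 1), "\n";    # prints Smith,"Portland, OR"
print pick_fields($line, 0, 2), "\n";    # prints Smith,42
```

parse_line also treats single quotes and backslashes specially, so it is not a full CSV parser; data containing those characters is better handled by a dedicated CSV module.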

haytona · Answer 6 · 2009-01-30T03:04:21+0000

You can use some of Perl's built-in options to do this on the command line:

$ echo "1,2,3,4,5" | perl -a -F, -n -e 'print join(q{,}, $F[0], $F[3]).qq{\n}'

1,4

The above uses -a (autosplit) with -F (the field separator) set to a comma. It then joins the fields you are interested in and prints them back out (with a line separator). This assumes simple data with no nested commas. I did this with a non-printable field separator (\x1d), so that was not a problem for me.

See http://perldoc.perl.org/perlrun.html#Command-Switches for details.
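For readers who prefer a script, the one-liner is roughly equivalent to the sketch below. The autosplit_pick helper name and the sample lines are made up for the example. One detail to watch: -n does not strip the newline, so in the one-liner the last element of @F still ends in "\n" (adding -l removes it); the sketch chomps each line for the same reason.

```perl
use strict;
use warnings;

# Hypothetical helper doing what -a -F, does for one line:
# split on commas into @F, then join the requested elements.
sub autosplit_pick {
    my ($line, @wanted) = @_;
    chomp $line;                  # what -l would do on input
    my @F = split /,/, $line;     # what -a with -F, does
    return join q{,}, @F[@wanted];
}

for my $line ("1,2,3,4,5\n", "a,b,c,d,e\n") {
    print autosplit_pick($line, 0, 3), "\n";
}
# prints:
# 1,4
# a,d
```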

JVeldhuis · Answer 7 · 2009-01-17T20:21:42+0000

My search did not turn up a good CSV-aware filtering program that is flexible enough to be useful for more than a single case, so I wrote one. Enjoy.

Basic usage:

bash$ csvfilter [-r <columnTitle>]* [-quote] <csv.file>

#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Long;

use Text::CSV;

my $always_quote = 0;

my @remove;
if (!GetOptions('remove:s'     => \@remove,
                'quote-always' => sub { $always_quote = 1; })) {
    die "$0: invalid option (use --remove [--quote-always])";
}

# Zero-based indexes of the columns to drop, sorted ascending.
# Filled in once the header row has been read.
my @cols2remove;

# Return the given fields without the columns listed in @cols2remove.
sub filter
{
    my @fields = @_;
    # Work from the highest index down so earlier indexes stay valid.
    for my $c (reverse @cols2remove) {
        splice(@fields, $c, 1) if $c < @fields;
    }
    return @fields;
}

# create just one of these
my $csvOut = Text::CSV->new({ always_quote => $always_quote });

sub printLine
{
    my @fields = @_;
    $csvOut->combine(filter(@fields));
    my $str = $csvOut->string();
    if (length($str)) {
        print "$str\n";
    }
}

my $csv = Text::CSV->new();

while (<>) {
    chomp;
    $csv->parse($_);
    if ($. == 1) {
        # Header row: map each title given with --remove to a column index.
        my $failures = 0;
        my @cols = $csv->fields;
        for my $rm (@remove) {
            my $found = 0;
            for (my $c = 0; $c < @cols; $c++) {
                next unless $cols[$c] eq $rm;
                push(@cols2remove, $c) unless grep { $_ == $c } @cols2remove;
                $found = 1;
            }
            if (!$found) {
                warn "$0: no column titled '$rm'\n";
                $failures++;
            }
        }
        die "$0: $failures column title(s) not found\n" if $failures;
        @cols2remove = sort { $a <=> $b } @cols2remove;
    }
    printLine($csv->fields);
}

exit(0);

Shlomi fish · Answer 8 · 2009-01-09T10:05:48+0000

In addition to what people here have said about handling comma-separated files, I would like to point out that it is possible to extract the even (or odd) elements of an array using an array slice and/or map:

@myarray[map { $_ * 2 } (0 .. 4)]
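A short, self-contained illustration of that slice, with a sample ten-element array standing in for the parsed fields:

```perl
use strict;
use warnings;

my @myarray = (1 .. 10);

# Elements at indexes 0, 2, 4, 6, 8
my @even_index = @myarray[ map { $_ * 2 } 0 .. 4 ];

# Elements at indexes 1, 3, 5, 7, 9
my @odd_index = @myarray[ map { $_ * 2 + 1 } 0 .. 4 ];

print join(',', @even_index), "\n";    # prints 1,3,5,7,9
print join(',', @odd_index),  "\n";    # prints 2,4,6,8,10
```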

Hope it helps.

Jack M. · Answer 9 · 2009-01-09T17:01:49+0000

My personal favorite way to handle CSV is the AnyData module. It seems to make things pretty straightforward, and dropping a named column can be done quite easily. Take a look at it on CPAN.

Sparr · Answer 10 · 2009-01-09T01:57:36+0000

This is an answer to a much broader question, but it seems like a relevant bit of information.

The Unix cut command can do what you want (and more). It has also been reimplemented in Perl.
