How can I filter a specific column from a CSV file in Perl?
I am just a beginner in Perl and need some help filtering columns using a Perl script. I have about 10 comma-separated columns in a file, and I need to keep 5 of them and get rid of all the others. How can I achieve this?
Thanks a lot to anyone who can help.
cheers, Nile
Have a look at Text::CSV (or Text::CSV_XS) for parsing CSV files in Perl. It is available on CPAN, or you can get it through your package manager if you are using Linux or another Unix-like OS. On Ubuntu, the package is named libtext-csv-perl.
It can handle cases such as fields that are quoted because they contain a comma, something that a simple split command cannot handle.
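As a rough sketch of that difference, the following compares a plain split with Text::CSV on one line. The sample row, the keep_columns helper, and the choice of which five columns to keep are all made up for illustration; a real script would read its lines from the input file instead.

```perl
use strict;
use warnings;
use Text::CSV;    # from CPAN; not part of core Perl

# Hypothetical choice: keep columns 1, 2, 4, 7 and 9 (zero-based indices).
my @keep = (0, 1, 3, 6, 8);

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot create Text::CSV object: " . Text::CSV->error_diag();

# Parse one CSV line, keep only the wanted columns, and return a CSV line.
sub keep_columns {
    my ($line, @indices) = @_;
    $csv->parse($line)
        or die "Cannot parse line: " . $csv->error_input();
    my @fields = $csv->fields();
    $csv->combine(@fields[@indices])
        or die "Cannot combine fields";
    return $csv->string();
}

# Made-up sample row: the second field contains a comma, so it is quoted.
my $sample = 'id1,"Smith, John",x,42,y,z,NY,w,2009,v';

# A plain split counts the quoted comma as a separator ...
my @naive = split /,/, $sample;
print scalar(@naive), " fields from split\n";

# ... while Text::CSV sees the ten real fields and keeps five of them.
print keep_columns($sample, @keep), "\n";
```

Here split reports 11 fields for a 10-column row, while the Text::CSV version should print id1,"Smith, John",42,NY,2009.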
CSV is a bad, complex format (weird quoting, comma and space issues). Find a library that can handle the nuances for you, and also provide you with conveniences like indexing on column names.
Of course, if you just want to split the text file with commas, look no further than @Pax's solution.
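To illustrate what indexing on column names can look like, here is a minimal sketch using Text::CSV's column_names and getline_hr. The header titles, the sample rows, and the filter_by_name helper are assumptions for the example, not part of the question's data.

```perl
use strict;
use warnings;
use Text::CSV;    # from CPAN; not part of core Perl

# Hypothetical column titles to keep; a real file's header would differ.
my @wanted = qw(name city year);

# Made-up sample data, read through an in-memory filehandle for the demo.
my $data = <<'END';
id,name,note,city,year
1,"Smith, John",x,NY,2009
2,Jane,y,LA,2010
END

my $csv_in  = Text::CSV->new({ binary => 1 })
    or die "Cannot create Text::CSV object: " . Text::CSV->error_diag();
my $csv_out = Text::CSV->new({ binary => 1 })
    or die "Cannot create Text::CSV object: " . Text::CSV->error_diag();

# Read each row as a hash keyed by the header line and return CSV lines
# that contain only the named columns, header first.
sub filter_by_name {
    my ($in, @names) = @_;
    my @out;
    my $header = $csv_in->getline($in) or die "No header line";
    $csv_in->column_names(@$header);
    $csv_out->combine(@names) or die "Cannot combine header";
    push @out, $csv_out->string();
    while (my $row = $csv_in->getline_hr($in)) {
        $csv_out->combine(@{$row}{@names}) or die "Cannot combine row";
        push @out, $csv_out->string();
    }
    return @out;
}

open my $in, '<', \$data or die "Cannot open in-memory data: $!";
print "$_\n" for filter_by_name($in, @wanted);
close $in;
```

Selecting by title rather than by position means the script keeps working if the columns are later reordered, as long as the header line stays accurate.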
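Use split to break each line into fields and then output the ones you want (say, every second column). Create the following xx.pl file:
use strict;
use warnings;

while (<STDIN>) {
    chomp;
    # Split on commas; this does not understand quoted fields.
    my @fields = split /,/, $_;
    # Keep the 2nd, 4th, 6th, 8th and 10th columns (indices are zero-based).
    print join(',', @fields[1, 3, 5, 7, 9]), "\n";
}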
then do:
$ echo 1,2,3,4,5,6,7,8,9,10 | perl xx.pl
2,4,6,8,10
Alternatively, you can use Text::ParseWords, which is in the standard library. Add
use Text::ParseWords;
at the top of Pax's example above and then substitute
my @fields = parse_line(q{,}, 0, $_);
for the split line.
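Put together, that change might look like the sketch below. The pick_columns and csv_field helpers and the sample row are invented for illustration. Note that parse_line handles quotes and backslashes in a shell-like way, so this is an approximation rather than a full CSV parser.

```perl
use strict;
use warnings;
use Text::ParseWords;    # core module, exports parse_line

# Re-quote a field for output if it contains a comma or a double quote.
sub csv_field {
    my ($field) = @_;
    return $field unless $field =~ /[",]/;
    (my $escaped = $field) =~ s/"/""/g;
    return qq{"$escaped"};
}

# Split one line with parse_line and return the selected columns as a line.
sub pick_columns {
    my ($line, @indices) = @_;
    my @fields = parse_line(q{,}, 0, $line);
    return join q{,}, map { csv_field($_ // q{}) } @fields[@indices];
}

# Made-up sample row; a real script would loop over <STDIN> and chomp each line.
my $sample = '1,"two, with comma",3,4,5,6,7,8,9,10';
print pick_columns($sample, 1, 3, 5, 7, 9), "\n";
```

Because parse_line strips the quotes when its second argument is 0, the sketch re-quotes any field that needs it before printing, so the output stays valid CSV.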
You can use some of Perl's built-in options to do this on the command line:
$ echo "1,2,3,4,5" | perl -a -F, -n -e 'print join(q{,}, $F[0], $F[3]).qq{\n}'
1,4
The above uses -a (autosplit) with -F (the field separator) set to a comma, so each input line is split into @F; -n loops over the input lines and -e supplies the code. It then joins the fields you are interested in and prints them back out (with a line separator). This assumes simple data with no embedded commas. My data used a non-printable field separator (\x1d), so this was not a problem for me.
See http://perldoc.perl.org/perlrun.html#Command-Switches for details.
A search did not turn up a good CSV-aware filtering program that was flexible enough to be useful for more than one job, so I wrote one. Enjoy.
Basic usage:
bash$ csvfilter [-r <columnTitle>]* [-quote] <csv.file>
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Text::CSV;

my $always_quote = 0;
my @remove;
if (!GetOptions('remove:s' => \@remove,
                'quote-always' => sub { $always_quote = 1; })) {
    die "$0: invalid option (use --remove [--quote-always])";
}

my @cols2remove;

sub filter(@) {
    my @fields = @_;
    my @r;
    my $i = 0;
    for my $c (@cols2remove) {
        my $p;
        #if ($i $i) {
            push(@r, splice(@fields, $i));
        }
    return @r;
}

# create just one of these
my $csvOut = new Text::CSV({always_quote => $always_quote});

sub printLine(@) {
    my @fields = @_;
    my $combined = $csvOut->combine(filter(@fields));
    my $str = $csvOut->string();
    if (length($str)) {
        print "$str\n";
    }
}

my $csv = Text::CSV->new();
my $od;
open($od, "| cat") || die "output: $!";
while () {
    $csv->parse($_);
    if ($. == 1) {
        my $failures = 0;
        my @cols = $csv->fields;
        for my $rm (@remove) {
            for (my $c = 0; $c $b} @cols2remove);
    }
    printLine($csv->fields);
}
exit(0);
My personal favorite way to work with CSV is the AnyData module. It seems to make things pretty straightforward, and dropping a named column can be done quite easily. Take a look at it on CPAN.
This answers a much broader question, but it seems like a relevant bit of information.
The Unix cut command can do what you want (and more). It has also been reimplemented in Perl.