How to average the column values from a tab-delimited data file, ignoring the header row and left column?
First of all, I apologize if this or a similar question has been posted before. I have searched here and elsewhere without success, which is why I am resorting to asking, something I have rarely done.
Some background before my request: I am a final-year undergraduate biomedical student and I decided to take the Bioinformatics paper, a course that only started being offered at my university this year. I thought it would be a good change, but after two weeks of it I don't find it particularly appealing. It is difficult, yes, but not appealing, because I have never done any programming in my life and I am expected to pick it up very suddenly. So I come to you as a complete beginner. I admit that I have very little knowledge of how to do anything, and I really want to try and learn.
Anyway, on to my request...
My task is to compute averages from the following data file, called Lab1_table.txt:
retrovirus genome gag pol env
HIV-1 9181 1503 3006 2571
FIV 9474 1353 2993 2571
KoRV 8431 1566 3384 1980
GaLV 8088 1563 3498 2058
PERV 8072 1560 3621 1532
I need to write a script that will open the file, read it line by line, split the contents of each line into an array, compute the average of the numeric values in each column (genome, gag, pol, env), and write the average of each of those columns to a new file.
I've tried my best to figure out how to ignore the first row and the first column, but every time I try to run the script from the command line, I keep getting "explicit package name" errors:
Global symbol @average requires explicit package name at line 23.
Global symbol @average requires explicit package name at line 29.
Execution aborted due to compilation errors.
I understand that this has to do with @ and $, but even knowing that, I was unable to fix the errors.
This is my code, but I stress that I am a beginner who only started this last week:
#!/usr/bin/perl -w
use strict;
my $infile = "Lab1_table.txt"; # This is the file path
open INFILE, $infile or die "Can't open $infile: $!";
my $count = 0;
my $average = ();
while (<INFILE>) {
    chomp;
    my @columns = split /\t/;
    $count++;
    if ( $count == 1 ) {
        $average = @columns;
    }
    else {
        for( my $i = 1; $i < scalar $average; $i++ ) {
            $average[$i] += $columns[$i];
        }
    }
}
for( my $i = 1; $i < scalar $average; $i++ ) {
    print $average[$i]/$count, "\n";
}
I would appreciate any insight, and I would also greatly appreciate it if you could explain what you do at every step. I would like to learn, and it would make more sense to me if I could read what each part of the code is doing.
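Here are the points you need to change.
Use a separate variable for the header, and declare @average as an array rather than a scalar. $average[$i] refers to an element of the array @average, which you never declared, and that is why strict complains: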
my $count = 0;
my @header = ();
my @average = ();
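To illustrate why the declaration matters: in Perl, the scalar $average and the array @average are two unrelated variables, and $average[$i] always means an element of the array. The following is only a small sketch with made-up values (5, 10 and 20 are not from your data) to show the difference:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $average = 5;          # a scalar: holds a single value
my @average = (10, 20);   # an array: a separate variable that shares the name

# $average[1] reads element 1 of the ARRAY @average, not the scalar $average
my $element = $average[1];

# An array evaluated in scalar context gives its number of elements
my $size = scalar @average;

print "scalar: $average, element: $element, size: $size\n";
```

This prints "scalar: 5, element: 20, size: 2". In your script only the scalar was declared, so under use strict every use of $average[$i] failed to compile.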
Then change the logic inside the if statement so that the header line is stored in @header:
if ( $count == 1 ) {
    @header = @columns;
}
Now don't use @average in the loop condition of the else branch; use $i < scalar @columns instead. Initially @average is empty, so scalar @average is 0 and you would never get inside the for loop.
else {
    for( my $i = 1; $i < scalar @columns; $i++ ) {
        $average[$i] += $columns[$i];
    }
}
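If it helps to see the effect of the loop condition in isolation, here is a small sketch. The two counter variables are hypothetical names added only for this demonstration, and the row is the HIV-1 line from your file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @average = ();                                 # empty, so scalar @average is 0
my @columns = ('HIV-1', 9181, 1503, 3006, 2571);  # one data row, already split

# Condition based on the empty @average: the body never runs
my $runs_with_average = 0;
for ( my $i = 1; $i < scalar @average; $i++ ) {
    $runs_with_average++;
}

# Condition based on @columns: the body runs once per numeric column
my $runs_with_columns = 0;
for ( my $i = 1; $i < scalar @columns; $i++ ) {
    $average[$i] += $columns[$i];   # creates $average[$i] on first use
    $runs_with_columns++;
}

print "$runs_with_average $runs_with_columns\n";
```

This prints "0 4": the first loop is skipped entirely, while the second visits the four numeric columns and fills @average as it goes.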
Finally, subtract 1 from your counter when you divide. Remember that you also increment the counter when parsing the header line, so $count is one more than the number of data rows.
for( my $i = 1; $i < scalar @average; $i++ ) {
    print $average[$i]/($count-1), "\n";
}
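As a quick check of the arithmetic, here is a sketch that uses only the genome column from your file. The variable names $wrong and $right are just for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The genome column from Lab1_table.txt
my @genome = (9181, 9474, 8431, 8088, 8072);

my $sum = 0;
$sum += $_ for @genome;

my $count = 6;                     # 1 header line + 5 data lines
my $wrong = $sum / $count;         # divides by 6: too small
my $right = $sum / ($count - 1);   # divides by 5: the real average

print "sum=$sum right=$right\n";
```

The sum is 43246, so the correct genome average is 43246 / 5 = 8649.2. Dividing by $count without the -1 would divide by 6 and give a value that is too low.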
Here is the final code. You can also use @header to label the results neatly.
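#!/usr/bin/perl -w
use strict;
my $infile = "Lab1_table.txt"; # This is the file path
open INFILE, $infile or die "Can't open $infile: $!";
my $count = 0;      # number of lines read, including the header
my @header = ();    # column names from the first line
my @average = ();   # running sum of each numeric column
while (<INFILE>) {
    chomp;
    my @columns = split /\t/;
    $count++;
    if ( $count == 1 ) {
        @header = @columns;   # first line: remember the column names
    }
    else {
        # start at 1 to skip the left column (the virus name)
        for( my $i = 1; $i < scalar @columns; $i++ ) {
            $average[$i] += $columns[$i];
        }
    }
}
# divide by $count-1 because the header line was counted too
for( my $i = 1; $i < scalar @average; $i++ ) {
    print $average[$i]/($count-1), "\n";
}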
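Since your task also asks for the averages to be recorded in a new file, here is one possible sketch of that step, with @header used to label each line. Several things in it are my assumptions, not part of your assignment: the output name Lab1_averages.txt is made up, lexical filehandles ($in, $out) replace the bareword INFILE, and the sample data is embedded in the script so that it runs on its own. For the real task you would open Lab1_table.txt instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data copied from the question (tab-separated) so the sketch is
# self-contained; for the real task, open 'Lab1_table.txt' instead.
my $sample = join "\n",
    join("\t", qw(retrovirus genome gag pol env)),
    join("\t", qw(HIV-1 9181 1503 3006 2571)),
    join("\t", qw(FIV 9474 1353 2993 2571)),
    join("\t", qw(KoRV 8431 1566 3384 1980)),
    join("\t", qw(GaLV 8088 1563 3498 2058)),
    join("\t", qw(PERV 8072 1560 3621 1532)),
    '';
open my $in, '<', \$sample or die "Can't read sample data: $!";

my $outfile = 'Lab1_averages.txt';   # hypothetical output file name
open my $out, '>', $outfile or die "Can't open $outfile: $!";

my ( @header, @sum );
my $rows = 0;                        # counts data rows only, not the header
while ( my $line = <$in> ) {
    chomp $line;
    next if $line =~ /^\s*$/;        # skip blank lines
    my @columns = split /\t/, $line;
    if ( !@header ) {                # first non-blank line is the header
        @header = @columns;
        next;
    }
    $rows++;
    $sum[$_] += $columns[$_] for 1 .. $#columns;   # skip column 0 (names)
}
close $in;

die "No data rows found in input\n" unless $rows;

# Write one "name<TAB>average" line per numeric column
for my $i ( 1 .. $#sum ) {
    print {$out} "$header[$i]\t", $sum[$i] / $rows, "\n";
}
close $out or die "Can't close $outfile: $!";
```

With the data from the question, the output file should contain genome 8649.2, gag 1509, pol 3300.4 and env 2142.4, one per line. Because this version counts only data rows in $rows, no -1 correction is needed when dividing.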
There are other ways of writing this code, but I thought it would be best to fix your own code so that you can easily see what was wrong with it. Hope it helps.