How to average the column values from a tab-delimited data file, ignoring the header row and left column?

First of all, I apologize if this or a similar question has been asked before. I did search here and elsewhere first, and I am only asking because I could not find an answer; it is something I have rarely done.

Some background before my request: I am a final-year undergraduate biomedical student, and I decided to take the Bioinformatics paper, a course that my university only started offering this year. I thought it would be a good change, but after two weeks I don't find it particularly appealing. It is difficult, yes, but not appealing, because I have never done any programming in my life and I am expected to pick it up very suddenly. So please treat everything I present as coming from a complete beginner. I admit I know very little about how to do this, but I really want to try and learn.

Anyway, on to my request.

My task is to compute averages from the following data file, called Lab1_table.txt:

retrovirus      genome  gag     pol     env
HIV-1           9181    1503    3006    2571
FIV             9474    1353    2993    2571
KoRV            8431    1566    3384    1980
GaLV            8088    1563    3498    2058
PERV            8072    1560    3621    1532

      

I need to write a script that opens and reads the file line by line, splits each line into an array, computes the average of the numeric columns (genome, gag, pol, env), and writes the average of each of those columns to a new file.

I've tried my best to work out how to ignore the first row and the first column, but every time I run the script from the command line, I keep getting "explicit package name" errors:

Global symbol @average requires explicit package name at line 23.
Global symbol @average requires explicit package name at line 29.
Execution aborted due to compilation errors.

I understand that this has something to do with the @ and $ sigils, but even knowing that, I was unable to get rid of the errors.

This is my code. Again, I stress that I am a beginner who only started last week:

#!/usr/bin/perl -w
use strict;

my $infile = "Lab1_table.txt"; # This is the file path
open INFILE, $infile or die "Can't open $infile: $!";

my $count = 0;
my $average = ();

while (<INFILE>) {
    chomp;
    my @columns = split /\t/;
    $count++;
    if ( $count == 1 ) {
        $average = @columns;
    }
    else {
        for( my $i = 1; $i < scalar $average; $i++ )  {
            $average[$i] += $columns[$i];
        }
    }
}

for( my $i = 1; $i < scalar $average; $i++ ) {
    print $average[$i]/$count, "\n";
}

      

I would appreciate any insight, and I would be especially grateful if you could list what you change at each step and why. I want to understand it, and it would make much more sense to me if I could read through how someone else handles the problem.



1 answer


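Here are the points you need to change.

The errors come from the fact that $average and @average are two different variables in Perl. my $average declares a scalar, but $average[$i] refers to an element of the array @average, which was never declared, so use strict refuses to compile it. Declare @average as an array, and use a separate variable for the header row: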

my $count = 0;
my @header = ();
my @average = ();
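To see why the original code failed to compile, here is a small hypothetical script (the values are made up for illustration) showing that a scalar and an array with the same name are completely separate variables, and that $average[1] reads an element of the array:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A scalar and an array may share a name; Perl treats them as two variables.
my $average = 'I am the scalar $average';
my @average = (undef, 10, 20);   # index 0 left unused, like the lab script

# $average[1] means "element 1 of the ARRAY @average", not the scalar.
print "$average\n";        # prints the scalar's string
print "$average[1]\n";     # prints element 1 of the array
```

If you delete the my @average line, use strict rejects the $average[1] line with the same "Global symbol" error as in the question, even though my $average is still declared.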

      

Then change the logic inside the if statement so that the first line is stored as the header:

if ( $count == 1 ) {
    @header = @columns;
}
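Because split /\t/ turns the header line into a list, @header ends up holding the column names, with the virus-name column at index 0. That is why the loops start at $i = 1. A minimal sketch, using the first line of the table typed in by hand (with \t written for the tabs), to show the indices:

```perl
use strict;
use warnings;

# The first line of Lab1_table.txt, with the tabs written as \t.
my $line = "retrovirus\tgenome\tgag\tpol\tenv";
my @header = split /\t/, $line;

# Index 0 is the name column; the numeric columns are 1..4.
for my $i (0 .. $#header) {
    print "$i: $header[$i]\n";
}
```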

      

Next, in the else branch, don't use @average for the loop bound; use $i < scalar @columns instead. @average starts out empty, so scalar @average is 0 and you would never get inside the for loop.

else {
    for( my $i = 1; $i < scalar @columns; $i++ )  {
        $average[$i] += $columns[$i];
    }
}
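A quick way to check the point about the loop bound: an empty array evaluated with scalar gives 0, so a loop guarded by $i < scalar @average never runs, while one guarded by the current row's @columns does. A hypothetical sketch using the HIV-1 row:

```perl
use strict;
use warnings;

my @average = ();                                  # empty, as before any data is summed
my @columns = ('HIV-1', 9181, 1503, 3006, 2571);   # one data row after split

# Count how often each version of the loop body would run.
my $iterations_with_average = 0;
for (my $i = 1; $i < scalar @average; $i++) { $iterations_with_average++ }

my $iterations_with_columns = 0;
for (my $i = 1; $i < scalar @columns; $i++) { $iterations_with_columns++ }

print "scalar \@average = ", scalar @average, "\n";
print "loop bounded by \@average ran $iterations_with_average times\n";
print "loop bounded by \@columns ran $iterations_with_columns times\n";
```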

      



Finally, subtract 1 from your counter when dividing. Remember that you also incremented the counter when reading the header line, so $count is one more than the number of data rows:

for( my $i = 1; $i < scalar @average; $i++ ) {
    print $average[$i]/($count-1), "\n";
}
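To make the -1 concrete with the numbers from Lab1_table.txt: the file has 6 lines (1 header and 5 data rows), so $count ends at 6. Worked by hand for the genome column, assuming the file contains no extra blank lines:

```latex
\bar{x}_{\text{genome}} = \frac{9181 + 9474 + 8431 + 8088 + 8072}{6 - 1} = \frac{43246}{5} = 8649.2
```

Dividing by 6 instead would give about 7207.67, which is too small because the header line was counted as if it were data.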

      

Here is the final code. You can also use @header to label the results neatly:

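#!/usr/bin/perl -w

use strict;

my $infile = "Lab1_table.txt"; # This is the file path
open(my $in, '<', $infile) or die "Can't open $infile: $!";   # three-argument open, lexical filehandle

my $count = 0;      # lines read so far, including the header
my @header = ();    # column names from the first line
my @average = ();   # running totals for each numeric column

while (<$in>) {
    chomp;
    my @columns = split /\t/;
    $count++;
    if ( $count == 1 ) {
        @header = @columns;     # first line: keep the column names
    }
    else {
        # start at 1 to skip the virus-name column
        for( my $i = 1; $i < scalar @columns; $i++ )  {
            $average[$i] += $columns[$i];
        }
    }
}
close $in;

# $count includes the header line, so there are $count - 1 data rows
for( my $i = 1; $i < scalar @average; $i++ ) {
    print $average[$i]/($count-1), "\n";
}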
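Following the suggestion to use @header, here is a sketch that labels each average and also writes the results to a new file, as the assignment asks. Assumptions: the output file name Lab1_averages.txt is hypothetical, and so that the sketch runs on its own it reads the table from an in-memory string holding the same rows as Lab1_table.txt; with the real data you would open $infile as in the final code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same rows as Lab1_table.txt, joined with real tabs so the sketch is self-contained.
my $table = join '', map { join("\t", @$_) . "\n" } (
    [qw(retrovirus genome gag pol env)],
    [qw(HIV-1 9181 1503 3006 2571)],
    [qw(FIV   9474 1353 2993 2571)],
    [qw(KoRV  8431 1566 3384 1980)],
    [qw(GaLV  8088 1563 3498 2058)],
    [qw(PERV  8072 1560 3621 1532)],
);

# Read the string as if it were a file (for the real data: open my $in, '<', $infile).
open(my $in, '<', \$table) or die "Can't open in-memory table: $!";

my $count = 0;
my @header = ();
my @average = ();

while (<$in>) {
    chomp;
    my @columns = split /\t/;
    $count++;
    if ($count == 1) {
        @header = @columns;
    }
    else {
        for (my $i = 1; $i < scalar @columns; $i++) {
            $average[$i] += $columns[$i];
        }
    }
}
close $in;

# Hypothetical output file name; change it to whatever your lab asks for.
my $outfile = 'Lab1_averages.txt';
open(my $out, '>', $outfile) or die "Can't write $outfile: $!";
for (my $i = 1; $i < scalar @average; $i++) {
    $average[$i] /= ($count - 1);                 # turn each sum into a mean
    print "$header[$i]\t$average[$i]\n";          # show it on screen
    print {$out} "$header[$i]\t$average[$i]\n";   # and save it to the new file
}
close $out;
```

Each output line pairs a column name from @header with its average, separated by a tab, so the new file stays tab-delimited like the input.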

      

There are other ways of writing this code, but I thought it would be best to fix your own code so that you can easily see what was wrong with it. Hope it helps.
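As one example of those other ways, here is a hypothetical variant that reads the header line once before the loop, so no line counter or -1 correction is needed, and that also skips blank lines (for instance a trailing empty line at the end of the file). It again uses an in-memory copy of the table so it runs on its own; with the real data you would open Lab1_table.txt instead.

```perl
use strict;
use warnings;

my $table = "retrovirus\tgenome\tgag\tpol\tenv\n"
          . "HIV-1\t9181\t1503\t3006\t2571\n"
          . "FIV\t9474\t1353\t2993\t2571\n"
          . "KoRV\t8431\t1566\t3384\t1980\n"
          . "GaLV\t8088\t1563\t3498\t2058\n"
          . "PERV\t8072\t1560\t3621\t1532\n"
          . "\n";                       # a trailing blank line, to show it is skipped

open(my $in, '<', \$table) or die "Can't open in-memory table: $!";

# Read the header line by itself before the loop.
my $header_line = <$in>;
chomp $header_line;
my @header = split /\t/, $header_line;

my @sum;
my $rows = 0;                           # counts data rows only
while (my $line = <$in>) {
    chomp $line;
    next if $line =~ /^\s*$/;           # ignore blank lines
    my ($name, @values) = split /\t/, $line;   # separate the name column
    $sum[$_] += $values[$_] for 0 .. $#values;
    $rows++;
}
close $in;

# @values starts at genome, so header names are offset by one.
my @mean = map { $_ / $rows } @sum;
for my $i (0 .. $#mean) {
    print "$header[$i + 1]\t$mean[$i]\n";
}
```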
