Using the tr/// operator to count letters in a string
I would like to count the number of A, C and G characters in a sequence (a string). I wrote the following code.
But when I print the values, only the A count is correct; C and G are displayed as zero. In the code below I count A first, but if I change the order and count C first, I get the correct C count, and A and G are printed as zero instead.
Can anyone tell me what is wrong with my code? Thanks!
#! /usr/bin/perl
use strict;
use warnings;
open(IN, "200BP_junctions_fasta.faa") or die "Cannot open the file: $!\n";
while(<IN>) {
next if $_ =~ /\>/;
my $a = ($_ = tr/A//);
my $c = ($_ = tr/C//);
my $g = ($_ = tr/G//);
print "A:$a, C:$c, G:$g\n";
}
The file looks like this:
> A_Seq
ATGCTAGCTAGCTAGCTAGTC
> B_Seq
ATGCGATCGATCGATCGATAG
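Because after the first statement $_ is '5' (for the first sequence line), which doesn't contain any 'C' or 'G'. With = you assign the return value of the transliteration (the number of characters it matched) to $_, overwriting the sequence. If you bind the operation to $_ with =~ instead ($_ =~ tr/A//), you get the desired result.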
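A minimal side-by-side sketch of the two forms, run on the first sequence line from the question. The sub names count_acg_assign and count_acg_bind are hypothetical, chosen only for this illustration; each works on a localized copy of its argument in $_.

```perl
use strict;
use warnings;

# Hypothetical helper reproducing the question's bug: "=" stores the
# count returned by tr/A// into $_, so $_ becomes "5" after the first line.
sub count_acg_assign {
    local $_ = shift;
    my $a_count = ($_ = tr/A//);    # $_ is now "5"
    my $c_count = ($_ = tr/C//);    # no "C" in "5", so 0
    my $g_count = ($_ = tr/G//);    # no "G" in "0", so 0
    return ($a_count, $c_count, $g_count);
}

# Hypothetical helper with the fix: "=~" binds tr/// to $_ and leaves
# the sequence itself unchanged between counts.
sub count_acg_bind {
    local $_ = shift;
    my $a_count = ($_ =~ tr/A//);
    my $c_count = ($_ =~ tr/C//);
    my $g_count = ($_ =~ tr/G//);
    return ($a_count, $c_count, $g_count);
}

my $seq = "ATGCTAGCTAGCTAGCTAGTC";
printf "assign: A:%d, C:%d, G:%d\n", count_acg_assign($seq);   # A:5, C:0, G:0
printf "bind:   A:%d, C:%d, G:%d\n", count_acg_bind($seq);     # A:5, C:5, G:5
```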
But you don't really need to bind to the default variable at all, since tr/// works on $_ unless told otherwise. Binding is for applying a match or a transliteration to some other variable. It's better to write:
my $a = tr/A//;
my $c = tr/C//;
my $g = tr/G//;
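For reference, a sketch of the whole corrected loop using the unbound form. It is a stand-in for the original script under a few assumptions: it reads the sample records from the question through an in-memory filehandle (open on a scalar reference, available since Perl 5.8) instead of 200BP_junctions_fasta.faa, and it skips header lines with /^>/.

```perl
use strict;
use warnings;

# Sample input copied from the question (FASTA headers start with '>').
my $fasta = <<'END';
> A_Seq
ATGCTAGCTAGCTAGCTAGTC
> B_Seq
ATGCGATCGATCGATCGATAG
END

# Read from an in-memory filehandle so the sketch runs without the
# question's input file.
open(my $in, '<', \$fasta) or die "Cannot open in-memory file: $!\n";

my @report;
while (<$in>) {
    next if /^>/;               # skip header lines
    my $a_count = tr/A//;       # tr/// counts in $_ and leaves it unchanged
    my $c_count = tr/C//;
    my $g_count = tr/G//;
    push @report, "A:$a_count, C:$c_count, G:$g_count";
}
close($in);

print "$_\n" for @report;       # A:5, C:5, G:5 then A:6, C:4, G:6
```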
But you can do this too:
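use feature 'say';                        # say needs this (or use v5.10 or later)
my %count = (A => 0, C => 0, G => 0);     # start at 0 so absent bases print as 0
$count{$_}++ foreach m/[ACG]/g;           # one increment per A, C or G matched in $_
say "A:$count{A}, C:$count{C}, G:$count{G}";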
The answer is that you need the binding operator =~ instead of the assignment operator =, or that you don't need to bind to the default variable at all.
Lately I've been using printf for things like this:
while( <DATA> ) {
next if /\>/;
printf "A:%s C:%s G:%s\n", tr/A//, tr/C//, tr/G//;
}
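I've often wanted tr/// to interpolate variables so that I could write this, which doesn't work (the transliteration table is built at compile time, so tr/$_// looks for the literal characters '$' and '_'):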
while( my $line = <DATA> ) {
next if $line =~ /\>/;
print "Line is $line\n";
printf "A:%s C:%s G:%s\n", map { $line =~ tr/$_// } qw(A C G);   # counts '$' and '_', not A/C/G
}
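A small sketch of what tr/$_// actually counts, since that is why the map version fails. The string 'A$C_G' is a made-up example chosen only because it contains the two characters tr/$_// really looks for.

```perl
use strict;
use warnings;

my $seq = "ATGCTAGCTAGCTAGCTAGTC";

# tr/// does not interpolate, so tr/$_// searches for the two literal
# characters '$' and '_', whatever $_ holds at run time.
my @counts = map { $seq =~ tr/$_// } qw(A C G);
print "@counts\n";    # 0 0 0

# Hypothetical string that does contain '$' and '_'
my $odd = 'A$C_G';
my $hits = ($odd =~ tr/$_//);
print "$hits\n";      # 2
```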
Note that if I used the default variable in the while loop, map would alias $_ to each of A, C and G in turn, so I'd lose easy access to the line, which is an unnecessary annoyance. I know I can do it with eval, but not only is that more of a hassle, it's also lame:
while( my $line = <DATA> ) {
next if $line =~ /\>/;
print "Line is $line\n";
printf "A:%s C:%s G:%s\n", map { eval "\$line =~ tr/$_//" } qw(A C G);
}
The caller doesn't need to know the implementation details, so I could move this into a sub until I figure out how to get rid of the eval, although the extra subroutine calls can slow things down when processing a lot of data:
while( my $line = <DATA> ) {
next if $line =~ /\>/;
print "Line is $line\n";
printf "A:%s C:%s G:%s\n", map { count_bases( $line, $_ ) } qw(A C G);
}
sub count_bases { eval "\$_[0] =~ tr/$_[1]//" }
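One common way to avoid the string eval, sketched here as an alternative rather than a tr/// trick: count the matches of the quoted base with a global match in list context. The sub name count_bases_re is hypothetical. It gives the same counts as tr/// for single characters, but a regex match may well be slower than tr/// on large inputs, so it is worth benchmarking before switching.

```perl
use strict;
use warnings;

# Hypothetical eval-free variant of count_bases: \Q...\E quotes $base so
# it is matched literally, and assigning the list of matches to the empty
# list () in scalar context yields the number of matches.
sub count_bases_re {
    my ($line, $base) = @_;
    my $count = () = $line =~ /\Q$base\E/g;
    return $count;
}

my $line = "ATGCGATCGATCGATCGATAG";
printf "A:%s C:%s G:%s\n", map { count_bases_re( $line, $_ ) } qw(A C G);
# prints A:6 C:4 G:6
```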
There's probably some clever way to do this with string XOR if you don't like tr///, but I've never chased it long enough to figure it out (not that it would be better than what you're already doing).
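One way the string-XOR idea could look, as a sketch rather than a recommendation: XOR the line with a same-length run of the base letter, so every matching position becomes a NUL byte, then count the NULs. The sub name count_base_xor is hypothetical; it assumes a single-character base, plain ASCII input, and that the 'bitwise' feature is not enabled (with it, string XOR is written ^. instead of ^).

```perl
use strict;
use warnings;

# Hypothetical helper: count occurrences of a single character $base in
# $line without tr/// or eval. Where $line has $base, the XOR of the two
# bytes is "\0"; everywhere else the result is a non-NUL byte.
sub count_base_xor {
    my ($line, $base) = @_;
    my $mask  = $base x length($line);   # e.g. "AAAA..." as long as $line
    my $diff  = $line ^ $mask;           # string XOR, byte by byte
    my $count = () = $diff =~ /\0/g;     # number of NUL bytes = matches
    return $count;
}

my $line = "ATGCTAGCTAGCTAGCTAGTC";
printf "A:%s C:%s G:%s\n", map { count_base_xor( $line, $_ ) } qw(A C G);
# prints A:5 C:5 G:5
```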