Nested quotes in Perl system()

I am trying to modify a Perl script. Here's the part I'm trying to change:

Original:

        system ("tblastn -db $BLASTDB -query $TMP/prot$$.fa \\
             -word_size 6 -max_target_seqs 5 -seg yes -num_threads $THREADS -lcase_masking \\
             -outfmt \"7 sseqid sstart send sframe bitscore qseqid\"\\
             > $TMP/blast$$") && die "Can't run tblastn\n";

      

I am trying to replace the system ("tblastn .....") call with this:

system ("cat $TMP/prot$$.fa | parallel --block 50k --recstart '>' --pipe \\ tblastn -db $BLASTDB -query - -word_size 6 -outfmt \'7 sseqid sstart send sframe bitscore qseqid\' -max_target_seqs 5 -seg yes -lcase_masking > $TMP/blast$$") && die "Can't run tblastn\n";

      

This replaces the plain tblastn call with GNU parallel, which in turn runs tblastn. Running the above command directly in bash (with the temp input replaced by actual files) works fine, but when it is run from the Perl script, the tblastn error log says the command ended prematurely, right after sseqid. I get the same error if I run the command in bash without the escape characters.

Because of this, I am guessing that the error is related to the single quotes around "7 sseqid sstart ...", which are not being handled properly. I'm not sure how to nest quotes correctly in Perl. I thought I was doing it right, since it works in bash, but not through the Perl script. I've looked through a lot of Perl documentation, and everything says that the escape character \ should work with single or double quotes... but in my case it doesn't.

Can anyone point out why the quotes are not being processed?



2 answers


The problem here is almost certainly related to interpolation. Each time you "shell out", you strip off another layer of quotes. What happens to the text inside those quotes depends on whether they are double quotes ("), in which case it is interpolated, or single quotes ('), in which case it is taken literally, before it is passed on to the next shell.
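To make those layers concrete, here is a small, self-contained Perl model of what happens to the -outfmt argument. It is only an approximation under stated assumptions: Text::ParseWords::shellwords stands in for /bin/sh word splitting, and join(' ', ...) stands in for the way GNU parallel is assumed to glue its command arguments back into one line before handing it to a second shell. two_shell_layers is a made-up name for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# Simplified model of the quoting layers in the question's command.
# Assumptions: shellwords() approximates /bin/sh word splitting, and
# join(' ', ...) approximates how GNU parallel rebuilds its command line.
sub two_shell_layers {
    my ($fragment) = @_;
    my @after_first_shell = shellwords($fragment);        # sh started by system()
    my $parallel_line     = join ' ', @after_first_shell;  # parallel rebuilds one line
    return shellwords($parallel_line);                    # sh started by parallel
}

# "\'" in a double-quoted Perl string is just "'", so the Perl layer
# hands the single quotes through exactly as intended.
my $perl_string = "-outfmt \'7 sseqid sstart send sframe bitscore qseqid\'";
print "Perl produced: $perl_string\n";

# The quotes are used up by the first shell, so the second shell splits the format.
my @broken = two_shell_layers($perl_string);
print "tblastn would see ", scalar(@broken), " args: @broken\n";

# Escaping the quotes for the first shell leaves them intact for the second one.
my @fixed = two_shell_layers(q{-outfmt \''7 sseqid sstart send sframe bitscore qseqid'\'});
print "tblastn would see ", scalar(@fixed), " args: $fixed[0] | $fixed[1]\n";
```

In this model the question's version reaches tblastn as eight separate arguments (-outfmt 7 sseqid ...), which would be consistent with tblastn stopping right after sseqid, while the escaped version arrives as a single format argument.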

See perlop for how Perl handles quoting. I suggest building the command up piece by piece, like this:

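my $parallel = q{parallel --block 50k --recstart '>' --pipe};
# \' outside the inner quotes survives the shell started by system() as a
# literal ', so the shell that parallel starts still sees '7 sseqid ...' as one word.
my $outfmt = q{\''7 sseqid sstart send sframe bitscore qseqid'\'};

print $parallel,"\n";
print $outfmt,"\n";

my $command = "cat $TMP/prot$$.fa | $parallel tblastn -db $BLASTDB -query - -word_size 6 -outfmt $outfmt -max_target_seqs 5 -seg yes -lcase_masking > $TMP/blast$$";

print $command,"\n";
system($command) == 0 or die "Can't run tblastn\n";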

      

(And obviously, check what $command actually looks like right before you pass it to system.)
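One way to make that check a habit is a small wrapper. This is a hypothetical helper (run_logged is not from any library) that prints the exact string handed to the shell on STDERR and decodes $? as described in perldoc -f system. system returns 0 on success, which is also why the question writes system(...) && die. The demo runs perl itself ($^X) as the child so it works without tblastn installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: log the command, run it through the shell,
# and turn $? into a plain exit code.
sub run_logged {
    my ($command) = @_;
    print STDERR "Running: $command\n";
    my $rc = system($command);
    if ( $rc == -1 ) {
        die "Failed to start command: $!\n";
    }
    elsif ( $? & 127 ) {
        die sprintf "Command killed by signal %d\n", $? & 127;
    }
    return $? >> 8;    # 0 means success, just like the shell's $?
}

# Demo: perl stands in for tblastn so the example runs anywhere perl does.
my $status_ok   = run_logged(qq{"$^X" -e "exit 0"});
my $status_fail = run_logged(qq{"$^X" -e "exit 3"});
print "exit codes: $status_ok $status_fail\n";
```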

But can I suggest a different approach? Instead of nesting cat and parallel, how about doing it natively in Perl?



I'm not entirely familiar with the command you are running, but it would be something like this (using Parallel::ForkManager):

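#!/usr/bin/perl

use strict;
use warnings;
use Parallel::ForkManager;

# $TMP, $BLASTDB and $THREADS come from the rest of your script;
# they are taken from the environment here only so this compiles.
my $TMP     = $ENV{TMP};
my $BLASTDB = $ENV{BLASTDB};
my $THREADS = $ENV{THREADS} || 4;

my $parent_pid = $$;    # $$ changes inside each forked child
open( my $input, "<", "$TMP/prot$$.fa" ) or die $!;

my $fork_manager = Parallel::ForkManager->new($THREADS);

# -query needs a FASTA file, so give each child a whole record
# (header plus sequence lines) rather than a single line.
local $/ = "\n>";
my $n = 0;
while ( my $record = <$input> ) {
    $n++;
    $fork_manager->start and next;
    chomp $record;                                  # strip the trailing "\n>"
    $record = ">$record" unless $record =~ /^>/;    # restore the '>' eaten by the previous read
    open( my $query, ">", "$TMP/query$parent_pid.$n.fa" ) or die $!;
    print {$query} $record, "\n";
    close($query);
    # One output file per child, so the children don't overwrite each other
    system(
        "tblastn -db $BLASTDB -query $TMP/query$parent_pid.$n.fa \\
                 -word_size 6 -max_target_seqs 5 -seg yes -lcase_masking \\
                 -outfmt \"7 sseqid sstart send sframe bitscore qseqid\" \\
                 > $TMP/blast$parent_pid.$n"
    ) && die "Can't run tblastn\n";
    $fork_manager->finish;
}
close($input);
$fork_manager->wait_all_children;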
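Because tblastn's -query option takes a FASTA file, the work has to be split per record rather than per line. A minimal sketch of one way to do that, assuming well-formed FASTA where every record starts with '>': set the input record separator $/ to "\n>" so that each read returns a whole record. read_fasta_records is a hypothetical helper, and the demo reads from an in-memory string instead of $TMP/prot$$.fa.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return whole FASTA records (header + sequence lines).
sub read_fasta_records {
    my ($fh) = @_;
    my @records;
    local $/ = "\n>";    # a record ends where the next header begins
    while ( my $record = <$fh> ) {
        chomp $record;                                  # removes the trailing "\n>"
        $record = ">$record" unless $record =~ /^>/;    # restore the '>' eaten by the previous read
        $record .= "\n" unless $record =~ /\n\z/;
        push @records, $record;
    }
    return @records;
}

# Demo on an in-memory file handle
my $fasta = ">p1 first\nMKV\nLLA\n>p2 second\nMTT\n";
open( my $fh, "<", \$fasta ) or die $!;
my @records = read_fasta_records($fh);
close($fh);
print "record $_:\n$records[$_ - 1]" for 1 .. @records;
```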

      

If you want the results collated into one file, I would probably switch to using threads:

#!/usr/bin/perl

use strict;
use warnings;
use threads;
use Thread::Queue;
use IPC::Open2;

# $TMP and $BLASTDB come from the rest of your script;
# they are taken from the environment here only so this compiles.
my $TMP     = $ENV{TMP};
my $BLASTDB = $ENV{BLASTDB};

my $num_threads = 8;

my $work_q    = Thread::Queue->new();
my $results_q = Thread::Queue->new();

sub worker {
    my $pid = open2( my $blast_out, my $blast_in, "tblastn -db $BLASTDB -query - -word_size 6 -outfmt '7 sseqid sstart send sframe bitscore qseqid' -max_target_seqs 5 -seg yes -lcase_masking" );
    # Send this worker's share of FASTA records, then close its stdin so
    # tblastn sees EOF. Caveat: with very large inputs this can deadlock if
    # tblastn fills its output pipe before all input has been written.
    while ( defined( my $query = $work_q->dequeue ) ) {
        print {$blast_in} $query;
    }
    close($blast_in);
    while ( my $result = <$blast_out> ) {
        $results_q->enqueue($result);
    }
    close($blast_out);
    waitpid( $pid, 0 );
}

sub collate_results {
    open( my $output, ">", "$TMP/results.$$" ) or die $!;
    while ( defined( my $result = $results_q->dequeue ) ) {
        print {$output} $result;    # each line already ends in "\n"
    }
    close($output);
}

my @workers;
for ( 1 .. $num_threads ) {
    push( @workers, threads->create( \&worker ) );
}

my $collator = threads->create( \&collate_results );

open( my $input, "<", "$TMP/prot$$.fa" ) or die $!;
{
    local $/ = "\n>";    # one whole FASTA record per read, not one line
    while ( my $record = <$input> ) {
        chomp $record;                                  # strip the trailing "\n>"
        $record = ">$record" unless $record =~ /^>/;    # restore the '>' eaten by the previous read
        $work_q->enqueue("$record\n");
    }
}
close($input);
$work_q->end;

foreach my $thr (@workers) {
    $thr->join();
}
$results_q->end;

$collator->join;

      

I realize both of these look more confusing and complex. But they are examples of doing the parallel work in Perl itself, which gives you more power and flexibility than running Perl but shelling out to do the actual work.



Quoting is a bitch. Quoting twice is a bitch.

First, check that what you think is being run is what is actually being run. A well-placed print STDERR can work wonders.

In your case, I think this will solve it:

my $TMP = $ENV{'TMP'};
my $BLASTDB = $ENV{'BLASTDB'};
my $cmd = qq{cat $TMP/prot$$.fa | parallel --block 50k --recstart '>' --pipe tblastn -db $BLASTDB -query - -word_size 6 -outfmt \\''7 sseqid sstart send sframe bitscore qseqid'\\' -max_target_seqs 5 -seg yes -lcase_masking > $TMP/blast$$};
print STDERR $cmd,"\n"; # Remove this when it works.
system($cmd) && die "Can't run tblastn\n";
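The \''...'\' trick can also be generated instead of typed by hand. The sketch below uses a hypothetical shell_quote helper that applies standard POSIX single-quoting (every ' inside becomes '\'', i.e. close the quote, add an escaped ', reopen), once per shell the string has to cross. The result is checked with Text::ParseWords::shellwords, which only approximates sh but handles this case.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# Hypothetical helper: quote a string so that ONE POSIX shell turns it
# back into exactly that string. Inside '...' only ' itself is special.
sub shell_quote {
    my ($s) = @_;
    $s =~ s/'/'\\''/g;    # ' -> '\''
    return "'$s'";
}

my $outfmt = '7 sseqid sstart send sframe bitscore qseqid';

# Quote once per shell layer: system()'s sh, then the sh that parallel runs.
my $twice = shell_quote( shell_quote($outfmt) );
print "$twice\n";

# Rough check: two passes of shell-style word splitting recover the original.
my ($after_one) = shellwords($twice);
my ($after_two) = shellwords($after_one);
print "$after_one\n$after_two\n";
```

The generated string is longer than the hand-written one but should expand the same way, so it could be interpolated as -outfmt $twice inside the qq{} command.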

      



If you are only going to read $TMP/blast$$ and then delete it again, you can do this instead:

my $TMP = $ENV{'TMP'};
my $BLASTDB = $ENV{'BLASTDB'};
open(my $fh, "-|", qq{cat $TMP/prot$$.fa | parallel --block 50k --recstart '>' --pipe tblastn -db $BLASTDB -query - -word_size 6 -outfmt \\''7 sseqid sstart send sframe bitscore qseqid'\\' -max_target_seqs 5 -seg yes -lcase_masking}) || die "Can't run tblastn\n";
while(<$fh>) { ... }
close $fh or die "Can't run tblastn\n";

      

This avoids creating a temporary file, and if $TMP is writable by an attacker, it plugs a security hole as well. As an added bonus, you get data sooner, since you do not have to wait for all the jobs to finish.
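Where only a single program has to run, open (and system) also accept a list instead of a string. Perl then executes the program directly without /bin/sh, so that layer of quoting disappears entirely. This does not remove the shell that GNU parallel starts for its own jobs, so it does not replace the double quoting above. The demo uses perl itself ($^X) as a stand-in child that just reports the arguments it received; the '--' stops perl from treating -outfmt as one of its own switches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# List form: no shell involved, so an argument with spaces needs no quotes.
my @args = ( '-outfmt', '7 sseqid sstart send sframe bitscore qseqid' );
open( my $fh, "-|", $^X, '-e', 'print scalar(@ARGV), "\n"; print "$_\n" for @ARGV', '--', @args )
    or die "Can't start child: $!\n";
my @seen = <$fh>;
close($fh) or die "Child failed: exit status ", $? >> 8, "\n";
chomp @seen;
print "child got $seen[0] args: $seen[1] | $seen[2]\n";
```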
