Inconsistency between Linux grep results and Ruby scan results

I have a list of DNA sequences (one per line):

ACTGCTCGGGGG .....

CGCTCGCTTCTCTC ...

etc.

Most sequences contain two specific motifs, one near the beginning and one near the end. I extract the part of each sequence that spans the two motifs:

  • with grep: grep "motif1.*motif2" inputfile > outputfile

  • in Ruby, with a nil check, where sequences is an array of DNA sequences:

     sequences.each do |seq|
       # scan returns all matches as an array; [0] is the first match, or nil if there is none
       tmp = seq.scan(/motif1.*motif2/)[0]
       outputfile << tmp if tmp
     end

The problem is that I am getting a different number of extracted sequences. Why?



1 answer


Ruby's scan returns an array of the parts of the string that match the regexp. grep does not do this: it prints the entire line that contains a match (highlighting the matched part when --color is set to auto). To print only the matched parts, use -o:

grep -o "motif1.*motif2" inputfile > outputfile

This command should produce the same output as the scan loop.
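A minimal Ruby sketch of the difference, assuming placeholder motifs "motif1" and "motif2" and made-up sample lines (not the asker's data). It mimics plain grep by keeping whole matching lines, mimics grep -o by keeping only the matched text, and runs the question's first-match loop for comparison:

```ruby
# Hypothetical sample lines: "motif1" and "motif2" stand in for the real motifs.
lines = [
  "AAmotif1CCCmotif2GG",
  "TTmotif1AAA",            # no motif2, so no match at all
  "motif1Gmotif2Tmotif2C"
]
pattern = /motif1.*motif2/

# Like `grep "motif1.*motif2"`: keep each whole line that contains a match.
whole_lines = lines.select { |line| line.match?(pattern) }

# Like `grep -o "motif1.*motif2"`: keep only the matched text.
matched_parts = lines.flat_map { |line| line.scan(pattern) }

# The question's loop: first scan result of each line, skipping nils.
first_matches = lines.map { |line| line.scan(pattern)[0] }.compact

puts whole_lines     # full lines, flanking bases included
puts matched_parts   # greedy .* runs to the last motif2 on the line
p matched_parts == first_matches
```

In this sample the greedy .* makes each line yield at most one match, so the grep -o style output and the first-match loop agree, while the plain grep style output keeps the bases outside the motifs.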
