Inconsistency between Linux grep results and Ruby scan results

I have a list of DNA sequences (one per line):




Most sequences contain two specific motifs, one towards the beginning and one towards the end. I extract the sequences in between:

  • with grep: grep "motif1.*motif2" inputfile > outputfile

  • in Ruby with scan, where sequences is an array of DNA sequences:

     sequences.each do |seq|
       tmp = seq.scan(/motif1.*motif2/).first
       outputfile << tmp if tmp
     end

The problem is that I am getting a different number of extracted sequences. Why?



1 answer

Ruby's scan returns an array containing the parts of the string that match the regexp. By default grep does not do this: it prints the entire line that contains a match (with the matched part highlighted if color is set to auto). To print only the matched parts, use -o:


grep -o "motif1.*motif2" inputfile > outputfile


The command above should produce the same output as scan does.
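
To see the difference between the two behaviours from within Ruby, here is a minimal sketch. The sample sequences and the stand-in motifs AAA and TTT are made up for illustration and are not taken from the question. Enumerable#grep keeps whole elements that match, much like plain grep keeps whole lines, while scan keeps only the matched text, much like grep -o.

```ruby
# Hypothetical sample data; "AAA" and "TTT" stand in for motif1 and motif2.
sequences = ["GGAAACCGTTTGG", "CCCCCCCC", "AAAGGTTTCCAAAGTTTC"]
pattern = /AAA.*TTT/

# Like plain grep: keep the whole sequence when it contains a match.
whole_lines = sequences.grep(pattern)

# Like grep -o: keep only the matched part of each sequence.
matched_parts = sequences.flat_map { |seq| seq.scan(pattern) }

puts whole_lines.inspect
puts matched_parts.inspect
```

Because .* is greedy, each sequence here yields a single match running from the first AAA to the last TTT, so both lists have the same length and differ only in how much of each sequence they keep.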


