Inconsistency between linux grep results and Ruby scan results

Question


I have a list of DNA sequences (one per line):

ACTGCTCGGGGG .....

CGCTCGCTTCTCTC ...

etc.

Most sequences contain two specific motifs, one near the beginning and one near the end. I extract the part of each sequence from the first motif through the second:

With grep: grep "motif1.*motif2" inputfile > outputfile

In Ruby, checking for a match before writing (sequences is an array of DNA sequences):

 sequences.each do |seq|
  tmp=seq.scan(/motif1.*motif2/)[0]
  outputfile << tmp if tmp
 end
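A slightly more compact sketch of the same Ruby loop, assuming hypothetical motifs ACTG and TTCT as stand-ins for motif1/motif2 and a small in-memory array instead of the real input file. String#[] with a regexp returns the first matching substring or nil, which is the same value as seq.scan(...)[0]; collecting the matches in an array also makes them easy to count.

```ruby
# Hypothetical stand-ins for motif1 and motif2; replace with the real motifs.
motif_re = /ACTG.*TTCT/

# Toy input, one DNA sequence per element (like the lines of inputfile).
sequences = [
  "GGACTGCCCTTCTAA",
  "CGCTCGCTTCTCTC",
  "ACTGTTCTGG"
]

# String#[] with a Regexp returns the first matching substring, or nil,
# i.e. the same value as seq.scan(motif_re)[0]; compact drops the nils.
extracted = sequences.map { |seq| seq[motif_re] }.compact

extracted.each { |m| puts m }   # one match per output line
puts "#{extracted.size} of #{sequences.size} sequences matched"
```

Writing with puts puts each match on its own line, so the output can be counted line by line in the same way as grep's output.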

The problem is that I am getting a different number of extracted sequences. Why?



kwicher May 29 '15 at 21:58


1 answer

ShellFish · Accepted Answer · 2015-05-29T22:11:31+0000

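Ruby's scan returns an array of the substrings that match the regexp, so scan(...)[0] is only the matched part of the sequence. grep doesn't do this: it prints the entire line that contains a match (with --color=auto the matched part is merely highlighted, but the whole line is still printed). To print only the matched parts from grep, use -o: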

grep -o "motif1.*motif2" inputfile > outputfile

This command should produce the same output as the Ruby scan version.
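The same distinction can be reproduced inside Ruby. A minimal sketch, again assuming hypothetical motifs ACTG and TTCT and a few toy lines: Enumerable#grep, like plain grep, keeps the whole line that contains a match, while scanning each line keeps only the matched substrings, like grep -o.

```ruby
# Hypothetical motifs standing in for motif1/motif2.
motif_re = /ACTG.*TTCT/

lines = ["GGACTGCCCTTCTAA", "CGCTCGCTTCTCTC", "ACTGTTCTGG"]

# Like `grep "motif1.*motif2"`: whole lines that contain a match.
whole_lines = lines.grep(motif_re)

# Like `grep -o "motif1.*motif2"`: only the matched substrings.
matched_parts = lines.flat_map { |line| line.scan(motif_re) }

p whole_lines    # => ["GGACTGCCCTTCTAA", "ACTGTTCTGG"]
p matched_parts  # => ["ACTGCCCTTCT", "ACTGTTCT"]
```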
