Inconsistency between Linux grep results and Ruby scan results
I have a list of DNA sequences (one per line):
ACTGCTCGGGGG .....
CGCTCGCTTCTCTC ...
etc.
Most sequences contain two specific motifs, one towards the beginning and one towards the end. I extract the sequences in between:
- with grep:
grep "motif1.*motif2" inputfile > outputfile
- in Ruby, with verification, where sequences is an array of DNA sequences:
sequences.each do |seq|
  tmp = seq.scan(/motif1.*motif2/)[0]
  outputfile << tmp if tmp
end
The problem is that I am getting a different number of extracted sequences. Why?
Ruby's scan returns an array containing only the parts of the string that match the regexp. grep doesn't do this by default: it prints the entire line containing a match (with the match highlighted if color is set to auto). To get only the matched parts from grep, use -o.
grep -o "motif1.*motif2" inputfile > outputfile
The command above should produce the same output as the Ruby scan version.
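To illustrate the difference between printing whole lines and printing only the matched parts, here is a minimal Ruby sketch. The sample sequences and the motifs AAA and TTT are made up for illustration and stand in for motif1 and motif2; they are not taken from the question's data.

```ruby
# Hypothetical sample data: AAA and TTT stand in for motif1 and motif2.
sequences = ["ACAAAGGGTTTCC", "TTTTTTTT", "GGAAACCCTTTAA"]
pattern = /AAA.*TTT/

# Comparable to plain grep: keep each whole line that contains a match.
whole_lines = sequences.grep(pattern)

# Comparable to grep -o: keep only the matched part of each line.
# String#[] with a regexp returns the first match, or nil if there is none.
matched_parts = sequences.map { |seq| seq[pattern] }.compact

puts whole_lines
puts matched_parts
```

In this sketch both approaches select the same lines; only what is kept from each line differs (the full line versus the span from the first motif to the second).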