Read lines from file, grep in second file and dump the file for each line $
I have the following two files:
sequences.txt
158333741 Acaryochloris_marina_MBIC11017_uid58167 158333741 432 1 432 COG0001 0
158339504 Acaryochloris_marina_MBIC11017_uid58167 158339504 491 1 491 COG0002 0
379012832 Acetobacterium_woodii_DSM_1030_uid88073 379012832 430 1 430 COG0001 0
302391336 Acetohalobium_arabaticum_DSM_5501_uid51423 302391336 441 1 441 COG0003 0
311103820 Achromobacter_xylosoxidans_A8_uid59899 311103820 425 1 425 COG0004 0
332795879 Acidianus_hospitalis_W1_uid66875 332795879 369 1 369 COG0005 0
332796307 Acidianus_hospitalis_W1_uid66875 332796307 416 1 416 COG0005 0
allids.txt
COG0001
COG0002
COG0003
COG0004
COG0005
Now I want to read each line in allids.txt, search sequences.txt for all matching lines (specifically in column 7), and write the matches for each line of allids.txt to a file with the filename $line.
My approach is to use a simple grep:
while read line; do
    grep "$line" sequences.txt
done <allids.txt
But where do I add the command that writes the output? If there is a faster way, feel free to suggest it!
My expected output:
COG0001.txt
158333741 Acaryochloris_marina_MBIC11017_uid58167 158333741 432 1 432 COG0001 0
379012832 Acetobacterium_woodii_DSM_1030_uid88073 379012832 430 1 430 COG0001 0
COG0002.txt
158339504 Acaryochloris_marina_MBIC11017_uid58167 158339504 491 1 491 COG0002 0
[and so on]
Extending your approach seems to have worked:
while read line; do
    # touching is not necessary as pointed out by @123
    # touch "$line.txt"
    grep "$line" sequences.txt > "$line.txt"
done <allids.txt
It creates text files with the required output, but I cannot comment on the efficiency of this approach.
EDIT:
As noted in the comments, this method is slow and breaks for any file that violates the unreasonable assumptions used in the answer. I'm leaving it here to show how a quick and hacky solution can backfire.
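For comparison, here is a minimal sketch of a single-pass alternative using awk. It is one possible approach, not necessarily what the commenters had in mind. The loop above rereads sequences.txt once per ID, and grep treats "$line" as a regular expression that may match anywhere in the line, so an ID could also match inside a longer ID or in another column. The sketch below assumes whitespace-separated fields with the ID in column 7, and the function name split_by_id is made up for illustration:

```shell
#!/bin/sh
# split_by_id IDS_FILE SEQ_FILE   (hypothetical helper name)
# Assumption: fields are whitespace-separated and the ID is in column 7.
# Writes each matching line of SEQ_FILE to "<ID>.txt" in the current directory.
split_by_id() {
    awk '
        # First file: remember every non-empty ID, one per line.
        NR == FNR { if (NF) ids[$1]; next }
        # Second file: exact string comparison on column 7 only.
        $7 in ids {
            out = $7 ".txt"
            print > out    # truncated on first open, then appended to
        }
    ' "$1" "$2"
}
```

It would be called as split_by_id allids.txt sequences.txt. Both files are read exactly once, and IDs without any matching line produce no output file (the grep loop creates an empty one). Some awk implementations limit how many output files can be open at the same time, so a very long ID list may need extra handling.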