Opening and editing multiple files in a folder using python
I am trying to modify my .fasta files from this:
>YP_009208724.1 hypothetical protein ADP65_00072 [Achromobacter phage phiAxp-3]
MSNVLLKQ...
>YP_009220341.1 terminase large subunit [Achromobacter phage phiAxp-1]
MRTPSKSE...
>YP_009226430.1 DNA packaging protein [Achromobacter phage phiAxp-2]
MMNSDAVI...
to this:
>Achromobacter phage phiAxp-3
MSNVLLKQ...
>Achromobacter phage phiAxp-1
MRTPSKSE...
>Achromobacter phage phiAxp-2
MMNSDAVI...
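The transformation above comes down to one rule per header line. Here is a minimal sketch of that rule, assuming every header ends with the organism name in square brackets. The regular expression and the helper name rename_header are illustrative only, not part of the original question:

```python
import re

# Hypothetical helper: a header such as
# ">YP_009208724.1 hypothetical protein ADP65_00072 [Achromobacter phage phiAxp-3]"
# becomes ">Achromobacter phage phiAxp-3". Lines without a bracketed
# name at the end (e.g. sequence lines) are returned unchanged.
HEADER_RE = re.compile(r'^>.*\[([^\]]*)\]\s*$')

def rename_header(line: str) -> str:
    """Return the rewritten line without its trailing newline."""
    line = line.rstrip('\n')
    match = HEADER_RE.match(line)
    if match:
        return '>' + match.group(1)
    return line

print(rename_header('>YP_009208724.1 hypothetical protein ADP65_00072 '
                    '[Achromobacter phage phiAxp-3]'))
# prints ">Achromobacter phage phiAxp-3"
```

Sequence lines pass through unchanged, so the helper can be applied to every line of a file.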
I already have a script that does this for a single file:
with open('Achromobacter.fasta', 'r') as fasta_file, \
        open('./fastas3/Achromobacter.fasta', 'w') as out_file:
    for line in fasta_file:
        line = line.rstrip()
        if '[' in line:
            # keep the text after '[' and drop the closing ']'
            line = line.split('[')[-1]
            out_file.write('>' + line[:-1] + "\n")
        else:
            out_file.write(str(line) + "\n")
But I cannot automate the process for all 120 files in my folder.
I tried using glob.glob, but I cannot get it to work:
import glob
for fasta_file in glob.glob('*.fasta'):
    outfile = open('./fastas3/'+fasta_file, 'w')
    with open(fasta_file, 'r'):
        for line in fasta_file:
            line = line.rstrip()
            if '[' in line:
                line2 = line.split('[')[-1]
                outfile.write('>' + line2[:-1] + "\n")
            else:
                outfile.write(str(line) + "\n")
The output file ends up containing this, one character per line:
A
c
i
n
e
t
o
b
a
c
t
e
r
.
f
a
s
t
a
I was able to get a list of all the .fasta files in a folder, but I cannot open the files using the names in that list:
import os
file_list = []
for file in os.listdir("./fastas2/"):
    if file.endswith(".fasta"):
        file_list.append(file)
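One likely reason the listed names cannot be opened: os.listdir() returns bare file names, not paths, so open() looks for them in the current working directory rather than in ./fastas2/. Below is a sketch that joins each name with its folder before opening it; list_fasta_files and count_headers are hypothetical helpers written for illustration:

```python
import os

def list_fasta_files(folder):
    """Return the full paths of the .fasta files in folder, sorted by name."""
    paths = []
    for name in sorted(os.listdir(folder)):
        if name.endswith('.fasta'):
            # os.listdir() yields bare names like 'Achromobacter.fasta';
            # join them with the folder so open() finds the file no matter
            # which directory the script is run from.
            paths.append(os.path.join(folder, name))
    return paths

def count_headers(path):
    """Open the file by its path and count its '>' header lines."""
    # Iterate over the file object, not over the path string.
    with open(path) as fasta_file:
        return sum(1 for line in fasta_file if line.startswith('>'))
```

For example, list_fasta_files('./fastas2/') would return paths such as ./fastas2/Achromobacter.fasta, which open() accepts from any working directory.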
Since you can already change the contents of a single file, all that is left is to automate the process. We turned your single-file code into a function that opens the file only once, reads it, and writes the converted text back into the same file.
def file_changer(filename):
    data_to_put = ''
    with open(filename, 'r+') as fasta_file:
        for line in fasta_file.readlines():
            line = line.rstrip()
            if '[' in line:
                line = line.split('[')[-1]
                data_to_put += '>' + line[:-1] + "\n"
            else:
                data_to_put += line + "\n"
        # rewind before writing; otherwise the new text is appended
        # after the old contents instead of replacing them
        fasta_file.seek(0)
        fasta_file.write(data_to_put)
        fasta_file.truncate()
Now we need to iterate over all your files, so let's use the glob module for that:
import glob
for file in glob.glob('*.fasta'):
    file_changer(file)
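Note that file_changer overwrites the original files. If you would rather keep them and write the results to a separate folder, as the question does with ./fastas3, a pathlib-based variant could look like the sketch below. convert_folder is a hypothetical name, and the header rule assumes the organism name is the last bracketed part of the line:

```python
from pathlib import Path

def convert_folder(src_dir, dst_dir):
    """Rewrite every *.fasta in src_dir into dst_dir, keeping only the
    bracketed organism name on header lines. Returns the written paths."""
    src = Path(src_dir)
    dst = Path(dst_dir)
    # open(..., 'w') fails if the output folder is missing, so create it
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for in_path in sorted(src.glob('*.fasta')):
        out_path = dst / in_path.name
        with in_path.open() as fasta_file, out_path.open('w') as out_file:
            for line in fasta_file:
                line = line.rstrip()
                if line.startswith('>') and '[' in line:
                    # text between the last '[' and the following ']'
                    name = line.rsplit('[', 1)[1].split(']', 1)[0]
                    out_file.write('>' + name + '\n')
                else:
                    out_file.write(line + '\n')
        written.append(out_path)
    return written
```

Calling convert_folder('.', './fastas3') would process the .fasta files in the current directory and create ./fastas3 if it does not exist yet, leaving the originals untouched.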
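Your loop iterates over the file name, not the file: with open(fasta_file, 'r'): opens the file but never binds the file object to a name, so for line in fasta_file: walks over the characters of the string fasta_file instead of the lines of the file. Here is the corrected version of the code: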
import glob
for fasta_file_name in glob.glob('*.fasta'):
    # the ./fastas3 folder must already exist
    with open(fasta_file_name, 'r') as fasta_file, \
            open('./fastas3/' + fasta_file_name, 'w') as outfile:
        for line in fasta_file:
            line = line.rstrip()
            if '[' in line:
                line2 = line.split('[')[-1]
                outfile.write('>' + line2[:-1] + "\n")
            else:
                outfile.write(str(line) + "\n")
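The difference is easy to see in isolation. This small sketch uses io.StringIO as a stand-in for an opened file, so it runs without any files on disk:

```python
import io

name = 'Acinetobacter.fasta'
# Iterating over a str yields one character at a time. This is what the
# question's loop did, because fasta_file was still the file name.
chars = [c for c in name]
print(chars[:4])   # prints ['A', 'c', 'i', 'n']

# Iterating over a file object yields whole lines instead.
fake_file = io.StringIO('>YP_009220341.1 terminase large subunit '
                        '[Achromobacter phage phiAxp-1]\nMRTPSKSE\n')
lines = [line for line in fake_file]
print(len(lines))  # prints 2
```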
As an alternative to a Python script, you can simply use sed from the command line:
sed -i 's/^>.*\[\(.*\)\].*$/>\1/' *.fasta
This changes all files in place, so make a backup copy first. (This is GNU sed syntax; BSD/macOS sed needs -i '' instead of -i.)
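If you want to check what the sed expression does before letting it rewrite your files, the same pattern can be tried in Python with re.sub. This is a sketch that mirrors the sed regular expression on a string; sed_like is a hypothetical helper name, and it does not edit any files:

```python
import re

# Same pattern as the sed command: a line starting with '>' that contains
# '[...]' is replaced by '>' followed by the bracketed text. MULTILINE makes
# ^ and $ match at every line of the text, like sed's line-by-line processing.
SED_LIKE = re.compile(r'^>.*\[(.*)\].*$', re.MULTILINE)

def sed_like(text):
    """Apply the sed substitution to a whole FASTA text and return it."""
    return SED_LIKE.sub(r'>\1', text)
```

For example, sed_like(open('Achromobacter.fasta').read()) returns the converted text so you can inspect it before running sed -i.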