Opening and editing multiple files in a folder using Python

I am trying to modify my .fasta files like this:

>YP_009208724.1 hypothetical protein ADP65_00072 [Achromobacter phage phiAxp-3]
MSNVLLKQ...

>YP_009220341.1 terminase large subunit [Achromobacter phage phiAxp-1]
MRTPSKSE...

>YP_009226430.1 DNA packaging protein [Achromobacter phage phiAxp-2]
MMNSDAVI...

      

into this:

>Achromobacter phage phiAxp-3
MSNVLLKQ...

>Achromobacter phage phiAxp-1
MRTPSKSE...

>Achromobacter phage phiAxp-2
MMNSDAVI...

      

Now I already have a script that can do this in a single file:

with open('Achromobacter.fasta', 'r') as fasta_file:
    out_file = open('./fastas3/Achromobacter.fasta', 'w')
    for line in fasta_file:
        line = line.rstrip()
        if '[' in line:
            line = line.split('[')[-1]
            out_file.write('>' + line[:-1] + "\n")
        else:
            out_file.write(str(line) + "\n")

      

but I cannot automate the process for all 120 files in my folder.

I tried using glob.glob, but I cannot get it to work:

import glob

for fasta_file in glob.glob('*.fasta'):
    outfile = open('./fastas3/'+fasta_file, 'w')
    with open(fasta_file, 'r'):
        for line in fasta_file:
            line = line.rstrip()
            if '[' in line:
                line2 = line.split('[')[-1]
                outfile.write('>' + line2[:-1] + "\n")
            else:
                outfile.write(str(line) + "\n")

      

It gives me this result:

A
c
i
n
e
t
o
b
a
c
t
e
r
.
f
a
s
t
a

      

I was able to get a list of all the .fasta files in a folder, but I cannot open the files using the names in the list.

import os


file_list = []
for file in os.listdir("./fastas2/"):
    if file.endswith(".fasta"):
        file_list.append(file)

      

python edit biopython




2 answers


Since you can already transform the contents of a single file, all that is left is to automate the process. I wrapped your single-file code in a function and dropped the second file handle, so each file is opened only once and rewritten in place:

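def file_changer(filename):
    """Rewrite the FASTA headers of filename in place."""
    data_to_put = ''
    with open(filename, 'r+') as fasta_file:
        for line in fasta_file:
            line = line.rstrip()
            if '[' in line:
                # Keep only the text inside the last [...] pair
                line = line.split('[')[-1]
                data_to_put += '>' + line[:-1] + "\n"
            else:
                data_to_put += line + "\n"
        # Rewind before writing; otherwise the new text is appended
        # after the original content instead of replacing it
        fasta_file.seek(0)
        fasta_file.write(data_to_put)
        fasta_file.truncate()  # drop leftover bytes of the longer original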

      



Now we need to iterate over all your files, so let's use the glob module for that:

import glob
for file in glob.glob('*.fasta'):
    file_changer(file)
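
If the files live in another folder, as with the ./fastas2/ listing in the question, keep in mind that os.listdir returns bare file names without the folder, so they cannot be opened from a different working directory until they are joined with the folder path. A minimal sketch of that step, where the helper name list_fasta_paths is hypothetical and not part of the original code:

```python
import os


def list_fasta_paths(folder):
    """Return full paths of the .fasta files in folder, sorted by name."""
    paths = []
    for name in sorted(os.listdir(folder)):
        if name.endswith('.fasta'):
            # os.listdir yields bare names; join them with the folder
            # so that open() can find the files from any working directory
            paths.append(os.path.join(folder, name))
    return paths
```

Each returned path could then be passed to file_changer. Calling glob.glob('./fastas2/*.fasta') should give equivalent paths, since glob keeps the folder part of the pattern in its results.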

      



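Your loop iterates over the file name, not the file. The statement with open(fasta_file, 'r'): never binds the opened file to a variable, so for line in fasta_file walks through the string and gives you the characters of the name instead of the lines of the file. Here is the corrected version of the code: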

import glob

for fasta_file_name in glob.glob('*.fasta'):
    with open(fasta_file_name, 'r') as fasta_file, \
            open('./fastas3/' + fasta_file_name, 'w') as outfile:
        for line in fasta_file:
            line = line.rstrip()
            if '[' in line:
                line2 = line.split('[')[-1]
                outfile.write('>' + line2[:-1] + "\n")
            else:
                outfile.write(str(line) + "\n")
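
The same loop can also be wrapped in small functions, which makes the header rule easy to check on its own. The sketch below is only one possible arrangement: the names shorten_header and convert_folder are hypothetical, the extra startswith('>') check is an assumption that only header lines should be touched, and the output folder is created if it does not exist yet.

```python
import glob
import os


def shorten_header(line):
    """Reduce a FASTA header to the name found in its last [...] pair."""
    line = line.rstrip()
    # Assumption: only lines starting with '>' are headers
    if line.startswith('>') and '[' in line:
        return '>' + line.split('[')[-1].rstrip(']')
    return line


def convert_folder(in_dir, out_dir):
    """Write a converted copy of every .fasta file in in_dir to out_dir."""
    os.makedirs(out_dir, exist_ok=True)  # create the output folder if needed
    written = []
    for in_path in sorted(glob.glob(os.path.join(in_dir, '*.fasta'))):
        out_path = os.path.join(out_dir, os.path.basename(in_path))
        with open(in_path, 'r') as fasta_file, open(out_path, 'w') as outfile:
            for line in fasta_file:
                outfile.write(shorten_header(line) + '\n')
        written.append(out_path)
    return written
```

For the layout in the question, this would be called as convert_folder('.', './fastas3').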

      

As an alternative to a Python script, you can simply use sed from the command line:



sed -i 's/^>.*\[\(.*\)\].*$/>\1/' *.fasta

      

This will change all files in place, so copy them first.
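
For comparison, the same substitution can be sketched in Python with the re module, which also makes it easy to keep a copy of each file before it is overwritten. The function names and the '.bak' suffix are assumptions chosen for illustration:

```python
import re
import shutil

# Same pattern as the sed expression, written in Python regex syntax
HEADER_RE = re.compile(r'^>.*\[(.*)\].*$')


def rewrite_header(line):
    """Replace a header like '>ID description [organism]' with '>organism'."""
    return HEADER_RE.sub(r'>\1', line)


def rewrite_in_place(path, backup_suffix='.bak'):
    """Copy path to path + backup_suffix, then rewrite the headers in path."""
    shutil.copy2(path, path + backup_suffix)  # keep the original first
    with open(path, 'r') as handle:
        lines = [rewrite_header(line) for line in handle]
    with open(path, 'w') as handle:
        handle.writelines(lines)
```

Lines that do not match the pattern, such as the sequence lines, pass through unchanged.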







