Converting FASTA to GenBank

Is there a way to use Biopython to convert FASTA files to GenBank format? There are many answers on how to convert from GenBank to FASTA, but not the other way around.

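Before converting, you must tell Biopython whether the sequences are DNA or protein, because the GenBank writer needs a molecule type for the LOCUS line. Older Biopython releases did this by assigning an alphabet (Bio.Alphabet.generic_dna or generic_protein). Bio.Alphabet was removed in Biopython 1.78, so current releases read a "molecule_type" annotation on each record instead (Python 3):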

from Bio import SeqIO

with open("test.fasta") as input_handle, open("test.gb", "w") as output_handle:
    sequences = list(SeqIO.parse(input_handle, "fasta"))

    # Assign "DNA" or "protein" as appropriate; Biopython >= 1.78 uses this
    # annotation instead of alphabets, and the GenBank writer requires it
    for record in sequences:
        record.annotations["molecule_type"] = "DNA"

    count = SeqIO.write(sequences, output_handle, "genbank")

print("Converted %i records" % count)
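On Biopython 1.78 or later, SeqIO.convert can do the same job in one call: its molecule_type argument sets the annotation on every record before writing. A minimal sketch, assuming all records are DNA; the demo.fasta/demo.gb names and the tiny demo record are placeholders so the snippet runs on its own.

```python
from Bio import SeqIO

# Placeholder input so the sketch runs standalone; use your own FASTA file instead.
with open("demo.fasta", "w") as handle:
    handle.write(">seq1 demo record\nACGTCATGAGAGTTTGATCATGG\n")

# molecule_type is applied to every record; use "protein" (or "RNA") if that fits your data.
count = SeqIO.convert("demo.fasta", "fasta", "demo.gb", "genbank", molecule_type="DNA")
print("Converted %i records" % count)
```

Like SeqIO.write, convert returns the number of records written.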

      

For this input:



>I28Q9A102FII8J rank = 0668881 x = 2144.0 y = 1105.0 length = 418
ACGTCATGAGAGTTTGATCATGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGATGAA
GCTCCAGCTTGCTGGGGTGGATTAGTGGCGAACGGGTGAGTAACACGTGAGTAACCTGCCCTTGACTCTGGGAT
AAGCGTTGGAAACGACGTCTAATACCGGATATGACGACCGATGGCATCATCTGGTTGTGGAAAGAATTTTGGTC
AAGGATGGACTCGCGGCCTATCAGGTAGTTGGTGAGGTAATGGCTCACCAAGCCTACGACGGGTAGCCGGCCTG
AGAGGGTGACCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCA
CAATGGGCGAAAGCCTGATGCAGCAACGCCGCGTGAGGGATGACGGCC
>I28Q9A102JMH72 rank = 0320459 x = 3829.0 y = 3120.0 length = 512
ACGTCATGAGAGTTTGATCCTGGCTCAGGATGAACGCTAGCGGCAGGCTTAACACATGCAAGTCGAGGGTAGAA
ATAGCTTGCTATTTTGAGACCGGCGCACGGGTGCGTAACGCGTATGCAATCTGCCTTTTACAGGGGAATAGCCC
AGAGAAATTTGGATTAATGCCCCATAGCGCTGCAGGGCGGCATCGCCGAGCAGCTAAAGTCACAACGGTAAAGA
TGAGCATGCGTCCCATTAGCTAGTTGGTAAGGTAACGGCTTACCAAGGCGATGATGGGTAGGGTCCTGAGAGGG
AGATCCCCCACACTGGTACTGAGACACGGACCAGACTCCTACGGGAGGCAGCAGTGAGGAATATTGGTCAATGG
GCGCAAGCCTGAACCAGCCATGCCGCGTGCAGGATGAAGGCCTTCGGGTTGTAAACTGCTTTTGACGGAACGAA
AAAGCT

You get:

LOCUS       I28Q9A102FII8J           418 bp    DNA              UNK 01-JAN-1980
DEFINITION  I28Q9A102FII8J rank = 0668881 x = 2144.0 y = 1105.0 length = 418
ACCESSION   I28Q9A102FII8J
VERSION     I28Q9A102FII8J
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            ...
FEATURES             Location/Qualifiers
ORIGIN
        1 acgtcatgag agtttgatca tggctcagga cgaacgctgg cggcgtgctt aacacatgca
       61 agtcgaacga tgaagctcca gcttgctggg gtggattagt ggcgaacggg tgagtaacac
      121 gtgagtaacc tgcccttgac tctgggataa gcgttggaaa cgacgtctaa taccggatat
      181 gacgaccgat ggcatcatct ggttgtggaa agaattttgg tcaaggatgg actcgcggcc
      241 tatcaggtag ttggtgaggt aatggctcac caagcctacg acgggtagcc ggcctgagag
      301 ggtgaccggc cacactggga ctgagacacg gcccagactc ctacgggagg cagcagtggg
      361 gaatattgca caatgggcga aagcctgatg cagcaacgcc gcgtgaggga tgacggcc
//
LOCUS       I28Q9A102JMH72           450 bp    DNA              UNK 01-JAN-1980
DEFINITION  I28Q9A102JMH72 rank = 0320459 x = 3829.0 y = 3120.0 length = 512
ACCESSION   I28Q9A102JMH72
VERSION     I28Q9A102JMH72
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            ...
FEATURES             Location/Qualifiers
ORIGIN
        1 acgtcatgag agtttgatcc tggctcagga tgaacgctag cggcaggctt aacacatgca
       61 agtcgagggt agaaatagct tgctattttg agaccggcgc acgggtgcgt aacgcgtatg
      121 caatctgcct tttacagggg aatagcccag agaaatttgg attaatgccc catagcgctg
      181 cagggcggca tcgccgagca gctaaagtca caacggtaaa gatgagcatg cgtcccatta
      241 gctagttggt aaggtaacgg cttaccaagg cgatgatggg tagggtcctg agagggagat
      301 cccccacact ggtactgaga cacggaccag actcctacgg gaggcagcag tgaggaatat
      361 tggtcaatgg gcgcaagcct gaaccagcca tgccgcgtgc aggatgaagg ccttcgggtt
      421 gtaaactgct tttgacggaa cgaaaaagct
//
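In the output above, LOCUS, ACCESSION and VERSION reuse the first word of the FASTA header (the record ID), while DEFINITION carries the whole header line. The numbers down the left of each ORIGIN block are 1-based positions: the sequence is printed in lowercase, 60 bases per line in groups of 10, and each line starts with the position of its first base. The plain-Python sketch below reproduces those two mappings to make the layout explicit. The function names split_fasta_header and format_origin are hypothetical; this is not Biopython's own writer code.

```python
def split_fasta_header(header):
    """Return (record_id, description) as they appear in the converted output."""
    title = header[1:].strip() if header.startswith(">") else header.strip()
    record_id = title.split(None, 1)[0]  # first whitespace-delimited word
    return record_id, title              # full title becomes DEFINITION


def format_origin(seq, line_width=60, block=10):
    """Lay out a sequence like a GenBank ORIGIN block."""
    seq = seq.lower()
    lines = ["ORIGIN"]
    for start in range(0, len(seq), line_width):
        chunk = seq[start:start + line_width]
        groups = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        # 1-based position of the line's first base, right-aligned in 9 columns
        lines.append(f"{start + 1:>9} " + " ".join(groups))
    lines.append("//")
    return "\n".join(lines)


rid, desc = split_fasta_header(
    ">I28Q9A102FII8J rank = 0668881 x = 2144.0 y = 1105.0 length = 418")
origin = format_origin(
    "ACGTCATGAGAGTTTGATCATGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGATGAA")
print(rid)
print(origin)
```

Running it on the first input line (74 bases) gives one full 60-base line starting at position 1 and a second line starting at 61, matching the start of the first record's ORIGIN block.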

Could you explain where the locations come from, given that no reference genome is involved?