Comma separator between JSON objects with json.dump

I was messing around with outputting a JSON file with some file attributes for a directory. My problem is that there is no delimiter between the objects when they are appended to the file. I could just add a comma after each f and remove the last one, but that seems like a messy job.

import os
import os.path
import json

#Create and open file_data.txt and append 
with open('file_data.txt', 'a') as outfile:

    files = os.listdir(os.curdir)


    for f in files:

        extension = os.path.splitext(f)[1][1:]
        base = os.path.splitext(f)[0]
        name = f

        data = {
            "file_name" : name,
            "extension" : extension,
            "base_name" : base
                }

        json.dump(data, outfile)

      

Output:

{"file_name": "contributors.txt", "base_name": "contributors", "extension": "txt"}{"file_name": "read_files.py", "base_name": "read_files", "extension": "py"}{"file_name": "file_data.txt", "base_name": "file_data", "extension": "txt"}{"file_name": ".git", "base_name": ".git", "extension": ""}

I would like the JSON:

{"file_name": "contributors.txt", "base_name": "contributors", "extension": "txt"},{"file_name": "read_files.py", "base_name": "read_files", "extension": "py"},{"file_name": "file_data.txt", "base_name": "file_data", "extension": "txt"},{"file_name": ".git", "base_name": ".git", "extension": ""}



1 answer


What you get is not a JSON object, but a stream of individual JSON objects.

What you're asking for is still not a JSON object, but a stream of separate JSON objects with commas between them. It won't be any more parseable. *

* The JSON spec is simple enough to parse by hand, and it should be pretty clear that an object, followed by a comma, followed by another object doesn't match any valid production.


If what you actually want is a JSON array, you can do that. The obvious way, if memory is not an issue, is to build a list of dicts and dump the whole thing at once:

output = []
for f in files:
    # ...
    output.append(data)
json.dump(output, outfile)
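As a sanity check, dumping a list this way produces one valid JSON array that loads back cleanly (the sample rows below are made up to match the question):

```python
import json

rows = [
    {"file_name": "contributors.txt", "base_name": "contributors", "extension": "txt"},
    {"file_name": "read_files.py", "base_name": "read_files", "extension": "py"},
]
text = json.dumps(rows)
# A single JSON array, bracketed and comma-separated, that round-trips:
assert text.startswith('[') and text.endswith(']')
assert json.loads(text) == rows
```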

      



If the problem is memory, you have several options:

  • For a quick and dirty solution, you can fake it by writing the [, the , separators, and the ] by hand. (But note that JSON does not allow a trailing comma after the last value, even though some decoders accept one.)
  • You can wrap your loop in a generator function that yields each data dict, and extend JSONEncoder to convert iterators to arrays. (This is actually used as an example in the docs on why and how to extend JSONEncoder, although you could write a more memory-efficient implementation.)
  • You can look for a third-party JSON library that has a built-in iterative streaming API.
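The first option can be sketched as follows (the function name is my own, purely for illustration):

```python
import io
import json

def dump_stream_as_array(dicts, outfile):
    # Write '[', the comma-separated objects, then ']' by hand,
    # so only one dict needs to be in memory at a time.
    outfile.write('[')
    first = True
    for d in dicts:
        if not first:
            outfile.write(',')
        json.dump(d, outfile)
        first = False
    outfile.write(']')

buf = io.StringIO()
dump_stream_as_array(({"n": i} for i in range(3)), buf)
print(buf.getvalue())  # [{"n": 0},{"n": 1},{"n": 2}]
```

Note that no comma is written after the last value, so the result is strictly valid JSON.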

However, it is worth considering what you are actually trying to do. Perhaps a stream of separate JSON objects is the correct file format / protocol / API for your use case. Since JSON is self-delimiting, there is no real reason to add a delimiter between individual values. (And it doesn't even help with robustness, unless you use a delimiter that can never appear in actual JSON, which , can.) For example, this is exactly what JSON-RPC looks like. If you are only asking for something different because you don't know how to parse such a file, it's pretty easy. For example (using a string s instead of a file, for simplicity):

def iter_json(s):
    d = json.JSONDecoder()
    i = 0
    while True:
        try:
            # raw_decode returns the next value and the index where it ended
            obj, i = d.raw_decode(s, i)
        except ValueError:
            return
        yield obj

# list(iter_json('{"a": 1}{"b": 2}')) gives [{'a': 1}, {'b': 2}]

      
