How to remove comment lines from a JSON file in Python

I am getting a JSON file with the following format:

// 20170407
// http://info.employeeportal.org

{
 "EmployeeDataList": [
{
 "EmployeeCode": "200005ABH9",
 "Skill": CT70,
 "Sales": 0.0,
 "LostSales": 1010.4
} 
 ]
} 

      

I need to remove the comment lines at the top of the file before parsing it.

I've tried with the following code:

import json
import commentjson

with open('EmployeeDataList.json') as json_data:
    employee_data = json.load(json_data)
    # employee_data = json.dump(json.load(json_data))
    # employee_data = commentjson.load(json_data)
    print(employee_data)

      

I am still unable to remove the comments from the file and load the JSON correctly.

Where am I going wrong? Any direction in this regard is highly appreciated. Thanks in advance.

+3



6 answers


You are not using commentjson correctly. It has the same interface as the json module:

import commentjson

with open('EmployeeDataList.json', 'r') as handle:
    employee_data = commentjson.load(handle)

print(employee_data)

      

However, since your comments are simple enough in this case, you probably don't need to install an additional module to remove them:



import json

with open('EmployeeDataList.json', 'r') as handle:
    fixed_json = ''.join(line for line in handle if not line.startswith('//'))
    employee_data = json.loads(fixed_json)

print(employee_data)

      

Note that the difference between the two code snippets is that json.loads is used instead of json.load, since you are processing a string instead of a file object.
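
To make the json.load / json.loads distinction concrete, here is a minimal sketch. The sample text is made up for illustration, and io.StringIO only stands in for an open file handle:

```python
import io
import json

text = '{"Sales": 0.0, "LostSales": 1010.4}'

# json.loads parses a string that is already in memory.
from_string = json.loads(text)

# json.load reads from a file-like object; io.StringIO stands in for an open file.
from_file = json.load(io.StringIO(text))

# Both calls produce the same dictionary.
print(from_string == from_file)
```

Once the comment lines have been filtered out you are holding a string, which is why json.loads is the call to use.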

+2



If it's the same number of comment lines every time, you can simply skip them:

with open('EmployeeDataList.NOTjson', "r") as fh:
    rawText = fh.read()
# Split off the first three lines and keep everything after them.
json_data = rawText.split("\n", 3)[3]

      



So json_data is now a string of text without the first three lines.
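
A fixed-length header can also be skipped without reading the whole file into one string first. The following is only a sketch: the helper name load_json_skipping_header and its header_lines parameter are hypothetical, and the sample data is a shortened version of the file in the question, wrapped in io.StringIO in place of a real file:

```python
import io
import itertools
import json


def load_json_skipping_header(handle, header_lines=3):
    """Parse JSON from handle after discarding the first header_lines lines."""
    # islice yields the lines of the handle starting at index header_lines.
    body = itertools.islice(handle, header_lines, None)
    return json.loads("".join(body))


# Stand-in for the file: two comment lines, one blank line, then the JSON body.
sample = io.StringIO(
    '// 20170407\n'
    '// http://info.employeeportal.org\n'
    '\n'
    '{"EmployeeDataList": [{"EmployeeCode": "200005ABH9", "Sales": 0.0}]}\n'
)

employee_data = load_json_skipping_header(sample)
print(employee_data["EmployeeDataList"][0]["EmployeeCode"])
```

This only works when the number of header lines never changes. If it can vary, filtering on the // prefix is safer.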

0



The thing is that this is not valid JSON, so just open it as a text document and delete everything from // to \n.

with open("EmployeeDataList.json", "r") as rf:
    with open("output.json", "w") as wf:
        for line in rf.readlines():
            if line[0:2] == "//":
                continue
            wf.write(line)

      

0



Try JSON-minify:

JSON-minify minifies blocks of JSON-like content into valid JSON, removing all whitespace and JS-style comments (single-line // and multi-line /* .. */).
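
To illustrate what string-aware comment removal involves, here is a rough standard-library sketch. It is not JSON-minify's own code: strip_json_comments is a hypothetical helper, it leaves whitespace in place, and the sample input is invented for the example:

```python
import json


def strip_json_comments(text):
    """Remove // and /* */ comments, leaving string literals untouched."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character too, so \" does not end the string.
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Single-line comment: skip to the end of the line, keep the newline.
            end = text.find("\n", i)
            i = n if end == -1 else end
        elif text.startswith("/*", i):
            # Block comment: skip past the closing */ (or to the end if unterminated).
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


sample = '''// 20170407
{
  /* block comment */
  "Url": "http://info.employeeportal.org", // trailing comment
  "Sales": 0.0
}
'''

data = json.loads(strip_json_comments(sample))
print(data)
```

Tracking whether the scanner is inside a string is what keeps the // in the URL value from being treated as a comment.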

0



Your file can be parsed using HOCON.

pip install pyhocon

>>> from pyhocon import ConfigFactory
>>> conf = ConfigFactory.parse_file('data.txt')
>>> conf
ConfigTree([('EmployeeDataList',
             [ConfigTree([('EmployeeCode', '200005ABH9'),
                          ('Skill', 'CT70'),
                          ('Sales', 0.0),
                          ('LostSales', 1010.4)])])])

      

0



I usually read JSON as a regular file, remove the comments and then parse it as a JSON string. This can be done in one line with the following snippet:

import json
with open(path, 'r') as f: jsonDict = json.loads(''.join(row for row in f if len(row.split('//')) == 1))

      

IMHO this is very handy because it doesn't require CommentJSON or any other non-standard library.
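
One caveat with filtering on '//' anywhere in a row: a row whose value contains a URL is dropped as well, and the result may still parse, so the loss is silent. The sketch below compares that filter with a prefix check; the sample rows and variable names are made up for illustration:

```python
import json

sample_lines = [
    '// 20170407\n',
    '{\n',
    '  "Portal": "http://info.employeeportal.org",\n',
    '  "Sales": 0.0\n',
    '}\n',
]

# Filter from the one-liner: drops every row that contains "//" anywhere.
contains_filter = [row for row in sample_lines if len(row.split('//')) == 1]

# Alternative: drop only rows whose first non-blank characters are "//".
prefix_filter = [row for row in sample_lines if not row.lstrip().startswith('//')]

# The "Portal" row does not survive the first filter, but survives the second.
portal_kept_by_contains = any('Portal' in row for row in contains_filter)
data = json.loads(''.join(prefix_filter))
print(portal_kept_by_contains, data['Portal'])
```

The prefix check still cannot handle a comment that trails a value on the same row; that needs a scanner that knows when it is inside a string.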

0

