How to remove all whitespace except inside quotes in a large JSON file
I am currently working on a large JSON file and want to shrink it by removing any extra spaces, tabs, line breaks, etc. that are not inside quotes. The file contains about 100,000 lines and is difficult for other scripts to use. The file initially looks like this:
{
"path": "/math/",
"id": "math",
"title": "Math Title",
"icon_url": "/images/power-mode/badges/circles-40x40.png",
"contains": [
"Topic",
"Video",
"Exercise"
],
"children": [],
"parent_id": "root",
"ancestor_ids": [
"root"
],
"description": "null",
"kind": "Topic",
"h_position": -10,
"v_position": 6,
"slug": "math"
}
and I want it to look like this after removing unnecessary spaces, tabs, line breaks, etc.:
{"path":"/math/","id":"math","title":"Math Title","icon_url":"/images/power-mode/badges/circles-40x40.png",
"contains":["Topic","Video","Exercise"],"children":[],"parent_id":"root","ancestor_ids":["root"],
"description": "null","kind":"Topic","h_position":-10,"v_position":6,"slug":"math"}
Basically, every space should be removed except for those inside quotes.
You can parse the JSON in code and then write it back out in a compact format. Spaces inside quotes are preserved because they are part of the string values.
In Python you can use the built-in json module:
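import json

with open("input.json") as f:  # placeholder file names
    data = json.load(f)
with open("output.json", "w") as f:
    # the default separators add a space after "," and ":", so pass compact ones
    json.dump(data, f, separators=(",", ":"))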
Details are in the Python docs: https://docs.python.org/2/library/json.html
You should be able to use the same technique in any language that handles JSON.
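To check that whitespace inside strings survives a parse-and-dump round trip, here is a small sketch that works on text rather than files. The function name `minify_json_text` and the sample data are made up for illustration; `ensure_ascii=False` is an optional choice that keeps non-ASCII characters unescaped.

```python
import json

def minify_json_text(text):
    """Parse JSON text and re-serialize it with no whitespace between tokens."""
    data = json.loads(text)
    # separators=(",", ":") drops the spaces json.dumps adds by default;
    # ensure_ascii=False keeps non-ASCII characters instead of \uXXXX escapes.
    return json.dumps(data, separators=(",", ":"), ensure_ascii=False)

sample = '''
{
    "title": "Math Title",
    "contains": [ "Topic", "Video" ],
    "h_position": -10
}
'''
compact = minify_json_text(sample)
print(compact)
# prints {"title":"Math Title","contains":["Topic","Video"],"h_position":-10}
```

Note that this loads the whole document into memory, which is normally fine for a file of around 100,000 lines.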
Why not just run it through Perl?
perl -0pe 's#((^[^"]+")|("[^"]+$)|("[^"]+")|(^[^"]+$))#($x=$1)=~s/\s+/ /g;$x#ge'
- -0 sets the input record separator to the null character, so that while (<>) sees the whole file as one big line and runs of whitespace spanning several lines can be handled without leaving extra spaces.
- -p does the while (<>) and print bit for you.
- -e says this is the Perl code to run.
The code basically matches:
- Between the beginning of the line and the first quote.
- Between the last quote and the end of the line.
- Between two quotes, which, because the preceding match has already consumed the opening quote of each string, will only contain the text that is outside the quoted strings.
- Or strings without quotes at all.
And then it replaces all sets of one or more whitespace characters with a single space.
It is basically the "replace spaces only between quotes" approach with slight modifications.
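The Perl one-liner collapses whitespace outside quotes to a single space, and its `[^"]` character classes treat every quote as a delimiter, so escaped quotes inside strings are not accounted for. As a hedged alternative (a different technique, not a translation of the regex), the sketch below scans character by character, tracks whether it is inside a string, and drops whitespace outside strings entirely. The function name `strip_outside_quotes` and the sample text are illustrative only.

```python
def strip_outside_quotes(text):
    """Remove whitespace that is not inside a double-quoted JSON string."""
    out = []
    in_string = False  # currently between an opening and a closing quote
    escaped = False    # previous character was a backslash inside a string
    for ch in text:
        if in_string:
            out.append(ch)           # keep everything inside strings verbatim
            if escaped:
                escaped = False      # this character was escaped, e.g. \" or \\
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False    # unescaped quote closes the string
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch not in " \t\r\n":    # JSON's four whitespace characters
            out.append(ch)
    return "".join(out)

sample = '{ "title" : "Math  Title",\n\t"quote": "say \\"hi\\" now", "n": [ 1, 2 ] }'
print(strip_outside_quotes(sample))
# prints {"title":"Math  Title","quote":"say \"hi\" now","n":[1,2]}
```

Unlike the parse-and-dump approach, this does not validate the input, so malformed JSON passes through unchanged apart from the removed whitespace.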
You can use an online JSON minifier. Do a Google search for "JSON minifier".
This is what the JSON minifier returned to me:
{"path":"/math/","id":"math","title":"Math Title","icon_url":"/images/power-mode/badges/circles-40x40.png","contains":["Topic","Video","Exercise"],"children":[],"parent_id":"root","ancestor_ids":["root"],"description":"null","kind":"Topic","h_position":-10,"v_position":6,"slug":"math"}
You can see that it does not remove spaces inside quotes, for example in "Math Title".