How to encode a unicode string (from JSON) to "utf-8" in Python?

I am creating a REST API using Flask (Python). One of the URLs (/uploads) accepts an HTTP POST request with a JSON body such as '{"src": "void", "settings": "my settings"}'. I can fetch each field individually and encode it into a byte string, which can then be hashed using hashlib. However, my goal is to take the entire object and encode it, something like myfile.encode('utf-8'). Printing myfile shows: {u'src': u'void', u'settings': u'my settings'}. Is there any way I can take the whole object as it is and encode it to UTF-8, so that the resulting bytes can be passed to hashlib.sha1(myfile.encode('utf-8'))? Let me know if you need more clarification. Thanks in advance.

fileSRC = request.json['src']
fileSettings = request.json['settings']

myfile = request.json
print myfile

# Hash the filename using sha1 from the hashlib library
guid_object = hashlib.sha1(fileSRC.encode('utf-8'))  # this works, but I want myfile to be encoded, not fileSRC
guid = guid_object.hexdigest()  # this works
print guid

      



1 answer


As you said in the comments, you solved the problem using:

jsonContent = json.dumps(request.json)
guid_object = hashlib.sha1(jsonContent.encode('utf-8'))

      

But it's important to understand why this works. Flask gives you unicode() for non-ASCII strings and str() for ASCII strings. Dumping the result with JSON will give you consistent results, as it abstracts away Python's internal representation as if you only had unicode().

Python 2

In Python 2 (the version of Python you are using) you don't need .encode('utf-8'), because the default value of ensure_ascii in json.dumps() is True. When you pass non-ASCII data to json.dumps(), it uses JSON escapes so that the output is pure ASCII: there is no need to encode it to UTF-8. Also, since the Zen of Python says "Explicit is better than implicit", you can specify it even though ensure_ascii is already True by default:



jsonContent = json.dumps(request.json, ensure_ascii=True)
guid_object = hashlib.sha1(jsonContent)
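
To see the escaping described above in action, here is a minimal sketch. The payload value is made up for illustration (it is not from the question), and the snippet is written so that it should run on both Python 2 and Python 3:

```python
import json

# Hypothetical payload containing a non-ASCII character (e with acute accent).
payload = {u"src": u"caf\u00e9"}

# ensure_ascii=True (the default) replaces the accented character
# with the JSON escape sequence \u00e9.
json_content = json.dumps(payload, ensure_ascii=True)

# Every character of the serialised result is plain ASCII.
is_ascii = all(ord(ch) < 128 for ch in json_content)
```

Because the serialised string contains only ASCII characters, encoding it as ASCII or as UTF-8 yields the same bytes.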

      

Python 3

However, in Python 3 this will no longer work. Indeed, json.dumps() returns str (Unicode text) in Python 3, even if everything in the string is ASCII. But hashlib.sha1 only works on bytes. You must make the conversion explicit, even if the ASCII encoding is all you need:

jsonContent = json.dumps(request.json, ensure_ascii=True)
guid_object = hashlib.sha1(jsonContent.encode('ascii'))
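
The Python 3 behaviour can be checked outside Flask with a small sketch. The helper name json_sha1 is hypothetical, the payload is the one from the question, and sort_keys=True is an addition that is not part of the original answer; it is included here on the assumption that two payloads with the same content but a different key order should produce the same hash:

```python
import hashlib
import json


def json_sha1(payload):
    """Return the SHA-1 hex digest of a JSON-serialisable object (hypothetical helper)."""
    # sort_keys=True is an assumption added for this sketch: it makes the
    # serialised text independent of the dictionary's key order.
    json_content = json.dumps(payload, ensure_ascii=True, sort_keys=True)
    # hashlib needs bytes on Python 3, so encode explicitly.
    return hashlib.sha1(json_content.encode("ascii")).hexdigest()


payload = {"src": "void", "settings": "my settings"}
guid = json_sha1(payload)

# Passing the str straight to hashlib raises TypeError on Python 3.
try:
    hashlib.sha1(json.dumps(payload))
    raised = False
except TypeError:
    raised = True
```

If the hash must match one computed by another system, drop sort_keys or make sure both sides serialise the JSON in exactly the same way, since any difference in key order or whitespace changes the digest.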

      

This is why Python 3 is the better language: it forces you to be explicit about the kind of text you are working with, be it str (Unicode) or bytes. This avoids many, many problems in the future.
