Convert each dictionary value to utf-8 (dictionary comprehension?)
I have a dictionary and I want to convert each value to utf-8. This works, but is there a "more pythonic" way?
for key in row.keys():
    row[key] = unicode(row[key]).encode("utf-8")
For a list, I could write
[unicode(s).encode("utf-8") for s in row]
but I'm not sure how to do the equivalent thing for dictionaries.
This differs from Python Dictionary Comprehension because I am not trying to create a dictionary from scratch, but from an existing dictionary. The solutions to the linked question don't show me how to iterate over the key/value pairs of an existing dictionary and turn them into new key/value pairs for a new dictionary. The accepted answer below shows how to do this, and for someone with a task like mine it is much easier to read and understand than the answers to the related question, which are more complex.
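For reference, the dictionary counterpart of the list comprehension above is a dict comprehension over `items()`. This is a minimal sketch assuming Python 3, where `str` is already Unicode and takes the place of Python 2's `unicode`; the sample `row` is made up for illustration.

```python
# Hypothetical sample row; in Python 3 every str is already Unicode.
row = {"name": "Zoë", "city": "Zürich", "count": 3}

# Build a new dict with the same keys and UTF-8 encoded bytes as values.
# str() mirrors the unicode() call in the loop, so non-strings work too.
encoded = {key: str(value).encode("utf-8") for key, value in row.items()}

print(encoded)
# {'name': b'Zo\xc3\xab', 'city': b'Z\xc3\xbcrich', 'count': b'3'}
```

The original `row` is left untouched; assign the result back to `row` if you want to replace it. Dict comprehensions also exist in Python 2.7, where the same line would use `unicode(value)` and `row.iteritems()`.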
It depends on why you are explicitly encoding to UTF-8. If it is because you are writing to a file, the Pythonic way is to leave your strings as Unicode and encode them on output:
import io

# Keep the strings as Unicode in memory; io.open encodes them to UTF-8 on write.
with io.open("myfile.txt", "w", encoding="UTF-8") as my_file:
    for key, value in row.items():
        my_string = u"{key}: {value}\n".format(key=key, value=value)
        my_file.write(my_string)
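A minimal Python 3 sketch of the same idea, assuming a temporary file instead of `myfile.txt` and made-up sample data: the values stay as `str` in memory, and `open(..., encoding="utf-8")` does the encoding when writing. Reading the file back in binary mode shows that the bytes on disk are UTF-8.

```python
import os
import tempfile

row = {"name": "Zoë", "city": "Zürich"}  # hypothetical sample data

# Write to a throwaway file so the example doesn't clobber anything.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# Keep the strings as Unicode; the text-mode file encodes them on write.
# newline="\n" disables platform newline translation so the bytes are predictable.
with open(path, "w", encoding="utf-8", newline="\n") as my_file:
    for key, value in row.items():
        my_file.write(f"{key}: {value}\n")

# Read the raw bytes back to confirm what was stored, then clean up.
with open(path, "rb") as raw:
    data = raw.read()
os.remove(path)

print(data)  # b'name: Zo\xc3\xab\ncity: Z\xc3\xbcrich\n'
```

The encoding happens exactly once, at the file boundary, so the rest of the program never has to deal with byte strings.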
I had the same problem, so I wrote a simple function that encodes every Unicode string in a dict to UTF-8, recursing into nested dicts and lists (the current answer only handles a flat dict).
In case it helps anyone, here is the function:
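def utfy_dict(dic):
    # Encode unicode strings to UTF-8 byte strings.
    if isinstance(dic, unicode):
        return dic.encode("utf-8")
    # Dicts are updated in place, recursing into each value.
    elif isinstance(dic, dict):
        for key in dic:
            dic[key] = utfy_dict(dic[key])
        return dic
    # Lists are rebuilt with each element converted.
    elif isinstance(dic, list):
        new_l = []
        for e in dic:
            new_l.append(utfy_dict(e))
        return new_l
    # Anything else (ints, byte strings, None, ...) is returned unchanged.
    else:
        return dic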
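A Python 3 sketch of the same recursive idea, assuming you want bytes back: `str` takes the place of `unicode`, and unlike the function above it builds new containers instead of modifying the dict in place. The name `encode_nested` is only an illustrative choice, and, as in the original, only values (not keys) are encoded.

```python
def encode_nested(obj, encoding="utf-8"):
    """Return a copy of obj with every str encoded to bytes.

    Recurses into dict values and list items; anything else is
    returned unchanged.
    """
    if isinstance(obj, str):
        return obj.encode(encoding)
    if isinstance(obj, dict):
        return {key: encode_nested(value, encoding) for key, value in obj.items()}
    if isinstance(obj, list):
        return [encode_nested(item, encoding) for item in obj]
    return obj


data = {"name": "Zoë", "tags": ["café", 1], "meta": {"city": "Zürich"}}
print(encode_nested(data))
# {'name': b'Zo\xc3\xab', 'tags': [b'caf\xc3\xa9', 1], 'meta': {'city': b'Z\xc3\xbcrich'}}
```

Returning new containers means the caller's original data is never changed behind its back; tuples and sets would need their own branches if your data contains them.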
To convert a dictionary of byte-string values to ASCII, discarding any characters that can't be represented:
mydict = {k: unicode(v, errors='ignore').encode('ascii','ignore') for k,v in mydict.iteritems()}
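The same pattern converts to UTF-8. Note, though, that unicode(v, errors='ignore') with no encoding argument decodes using Python 2's default ASCII codec, so every non-ASCII byte is silently dropped. To keep those characters, pass the encoding the values are actually in ('latin-1' below is only an example):
mydict = {k: unicode(v, 'latin-1').encode('utf-8') for k, v in mydict.iteritems()}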
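To see what the `'ignore'` error handler actually does, here is a small Python 3 sketch, where `bytes.decode` replaces `unicode()` and `items()` replaces `iteritems()`. The sample dictionary and its Latin-1 encoding are assumptions for illustration.

```python
# Hypothetical input: byte strings in Latin-1, one containing a non-ASCII byte.
mydict = {"drink": "café".encode("latin-1"), "plain": b"tea"}  # b'caf\xe9'

# Decoding as ASCII with errors="ignore" silently discards the \xe9 byte.
ascii_dict = {
    k: v.decode("ascii", errors="ignore").encode("ascii", "ignore")
    for k, v in mydict.items()
}

# Decoding with the real source encoding keeps the character,
# which is then re-encoded as two UTF-8 bytes.
utf8_dict = {k: v.decode("latin-1").encode("utf-8") for k, v in mydict.items()}

print(ascii_dict)  # {'drink': b'caf', 'plain': b'tea'}
print(utf8_dict)   # {'drink': b'caf\xc3\xa9', 'plain': b'tea'}
```

`'ignore'` is convenient when lossy output is acceptable, but it hides data loss; the default `'strict'` handler raises an error instead, which makes a wrong encoding guess visible.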
For more information, read the Python Unicode documentation.