Convert each dictionary value to utf-8 (dictionary comprehension?)

I have a dictionary and I want to convert each value to utf-8. This works, but is there a "more pythonic" way?

for key in row.keys():
    row[key] = unicode(row[key]).encode("utf-8")

      

For a list I could write

[unicode(s).encode("utf-8") for s in row]

      

but I'm not sure how to do the equivalent thing for dictionaries.

This differs from "Python Dictionary Comprehension" because I am not trying to create a dictionary from scratch, but from an existing dictionary. The solutions to the linked question don't show me how to iterate over the key/value pairs of an existing dictionary in order to turn them into new key/value pairs for a new dictionary. The accepted answer below shows how to do this, and for someone with a task like mine it is much easier to read and understand than the answers to the related, more complex question.

+5




5 answers


Use a dictionary comprehension. It looks like you are starting with a dictionary, so:

 mydict = {k: unicode(v).encode("utf-8") for k,v in mydict.iteritems()}

      



An example of a dictionary comprehension appears near the end of the linked section of the documentation.
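The comprehension above is Python 2 code. As a minimal sketch of the same idea on Python 3, assuming that `str` takes the place of `unicode` and `items()` the place of `iteritems()`, it could look like this. The `row` dictionary is hypothetical sample data, not something from the question.

```python
# Python 3 sketch: str is already Unicode text, and dict.items()
# replaces the Python 2 iteritems() method.
# `row` is hypothetical sample data used only for illustration.
row = {"name": "caf\u00e9", "count": 3}

# Build a new dict with the same keys and UTF-8 encoded byte values.
# str(v) turns non-string values (such as the int 3) into text first.
encoded = {k: str(v).encode("utf-8") for k, v in row.items()}

print(encoded)
```

The original `row` is left unchanged; the comprehension always produces a new dictionary, which can be assigned back to the same name if in-place replacement is wanted.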

+8




It depends on why you are explicitly encoding to UTF-8. If it is because you are writing to a file, the Pythonic way is to leave your strings as Unicode and encode on output:



import io

with io.open("myfile.txt", "w", encoding="UTF-8") as my_file:
    for (key, value) in row.items():
        my_string = u"{key}: {value}".format(key=key, value=value)
        my_file.write(my_string)
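To see that encoding really happens only at the file boundary, here is a hedged round-trip sketch for Python 3. The sample `row`, the temporary path, and the added newline per entry are assumptions made for illustration.

```python
import io
import os
import tempfile

# Hypothetical sample data; the values stay as ordinary Unicode strings.
row = {"name": "caf\u00e9", "city": "K\u00f6ln"}

# Write into a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

# The file object encodes to UTF-8 as the text is written.
with io.open(path, "w", encoding="utf-8") as my_file:
    for key, value in row.items():
        my_file.write("{key}: {value}\n".format(key=key, value=value))

# Reading in binary mode shows the UTF-8 bytes that ended up on disk.
with io.open(path, "rb") as raw_file:
    raw_bytes = raw_file.read()

# Reading in text mode decodes them back to Unicode strings.
with io.open(path, "r", encoding="utf-8") as my_file:
    lines = my_file.read().splitlines()
```

With this approach the dictionary itself never needs to hold encoded byte strings.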

      

+1




You can just iterate over the keys if you like:

{x:unicode(a[x]).encode("utf-8") for x in a.keys()}

      

0




Since I also had this problem, I wrote a very simple function that encodes the contents of any dict to UTF-8, including nested dicts and lists (the problem with the accepted answer is that it only handles a flat dict).

In case it helps anyone, here is the function:

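def utfy_dict(dic):
    """Recursively encode every unicode string inside dic to UTF-8 (Python 2)."""
    if isinstance(dic, unicode):
        return dic.encode("utf-8")
    elif isinstance(dic, dict):
        # Dicts are modified in place; keys are left as they are.
        for key in dic:
            dic[key] = utfy_dict(dic[key])
        return dic
    elif isinstance(dic, list):
        # Lists are rebuilt as new lists.
        return [utfy_dict(e) for e in dic]
    else:
        # Anything else (numbers, byte strings, None) passes through unchanged.
        return dic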
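The function above changes dictionaries in place. If that side effect is unwanted, a non-mutating variant is possible. The following is only a sketch for Python 3 (where `str` replaces `unicode`); the name `utfy_copy` and the sample data are hypothetical.

```python
def utfy_copy(obj):
    """Return a copy of obj with every str value encoded to UTF-8 bytes.

    Hypothetical helper for illustration; dict keys are left untouched.
    """
    if isinstance(obj, str):
        return obj.encode("utf-8")
    if isinstance(obj, dict):
        # Build a new dict instead of assigning into the existing one.
        return {key: utfy_copy(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [utfy_copy(item) for item in obj]
    # Numbers, bytes, None and other types pass through unchanged.
    return obj


# Hypothetical nested sample data.
nested = {"name": "caf\u00e9", "tags": ["na\u00efve", 7], "inner": {"x": "y"}}
result = utfy_copy(nested)
```

Because every container is rebuilt, `nested` still holds its original strings after the call.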

      

0




One way to convert a dictionary with non-ASCII values to ASCII, silently dropping every character that cannot be represented, is:

mydict = {k: unicode(v, errors='ignore').encode('ascii','ignore') for k,v in mydict.iteritems()} 

      

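A similar expression targets UTF-8. Note that `unicode(v, errors='ignore')` decodes byte strings with the default codec (ASCII unless it has been reconfigured), so non-ASCII bytes in `v` are dropped before the value is encoded: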

mydict = {k: unicode(v, errors='ignore').encode('utf-8','ignore') for k,v in mydict.iteritems()}

      

For more information, read the Python Unicode documentation.
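Because the `'ignore'` error handler is lossy, it can help to compare it with the alternatives before choosing it. A small sketch for Python 3, using a hypothetical sample string:

```python
# Hypothetical sample text containing one non-ASCII character (e with acute).
text = "caf\u00e9"

# 'ignore' silently drops characters the target encoding cannot represent.
ignored = text.encode("ascii", "ignore")

# 'replace' substitutes a question mark, so the loss stays visible.
replaced = text.encode("ascii", "replace")

# UTF-8 can represent every Unicode character, so nothing is lost.
utf8 = text.encode("utf-8")

# The default handler, 'strict', raises instead of losing data.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

Here `ignored` is `b'caf'`, `replaced` is `b'caf?'`, and `utf8` is `b'caf\xc3\xa9'`, which decodes back to the original text.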

-1








