Python sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings

I am writing a script that recursively scans a directory and stores the results in a dictionary of lists, keyed by directory path. Each entry in one of those lists is itself a list holding a file name and a file size. The file name can contain UTF-8 characters, as shown below.

['test.rus (\xd0\xa5\xd0\xb5\xd0\xbb\xd1\x8c\xd1\x88\xd0\xb8).srt', 23930]
test.rus (Хельши).srt

      

When I try to insert this data into the database, I get the following error:

Traceback (most recent call last):
  File "filedup.py", line 267, in <module>
    read_file_directory(directory)
  File "filedup.py", line 118, in read_file_directory
    (values[i][0], each, values[i][1]))
sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

      

The function that performs this operation is given below:

from collections import defaultdict
dirDict = defaultdict(list)    
def read_file_directory(path):
    global dirDict
    logger.debug("Path being scanned %s" %path)
    fileStats = []
    for root, subFolders, files in os.walk(path):
        for file_name in files:
            fileStats = []
            fileStats.insert(0, file_name)
            fileSize = os.path.getsize(os.path.join(root,file_name))
            fileStats.insert(1, fileSize)
            dirDict[root].append(fileStats)
    #Insert the data in DB
    cursor = dbHandler.cursor()
    keys = dirDict.keys()
    for each in keys:
        values = dirDict[each]
        print values
        for i in xrange(len(values)):
            print values[i]
            print values[i][0]
            print values[i][1]
            fileName = values[i][0]
            fileSize = values[i][1]
            cursor.execute("insert or ignore into master \
                (FileName, FilePath, FileSize) values(?,?,?)", \
                (values[i][0], each, values[i][1]))
            logger.debug("Insert data for %s, %s, %s" %(values[i][0], each, values[i][1]))

      

I am still learning Python and do not understand how to fix this problem. The Python version I am using is shown below:

$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2

      

Any thoughts on how to fix this in my current version of Python? I am looking for a generic fix that will also work in later versions. I also noticed that, because of this error, none of the data is written to the database. How can I make sure that the rows inserted before a failure are still saved?

+3


2 answers


The sqlite3 exception itself recommends switching to Unicode strings, so you should do that.

Python's directory enumeration functions, such as os.walk, have an interesting property: they return normal (byte) strings when given a normal string, and Unicode strings when given a Unicode string. So when you call os.walk(path) as in your code, you have to make sure that path is a Unicode string.
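As a rough illustration of that property, the sketch below walks the same temporary directory twice, once with a byte-string path and once with a text path, and collects the file names. The file name example.txt and the helper walked_names are made up for the demonstration, and the code is written so that it should run on both Python 2.7 and Python 3.

```python
import os
import shutil
import sys
import tempfile

# Build a throwaway directory containing one file (hypothetical name).
root = tempfile.mkdtemp()
open(os.path.join(root, "example.txt"), "w").close()

# Derive a byte-string and a text-string form of the same path.
encoding = sys.getfilesystemencoding() or "utf-8"
if isinstance(root, bytes):      # Python 2: mkdtemp() returns a byte string
    root_bytes, root_text = root, root.decode(encoding)
else:                            # Python 3: mkdtemp() returns text
    root_bytes, root_text = root.encode(encoding), root


def walked_names(path):
    """Return every file name that os.walk yields under path."""
    return [name for _, _, files in os.walk(path) for name in files]


names_from_bytes = walked_names(root_bytes)  # byte strings
names_from_text = walked_names(root_text)    # Unicode (text) strings

shutil.rmtree(root_text)
```

The names collected from the byte-string path are byte strings, which is the kind of value sqlite3 rejects on Python 2, while the names collected from the text path are Unicode strings.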



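To do this, you can explicitly convert it with the unicode() function, for example by writing path = unicode(path) before calling os.walk. Note that unicode(path) decodes with the default ASCII codec, so if the path itself contains non-ASCII bytes you need to name the encoding, for example path.decode('utf-8').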

Also, you need to call commit() on the connection (dbHandler.commit() in your code; the cursor object has no commit() method) to actually write to the database. Calling it once after the end of the loop over all filenames should be sufficient.
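Putting the pieces together, here is a minimal sketch of how the insert loop could look. It is not the asker's actual program: the function name store_file_sizes, the table definition, and the file a.srt are assumptions for the demonstration. It makes sure the walked path is text, calls commit() on the connection after each directory, and skips a row that fails so that earlier rows are kept.

```python
import os
import shutil
import sqlite3
import sys
import tempfile


def store_file_sizes(connection, path):
    """Insert one (name, directory, size) row per file found under path."""
    # Walk with a text path so that os.walk yields text file names.
    if isinstance(path, bytes):
        path = path.decode(sys.getfilesystemencoding() or "utf-8")
    cursor = connection.cursor()
    for root, _, files in os.walk(path):
        for file_name in files:
            size = os.path.getsize(os.path.join(root, file_name))
            try:
                cursor.execute(
                    "insert or ignore into master "
                    "(FileName, FilePath, FileSize) values (?, ?, ?)",
                    (file_name, root, size))
            except sqlite3.Error:
                # Skip the row that failed; rows inserted so far are kept.
                continue
        # commit() belongs to the connection, not the cursor. Committing
        # once per directory saves earlier rows if a later step fails.
        connection.commit()


# Demonstration with an in-memory database and an assumed table layout.
demo_root = tempfile.mkdtemp()
with open(os.path.join(demo_root, "a.srt"), "w") as handle:
    handle.write("12345")

connection = sqlite3.connect(":memory:")
connection.execute(
    "create table master (FileName text, FilePath text, "
    "FileSize integer, unique (FileName, FilePath))")
store_file_sizes(connection, demo_root)
rows = connection.execute(
    "select FileName, FileSize from master").fetchall()

shutil.rmtree(demo_root)
```

Committing once at the very end, as suggested above, is also fine. Committing per directory trades a little speed for keeping more data when something goes wrong partway through.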

+4




Try changing the line:

fileStats.insert(0, file_name)

      



to

fileStats.insert(0, file_name.decode('utf8'))
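If some names may already be Unicode strings, a small guard avoids decoding them twice. The helper below is only a sketch, and its name to_text is made up. It assumes the file names are UTF-8 encoded, as in the question, and should behave the same on Python 2.7 and Python 3.

```python
def to_text(name, encoding="utf-8"):
    """Return name as a text (Unicode) string, decoding byte strings."""
    if isinstance(name, bytes):
        return name.decode(encoding)
    # Already text: return unchanged. On Python 2, calling .decode() on a
    # Unicode string with non-ASCII characters raises UnicodeEncodeError.
    return name


# The byte string from the question, as os.walk returned it on Python 2.
raw_name = b"test.rus (\xd0\xa5\xd0\xb5\xd0\xbb\xd1\x8c\xd1\x88\xd0\xb8).srt"
decoded = to_text(raw_name)
```

With this in place the changed line would read fileStats.insert(0, to_text(file_name)).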

      

+2

