Python sqlite3.ProgrammingError: you shouldn't use 8-bit bytes unless you are using text_factory which can interpret 8-bit bytes
I am writing a script that recursively scans a directory and stores them in a dictionary, which is a collection of lists. This list at the edges contains a list that contains the file name and file size. This filename can contain UTF-8 characters as shown below.
['test.rus (\xd0\xa5\xd0\xb5\xd0\xbb\xd1\x8c\xd1\x88\xd0\xb8).srt', 23930] test.rus ().srt
Now while trying to insert this data into the database, I am getting an error like below
Traceback (most recent call last): File "filedup.py", line 267, in <module> read_file_directory(directory) File "filedup.py", line 118, in read_file_directory (values[i], each, values[i])) sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
The function performing this operation is given below
from collections import defaultdict dirDict = defaultdict(list) def read_file_directory(path): global dirDict logger.debug("Path being scanned %s" %path) fileStats =  for root, subFolders, files in os.walk(path): for file_name in files: fileStats =  fileStats.insert(0, file_name) fileSize = os.path.getsize(os.path.join(root,file_name)) fileStats.insert(1, fileSize) dirDict[root].append(fileStats) #Insert the data in DB cursor = dbHandler.cursor() keys = dirDict.keys() for each in keys: values = dirDict[each] print values for i in xrange(len(values)): print values[i] print values[i] print values[i] fileName = values[i] fileSize = values[i] cursor.execute("insert or ignore into master \ (FileName, FilePath, FileSize) values(?,?,?)", \ (values[i], each, values[i])) logger.debug("Insert data for %s, %s, %s" %(values[i], each, values[i]))
Now when I try to learn Python I don't understand how to fix this problem. The Python version used is given below
$ python Python 2.7.6 (default, Mar 22 2014, 22:59:56) [ ] on linux2
So, any thoughts on how to fix the current version of Python, as I am looking for a generic fix so that it can work even in higher versions. Also I noticed that due to this error, none of the data is being entered into the database. So how can I make sure that even if some operation fails, the previous data can be inserted into the database.
source to share
it is recommended to switch to unicode strings, so you should do that.
Python directory enumeration functions such as
have an interesting property ; they return normal strings given normal strings, and return Unicode strings given given Unicode strings. So when used
as in your code, you have to make sure it
is a unicode string.
To do this, you can explicitly convert to unicode using a function
, for example by writing
path = unicode(path)
Also, you need to call
in your code to actually write to the database. Calling it once after the end of the loop over all filenames should be sufficient.
source to share