Is it possible to detect corrupted Python dictionaries

I have a data file saved using the shelve module in Python 2.7 that is corrupted somehow. I can load it with db = shelve.open('file.db'), but when I call len(db) or even bool(db) it freezes, and I have to kill the process.

However, I can go through the whole thing and create a new, intact file:

import shelve

db = shelve.open('orig.db')
db2 = shelve.open('copy.db')
for k, v in db.items():
    db2[k] = v
db2.close() # copy.db will now be a fully working copy
db.close()

The question is, how can I check the dict and avoid freezing?
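One way to probe the file without freezing the calling process is to run the risky call in a subprocess and give up after a timeout. A minimal sketch of that idea (the path and timeout are placeholders, not anything from the original post):

```python
import multiprocessing
import shelve

def probe(path, conn):
    # Runs in a child process: if the shelve is corrupted, only the
    # child hangs, not the caller.
    db = shelve.open(path)
    conn.send(len(db))
    db.close()

def safe_len(path, timeout=10):
    # Returns len() of the shelve, or None if the check did not finish
    # within `timeout` seconds.
    recv_end, send_end = multiprocessing.Pipe(duplex=False)
    proc = multiprocessing.Process(target=probe, args=(path, send_end))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()
        proc.join()
        return None
    return recv_end.recv() if recv_end.poll() else None
```

A return value of None from safe_len('orig.db') then signals the hang without taking the parent process down with it.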

BTW, I still have the original file and it exhibits the same behavior when copied to other computers, in case someone also wants to help me figure out what is really wrong with the file in the first place!



1 answer


I don't know of any validation method other than dbm.whichdb(). To debug a possible pickle protocol mismatch, and to avoid sitting through a hang during the test, perhaps try:



import shelve
import pickle
import dbm
import multiprocessing
import time
import psutil

def protocol_check():
    # whichdb() identifies the underlying dbm flavour; it returns ''
    # or None if the file is not a recognised database format.
    print('orig.db is', dbm.whichdb('orig.db'))
    print('copy.db is', dbm.whichdb('copy.db'))
    for p in range(pickle.HIGHEST_PROTOCOL + 1):
        print('trying protocol', p)
        db = shelve.open('orig.db', protocol=p)
        db2 = shelve.open('copy.db')
        try:
            for k, v in db.items():
                db2[k] = v
        finally:
            db2.close()
            db.close()
        print('great success on', p)

def terminate(grace_period=2):
    # Ask the child processes to exit, then kill any that ignore the request.
    procs = psutil.Process().children()
    for p in procs:
        p.terminate()
    gone, still_alive = psutil.wait_procs(procs, timeout=grace_period)
    for p in still_alive:
        p.kill()

if __name__ == '__main__':  # required for multiprocessing on spawn platforms
    process = multiprocessing.Process(target=protocol_check)
    process.start()
    time.sleep(10)  # give the copy ten seconds before assuming it hung
    terminate()

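If pulling in psutil is not an option, the same watchdog can be built from the standard library alone using Process.join() with a timeout. This is a sketch under the assumption that the worker spawns no children of its own (plain terminate() does not reap grandchildren the way the psutil version above does):

```python
import multiprocessing

def run_with_timeout(target, args=(), timeout=10):
    # Start the worker, wait up to `timeout` seconds, then force it to stop.
    # Returns True if the worker finished on its own, False if it was killed.
    proc = multiprocessing.Process(target=target, args=args)
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()  # SIGTERM; does not touch any grandchildren
        proc.join()
        return False
    return True
```

For example, run_with_timeout(protocol_check, timeout=10) could replace the sleep/terminate pair in the answer's code.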