Should I interfere with normal Python garbage collection?

I have a large hierarchical dataset in Python. After I'm done with it, I need to get rid of it, so I just call del on the root node of the hierarchy.

Would it be worthwhile to also call gc.collect() manually? Is that good practice for freeing big data quickly, or should I leave it to Python?

What (if any) are the correct patterns for using the gc module manually?


2 answers


The CPython garbage collector is still primarily based on reference counting, so if your data structure really is hierarchical (contains no reference cycles), del on the last reference will free the whole thing immediately, and there is no need to use the gc module.
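
The one case where reference counting is not enough is a reference cycle, and cycles are also the main legitimate reason to call gc.collect() by hand. A minimal sketch (the Node class is purely illustrative; weakref lets us observe when an object is actually destroyed):

import gc
import weakref

class Node:
    def __init__(self):
        self.children = []

# An acyclic hierarchy: reference counting frees it on its own.
root = Node()
probe = weakref.ref(root)
del root
print(probe() is None)      # True - freed immediately, no gc needed

# A cycle: the refcount never reaches zero, so only the cycle
# collector can reclaim it.
node = Node()
node.children.append(node)  # self-reference
probe = weakref.ref(node)
del node
print(probe() is None)      # False - still in memory
gc.collect()                # manual collection breaks the cycle
print(probe() is None)      # True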

Cycles or not, though, I would recommend not using del at all. It is much more elegant to structure your functions so that the last reference to the data structure simply disappears when the last function that uses it returns:



def load():
    # Build and return the huge data structure (placeholder name).
    return some_huge_data_structure

def process(ds):
    # Work on the structure; no reference to it escapes this function.
    do_whatever_to(ds)

process(load())  # after this call returns, the huge DS is unreachable and freed



When CPython garbage collects something, it doesn't always return that memory back to the operating system.

CPython uses a layered small-object allocator built on "arenas" and "pools" (see e.g. http://www.evanjones.ca/memoryallocator/ ). Objects live inside these pools and arenas, and an arena's memory is returned to the OS only once every object in it has been freed.



This means that, in the worst case, 1000 live objects could pin down 250 MB of memory, simply because each object sits in its own arena, and an arena can be 256 KB large. In practice CPython allocates memory fairly cleverly, so this worst case (almost) never happens.

If you are constantly allocating and deallocating large numbers of objects of varying sizes, though, you can run into exactly this kind of memory fragmentation. In that case Python holds on to memory rather than returning it to the OS, and unfortunately there is not much you can do about it.
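
Here is a minimal sketch of the effect, assuming CPython (sys._debugmallocstats() is a private, CPython-only helper that dumps pymalloc arena and pool statistics to stderr; the exact arena size varies by version):

import sys

# Allocate a large number of small objects; pymalloc packs them
# into fixed-size arenas.
blocks = [bytearray(100) for _ in range(1_000_000)]

# Keep only every 1000th object. The survivors are scattered across
# many arenas, so those arenas cannot be returned to the OS even
# though 99.9% of the objects are gone.
survivors = blocks[::1000]
del blocks

sys._debugmallocstats()  # note how many arenas are still allocated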







