Python: OOP overhead?
I was working on a real-time application and noticed that some OOP design patterns create incredible overhead in Python (tested with 2.7.5).
To keep it simple: why does a plain dictionary value access take almost 5 times as long when the dictionary is encapsulated by another object?
For example, by running the code below, I got:
Dict Access: 0.167706012726
Attribute Access: 0.191128969193
Method Wrapper Access: 0.711422920227
Property Wrapper Access: 0.932291030884
Executable code:
class Wrapper(object):
    def __init__(self, data):
        self._data = data

    @property
    def id(self):
        return self._data['id']

    @property
    def name(self):
        return self._data['name']

    @property
    def score(self):
        return self._data['score']


class MethodWrapper(object):
    def __init__(self, data):
        self._data = data

    def id(self):
        return self._data['id']

    def name(self):
        return self._data['name']

    def score(self):
        return self._data['score']


class Raw(object):
    def __init__(self, id, name, score):
        self.id = id
        self.name = name
        self.score = score


data = {'id': 1234, 'name': 'john', 'score': 90}
wp = Wrapper(data)
mwp = MethodWrapper(data)
obj = Raw(data['id'], data['name'], data['score'])


def dict_access():
    for _ in xrange(100):
        uid = data['id']
        name = data['name']
        score = data['score']


def method_wrapper_access():
    for _ in xrange(100):
        uid = mwp.id()
        name = mwp.name()
        score = mwp.score()


def property_wrapper_access():
    for _ in xrange(100):
        uid = wp.id
        name = wp.name
        score = wp.score


def object_access():
    for _ in xrange(100):
        uid = obj.id
        name = obj.name
        score = obj.score


import timeit
print 'Dict Access:', timeit.timeit("dict_access()", setup="from __main__ import dict_access", number=10000)
print 'Attribute Access:', timeit.timeit("object_access()", setup="from __main__ import object_access", number=10000)
print 'Method Wrapper Access:', timeit.timeit("method_wrapper_access()", setup="from __main__ import method_wrapper_access", number=10000)
print 'Property Wrapper Access:', timeit.timeit("property_wrapper_access()", setup="from __main__ import property_wrapper_access", number=10000)
This is due to the dynamic lookups that the Python interpreter (CPython) performs to dispatch all your calls, indexing, etc. Dynamic lookups give the language a lot of flexibility, but at the cost of performance. When you use the "Method Wrapper", this happens (at least):
- look up mwp.id: it is a method, but it is also just an object assigned to an attribute, and it has to be looked up like any other
- call mwp.id()
- inside the method, look up self._data
- look up __getitem__ in self._data
- call __getitem__ (this will at least be a C function, but you still had to go through all those dynamic lookups to get here)
By comparison, the "Dict Access" test case only needs to look up __getitem__ on the dictionary and then call it.
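One rough way to see the extra work is to compare the bytecode of the two access paths. The sketch below is only illustrative: it assumes CPython 3.4 or later (dis.get_instructions does not exist in 2.7), the helper names via_dict, via_method and opnames are made up for this example, and the exact opcode names vary between CPython versions, so no output is shown.

```python
import dis


class MethodWrapper(object):
    def __init__(self, data):
        self._data = data

    def id(self):
        return self._data['id']


data = {'id': 1234}
mwp = MethodWrapper(data)


def via_dict():
    # One global lookup, then a subscript on the dict.
    return data['id']


def via_method():
    # Global lookup, attribute lookup, call, and then the method body runs.
    return mwp.id()


def opnames(func):
    """Return the bytecode instruction names of func (hypothetical helper)."""
    return [ins.opname for ins in dis.get_instructions(func)]


# Instructions for the direct dictionary access.
dict_ops = opnames(via_dict)
# Instructions in the caller plus the method body it has to enter.
method_ops = opnames(via_method) + opnames(MethodWrapper.id)

if __name__ == "__main__":
    print("dict path:  ", dict_ops)
    print("method path:", method_ops)
```

The instruction count is only a proxy for cost, since each instruction can hide further lookups, but the method path is visibly longer and includes a call instruction that the dictionary path does not need.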
As Matteo Italia points out in a comment, this is implementation-specific. In the Python ecosystem, you now also have PyPy (uses a JIT and runtime optimizations), Cython (compiles to C, with optional static type annotations, etc.), Nuitka (compiles to C++, and is supposed to take the code as-is) and several other implementations.
One way to optimize these lookups in pure Python with CPython is to get direct references to the objects and assign them to local variables outside of loops, and then use the local variables inside the loops. This optimization comes at the potential cost of cluttering the code and/or breaking encapsulation.
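A minimal sketch of that idea, reusing the MethodWrapper class from the question. The function names uncached, cached_method and cached_dict are hypothetical, and the code is written for Python 3 (range, print as a function). How much each variant saves depends on the interpreter version, so it is best to measure on your own setup.

```python
import timeit


class MethodWrapper(object):
    def __init__(self, data):
        self._data = data

    def score(self):
        return self._data['score']


data = {'id': 1234, 'name': 'john', 'score': 90}
mwp = MethodWrapper(data)


def uncached(n=100):
    total = 0
    for _ in range(n):
        # Global lookup of mwp and attribute lookup of score on every pass.
        total += mwp.score()
    return total


def cached_method(n=100):
    score = mwp.score  # bound method looked up once, outside the loop
    total = 0
    for _ in range(n):
        total += score()  # still pays for the call and the lookups inside it
    return total


def cached_dict(n=100):
    d = mwp._data  # reaches into the wrapper: faster, but breaks encapsulation
    total = 0
    for _ in range(n):
        total += d['score']
    return total


if __name__ == "__main__":
    for fn in (uncached, cached_method, cached_dict):
        print(fn.__name__, timeit.timeit(fn, number=1000))
```

The second variant keeps the wrapper's interface intact and only removes the repeated attribute lookup. The third removes the method call as well, but it depends on the private _data attribute, which is the encapsulation cost mentioned above.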