Python: Object Creation Timing in a Loop / Comprehension / Map vs. One Time

I am asking this in general terms, as I cannot post the actual code for various reasons. I am doing the following in an IPython Notebook:

I created a class structured this way (it requires numpy):

class MyClassName(object):
    def __init__(self, filename):
        self.filename = filename
        self.read_binary_file()      # Run these on object creation
        self.calculate_parameters()
        self.check_for_errors()

        ...
    def read_binary_file( self ):    # This requires numpy.
        #                            #      The file is 250MB binary and
        #                            #      ultimately yields a numpy array
        #                            #      32 x 32 x 100000 element
        ...
    def calculate_parameters( self ):
        ...
    def check_for_errors( self ):
        ...
    def other_function1( self ):
        ...
    def other_function2( self ):
        ...

      

and so on.

The code is sound. I can do the following:

q = MyClassName('testfile.dat') # Instantiate an object
q.other_function1()             # Invoke methods

      

and so on.

%timeit q = MyClassName('testfile.dat') 

      

reports about 0.9 seconds for this instantiation.

But if I have a list of files and create the objects in a loop, a list comprehension, or with map:

filenames = ['f1.dat', 'f2.dat', ..., 'f10.dat']

Chomp = map( MyClassName, filenames )

Chomp = [ MyClassName(j) for j in filenames ]

Chomp = []
for j in filenames:
    Chomp.append( MyClassName(j) )

      

each object takes over 3.5 seconds to create. The loop takes about 3.5 seconds per file times the number of files to complete.

What I have tried: I have looked into list creation and list append timings, memory management / preallocation, disabling and re-enabling garbage collection after each object is created, etc.

I also ran cProfile on the creation of a single object.

They all report 3.5 seconds. cProfile says that the numpy binary read took 2.5 of the 3.5 seconds needed to create a single object. But this same routine is called when I create a single object outside of the loop or cProfile.

Only the standalone creation of a single object is fast.
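For reference, a profiling run of this kind can be sketched as follows. This is only an illustration: SlowLoader is a hypothetical stand-in for MyClassName, and time.sleep replaces the real numpy file read, which cannot be reproduced here.

```python
import cProfile
import io
import pstats
import time


class SlowLoader(object):
    """Hypothetical stand-in for MyClassName; sleeps instead of reading a file."""

    def __init__(self, filename):
        self.filename = filename
        self.read_binary_file()

    def read_binary_file(self):
        time.sleep(0.05)  # placeholder for the expensive numpy read


# Profile only the construction of a single object.
profiler = cProfile.Profile()
profiler.enable()
obj = SlowLoader('testfile.dat')
profiler.disable()

# Sort by cumulative time so the most expensive call chain is listed first.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
print(report)
```

The cumulative-time column shows how much of the constructor's total time is spent inside each method, which is how a figure like "2.5 of 3.5 seconds in the binary read" can be read off.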

I am running on a Windows 7 computer and monitored it with Task Manager. At one point it looked like I had run out of physical memory and was swapping to the page file, so I rebooted, restarted IPython Notebook, enabled only one core, and kept few other programs running. Memory load decreased, but loop performance did not improve.

I am new to OOP in general, have been working with Python for several months now and am interested in understanding what is going on, so I can code in a more appropriate way.

1 answer


[Answer converted from question]


Solution

  • There was no real problem (!) ... just really poor observation on my part.

As noted by M. Wasowski and JonZwink in the comments, %timeit executes the statement several times. And as they said, subsequent runs artificially deflate the time due to caching.
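One way to see this effect is to time each run separately instead of relying on a single summarized figure. The following is a minimal sketch, assuming a hypothetical load() function and a small temporary file created just for the demonstration; it is not the original class. Because the file has just been written, it is probably already cached, so the first run may not be noticeably slower here, whereas a large file read for the first time typically is.

```python
import os
import tempfile
import timeit

# Hypothetical test file; the real data file is not available here.
fd, path = tempfile.mkstemp(suffix='.dat')
with os.fdopen(fd, 'wb') as f:
    f.write(b'\0' * (4 * 1024 * 1024))


def load():
    """Read the whole file, standing in for MyClassName(path)."""
    with open(path, 'rb') as f:
        return len(f.read())


# number=1 times one call per measurement; repeat=5 yields five separate timings.
timings = timeit.repeat(load, number=1, repeat=5)
for i, t in enumerate(timings):
    print('run %d: %.6f s' % (i, t))
print('best: %.6f s' % min(timings))

os.remove(path)
```

Looking at the individual runs rather than only the best one makes a slow first (cold) run visible.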



Of everything I tried, the one thing I had not tried was the following:

import time
tin = time.time()
q = MyClassName('testfile.dat')
print(time.time() - tin)

      

The first time I create an instance from "testfile.dat" it takes a full 3.3-3.5 seconds. If I run this snippet again, it finishes in ~0.9 seconds. So %timeit was reporting the best of multiple runs, as the commenters said.

And I should have known better than to trust my own impression of how long it took to instantiate the object manually. A single object never actually instantiated faster than one created in the loop.
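To compare the loop against a single instantiation on equal terms, each constructor call can be timed individually. A minimal sketch, where timed_create is a hypothetical helper and str.upper merely stands in for MyClassName:

```python
import timeit


def timed_create(factory, names):
    """Create one object per name, returning (objects, seconds_per_object)."""
    objects, seconds = [], []
    for name in names:
        start = timeit.default_timer()
        objects.append(factory(name))
        seconds.append(timeit.default_timer() - start)
    return objects, seconds


# Hypothetical usage; str.upper stands in for MyClassName.
filenames = ['f1.dat', 'f2.dat', 'f3.dat']
objs, secs = timed_create(str.upper, filenames)
for name, s in zip(filenames, secs):
    print('%s: %.6f s' % (name, s))
```

With the real class, files that have not been read before would be expected to show the slower, uncached time whether they are created in a loop or one at a time.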

Thanks everyone for the quick answers.
