PyTables - how to speed up extracting data into a list

I have a query that returns ~ 1 million rows in the following format:

data = [[i['field1'], i['field2']] for i in tbl.where(conditions)]

      

and it takes over 5 minutes.

When I ran tbl.where(conditions) on its own, the query took less than a second, so most of the time seems to be spent iterating over the rows in the list comprehension.

Is there a faster way to extract field1 / field2 from a query? (I have enough memory to store the results in memory)



1 answer


I don't know anything about your "tbl", but here are some things I would look at:



  • Does your table use compression? Check with print(tbl.filters.complevel)


  • Assuming your "conditions" depend on the table fields, have you created an index on those fields? Check with print(tbl.indexedcolpathnames)

  • What happens if you use idx = tbl.get_where_list(conditions) and then tbl[:]['field1'][idx]?

  • If you have enough RAM, try opening the file in memory:
    h5file = tables.open_file('myfile.h5', driver="H5FD_CORE")
    tbl = h5file.root.tbl
    and see if that helps.
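
To make the get_where_list suggestion concrete, here is a minimal sketch that compares the row-by-row list comprehension from the question with two bulk reads. tbl.where() returns an iterator, so every matching row goes through a Python-level loop. read_where() and read_coordinates() return NumPy arrays in one call instead. The schema, the file name demo.h5, the column "key" and the condition string are all made up for illustration, since the real table is not shown in the question.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical schema: the real table layout is not shown in the question.
ROW_DTYPE = np.dtype([("key", "i8"), ("field1", "f8"), ("field2", "i4")])


def build_demo_file(path, n=100_000):
    """Write a small demo table named 'tbl' (stand-in for the real data)."""
    rows = np.zeros(n, dtype=ROW_DTYPE)
    rows["key"] = np.arange(n)
    rows["field1"] = np.arange(n) * 0.5
    rows["field2"] = np.arange(n) % 7
    with tables.open_file(path, mode="w") as h5:
        tbl = h5.create_table("/", "tbl", description=ROW_DTYPE)
        tbl.append(rows)


def query_row_by_row(tbl, condition):
    """Pattern from the question: one Python-level Row access per match."""
    return [[r["field1"], r["field2"]] for r in tbl.where(condition)]


def query_bulk(tbl, condition):
    """Fetch all matches as one structured array, then pick the columns."""
    hits = tbl.read_where(condition)
    return hits["field1"], hits["field2"]


def query_by_coordinates(tbl, condition):
    """Get matching row numbers first, then read only those rows per column."""
    idx = tbl.get_where_list(condition)
    return (tbl.read_coordinates(idx, field="field1"),
            tbl.read_coordinates(idx, field="field2"))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")
    build_demo_file(path)
    with tables.open_file(path, mode="r") as h5:
        tbl = h5.root.tbl
        condition = "(key >= 1000) & (key < 51000)"  # illustrative condition
        slow = query_row_by_row(tbl, condition)
        f1, f2 = query_bulk(tbl, condition)
        c1, c2 = query_by_coordinates(tbl, condition)

print(len(slow), len(f1), len(c1))
```

If you really need a list of [field1, field2] pairs at the end, calling .tolist() on the arrays is usually much cheaper than building the list one Row at a time. Whether the bulk read actually fixes the 5 minutes depends on your data, so time each variant on the real table.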