PyTables - how to speed up extracting fields from query results

Question

I have a query that returns about 1 million rows, and I collect the results like this:

data = [[i['field1'], i['field2']] for i in tbl.where(conditions)]

and it takes over 5 minutes.

When I ran tbl.where(conditions) on its own, the query took less than a second, so most of the time seems to be spent iterating over the rows in the list comprehension.

Is there a faster way to extract field1 / field2 from a query? (I have enough memory to store the results in memory)

pytables

user1320615 08 Apr 12 at 18:45

1 answer

Joel vroom · Answer 1 · 2013-11-14T19:42:21+0000

I don't know anything about your "tbl", but here are some things I would look at:

Is your table compressed? Check with: print(tbl.filters.complevel)

Assuming your "conditions" depend on table fields, have you created an index on those fields? Check with: print(tbl.indexedcolpathnames)

What happens if you fetch the matching row numbers first and then use them to index the column?

idx = tbl.get_where_list(conditions)
field1 = tbl[:]['field1'][idx]

If you have enough RAM, try opening the file in memory:

h5file = tables.open_file('myfile.h5', driver="H5FD_CORE")
tbl = h5file.root.tbl

and see if that helps.
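
Another option along the same lines as get_where_list is read_where, which returns all matching rows as a single NumPy structured array, so there is no per-row Python loop. Here is a minimal sketch of that idea. I don't know your schema, so the column "key", the file name demo.h5, and the helpers build_demo_file and extract_fields are all made up for illustration; swap in your own table and condition.

```python
import os
import tempfile

import numpy as np
import tables


def build_demo_file(path, n_rows=10_000):
    """Write a small table with hypothetical columns field1, field2 and key."""
    dtype = np.dtype([("field1", "i8"), ("field2", "f8"), ("key", "i8")])
    rows = np.zeros(n_rows, dtype=dtype)
    rows["field1"] = np.arange(n_rows)
    rows["field2"] = np.arange(n_rows) * 0.5
    rows["key"] = np.arange(n_rows) % 10
    with tables.open_file(path, mode="w") as h5file:
        tbl = h5file.create_table("/", "tbl", description=dtype)
        tbl.append(rows)
        tbl.flush()
        # Index the column that the condition filters on.
        tbl.cols.key.create_index()


def extract_fields(tbl, condition):
    """Return field1 and field2 for the matching rows as NumPy arrays."""
    # read_where returns one structured array instead of yielding Row objects.
    matched = tbl.read_where(condition)
    return matched["field1"], matched["field2"]


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
    build_demo_file(demo_path)
    with tables.open_file(demo_path, mode="r") as h5file:
        field1, field2 = extract_fields(h5file.root.tbl, "key == 3")
    print(len(field1), "rows matched")
```

If you really need a list of [field1, field2] pairs afterwards, something like list(zip(field1.tolist(), field2.tolist())) should still be much cheaper than touching each Row object, but I would keep the arrays if you can. Note that I would not rely on the matches coming back in row order when an index is used.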
