PyTables - how to speed up extracting query results into a list
I have a query that returns about 1 million rows, which I collect like this:
data = [[i['field1'], i['field2']] for i in tbl.where(conditions)]
and it takes over 5 minutes.
When I ran tbl.where(conditions) on its own, the query took less than a second, so most of the time seems to be spent iterating over the rows in the list comprehension.
Is there a faster way to extract field1 / field2 from a query? (I have enough RAM to hold the results.)
1 answer
I don't know anything about your "tbl", but here are some things I would look at:
- Does your table use compression?
print(tbl.filters.complevel)
- Assuming your "conditions" depend on the table fields, have you created an index on those fields?
print(tbl.indexedcolpathnames)
- What happens if you get the matching row indices first and then index the column with them?
idx = tbl.get_where_list(conditions)
tbl[:]['field1'][idx]
- If you have enough RAM, try opening the file in memory:
h5file = tables.open_file('myfile.h5', driver="H5FD_CORE")
tbl = h5file.root.tbl
and see if that helps.
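To make the checks above concrete, here is a minimal sketch rather than a definitive recipe. Since I don't know your real table, it builds a small throwaway file with a hypothetical schema (the Record class, the file name demo.h5 and the condition string are all made up for illustration). It then compares the per-row list comprehension with bulk reads. Note that read_coordinates and read_where are not mentioned above; they are other PyTables calls that return a NumPy structured array in one go, which I am assuming is acceptable in place of a list of lists.

```python
import os
import tempfile

import numpy as np
import tables


class Record(tables.IsDescription):
    # Hypothetical schema: the real table's columns are unknown.
    field1 = tables.Int64Col()
    field2 = tables.Float64Col()


# Throwaway file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

with tables.open_file(path, mode="w") as h5file:
    tbl = h5file.create_table("/", "tbl", Record)
    rows = np.zeros(100_000, dtype=tbl.dtype)
    rows["field1"] = np.arange(100_000)
    rows["field2"] = rows["field1"] * 0.5
    tbl.append(rows)
    # Index the column used in the query condition.
    tbl.cols.field1.create_index()
    tbl.flush()

with tables.open_file(path, mode="r") as h5file:
    tbl = h5file.root.tbl
    conditions = "field1 >= 90000"  # made-up condition

    # The two checks from the list above.
    complevel = tbl.filters.complevel
    indexed = tbl.indexedcolpathnames

    # Per-row approach from the question: one Python-level access per field.
    slow = [[r["field1"], r["field2"]] for r in tbl.where(conditions)]

    # Bulk approach: fetch matching row numbers, then read them in one call.
    idx = tbl.get_where_list(conditions, sort=True)
    picked = tbl.read_coordinates(idx)
    field1 = picked["field1"]
    field2 = picked["field2"]

    # Alternative: let PyTables run the query and return the rows directly.
    result = tbl.read_where(conditions)
```

If you really need a list of lists afterwards, something like list(zip(field1.tolist(), field2.tolist())) converts the arrays in bulk, but whether that is fast enough for your 1 million rows is something you would have to measure.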