How do I write a Python composite key, dictionary of values ββto a database?
I have a complex dictionary in Python like:
key value
("a","b") (1,2,3)
("c","d") (4,5,6)
("e","f") (7,8,9)
Both keys and values ββare tuples. I would like to store it in the most efficient way for a sqlite database table like this:
row col1 col2 col3 col4 col5
1 "a" "b" 1 2 3
2 "c" "d" 4 5 6
3 "e" "f" 7 8 9
To do this, I was trying to concatenate keys and values ββin turn, and then pass that structure to executemany () using INSERT. This works, but very slow:
writedata = []
for i in range(0, len(mydict.keys())):
writedata.append(mydict.keys()[i] + mydict.values()[i])
sql = "INSERT INTO mytable VALUES (?,?,?,?,?)"
mydbcursor.executemany(sql, writedata)
I have a very large dataset, so ideally I don't even want to create an additional structure like writedata.
I've tried many solutions here on SO (list comprehension, list +, list expansion, generators, ...), but I haven't found the correct recipe for solving this case.
Any help or links would be appreciated! Thank!
source to share
Cursor.executemany()
works great with iterators:
The sqlite3 module also allows you to use iterator input parameters instead of a sequence.
So, you can pass a generator to it instead of creating an intermediate list first:
mydbcursor.executemany(sql, (k + v for k, v in mydict.iteritems()))
Will .iteritems()
not work in Python3 , there you can use .items()
that returns a view of the items instead of creating a list in memory.
We can speed things up a bit if we remove the loop completely for
from our code using itertools.starmap
with operator.add
:
mydbcursor.executemany(sql, starmap(add, dct.iteritems())
The timing of the comparison shows that the iterator consumed above takes less time:
In [34]: from operator import add
In [35]: from itertools import starmap
In [36]: dct = {(i, 'a', 'b'): (1, 2, 3) for i in xrange(10**6)}
In [37]: %timeit for _ in (k+v for k, v in dct.iteritems()): pass
1 loops, best of 3: 290 ms per loop
In [38]: %timeit for _ in starmap(add, dct.iteritems()): pass
1 loops, best of 3: 256 ms per loop
In [39]: %timeit list(starmap(add, dct.iteritems()))
1 loops, best of 3: 335 ms per loop
In [40]: %timeit list(k+v for k, v in dct.iteritems())
1 loops, best of 3: 415 ms per loop
source to share
The code should be:
writedata = [key + value for key, value in mydict.items()]
If it's not fast after that, it might just be a lot of data. The above code avoids indexing and lots of function calls, so it should be slightly faster than what you wrote.
Whichever approach you choose, it must eventually be turned into a SQL query. So while you can avoid intermediate data structures, you still have to build the query.
source to share