How do I write a Python composite key, dictionary of values ​​to a database?

I have a complex dictionary in Python like:

key          value
("a","b")    (1,2,3)
("c","d")    (4,5,6)
("e","f")    (7,8,9)

      

Both keys and values ​​are tuples. I would like to store it in the most efficient way for a sqlite database table like this:

row    col1   col2   col3   col4   col5
1      "a"    "b"    1      2      3
2      "c"    "d"    4      5      6
3      "e"    "f"    7      8      9

      

To do this, I was trying to concatenate keys and values ​​in turn, and then pass that structure to executemany () using INSERT. This works, but very slow:

writedata = []

for i in range(0, len(mydict.keys())):
    writedata.append(mydict.keys()[i] + mydict.values()[i])

sql = "INSERT INTO mytable VALUES (?,?,?,?,?)"
mydbcursor.executemany(sql, writedata)

      

I have a very large dataset, so ideally I don't even want to create an additional structure like writedata.

I've tried many solutions here on SO (list comprehension, list +, list expansion, generators, ...), but I haven't found the correct recipe for solving this case.

Any help or links would be appreciated! Thank!

+3


source to share


2 answers


Cursor.executemany()

works great with iterators:

The sqlite3 module also allows you to use iterator input parameters instead of a sequence.

So, you can pass a generator to it instead of creating an intermediate list first:

mydbcursor.executemany(sql, (k + v for k, v in mydict.iteritems()))

      

Will .iteritems()

not work in Python3 , there you can use .items()

that returns a view of the items instead of creating a list in memory.



We can speed things up a bit if we remove the loop completely for

from our code using itertools.starmap

with operator.add

:

mydbcursor.executemany(sql, starmap(add, dct.iteritems())

      

The timing of the comparison shows that the iterator consumed above takes less time:

In [34]: from operator import add                                                                                                                

In [35]: from itertools import starmap                                                                                                    

In [36]: dct = {(i, 'a', 'b'): (1, 2, 3) for i in xrange(10**6)}

In [37]: %timeit for _ in (k+v for k, v in dct.iteritems()): pass                                                                                
1 loops, best of 3: 290 ms per loop                                                                                                              

In [38]: %timeit for _ in starmap(add, dct.iteritems()): pass                                                                                    
1 loops, best of 3: 256 ms per loop   

In [39]: %timeit list(starmap(add, dct.iteritems()))                                                                                             
1 loops, best of 3: 335 ms per loop                                                                                                              

In [40]: %timeit list(k+v for k, v in dct.iteritems())                                                                                           
1 loops, best of 3: 415 ms per loop    

      

+1


source


The code should be:

writedata = [key + value for key, value in mydict.items()]

      



If it's not fast after that, it might just be a lot of data. The above code avoids indexing and lots of function calls, so it should be slightly faster than what you wrote.

Whichever approach you choose, it must eventually be turned into a SQL query. So while you can avoid intermediate data structures, you still have to build the query.

0


source







All Articles