Speed up database inserts from ORM
I have a Django view that performs 500-5000 new database INSERTs in a loop. The problem is that it is really slow! I'm getting about 100 inserts per minute on Postgres 8.3. We used to run MySQL on smaller hardware (a smaller EC2 instance) and never saw speed issues like this.
More details: Postgres 8.3 on an Ubuntu 9.04 server. The server is a "large" Amazon EC2 instance with the database on an EBS volume (ext3) - 11GB/20GB.
Here are some of my postgresql.conf settings - let me know if you need more:
shared_buffers = 4000MB
effective_cache_size = 7128MB
My Python:
    for k in kw:
        k = k.lower()
        p = ProfileKeyword(profile=self)
        logging.debug(k)
        p.keyword, created = Keyword.objects.get_or_create(keyword=k, defaults={'keyword': k})
        if not created and ProfileKeyword.objects.filter(profile=self, keyword=p.keyword).count():
            # checking created is just a small optimization to save some database hits on new keywords
            pass  # duplicate entry
        else:
            p.save()
Some output from top while the above is running:
top - 16:56:22 up 21 days, 20:55, 4 users, load average: 0.99, 1.01, 0.94
Tasks: 68 total, 1 running, 67 sleeping, 0 stopped, 0 zombie
Cpu(s): 5.8%us, 0.2%sy, 0.0%ni, 90.5%id, 0.7%wa, 0.0%hi, 0.0%si, 2.8%st
Mem: 15736360k total, 12527788k used, 3208572k free, 332188k buffers
Swap: 0k total, 0k used, 0k free, 11322048k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14767 postgres 25 0 4164m 117m 114m S 22 0.8 2:52.00 postgres
1 root 20 0 4024 700 592 S 0 0.0 0:01.09 init
2 root RT 0 0 0 0 S 0 0.0 0:11.76 migration/0
3 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0
4 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0
5 root 10 -5 0 0 0 S 0 0.0 0:00.08 events/0
6 root 11 -5 0 0 0 S 0 0.0 0:00.00 khelper
7 root 10 -5 0 0 0 S 0 0.0 0:00.00 kthread
9 root 10 -5 0 0 0 S 0 0.0 0:00.00 xenwatch
10 root 10 -5 0 0 0 S 0 0.0 0:00.00 xenbus
18 root RT -5 0 0 0 S 0 0.0 0:11.84 migration/1
19 root 34 19 0 0 0 S 0 0.0 0:00.01 ksoftirqd/1
Let me know if other details would be helpful.
First, ORM operations are always going to be slower than pure SQL. I once wrote an update against a large database in ORM code and set it running, but quit it after several hours when it had completed only a tiny fraction. After rewriting it in SQL, the whole thing ran in less than a minute.
Second, bear in mind that your code here executes up to four separate database operations for every row in your data set: the get in get_or_create, possibly also the create, the count on the filter, and finally the save. That is a lot of database access.
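To make that concrete, here is the loop from the question annotated with the queries each line can trigger (a sketch; the point is the round trips per iteration, not the exact SQL):

    for k in kw:
        k = k.lower()
        p = ProfileKeyword(profile=self)     # no query yet
        p.keyword, created = Keyword.objects.get_or_create(
            keyword=k)                       # query 1: SELECT; query 2: INSERT if missing
        if not created and ProfileKeyword.objects.filter(
                profile=self,
                keyword=p.keyword).count():  # query 3: SELECT COUNT(*)
            pass  # duplicate entry
        else:
            p.save()                         # query 4: INSERT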
Bearing in mind that a maximum of 5000 objects is not large, you should start by reading the entire dataset into memory. Then you can do a single filter to fetch all the existing Keyword objects in one go, saving the huge number of queries in the keyword get_or_create, and also avoiding the need to instantiate duplicate ProfileKeywords in the first place; see the sketch below.
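A sketch of that approach, assuming ProfileKeyword.keyword is a ForeignKey to Keyword and Keyword.keyword holds the string (both inferred from the question's code):

    kw_lower = set(k.lower() for k in kw)

    # One query: every Keyword that already exists.
    existing = dict((obj.keyword, obj)
                    for obj in Keyword.objects.filter(keyword__in=kw_lower))

    # One query: the keywords this profile is already linked to.
    linked = set(ProfileKeyword.objects.filter(
        profile=self, keyword__keyword__in=kw_lower
    ).values_list('keyword__keyword', flat=True))

    for k in kw_lower:
        kobj = existing.get(k)
        if kobj is None:
            kobj = Keyword.objects.create(keyword=k)  # one INSERT per genuinely new keyword
        if k not in linked:
            ProfileKeyword(profile=self, keyword=kobj).save()

That replaces the two or three SELECTs per row with two SELECTs total, leaving only the unavoidable INSERTs.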
One of the common reasons that bulk operations like this are slow is that each insert happens in its own transaction. If you can get all of them to happen within a single transaction, it could go much faster.
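In Django you can do that with a transaction decorator, for example (a sketch; `commit_on_success` is the API on Django of this era, while Django 1.6+ uses `with transaction.atomic():` instead):

    from django.db import transaction

    @transaction.commit_on_success
    def add_keywords(profile, keywords):
        # Every save()/create() in here joins one transaction,
        # so there is a single COMMIT at the end instead of one per row.
        for k in set(k.lower() for k in keywords):
            keyword, _ = Keyword.objects.get_or_create(keyword=k)
            ProfileKeyword.objects.get_or_create(profile=profile, keyword=keyword)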