Speed up database inserts from ORM
I have a Django view that performs 500-5000 new database INSERTs in a loop. The problem is that it is really slow! I'm getting about 100 inserts per minute on Postgres 8.3. We used to run MySQL on smaller hardware (a smaller EC2 instance) and never saw speed issues like this.
More details: Postgres 8.3 on an Ubuntu 9.04 server. The server is a "large" Amazon EC2 instance with the database on an EBS volume (ext3) - 11GB/20GB.
Here are some of my postgresql.conf settings - let me know if you need more:
shared_buffers = 4000MB
effective_cache_size = 7128MB
My Python:
    for k in kw:
        k = k.lower()
        p = ProfileKeyword(profile=self)
        logging.debug(k)
        p.keyword, created = Keyword.objects.get_or_create(keyword=k, defaults={'keyword': k})
        if not created and ProfileKeyword.objects.filter(profile=self, keyword=p.keyword).count():
            # checking created is just a small optimization to save some database hits on new keywords
            pass  # duplicate entry
        else:
            p.save()
Some output from top while the above is running:
top - 16:56:22 up 21 days, 20:55, 4 users, load average: 0.99, 1.01, 0.94
Tasks: 68 total, 1 running, 67 sleeping, 0 stopped, 0 zombie
Cpu(s): 5.8%us, 0.2%sy, 0.0%ni, 90.5%id, 0.7%wa, 0.0%hi, 0.0%si, 2.8%st
Mem: 15736360k total, 12527788k used, 3208572k free, 332188k buffers
Swap: 0k total, 0k used, 0k free, 11322048k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14767 postgres 25 0 4164m 117m 114m S 22 0.8 2:52.00 postgres
1 root 20 0 4024 700 592 S 0 0.0 0:01.09 init
2 root RT 0 0 0 0 S 0 0.0 0:11.76 migration/0
3 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0
4 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0
5 root 10 -5 0 0 0 S 0 0.0 0:00.08 events/0
6 root 11 -5 0 0 0 S 0 0.0 0:00.00 khelper
7 root 10 -5 0 0 0 S 0 0.0 0:00.00 kthread
9 root 10 -5 0 0 0 S 0 0.0 0:00.00 xenwatch
10 root 10 -5 0 0 0 S 0 0.0 0:00.00 xenbus
18 root RT -5 0 0 0 S 0 0.0 0:11.84 migration/1
19 root 34 19 0 0 0 S 0 0.0 0:00.01 ksoftirqd/1
Let me know if other details would be helpful.
First, ORM operations are always going to be slower than pure SQL. I once wrote an update against a large database in ORM code and set it running, but quit it after several hours when it had completed only a tiny fraction. After rewriting it in SQL, the whole thing ran in less than a minute.
Second, bear in mind that your code here executes up to four separate database operations for every row in your data set: the get in get_or_create, possibly also the create, the count on the filter, and finally the save. That is a lot of database access.
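To make that concrete, here is the loop from the question annotated with the queries each line can trigger (a sketch; the point is the round trips per iteration, not the exact SQL):

    for k in kw:
        k = k.lower()
        p = ProfileKeyword(profile=self)     # no query yet
        p.keyword, created = Keyword.objects.get_or_create(
            keyword=k)                       # query 1: SELECT; query 2: INSERT if missing
        if not created and ProfileKeyword.objects.filter(
                profile=self,
                keyword=p.keyword).count():  # query 3: SELECT COUNT(*)
            pass  # duplicate entry
        else:
            p.save()                         # query 4: INSERT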
Bearing in mind that a maximum of 5000 objects is not large, you should start by reading the entire dataset into memory. Then you can do a single filter to fetch all the existing Keyword objects in one go, saving the huge number of queries in the keyword get_or_create, and also avoiding the need to instantiate duplicate ProfileKeywords in the first place; see the sketch below.
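A sketch of that approach, assuming ProfileKeyword.keyword is a ForeignKey to Keyword and Keyword.keyword holds the string (both inferred from the question's code):

    kw_lower = set(k.lower() for k in kw)

    # One query: every Keyword that already exists.
    existing = dict((obj.keyword, obj)
                    for obj in Keyword.objects.filter(keyword__in=kw_lower))

    # One query: the keywords this profile is already linked to.
    linked = set(ProfileKeyword.objects.filter(
        profile=self, keyword__keyword__in=kw_lower
    ).values_list('keyword__keyword', flat=True))

    for k in kw_lower:
        kobj = existing.get(k)
        if kobj is None:
            kobj = Keyword.objects.create(keyword=k)  # one INSERT per genuinely new keyword
        if k not in linked:
            ProfileKeyword(profile=self, keyword=kobj).save()

That replaces the two or three SELECTs per row with two SELECTs total, leaving only the unavoidable INSERTs.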
One of the common reasons that bulk operations like this are slow is that each insert happens in its own transaction. If you can get all of them to happen within a single transaction, it could go much faster.
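In Django you can do that with a transaction decorator, for example (a sketch; `commit_on_success` is the API on Django of this era, while Django 1.6+ uses `with transaction.atomic():` instead):

    from django.db import transaction

    @transaction.commit_on_success
    def add_keywords(profile, keywords):
        # Every save()/create() in here joins one transaction,
        # so there is a single COMMIT at the end instead of one per row.
        for k in set(k.lower() for k in keywords):
            keyword, _ = Keyword.objects.get_or_create(keyword=k)
            ProfileKeyword.objects.get_or_create(profile=profile, keyword=keyword)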