Data Warehouse and Django

This is more of an architectural issue than a technology issue per se.

I am currently building a business website / social network that needs to store large amounts of data and derive analytics from it (consumer behavior).

I am using Django with a PostgreSQL database.

Now I want to extend this architecture to include a data warehouse. Ideally, the operational DB would remain the current Django PostgreSQL database, and the data warehouse would be something additional, preferably using a multidimensional model.

We're still at a very early stage and will be testing with 50 users, so something primitive, like a single column table, is enough for starters.

I would like to know whether anyone has experience with this situation and could recommend a framework for creating a data warehouse while keeping an operational DB with Django models for usability (if possible).

Thank you in advance!

+3




2 answers


Here are some interesting Open Source tools I've used recently:



  • Kettle is a great ETL tool that you can use to pull data from your operational database into your warehouse. It supports any database with a JDBC driver and makes it easy to build, e.g., a star schema.
  • Saiku is a nice Web 2.0 interface built on Pentaho Mondrian (an OLAP engine that implements MDX). It lets your users easily create complex aggregate queries (think of a pivot table in Excel), and the Mondrian layer provides caching, etc. to speed things up. Try the demo here.
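
To make the star schema idea concrete, here is a minimal sketch of the load step in plain Python with `sqlite3`. It is not Kettle, which is a separate Java tool; it only illustrates the kind of transformation such an ETL job performs. The table names (`dim_user`, `dim_date`, `fact_page_view`) and the sample rows are hypothetical stand-ins for data read from the operational database.

```python
import sqlite3

# Hypothetical operational rows, standing in for what an ETL job would
# read from the Django/PostgreSQL database: (username, page, ISO date).
OPERATIONAL_ROWS = [
    ("alice", "/home", "2024-01-01"),
    ("alice", "/pricing", "2024-01-01"),
    ("bob", "/home", "2024-01-02"),
]


def build_warehouse(conn):
    """Create a tiny star schema: two dimension tables and one fact table."""
    conn.executescript(
        """
        CREATE TABLE dim_user (id INTEGER PRIMARY KEY, username TEXT UNIQUE);
        CREATE TABLE dim_date (id INTEGER PRIMARY KEY, day TEXT UNIQUE);
        CREATE TABLE fact_page_view (
            user_id INTEGER REFERENCES dim_user(id),
            date_id INTEGER REFERENCES dim_date(id),
            page TEXT,
            views INTEGER
        );
        """
    )


def dimension_key(conn, table, column, value):
    """Return the surrogate key for a dimension value, inserting it if new."""
    # table and column are fixed identifiers chosen in this script,
    # never user input, so formatting them into the SQL is safe here.
    conn.execute(
        f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,)
    )
    row = conn.execute(
        f"SELECT id FROM {table} WHERE {column} = ?", (value,)
    ).fetchone()
    return row[0]


def load(conn, rows):
    """Transform flat operational rows into dimension keys plus facts."""
    for username, page, day in rows:
        user_id = dimension_key(conn, "dim_user", "username", username)
        date_id = dimension_key(conn, "dim_date", "day", day)
        conn.execute(
            "INSERT INTO fact_page_view VALUES (?, ?, ?, 1)",
            (user_id, date_id, page),
        )
    conn.commit()


def views_per_user(conn):
    """A typical warehouse query: aggregate facts along one dimension."""
    return conn.execute(
        """
        SELECT u.username, SUM(f.views)
        FROM fact_page_view AS f
        JOIN dim_user AS u ON u.id = f.user_id
        GROUP BY u.username
        ORDER BY u.username
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_warehouse(conn)
load(conn, OPERATIONAL_ROWS)
print(views_per_user(conn))
```

In a real setup the source rows would come from the operational PostgreSQL database and the warehouse would live in its own database, so analytical queries do not compete with the site's traffic.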
+7




My answer does not necessarily concern data warehousing. In your case, I see an opportunity to run a NoSQL database alongside your relational OLTP store, which in this case is PostgreSQL.

Why consider NoSQL? Besides the obvious scalability benefits, NoSQL offers a number of advantages that are likely to apply to your scenario, for example the flexibility to write records with different sets of fields, and key-based access.
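
As a rough illustration of those two properties, here is a toy in-memory store in plain Python. It is not the API of any real NoSQL product; the class `TinyDocumentStore` and the event keys are made up for this sketch.

```python
import json


class TinyDocumentStore:
    """Toy key-value document store kept in memory (illustration only)."""

    def __init__(self):
        self._docs = {}

    def put(self, key, doc):
        # Documents are serialized as JSON, so no fixed schema is enforced.
        self._docs[key] = json.dumps(doc)

    def get(self, key):
        # Lookup is by key only, as in a typical key-value store.
        raw = self._docs.get(key)
        return None if raw is None else json.loads(raw)


store = TinyDocumentStore()
# Two behavior events with different sets of fields, no migration needed.
store.put("event:1", {"user": "alice", "action": "view", "page": "/home"})
store.put("event:2", {"user": "bob", "action": "signup", "referrer": "newsletter"})

print(store.get("event:2")["referrer"])
```

With a relational table, the second event would need either a nullable `referrer` column or a schema change; here each record simply carries the fields it has.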



Since you are still in the "trial" phase, it may be easiest to choose a NoSQL database based on your hosting provider. For example, AWS has SimpleDB, Google App Engine provides its own Datastore, etc. However, there are many other NoSQL solutions out there with good Python bindings that you can use.

0








