Heavy math queries and NoSQL databases

I have a very specific data format and query pattern, and I need to know whether a NoSQL database is suitable for it. I am not asking "which database is better"; I'm interested in what is possible.

I need to store data in EAV (entity-attribute-value) style. Document stores with sparse indexes are ideal for this: I can create an index for each parameter based on its values, and a query will only touch the indexes it needs. MongoDB, for example, is perfect for this. This is requirement #1.

A request is performed in two stages. The first is the equivalent of a simple "WHERE" clause and consists of a series of comparison operations (<, =, and the like) against real numbers. The result can be tens of thousands of records, but usually it will be in the thousands. This is requirement #2.

The second stage involves heavy math that I have to run on the results of stage 1 in order to rank them. This math makes heavy use of powers and simpler operations. The results are then sorted by rank and the "top 100" are returned to the client. This is requirement #3.
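To make the two stages concrete, here is a minimal client-side sketch in Python. It assumes records are plain dictionaries with sparse numeric attributes; the function names and the scoring formula (a weighted sum of powers) are hypothetical placeholders, since the actual ranking math is not given here.

```python
import heapq


def stage1_filter(records, bounds):
    """Stage 1: keep records whose attributes fall inside every (low, high) bound.

    Records are sparse, so a record that lacks a bounded attribute is dropped.
    """
    return [
        r for r in records
        if all(k in r and lo <= r[k] <= hi for k, (lo, hi) in bounds.items())
    ]


def rank(record, weights, exponent=2.0):
    """Stage 2: hypothetical score, a weighted sum of attribute values raised to a power.

    Missing attributes contribute 0. The real formula would replace this.
    """
    return sum(w * record.get(k, 0.0) ** exponent for k, w in weights.items())


def top_n(records, bounds, weights, n=100):
    """Filter, score, and return the n best records, highest score first."""
    matched = stage1_filter(records, bounds)
    # heapq.nlargest avoids fully sorting thousands of records to get the top n.
    return heapq.nlargest(n, matched, key=lambda r: rank(r, weights))
```

Because only the top 100 are needed, a partial selection such as `heapq.nlargest` is enough; a full sort of the stage 1 results is not required.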

MongoDB is the only NoSQL DBMS I'm relatively familiar with, so I'll use it as a reference. I don't believe it can do math in queries, and even if it can, it will probably be slow. I believe the math should be done on the client (in C or CUDA). This means the data must be transferred very quickly from the database to the client. I know MongoDB has its own binary protocol, but Couchbase, for example, uses REST, which I believe will slow down the transfer of large datasets.

The reason I haven't settled on MongoDB is that I need distributed servers, which Couchbase, for example, seems to be better suited for.

So I need a solution that can either do the math quickly itself, limiting the number of records that need to be transferred, or transfer records very quickly so that they can be processed on the client. I realize the only way to find out for certain is to test, but what I don't know, hence the question, is which NoSQL databases have the capabilities described.



1 answer

MongoDB provides server-side JavaScript execution, which might solve some of your problems, but I'm afraid I can't tell you how efficient it is. However, I suspect your workflow is I/O-bound (you mentioned thousands of records), so it is probably best not to do the processing on the client. Of course, only a benchmark will tell the truth, but I would suggest a different solution.

Have you tried Redis? It has powerful sorted sets that are well suited to your range and rank queries. In addition, the next release will introduce Lua scripting, which would let the ranking run on the server and remove the I/O cost from your workflow. Keep in mind that Redis is really fast.
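To illustrate what a sorted set offers, here is a small pure-Python sketch of the idea: members kept in score order, so that a range query by score (what Redis exposes as ZRANGEBYSCORE) becomes two binary searches. This is only an in-memory stand-in, not a Redis client; the class and method names are hypothetical, and tie ordering here follows insertion order rather than Redis's rules.

```python
import bisect


class SortedSetSketch:
    """Hypothetical in-memory stand-in for a score-ordered set."""

    def __init__(self):
        self._scores = []     # scores, kept in ascending order
        self._members = []    # members, parallel to _scores
        self._by_member = {}  # member -> current score

    def add(self, member, score):
        """Insert a member; re-adding a member replaces its previous score."""
        if member in self._by_member:
            i = bisect.bisect_left(self._scores, self._by_member[member])
            while self._members[i] != member:  # skip other members with equal score
                i += 1
            del self._scores[i]
            del self._members[i]
        i = bisect.bisect_right(self._scores, score)
        self._scores.insert(i, score)
        self._members.insert(i, member)
        self._by_member[member] = score

    def range_by_score(self, low, high):
        """Return members with low <= score <= high, in ascending score order."""
        start = bisect.bisect_left(self._scores, low)
        stop = bisect.bisect_right(self._scores, high)
        return self._members[start:stop]

    def top(self, n):
        """Return the n highest-scoring members, best first."""
        return self._members[-n:][::-1] if n > 0 else []
```

With one such set per attribute, each comparison in stage 1 maps to a score range, and a set keyed by the computed rank would give the "top 100" directly.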


