Data warehousing principles and NoSQL

With MongoDB, CouchDB and related technologies we can get faster queries, so is the following definition still relevant?

"A copy of transaction data specifically structured for query and analysis." (R. Kimball, The Data Warehouse Toolkit, 1996)

I mean, do we really need to restructure our data into an OLAP schema in order to query it for analysis? More specifically, can you drill down, slice and dice, and produce other analytical reports with NoSQL (not necessarily with OLAP modeling)? Also, is it possible to overcome OLAP's "data subset" querying limitation and report across the entire data universe using NoSQL?

+3




2 answers


In my assessment, OLAP subsets or structures will not go away and may become more prevalent, for several reasons. In no particular order:

a) Map-reduce is all you get in many cases. MongoDB is on a more stable footing with its faster aggregation pipeline;
b) the big problem with NoSQL is the lack of joins or relationships. This means that your underlying data has to be ugly to support many OLAP reports;
c) structuring it appropriately into "throwaway" or volatile data subsets lets you keep the master table / collection clean;
d) NoSQL is great for throwaway datasets: there is no need to create a table or even a schema, and it is dead simple to expand and drop collections;
e) NoSQL scales far more easily to additional datasets than SQL;
f) a fresh start can avoid the cost and resources required to support two database technologies (one for OLAP and one for OLTP);
g) you will find that your backend / frontend code is much simpler and more manageable with massive datasets; and
h) the unrivaled speed advantage of producing ready-made datasets with their own provisional indexes.
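To make the map-reduce and no-joins points concrete, here is a minimal sketch in plain Python, not tied to any particular NoSQL product. The `orders` documents, the field names, and the `slice_docs` / `roll_up` helpers are all hypothetical. Because there are no joins, each document carries its dimension attributes (region, product) denormalized. A slice is then just a filter, and a roll-up is a map step that derives a key per document followed by a reduce step that sums per key. In MongoDB the same shape would roughly correspond to `$match` and `$group` stages in an aggregation pipeline.

```python
from collections import defaultdict

# Hypothetical denormalized "fact" documents: dimension attributes are
# embedded in each document because there is no join to look them up.
orders = [
    {"region": "EU", "product": "widget", "year": 2023, "amount": 120.0},
    {"region": "EU", "product": "gadget", "year": 2023, "amount": 80.0},
    {"region": "US", "product": "widget", "year": 2023, "amount": 200.0},
    {"region": "US", "product": "widget", "year": 2024, "amount": 50.0},
]


def slice_docs(docs, **fixed):
    """Slice/dice: keep documents whose fields equal the given values."""
    return [d for d in docs if all(d.get(k) == v for k, v in fixed.items())]


def roll_up(docs, dims, measure):
    """Roll-up: derive a key per document, then sum the measure per key."""
    totals = defaultdict(float)
    for d in docs:
        key = tuple(d[dim] for dim in dims)  # map step: build the group key
        totals[key] += d[measure]            # reduce step: sum per key
    return dict(totals)


# Coarse level: total amount per region.
by_region = roll_up(orders, ["region"], "amount")

# Drill down: 2023 only, per region and product.
by_region_product_2023 = roll_up(
    slice_docs(orders, year=2023), ["region", "product"], "amount"
)
```

Note that nothing here required an OLAP schema up front, but the documents had to be shaped (denormalized) with these reports in mind, which is the trade-off described above.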



+3




The answer to both questions is YES. 1. Restructuring transactional data for analysis is still valid. 2. You can use NoSQL to do everything you asked about.

Since you only mentioned query / analysis / OLAP, I assume the only goal here is to build a query / reporting platform. So whether NoSQL can handle an OLTP workload is out of scope.

It is difficult to answer this question without the associated context, such as whether you are building this platform for a single team, department, vertical, or line of business, or for the entire organization as a central repository.



If you are setting it up for a team / department, the data volume is not large, fewer users will query it, and the query rate is not that high, so OLAP is still valid. But if the volume is large, with high query rates and many users, and you expect to need to scale in the future, then NoSQL will be your best bet.

The same applies if you are building an enterprise-level NoSQL platform. Say you are building an enterprise data warehouse or data lake that serves every audience in the organization. Within the organization, teams / departments can still create their own OLAP structures, building data marts to suit their own needs. Thus, both OLAP and NoSQL remain valid in this case.
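As a rough illustration of that split, the sketch below treats a central store as a list of raw documents and derives a small, pre-aggregated data mart for one department. Everything here is hypothetical: the `central_store` contents, the field names, and the `build_mart` helper are assumptions for illustration, not a real warehouse API.

```python
from collections import defaultdict

# Hypothetical central repository: raw events from every department.
central_store = [
    {"dept": "sales", "month": "2024-01", "metric": "revenue", "value": 100.0},
    {"dept": "sales", "month": "2024-01", "metric": "revenue", "value": 40.0},
    {"dept": "sales", "month": "2024-02", "metric": "revenue", "value": 70.0},
    {"dept": "support", "month": "2024-01", "metric": "tickets", "value": 12.0},
]


def build_mart(store, dept, dims):
    """Copy one department's data into a small pre-aggregated structure.

    The result is keyed by the chosen dimensions, so a report becomes a
    dictionary lookup instead of a scan of the whole central store.
    """
    mart = defaultdict(float)
    for doc in store:
        if doc["dept"] != dept:
            continue  # other departments' data stays in the central store
        mart[tuple(doc[d] for d in dims)] += doc["value"]
    return dict(mart)


# The sales team's own mart, aggregated by month and metric.
sales_mart = build_mart(central_store, "sales", ["month", "metric"])
jan_revenue = sales_mart[("2024-01", "revenue")]
```

The mart can be dropped and rebuilt from the central store at any time, which is one reason the two layers can coexist.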

I would say it depends entirely on your use case. Several factors must be weighed before deciding, every technology has pros and cons, and there is no universal answer to this kind of comparison. You need to answer questions such as: What are your data sources and their formats: structured, semi-structured, or unstructured? Who are your users and how many are there? Are there multiple departments with different needs, do they need separate dashboards, and do they need access to other users' data? How much data will you be processing? How frequently will the reporting platform be queried? There are many more questions you can ask yourself. Once you have answered them, choose the option that works best for you.

+2








