How can denormalization be an attribute of a NoSQL DB?

When comparing NoSQL DBMSs with traditional DBMSs, many articles say that in a NoSQL DB all related data is stored together, so joins are unnecessary and data retrieval is faster. In short, the data is denormalized. Denormalization also has disadvantages, e.g. redundancy, extra space, and the need to update data in multiple places.

But regardless of its pros and cons, denormalization is an attribute of the database design. How can it be tied to a specific type of DB? If denormalizing the data is acceptable in a given case, the same can be achieved in an RDBMS.

So why is denormalization discussed as a NoSQL DB attribute?

+3




2 answers


I second John Saunders that you can denormalize data in an RDBMS as well. Still, denormalization is an attribute of most NoSQL databases ("most" meaning "excluding graph databases") because in many cases you MUST denormalize to get decent performance.

Continuing with John's example, let's say I have a Person record with a foreign key to a Car record (one car per person in this example, to keep things simple), which in turn has a foreign key to a Manufacturer record. For a given person, I want to fetch the record for that person, for their car, and for their car's manufacturer.

In an RDBMS, I can normalize this data and get it all in one query using a join, or I can denormalize it. A denormalized read will be slightly cheaper than a normalized read because joins are not free, but in this case the difference in read performance is unlikely to be significant.
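As a minimal sketch of the normalized RDBMS case, the snippet below uses Python's built-in `sqlite3` module with an in-memory database. The table names, column names, and sample rows (`person`, `car`, `manufacturer`, "Alice", "Roadster", "Acme Motors") are assumptions made up for illustration; they are not from the answer itself. It only shows that one query with two joins returns all three pieces of data.

```python
import sqlite3

# In-memory database with a hypothetical normalized schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE manufacturer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE car (
        id INTEGER PRIMARY KEY,
        model TEXT,
        manufacturer_id INTEGER REFERENCES manufacturer(id)
    );
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name TEXT,
        car_id INTEGER REFERENCES car(id)
    );
    INSERT INTO manufacturer VALUES (1, 'Acme Motors');
    INSERT INTO car VALUES (10, 'Roadster', 1);
    INSERT INTO person VALUES (100, 'Alice', 10);
""")


def fetch_person_with_car(conn, person_id):
    """Return (person name, car model, manufacturer name) in a single query."""
    return conn.execute(
        """
        SELECT p.name, c.model, m.name
        FROM person AS p
        JOIN car AS c ON c.id = p.car_id
        JOIN manufacturer AS m ON m.id = c.manufacturer_id
        WHERE p.id = ?
        """,
        (person_id,),
    ).fetchone()


result = fetch_person_with_car(conn, 100)
print(result)
```

The application makes a single round trip; the database engine resolves both foreign keys internally.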



My NoSQL database probably doesn't support joins, so if I normalize this data I have to do three separate lookups to retrieve it. For example, using a key-value database, I would first retrieve the Person, which contains the Car key, then retrieve the Car, which contains the Manufacturer key, and finally retrieve the Manufacturer. If this data has been denormalized, I only need one lookup, so the performance improvement will be significant. In the rare case that a NoSQL database does support joins, it is almost certainly location-agnostic, so the Person, Car, and Manufacturer records may be on different servers or even in different datacenters, which makes the join very expensive.
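The difference in lookup count can be sketched with a toy key-value store. `KeyValueStore` here is a hypothetical stand-in backed by a Python dict, not the API of any real NoSQL product, and the keys and sample values are invented for illustration. Each `get` call stands for one network round trip.

```python
class KeyValueStore:
    """Toy key-value store that counts lookups (each one models a round trip)."""

    def __init__(self, data):
        self._data = dict(data)
        self.lookups = 0

    def get(self, key):
        self.lookups += 1
        return self._data[key]


# Normalized layout: each record only holds the key of the next one.
normalized = KeyValueStore({
    "person:100": {"name": "Alice", "car_key": "car:10"},
    "car:10": {"model": "Roadster", "manufacturer_key": "manufacturer:1"},
    "manufacturer:1": {"name": "Acme Motors"},
})

# Denormalized layout: car and manufacturer are embedded in the person record.
denormalized = KeyValueStore({
    "person:100": {
        "name": "Alice",
        "car": {"model": "Roadster", "manufacturer": {"name": "Acme Motors"}},
    },
})


def read_normalized(store, person_key):
    """Follow the keys by hand: three sequential lookups."""
    person = store.get(person_key)
    car = store.get(person["car_key"])
    manufacturer = store.get(car["manufacturer_key"])
    return person["name"], car["model"], manufacturer["name"]


def read_denormalized(store, person_key):
    """Everything lives in one record: a single lookup."""
    person = store.get(person_key)
    car = person["car"]
    return person["name"], car["model"], car["manufacturer"]["name"]


normalized_result = read_normalized(normalized, "person:100")
denormalized_result = read_denormalized(denormalized, "person:100")
print(normalized.lookups, denormalized.lookups)
```

Both reads return the same data, but the normalized read cannot issue its second and third lookups until the previous one has returned, so the latencies add up.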

Thus, an oversimplified breakdown of your options:

  • Traditional DBMS, good with normalized data, but difficult to scale
  • NoSQL database, relatively easily scalable, but a bit shit with normalized data
  • Distributed OLAP database (e.g. Aster, Greenplum), relatively easy to scale and good with normalized data, but very expensive
+5




You seem to be reading hype, not database design articles. You can denormalize any database. Yes, NoSQL is intended for cases where denormalized data is a good thing, such as when storing documents where nested documents are used instead of joins to another table. This works best if the subdocuments are not duplicated. Of course, if they are duplicated, then you have the usual denormalized data problems.

Example: a person uses a car. In a relational database you would have a Persons table, a Cars table, and a junction table, possibly CarsUsedByPerson. In a NoSQL system, you can have a car document embedded in a person document.



Of course, if two people are using the same car, then you have the same data in several places and you will need to update it in all such places, or it will be inconsistent.
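A small sketch of that update problem, using plain Python dicts as stand-ins for person documents. The field names (`plate`, `color`) and the helper functions are hypothetical and chosen only for illustration; a real document store would have its own update API.

```python
# Two person documents, each embedding its own copy of the same car.
people = {
    "alice": {"name": "Alice", "car": {"plate": "ABC-123", "color": "red"}},
    "bob": {"name": "Bob", "car": {"plate": "ABC-123", "color": "red"}},
}


def repaint_one_copy(people, person_id, color):
    """Naive update: touches only the copy embedded in one person document."""
    people[person_id]["car"]["color"] = color


def repaint_everywhere(people, plate, color):
    """Correct update: rewrites every embedded copy of the car."""
    updated = 0
    for person in people.values():
        if person["car"]["plate"] == plate:
            person["car"]["color"] = color
            updated += 1
    return updated


def colors_for_plate(people, plate):
    """Collect the distinct colors recorded for one car across all documents."""
    return {p["car"]["color"] for p in people.values() if p["car"]["plate"] == plate}


repaint_one_copy(people, "alice", "blue")
inconsistent = colors_for_plate(people, "ABC-123")  # the copies now disagree

touched = repaint_everywhere(people, "ABC-123", "blue")
consistent = colors_for_plate(people, "ABC-123")  # the copies agree again
print(inconsistent, touched, consistent)
```

With a junction table in a relational schema, the car's color lives in exactly one row, so the first kind of mistake cannot happen.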

NoSQL is designed for cases where you need performance more than you need consistency.

+4

