Scheme design

Let's say you are a GM dba and you have to design around GM models.

Is it better to do this?

  • table_model
    • type {cadillac, saturn, chevrolet}

Or that?

  • table_cadillac_model
  • table_saturn_model
  • table_chevrolet_model

Let's say that the business rows have the same columns for the model and that there are over a million records for each subtype.

EDIT:

  • there are many CRUDs
  • there are a lot of CPU intensive reports
  • any schema has a model_detail table containing 3-5 records for each model and the details for each model are different (you cannot add a cadillac part to a saturn model)
  • the development team has no problem with db complexity.
  • I'm not sure if this is a normalization issue. although the structures are the same, they can be viewed as different entities.

EDIT:

Reasons for splitting the structure into multiple tables - business lines may have different business rules for parts - addModelDetail () can be different for each business line (even if the data format is the same) - high add / update activity - better performance with a partitioned structure instead of a single structure (I'm guessing and not sure here)?

I think this is a variation of the EAV problem. As an EAV design, a unified table structure is generally voted as a bad idea. When done this way, a single table structure is generally considered a good idea. Interesting...

I think the most interesting answer has two different structures, one for crud and one for post. I think I'll try a concatenated / flattened view for reports and multiple tables for crud and see how it works.

0


source to share


11 replies


Definitely the first example. Do you want to add tables to your database whenever you add a new model to your range?



+10


source


In data with a large number of records (such as an OLTP application), it is better to have narrower tables (such as tables with fewer fields). There will be less conflict conflicts because you are writing small amounts of data to different tables.

So, according to the criteria you described, the table structure I would have is:

Vehicle
  VehicleType
  Other common fields

CadillacVehicle
  Fields specific to a Caddy

SaturnVehicle
  Fields specific to a Saturn

      



For reporting purposes, I will have a completely different database on a completely different server that does not have a normalized structure (for example, only the CadillacVehicle and SaturnVehicle tables with all the fields from the Vehicle table duplicated in them).

With proper indexes in place, even an OLTP database can be done in your SELECT, no matter there are tens of millions of rows. However, since you mentioned there are CPU intensive reports, so I will have a completely separate reporting database.

Last comment. About business rules ... the data warehouse doesn't care about business rules. If the business rules differ between the models, it really shouldn't affect your design decisions about the database schema (other than to help determine which fields are nullable and their data types).

+3


source


Use the first one. Setting up separate tables for specializations will complicate your code and provide no benefits that cannot be achieved in other ways. It will also greatly simplify your reporting.

+2


source


If the tables do have the same columns, then the former is the best way to do it. Even if they have different columns, you probably still want the shared columns in their own table and retain their type designation.

+1


source


You can try to have two separate databases.

One is OLTP (OnLine Transaction Processing), which must be highly normalized for the data model to be eminently correct. Report performance shouldn't be an issue and you will handle non-reporting reporting with indexes / denormalization etc. In each case. The data model should try to get close to the conceptual model.

Another reporting system that has to periodically pull data from the OLTP system, and mass and reorder that data in a way that simplifies and improves reporting performance. The data model should not try to mix too closely with the conceptual model. You should be able to regenerate all the data in the reporting database at any time from the data currently in the main database.

+1


source


I would say the first method looks better.

Are there any reasons why you might want to do it the second way?

The first method works best for normalization and is closer to how most relational database schemas evolve.

The second way seems to be more difficult to maintain.

If I really have a good reason for this, I would go with the first method.

+1


source


Given the description you gave us, the answer is also.

In other words, you did not give us enough information to give a decent answer. Please describe what queries you expect to perform on the data.

[Having said that, I think the answer will come first ;-) Since I am drawing, even if they are different models, the data for each model is likely to be very similar.

But that's a complete guess at the moment.]

Edit: Given your updated edit, I'd say the first one is definitely. Since they have all the same data, they must go to the same table.

0


source


Another thing to consider when determining "is better" is whether end-users will access this data directly? Extremely normalized data for end users is difficult. Of course this can be overcome with views, but you still need to think about how you complete your design.

I agree with two other people who answered, which form is "better" is subjective and depends on what you hope to achieve. If you are hoping to get very fast requests, this is one thing. If you are hoping to achieve high programmer productivity, this is a different goal and possibly conflicts with fast queries.

0


source


The choice depends on the required performance. The best database is a normalized database. But in a normalized database there may be performance issues, then you should denormalize it. The "Normalize first, denormalize for performance" principle works well.

0


source


It depends on the datamodel and use case. If you ever need to report a query that wants to get data from "models", the former is preferable, because otherwise (with the latter) you will have to change the query (include a new table) every time you add a new model.

Oh, and by "former" we mean this option:

table_model
* type {cadillac, saturn, chevrolet}

      

0


source


@mson asked the question What do you do when the question is not satisfactorily answered by SO? "which is a direct link to the existing answers to this question.

I contributed the following response to this discussion, primarily by criticizing the problem question.


Quote (verbatim):

I went over the original question yesterday and decided not to provide an answer.

One problem was the use of the term "model" as in "GM models" that cited "Chevrolet, Saturn, Cadillac" as "models." As far as I understand, these are not models at all; they are "brands", although there may be an industry term for them that I am not familiar with, such as "separation". The model will be "Saturn Vue" or "Chevrolet Impala" or "Cadillac Escalade". Indeed, there may well be models at a more detailed level than these, for example, the various Saturn Vue variants.

So, I didn't think the starting point was well framed. I did not criticize this; it wasn't entirely convincing and there were answers and I let other people try.

The next problem is that it is not clear what your DBMS will store as data. If you are storing a million records for a "model" ("brand"), what data do you mean? Lurking in the background is a different scenario - a real scenario - and your question used an analogy that was not realistic enough. This means that the "it depends" part of the answer is much more extensive than "this is how to do it". There is only very little information about the data that needs to be modeled so that we can guess which might be better.

Ultimately, it will depend on what people are using for the data. If information travels in all different directions (different data structures in different brands, different data structures at the car model level, different structures for different dealers - Chevrolet dealers are treated differently than Saturn dealers and Cadillac dealers), then the integrated structure provides limited benefit. If everything is the same throughout, then an integrated structure will be of great benefit.

Are there legal reasons (or advantages) for separating data? To what extent do different brands separate legal entities in which common records can be a liability? Are there privacy concerns so it will be easier to control access to data if the data for individual brands is stored separately?

Without more detailed information about the scenarios being modeled, no one can provide a reliable general answer - at least not more than the top voter already (or does not) give.

  • Data modeling is not easy.
  • Data modeling without sufficient information cannot be done reliably.

I copied the material here as it is more directly relevant. I think that in order to satisfactorily answer this question, much more context needs to be given. And it is possible that there should be enough additional context to make SO the wrong place to ask this. SO has its limitations, and one of them is that it cannot resolve issues that require lengthy explanations.

From the SO FAQ page:

What questions can I ask here?

Programming questions, of course! While your question is:

  • detailed and specific
  • written clearly and simply
  • interest in at least one other programmer somewhere

...

What questions can I not ask here?

Avoid asking questions that are subjective, argumentative, or require extended discussion. This is the place for questions that can be answered!

This issue, IMO, is close to the limit "requires extended discussion".

0


source







All Articles