Is it good or bad practice to have multiple foreign keys in one table when other tables can be connected using joins?

Let's say I want to create a database to track bank accounts and transactions for a user, the kind of database that could back a checkbook app.

If I have a user table with the following properties:

  • user_id
  • email
  • password

And then I create an accounts table that can be associated with a specific user:

  • account_id
  • account_description
  • account_balance
  • user_id

Taking it one step further, I create a transactions table:

  • transaction_id
  • transaction_description
  • is_withdrawal
  • account_id // Account that this transaction belongs to.
  • user_id // User who owns this transaction.
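
A rough sketch of the three tables above, using Python's sqlite3 module with an in-memory database. The column types, the NOT NULL and UNIQUE constraints, and the helper names create_db and foreign_keys are assumptions made for illustration; only the table and column names come from the lists above.

```python
import sqlite3

# Hypothetical DDL for the three tables listed above; column types are assumed.
SCHEMA = """
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,
    email    TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL
);
CREATE TABLE accounts (
    account_id          INTEGER PRIMARY KEY,
    account_description TEXT,
    account_balance     NUMERIC NOT NULL DEFAULT 0,
    user_id             INTEGER NOT NULL REFERENCES users(user_id)
);
CREATE TABLE transactions (
    transaction_id          INTEGER PRIMARY KEY,
    transaction_description TEXT,
    is_withdrawal           INTEGER NOT NULL DEFAULT 0,
    account_id              INTEGER NOT NULL REFERENCES accounts(account_id),
    user_id                 INTEGER NOT NULL REFERENCES users(user_id)  -- the column in question
);
"""


def create_db():
    """Create an in-memory database holding the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript(SCHEMA)
    return conn


def foreign_keys(conn, table):
    """Return sorted (child_column, parent_table) pairs for a table."""
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return sorted((row[3], row[2]) for row in rows)


conn = create_db()
print(foreign_keys(conn, "transactions"))
```

With this layout the transactions table carries two foreign keys: one to accounts and one straight to users.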

Is having user_id in the transactions table a good option? It would make the query cleaner if I wanted to get all the transactions for each user, e.g.:

SELECT * FROM transactions
JOIN users ON users.user_id = transactions.user_id

      

Or I could trace my way back to the users table through the accounts table:

SELECT * FROM transactions
JOIN accounts ON accounts.account_id = transactions.account_id
JOIN users ON users.user_id = accounts.user_id
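
To check that the two queries really are interchangeable, here is a small sketch using Python's sqlite3 module. The sample rows, the column types and the variable names direct and via_accounts are made up for illustration, and the SELECT list is narrowed to two columns so the results are easy to compare.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT, password TEXT);
CREATE TABLE accounts (
    account_id INTEGER PRIMARY KEY, account_description TEXT,
    account_balance NUMERIC, user_id INTEGER REFERENCES users(user_id));
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY, transaction_description TEXT,
    is_withdrawal INTEGER,
    account_id INTEGER REFERENCES accounts(account_id),
    user_id INTEGER REFERENCES users(user_id));

-- Made-up sample rows
INSERT INTO users VALUES (1, 'ann@example.com', 'x'), (2, 'bob@example.com', 'y');
INSERT INTO accounts VALUES (10, 'Checking', 100, 1), (20, 'Savings', 50, 2);
INSERT INTO transactions VALUES
    (100, 'Rent', 1, 10, 1), (101, 'Salary', 0, 10, 1), (102, 'Gift', 0, 20, 2);
""")

# Query 1: go straight from transactions to users via the redundant column.
direct = conn.execute("""
    SELECT users.email, transactions.transaction_id
    FROM transactions
    JOIN users ON users.user_id = transactions.user_id
    ORDER BY transactions.transaction_id
""").fetchall()

# Query 2: reach users through accounts, without using transactions.user_id.
via_accounts = conn.execute("""
    SELECT users.email, transactions.transaction_id
    FROM transactions
    JOIN accounts ON accounts.account_id = transactions.account_id
    JOIN users ON users.user_id = accounts.user_id
    ORDER BY transactions.transaction_id
""").fetchall()

print(direct == via_accounts)
```

The two result sets match only as long as transactions.user_id agrees with the user_id of the owning account on every row.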

      

I know the first query is much cleaner, but is this the best way to go?

My concern is that by adding this extra column to the transactions table I am wasting space, when I can achieve the same result without it.

+3




5 answers


Let's look at it from a different angle. Where will the query or series of queries start? If you have customer information, you can get account information and then transaction information, or just transactions per customer. You need all three tables for meaningful information. If you have account information, you can get transaction information and a pointer to the customer. But to get any customer information you need to go to the customer table, so you still need all three tables. If you have transaction information, you can get the account information, but that is meaningless without the customer information; or you can get the customer information without the account information, but transactions per customer are useless noise without the account data.

Any way you slice it, the information needed for any conceivable use is split across three tables, and you will have to access all three to get meaningful information rather than just a data dump.



Having the customer FK in the transaction table may give you a way to write a "clean" query, but the result of that query is of doubtful usefulness, so you have really gained nothing. I have worked on Anti-Money Laundering (AML) scanning software for an international credit card company, so I am not being hypothetical. You are always going to need all three tables anyway.

Btw, the fact that there are FKs in the first place tells me that the question concerns an OLTP environment. An OLAP (data warehouse) environment does not need FKs or any other data integrity checks, since warehouse data is static: it comes from an OLTP environment where the integrity checks have already been done. There you can denormalize to your heart's content. So let's not give answers that apply to an OLAP environment to a question about an OLTP environment.

+3




You shouldn't use two foreign keys in the same table. This is not good database design.

The user makes transactions through the account. This is how it is logically done; therefore this is how the database should be designed.



Using joins is how it should be done. You shouldn't store the user_id key in the transactions table, as it is already in the accounts table.

The wasted space is unnecessary and is bad database design.

+2




Denormalizing is usually a bad idea. First, it is often not faster from a performance standpoint. What it does is put data integrity at risk, which can create serious problems if you end up changing a 1-1 relationship to a 1-many.

For example, what is to say that each account will have only one user? In your table design that is all you can get, which I find suspicious right off the bat. Accounts in my system can have thousands of users. So that is the first place I question your model. Did you actually think about whether the relationship should be 1-1 or 1-many? Or did you just make an assumption? Data models are NOT easy to adjust after you have millions of records; database design needs far more planning for the future, and far more thought about data needs over time, than application design does.

But suppose you have a one-to-one relationship now. Three months after you go live, you get a new account that needs to have 3 users. Now you must remember all the places where you denormalized in order to fix the data properly. This can create a lot of confusion, as you will inevitably forget some of them.

Also, even if you never need to move to a more robust model, how are you going to maintain this if the user_id changes, as it often will? To maintain data integrity, you now need a trigger to keep the data in sync as it changes. Even worse, if the data can be changed from either table, you can get conflicting changes. How do you handle those?
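
To make the maintenance problem concrete, here is a minimal sketch using Python's sqlite3 module. The sample rows, the column types and the scenario (an account being reassigned to another user) are assumptions for illustration, not something taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce both foreign keys
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE accounts (
    account_id INTEGER PRIMARY KEY, account_description TEXT,
    user_id INTEGER NOT NULL REFERENCES users(user_id));
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY, transaction_description TEXT,
    account_id INTEGER NOT NULL REFERENCES accounts(account_id),
    user_id INTEGER NOT NULL REFERENCES users(user_id));  -- redundant copy

-- Made-up sample rows
INSERT INTO users VALUES (1, 'ann@example.com'), (2, 'bob@example.com');
INSERT INTO accounts VALUES (10, 'Checking', 1);
INSERT INTO transactions VALUES (100, 'Rent', 10, 1), (101, 'Salary', 10, 1);
""")

# Reassign account 10 from user 1 to user 2, touching only the accounts table.
conn.execute("UPDATE accounts SET user_id = 2 WHERE account_id = 10")

# Rows where the redundant column now disagrees with the account's owner:
# (transaction_id, transactions.user_id, accounts.user_id)
stale = conn.execute("""
    SELECT t.transaction_id, t.user_id, a.user_id
    FROM transactions AS t
    JOIN accounts AS a ON a.account_id = t.account_id
    WHERE t.user_id <> a.user_id
    ORDER BY t.transaction_id
""").fetchall()

print(stale)
```

Both foreign keys are still satisfied after the update, so the database raises no error; the two copies of user_id have simply drifted apart, and the direct join and the join through accounts now return different owners for the same transactions.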

So you have created a maintenance mess and possibly risked your data integrity, all to write "cleaner" code and save yourself all of ten seconds writing a join? You gain nothing in terms of the things that matter in database design, such as performance, security, or data integrity, and you risk a lot. How short-sighted is that?

You need to stop thinking in terms of "clean code" when developing for databases. The best query code is often the most complex-looking, because it is the most performant, and that is critical for databases. Don't project object-oriented coding techniques onto database development; they are two very different things with very different needs. You need to start thinking about how this will play out as the data changes, which you clearly are not doing, or you would not even consider this. You need to think more about the meaning of the data and less about the "principles of software engineering", which are taught as if they apply to everything but do not really apply well to databases.

+2




It depends. If you can get the data fast enough, use the normalized version (where user_id is NOT in the transactions table). If you are concerned about performance, go ahead and include user_id. It will use more space in the database by storing redundant information, but you may be able to get the data back faster.

EDIT

There are several factors to consider when deciding whether to denormalize a data structure. Each situation needs to be considered on its own; there is no answer without looking at the specific situation (hence the "It depends" that starts this answer). For the simple example above, denormalization is probably not the optimal solution.

+1




In my opinion, if you have a simple many-to-many relationship, just use the two keys as a composite primary key and that's it.

Otherwise, if you have a many-to-many relationship with additional columns, use one primary key and two foreign keys. It is easier to manage such a table as a single entity, as Doctrine does. In general, simple many-to-many relationships are rare and are only useful for linking two tables.
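
A sketch of the two variants, using Python's sqlite3 module. The table names account_users and account_memberships, the role column and the UNIQUE constraint on the second table are hypothetical and chosen only for illustration; the idea of linking accounts to several users borrows from another answer, not from the question's original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, account_description TEXT);

-- Simple link: the two foreign keys together form the primary key.
CREATE TABLE account_users (
    account_id INTEGER NOT NULL REFERENCES accounts(account_id),
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (account_id, user_id)
);

-- Link with extra columns: one surrogate primary key plus two foreign keys.
CREATE TABLE account_memberships (
    membership_id INTEGER PRIMARY KEY,
    account_id    INTEGER NOT NULL REFERENCES accounts(account_id),
    user_id       INTEGER NOT NULL REFERENCES users(user_id),
    role          TEXT NOT NULL,       -- hypothetical extra column
    UNIQUE (account_id, user_id)       -- assumed: at most one membership per pair
);

-- Made-up sample rows
INSERT INTO users VALUES (1, 'ann@example.com'), (2, 'bob@example.com');
INSERT INTO accounts VALUES (10, 'Joint checking');
INSERT INTO account_users VALUES (10, 1), (10, 2);
INSERT INTO account_memberships (account_id, user_id, role)
    VALUES (10, 1, 'owner'), (10, 2, 'viewer');
""")


def try_duplicate_link():
    """Try to insert an (account, user) pair that already exists."""
    try:
        conn.execute("INSERT INTO account_users VALUES (10, 1)")
    except sqlite3.IntegrityError:
        return "rejected"
    return "accepted"


duplicate_result = try_duplicate_link()
print(duplicate_result)
```

In the first table the composite primary key is what prevents the same pair from being linked twice; in the second, the surrogate key gives each row its own identity, which is what an ORM entity typically expects.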

+1








