MySQL one-to-many table versus key-value store
I am creating a comment system where a comment can have many responses.
If I implemented this in MySQL, I would build a table comments with the columns:
- comment_id
- parent_comment_id
- ...
Here parent_comment_id is 0 for a top-level comment and holds the parent's comment_id for a reply. So to find the replies to a specific comment, I would look for the comments whose parent_comment_id matches the id of that comment.
This seems like overkill to me, as I would have to go through the whole comments table to see whether a comment has replies (especially with big data). With a key-value store, by contrast, I would have a key for the comment id, and its value would be a list of replies ordered by date.
So which approach do you think is the best for this problem?
Also, I would like to generalize the problem to any one-to-many relationship that would be stored as a list in a key-value store. And if you recommend a key-value store, which one would you recommend for big data? (I don't want to use Redis for this, as it is in-memory and I doubt that comment replies will be accessed often.)
Thanks for your answers.
A relational database should do a great job with this "adjacency list" model.
First of all, don't use 0 as the parent_comment_id of a "root" comment; use NULL. Then you can create a FOREIGN KEY from parent_comment_id to comment_id, which will prevent you from mistakenly adding a reply to a non-existent comment.
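A minimal sketch of that schema, using Python's built-in sqlite3 module as a stand-in for MySQL/InnoDB so it can be run anywhere (that substitution and the body column are assumptions for illustration, not part of the question). It shows a NULL parent for a root comment and the foreign key rejecting a reply to a missing comment:

```python
import sqlite3

# In-memory SQLite stands in for MySQL/InnoDB here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.execute("""
    CREATE TABLE comments (
        comment_id        INTEGER PRIMARY KEY,
        parent_comment_id INTEGER REFERENCES comments (comment_id),
        body              TEXT NOT NULL  -- hypothetical column
    )
""")

# A root comment has no parent, so parent_comment_id is NULL.
conn.execute("INSERT INTO comments VALUES (1, NULL, 'root comment')")
# A reply points at an existing comment.
conn.execute("INSERT INTO comments VALUES (2, 1, 'reply to comment 1')")

# A reply to a comment that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO comments VALUES (3, 999, 'orphan reply')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```

With 0 as the root marker, the same foreign key could not be declared unless a dummy comment with id 0 existed.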
"I would have to go through the whole comments table to see whether a comment has replies."
Assuming you have indexed parent_comment_id (which InnoDB does automatically when you create the FK above), finding the first level of replies for a given comment requires only an index range scan, not a scan of the whole table. To understand index range scans and why they are efficient, you first need to understand the Anatomy of a SQL Index.
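To make this concrete, here is a small sketch, again with Python's sqlite3 standing in for MySQL (an assumption; the index name idx_comments_parent and the body column are made up for the example). SQLite does not index foreign key columns automatically, so the index is created explicitly, and the query plan is inspected to check that the lookup goes through the index rather than scanning the table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE comments (
        comment_id        INTEGER PRIMARY KEY,
        parent_comment_id INTEGER REFERENCES comments (comment_id),
        body              TEXT NOT NULL  -- hypothetical column
    )
""")
# Unlike InnoDB, SQLite does not create this index for the FK on its own.
conn.execute("CREATE INDEX idx_comments_parent ON comments (parent_comment_id)")

conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "first reply"),
     (3, 1, "second reply"), (4, 2, "nested reply")],
)

def direct_replies(conn, comment_id):
    """Return the ids of the first-level replies to comment_id."""
    rows = conn.execute(
        "SELECT comment_id FROM comments "
        "WHERE parent_comment_id = ? ORDER BY comment_id",
        (comment_id,),
    )
    return [row[0] for row in rows]

# Ask the engine how it would run the lookup; the last column is the description.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT comment_id, body FROM comments WHERE parent_comment_id = ?",
    (1,),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)

print(direct_replies(conn, 1))
print(plan_text)
```

In MySQL the equivalent check would be running EXPLAIN on the same SELECT and looking at which key it reports.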
Finding the second level requires another range scan, and so on. Unfortunately, MySQL versions before 8.0 do not support recursive queries that would let you do all of this in one database round trip (MySQL 8.0 added WITH RECURSIVE), but the level-by-level approach is still quite efficient.
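The level-by-level lookup could be sketched roughly like this, with Python's sqlite3 standing in for MySQL (an assumption; fetch_subtree_levels and the body column are hypothetical names). It issues one query per level of the tree, not one per comment:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE comments (
        comment_id        INTEGER PRIMARY KEY,
        parent_comment_id INTEGER REFERENCES comments (comment_id),
        body              TEXT NOT NULL  -- hypothetical column
    )
""")
conn.execute("CREATE INDEX idx_comments_parent ON comments (parent_comment_id)")
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "reply"), (3, 1, "reply"),
     (4, 2, "reply to 2"), (5, 4, "reply to 4"), (6, None, "another root")],
)

def fetch_subtree_levels(conn, root_id):
    """Return descendant ids of root_id grouped by depth, one query per level."""
    levels = []
    frontier = [root_id]
    while frontier:
        placeholders = ", ".join("?" for _ in frontier)
        rows = conn.execute(
            "SELECT comment_id FROM comments "
            f"WHERE parent_comment_id IN ({placeholders}) ORDER BY comment_id",
            frontier,
        ).fetchall()
        # The replies found at this level become the parents for the next one.
        frontier = [row[0] for row in rows]
        if frontier:
            levels.append(frontier)
    return levels

print(fetch_subtree_levels(conn, 1))  # [[2, 3], [4], [5]]
```

The number of round trips grows with the depth of the thread, which for typical comment threads stays small.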
If you have taken measurements and concluded that this is not fast enough, there are other strategies for representing a hierarchy (with various trade-offs), such as "nested sets" and "closure table". Take a look at this presentation by Bill Karwin.
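As a rough illustration of the closure table alternative, here is a sketch using Python's sqlite3 in place of MySQL (an assumption; the comment_paths table, its columns, and the helper names are made up for this example and are not taken from the presentation). Every ancestor/descendant pair is stored as a row, so a whole subtree comes back from one query at the cost of extra rows on insert:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (
        comment_id INTEGER PRIMARY KEY,
        body       TEXT NOT NULL
    );
    -- One row per (ancestor, descendant) pair, including each comment with itself.
    CREATE TABLE comment_paths (
        ancestor   INTEGER NOT NULL REFERENCES comments (comment_id),
        descendant INTEGER NOT NULL REFERENCES comments (comment_id),
        depth      INTEGER NOT NULL,
        PRIMARY KEY (ancestor, descendant)
    );
""")

def add_comment(conn, comment_id, body, parent_id=None):
    conn.execute("INSERT INTO comments VALUES (?, ?)", (comment_id, body))
    # Every comment is its own ancestor at depth 0.
    conn.execute("INSERT INTO comment_paths VALUES (?, ?, 0)",
                 (comment_id, comment_id))
    if parent_id is not None:
        # Copy every path that ends at the parent, extended by one level.
        conn.execute(
            "INSERT INTO comment_paths (ancestor, descendant, depth) "
            "SELECT ancestor, ?, depth + 1 FROM comment_paths WHERE descendant = ?",
            (comment_id, parent_id),
        )

def subtree(conn, root_id):
    """Return (comment_id, depth) for every descendant of root_id in one query."""
    return conn.execute(
        "SELECT descendant, depth FROM comment_paths "
        "WHERE ancestor = ? AND depth > 0 ORDER BY depth, descendant",
        (root_id,),
    ).fetchall()

add_comment(conn, 1, "root")
add_comment(conn, 2, "reply", parent_id=1)
add_comment(conn, 3, "reply", parent_id=1)
add_comment(conn, 4, "reply to 2", parent_id=2)

print(subtree(conn, 1))  # [(2, 1), (3, 1), (4, 2)]
```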
In fact, most relational databases don't have to go through all the comments to find the replies to a given comment. After all, these kinds of queries are very common and highly optimized. Also consider creating an index on parent_comment_id. However, this only works well if you have one level of nesting. If a reply can itself be replied to in turn, perhaps another storage medium will serve you better.
I agree with Branko's answer. The index on the parent field is good. In this case, NULLs work better than zero. Plus, referential integrity will help you more than it hurts you.
Several additional points.
First, if you use nested sets instead of your existing adjacency list approach, you can retrieve an entire subtree of replies, replies to replies, and so on in a single query, rather than just the immediate replies. This might be helpful.
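A small sketch of the nested sets idea, with Python's sqlite3 standing in for MySQL (an assumption; the lft and rgt column names are the conventional ones, and the values below are hand-assigned for a four-comment tree). Each comment stores a left and right number such that all of its descendants fall strictly between them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE comments (
        comment_id INTEGER PRIMARY KEY,
        body       TEXT NOT NULL,      -- hypothetical column
        lft        INTEGER NOT NULL,
        rgt        INTEGER NOT NULL
    )
""")
# Tree: 1 is the root, 2 and 3 reply to 1, 4 replies to 2.
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?, ?)",
    [(1, "root", 1, 8), (2, "reply", 2, 5),
     (4, "reply to 2", 3, 4), (3, "reply", 6, 7)],
)

def subtree(conn, root_id):
    """Return ids of all descendants of root_id, in thread order, in one query."""
    rows = conn.execute(
        "SELECT c.comment_id FROM comments AS c "
        "JOIN comments AS root ON c.lft > root.lft AND c.rgt < root.rgt "
        "WHERE root.comment_id = ? ORDER BY c.lft",
        (root_id,),
    )
    return [row[0] for row in rows]

print(subtree(conn, 1))  # [2, 4, 3]
```

The trade-off is on the write side: adding a reply means renumbering the lft and rgt values of the rows that come after it.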
Second, there is a data structure known as a "forest". It is a set of trees; in this case, a table containing a set of trees, where each tree has a parentless comment as its root. A web search should turn up some good articles on designing a discussion forest, where each discussion starts with a comment and each discussion is a tree of replies. Many people have worked on this particular case.
You can split this into two tables to make it more flexible:
comments => comment_id, the_comment, count_replays
comments_replay => parent_id, the_comment
When a reply is added to a comment, update that comment's count_replays.
Then you can use an if statement to check whether the comment has any replies, and query for them only if it does.
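A minimal sketch of this two-table idea, using Python's sqlite3 in place of MySQL (an assumption; the table and column names follow the ones above, while add_reply and get_replies are hypothetical helpers). The reply row and the counter update go into one transaction so the counter cannot drift from the actual number of replies:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (
        comment_id    INTEGER PRIMARY KEY,
        the_comment   TEXT NOT NULL,
        count_replays INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE comments_replay (
        parent_id   INTEGER NOT NULL REFERENCES comments (comment_id),
        the_comment TEXT NOT NULL
    );
    CREATE INDEX idx_replay_parent ON comments_replay (parent_id);
""")
conn.executemany(
    "INSERT INTO comments (comment_id, the_comment) VALUES (?, ?)",
    [(1, "first comment"), (2, "second comment")],
)

def add_reply(conn, parent_id, text):
    # "with conn" commits both statements together or rolls both back.
    with conn:
        conn.execute("INSERT INTO comments_replay VALUES (?, ?)", (parent_id, text))
        conn.execute(
            "UPDATE comments SET count_replays = count_replays + 1 "
            "WHERE comment_id = ?",
            (parent_id,),
        )

def get_replies(conn, comment_id):
    row = conn.execute(
        "SELECT count_replays FROM comments WHERE comment_id = ?", (comment_id,)
    ).fetchone()
    # Only query the replies table when the counter says there is something there.
    if row is None or row[0] == 0:
        return []
    rows = conn.execute(
        "SELECT the_comment FROM comments_replay WHERE parent_id = ? ORDER BY rowid",
        (comment_id,),
    )
    return [r[0] for r in rows]

add_reply(conn, 1, "reply a")
add_reply(conn, 1, "reply b")
print(get_replies(conn, 1))  # ['reply a', 'reply b']
print(get_replies(conn, 2))  # []
```

Note that with an index on parent_id, the lookup for a comment without replies is already cheap, so the counter mainly saves one query rather than a table scan.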