MongoDB - schema for implementing "like" functionality in a web app
I have a MongoDB database with two collections, posts and users.
The JSON structure of posts is like
{title:"Title", content:"content goes here", postedby: "userid"}
and users are like
{username:"", name:""}
Now I need to implement a "like" feature, where users can like posts.
Solution 1
I can put an internal array in users like
{username:"", name:"", likes:[postid1,postid2..]}
With this, it is easy to query for the posts a user liked, but hard to get the users who liked a post.
Solution 2
I can put an internal array in posts like
{title:"Title", content:"content goes here", postedby: "userid", like:[userid1,userid2 ..]}
With this, it is easy to get the users who liked a post, but hard to query for the posts a user liked.
How can I solve this? I am currently thinking about doing both, i.e. storing internal arrays in both collections. I know I would be storing redundant data. Is this the best way to approach the problem?
I personally would not use an array like this.
It is all too easy for the likes array to grow out of control when someone likes a lot of posts, to the point that it might interfere with the amount of top-level user data you are able to store in that document.
You should also consider your query pattern here. Most likely, you will want to do some kind of graph-style combination of likes across multiple users. Currently, to do that on the fly, you must use the aggregation framework (http://docs.mongodb.org/manual/applications/aggregation/) with $unwind. Pre-aggregated reports (http://docs.mongodb.org/manual/use-cases/pre-aggregated-reports/) would also be a useful tool here, but I'll skip that.
$unwind works in RAM, which can be slow for on-the-fly aggregation over many users, especially if each user has 1000 or more likes (50 x 1000 already hits the memory limit for $unwind and the following $group and $sort, which have a memory limit of 10% of system memory). In general, the aggregation framework will not be a performant way to query those likes.
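To make the $unwind cost concrete, here is a rough sketch of the kind of pipeline I mean, assuming users embed a likes array as in Solution 1. The pipeline is what you might pass to db.users.aggregate(...). The likesPerPost function is a hypothetical in-memory stand-in I wrote only to show what $unwind followed by $group produces; it is not how MongoDB implements it.

```javascript
// Pipeline for db.users.aggregate(...), assuming each user document
// has a `likes` array of post ids (Solution 1 in the question).
const likesPerPostPipeline = [
  { $unwind: "$likes" },                              // one document per (user, liked post) pair
  { $group: { _id: "$likes", count: { $sum: 1 } } },  // count likes per post
  { $sort: { count: -1 } }                            // most-liked posts first
];

// Hypothetical in-memory equivalent, only to illustrate the data blow-up:
// N users with M likes each become N * M intermediate documents.
function likesPerPost(users) {
  // Like $unwind, users without a likes array contribute nothing.
  const unwound = users.flatMap(u =>
    (u.likes || []).map(postId => ({ username: u.username, likes: postId }))
  );
  const counts = new Map();
  for (const doc of unwound) {
    counts.set(doc.likes, (counts.get(doc.likes) || 0) + 1);
  }
  const grouped = [...counts].map(([_id, count]) => ({ _id, count }));
  grouped.sort((a, b) => b.count - a.count);
  return { unwoundSize: unwound.length, grouped };
}
```

The unwoundSize value is the number to watch: it grows with the total number of likes across all matched users, not with the number of users.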
MongoDB can easily store this structure though, even in its growing form, since each array entry is maybe 12 bytes, so you can just use power-of-2 sizes (http://docs.mongodb.org/manual/reference/command/collMod/#usePowerOf2Sizes) to fix the problem you would usually get from growing documents (fragmentation).
So with that in mind, I would keep the likes in a separate collection. It is true that you lose the single round trip you get when the likes are embedded in the user document, but I believe the points above make it worth it.
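As a minimal sketch of what a separate collection could look like: one small document per (user, post) pair. The collection name likes and the field names userId and postId are my own assumptions, not something from the question. The two functions mimic, in memory, what db.likes.find({ userId: ... }) and db.likes.find({ postId: ... }) would return.

```javascript
// Assumed document shape for a separate `likes` collection:
//   { userId: "u1", postId: "p1" }
// Both directions of the query become a simple equality match,
// which an index on userId and an index on postId can serve.

// In-memory stand-in for db.likes.find({ userId: userId })
function postsLikedBy(likes, userId) {
  return likes.filter(l => l.userId === userId).map(l => l.postId);
}

// In-memory stand-in for db.likes.find({ postId: postId })
function usersWhoLiked(likes, postId) {
  return likes.filter(l => l.postId === postId).map(l => l.userId);
}

// Example data: u1 liked two posts, u2 liked one.
const exampleLikes = [
  { userId: "u1", postId: "p1" },
  { userId: "u1", postId: "p2" },
  { userId: "u2", postId: "p1" }
];
```

Neither the user document nor the post document grows when a like is added; only the likes collection does.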
The important question to ask yourself is: in what different ways will you need to retrieve this data?
You can query for the users who liked a particular post with db.users.find({"likes": postId}) in the first case, and run the reverse query in the second case. But is this a good idea? You want to avoid ever-growing documents in MongoDB, plus you probably don't want to fetch, for a specific user, all the posts they liked and, for a specific post, all the users who liked it.
So how about storing likes in their own collection, and keeping aggregates (i.e. counts) in the user and post documents? You also have the option of keeping the most recent "N" likes on the post, or whatever is most useful to your application and its performance.
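A rough sketch of that idea, modelled in memory so it can be run without a server. The store layout, the function names, and the counter semantics are all hypothetical. In MongoDB the counter bumps would be updates using $inc on the post and user documents, and the duplicate check could be enforced with a unique index on the (userId, postId) pair in the likes collection.

```javascript
// Hypothetical in-memory model: `likes` holds one entry per (user, post)
// pair, while users and posts only carry a counter.
function createStore() {
  return { likes: new Set(), users: new Map(), posts: new Map() };
}

function addLike(store, userId, postId) {
  const key = JSON.stringify([userId, postId]);
  if (store.likes.has(key)) {
    return false; // already liked, counters stay unchanged
  }
  store.likes.add(key);
  // MongoDB equivalent: an update with { $inc: { <counter field>: 1 } }
  // on the post document and on the user document.
  store.posts.set(postId, (store.posts.get(postId) || 0) + 1);
  store.users.set(userId, (store.users.get(userId) || 0) + 1);
  return true;
}
```

Displaying "42 likes" next to a post then only needs the counter on the post document, and the likes collection is touched only when you need the actual list.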
It is rarely possible to create a "perfect" schema in MongoDB without knowing the use case (i.e. read and write patterns) and what the requirements are.
I think just storing the array of likes in the post document would be good enough.
You can get the posts that a user liked by querying on the like field. Performance is also good if you have an index on that field.
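To illustrate, assuming the like array from Solution 2: in the shell, an equality match on an array field matches any element, so db.posts.find({ like: userId }) returns the posts that user liked, and db.posts.createIndex({ like: 1 }) builds an index over the array elements. The function below is a hypothetical in-memory version of that match, just to show the semantics.

```javascript
// Shell equivalents (assuming the `like` array from Solution 2):
//   db.posts.createIndex({ like: 1 })   // index over the array elements
//   db.posts.find({ like: userId })     // matches if any element equals userId

// In-memory stand-in for the find() above.
function postsLikedByUser(posts, userId) {
  return posts.filter(p => Array.isArray(p.like) && p.like.includes(userId));
}

// Example data in the shape used in the question.
const examplePosts = [
  { title: "First", content: "content goes here", postedby: "u9", like: ["u1", "u2"] },
  { title: "Second", content: "content goes here", postedby: "u9", like: ["u2"] },
  { title: "Third", content: "content goes here", postedby: "u1" }
];
```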
The only drawback is that with this approach the size of the post document changes with the length of the like array. MongoDB is not very good at dealing with such growing documents, so if a post has thousands of likes, all those IDs can slow down query performance. Overall, though, most posts will not get that many likes, and I believe your system will work fine. You might consider limiting the number of user IDs stored per post (for example, keeping only the last 1000) to ensure that the document size doesn't grow without bound.
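A small sketch of the capping idea. In MongoDB this can be done in one update with $push combined with $each and a negative $slice, as in the comment below. The pushCapped helper is a hypothetical in-memory version that shows the resulting array; it assumes max is at least 1.

```javascript
// Shell equivalent (field name `like` as in Solution 2):
//   db.posts.update(
//     { _id: postId },
//     { $push: { like: { $each: [userId], $slice: -1000 } } }
//   )
// A negative $slice keeps only the last N elements after the push.

// In-memory stand-in: append userId, then keep the last `max` entries.
// Assumes max >= 1.
function pushCapped(likeArray, userId, max) {
  return [...likeArray, userId].slice(-max);
}
```

For example, pushCapped(["u1", "u2", "u3"], "u4", 3) drops the oldest entry and keeps u2, u3 and u4. If you cap the array like this, keep a separate counter on the post as well, since the array length no longer tells you the total number of likes.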