How to search for comma-separated data in MongoDB
I have a database of movies with different fields. The genre field contains a comma-delimited string, for example:
{genre: 'Action, Adventure, Sci-Fi'}
I know that I can use a regular expression to find matches. I've also tried:
{'genre': {'$in': genre}}
The problem is the running time: the query takes a long time to return results. The database contains about 300 thousand documents, and I created a regular index on the "genre" field.
You could use Map-Reduce to create a separate collection that stores the genre values as an array, split from the comma-separated string, and then run your queries against that output collection.
For example, I've created several sample documents in a collection foo:
db.foo.insert([
{genre: 'Action, Adventure, Sci-Fi'},
{genre: 'Thriller, Romantic'},
{genre: 'Comedy, Action'}
])
The following map-reduce operation will then create a collection that you can run efficient queries against:
map = function() {
var array = this.genre.split(/\s*,\s*/);
emit(this._id, array);
}
reduce = function(key, values) {
return values;
}
result = db.runCommand({
"mapreduce" : "foo",
"map" : map,
"reduce" : reduce,
"out" : "foo_result"
});
Querying is then simple, and it can use a multikey index on the value field:
db.foo_result.createIndex({"value": 1});
var genre = ['Action', 'Adventure'];
db.foo_result.find({'value': {'$in': genre}})
Output
/* 0 */
{
"_id" : ObjectId("55842af93cab061ff5c618ce"),
"value" : [
"Action",
"Adventure",
"Sci-Fi"
]
}
/* 1 */
{
"_id" : ObjectId("55842af93cab061ff5c618d0"),
"value" : [
"Comedy",
"Action"
]
}
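To see why exactly these two documents come back, here is a rough plain-JavaScript sketch of the same split-then-match logic, run against in-memory objects instead of a collection. The _id values and the matchIn helper are made up for illustration; this only mimics how $in behaves on an array field and is not how the server implements it.

```javascript
// In-memory stand-ins for the documents in foo (the _id values are made up).
var docs = [
  { _id: 1, genre: "Action, Adventure, Sci-Fi" },
  { _id: 2, genre: "Thriller, Romantic" },
  { _id: 3, genre: "Comedy, Action" }
];

// Same split as the map function: break on commas, ignoring surrounding spaces.
var result = docs.map(function (doc) {
  return { _id: doc._id, value: doc.genre.split(/\s*,\s*/) };
});

// Hypothetical helper that mimics {'value': {'$in': wanted}} on an array field:
// a document matches if any one of its values appears in the wanted list.
function matchIn(doc, wanted) {
  return doc.value.some(function (v) {
    return wanted.indexOf(v) !== -1;
  });
}

var wanted = ["Action", "Adventure"];
var matchedIds = result
  .filter(function (doc) { return matchIn(doc, wanted); })
  .map(function (doc) { return doc._id; });

console.log(matchedIds);
```

The first and third documents match because each contains at least one of the requested genres; the second contains neither.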
Well, you can't do it efficiently, so I'm glad you used the "performance" tag in your question.
If you want to keep the "comma separated data" in a string, you have to do one of the following.
Either use a regular expression, if that fits:
db.collection.find({ "genre": { "$regex": "Sci-Fi" } })
But that is not very efficient.
Or by evaluating JavaScript via $where:
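Note that the plain pattern also matches "Sci-Fi" anywhere inside a longer entry. If you stay with the string field, a pattern that only matches whole comma-delimited entries may be safer. The following is a minimal sketch; genrePattern is a hypothetical helper name, and as far as I know a pattern that is not anchored to a fixed prefix still cannot use the index efficiently.

```javascript
// Hypothetical helper: build a pattern string that matches `name` only as a
// complete comma-delimited entry, not as part of a longer entry.
function genrePattern(name) {
  // Escape characters that have a special meaning in a regular expression.
  var escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Entry must start at the beginning or after a comma, and end at a comma or the end.
  return "(^|,)\\s*" + escaped + "\\s*(,|$)";
}

var actionRe = new RegExp(genrePattern("Action"));

console.log(actionRe.test("Action, Adventure, Sci-Fi")); // whole entry at the start
console.log(actionRe.test("Comedy, Action"));            // whole entry at the end
console.log(actionRe.test("Live-Action, Drama"));        // only part of an entry

// In the shell, the string could be passed to the query, for example:
// db.collection.find({ "genre": { "$regex": genrePattern("Sci-Fi") } })
```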
db.collection.find(function() {
    // Split the string, strip leading whitespace, then look for an exact entry
    return (
        this.genre.split(",")
            .map(function(el) {
                return el.replace(/^\s+/,"")
            })
            .indexOf("Sci-Fi") != -1
    );
})
Inefficient, and probably about the same as the above.
Or better yet, do something that can use an index: separate the values into an array and use a basic query:
{
"genre": [ "Action", "Adventure", "Sci-Fi" ]
}
With index:
db.collection.createIndex({ "genre": 1 })
Then the query:
db.collection.find({ "genre": "Sci-Fi" })
It is that easy when you structure the data this way, and it really is efficient.
The choice is yours.
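If you go with the array form, the existing documents need converting once. A minimal sketch of one way to do that is below. The splitGenres helper is hypothetical, db.collection is only the placeholder name used above, and the update loop is guarded so it runs only where the shell's db object exists. Try it on a copy of the data first.

```javascript
// Hypothetical helper: 'Action, Adventure, Sci-Fi' -> ['Action', 'Adventure', 'Sci-Fi'].
function splitGenres(genre) {
  return genre
    .split(",")
    .map(function (el) { return el.trim(); })              // drop surrounding whitespace
    .filter(function (el) { return el.length > 0; });      // drop empty entries
}

console.log(splitGenres("Action, Adventure, Sci-Fi"));

// Only runs inside the mongo shell, where `db` is defined.
// Replace `collection` with the real collection name (assumed here).
if (typeof db !== "undefined") {
  // Select only documents whose genre is still a string, so a rerun is harmless.
  db.collection.find({ "genre": { "$type": "string" } }).forEach(function (doc) {
    db.collection.updateOne(
      { "_id": doc._id },
      { "$set": { "genre": splitGenres(doc.genre) } }
    );
  });
}
```

Updating one document at a time is slow but simple, and for roughly 300 thousand documents it is a one-off cost.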