How to search for comma-separated data in MongoDB

I have a database of movies with various fields. The genre field contains a comma-delimited string, for example:

{genre: 'Action, Adventure, Sci-Fi'}


I know that I can use a regular expression to find matches. I've also tried:

{'genre': {'$in': genre}}


The problem is the running time: the query takes a long time to return its result. The database contains about 300 thousand documents, and I created a regular index on the "genre" field.



2 answers

You could use Map-Reduce to create a separate collection that stores genre as an array, with the values taken from the comma-separated string. You can then run your queries against the output collection.

For example, I've created a few sample documents in a collection named foo:

    {genre: 'Action, Adventure, Sci-Fi'},
    {genre: 'Thriller, Romantic'},
    {genre: 'Comedy, Action'}


The following map/reduce operation will then create a collection that you can query efficiently:

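map = function() {
    // Split the comma-separated string into an array of genres
    var array = this.genre.split(/\s*,\s*/);
    emit(this._id, array);
};

// Each _id is emitted only once, so reduce is not actually invoked
reduce = function(key, values) {
    return values;
};

result = db.runCommand({
    "mapreduce" : "foo", 
    "map" : map,
    "reduce" : reduce,
    "out" : "foo_result"
});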

Querying is then simple and can use a multikey index on the value field:


db.foo_result.createIndex({"value": 1});

var genre = ['Action', 'Adventure'];
db.foo_result.find({'value': {'$in': genre}})



/* 0 */
{
    "_id" : ObjectId("55842af93cab061ff5c618ce"),
    "value" : [ 
        "Action", 
        "Adventure", 
        "Sci-Fi"
    ]
}

/* 1 */
{
    "_id" : ObjectId("55842af93cab061ff5c618d0"),
    "value" : [ 
        "Comedy", 
        "Action"
    ]
}

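To see which of the sample documents the `$in` query selects, the map step and the match can be imitated in plain JavaScript without a MongoDB server. This is only a sketch: the numeric `_id` values are placeholders rather than real ObjectIds, and `matchesIn` is a hypothetical helper that mimics how `$in` treats an array field (a document matches when at least one element is in the list).

```javascript
// Sample data mirroring the three documents in the foo collection.
// The numeric _id values are placeholders, not real ObjectIds.
const foo = [
  { _id: 1, genre: 'Action, Adventure, Sci-Fi' },
  { _id: 2, genre: 'Thriller, Romantic' },
  { _id: 3, genre: 'Comedy, Action' }
];

// Same split as in the map function: one array element per genre.
const fooResult = foo.map(function (doc) {
  return { _id: doc._id, value: doc.genre.split(/\s*,\s*/) };
});

// Hypothetical helper: $in against an array field matches when
// at least one element of the array appears in the wanted list.
function matchesIn(values, wanted) {
  return values.some(function (v) { return wanted.indexOf(v) !== -1; });
}

const genre = ['Action', 'Adventure'];
const hits = fooResult.filter(function (doc) {
  return matchesIn(doc.value, genre);
});

console.log(hits.map(function (doc) { return doc._id; })); // [ 1, 3 ]
```

The first and third documents match because each contains "Action"; the second contains neither requested genre.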


Well, you can't do it efficiently, so I'm glad you used the "performance" tag in your question.

If you want to do it with "comma-separated data" in a string, you need to do one of the following.

Either with a regular expression, if it suits:

db.collection.find({ "genre": { "$regex": "Sci-Fi" } })


But this is not very efficient.

Or by evaluating JavaScript via $where:

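db.collection.find(function() {
     // Split the string, strip leading whitespace, then look for an exact entry
     return this.genre.split(",")
             .map(function(el) { 
                 return el.replace(/^\s+/,"");
             })
             .indexOf("Sci-Fi") != -1;
})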


Not efficient either, and probably equal to the above.
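The two string-based approaches are not quite equivalent in what they match, which can be checked in plain JavaScript. The sketch below uses made-up genre strings; `regexMatch` and `tokenMatch` are hypothetical helper names, the first imitating an unanchored `$regex` and the second mirroring the split-and-trim logic of the `$where` function.

```javascript
// Made-up genre strings, used only to illustrate the difference.
const genres = ['Action, Adventure, Sci-Fi', 'Live-Action, Family'];

// Substring test, like an unanchored $regex: matches anywhere in the string.
// Assumes `wanted` contains no regular-expression metacharacters.
function regexMatch(genreString, wanted) {
  return new RegExp(wanted).test(genreString);
}

// Whole-entry test: split on commas, strip leading whitespace,
// then look for an exact entry.
function tokenMatch(genreString, wanted) {
  return genreString.split(',')
    .map(function (el) { return el.replace(/^\s+/, ''); })
    .indexOf(wanted) !== -1;
}

console.log(genres.map(function (g) { return regexMatch(g, 'Action'); })); // [ true, true ]
console.log(genres.map(function (g) { return tokenMatch(g, 'Action'); })); // [ true, false ]
```

The regular expression also hits "Live-Action", while the split-based test only accepts a complete entry. Neither variant can take advantage of a regular index for this kind of search.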

Or better yet, and something that can use an index: separate the string into an array and use a basic query:

{
    "genre": [ "Action", "Adventure", "Sci-Fi" ] 
}


With an index (on current MongoDB versions use createIndex; ensureIndex is its deprecated alias):

db.collection.ensureIndex({ "genre": 1 })


Then the query:

db.collection.find({ "genre": "Sci-Fi" })


When you do it that way, it is that simple. And really efficient.

You make the choice.
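The answer does not show how to get from the comma-separated string to the array form. One possible way, sketched here under assumptions, is to build a list of `updateOne` operations and pass it to `bulkWrite`. The helpers `splitGenres` and `buildBulkOps` are hypothetical names, the sample documents are placeholders, and the shell call is left as a comment so the block runs on its own.

```javascript
// Split a comma-separated genre string into an array of trimmed entries.
function splitGenres(genreString) {
  return genreString.split(',').map(function (el) { return el.trim(); });
}

// Build one updateOne operation per document whose genre is still a string.
function buildBulkOps(docs) {
  return docs
    .filter(function (doc) { return typeof doc.genre === 'string'; })
    .map(function (doc) {
      return {
        updateOne: {
          filter: { _id: doc._id },
          update: { $set: { genre: splitGenres(doc.genre) } }
        }
      };
    });
}

// Placeholder documents; in the shell these would come from a find().
const sample = [
  { _id: 1, genre: 'Action, Adventure, Sci-Fi' },
  { _id: 2, genre: ['Comedy'] } // already an array, so it is skipped
];
const ops = buildBulkOps(sample);
console.log(JSON.stringify(ops));

// In the mongo shell, assuming the matching documents fit in memory at once:
// db.collection.bulkWrite(
//     buildBulkOps(db.collection.find({ genre: { $type: "string" } }).toArray())
// );
```

With around 300 thousand documents it would probably be safer to process the cursor in batches rather than loading everything with `toArray()`, and to try the script on a copy of the collection first.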


