How to search for comma-separated data in MongoDB

I have a database of movies with different fields. The genre field contains a comma-delimited string, for example:

{genre: 'Action, Adventure, Sci-Fi'}

      

I know that I can use a regular expression to find matches. I've also tried:

{'genre': {'$in': genre}}

      

The problem is the running time: the query takes a long time to return a result. The database contains about 300 thousand documents, and I created a regular index on the "genre" field.



2 answers


You could use Map-Reduce to create a separate collection that stores the genre as an array, with values obtained by splitting the comma-separated string. You run the Map-Reduce job once and then run your queries against the output collection.

For example, I've created a few sample documents in a collection foo:

db.foo.insert([
    {genre: 'Action, Adventure, Sci-Fi'},
    {genre: 'Thriller, Romantic'},
    {genre: 'Comedy, Action'}
])

      

The following map/reduce operation will then produce a collection against which you can run efficient queries:

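// Split the comma-separated string into an array, keyed by the document _id
var map = function() {
    var array = this.genre.split(/\s*,\s*/);
    emit(this._id, array);
};

// Each _id is emitted exactly once, so reduce is never actually invoked
var reduce = function(key, values) {
    return values;
};

var result = db.runCommand({
    "mapreduce" : "foo",
    "map" : map,
    "reduce" : reduce,
    "out" : "foo_result"
});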
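To see what the map step stores for each document, the split can be tried on its own in plain JavaScript. This is only a sketch; `splitGenres` is a hypothetical helper name that mirrors the expression used in the map function, and it is not part of the original job.

```javascript
// Hypothetical helper mirroring the split used in the map function
function splitGenres(genre) {
    // The pattern swallows whitespace around each comma,
    // so the entries come out already trimmed
    return genre.split(/\s*,\s*/);
}

var clean = splitGenres("Action, Adventure, Sci-Fi");
// clean is ["Action", "Adventure", "Sci-Fi"]

// Splitting on a bare comma keeps the leading spaces instead
var raw = "Action, Adventure".split(",");
// raw is ["Action", " Adventure"]
```

The regular-expression separator matters here: an entry stored as " Adventure" would not match a later query for "Adventure".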

      



Querying is then straightforward and can use a multikey index on the value field:

db.foo_result.createIndex({"value": 1});

var genre = ['Action', 'Adventure'];
db.foo_result.find({'value': {'$in': genre}})

      

Output

/* 0 */
{
    "_id" : ObjectId("55842af93cab061ff5c618ce"),
    "value" : [ 
        "Action", 
        "Adventure", 
        "Sci-Fi"
    ]
}

/* 1 */
{
    "_id" : ObjectId("55842af93cab061ff5c618d0"),
    "value" : [ 
        "Comedy", 
        "Action"
    ]
}

      



Well, you can't do it efficiently, so I'm glad you used the "performance" tag in your question.

If you want to do it with "comma-separated" data in a string, you need to do one of the following.

Either use a regular expression, if that suits:

db.collection.find({ "genre": { "$regex": "Sci-Fi" } })

      

But that is not very efficient.
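A plain substring pattern also matches inside longer names. If that matters, a pattern restricted to whole comma-delimited entries could be built along these lines. This is a sketch only: `genreRegex` is a hypothetical helper, and "Live-Action" is a made-up value used purely for illustration. Because the pattern is not a simple prefix match, it does not make the query any cheaper.

```javascript
// Hypothetical helper: builds a pattern that matches `name` only as a whole
// comma-delimited entry, not as part of a longer genre name
function genreRegex(name) {
    // Escape regex metacharacters that may appear in the genre name
    var escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    return new RegExp("(^|,\\s*)" + escaped + "(\\s*,|$)");
}

var plain = /Action/;
var whole = genreRegex("Action");

var a = plain.test("Live-Action, Drama");  // true: substring match
var b = whole.test("Live-Action, Drama");  // false: not a whole entry
var c = whole.test("Comedy, Action");      // true
```

In the shell, the resulting RegExp object can be passed directly as the query value, for example `db.collection.find({ "genre": genreRegex("Action") })`.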

Or evaluate JavaScript via $where:

db.collection.find(function() {
    return (
        this.genre.split(",")
            .map(function(el) {
                return el.replace(/^\s+/,"");
            })
            .indexOf("Sci-Fi") != -1
    );
})

      

That is not efficient either, and probably about the same as the above.



Or better yet, do something that can use an index: split the string into an array and use a basic query:

{
    "genre": [ "Action", "Adventure", "Sci-Fi" ] 
}
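If the existing documents still hold strings, a one-time conversion could look roughly like the following. This is a sketch under assumptions: `migrateGenres` is a hypothetical helper, `coll` is assumed to be a shell collection object such as `db.collection`, and the split pattern is borrowed from the other answer rather than from this one.

```javascript
// Hypothetical one-time migration: turns comma-separated genre strings
// into arrays in place. `coll` is assumed to be a shell collection object.
function migrateGenres(coll) {
    var converted = 0;
    coll.find({ genre: { $exists: true } }).forEach(function (doc) {
        // Skip documents whose genre is already an array
        if (typeof doc.genre !== "string") {
            return;
        }
        coll.updateOne(
            { _id: doc._id },
            { $set: { genre: doc.genre.split(/\s*,\s*/) } }
        );
        converted += 1;
    });
    // Number of documents that were rewritten
    return converted;
}
```

In the shell this would be called as `migrateGenres(db.collection)`; running it a second time should convert nothing, since array values are skipped.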

      

With index:

db.collection.createIndex({ "genre": 1 })

      

Then the query:

db.collection.find({ "genre": "Sci-Fi" })

      

It is that easy when you structure the data this way, and it really is efficient.

You make the choice.
