Index strategy for queries with dynamic match criteria
I have a collection that stores machine data as well as mobile data, the data is written to a pipe and maintained at the same level without embedding an object, the structure looks like this
{
"Id": ObjectId("544e4b0ae4b039d388a2ae3a"),
"DeviceTypeId":"DeviceType1",
"DeviceTypeParentId":"Parent1",
"DeviceId":"D1",
"ChannelName": "Login",
"Timestamp": ISODate("2013-07-23T19:44:09Z"),
"Country": "India",
"Region": "Maharashtra",
"City": "Nasik",
"Latitude": 13.22,
"Longitude": 56.32,
//and more 10 - 15 fields
}
Most of the queries are aggregation queries used for Analytics dashboard and real-time analysis, the $ match pipeline looks like this
{$match:{"DeviceTypeId":{"$in":["DeviceType1"]},"Timestamp":{"$gte":ISODate("2013-07-23T00:00:00Z"),"$lt":ISODate("2013-08-23T00:00:00Z")}}}
or
{$match:{"DeviceTypeParentId":{"$in":["Parent1"]},"Timestamp":{"$gte":ISODate("2013-07-23T00:00:00Z"),"$lt":ISODate("2013-08-23T00:00:00Z")}}}
and many of my DAL find and findOne queries are mostly about DeviceType or DeviceTypeParentId criteria .
The collection is huge and growing, I used a composite index to support these queries, the indexes are as follows
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "DB.channel_data"
},
{
"v" : 1,
"key" : {
"DeviceType" : 1,
"Timestamp" : 1
},
"name" : "DeviceType_1_Timestamp_1",
"ns" : "DB.channel_data"
},
{
"v" : 1,
"key" : {
"DeviceTypeParentId" : 1,
"Timestamp" : 1
},
"name" : "DeviceTypeParentId_1_Timestamp_1",
"ns" : "DB.channel_data"
}
]
Now we will add support for matching criteria in DeviceId , and if I follow the same strategy as for DeviceType and DeviceTypeParentId , not good since I fell under my current approach, I create a lot of indexes and most of them will be the same and huge. So their a good way to do indexing. I've read a bit about Index Intersection Index but don't know how that would be helpful.
If any wrong approach is followed by me please point this out as this is my first project and first time using MongoDB.
source to share
All of these indexes look good for your queries, including the new ones you are proposing. Three separate indexes supporting your three kinds of queries are the best option for fast queries. You can put indexes on each field and let the planner use index intersection, but it won't be as good as composite indexes. The indexes are not the same as they support different queries.
I think the real question is, are (apparently) large amounts of index memory actually an issue at the moment? Are you having a lot of page faults due to paging indexes and data from disk?
source to share