Mongodb - Multiple text index: index key too large error code 67

Question

I have the following Mongodb database structure:

{ 
    "_id" : "519817e508a16b447c00020e", 
    "keyword" : "Just an example query", 
    "rankings" : 
    {
        "results":
        {
            "1" : { "domain" : "example1.com", "href" : "http://www.example1.com/"},
            "2" : { "domain" : "example2.com", "href" : "http://www.example2.com/"},
            "3" : { "domain" : "example3.com", "href" : "http://www.example3.com/"},
            "4" : { "domain" : "example4.com", "href" : "http://www.example4.com/"},
            "5" : { "domain" : "example5.com", "href" : "http://www.example5.com/"},
            ...
            ...
            "99" : { "domain" : "example99.com", "href" : "http://www.example99.com/"},
            "100" : {"domain" : "example100.com", "href" : "http://www.example100.com/"}
        }, 
        "plus":"many", 
        "other":"not", 
        "interesting" : "stuff", 
        "for": "this question"
    }
}

In a previous question, I asked how to create a text index so that I can search by keyword and domain using, for example:

db.ranking.find({ $text: { $search: "\"example9.com\" \"Just an example query\""}})

John Petron's amazing answer was:

db.ranking.ensureIndex(
{
    "keyword": "text",
    "rankings.results.1.domain" : "text",
    "rankings.results.2.domain" : "text",
    ...
    ...
    "rankings.results.99.domain" : "text",
    "rankings.results.100.domain" : "text"
})

However, while this works just fine when I have 10 results, I run into an "index key pattern too large" error (code 67) from the Mongo shell when I try to index 100 results.

So the big question is:

How (the heck) can I get rid of this "index key pattern too large" error?
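
To make the scaling problem concrete, here is a minimal sketch in plain JavaScript (no database needed) that builds the same index specification programmatically. The helper name buildIndexSpec is hypothetical and only for illustration; the point is that the key pattern gains one entry per rank, so it grows linearly with the number of results.

```javascript
// Hypothetical helper: builds the per-rank text index spec shown above.
function buildIndexSpec(resultCount) {
  const spec = { keyword: "text" };
  for (let rank = 1; rank <= resultCount; rank++) {
    spec[`rankings.results.${rank}.domain`] = "text";
  }
  return spec;
}

const small = buildIndexSpec(10);
const large = buildIndexSpec(100);

console.log(Object.keys(small).length); // 11
console.log(Object.keys(large).length); // 101
```

With 100 results the key pattern carries 101 fields, which is the pattern the shell rejects in the question.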

EDIT: 18/08/2014 Document structure clarified

{ 
    "_id" : "519817e508a16b447c00020e", // From MongoDB
    "keyword" : "Just an example query", 
    "date" : "2014-03-28",
    "rankings" :
    {
            "1" : { "domain" : "example1.com", "href" : "http://www.example1.com/", "plus" : "stuff1"},
            ...
            "100" : {"domain" : "example100.com", "href" : "http://www.example100.com/", "plus" : "stuff100"}
    }, 
    "plus":"many", 
    "other":"not", 
    "interesting" : "stuff", 
    "for": "this question"
}

+3

indexing mongodb nosql mongodb-query

antoinet 16 Aug '14 at 1:23

2 answers

The problem is with the document structure. Here is a suggested alternative:

{
 "keyword" : "Just an example query", 
 "rankings" :
    [{"rank" : 1, "domain" : "example1.com", "href" : "example1.com"},
     ...
     { "rank" : 99, "domain" : "example99.com", "href" : "example99.com"}
 ]
}

With this structure you can now create the index like this:

db.ranking.ensureIndex({"rankings.href":"text", "rankings.domain":"text"})

and then run queries like:

db.ranking.find({$text:{$search:"example1"}});

Note that this returns the entire document, including the whole array, whenever any array element matches.
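
As a rough illustration of why a single key such as "rankings.domain" can cover every element of the array, here is a sketch in plain JavaScript that collects the values a dotted path reaches when it crosses an array. The function valuesAtPath is hypothetical and only mimics the idea; it is not how MongoDB is implemented.

```javascript
// Hypothetical helper: collect every value reachable via a dotted path,
// fanning out over arrays the way a path like "rankings.domain" does.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const part of path.split(".")) {
    const next = [];
    for (const node of current) {
      // When a node is an array, look inside each of its elements.
      const candidates = Array.isArray(node) ? node : [node];
      for (const item of candidates) {
        if (item !== null && typeof item === "object" && part in item) {
          next.push(item[part]);
        }
      }
    }
    current = next;
  }
  // Flatten one level in case the final value is itself an array.
  return current.flat();
}

const sampleDoc = {
  keyword: "Just an example query",
  rankings: [
    { rank: 1, domain: "example1.com", href: "example1.com" },
    { rank: 99, domain: "example99.com", href: "example99.com" },
  ],
};

console.log(valuesAtPath(sampleDoc, "rankings.domain"));
// [ 'example1.com', 'example99.com' ]
```

One path, however many elements the array holds, so the index definition stays the same size whether there are 10 results or 100.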

You might want to consider referencing (linking) instead, so that each ranking result is a separate document and the keywords and other metadata are referenced rather than repeated in every result.

So, you have a document with keywords / metadata, for example:

{_id:1, "keyword":"example query", "querydate": date, "other stuff":"other meta data"},
{_id:2, "keyword":"example query 2", "querydate": date, "other stuff":"other meta data 2"}

and then the result documents, for example:

{keyword_id:1, "rank" : 1, "domain" : "example1.com", "href" : "example1.com"},
...
{keyword_id:1, "rank" : 99, "domain" : "example99.com", "href" : "example99.com"},
{keyword_id:2, "rank" : 1, "domain" : "example1.com", "href" : "example1.com"},
...
{keyword_id:2, "rank" : 99, "domain" : "example99.com", "href" : "example99.com"}

where keyword_id refers (links) to the keywords / metadata collection. Obviously, in practice, _ids will look like "_id": "519817e508a16b447c00020e"; the short values here are just for readability. Now you can index by keyword, domain and href, either together or separately depending on your query types, and you won't get the index key pattern too large error. You also get back only the one matching document, not the whole array.
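
As a sketch of how existing embedded documents could be reshaped into this referenced layout, the following plain JavaScript splits one document into a metadata document plus one document per result. The helper name splitIntoReferences is hypothetical, and the sketch assumes the rankings are already stored as an array.

```javascript
// Hypothetical helper: split one embedded document into a keyword/metadata
// document plus one document per ranking result that references it.
function splitIntoReferences(embedded) {
  const { rankings, ...metadata } = embedded;
  const results = rankings.map((result) => ({
    keyword_id: embedded._id, // link back to the metadata document
    ...result,
  }));
  return { metadata, results };
}

const embedded = {
  _id: 1,
  keyword: "example query",
  rankings: [
    { rank: 1, domain: "example1.com", href: "example1.com" },
    { rank: 99, domain: "example99.com", href: "example99.com" },
  ],
};

const { metadata, results } = splitIntoReferences(embedded);
console.log(metadata); // { _id: 1, keyword: 'example query' }
console.log(results.length); // 2
console.log(results[1].keyword_id, results[1].rank); // 1 99
```

Each element of results could then be inserted into its own collection and indexed on domain, href, or keyword_id as needed.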

I am not quite clear on where you want fuzzy / regex search queries, or whether you are searching the metadata or just href and domain, but I think this structure is a cleaner starting point for indexing without maxing out the index as before. It will also allow you to combine regular index lookups with text indexes, depending on your query pattern.

You may find the answer to "MongoDB relationship: embed or reference?" useful when reviewing your document structure.

+1

John Powell aka Barça 17 Aug 14 at 10:59

antoinet · Accepted Answer · 2014-08-27T11:38:11+0000

So this is my solution: I decided to stick with the embedded document, with one very simple modification: replacing the dictionary whose keys held the rank with an array whose elements contain the rank, and that's it:

{ 
  "_id" : "519817e508a16b447c00020e", // From MongoDB
  "keyword" : "Just an example query", 
  "date" : "2014-03-28",
  "rankings" :
  [
    { 
      "domain" : "example1.com", "href" : "http://www.example1.com/", "plus" : "stuff1", "rank" : 1
    },
    ...
    {
      "domain" : "example100.com", "href" : "http://www.example100.com/", "plus" : "stuff100", "rank" : 100
    }
  ],
  "plus":"many", 
  "more":"uninteresting", 
  "stuff" : "for", 
  "this": "question"
}
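
For anyone with existing data in the old shape, the conversion could look roughly like the following plain JavaScript sketch. The helper name rankingsToArray is hypothetical; it assumes every key of the old dictionary is a rank number stored as a string.

```javascript
// Hypothetical helper: turn { "1": {...}, "100": {...} } into
// [ {..., rank: 1}, {..., rank: 100} ], sorted by rank.
function rankingsToArray(rankingsByRank) {
  return Object.entries(rankingsByRank)
    .map(([rank, result]) => ({ ...result, rank: Number(rank) }))
    .sort((a, b) => a.rank - b.rank);
}

const before = {
  "1": { domain: "example1.com", href: "http://www.example1.com/", plus: "stuff1" },
  "100": { domain: "example100.com", href: "http://www.example100.com/", plus: "stuff100" },
};

const after = rankingsToArray(before);
console.log(after.length); // 2
console.log(after[1].rank); // 100
console.log(after[1].domain); // example100.com
```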

Then I can select the entire document using, for example:

> db.ranking.find({"keyword":"how are you doing", "rank_date" : "2014-08-27"})

Or a single result using projections, which is simply awesome, and a new feature in MongoDB 2.6 :-D

> db.collection.find({ "rank_date" : "2014-04-09", "rankings.href": "http://www.example100.com/" }, { "rankings.$": 1 })

  [
    { 
      "domain" : "example100.com", "href" : "http://www.example100.com/", "plus" : "stuff100", "rank" : 100
    },
  ]

And even get the rank of a single URL in one go:

> db.collection.find({"rank_date" : "2014-04-09", "rankings.href": "http://www.example5.com/"}, { "rankings.$": 1 })[0]['rankings'][0]['rank']
5
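
To spell out what the "rankings.$" projection is doing in the two queries above, here is a rough plain JavaScript equivalent that runs without a database. The helper name firstMatchingRanking is hypothetical; it keeps only the first array element whose href matches, which is approximately what the positional projection returns.

```javascript
// Hypothetical helper: return the first array element whose href matches,
// roughly what the { "rankings.$": 1 } projection keeps.
function firstMatchingRanking(doc, href) {
  const match = doc.rankings.find((result) => result.href === href);
  return match === undefined ? null : match;
}

const rankingDoc = {
  keyword: "Just an example query",
  rankings: [
    { domain: "example1.com", href: "http://www.example1.com/", rank: 1 },
    { domain: "example5.com", href: "http://www.example5.com/", rank: 5 },
    { domain: "example100.com", href: "http://www.example100.com/", rank: 100 },
  ],
};

const hit = firstMatchingRanking(rankingDoc, "http://www.example5.com/");
console.log(hit.rank); // 5
console.log(firstMatchingRanking(rankingDoc, "http://www.nowhere.com/")); // null
```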

And finally, I also created an index on the URL:

> db.collection.ensureIndex( {"rankings.href" : "text"} )

With this index, I can search for a single URL, a partial URL, a subdomain, or an entire domain, and it works just fine:

> db.collection.find({ $text: { $search: "example5.com"}})

And that's really it! Thanks a lot to everyone for helping, especially @JohnBarça :-D
