SOLR Out Of Memory error while reading / indexing a large index

I am running into an OOM error with Solr when indexing large amounts of data. I know the general advice is to split the index into shards, but that is already the case: I am indexing into shards, and further splitting is not an option at the moment. I want to understand what is going on, why I am getting this error, and whether there is anything I can do other than splitting further or providing more RAM.

It would be disappointing if RAM consumption were linear (or worse) in this case; I would hope it is sublinear.

The thing is, I'm indexing documents that contain random strings (so the dictionary is very large). Each document has a pair of fields of 20-30 characters and one field of about 200-500 characters. The index in each shard is about 250-260 GB, and each Solr instance serving such an index has about 4 GB of memory. When the OOM happened and I restarted Solr, the heap dump looked about the same, so the problem is probably tied to the Solr searcher rather than to indexing. Shortly before the OOM, the largest objects in the heap dump look like this:

<tree type="Heap walker - Biggest objects">
  <object leaf="false" class="org.apache.solr.core.SolrCore" objectId="0xf02c" type="instance" retainedBytes="120456864" retainedPercent="97.4">
    <outgoing leaf="false" class="org.apache.solr.search.SolrIndexSearcher" objectId="0xfb52" type="instance" retainedBytes="120383232" retainedPercent="97.3" referenceType="not specified" referenceName="[transitive reference]">
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x1018e" type="instance" retainedBytes="8161688" retainedPercent="6.6" referenceType="not specified" referenceName="[transitive reference]"/>
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10185" type="instance" retainedBytes="8148072" retainedPercent="6.6" referenceType="not specified" referenceName="[transitive reference]"/>
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10188" type="instance" retainedBytes="8138232" retainedPercent="6.6" referenceType="not specified" referenceName="[transitive reference]"/>
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10186" type="instance" retainedBytes="8129160" retainedPercent="6.6" referenceType="not specified" referenceName="[transitive reference]"/>
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10191" type="instance" retainedBytes="8124608" retainedPercent="6.6" referenceType="not specified" referenceName="[transitive reference]"/>
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x1018a" type="instance" retainedBytes="8123144" retainedPercent="6.6" referenceType="not specified" referenceName="[transitive reference]"/>

      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10192" type="instance" retainedBytes="8100904" retainedPercent="6.5" referenceType="not specified" referenceName="[transitive reference]"/>
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10190" type="instance" retainedBytes="8097984" retainedPercent="6.5" referenceType="not specified" referenceName="[transitive reference]"/>
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x1018b" type="instance" retainedBytes="8096160" retainedPercent="6.5" referenceType="not specified" referenceName="[transitive reference]"/>
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x1018d" type="instance" retainedBytes="8081656" retainedPercent="6.5" referenceType="not specified" referenceName="[transitive reference]"/>
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10187" type="instance" retainedBytes="8042504" retainedPercent="6.5" referenceType="not specified" referenceName="[transitive reference]"/>
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x1018c" type="instance" retainedBytes="8039336" retainedPercent="6.5" referenceType="not specified" referenceName="[transitive reference]"/>
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10189" type="instance" retainedBytes="8036952" retainedPercent="6.5" referenceType="not specified" referenceName="[transitive reference]"/>
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x1018f" type="instance" retainedBytes="7948568" retainedPercent="6.4" referenceType="not specified" referenceName="[transitive reference]"/>
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10195" type="instance" retainedBytes="832448" retainedPercent="0.7" referenceType="not specified" referenceName="[transitive reference]"/>

      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10196" type="instance" retainedBytes="830584" retainedPercent="0.7" referenceType="not specified" referenceName="[transitive reference]"/>
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10194" type="instance" retainedBytes="829232" retainedPercent="0.7" referenceType="not specified" referenceName="[transitive reference]"/>
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10197" type="instance" retainedBytes="828808" retainedPercent="0.7" referenceType="not specified" referenceName="[transitive reference]"/>
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10198" type="instance" retainedBytes="827312" retainedPercent="0.7" referenceType="not specified" referenceName="[transitive reference]"/>
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10199" type="instance" retainedBytes="824736" retainedPercent="0.7" referenceType="not specified" referenceName="[transitive reference]"/>
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x1019a" type="instance" retainedBytes="822608" retainedPercent="0.7" referenceType="not specified" referenceName="[transitive reference]"/>
      <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10193" type="instance" retainedBytes="783424" retainedPercent="0.6" referenceType="not specified" referenceName="[transitive reference]"/>
      <cutoff objectCount="96" totalSizeBytes="534976" maximumSingleSizeBytes="87560"/>
    </outgoing>

    <cutoff objectCount="53" totalSizeBytes="73496" maximumSingleSizeBytes="40992"/>
  </object>
  <object leaf="false" class="org.mortbay.jetty.webapp.WebAppClassLoader" objectId="0xdf88" type="instance" retainedBytes="420208" retainedPercent="0.3"/>
  <object leaf="false" class="org.apache.solr.core.SolrConfig" objectId="0xe5f5" type="instance" retainedBytes="184976" retainedPercent="0.1"/>
 ..... 

      

A simple jmap dump looks like this:

Attaching to process ID 27000, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 20.5-b03

using thread-local object allocation.
Parallel GC with 2 thread(s)

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 268435456 (256.0MB)
   NewSize          = 1310720 (1.25MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 31719424 (30.25MB)
   used     = 17420488 (16.61347198486328MB)
   free     = 14298936 (13.636528015136719MB)
   54.92056854500258% used
From Space:
   capacity = 26673152 (25.4375MB)
   used     = 10550856 (10.062080383300781MB)
   free     = 16122296 (15.375419616699219MB)
   39.55608995892199% used
To Space:
   capacity = 27000832 (25.75MB)
   used     = 0 (0.0MB)
   free     = 27000832 (25.75MB)
   0.0% used
PS Old Generation
   capacity = 178978816 (170.6875MB)
   used     = 168585552 (160.7757110595703MB)
   free     = 10393264 (9.911788940429688MB)
   94.19302002757689% used
PS Perm Generation
   capacity = 42008576 (40.0625MB)
   used     = 41690016 (39.758697509765625MB)
   free     = 318560 (0.303802490234375MB)
   99.24167865152106% used
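
The "% used" figures in the dump are simply used / capacity, and the same kind of per-pool numbers can be read from inside a running JVM. The following is only a minimal sketch: the class name `HeapReport` and the method `percentUsed` are made up for illustration, and the old generation values are copied from the dump above. Since the dump lists MaxHeapSize = 268435456 (256.0MB), printing the limit the JVM actually sees may be a useful cross-check.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical helper, not part of Solr: prints heap figures similar to the jmap output. */
final class HeapReport {

    /** Percentage of a pool's capacity that is in use, as in the "% used" lines. */
    static double percentUsed(long usedBytes, long capacityBytes) {
        return capacityBytes <= 0 ? 0.0 : 100.0 * usedBytes / capacityBytes;
    }

    public static void main(String[] args) {
        // Old generation values copied from the jmap dump above.
        System.out.printf("PS Old Generation: %.2f%% used%n",
                percentUsed(168585552L, 178978816L));

        // Maximum heap this JVM will attempt to use, in bytes.
        System.out.println("max heap: " + Runtime.getRuntime().maxMemory() + " bytes");

        // Per-pool usage of the current JVM; pool names depend on the collector in use.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            // getMax() returns -1 when the maximum is undefined.
            System.out.printf("%s: used=%d committed=%d max=%d%n",
                    pool.getName(), usage.getUsed(), usage.getCommitted(), usage.getMax());
        }
    }
}
```

This only reports on the JVM it runs in, so to inspect Solr itself the same calls would have to run inside the Solr process or be reached through JMX.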

      

I don't see anything here that gives me a clue about how to deal with this, other than providing more RAM, which is generally not a solution. I would like to know what is going on: why do the searcher and its ReadOnlySegmentReaders take up all the memory, is that really necessary, and can I do something about it?

UPDATE: I ran a test with a much smaller dictionary of about 150k words (not random words). I reached an index size of about 350 GB with no OOME, so this is not directly related to the index size; it may have more to do with the term vector size (unique terms). Still, I would like to understand the limitations I am facing and how I can get around them.



1 answer


It is up to you to get all of your documents indexed across the shards of your server farm. Solr has no built-in support for distributed indexing, but your method can be as simple as round robin: index each document to the next server in the circle. A simple hashing scheme works as well, and the Solr Wiki suggests uniqueId.hashCode() % numServers as an adequate hashing function.
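
Both routing schemes described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not Solr API: the class `ShardRouter` and its methods are hypothetical names, and the caller is assumed to send each document to the server at the returned index. One deviation from the literal expression is labeled in the code: `String.hashCode()` can be negative, so the sketch uses `Math.floorMod` rather than a bare `%` to keep the result in the range 0..numServers-1.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper, not part of Solr: picks a shard index for each document. */
final class ShardRouter {
    private final int numServers;
    private final AtomicLong counter = new AtomicLong();

    ShardRouter(int numServers) {
        if (numServers <= 0) {
            throw new IllegalArgumentException("numServers must be positive");
        }
        this.numServers = numServers;
    }

    /** Round robin: each call returns the next server in the circle. */
    int nextRoundRobin() {
        return (int) Math.floorMod(counter.getAndIncrement(), (long) numServers);
    }

    /**
     * Hash routing: the same uniqueId always maps to the same shard.
     * Assumption: floorMod replaces the bare % so that negative hash codes
     * still yield a valid index.
     */
    int shardFor(String uniqueId) {
        return Math.floorMod(uniqueId.hashCode(), numServers);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4);
        for (String id : new String[] {"doc-1", "doc-2", "doc-3"}) {
            System.out.println(id + " -> shard " + router.shardFor(id));
        }
    }
}
```

Hash routing has the advantage that an update or delete for a given uniqueId reaches the shard that already holds the document, while round robin gives the most even spread but requires remembering where each document went.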



Note that Solr does not compute universal term/doc frequencies. At large scale, it is unlikely to matter that tf/idf is computed at the shard level; however, if your collection is heavily skewed in its distribution across servers, you may get distorted relevancy results. It is probably best to distribute documents randomly across your shards.

Note: try using a hash code instead of random strings to index documents.
