Solr multicore vs sharding vs 1 big collection

I currently have one collection with 40 million documents and an index size of 25 GB. The collection is updated every n minutes, and as a result the number of deleted documents is constantly growing. The data in the collection comes from more than 1,000 customers, and each customer averages about 100,000 documents.

Right now I am trying to get a handle on the growing number of deleted documents. The growing index is using up disk space and memory, and I would like to bring it back down to a manageable size.
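
For context, deleted documents only disappear from disk once their segments are merged. Below is a minimal sketch of forcing that explicitly; the host (localhost:8983) and core name (collection1) are assumptions, adjust them to your setup:

```python
import requests

# Assumed Solr location and core name; change to match your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update"
HEADERS = {"Content-Type": "text/xml"}

# Commit with expungeDeletes=true: merges the segments that contain
# deleted documents, physically removing them from the index.
requests.post(
    SOLR_UPDATE_URL,
    data='<commit expungeDeletes="true"/>',
    headers=HEADERS,
).raise_for_status()

# Heavier option: optimize rewrites the index (here down to one segment)
# and drops every deleted document in one pass, at the cost of more I/O.
requests.post(
    SOLR_UPDATE_URL,
    data='<optimize maxSegments="1"/>',
    headers=HEADERS,
).raise_for_status()
```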

I was thinking about splitting the data into multiple cores, one per client. That would let me keep each collection small, easy to manage, and quick to create and update. My concern is that the sheer number of collections could itself become a problem. Any suggestions on how to approach this?
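
To illustrate the idea, here is a minimal sketch of creating a per-client core on the fly through the CoreAdmin API; the host, the naming scheme (client_<id>), and the assumption that each instance directory already contains a conf/ with solrconfig.xml and schema.xml are mine, not part of the question:

```python
import requests

# Assumed CoreAdmin endpoint; adjust host/port to your installation.
CORE_ADMIN_URL = "http://localhost:8983/solr/admin/cores"

def create_client_core(client_id: str) -> None:
    """Create a dedicated core for one client, reusing a shared config layout."""
    resp = requests.get(
        CORE_ADMIN_URL,
        params={
            "action": "CREATE",
            "name": f"client_{client_id}",
            # The instance directory must already exist on disk and contain
            # conf/solrconfig.xml and conf/schema.xml (e.g. copied from a
            # template directory).
            "instanceDir": f"client_{client_id}",
            "config": "solrconfig.xml",
            "schema": "schema.xml",
        },
    )
    resp.raise_for_status()

create_client_core("acme")
```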

Solr: 4.9
Index size: 25 GB
Max doc: 40 million
Doc count: 29 million

      

Thanks.



1 answer


I had a similar problem with multiple clients and a large amount of indexed data.

I implemented this on Solr 3.4 by creating a separate core per client.

That is, one core per client. Creating a core essentially means creating a separate index, i.e. splitting the data.

This way you split the large index into smaller, separate pieces.

Whatever query comes in, it runs against a smaller index, so response times are faster.

I have almost 700 cores built so far and it works fine for me.



So far, I have not encountered any core-management issues.

I would suggest going with a combination of cores and shards.

This will help you achieve the following:

Each core can have its own configuration and behavior without affecting the other cores.

You can perform actions such as updates, data loads, etc. on each core independently (see the sketch below).
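
As an illustration of acting on a single client's core without touching the others, here is a minimal sketch that indexes one document into a per-client core and queries only that core; the host, core name, and field names are assumptions:

```python
import requests

# Assumed core name for one client; the other cores are never touched.
core = "client_acme"
update_url = f"http://localhost:8983/solr/{core}/update"

# Add (or overwrite) one document in this client's core and commit it.
doc = {"id": "42", "client_id": "acme", "title": "Example record"}
requests.post(
    update_url,
    json=[doc],
    params={"commit": "true"},
).raise_for_status()

# Query just this core.
resp = requests.get(
    f"http://localhost:8983/solr/{core}/select",
    params={"q": "title:example", "wt": "json"},
)
resp.raise_for_status()
print(resp.json()["response"]["numFound"])
```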
