Solr - Recommended Batch Size for Reindexing

I just installed Solr in my Rails application (using sunspot).

I want Solr to re-index a couple of columns in one of my tables. The table is quite large (~50M records).

What is the recommended batch size? I am currently using 1000, and the reindex has been running all day.

Any ideas?


1 answer


The batch size is not that important. 1000 is probably OK, although I wouldn't go any bigger than that. It depends on the size of the documents, that is, the number of bytes of text in each of them.

Do you commit after each batch? That can be slow. I am loading a 23M-document index with a single commit at the end. The documents are small (metadata for books), and it takes about 90 minutes. To get this speed, I needed to use a single SQL query for the load. Using any subqueries resulted in a 10X slowdown.
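To make the "one commit at the end" idea concrete, here is a minimal sketch in plain Ruby. `FakeIndexer` and `index_in_batches` are hypothetical names used only for illustration; the fake indexer stands in for a real Solr client so the example runs without a server. With sunspot, the corresponding calls would be `Sunspot.index(batch)` and `Sunspot.commit`.

```ruby
# Hypothetical stand-in for a Solr client: it only counts what it is asked to do.
# With sunspot, the matching calls would be Sunspot.index(batch) and Sunspot.commit.
class FakeIndexer
  attr_reader :indexed, :commits

  def initialize
    @indexed = 0
    @commits = 0
  end

  # Add a batch of documents without committing.
  def index(batch)
    @indexed += batch.size
  end

  # Make everything added so far visible to searches.
  def commit
    @commits += 1
  end
end

# Send records in batches and commit once, after the last batch.
def index_in_batches(records, indexer, batch_size: 1000)
  records.each_slice(batch_size) do |batch|
    indexer.index(batch) # no commit here
  end
  indexer.commit # single commit at the end
  indexer
end

indexer = index_in_batches((1..2500), FakeIndexer.new, batch_size: 1000)
puts "indexed=#{indexer.indexed} commits=#{indexer.commits}"
```

As far as I know, sunspot's Rails integration commits after every batch by default and accepts a `batch_commit: false` option on its reindex call to defer the commit to the end, but check this against the version you have installed.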



I am using the JDBC support in DataImportHandler, although I may move to some custom code that makes the DB query and submits batches.

I have heard that the CSV input handler is very efficient, so it might work to dump the data to CSV and then load it with that handler.
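The dump step could look something like the sketch below, using Ruby's standard `csv` library. The helper name `dump_to_csv`, the field names (`id`, `title`, `author`) and the sample rows are all made up for illustration; in a real application the rows would come from the database, and the output would go to a file rather than a `StringIO`.

```ruby
require "csv"
require "stringio"

# Write rows (hashes) to an IO object as CSV, header line first.
# The field list decides both the header and the column order.
def dump_to_csv(rows, io, fields)
  csv = CSV.new(io)
  csv << fields
  rows.each { |row| csv << fields.map { |field| row[field] } }
  io
end

# Hypothetical sample data; values containing commas get quoted automatically.
rows = [
  { "id" => 1, "title" => "Dune", "author" => "Herbert, Frank" },
  { "id" => 2, "title" => "Emma", "author" => "Austen, Jane" }
]

out = dump_to_csv(rows, StringIO.new, %w[id title author])
puts out.string
```

The header line matters because Solr's CSV loading maps columns to fields by name, so the column names should match the field names in your schema.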
