Disable the need for commit in Solr to speed up indexing

I am currently using Solr as a search engine. My problem is that I am doing a lot of realtime indexing on the dataset (although each document is very small, only about 100 characters long). I was wondering how I can speed this up by disabling the need to commit, autocommit, etc. I just want to add documents to the index; I'm not too worried about the dataset being volatile. I am using a Node.js library for indexing in Solr. Here's a snippet:

    // Document to index
    var doc = {
        id: id.id,
        text_t: id.words
    };

    // Commit after every successful add
    var callback = function(err, response) {
        if (err) throw err;
        solr.commit();
    };

    solr.add(doc, callback);

If I remove solr.commit(), the document is not indexed (although I thought commit() just saved it to disk).

+3




2 answers


An upcoming version of Solr will have a soft commit feature that might interest you. A soft commit is similar to a commit, but it does not fsync to ensure that the data has been written to disk. This means you can lose data (for example, in a power outage, though not if Solr crashes while the server keeps running), but a soft commit is likely to be much faster than a normal (hard) commit, since the OS can use the buffer cache. ...



In the current version of Solr, a good compromise would be to use the commitWithin feature of Solr's UpdateHandler. For example, by using 10000 as the value of the commitWithin parameter, you ensure that any document is committed no more than 10 seconds after being added to the index, while keeping the commit rate below 1 commit every 10 seconds. Lower commitWithin values will provide better data freshness, while higher values will put less load on your disks.
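As a rough sketch of the idea above, the add command sent to Solr's JSON update handler can carry a commitWithin value, so the client never has to call commit itself. The helper name buildAddCommand and the '/solr/update/json' path mentioned in the comments are assumptions for illustration; the exact endpoint depends on your Solr version and setup, and the Node.js library from the question may expose this option differently.

```javascript
// Sketch only: builds the JSON body for a Solr "add" command with commitWithin.
// Assumption: the body is POSTed to something like '/solr/update/json'
// (adjust the path for your Solr version and core).
function buildAddCommand(doc, commitWithinMs) {
    return JSON.stringify({
        add: {
            doc: doc,
            // Ask Solr to commit this document within the given number of milliseconds
            commitWithin: commitWithinMs
        }
    });
}

// Example: request that the document be committed within 10 seconds
var body = buildAddCommand({ id: '42', text_t: 'hello world' }, 10000);
console.log(body);
```

With this approach the explicit solr.commit() call in the add callback is no longer needed; Solr schedules the commit itself.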

+6




Like a database transaction, a document is not visible in the Solr index until it is committed. The problem is that committing is very expensive in Solr, as you may have noticed. Unfortunately, there is currently no way around this; Solr does not work well for realtime search. One way to improve performance when adding multiple documents is to add them as a batch and commit the entire set once.
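The batching idea could look roughly like the sketch below. It is independent of any particular Solr client: createBatcher and sendBatch are hypothetical names, and sendBatch stands in for whatever code adds the documents and then issues a single commit.

```javascript
// Sketch only: collects documents and hands them to `sendBatch` in groups.
// `sendBatch` is a placeholder for code that adds the whole batch to Solr
// and commits once afterwards.
function createBatcher(batchSize, sendBatch) {
    var pending = [];

    function flush() {
        if (pending.length === 0) return;
        var batch = pending;
        pending = [];
        sendBatch(batch);
    }

    function add(doc) {
        pending.push(doc);
        // Send as soon as a full batch has accumulated
        if (pending.length >= batchSize) flush();
    }

    return { add: add, flush: flush };
}

// Usage with a stand-in sender that only logs the batch size
var batcher = createBatcher(3, function (batch) {
    console.log('sending ' + batch.length + ' documents, then committing once');
});
for (var i = 0; i < 7; i++) {
    batcher.add({ id: String(i), text_t: 'doc ' + i });
}
batcher.flush(); // send whatever is left over
```

A larger batch size means fewer commits but a longer delay before new documents become searchable.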



Ideally you could use Near Realtime Search, but this is still in development for Solr 4.0.

+1








