Discover CommitWithin in Solr

You may have been using Apache Solr for some time, and you know that you have to do a <commit/> before the content you <add> becomes searchable. But what commit strategy should you choose? Many rely on explicit commits from the client, or perhaps on AutoCommit in solrconfig.xml. Explicit commits leave all the responsibility to the client, and you soon end up with either too frequent, unnecessary commits (wasting resources) or too few commits.

Sure, we have AutoCommit, where clients don’t need to think about committing, but then it gets less flexible. What if you sometimes want to index in larger batches, while at other times you need low latency?

Discover CommitWithin! CommitWithin is a commit strategy introduced in Solr 1.4 which lets the client ask Solr to make sure an <add> request gets committed within a certain time. This leaves the decision of when to commit to Solr itself, keeping the number of commits to a minimum while still fulfilling the update latency requirements. If I send an <add commitWithin="10000"> (in an XML update), that tells Solr to make sure the document gets committed within 10000 ms, i.e. 10 s. You can then continue to add other documents, and Solr will automatically do a <commit/> when the oldest <add> is due.
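To make the "commit when the oldest <add> is due" idea concrete, here is a small Java sketch. It is a simplified model of the behaviour described above, not Solr's actual implementation; the class name CommitDeadlineTracker and its methods are made up for illustration.

```java
import java.util.OptionalLong;

/** Simplified model of commitWithin scheduling. Hypothetical class, not Solr code. */
final class CommitDeadlineTracker {
    // Earliest pending deadline in milliseconds, or -1 when no add is waiting.
    private long earliestDeadline = -1;

    /** Records an add that arrived at nowMs and must be committed within commitWithinMs. */
    void add(long nowMs, long commitWithinMs) {
        long deadline = nowMs + commitWithinMs;
        // Only the earliest deadline matters: one commit covers every pending add.
        if (earliestDeadline < 0 || deadline < earliestDeadline) {
            earliestDeadline = deadline;
        }
    }

    /** Returns the time at which the next commit is due, if any add is pending. */
    OptionalLong nextCommitDue() {
        return earliestDeadline < 0 ? OptionalLong.empty() : OptionalLong.of(earliestDeadline);
    }

    /** Commits (clears all pending adds) if the earliest deadline has been reached. */
    boolean commitIfDue(long nowMs) {
        if (earliestDeadline >= 0 && nowMs >= earliestDeadline) {
            earliestDeadline = -1;
            return true;
        }
        return false;
    }
}
```

In this model, two adds with commitWithin=10000 arriving at t=0 and t=2000 share a single commit at t=10000, which is the batching effect that keeps the commit count low.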

At first, commitWithin could only be specified in XML updates, as an attribute on the <add> element. Later the same support arrived for JSON updates, and as an HTTP parameter for the Binary and JSON handlers. You can also specify commitWithin through SolrJ, but you have to write four lines of code instead of the one line required to do a plain add():

    UpdateRequest req = new UpdateRequest();
    req.add(mySolrInputDocument);
    req.setCommitWithin(10000); // milliseconds, i.e. commit within 10 s
    req.process(server);        // server is your SolrServer instance

vs:

    server.add(mySolrInputDocument);

I decided to try fixing the situation, making this highly desired feature more visible and easier to use from all kinds of clients and request handlers. This effort resulted in a new Wiki page describing the feature, and several JIRA issues. One already made it into Solr 3.4, which is on its way out the door these days:

SOLR-2540: CommitWithin as an Update Request parameter (Solr 3.4). This patch introduces the commitWithin request parameter in the XML, CSV and Extracting request handlers, letting you specify on the URL itself whether this content is time critical or not. This enables clients like ManifoldCF to prioritize certain data sources or crawls over others, simply by adding a request parameter on the updates (CONNECTORS-202).
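As a rough illustration of the request parameter, the following Java sketch appends commitWithin to an update URL. The helper class CommitWithinUrl and the host, port and handler paths in the usage note are assumptions for the example; only the parameter name commitWithin comes from the patch description.

```java
import java.net.URI;

/** Hypothetical helper that adds the commitWithin request parameter to an update URL. */
final class CommitWithinUrl {
    /** Returns updateUrl with commitWithin=<milliseconds> appended to its query string. */
    static URI withCommitWithin(String updateUrl, long commitWithinMs) {
        if (commitWithinMs <= 0) {
            throw new IllegalArgumentException("commitWithinMs must be positive");
        }
        // Use '&' when the URL already carries parameters, '?' otherwise.
        String separator = updateUrl.contains("?") ? "&" : "?";
        return URI.create(updateUrl + separator + "commitWithin=" + commitWithinMs);
    }
}
```

A time-critical source could then post to withCommitWithin("http://localhost:8983/solr/update", 10000), while a bulk crawl might use a much larger value such as 600000 on the same handler.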

The others are scheduled for 3.5, whenever that is released. If you need any of these today, apply them as patches and build your own version of Solr:

SOLR-2742: Add commitWithin to SolrServer’s add methods. This patch adds an optional commitWithinMs parameter to all the add() methods of SolrJ’s SolrServer, making it faster and much more intuitive for developers to use this feature.

SOLR-2280: commitWithin ignored for a delete query. This patch makes it possible to also specify commitWithin for <delete>s, which makes sense since for many applications it is just as important to commit deletes quickly as it is for adds.
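Here is a hedged Java sketch of what such a delete message could look like. It assumes the commitWithin attribute on <delete> mirrors the one on <add>, which is how the issue describes it, so check it against the Solr version you actually run. The class CommitWithinDelete and the example query are hypothetical.

```java
/** Hypothetical builder for an XML delete-by-query message carrying commitWithin. */
final class CommitWithinDelete {
    /** Builds a delete message; assumes <delete> accepts commitWithin like <add> does. */
    static String deleteByQuery(String query, long commitWithinMs) {
        return "<delete commitWithin=\"" + commitWithinMs + "\"><query>"
                + escapeXml(query) + "</query></delete>";
    }

    /** Escapes the characters that would otherwise break the XML element content. */
    private static String escapeXml(String s) {
        // '&' must be replaced first so the other entities are not double-escaped.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

For example, deleteByQuery("source:crawl-a", 5000) builds a message asking for every document from that (made-up) source to be removed and the removal committed within five seconds.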
