Discover CommitWithin in Solr

You may have been using Apache Solr for some time, and you all know that you have to do a <commit/> in order for the <add>ed content to become indexed. But what commit strategy should you choose? Many rely on the explicit commit from the client, or perhaps AutoCommit in solrconfig.xml. Explicit commits leaves all the responsibility to the client and you soon end up with too frequent/unnecessary commits (causing resource waste) or too few commits.

Sure, we have AutoCommit, where clients don’t need to think about committing, but then it gets less flexible; What if you sometimes want to index in larger batches, while other times you need low latency?

Discover CommitWithin! CommitWithin is a commit strategy introduced in Solr 1.4, which lets the client ask Solr to make sure this <add> request gets committed within a certain time. This leaves the control of when to do the commit to Solr itself, optimizing number of commits to a minimum while still fulfilling the update latency requirements. If I send an <add commitWithin=10000> (in an XML update), that tells Solr to make sure the document gets committed within 10000ms, i.e. 10s. You can then continue to add other documents, and Solr will automatically do a <commit> when the oldest <add> is due.

Solr architecture diagram

We at Cominvent have often had the need to visualize the internal architecture of Apache Solr in order to explain both the relationships of the components as well as the flow of data and queries. The result is this conceptual architecture diagram, clearly showing how Solr relates to the app-server, how cores relate to a