Solr 1.4 with nice improvements

The Apache Solr 1.3 search server is very capable and stable, so nothing should keep you from deploying it just because a new version is around the corner (within a few months, that is). Upgrades should be smooth as well.

So what are some of the improvements in 1.4?

Apache Tika integration

One of the more useful additions for those doing intranet search is the integration with Apache Tika through Solr Cell (the ExtractingRequestHandler). Tika is a document parsing toolkit that can extract text and metadata from MS Office formats, PDF and more.
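
Feeding a document to Solr Cell is a single HTTP POST. Below is a minimal sketch that builds (but does not send) such a request with the Python standard library. It assumes Solr runs at http://localhost:8983/solr with the example solrconfig.xml, which maps the ExtractingRequestHandler to /update/extract. The `literal.id` parameter supplies the document's unique key. The function name, the form field name `myfile` and the sample file are illustrative only.

```python
import urllib.parse
import urllib.request
import uuid


def build_extract_request(base_url: str, doc_id: str, filename: str,
                          data: bytes, commit: bool = True) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to Solr Cell's extract handler."""
    # literal.id sets the unique key; commit=true makes the document searchable at once.
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    url = f"{base_url.rstrip('/')}/update/extract?{urllib.parse.urlencode(params)}"

    # Hand-rolled multipart/form-data body carrying a single file part.
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="myfile"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url,
        data=head + data + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_extract_request("http://localhost:8983/solr", "doc1",
                                "report.pdf", b"%PDF-1.4 ...")
    print(req.get_method(), req.full_url)
    # Against a running Solr 1.4 instance you would send it with:
    # urllib.request.urlopen(req)
```

Tika figures out the file type from the content, so the same request works for a Word document or a PDF.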

Improved performance

You will get improved faceting speed for free, and with a small change in the schema (switching numeric and date fields to the new Trie-based field types), you will also benefit from much faster integer, float and date range searches. This is due to a smarter internal implementation: instead of expanding a numeric range into an OR of ALL the discrete values in the range, Solr 1.4 indexes each value at several precision levels and covers most of the range with a few coarse terms. Say you want to search the range 0-1234. In v1.3 that would expand into an OR of (0, 1, 2, 3, 4…1229, 1230, 1231, 1232, 1233, 1234). For simplicity, let's pretend the implementation splits the range along decimal digits, using the prefix th for a full thousand, hu for a full hundred and te for a full ten. Substituting these “full” ranges, the new OR becomes (th0, hu10, hu11, te120, te121, te122, 1230, 1231, 1232, 1233, 1234), reducing the number of terms from 1,235 to 11.
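
The decimal splitting described above can be sketched in a few lines of Python. This is a simplified model of the idea only: the real Trie fields split on binary digits, in groups set by the field's precisionStep, not on decimal digits, and the function name and prefix strings here are made up for illustration. Each prefix term keeps all the higher digits, so the block 1000-1099 is written hu10.

```python
# Simplified decimal prefixes, following the scheme in the text:
# level 0 = exact value, 1 = tens (te), 2 = hundreds (hu), 3 = thousands (th).
PREFIXES = {0: "", 1: "te", 2: "hu", 3: "th"}
MAX_LEVEL = 3


def split_range(lo: int, hi: int) -> list:
    """Cover the inclusive range [lo, hi] with as few prefix terms as possible."""
    if lo < 0 or hi < lo:
        raise ValueError("expects 0 <= lo <= hi")
    terms = []

    def emit(level: int, a: int, b: int) -> None:
        # [a, b] is aligned to 10**level here, so each term is one full block.
        unit = 10 ** level
        for block in range(a // unit, b // unit + 1):
            terms.append(f"{PREFIXES[level]}{block}")

    for level in range(MAX_LEVEL + 1):
        if level == MAX_LEVEL:
            emit(level, lo, hi)  # coarsest level: cover whatever is left
            break
        size = 10 ** (level + 1)               # block size one level up
        next_lo = -(-lo // size) * size        # round lo up to a block boundary
        next_hi = (hi + 1) // size * size - 1  # round hi down to a block boundary
        if next_lo > next_hi:
            emit(level, lo, hi)  # no full coarser block fits inside the range
            break
        if lo < next_lo:
            emit(level, lo, next_lo - 1)  # ragged lower edge at this level
        if hi > next_hi:
            emit(level, next_hi + 1, hi)  # ragged upper edge at this level
        lo, hi = next_lo, next_hi
    return terms


if __name__ == "__main__":
    print(split_range(0, 1234))
```

For 0-1234 this yields the eleven terms from the example (1230 to 1234, te120 to te122, hu10, hu11 and th0), just listed from the finest level up. At index time each value would likewise be stored once per level (1234 as 1234, te123, hu12 and th1), which is the price paid in index size for the much shorter queries.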

Easier index replication

Replicating the index in large systems has so far not been a job for amateurs: you had to set up all nodes manually (there is no installer support) and write scripts that rsync the index to the slaves. In 1.4, the default index replication is done Java process to Java process over HTTP, so it is easier to set up, and it will also work on Windows!
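
In 1.4 this is handled by a request handler (solr.ReplicationHandler) configured in solrconfig.xml. Slaves normally poll the master on their own schedule (the pollInterval setting), but the handler also accepts plain HTTP commands, which is handy for checking or forcing replication by hand. The sketch below assumes the handler is registered at /replication on each node; the host names and helper names are hypothetical.

```python
import urllib.parse
import urllib.request


def replication_url(solr_base: str, command: str) -> str:
    """Build the URL for a ReplicationHandler command on one Solr node."""
    query = urllib.parse.urlencode({"command": command})
    return f"{solr_base.rstrip('/')}/replication?{query}"


def run_command(solr_base: str, command: str, timeout: float = 10.0) -> str:
    """Send the command and return Solr's raw response body (needs a live server)."""
    with urllib.request.urlopen(replication_url(solr_base, command), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical hosts: ask the master for its index version,
    # then tell a slave to pull the latest index right away.
    print(replication_url("http://master:8983/solr", "indexversion"))
    print(replication_url("http://slave:8983/solr", "fetchindex"))
```

Because everything goes over HTTP to a running Solr, there is no rsync, ssh or cron setup involved, which is what makes the Windows case work.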

And more

There is more news as well, such as Lucene improvements that Solr also benefits from. Read more here.

So look out for Solr 1.4 on a website near you…

Cominvent AS - Enterprise search consultants

Search, and you will find!
