AutoComplete or AutoSuggest has in recent years become a “must-have” search feature. Solr can do AutoComplete in a number of ways (such as Suggester, TermsComponent and Faceting using facet.prefix), but in this post we’ll consider a more advanced and flexible option, namely querying a dedicated Solr Core search index for the suggestions. You may think that this sounds heavyweight, but we’re talking small data here so it is really efficient and snappy!
Even though it takes some work to set up, the benefits of this approach are really compelling.
Today a new version of Apache Solr was released, version 3.5.0. Here’s the release statement from the Lucene PMC:
| The Lucene PMC is pleased to announce the release of Apache Solr 3.5.0!
See the CHANGES.txt file included with the release for a full list of details.
Solr 3.5.0 Release Highlights:
- Bug fixes and improvements from Apache Lucene 3.5.0, including a very substantial (3-5X) RAM reduction required to hold the terms index on opening an IndexReader. (LUCENE-2205)
- Added support for distributed result grouping. (SOLR-2066, SOLR-2776)
- Added support for Hunspell stemmer TokenFilter supporting stemming for 99 languages. (SOLR-2769)
- A new contrib module “langid” adds language identification capabilities as an Update Processor, using Tika’s LanguageIdentifier or Cybozu language-detection library (SOLR-1979)
- Numeric types including Trie and date types now support sortMissingFirst/Last. (SOLR-2881)
- Added hl.q parameter. It is optional and if it is specified, it overrides q parameter in Highlighter. (SOLR-1926)
- Several minor bugfixes like date parsing for years from 0001-1000, ignored configurations when using QueryAnalyzer with SpellCheckComponent and many more. See CHANGES.txt entries for full details.
|
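To make the hl.q highlight above a bit more concrete, here is a minimal sketch of a query that searches with one expression and highlights with another. The base URL, the `title` field and the helper name `build_highlight_query` are assumptions for illustration only; `q`, `hl`, `hl.fl` and `hl.q` are the request parameters.

```python
from urllib.parse import urlencode

# Assumed location of the select handler; adjust to your own installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_highlight_query(query, highlight_query, highlight_field):
    """Build a select URL where hl.q overrides q for highlighting only."""
    params = {
        "q": query,               # what the search matches on
        "hl": "true",             # turn highlighting on
        "hl.fl": highlight_field, # field to produce snippets from (hypothetical field name)
        "hl.q": highlight_query,  # optional: highlight these terms instead of q
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)
```

Calling `build_highlight_query("title:solr", "apache", "title")` gives a URL that matches documents on `title:solr` but asks the Highlighter to mark up `apache` instead.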
Contributions from Cominvent include LanguageIdentifier, plugging the Hunspell stemmer into Solr, and SOLR-2742, which makes commitWithin more accessible through the SolrJ APIs. Apache Tika has also been upgraded to version 0.10, fixing several bugs in parsing PDFs and Office documents.
You may have been using Apache Solr for some time, and you know that you have to do a <commit/> in order for the <add>ed content to become visible to searches. But what commit strategy should you choose? Many rely on explicit commits from the client, or perhaps AutoCommit in solrconfig.xml. Explicit commits leave all the responsibility to the client, and you soon end up with either too frequent, unnecessary commits (wasting resources) or too few commits.
Sure, we have AutoCommit, where clients don’t need to think about committing, but then it gets less flexible: what if you sometimes want to index in larger batches, while other times you need low latency?
Discover CommitWithin! CommitWithin is a commit strategy introduced in Solr 1.4, which lets the client ask Solr to make sure this <add> request gets committed within a certain time. This leaves the control of when to do the commit to Solr itself, keeping the number of commits to a minimum while still fulfilling the update latency requirements. If I send an <add commitWithin="10000"> (in an XML update), that tells Solr to make sure the document gets committed within 10000ms, i.e. 10s. You can then continue to add other documents, and Solr will automatically do a <commit> when the oldest <add> is due.
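As a rough illustration of the XML update described above, here is a minimal sketch in Python that builds an <add> message carrying a commitWithin attribute and posts it to Solr. The update URL, the field names and the helper names `build_add_xml` and `send_update` are assumptions made for the example, not part of Solr itself.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed location of the update handler; adjust to your own installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs, commit_within_ms):
    """Build an <add commitWithin="..."> message from a list of field dicts."""
    add = ET.Element("add", {"commitWithin": str(commit_within_ms)})
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", {"name": name})
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def send_update(xml_body, url=SOLR_UPDATE_URL):
    """POST the XML message to Solr and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

For example, `build_add_xml([{"id": "1", "title": "Hello"}], 10000)` produces an <add> message asking Solr to commit within 10 seconds. Note that the client never sends a <commit/> of its own; it only states how long it is willing to wait.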
After upgrading to Lion this week I ran into several issues, even though I’m on 10.7.1. I thought I’d share them, and their solutions, with you.
Spinning beachball at login screen
I bought a new SSD disk and performed a clean install, just to start from scratch. But even before restoring any of my old settings, I hit a spinning beachball on the login screen before I could log in. Sometimes it also went straight to a “bluescreen” telling me to restart.
The solution was found here; in short, you need to log in quickly before the lockup, then open the Energy Saver preferences and disable automatic graphics switching. It solved the issue for me.
The Apache way of developing open source software relies on an active community of users, contributors and developers. All of us can contribute in some way or another. Being a committer means that you participate actively in the software development work and have write access to the source code repository. Each project is led by the PMC (Project Management Committee), which consists of some of the committers who take on the extra responsibility of staking out the future of the project.