Visualization of Lucene segment merges

Lucene guru Mike McCandless just released on his blog an impressive piece of work: a series of YouTube videos visualizing how Lucene's MergePolicy really works. He feeds Solr with a 10 GB Wikipedia dump, as well as a random add/delete data source, and then records every single segment written and merged during the whole process.

Mike also introduces a cool new merge policy called TieredMergePolicy (LUCENE-854), which is much smarter and slightly more efficient than the default one. I hope this becomes the new default merge policy in Solr.
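To get a feel for what the videos show, here is a toy model of cascading segment merges. It is only a sketch, not Lucene code: it assumes a simplified log-style policy in which every flush writes one small segment and, as soon as `merge_factor` segments sit on the same level, they are merged into a single segment one level up. The function name and parameters are made up for illustration, and TieredMergePolicy itself is not modelled here.

```python
def simulate_flushes(num_flushes, merge_factor=10, docs_per_flush=1000):
    """Toy model of log-style segment merging (not Lucene's implementation).

    Each segment is a (level, doc_count) tuple. Every flush adds a level-0
    segment; whenever merge_factor segments share a level, they are merged
    into one segment on the next level, which may trigger further merges.
    """
    segments = []
    merges = 0
    for _ in range(num_flushes):
        # A flush writes one new small segment.
        segments.append((0, docs_per_flush))
        # Cascade: keep merging while the newest segment's level is "full".
        while True:
            level = segments[-1][0]
            same_level = [s for s in segments if s[0] == level]
            if len(same_level) < merge_factor:
                break
            merged_docs = sum(docs for _, docs in same_level)
            segments = [s for s in segments if s[0] != level]
            segments.append((level + 1, merged_docs))
            merges += 1
    return segments, merges


if __name__ == "__main__":
    segments, merges = simulate_flushes(num_flushes=10, merge_factor=3)
    print("segments:", segments)
    print("merges performed:", merges)
```

Running it with a small `merge_factor` makes the staircase pattern from the videos easy to see: small segments pile up, collapse into a bigger one, and occasionally a whole row of bigger segments collapses again.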

Posted in Search technology

Live Solr chat support?

Ever needed an urgent answer to some Solr/Lucene question? Haven’t got a support contract and someone to call yet?

I assume you’re already on the mailing lists and know about that channel. But what many people do not know is that the Solr/Lucene community also hosts a live chat where you can get help really quickly, as professionals from around the globe participate. The magic is made possible by an old technology called IRC (no, IRC is not dead), and the channels are #solr and #lucene.

There are multiple IRC clients available, and using one is probably the best option if you’re going to be very active. But to get started here and now, Cominvent has set up a web-based IRC chat page which opens the channels #solr and #lucene for you automatically:

Go directly to the Solr Chat by clicking the link or the thumbnail.


Turbulent Java times

Times have been turbulent in the Java camp since Oracle took over Sun and started trying to make Java a less open specification. Well, now the Apache Software Foundation has made good on its promise to leave the JCP EC if Oracle continued its ego-play with Java without listening to the community.

But what’s the future now for Java and, just as importantly, for all the Open Source projects based on Java? Much will depend on Oracle’s own actions in the next months. Personally, I hope that their bullying will start to hurt so much in their brand perception and customer satisfaction polls that they desperately see the need for a new Open Source strategy, cooperating with the developers instead of fighting them.

However, by the time Ellison & co land on this decision, I fear that it will already be too late to unify Java. The majority of the Java development community, including Google and hopefully IBM, will have laid out a plan to revive the Java community on their own.

In his very interesting blog post “The case for a new Apache/Google ‘Java’”, Sola plays with one scenario in which a new Java-like programming language based on Harmony takes over the whole ecosystem, and the ASF deprecates the Java versions of all projects. Wow, a drastic move, but could it maybe work?

Posted in In the news

The Solr distros are coming

Open Source Search is gaining more and more traction. First you had Lucene (2001), giving great search for programmers. Then we got Solr (2006), making search accessible to non-programmers, although a certain level of expertise is still needed. And then came Constellio, an open source (GPL) enterprise search distribution (distro) built on Solr, adding a slick GUI, connector and crawling support, and more.

Say again. A Solr distro?

I call it “distro” because I like to compare the evolution to what we have seen in GNU/Linux. First there was the Linux core. Then there were the GNU tools that made Linux so much more usable, but still only for engineers comfortable with the command line. And last, companies like RedHat and Suse built complete distros including a modern GUI and ready-to-use tools such as OpenOffice, Thunderbird and more. Without these distros, Linux would just have been a “core”, leaving it to the user to add the extra sugar.

Posted in Open Source, Search technology, Solr distros, Technology, Trends

The first real FAST Search book

Book cover © Amazon & Wrox

Overdue by several years, Wrox just published a book about Microsoft Enterprise Search, including the different FAST flavours. Bravo!

You can ask how all the users of FAST technology could have managed for so many years without some public source for learning the products. Up until now, FAST/MS and their partners have been the sole source for learning FAST Search [1]. Now, we’re part of that ecosystem and may have profited from the lack of available material, but that’s another story.

Posted in Search technology, Technology