Archive for the ‘Open Source’ Category

Cominvent AS provides professional support for Lucene/Solr

Wednesday, March 4th, 2009

So you are thinking of adopting the open source search technologies Solr or Lucene, but are reluctant because of the lack of support from the open source world?

Or perhaps you are already using Apache Solr or Lucene in your organization, and would like an expert partner to support your current solution and help you refine it to better fit your needs and to give better results and performance?

We at Cominvent AS are experts in search and experts in Apache Solr and Lucene. But being a small organization we have not offered support contracts until now. We are now pleased to announce that Cominvent AS, through our partnership with Lucid Imagination, can offer you commercial support, consulting and training. Lucid Imagination was the first commercial entity to offer professional paid support for the Apache Lucene and Solr products, and some of the most skilled coders and engineers are associated with them.

Please contact us for a talk about your needs.

Apache Solr has become grown-up

Wednesday, March 4th, 2009

The open source search server Solr from the Apache Software Foundation has become a mature technology, ready for prime time.

Recent releases have added features which were previously found only in commercial offerings, such as

  • Automatic replication for large installations with distributed search
  • Java API (SolrJ)
  • Conversion of Office documents
  • Full faceted search
  • Advanced tokenization, highlighting and stemming

Apache Solr is being adopted more widely, and some companies have even started replacing their expensive commercial engines with Solr, with good results. That way they can spend less on licenses and more on content quality and tuning.

Feeling ready to try Solr? Contact us for a talk, or download it yourself and try the tutorial. Here’s a short video introducing you to the basics:

I will soon write about how to obtain professional support for Apache Solr.

Stallman in Oslo – Free as in “Fanatic”?

Monday, February 23rd, 2009

The legendary founder of the Free Software Foundation (FSF), and initiator of GNU (GNU's Not Unix), the GPL and more, visited Oslo today. I attended his speech about copyright at the University of Oslo, where a few hundred people had gathered to hear (and see) this strange-looking and strange-speaking man.

He is indeed a strange character. When told that “the floor is yours”, he knelt down on the floor repeating “the floor is mine!”, and later in the show he dressed in black, comparing the Emacs text editor to a religion.

At the end of the lecture, that was indeed the impression I got: for Mr. Stallman, this is all about “religion” and ideology, more than anything else. He is living in his own little ideological bubble, hoping that all big corporations will go away, that everyone will be able to copy everything to everyone, and that nobody should be allowed to make money on software licenses.

Stallman’s problem is not that he advocates free software and the benefits of the whole movement, but that he is so completely ignorant of the real world around him – not willing to see that there are other goals in this world than fulfilling his own four (complete) freedoms of use and redistribution when it comes to software (and other works).

Now a few words about the topic of his speech – copyright. Stallman has designed his own copyright laws which he’d like the world to adopt. Basically what he suggests is a division of copyright into three: A) All software should be free (as in free speech), as should all school textbooks and all encyclopedias and other fact books. B) Works that express someone’s thoughts, or in other ways must be kept as is, should be covered by a limited copyright, say 10 years, and C) works of art should be protected in a limited way – it should always be allowed to share them with others (copy), and even to use fragments of others’ art in your own works.

All this is good – international copyright laws could use a makeover and a limited period of protection. Where I no longer follow is when Mr. Stallman talks about copying or “sharing” of e.g. music or movies – he completely lacks respect for legal agreements or law, encouraging the breaking of such agreements in order to “be kind” and share with your friends. A normal grown-up intellectual human being would not face hundreds of students encouraging such crime, and at the same time demand to be taken seriously by the record labels, commercial software houses, lawyers and others. Which is unfortunate, because he would be so much better an advocate for free and open software if he did not live in his bubble pretending the world was all as he wished. It is ok to encourage people to choose, use and even write free software. But encouraging people to break the law by infringing copyright and license laws is not the way to go.

Free and Open Source software is a super way to make software. And it can meet the competition fine without turning to legal disobedience, hatred against others etc. Let people choose FREEly what software to use, and spend your time writing excellent Free software which is better than the alternatives, and create an ecosystem of open and closed source which works together to create better and cheaper software for tomorrow!

OpenPipeline – an open-source document processing pipeline

Tuesday, May 20th, 2008

Most commercial search engines include a more or less advanced document processing pipeline for transforming raw input into something that can be indexed. The process involves normalization, entity extraction, linguistic processing, annotation, data cleansing etc.
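To make the idea concrete, here is a minimal sketch of such a pipeline in Python. It is my own toy illustration, not the API of any particular product: the stage functions (`normalize`, `extract_emails`, `add_tokens`), the `run_pipeline` helper and the field names are all hypothetical. Each stage takes a document and returns it with more fields filled in, and the pipeline simply runs the stages in order.

```python
import re
import unicodedata
from typing import Callable, Dict, List

# A document is just a dict of fields; a stage maps a document to a document.
# All names here are hypothetical, chosen only for this illustration.
Document = Dict[str, object]
Stage = Callable[[Document], Document]


def normalize(doc: Document) -> Document:
    """Normalization: Unicode-normalize the raw text and collapse whitespace."""
    text = unicodedata.normalize("NFKC", str(doc["raw"]))
    doc["text"] = re.sub(r"\s+", " ", text).strip()
    return doc


def extract_emails(doc: Document) -> Document:
    """Toy entity extraction: collect e-mail addresses found in the text."""
    doc["emails"] = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", str(doc["text"]))
    return doc


def add_tokens(doc: Document) -> Document:
    """Toy linguistic processing: lowercase word tokens."""
    doc["tokens"] = re.findall(r"\w+", str(doc["text"]).lower())
    return doc


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Run each stage in order, passing the document along."""
    for stage in stages:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    raw = {"raw": "  Contact   Jane at jane@example.com \n today "}
    print(run_pipeline(raw, [normalize, extract_emails, add_tokens]))
```

A real pipeline would add things like format conversion, language detection and error handling per stage, but the shape (an ordered list of small, independent stages) stays the same.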

When it comes to open source search engines, they are getting pretty good at the core of indexing and search; however, they typically lack a proper document processing pipeline. When I started looking for such frameworks a few days ago, I came across this post announcing that Dieselpoint has just released their own document processing pipeline as open source at www.openpipeline.org. I have not yet tried it out but it looks very promising, and could have the potential of being the preferred pipeline for deployments of Apache Solr and other open source engines.

There are also other, similar initiatives such as OpenPipe, which you can read more about in Rogério Pereira Araújo’s blog on the same subject. I might find time for a comparison later on.

Good luck, Dieselpoint, in contributing to open source. I hope that you will let the OS community really contribute and help adapt and improve this framework going forward.

Test-driving Apache SOLR (part 1)

Friday, May 2nd, 2008

Some of you read my previous post, The state of open source search. In this post I will go through the process of downloading, installing, configuring and using Apache SOLR to index some sample XML data and search it.

This is the first post in a series, where each new post will explore some new feature. We will simply follow the tutorial to get SOLR up and running locally.

We start by visiting the SOLR Tutorial for the first steps, and simply get the app running:

  1. Check Java version, download SOLR (I chose this file), unpack it, and cd to apache-solr-1.2.0/example
  2. java -jar start.jar
  3. Visit http://localhost:8983/solr/admin/ to see the admin page with a simple test GUI
  4. Try hitting the search button; you get the XML response back with 0 results
  5. Now try indexing some content. cd to example/exampledocs
  6. java -jar post.jar solr.xml monitor.xml
  7. Now try searching again and see that you get some hits!
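The search button in the admin page just sends an HTTP request to the select handler and shows the XML that comes back, so the same thing can be done from a script. Below is a small Python sketch, under the assumption that the example server from the steps above is running on localhost:8983 and answers on /solr/select. The helper names `build_select_url` and `parse_response` are my own, and `SAMPLE_RESPONSE` is a hand-written, shortened example of the response format, not real output.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hand-written, shortened example of a Solr XML response (not captured output).
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">SOLR1000</str>
      <str name="name">Solr, the Enterprise Search Server</str>
      <arr name="features"><str>Advanced Full-Text Search</str></arr>
    </doc>
  </result>
</response>"""


def build_select_url(query, base="http://localhost:8983/solr", rows=10):
    """Build the URL for a simple query against the select handler."""
    return f"{base}/select?{urlencode({'q': query, 'rows': rows})}"


def parse_response(xml_text):
    """Return (numFound, docs); only single-valued fields are kept."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    num_found = int(result.get("numFound"))
    docs = []
    for doc in result.findall("doc"):
        # Multi-valued fields arrive as <arr> elements; skip them in this sketch.
        docs.append({f.get("name"): f.text for f in doc if f.tag != "arr"})
    return num_found, docs


if __name__ == "__main__":
    print(build_select_url("solr"))
    print(parse_response(SAMPLE_RESPONSE))
```

With the server up, the body can be fetched with `urllib.request.urlopen(build_select_url("solr")).read()` and passed straight to `parse_response`.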

For part two we will get SOLR running in a Tomcat instance, customize the schema a bit and present search results in a custom web page using the client API.