Archive for the ‘Open Source’ Category

Test-driving Apache SOLR (part 1)

Friday, May 2nd, 2008

Some of you have read my previous post, The state of open source search. In this post I will go through downloading, installing, configuring and using Apache SOLR to index some sample XML data and search it.

This is the first post in a series, where each new post will explore some new feature. We will simply follow the tutorial to get SOLR up and running locally.

We start by visiting the SOLR Tutorial for the first steps, and simply get the app running:

  1. Check Java version, download SOLR (I chose this file), unpack it, and cd to apache-solr-1.2.0/example
  2. java -jar start.jar
  3. Visit http://localhost:8983/solr/admin/ to see the admin page with a simple test GUI
  4. Try hitting the search button; you get an XML response back with 0 results
  5. Now try indexing some content: cd to example/exampledocs
  6. java -jar post.jar solr.xml monitor.xml
  7. Now try searching again and see that you get some hits!
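
The search in steps 4 and 7 can also be run from a script instead of the admin GUI. Below is a minimal sketch, not taken from the tutorial itself: it assumes the example server answers on the /select path under the base URL from step 3, and that the XML response carries the hit count in a numFound attribute on the result element. The sample response is hand-written for illustration, not captured output, and the helper names are made up for this post.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base URL from step 3; the "/select" path used below is an assumption.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(query, rows=10, base=SOLR_BASE):
    """Build the URL for a simple keyword search (hypothetical helper)."""
    return base + "/select?" + urlencode({"q": query, "rows": rows})


def count_hits(response_xml):
    """Read the hit count from the numFound attribute of <result>."""
    root = ET.fromstring(response_xml)
    return int(root.find("result").get("numFound"))


# Hand-written sample shaped like the empty-index response from step 4.
EMPTY_RESPONSE = (
    "<response>"
    '<lst name="responseHeader"><int name="status">0</int></lst>'
    '<result name="response" numFound="0" start="0"/>'
    "</response>"
)

print(build_select_url("solr"))
print(count_hits(EMPTY_RESPONSE))
```

With the server from step 2 running, the body returned by urllib.request.urlopen(build_select_url("solr")).read() can be passed to count_hits in place of the sample string.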

In part two we will get SOLR running in a Tomcat instance, customize the schema a bit and present search results in a custom web page using the client API.

Rana vs Wium Lie

Friday, May 2nd, 2008

Note: Links are to Norwegian sites.

In a recent post on his blog, Shahzad Rana (Microsoft’s most high-profile OOXML promoter in Norway) comments on the wording Håkon Wium Lie (Opera Software’s tech director and a high-profile standards promoter) used in a comment to VG TV. There, Lie introduces the term “Microsoft tax” to explain what happens when ordinary people feel forced to purchase MS Office to read documents from the government or their kid’s school. Lie says that widespread use of OOXML, Microsoft Office’s new document format, could mean many more years of vendor lock-in, since OOXML allows arbitrary non-standardized, non-open extensions. An example would be a parent receiving a document from her kid’s teacher that contains an Equation binary object, which is not part of the OOXML specification and thus cannot possibly be implemented by other office packages wanting to support OOXML.

Further, Rana asks Lie to produce some evidence of an OOXML document from a teacher to a student or parent that is only readable on Windows and MS Office, whereupon Lie refers to an OOXML document that Rana himself had sent by email. A funny thing here is that Rana had to rename the .docx file to .doc to be able to upload it to WordPress. This caused a lot of trouble for users, illustrating even more strongly what kind of trouble the new format would cause for ordinary people. Rana should of course have zipped the file or, better, modified WordPress to accept .docx files for upload. But an MS supporter is probably not used to the idea of freely being able to modify one’s own GPL software :-)

Norwegian search portal Sesam.no releases middleware as GPL

Sunday, March 16th, 2008

In this blog post, Sesam announces that their middleware architecture, Sesam Search Application Toolkit (SESAT), is released as open source software. This is the piece of software (written in Java) that sits between the portal (such as sesam.no) and the data sources (such as FAST ESP, Yahoo! or a database). It dispatches a single user query in parallel as multiple underlying requests and returns everything according to business rules. This is often referred to as federated search.

Here’s Sesam’s own description of the software:

“SESAT is search middleware and a search portal framework. SESAT enables a single user query to be dispatched to multiple information sources. The result is analysed, weighted and presented to the user according to configurable business rules.”
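
To make the idea of federated search concrete, here is a small sketch of the pattern described above: one query is sent to several sources in parallel and the results are merged by a configurable weight. This is not SESAT code and does not use its API (SESAT is written in Java). The two stub sources, their scores and the weights are all made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def web_source(query):
    # Stand-in for a web search backend; returns (title, score) pairs.
    return [("web result for " + query, 0.8)]


def catalog_source(query):
    # Stand-in for a database-backed source.
    return [("catalog entry for " + query, 0.6)]


# Hypothetical configuration: which sources to ask and how much each counts.
SOURCES = {"web": web_source, "catalog": catalog_source}
WEIGHTS = {"web": 1.0, "catalog": 2.0}


def federated_search(query, sources=SOURCES, weights=WEIGHTS):
    """Dispatch one query to every source in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # Submit all requests first so they run concurrently.
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        merged = []
        for name, future in futures.items():
            for title, score in future.result():
                merged.append((score * weights.get(name, 1.0), name, title))
    # The "business rule" here is simply: highest weighted score first.
    merged.sort(key=lambda item: item[0], reverse=True)
    return [(name, title) for _, name, title in merged]


print(federated_search("solr"))
```

In a real deployment each stub would wrap a network call to a backend, and the weighting would come from configuration rather than a hard-coded dictionary.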

Congratulations on contributing to Open Source, Sesam! And good luck with creating a community around this important piece of middleware; we’ll see more and more demand for it in the future!

Now, go check it out at http://sesat.no/ if this is something that can be useful to you!

PS: Learn more about other federated search solutions at federatedsearchblog.com