Archive for May, 2008

Test-driving Apache SOLR (part 1)

Friday, May 2nd, 2008

Some of you have read my previous post, The state of open source search. In this post I will go through the process of downloading, installing, configuring and using Apache SOLR to index some sample XML data and search it.

This is the first post in a series, where each new post will explore some new feature. We will simply follow the tutorial to get SOLR up and running locally.

We start by visiting the SOLR Tutorial for the first steps, and simply get the app running:

  1. Check Java version, download SOLR (I chose this file), unpack it, and cd to apache-solr-1.2.0/example
  2. java -jar start.jar
  3. Visit http://localhost:8983/solr/admin/ to see the admin page with a simple test GUI
  4. Try hitting the search button; you get an XML response back with 0 results
  5. Now try indexing some content. cd to example/exampledocs
  6. java -jar post.jar solr.xml monitor.xml
  7. Now try searching again and see that you get some hits!
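
The search in steps 4 and 7 can also be run from a script instead of the admin GUI. Below is a minimal Python sketch. It assumes the example server from step 2 is still running on localhost:8983, that queries go to the /select handler, and that the XML response carries the hit count in a numFound attribute on the result element. None of this is spelled out in the steps above, so treat the path, the attribute name and the helper names (build_select_url, count_hits, search) as illustrative assumptions.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Base URL of the example server started with "java -jar start.jar".
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(query, rows=10, base=SOLR_BASE):
    """Build a query URL; the /select path is an assumption, see above."""
    return f"{base}/select?{urlencode({'q': query, 'rows': rows})}"


def count_hits(xml_payload):
    """Read the hit count from a Solr XML response (str or bytes)."""
    root = ET.fromstring(xml_payload)
    result = root.find("result")
    if result is None:
        raise ValueError("no <result> element in response")
    return int(result.get("numFound"))


def search(query):
    """Send the query to the running server and return the number of hits."""
    with urlopen(build_select_url(query)) as resp:
        return count_hits(resp.read())


# Hand-written sample in the assumed response shape, for trying the parser
# without a running server. The ids are made up for illustration.
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <result name="response" numFound="2" start="0">
    <doc><str name="id">doc-a</str></doc>
    <doc><str name="id">doc-b</str></doc>
  </result>
</response>"""
```

With the server running, search("solr") should give 0 before step 6 and a positive number afterwards. count_hits(SAMPLE_RESPONSE) exercises the parsing alone.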

For part two we will get SOLR running in a Tomcat instance, customize the schema a bit and present search results in a custom web page using the client API.

Rana vs Wium Lie

Friday, May 2nd, 2008

Note: Links are to Norwegian sites.

In a recent post on his blog, Shahzad Rana (Microsoft’s most high-profile OOXML promoter in Norway) comments on the wording Håkon Wium Lie (Opera Software’s tech director and a high-profile standards promoter) used in a comment to VG TV. There, Lie introduces the term “Microsoft tax” to explain what happens when ordinary people feel forced to purchase MS Office to read documents from the government or their kid’s school. Lie says that the consequence of widespread use of Microsoft Office’s new document format, OOXML, could be many more years of vendor lock-in, since OOXML allows arbitrary non-standardized, non-open extensions. An example would be a parent receiving a document from her kid’s teacher that contains an Equation binary object, which is not part of the OOXML specification and thus cannot possibly be implemented by other office packages wanting to support OOXML.

Further, Rana asks Lie to produce some evidence of an OOXML document from a teacher to a student or parent that is only readable on Windows and MS Office, whereupon Lie refers to an OOXML document that Rana himself had sent by email. A funny thing here is that Rana had to rename the .docx file to .doc to be able to upload it to WordPress. This caused a lot of trouble for the users, thus exemplifying even more strongly what kind of trouble the new format would cause for ordinary people. Rana should of course have zipped the file or, better, modified WordPress to accept .docx files for upload. But an MS supporter is probably not used to the idea of freely being able to modify one’s own GPL software :-)