OpenPipeline – an open-source document processing pipeline

Most commercial search engines include a more or less advanced document processing pipeline for transforming raw input into something that can be indexed. The process involves normalization, entity extraction, linguistic processing, annotation, data cleansing, etc. As for open-source search engines, they are starting to get pretty good at the core of indexing and search,

Rana vs Wium Lie

Note: Links are to Norwegian sites. In a recent post on his blog, Shahzad Rana (Microsoft's most high-profile OOXML promoter in Norway) comments on the wording Håkon Wium Lie (Opera Software's technical director and a high-profile standards advocate) used in a comment to VG TV. Here, Lie introduces the term “Microsoft tax” to explain what happens when ordinary