<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
	xmlns:media="http://search.yahoo.com/mrss/"
>

<channel>
	<title>Cominvent AS - Enterprise search consultants &#187; Document Processing Pipeline</title>
	<atom:link href="http://www.cominvent.com/tag/document-processing-pipeline/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.cominvent.com</link>
	<description>Search, and you will find!</description>
	<lastBuildDate>Thu, 26 Jan 2012 12:33:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<copyright>2006-2007 </copyright>
	<managingEditor>ci@cominvent.com (Cominvent AS - Enterprise search consultants)</managingEditor>
	<webMaster>ci@cominvent.com (Cominvent AS - Enterprise search consultants)</webMaster>
	<image>
		<url>http://www.cominvent.com/wp-content/plugins/podpress/images/powered_by_podpress.jpg</url>
		<title>Cominvent AS - Enterprise search consultants</title>
		<link>http://www.cominvent.com</link>
		<width>144</width>
		<height>144</height>
	</image>
	<itunes:subtitle></itunes:subtitle>
	<itunes:summary>Search, and you will find!</itunes:summary>
	<itunes:keywords></itunes:keywords>
	<itunes:category text="Society &#38; Culture" />
	<itunes:author>Cominvent AS - Enterprise search consultants</itunes:author>
	<itunes:owner>
		<itunes:name>Cominvent AS - Enterprise search consultants</itunes:name>
		<itunes:email>ci@cominvent.com</itunes:email>
	</itunes:owner>
	<itunes:block>no</itunes:block>
	<itunes:explicit>no</itunes:explicit>
	<itunes:image href="http://www.cominvent.com/wp-content/plugins/podpress/images/powered_by_podpress_large.jpg" />
		<item>
		<title>OpenPipeline &#8211; an open-source document processing pipeline</title>
		<link>http://www.cominvent.com/2008/05/20/openpipeline-an-open-source-document-processing-pipeline/</link>
		<comments>http://www.cominvent.com/2008/05/20/openpipeline-an-open-source-document-processing-pipeline/#comments</comments>
		<pubDate>Tue, 20 May 2008 13:46:14 +0000</pubDate>
		<dc:creator>janhoy</dc:creator>
				<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Search technology]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Dieselpoint]]></category>
		<category><![CDATA[Document Processing Pipeline]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.cominvent.com/2008/05/20/openpipeline-an-open-source-document-processing-pipeline/</guid>
		<description><![CDATA[Most commercial search engines include a more or less advanced document processing pipeline for transforming raw input into something that can be indexed. The process involves steps such as normalization, entity extraction, linguistic processing, annotation, and data cleansing. Open source search engines are getting pretty good at the core of indexing and search, [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.cominvent.com/wp-content/uploads/2008/05/openpipeline_logo.gif" alt="Open Pipeline" align="left" />Most commercial search engines include a more or less advanced document processing pipeline for transforming raw input into something that can be indexed. The process involves steps such as normalization, entity extraction, linguistic processing, annotation, and data cleansing.</p>
<p>Open source search engines are getting pretty good at the core of indexing and search, but they typically lack a proper document processing pipeline. When I started looking for such frameworks a few days ago, I came across <a href="http://www.enterprisesearchblog.com/2008/10/reviewing-diese.html" target="_blank">this post</a> announcing that <a href="http://dieselpoint.com/" target="_blank">Dieselpoint</a> has just released its own document processing pipeline as open source at <a href="http://www.openpipeline.org/" target="_blank">www.openpipeline.org</a>. I have not tried it out yet, but it looks very promising and has the potential to become the preferred pipeline for deployments of Apache Solr and other open source engines.</p>
<p>There are also similar initiatives such as <a href="http://openpipe.berlios.de/" target="_blank">OpenPipe</a>, which you can read more about in <a href="http://faces.eti.br/2008/04/03/search-solutions-for-java-platform/" target="_blank">Rogério Pereira Araújo’s blog post</a> on the same subject. I might find time for a comparison later on.</p>
<p>Good luck, Dieselpoint, in contributing to open source. I hope that you will let the open source community really contribute and help adapt and improve this framework going forward.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cominvent.com/2008/05/20/openpipeline-an-open-source-document-processing-pipeline/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

