<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
		xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
	xmlns:media="http://search.yahoo.com/mrss/"
>

<channel>
	<title>Cominvent AS - Enterprise search consultants &#187; Search technology</title>
	<atom:link href="http://www.cominvent.com/category/enterprise-search-related-articles/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.cominvent.com</link>
	<description>Search, and you will find!</description>
	<lastBuildDate>Thu, 26 Jan 2012 12:33:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<copyright>2006-2007 </copyright>
	<managingEditor>ci@cominvent.com (Cominvent AS - Enterprise search consultants)</managingEditor>
	<webMaster>ci@cominvent.com (Cominvent AS - Enterprise search consultants)</webMaster>
	<image>
		<url>http://www.cominvent.com/wp-content/plugins/podpress/images/powered_by_podpress.jpg</url>
		<title>Cominvent AS - Enterprise search consultants</title>
		<link>http://www.cominvent.com</link>
		<width>144</width>
		<height>144</height>
	</image>
	<itunes:subtitle></itunes:subtitle>
	<itunes:summary>Search, and you will find!</itunes:summary>
	<itunes:keywords></itunes:keywords>
	<itunes:category text="Society &#38; Culture" />
	<itunes:author>Cominvent AS - Enterprise search consultants</itunes:author>
	<itunes:owner>
		<itunes:name>Cominvent AS - Enterprise search consultants</itunes:name>
		<itunes:email>ci@cominvent.com</itunes:email>
	</itunes:owner>
	<itunes:block>no</itunes:block>
	<itunes:explicit>no</itunes:explicit>
	<itunes:image href="http://www.cominvent.com/wp-content/plugins/podpress/images/powered_by_podpress_large.jpg" />
		<item>
		<title>Solr 3.5 released</title>
		<link>http://www.cominvent.com/2011/11/27/solr-3-5-released/</link>
		<comments>http://www.cominvent.com/2011/11/27/solr-3-5-released/#comments</comments>
		<pubDate>Sat, 26 Nov 2011 23:32:15 +0000</pubDate>
		<dc:creator>janhoy</dc:creator>
				<category><![CDATA[Search technology]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Solr 3.5]]></category>

		<guid isPermaLink="false">http://www.cominvent.com/?p=569</guid>
		<description><![CDATA[Today a new version of Apache Solr was released, version 3.5.0. Here&#8217;s the release statement from the Lucene PMC: The Lucene PMC is pleased to announce the release of Apache Solr 3.5.0! See the CHANGES.txt file included with the release for a full list of details. Solr 3.5.0 Release Highlights: Bug fixes and improvements from Apache [...]]]></description>
			<content:encoded><![CDATA[<p>Today a new version of Apache Solr was released, version 3.5.0. Here&#8217;s the release statement from the Lucene PMC:</p>
<table width="80%">
<tbody>
<tr bgcolor="#CCCCCC">
<td><p>The Lucene PMC is pleased to announce the release of <a href="http://www.apache.org/dyn/closer.cgi/lucene/solr">Apache Solr 3.5.0</a>!</p>
<p>See the <a href="http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_3_5/solr/CHANGES.txt?view=co">CHANGES.txt</a> file included with the release for a full list of details.</p>
<p>Solr 3.5.0 Release Highlights:</p>
<ul>
<li>Bug fixes and improvements from Apache Lucene 3.5.0, including a very substantial (3-5X) RAM reduction required to hold the terms index on opening an IndexReader. (<a href="https://issues.apache.org/jira/browse/LUCENE-2205">LUCENE-2205</a>)</li>
<li>Added support for distributed result grouping. (<a href="https://issues.apache.org/jira/browse/SOLR-2066">SOLR-2066</a>, <a href="https://issues.apache.org/jira/browse/SOLR-2776">SOLR-2776</a>)</li>
<li>Added support for Hunspell stemmer TokenFilter supporting stemming for 99 languages. (<a href="https://issues.apache.org/jira/browse/SOLR-2769">SOLR-2769</a>)</li>
<li>A new contrib module &#8220;langid&#8221; adds language identification capabilities as an Update Processor, using Tika&#8217;s LanguageIdentifier or Cybozu language-detection library (<a href="https://issues.apache.org/jira/browse/SOLR-1979">SOLR-1979</a>)</li>
<li>Numeric types including Trie and date types now support sortMissingFirst/Last. (<a href="https://issues.apache.org/jira/browse/SOLR-2881">SOLR-2881</a>)</li>
<li>Added hl.q parameter. It is optional and if it is specified, it overrides q parameter in Highlighter. (<a href="https://issues.apache.org/jira/browse/SOLR-1926">SOLR-1926</a>)</li>
<li>Several minor bugfixes like date parsing for years from 0001-1000, ignored configurations when using QueryAnalyzer with SpellCheckComponent and many more. See CHANGES.txt entries for full details.</li>
</ul>
</td>
</tr>
</tbody>
</table>
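<p>The new hl.q parameter lets you match and rank documents with one query while highlighting with another. Below is a minimal sketch that only builds such a select URL with the Python standard library. The base URL http://localhost:8983/solr/select, the field names title and body and the example queries are hypothetical placeholders.</p>

```python
from urllib.parse import urlencode

# Hypothetical local Solr select handler; adjust to your installation.
SOLR_SELECT = "http://localhost:8983/solr/select"


def highlight_query_url(q, hl_q, hl_fields):
    """Build a select URL where q drives matching and hl.q drives highlighting."""
    params = [
        ("q", q),                        # query used to match and rank documents
        ("hl", "true"),                  # turn on the Highlighter
        ("hl.fl", ",".join(hl_fields)),  # fields to highlight
        ("hl.q", hl_q),                  # overrides q for highlighting only (SOLR-1926)
    ]
    return SOLR_SELECT + "?" + urlencode(params)


# Filter on a category, but only highlight the user's actual search term.
url = highlight_query_url("solr AND category:release", "solr", ["title", "body"])
print(url)
```

<p>A typical use is keeping structural clauses such as category filters out of the highlighted snippets.</p>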
<p>Contributions from Cominvent include the language identifier, plugging the Hunspell stemmer into Solr, and <a href="https://issues.apache.org/jira/browse/SOLR-2742" target="_blank">SOLR-2742</a>, which makes commitWithin more accessible through the SolrJ APIs. Also, Apache Tika has been upgraded to version 0.10, fixing several bugs in parsing PDFs and Office documents.</p>
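<p>commitWithin asks Solr to make added documents searchable within a given number of milliseconds without an explicit commit. SolrJ wraps this, but the underlying XML update message is easy to show. This sketch only serializes an &lt;add&gt; message with the standard library and sends nothing. The document fields and the 10-second deadline are hypothetical.</p>

```python
import xml.etree.ElementTree as ET


def add_message(docs, commit_within_ms):
    """Serialize docs as a Solr XML <add> message carrying a commitWithin deadline."""
    add = ET.Element("add", commitWithin=str(commit_within_ms))
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document; ask Solr to commit it within 10 seconds.
msg = add_message([{"id": "doc-1", "title": "Solr 3.5 released"}], 10000)
print(msg)
```

<p>The resulting message would be POSTed to the update handler. Batching many documents under one commitWithin avoids a commit per document.</p>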
]]></content:encoded>
			<wfw:commentRss>http://www.cominvent.com/2011/11/27/solr-3-5-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Becoming a committer</title>
		<link>http://www.cominvent.com/2011/06/16/becoming-a-committer/</link>
		<comments>http://www.cominvent.com/2011/06/16/becoming-a-committer/#comments</comments>
		<pubDate>Thu, 16 Jun 2011 22:26:43 +0000</pubDate>
		<dc:creator>janhoy</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Search technology]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[committer]]></category>
		<category><![CDATA[lucene]]></category>

		<guid isPermaLink="false">http://www.cominvent.com/?p=528</guid>
		<description><![CDATA[The Apache way of developing open source software relies on an active community of users, contributors and developers. All of us can contribute in some way or another. Being a committer means that you participate actively in the software development work and have write access to the source code repository. Each project is led by [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cominvent.com/wp-content/uploads/2011/06/apache.jpeg"><img class="alignright size-full wp-image-529" title="apache" src="http://www.cominvent.com/wp-content/uploads/2011/06/apache.jpeg" alt="" width="284" height="85" /></a>The Apache way of developing open source software relies on an active community of users, contributors and developers. All of us can contribute in some way or another. Being a <a href="http://en.wikipedia.org/wiki/Committer">committer</a> means that you participate actively in the software development work and have write access to the source code repository. Each project is led by a PMC (Project Management Committee), made up of some of the committers, who take on the extra responsibility of staking out the future of the project.<span id="more-528"></span></p>
<p>I&#8217;ve been actively participating in the Lucene/Solr community for a few years: answering questions on the solr-user and dev mailing lists, reporting bugs, uploading <a href="http://en.wikipedia.org/wiki/Patch_(computing)">patches</a> etc. This week the PMC invited me to become a committer on the Lucene/Solr project, joining the twenty-something existing committers. I&#8217;m honored by this invitation, which also shows that the ASF works as a true <a href="http://en.wikipedia.org/wiki/Meritocracy#Open_Source">meritocracy</a>: those who contribute persistently over time are given more responsibility.</p>
<p>Looking forward to my first commit and many more to come, improving an already great code base. If you have a JIRA issue you&#8217;d like me to work on, please suggest one in the comment field!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cominvent.com/2011/06/16/becoming-a-committer/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Apache Solr 3.1 released</title>
		<link>http://www.cominvent.com/2011/04/01/apache-solr-3-1-released/</link>
		<comments>http://www.cominvent.com/2011/04/01/apache-solr-3-1-released/#comments</comments>
		<pubDate>Fri, 01 Apr 2011 11:09:15 +0000</pubDate>
		<dc:creator>janhoy</dc:creator>
				<category><![CDATA[Search technology]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[edismax]]></category>
		<category><![CDATA[function queries]]></category>
		<category><![CDATA[GEO]]></category>
		<category><![CDATA[geospatial]]></category>
		<category><![CDATA[range facets]]></category>
		<category><![CDATA[Solr 3.1]]></category>
		<category><![CDATA[velocity]]></category>

		<guid isPermaLink="false">http://www.cominvent.com/?p=493</guid>
		<description><![CDATA[It&#8217;s been a long wait, and now it&#8217;s here &#8211; the release of Solr version 3.1. The 1.4.1 release was in June 2010, and for various reasons there was never a 1.4.2 nor a 1.5 release. Part of the reason is the merge of the Lucene and Solr codebases, which is also why the version number [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cominvent.com/wp-content/uploads/2010/07/Apache-Solr-3.1-Product-Sheet-Cominvent-AS.pdf"><img class="alignright size-full wp-image-340" style="margin-left: 10px;" title="Apache Solr product sheet thumbnail" src="http://www.cominvent.com/wp-content/uploads/2010/07/Apache-Solr-product-sheet-thumbnail1.png" alt="" width="171" height="241" /></a>It&#8217;s been a long wait, and now it&#8217;s here &#8211; the <a href="http://lucene.apache.org/solr/#March+2011+-+Solr+3.1+Released" target="_blank">release of Solr version 3.1</a>. The 1.4.1 release was in June 2010, and for various reasons there was never a 1.4.2 nor a 1.5 release. Part of the reason is the merge of the Lucene and Solr codebases, which is also why the version number is 3.1 instead of 1.5.</p>
<p>So what&#8217;s new? For me, the most important features are the Extended DisMax parser (<a href="https://issues.apache.org/jira/browse/SOLR-1553" target="_blank">SOLR-1553</a>) and Geospatial search. The full list of improvements is found in <a href="http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_3_1/solr/CHANGES.txt?view=co" target="_blank">CHANGES.TXT</a>, but here are my favorites:</p>
<p><span id="more-493"></span></p>
<h3>eDisMax query parser</h3>
<p>This new parser builds on DisMax and allows full Lucene query syntax, including wildcard and fuzzy searches. It also improves stopword handling and adds the new pf2 and pf3 parameters for bigram and trigram phrase boosts. Your new favourite query parser. Enable it with defType=edismax.</p>
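<p>To try eDisMax you mostly add request parameters. Below is a minimal sketch that assembles them as a query string. The qf/pf2/pf3 field names and boost values (title, body, ^2, ^5, ^10) are hypothetical and should be tuned for your own schema.</p>

```python
from urllib.parse import urlencode


def edismax_params(user_query, qf, pf2, pf3):
    """Hypothetical helper assembling eDisMax parameters for a select request."""
    return urlencode([
        ("defType", "edismax"),  # select the Extended DisMax query parser
        ("q", user_query),       # full Lucene syntax allowed: wildcards, fuzzy, etc.
        ("qf", qf),              # fields (with boosts) that the terms are searched in
        ("pf2", pf2),            # boost docs where adjacent word pairs match as phrases
        ("pf3", pf3),            # boost docs where adjacent word triples match as phrases
    ])


# Wildcard and fuzzy terms straight from the user's query box.
params = edismax_params("solr relea* enterprize~", "title^2 body", "title^5", "title^10")
print(params)
```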
<h3>GEO-spatial querying</h3>
<p>Have lat/lon info for your data? Want to sort or boost results based on distance from a point? Want to filter results within a certain radius? You can do this with the new built-in GEO support. Filtering to all results within 5 km is as simple as: <em>&amp;fq={!geofilt pt=45.15,-93.85 sfield=store d=5}</em>. More info at the <a href="http://wiki.apache.org/solr/SpatialSearch" target="_blank">Solr Wiki</a>.</p>
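<p>Conceptually, the geofilt example above keeps documents whose great-circle distance from pt is at most d kilometres, measured against the location in the store field. Below is an in-memory illustration using the haversine formula, not how Solr evaluates the filter internally. The Earth radius of 6371 km and the sample store documents are assumptions made for this sketch.</p>

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geofilt(docs, pt, d):
    """Keep docs whose 'store' point lies within d km of pt, like {!geofilt}."""
    lat, lon = pt
    return [doc for doc in docs if haversine_km(lat, lon, *doc["store"]) <= d]


# Hypothetical documents with a lat/lon pair in the 'store' field.
stores = [
    {"id": "a", "store": (45.15, -93.85)},  # exactly at the query point
    {"id": "b", "store": (45.17, -93.87)},  # roughly 2.7 km away
    {"id": "c", "store": (45.50, -93.85)},  # roughly 39 km north
]
nearby = [doc["id"] for doc in geofilt(stores, (45.15, -93.85), 5)]
print(nearby)
```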
<h3>Better prototyping GUI</h3>
<p>The built-in Velocity-based prototyping GUI has been improved and now works with faceting, GEO, did-you-mean etc. It&#8217;s my favourite place to start mocking up a search UI. It now lives at /solr/browse instead of /solr/itas, and has a nicer color scheme. It&#8217;s still not perfect: my patches to make range faceting dynamic (<a href="https://issues.apache.org/jira/browse/SOLR-2383" target="_blank">SOLR-2383</a>) and to add the ability to see all fields (<a href="https://issues.apache.org/jira/browse/SOLR-2384" target="_blank">SOLR-2384</a>) did not make it in time for the code freeze, but you can apply them yourself.</p>
<h3>Sort by function</h3>
<p>The sort feature now accepts a Function Query as input, meaning you can do stuff like &amp;sort=sum(a,b) asc. This generic feature is also what is used to sort on distance, since GEO distance can be expressed as a function.</p>
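<p>Sorting by a function means computing a value per document and ordering by it. Solr does this server-side, but the semantics of &amp;sort=sum(a,b) asc can be mimicked on a few in-memory documents. The fields a and b and their values here are hypothetical.</p>

```python
def sort_by_function(docs, func, descending=False):
    """Order docs by a per-document function value, like &sort=func(...) asc/desc."""
    return sorted(docs, key=func, reverse=descending)


# Hypothetical documents with two numeric fields, a and b.
docs = [
    {"id": "x", "a": 3, "b": 4},
    {"id": "y", "a": 1, "b": 1},
    {"id": "z", "a": 5, "b": 0},
]
# Same idea as &sort=sum(a,b) asc: smallest a+b first.
ranked_ids = [d["id"] for d in sort_by_function(docs, lambda d: d["a"] + d["b"])]
print(ranked_ids)
```

<p>Sorting by distance works the same way, with the function being the distance from the query point.</p>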
<h3>Numeric range facets</h3>
<p>Earlier we had date range facets, but no numeric range facets. With the new range facet you can do both numeric and date range faceting with the same feature, specifying the gap. E.g. &amp;facet.range=price&amp;facet.range.gap=50 (together with the required facet.range.start and facet.range.end) will bring you back a price facet automatically split into ranges of 0-50, 50-100, 100-150&#8230; Plans are under way to make range facets even more flexible by allowing unequally spaced gaps, but that will be in a later release (see <a href="https://issues.apache.org/jira/browse/SOLR-2366" target="_blank">SOLR-2366</a>).</p>
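<p>To see what a request such as &amp;facet=true&amp;facet.range=price&amp;facet.range.start=0&amp;facet.range.end=150&amp;facet.range.gap=50 returns, here is a small in-memory emulation of the bucket counting. Each bucket includes its lower bound and excludes its upper bound, and the last bucket is always a full gap wide. This follows what I understand to be Solr's defaults. The start/end values and the prices are hypothetical.</p>

```python
def range_facet(values, start, end, gap):
    """Count values into [lower, lower + gap) buckets from start up to end,
    mimicking facet.range.start / facet.range.end / facet.range.gap."""
    counts = {}
    lower = start
    while lower < end:
        upper = lower + gap  # last bucket is a full gap wide, even past end
        counts[(lower, upper)] = sum(1 for v in values if lower <= v < upper)
        lower += gap
    return counts


# Hypothetical prices; 151 falls outside start..end and is not counted.
prices = [10, 49.5, 50, 120, 149, 151]
facets = range_facet(prices, start=0, end=150, gap=50)
print(facets)
```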
<h2>Take action</h2>
<p>So what are you waiting for? Go and <a href="http://apache.uib.no/lucene/solr/3.1.0/" target="_blank">download a fresh copy</a> and check it out!</p>
<p>Also, grab a copy of Cominvent&#8217;s <a href="http://www.cominvent.com/wp-content/uploads/2010/07/Apache-Solr-3.1-Product-Sheet-Cominvent-AS.pdf">Solr Product Sheet</a>, giving a quick functional overview of Solr which even your boss understands <img src='http://www.cominvent.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Cominvent have updated our training offering to cover all the new features. Check it out over at <a href="http://www.solrtraining.com/">www.solrtraining.com</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cominvent.com/2011/04/01/apache-solr-3-1-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Apache Solr 3.1 Product Sheet</title>
		<link>http://www.cominvent.com/2011/04/01/apache-solr-3-1-product-sheet/</link>
		<comments>http://www.cominvent.com/2011/04/01/apache-solr-3-1-product-sheet/#comments</comments>
		<pubDate>Fri, 01 Apr 2011 09:30:48 +0000</pubDate>
		<dc:creator>janhoy</dc:creator>
				<category><![CDATA[Search technology]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[product sheet]]></category>
		<category><![CDATA[produktark]]></category>
		<category><![CDATA[Solr 3.1]]></category>

		<guid isPermaLink="false">http://www.cominvent.com/?p=325</guid>
		<description><![CDATA[The brand new version 3.1 of Apache Solr was released yesterday. We have created a 2-page Apache Solr product sheet, which very briefly (and beautifully) describes the high-level features of the popular search engine, including links for downloading and getting started. Use it to explain to business persons and decision makers what open source search [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cominvent.com/wp-content/uploads/2010/07/Apache-Solr-3.1-Product-Sheet-Cominvent-AS.pdf"><img class="alignleft size-full wp-image-326" style="margin-left: 10px; margin-right: 10px;" title="Apache Solr product sheet thumbnail" src="http://www.cominvent.com/wp-content/uploads/2010/07/Apache-Solr-product-sheet-thumbnail.png" alt="" width="290" height="409" /></a>The brand new version 3.1 of Apache Solr was released yesterday.</p>
<p>We have created a 2-page Apache Solr product sheet, which very briefly (and beautifully) describes the high-level features of the popular search engine, including links for downloading and getting started.</p>
<p>Use it to explain to business people and decision makers what open source search can do. This is the missing &#8220;glossy&#8221; merchandise piece of the puzzle, if you like.</p>
<p>You are free to re-use the product sheet in your commercial business, as it is licensed under Creative Commons BY-SA. This means you can even modify it, as long as you leave the credit and link to Cominvent in place and share your changes under the same license, in the ODF source form.</p>
<p>Download the <a href="http://www.cominvent.com/wp-content/uploads/2010/07/Apache-Solr-3.1-Product-Sheet-Cominvent-AS.pdf">Solr 3.1 product sheet (PDF)</a>.</p>
<p>And here is the <a href="https://docs.google.com/leaf?id=0B5sMZSogVbD9NzEzMmYzMDgtNjg5NC00NDcxLTg0MTctY2EyMmI4MTY5YTcy&amp;hl=en" target="_blank">Solr 3.1 product sheet master (ODT)</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cominvent.com/2011/04/01/apache-solr-3-1-product-sheet/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Our GoOpen talk about DN.no migrating to Solr</title>
		<link>http://www.cominvent.com/2011/03/23/goopen-2011-dnno-migrating-to-solr/</link>
		<comments>http://www.cominvent.com/2011/03/23/goopen-2011-dnno-migrating-to-solr/#comments</comments>
		<pubDate>Wed, 23 Mar 2011 16:01:34 +0000</pubDate>
		<dc:creator>janhoy</dc:creator>
				<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Search technology]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[FAST ESP]]></category>
		<category><![CDATA[GoOpen]]></category>
		<category><![CDATA[news search]]></category>
		<category><![CDATA[nhst]]></category>
		<category><![CDATA[presentations]]></category>

		<guid isPermaLink="false">http://www.cominvent.com/?p=487</guid>
		<description><![CDATA[We held a talk at the Open Source Conference GoOpen 2011 in Oslo today, together with our customer NHST, represented by Hans Jørgen Hoel. The talk was about the process of migrating from FAST ESP to Apache Solr for all of NHST&#8217;s news publications and other data sources. The presentation is in Norwegian. Dagens Næringslivs [...]]]></description>
			<content:encoded><![CDATA[<p>We held a talk at the Open Source Conference GoOpen 2011 in Oslo today, together with our customer NHST, represented by Hans Jørgen Hoel. The talk was about the process of migrating from FAST ESP to Apache Solr for all of NHST&#8217;s news publications and other data sources.</p>
<p>The presentation is in Norwegian.</p>
<div id="__ss_7359263" style="width: 425px;"><strong style="display: block; margin: 12px 0 4px;"><a title="Dagens Næringslivs overgang til Lucene/Solr søk" href="http://www.slideshare.net/janhoy/go-open">Dagens Næringslivs overgang til Lucene/Solr søk</a> (Dagens Næringsliv&#8217;s transition to Lucene/Solr search)</strong></div>
<h2 style="width: 425px;"><span id="more-487"></span>English transcript</h2>
<p style="width: 425px;">NHST Media Group publishes many online newspapers including DN.no (financial), Tradewinds.no (shipping), ReCharge (renewable energy) etc. This presentation was held by NHST and Cominvent.</p>
<p style="width: 425px;">Agenda:</p>
<ul>
<li>Project background</li>
<li>Architecture</li>
<li>Search ABC</li>
<li>The project</li>
<li>Summary</li>
</ul>
<h3>Project Background</h3>
<p>A large volume of news articles, both on paper and online.</p>
<p>FAST ESP as search platform since 2006, Solr for tax report search since 2009.</p>
<p>Open source, Linux and Java are heavily used in the organization.</p>
<p>FAST was acquired by Microsoft in 2008 and Linux support was discontinued. This prompted a new evaluation of the search architecture, and Solr was chosen for the future, with Cominvent as technology partner.</p>
<h3>Architecture before</h3>
<p>FAST uses one monolithic index, so all sources shared the same data schema (index-profile). Escenic is the main source of content. A plugin existed to push content to FAST based on triggers.</p>
<p>On the search side, each publication was either using the FAST search API directly or some flavor of home-grown search middleware. Each publication also had its own result presentation logic, so innovations in one publication would not benefit the others.</p>
<h3>Search ABC</h3>
<p>Search is NOT a database. It is optimized for free text, but also handles boolean logic well.</p>
<p>Commercial engines: FAST/Microsoft, Google Search Appliance (GSA), Autonomy IDOL</p>
<p>Open source engines: Apache Solr/Lucene, Xapian, Elastic Search</p>
<p>Usage areas: intranet, shopping, social media, news etc.</p>
<p>Solr is an open source, Java-based search server with Lucene at its core. It is released by the Apache Software Foundation under the permissive Apache License 2.0, meaning you can do almost anything you like with the software, including sharing it, or closing it and charging for it.</p>
<h3>The project</h3>
<p>We introduced a new, common search and indexing middleware which all publications use. The role of the middleware is to isolate the clients from details of, and changes in, the search engine. The middleware also has a presentation layer which provides JSP taglibs for delivering a standard result page with pagination, facets, did-you-mean etc. This makes it very quick to plug search into a new publication.</p>
<p>All the data sources now also use the same middleware for indexing, which takes care of indexing the content into the right search core.</p>
<p><strong>Challenges</strong></p>
<p>Some features of FAST did not exist in Solr. FAST is more of a search <strong>platform</strong>, while Solr is a search server. The major difference was linguistic support, which is strong in FAST. We addressed this in Solr, as described below.</p>
<p>We were using entity extraction in FAST, but did not include it in Solr in this project, as it does not come out of the box but needs integration with third-party solutions.</p>
<p><strong>Differences</strong></p>
<p>While FAST uses a monolithic index, Solr can be split into <strong>cores</strong>, each having its own data schema and configuration. This means that if you need to reconfigure or re-index one data source, such as the tax list, you do not affect the rest of the articles. It also allows for easy staging of new content in a new core, which is then swapped into production when ready, without the need for a separate physical staging server as was needed with FAST.</p>
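<p>The staging-then-swap workflow maps onto Solr&#8217;s CoreAdmin handler, which has a SWAP action (taking core and other) and a RELOAD action. The sketch below only builds the request URLs. The base URL and the core names taxlist and taxlist_staging are hypothetical.</p>

```python
from urllib.parse import urlencode

# Hypothetical Solr CoreAdmin endpoint; adjust host, port and path to your setup.
CORE_ADMIN = "http://localhost:8983/solr/admin/cores"


def core_admin_url(action, **params):
    """Build a CoreAdmin request URL such as action=SWAP&core=...&other=..."""
    return CORE_ADMIN + "?" + urlencode([("action", action)] + sorted(params.items()))


# 1) Re-index the tax list into a separate staging core (not shown).
# 2) When it looks good, swap staging and live in a single step.
swap = core_admin_url("SWAP", core="taxlist", other="taxlist_staging")
# Reloading a core picks up changed schema/config without touching other cores.
reload_url = core_admin_url("RELOAD", core="taxlist")
print(swap)
print(reload_url)
```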
<p>FAST ships with lemmatization, while in Solr we use stemming, which is inferior and causes some problems. These are mitigated by tuning the stemming dictionaries.</p>
<p>To give Solr language support, we implemented some language abstractions in the middleware: adding a language field to each document, choosing separate fields such as title_no for Norwegian content and title_en for English content, and making this implementation detail transparent to the search clients.</p>
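<p>Here is a minimal sketch of how such a middleware layer might route a language-neutral title into per-language fields. It assumes a schema where title_no and title_en use Norwegian and English analysis respectively. The supported-language set, the English fallback and the helper names are hypothetical, not the actual NHST middleware.</p>

```python
# Hypothetical language set and field naming, following the title_no / title_en convention.
SUPPORTED_LANGS = {"no", "en"}
DEFAULT_LANG = "en"  # assumption: unknown languages fall back to English


def normalize_lang(lang):
    """Return a supported language code, falling back to DEFAULT_LANG."""
    return lang if lang in SUPPORTED_LANGS else DEFAULT_LANG


def title_field(lang):
    """Solr field that the clients' generic 'title' maps to for this language."""
    return "title_" + normalize_lang(lang)


def to_solr_doc(article):
    """Map a language-neutral article onto language-specific Solr fields."""
    lang = normalize_lang(article.get("language", DEFAULT_LANG))
    return {
        "id": article["id"],
        "language": lang,                     # lets searches filter by language
        title_field(lang): article["title"],  # analyzed by that language's field type
    }


# Norwegian sample title, meaning "oil price rises"
doc = to_solr_doc({"id": "1", "language": "no", "title": "Oljeprisen stiger"})
print(doc)
```

<p>Search clients keep asking for plain "title". Only the middleware knows which physical field to query.</p>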
<p><strong>Tuning</strong></p>
<p>News is fresh meat. You need immediate indexing as things change (push instead of pull). We also implemented a date boost through Solr&#8217;s Function Queries. There are many functions available, and there is almost no limit to what you can tune and boost.</p>
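<p>A commonly cited date-boost function is recip(ms(NOW,pubdate),3.16e-11,1,1). Here recip(x,m,a,b) computes a/(m*x+b), and ms(NOW,pubdate) gives the article&#8217;s age in milliseconds. The constant 3.16e-11 is roughly 1 divided by the number of milliseconds in a year, so a one-year-old article gets about half the boost of a brand-new one. This sketch reproduces only the arithmetic. The field name pubdate and the parameter values are assumptions, not necessarily what NHST uses.</p>

```python
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000


def recip(x, m, a, b):
    """Solr's recip function: a / (m*x + b)."""
    return a / (m * x + b)


def date_boost(age_ms, m=3.16e-11, a=1.0, b=1.0):
    """Decaying boost, as in recip(ms(NOW,pubdate),3.16e-11,1,1)."""
    return recip(age_ms, m, a, b)


# How the boost falls off as articles get older.
for years in (0, 0.1, 1, 5):
    print(years, round(date_boost(years * MS_PER_YEAR), 3))
```

<p>In Solr, such a function is typically added as a boost function, for example via the dismax bf parameter. Raising m makes old news fade faster.</p>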
<h3>Summary</h3>
<p>Solr is a lot less resource demanding than FAST. Can easily run virtualized or in the cloud. NHST scaled into the Amazon EC2 cloud during the peak period of the tax list search last year.</p>
<p>Each developer can run a local copy of Solr on their laptop; this was very hard with FAST.</p>
<p>Cleaner architecture than before, more flexible with multiple cores.</p>
<p>A big win to gather all search related business logic into a common search middleware, including a JSP presentation layer.</p>
<p>Superb tuning possibilities, easier to tune than the old engine.</p>
<p>Although there were challenges and we had to sacrifice entity extraction in the first phase, we&#8217;re very happy with the decision to migrate to Solr.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cominvent.com/2011/03/23/goopen-2011-dnno-migrating-to-solr/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cominvent expands Solr Training</title>
		<link>http://www.cominvent.com/2011/03/21/cominvent-expands-solr-training/</link>
		<comments>http://www.cominvent.com/2011/03/21/cominvent-expands-solr-training/#comments</comments>
		<pubDate>Mon, 21 Mar 2011 00:43:34 +0000</pubDate>
		<dc:creator>janhoy</dc:creator>
				<category><![CDATA[Search technology]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[certified]]></category>
		<category><![CDATA[course]]></category>
		<category><![CDATA[training]]></category>

		<guid isPermaLink="false">http://www.cominvent.com/?p=482</guid>
		<description><![CDATA[Cominvent has been delivering professional training within enterprise search for more than 7 years. First on the FAST platform, and then on Solr/Lucene. We were the first to introduce Solr training in Europe. We have now expanded our comprehensive training offering, as shown in the training modules illustration, covering the whole range from short half-day [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.solrtraining.com/"><img class="alignright size-medium wp-image-478" title="SolrTrainingcourses" src="http://www.cominvent.com/wp-content/uploads/2009/07/SolrTrainingcourses-300x189.png" alt="" width="300" height="189" /></a>Cominvent has been delivering professional training within enterprise search for more than 7 years. First on the FAST platform, and then on Solr/Lucene. We were the first to introduce Solr training in Europe.</p>
<p>We have now expanded our comprehensive training offering, as shown in the training modules illustration, covering the whole range from a short half-day introduction for anyone to a full certification track for developers.</p>
<p>Go visit our training site <a href="http://www.solrtraining.com/">www.solrtraining.com</a> and sign up for the training which fits you best.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cominvent.com/2011/03/21/cominvent-expands-solr-training/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Visualization of Lucene segment merges</title>
		<link>http://www.cominvent.com/2011/02/24/visualization-of-lucene-segment-merges/</link>
		<comments>http://www.cominvent.com/2011/02/24/visualization-of-lucene-segment-merges/#comments</comments>
		<pubDate>Thu, 24 Feb 2011 15:59:50 +0000</pubDate>
		<dc:creator>janhoy</dc:creator>
				<category><![CDATA[Search technology]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[mergepolicy]]></category>
		<category><![CDATA[segments]]></category>

		<guid isPermaLink="false">http://www.cominvent.com/?p=468</guid>
		<description><![CDATA[Lucene guru Mike McCandless just released on his blog an impressive piece of work visualizing how Lucene MergePolicy really works through a series of YouTube videos. He feeds Solr with a 10Gb Wikipedia dump and also some random add/delete data source, and then records every single segment written and merged during the whole process. Mike [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html"><img class="alignright size-medium wp-image-469" title="Screen shot 2011-02-24 at 16.54.59" src="http://www.cominvent.com/wp-content/uploads/2011/02/Screen-shot-2011-02-24-at-16.54.59-300x168.png" alt="" width="300" height="168" /></a>Lucene guru Mike McCandless just released on his blog an impressive piece of work <a href="http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html">visualizing how Lucene MergePolicy really works</a> through a series of YouTube videos. He feeds Solr with a 10Gb Wikipedia dump and also some random add/delete data source, and then records every single segment written and merged during the whole process.</p>
<p>Mike also introduces a cool new merge policy called TieredMergePolicy (<a href="https://issues.apache.org/jira/browse/LUCENE-854" target="_blank">LUCENE-854</a>), which is much smarter and slightly more efficient than the default one. I hope this becomes the new default merge policy in Solr.</p>
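<p>To get a feel for what the videos show, here is a toy simulation of log-style merging. Every flush writes a small segment, segments are grouped into size levels, and whenever merge_factor segments share a level they are merged into one. This is a simplification closer in spirit to the older log-based default than to TieredMergePolicy. It ignores deletes, byte sizes and merge scheduling, and all parameters are illustrative only.</p>

```python
def level(size, merge_factor):
    """Size level of a segment: how many times it can be divided by merge_factor."""
    lvl = 0
    while size >= merge_factor:
        size //= merge_factor
        lvl += 1
    return lvl


def simulate(num_flushes, merge_factor=10, flush_size=1):
    """Toy log-style merging; returns final segment sizes, largest first."""
    segments = []
    for _ in range(num_flushes):
        segments.append(flush_size)  # a new small segment is flushed
        merged = True
        while merged:  # cascade: a merge can push a level over the limit
            merged = False
            by_level = {}
            for i, s in enumerate(segments):
                by_level.setdefault(level(s, merge_factor), []).append(i)
            for lvl in sorted(by_level):
                idx = by_level[lvl]
                if len(idx) >= merge_factor:
                    chosen = idx[:merge_factor]
                    new_seg = sum(segments[i] for i in chosen)
                    segments = [s for i, s in enumerate(segments) if i not in chosen] + [new_seg]
                    merged = True
                    break
    return sorted(segments, reverse=True)


print(simulate(23))
print(simulate(100))
```

<p>The cascade at flush 100 (ten 1s become a 10, then ten 10s become a 100) is the kind of burst of merge activity the videos make visible.</p>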
]]></content:encoded>
			<wfw:commentRss>http://www.cominvent.com/2011/02/24/visualization-of-lucene-segment-merges/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Solr distros are coming</title>
		<link>http://www.cominvent.com/2010/11/12/the-solr-distros-are-coming/</link>
		<comments>http://www.cominvent.com/2010/11/12/the-solr-distros-are-coming/#comments</comments>
		<pubDate>Fri, 12 Nov 2010 19:40:10 +0000</pubDate>
		<dc:creator>janhoy</dc:creator>
				<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Search technology]]></category>
		<category><![CDATA[Solr distros]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Trends]]></category>
		<category><![CDATA[Constellio]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Google connectors]]></category>
		<category><![CDATA[Google OneBox]]></category>
		<category><![CDATA[GPL]]></category>
		<category><![CDATA[GSA]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.cominvent.com/?p=405</guid>
		<description><![CDATA[Open Source Search is gaining more and more traction. First you had Lucene (2001), giving great search for programmers. Then we got Solr (2006) making search accessible for non programmers, but a certain level of expertise is still needed. And then came Constellio, an open source (GPL) enterprise search distribution (distro) built on Solr, adding a [...]]]></description>
			<content:encoded><![CDATA[<p><img class="size-full wp-image-406  alignright" style="margin: 10px;" title="Constellio search" src="http://www.cominvent.com/wp-content/uploads/2010/11/Screen-shot-2010-11-12-at-16.50.40.png" alt="" width="207" height="178" /></p>
<p>Open Source Search is gaining more and more traction. First you had <a href="http://en.wikipedia.org/wiki/Lucene" target="_blank">Lucene</a> (2001), giving great search for programmers. Then we got <a href="http://en.wikipedia.org/wiki/Apache_Solr" target="_blank">Solr</a> (2006), making search accessible to non-programmers, though a certain level of expertise is still needed. And then came <a href="http://www.constellio.com/" target="_blank">Constellio</a>, an open source (<a href="http://en.wikipedia.org/wiki/GPL" target="_blank">GPL</a>) enterprise search distribution (distro) built on Solr, adding a slick GUI, connector and crawling support and more.</p>
<h2>Say again. A Solr distro?</h2>
<p>I call it a &#8220;distro&#8221; because I like to compare the evolution with what we have seen in <a href="http://en.wikipedia.org/wiki/GNU/Linux" target="_blank">GNU/Linux</a>. First there was the Linux core. Then came the GNU tools, which made Linux much more usable, but still only for engineers comfortable with the command line. Finally, companies like Red Hat and SUSE built complete distros, including a modern GUI and ready-to-use tools such as OpenOffice, Thunderbird and more. Without these distros, Linux would have remained just a &#8220;core&#8221;, leaving it to the user to add the extra sugar.<span id="more-405"></span></p>
<p>I dare say the same is about to happen with open source search. Many companies out there already have their own proprietary Apache Solr/Lucene-based &#8220;distro&#8221;, but Constellio is the first open source one I have seen so far.</p>
<p>As a Solr user, you&#8217;ll feel at home in ./constellio/tomcat/webapps/constellio/WEB-INF/solrcores/&lt;your_core&gt;, where you&#8217;ll find the regular schema, solrconfig, etc. However, I suspect that any manual edits made there will be overwritten by the GUI&#8230;</p>
<h2>Tapping into Google Search Appliance</h2>
<p>The creators of Constellio have done a pretty good job with this first 1.0 release: easy installation, a nice administration GUI, quick to get started crawling, etc. And they have been bold enough to tap into Google&#8217;s open-sourced GSA connectors, available at <a href="http://code.google.com/p/google-enterprise-connector-manager/" target="_blank">Google Code</a>, rather than using <a href="http://incubator.apache.org/connectors/" target="_blank">ManifoldCF</a> from Apache or some other connector framework. They also hook into the <a href="http://www.google.com/enterprise/marketplace/search?categoryId=18&amp;orderBy=rating" target="_blank">Google OneBox</a> APIs, letting users plug in all the smart search &#8220;widgets&#8221;. These can, for instance, intercept the query and, if they detect a stock ticker, show a stock price graph on top of the search results. Nifty! I bet Google didn&#8217;t anticipate their connector framework being used outside of the GSA&#8230;</p>
<h2>So what&#8217;s the catch?</h2>
<p>Well, for one, it is GPL (v3), which excludes some potential users right away (unless dual licensing is available?). You have to register on the site to download it, which means you&#8217;ll probably be contacted by sales at some point &#8211; no big deal. It is open source and the source code is available, but it is not developed openly by a community. You can download the source as a zip, but if you change it, who&#8217;s going to maintain your changes? Probably you&#8230;</p>
<p>Luckily there are no limits on the number of documents you can index or on the QPS rate. Thus it is a truly free (as in free beer) solution, which cannot be said of the weak MS Search Server Express or the old, maxdoc-limited OmniFind Yahoo! Edition. Being free™ may in itself provide value to many users who would otherwise have to pay consultants to build a solution from scratch out of the individual components.</p>
<p>Constellio&#8217;s business model is to live off support and consulting fees, and that may very well work. But I cannot see how they will be able to build a truly open community around their product, and for that reason I believe it will remain a distro without very wide adoption.</p>
<h2>Quirks</h2>
<p>This is obviously an early 1.0 release. Had it been an ASF project, it would probably have carried a 0.x version number. A few quirks: the logo upload did not work, it identified my Norwegian web pages as Danish, and it crashed on me (see screen shots). Still, good luck to the creators in making this a mature Solr distro.</p>
<h2>Screen shots</h2>
<div id="attachment_408" class="wp-caption alignleft" style="width: 310px"><a href="http://www.cominvent.com/wp-content/uploads/2010/11/Screen-shot-2010-11-12-at-17.13.03.png"><img class="size-medium wp-image-408 " title="Constellio search page" src="http://www.cominvent.com/wp-content/uploads/2010/11/Screen-shot-2010-11-12-at-17.13.03-300x137.png" alt="" width="300" height="137" /></a><p class="wp-caption-text">Constellio search page</p></div>
<div id="attachment_409" class="wp-caption alignleft" style="width: 310px"><a href="http://www.cominvent.com/wp-content/uploads/2010/11/Screen-shot-2010-11-12-at-17.14.09.png"><img class="size-medium wp-image-409" title="Constellio collection admin" src="http://www.cominvent.com/wp-content/uploads/2010/11/Screen-shot-2010-11-12-at-17.14.09-300x153.png" alt="" width="300" height="153" /></a><p class="wp-caption-text">Constellio collection admin</p></div>
<div id="attachment_410" class="wp-caption alignleft" style="width: 310px"><a href="http://www.cominvent.com/wp-content/uploads/2010/11/Screen-shot-2010-11-12-at-17.14.38.png"><img class="size-medium wp-image-410" title="Constellio - edit collection" src="http://www.cominvent.com/wp-content/uploads/2010/11/Screen-shot-2010-11-12-at-17.14.38-300x227.png" alt="" width="300" height="227" /></a><p class="wp-caption-text">Constellio - edit collection</p></div>
<div id="attachment_411" class="wp-caption alignleft" style="width: 310px"><a href="http://www.cominvent.com/wp-content/uploads/2010/11/Screen-shot-2010-11-12-at-17.16.44.png"><img class="size-medium wp-image-411" title="Constellio - server management tab" src="http://www.cominvent.com/wp-content/uploads/2010/11/Screen-shot-2010-11-12-at-17.16.44-300x162.png" alt="" width="300" height="162" /></a><p class="wp-caption-text">Constellio - server management tab</p></div>
<div id="attachment_412" class="wp-caption alignleft" style="width: 310px"><a href="http://www.cominvent.com/wp-content/uploads/2010/11/Screen-shot-2010-11-12-at-17.17.12.png"><img class="size-medium wp-image-412" title="Constellio - connectors management" src="http://www.cominvent.com/wp-content/uploads/2010/11/Screen-shot-2010-11-12-at-17.17.12-300x191.png" alt="" width="300" height="191" /></a><p class="wp-caption-text">Constellio - connectors management</p></div>
<div id="attachment_413" class="wp-caption alignleft" style="width: 310px"><a href="http://www.cominvent.com/wp-content/uploads/2010/11/Screen-shot-2010-11-12-at-17.17.23.png"><img class="size-medium wp-image-413" title="Constellio - field types" src="http://www.cominvent.com/wp-content/uploads/2010/11/Screen-shot-2010-11-12-at-17.17.23-300x166.png" alt="" width="300" height="166" /></a><p class="wp-caption-text">Constellio - edit field types in GUI</p></div>
<div id="attachment_414" class="wp-caption alignleft" style="width: 310px"><a href="http://www.cominvent.com/wp-content/uploads/2010/11/Screen-shot-2010-11-12-at-17.18.25.png"><img class="size-medium wp-image-414" title="Constellio - field type edit" src="http://www.cominvent.com/wp-content/uploads/2010/11/Screen-shot-2010-11-12-at-17.18.25-300x273.png" alt="" width="300" height="273" /></a><p class="wp-caption-text">Constellio - configuring a field type with analysis</p></div>
<div id="attachment_415" class="wp-caption alignleft" style="width: 310px"><a href="http://www.cominvent.com/wp-content/uploads/2010/11/Screen-shot-2010-11-12-at-17.20.53.png"><img class="size-medium wp-image-415" title="Constellio - error message" src="http://www.cominvent.com/wp-content/uploads/2010/11/Screen-shot-2010-11-12-at-17.20.53-300x126.png" alt="" width="300" height="126" /></a><p class="wp-caption-text">Constellio - hey, it&#39;s only version 1.0 <img src='http://www.cominvent.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p></div>
]]></content:encoded>
			<wfw:commentRss>http://www.cominvent.com/2010/11/12/the-solr-distros-are-coming/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The first real FAST Search book</title>
		<link>http://www.cominvent.com/2010/11/12/the-first-real-fast-search-book/</link>
		<comments>http://www.cominvent.com/2010/11/12/the-first-real-fast-search-book/#comments</comments>
		<pubDate>Fri, 12 Nov 2010 12:53:39 +0000</pubDate>
		<dc:creator>janhoy</dc:creator>
				<category><![CDATA[FAST ESP]]></category>
		<category><![CDATA[Search technology]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[book]]></category>
		<category><![CDATA[enterprise search]]></category>
		<category><![CDATA[FSIS]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Sharepoint]]></category>
		<category><![CDATA[Sharepoint search]]></category>

		<guid isPermaLink="false">http://www.cominvent.com/?p=391</guid>
		<description><![CDATA[Overdue by several years, Wrox just published a book about Microsoft Enterprise Search, including the different FAST flavours. Bravo! You can ask how all the users of FAST technology could have managed for so many years without some public source of learning the products. Up until now FAST/MS and their partners have been the [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_392" class="wp-caption alignright" style="width: 210px"><a href="http://www.amazon.com/Professional-Microsoft-Search-SharePoint-Programmer/dp/0470584661"><img class="size-full wp-image-392" title="Professional-Microsoft-Search-Book" src="http://www.cominvent.com/wp-content/uploads/2010/11/Professional-Microsoft-Search-Book.jpg" alt="" width="200" height="251" /></a><p class="wp-caption-text">Book cover © Amazon &amp; Wrox</p></div>
<p>Several years overdue, Wrox has just published a book about Microsoft Enterprise Search, including the different FAST flavours. Bravo!</p>
<p>One may wonder how all the users of FAST technology managed for so many years without any public source for learning the products. Until now, FAST/MS and their partners have been the sole source of FAST Search know-how [1]. Granted, we&#8217;re part of that ecosystem and may have profited from the lack of available material, but that&#8217;s another story.<span id="more-391"></span></p>
<p>The book is written by Jeff Fried (ex-FAST), Mark Bennett, Natalya Voskresenskaya and Miles Kehoe, and contains the following chapters:</p>
<ol>
<li>What is Enterprise Search</li>
<li>Developing a strategy &#8211; the business process of search</li>
<li>Overview of Microsoft Enterprise search products</li>
<li>Search within Sharepoint 2010</li>
<li>FAST Search within Sharepoint 2010</li>
<li>Customizing search with Sharepoint 2010</li>
<li>Introduction to FAST ESP</li>
<li>Customization and deployment of FAST ESP 5.x</li>
<li>Advanced topics</li>
<li>Enterprise search is social search</li>
<li>Search and business intelligence</li>
<li>The future of search</li>
</ol>
<p>I have not read the book yet, but I have bought it for my Kindle and will flip through it on my iPad if time allows. Humble as I am, I could of course have authored much of this book myself, but maybe I can learn a thing or two as well <img src='http://www.cominvent.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>Click the book cover to go to Amazon.com&#8217;s book page.</p>
<p>[1]: Apart from <a href="http://www.fastforum.info/" target="_blank">http://www.fastforum.info/</a> (<a href="http://blackhorseinnovations.com/" target="_blank">owner</a>) and <a href="http://fastesphelp.com/" target="_blank">http://fastesphelp.com/</a> (by Anand Kumar Pandey)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cominvent.com/2010/11/12/the-first-real-fast-search-book/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>What happens to FAST ESP?</title>
		<link>http://www.cominvent.com/2010/11/12/what-happens-to-fast-esp/</link>
		<comments>http://www.cominvent.com/2010/11/12/what-happens-to-fast-esp/#comments</comments>
		<pubDate>Fri, 12 Nov 2010 01:49:39 +0000</pubDate>
		<dc:creator>janhoy</dc:creator>
				<category><![CDATA[FAST ESP]]></category>
		<category><![CDATA[Search technology]]></category>
		<category><![CDATA[ESP]]></category>

		<guid isPermaLink="false">http://www.cominvent.com/2010/11/12/what-happens-to-fast-esp/</guid>
		<description><![CDATA[After the Microsoft takeover of FAST almost three years ago, it has been quiet, with no new ESP updates. We all know that MS discontinued Linux support, and that the major focus with the FAST technology has been to power the high-end search for Sharepoint 2010. ESP was forked and heavily modified to integrate smoothly [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-389" title="FAST ESP logo" src="http://www.cominvent.com/wp-content/uploads/2010/11/FAST-ESP-logo1.png" alt="" width="244" height="48" />After the Microsoft takeover of FAST almost three years ago, it&#8217;s been silent and no new updates of ESP. We all know that MS discontinued Linux support, and that the major focus with the FAST technology has been to power the high-end search for Sharepoint 2010. ESP was forked and heavily modified to integrate smoothly with Windows, SQL server, AD, PowerShell and more, and it made the leap to 64 bit &#8211; finally!</p>
<p>But what about non-Sharepoint users? MS has an offering for them as well, called FAST Search for Internet Sites (FSIS). Read <a href="http://nuggets.comperiosearch.com/2010/11/fast-search-internet-sites/" target="_blank">Comperio&#8217;s excellent blog article</a> about it. It is a bit disappointing that the core is still the more-than-three-year-old ESP 5.3 wrapped in new MS APIs, but it is cool that you can still hack the ESP internals.<span id="more-382"></span></p>
<p>This figure from the book &#8220;<a href="http://www.cominvent.com/2010/11/12/the-first-real-fast-search-book/">Professional Microsoft Search</a>&#8221; shows where the ESP-based offerings fit into Microsoft&#8217;s overall product lineup:</p>
<div id="attachment_394" class="wp-caption alignnone" style="width: 495px"><img class="size-full wp-image-394 " title="Microsoft Enterprise Search lineup" src="http://www.cominvent.com/wp-content/uploads/2010/11/Screen-shot-2010-11-12-at-13.49.04.png" alt="" width="485" height="333" /><p class="wp-caption-text">© 2010 Wrox Publishing</p></div>
<p>The question is: what is the long-term outlook for FSIS? It obviously cannot stay on the aging ESP 5.3 core forever. Will Microsoft upgrade the product, or just put it on the last page of the price sheet so they have an answer if someone asks? Can the SP version be adapted to work standalone? Perhaps, but it too is in desperate need of a new, modern search core. Since a new book explicitly mentioning FAST ESP has just appeared, I guess we can expect ESP to stick around for a while&#8230; Please share your thoughts about the future of non-Sharepoint search from Microsoft.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cominvent.com/2010/11/12/what-happens-to-fast-esp/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

