[Tagdb] question about solr vs nutch

Ian Holsman lists at holsman.net
Wed Nov 15 05:20:38 GMT 2006


Hi Nitin.

You're probably better off asking this question on the Solr or Nutch
mailing lists.

But to answer it from my point of view (having experience with both):

Solr is for more structured data, for example a data feed of prices
and other attributes for a set of products.
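As a minimal sketch of that structured case, the snippet below builds a Solr XML update message (an `<add>` containing one `<doc>` per product, each with named `<field>` elements) from a small product feed. The product records, the field names (`id`, `name`, `price`, `category`) and the update URL are all hypothetical assumptions; the fields would have to match whatever is declared in your own Solr schema.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical product feed; field names are illustrative and must match
# the fields declared in your Solr schema.
products = [
    {"id": "sku-1001", "name": "USB cable", "price": 4.99, "category": "accessories"},
    {"id": "sku-1002", "name": "Laptop stand", "price": 29.50, "category": "furniture"},
]


def build_add_message(records):
    """Build an XML update message: <add><doc><field name="...">...</field></doc></add>."""
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)  # every value is sent as text
    return ET.tostring(add, encoding="unicode")


def post_to_solr(body, url="http://localhost:8983/solr/update"):
    """POST an XML message to an (assumed) Solr update URL; not called below."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


payload = build_add_message(products)
print(payload)
```

To actually index the documents you would POST `payload` to the update handler and then send a separate `<commit/>` message; the network call is left out here so the sketch runs without a Solr server.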

Nutch is more for crawling the web and 'discovering' content sitting
on an intranet or a set of web servers.

Both use Lucene as their backend, so they would exhibit very similar
performance characteristics.

regards
Ian


On 15/11/2006, at 10:10 AM, Nitin Borwankar wrote:

> Hi all,
>
> As there are some experts in text indexing on the list, I thought this
> might be the best place to ask...
> I see that Solr ( http://incubator.apache.org/solr/ ) is an
> enterprise search engine based on Lucene with a web-service API for
> submitting docs to be indexed.
> I also see that Nutch ( www.nutch.org ) is another search engine based
> on Lucene, which stores docs directly to disk before indexing.
> What is the performance hit of submitting docs via a web service
> compared to the Nutch approach, if this comparison makes sense at all?
> My interest is in the fielded search capabilities of Solr, applied
> to either LAN-based docs or docs crawled from the web, but I am
> concerned about the performance hit of web-service submission plus
> XML overhead compared to direct disk writes.
>
> Any enlightening thoughts?
>
> Nitin Borwankar
> _______________________________________________
> Tagdb mailing list
> Tagdb at lists.tagschema.com
> http://lists.tagschema.com/mailman/listinfo/tagdb

Ian Holsman
Ian at Holsman.net
http://parent-chatter.com -- what do parents know?
