[Tagdb] question about solr vs nutch
Ian Holsman
lists at holsman.net
Wed Nov 15 05:20:38 GMT 2006
Hi Nitin,
You're probably better off asking this question on the Solr or Nutch
mailing lists.
But to answer it from my point of view (having experience with both):
Solr is for more structured data, for example a data feed of prices
and other attributes for a set of products.
Nutch is more for crawling the web and 'discovering' content sitting
on an intranet or a set of web servers.
Both use Lucene as their backend, so they would exhibit very similar
performance attributes.
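
To make the "structured data" case a bit more concrete, here is a rough
sketch in Python of what a fielded product feed might look like once it is
serialized into Solr's XML <add> message. The product records and field
names (id, name, price) are made up for illustration and would have to match
whatever your Solr schema defines. Actually sending the message to a running
Solr update handler is left out, since the URL depends on your deployment.

```python
import xml.etree.ElementTree as ET


def build_solr_add(docs):
    """Serialize a list of field dicts into a Solr XML <add> message."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            # One <field name="..."> element per attribute of the record.
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical product records; the field names are assumptions and are
# not taken from any real schema.
products = [
    {"id": "sku-1001", "name": "Widget", "price": 9.99},
    {"id": "sku-1002", "name": "Gadget", "price": 24.5},
]

message = build_solr_add(products)
print(message)
```

Each record becomes one <doc> with one <field> per attribute, which is
where the fielded search comes from. The XML wrapping is only a thin layer
around the field values.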
Regards,
Ian
On 15/11/2006, at 10:10 AM, Nitin Borwankar wrote:
> Hi all,
>
> As there are some experts in text indexing on the list thought this
> might be the best place to ask ....
> I see that solr ( http://incubator.apache.org/solr/ ) is an
> enterprise search engine based on Lucene with a web-service api for
> submitting docs to be indexed.
> Also that Nutch ( www.nutch.org ) is another search engine based
> on Lucene which directly stores docs to disk before indexing.
> What is the performance hit of submitting docs by web-service in
> comparison to the nutch approach, if at all this is a comparison
> that makes sense.
> My interest is in the fielded search capabilities of solr, applied
> to either LAN based docs or docs crawled from the web, but I am
> concerned about the performance hit of
> web-service submission + XML overhead compared to direct disk writes.
>
> Any enlightening thoughts?
>
> Nitin Borwankar
Ian Holsman
Ian at Holsman.net
http://parent-chatter.com -- what do parents know?