[Tagdb] question about solr vs nutch

Nitin Borwankar nitin at borwankar.com
Tue Nov 14 23:10:13 GMT 2006


Hi all,

As there are some experts in text indexing on the list, I thought this
might be the best place to ask.
I see that Solr ( http://incubator.apache.org/solr/ ) is an enterprise
search engine based on Lucene, with a web-service API for submitting docs
to be indexed.
I also see that Nutch ( www.nutch.org ) is another search engine based on
Lucene, which stores docs directly to disk before indexing.
What is the performance hit of submitting docs by web service compared
to the Nutch approach, if this comparison makes sense at all?
My interest is in the fielded search capabilities of Solr, applied to
either LAN-based docs or docs crawled from the web, but I am concerned
about the performance hit of web-service submission plus XML overhead
compared to direct disk writes.
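
To make the XML part of the question concrete, here is a rough sketch of
what I mean. It assumes Solr's XML update message shape
(<add><doc><field name="...">), and the helper names and sample field
names (id, title, body) are made up for illustration. It only counts the
extra bytes of markup and escaping per doc; it says nothing about HTTP or
parsing cost, which is the part I cannot estimate.

```python
from xml.sax.saxutils import escape, quoteattr


def solr_add_message(fields):
    """Wrap a dict of field name -> text in an <add><doc> update message.

    Hypothetical helper: the message shape follows Solr's XML update
    format, but the function itself is only an illustration.
    """
    parts = ["<add><doc>"]
    for name, value in fields.items():
        # quoteattr adds the surrounding quotes; escape handles &, <, >
        parts.append("<field name=%s>%s</field>" % (quoteattr(name), escape(value)))
    parts.append("</doc></add>")
    return "".join(parts)


def xml_overhead_bytes(fields):
    """Bytes added by markup and escaping, relative to the raw field text."""
    raw = sum(len(value.encode("utf-8")) for value in fields.values())
    wrapped = len(solr_add_message(fields).encode("utf-8"))
    return wrapped - raw


if __name__ == "__main__":
    # Sample doc with made-up field names and a 10 kB body
    doc = {"id": "doc-1", "title": "Solr vs Nutch", "body": "x" * 10000}
    print(xml_overhead_bytes(doc), "bytes of XML overhead")
```

For large docs the markup itself looks small relative to the body, so my
guess is that the real cost, if any, is in the HTTP round trip and the
XML parsing on the server side.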

Any enlightening thoughts?

Nitin Borwankar

