[Tagdb] question about solr vs nutch

ogjunk-tagdb at yahoo.com
Wed Nov 15 05:28:50 GMT 2006


That's a bit of an apples-and-oranges comparison.  Ian already pointed out the most basic difference: they are meant to solve different problems.  Moreover, if you play with Nutch, you will see that it's a rather complex and ambitious piece of software.  Solr is a lot smaller (code-wise) and simpler.  Again, it's hard to compare them because they are really two quite different things, even though they both do text indexing and searching.

Otis (Lucene/Solr/Nutch developer)

----- Original Message ----
From: Nitin Borwankar <nitin at borwankar.com>
To: tagdb at lists.tagschema.com
Sent: Tuesday, November 14, 2006 6:10:13 PM
Subject: [Tagdb] question about solr vs nutch

Hi all,

As there are some experts in text indexing on the list, I thought this
might be the best place to ask.
I see that Solr ( http://incubator.apache.org/solr/ ) is an enterprise
search engine based on Lucene, with a web-service API for submitting docs
to be indexed.
I also see that Nutch ( www.nutch.org ) is another search engine based on
Lucene, which stores docs directly to disk before indexing.
What is the performance hit of submitting docs by web service compared
to the Nutch approach, if this is a comparison that makes sense at all?
My interest is in the fielded search capabilities of Solr, applied to
either LAN-based docs or docs crawled from the web, but I am concerned
about the performance hit of web-service submission plus XML overhead
compared to direct disk writes.

Any enlightening thoughts?

Nitin Borwankar




