[Tagdb] Tags and data storage

Erik Hatcher esh6h at virginia.edu
Thu Mar 23 01:46:25 GMT 2006


On Mar 22, 2006, at 7:29 PM, Nitin Borwankar wrote:
> a) How does a Lucene-based app perform at the lower end of the
> scale? Is there an overhead, and a threshold above which Lucene
> makes sense?

Lucene works well at all scales, actually.  At the higher end of the
scale, more sophisticated management needs to be considered, such as
distributed index servers.

I think Lucene makes great sense even at lower scales because of its
querying capability.  I'm no SQL expert, but formulating a query such
as "show me all objects tagged with _foo_ and _bar_ but not _baz_,
OR tagged with _baz_ but not _bar_" is likely to be tricky in SQL,
while with an inverted index such as Lucene it is trivial.  Maybe
that particular query is actually not too bad in SQL, but I'm also
combining tag queries with full-text searches, such as "this phrase"
in the body of a document that has been tagged, along with other
filters (for date, genre, etc).
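
To illustrate why that kind of tag query is easy with an inverted
index, here is a minimal sketch in plain Python (not Lucene itself).
The sample objects and tags are made up for the example.  Each tag
maps to the set of objects carrying it, so the boolean query becomes
a few set operations:

```python
from collections import defaultdict

# Hypothetical sample data: object id -> set of tags.
TAGGED = {
    "doc1": {"foo", "bar"},
    "doc2": {"foo", "bar", "baz"},
    "doc3": {"baz"},
    "doc4": {"bar", "baz"},
    "doc5": {"foo"},
}


def build_index(tagged):
    """Invert object -> tags into tag -> set of object ids."""
    index = defaultdict(set)
    for obj_id, tags in tagged.items():
        for tag in tags:
            index[tag].add(obj_id)
    return index


def query(index):
    """Evaluate (foo AND bar AND NOT baz) OR (baz AND NOT bar)."""
    foo = index.get("foo", set())
    bar = index.get("bar", set())
    baz = index.get("baz", set())
    return ((foo & bar) - baz) | (baz - bar)


index = build_index(TAGGED)
result = sorted(query(index))
print(result)
```

In Lucene's classic query parser syntax the same query would look
roughly like (tag:foo AND tag:bar AND NOT tag:baz) OR (tag:baz AND
NOT tag:bar), assuming the tags are indexed in a field named "tag".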

> b) How do I hook up my web app to a Lucene-tag-backend when my web  
> app is not written in Java ?
> c) Are there commonly used JSON/XML-RPC etc. wrappers around the  
> backend so I can call it from Python/PHP/Ruby ?

Lucene itself is just a JAR file, a library; there is no "server" as
such.  However, many projects have built web services around Lucene.
The most interesting of these is the newly donated Solr project:

	http://incubator.apache.org/solr/

I'm starting to prototype with it to replace the current tag system
for my university research project (tagging and annotating library
archives).  My current system is partly in Kowari, and partly my own
custom Lucene search server, which is a very rudimentary version of
the more sophisticated faceted capabilities that Solr provides.  My
front-end is in Ruby on Rails, and it talks to those two back ends
using SOAP and XML-RPC respectively.  Solr uses HTTP GET/POST, and it
is driving CNET's faceted search system, where it is deployed in a
distributed fashion.
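
Because Solr speaks plain HTTP, a non-Java client only needs to build
a URL.  The following is a rough sketch in Python of how a combined
phrase-plus-tags query might be assembled.  The base URL and the field
names "body" and "tag" are assumptions for illustration and depend
entirely on how a given Solr instance and its schema are set up; the
helper build_search_url is hypothetical, not part of Solr:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed base URL; adjust host, port, and path for a real deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_search_url(phrase, required_tags, excluded_tags, rows=10):
    """Build a GET URL for a phrase search restricted by tags.

    The field names "body" and "tag" are hypothetical schema fields.
    """
    clauses = ['+body:"%s"' % phrase]
    clauses += ["+tag:%s" % tag for tag in required_tags]
    clauses += ["-tag:%s" % tag for tag in excluded_tags]
    params = {"q": " ".join(clauses), "rows": rows}
    # urlencode percent-escapes the quotes, colons, and plus signs.
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_search_url("this phrase", ["foo"], ["baz"])
# Decode the query string again to check what the server would see.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["q"][0])
```

The resulting URL can be fetched with any HTTP client from Python,
PHP, or Ruby, and the response parsed on the client side.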

	Erik


