[Tagdb] Tags and data storage
Erik Hatcher
esh6h at virginia.edu
Thu Mar 23 01:46:25 GMT 2006
On Mar 22, 2006, at 7:29 PM, Nitin Borwankar wrote:
> a) How does a Lucene based app perform at the lower ends of the
> scale - is there an overhead and a threshold above which Lucene makes
> sense ?
Lucene works well at all scales, actually. At the higher end,
though, more sophisticated management is needed, such as
distributed index servers.
I think even at lower scales, Lucene makes great sense because of its
querying capability. I'm no SQL expert, but formulating a query such
as "show me all objects tagged with _foo_ and _bar_ but not _baz_,
OR with _baz_ but not _bar_" is likely to be tricky in SQL, while with
an inverted index such as Lucene it is trivial. Maybe that particular query is
actually not too bad in SQL, but I'm also combining tag queries with
full-text searches such as "this phrase" in the body of a document
that has been tagged, along with other filters (for date, genre, etc).
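To make the inverted-index point concrete, here is a minimal sketch in plain Python (not Lucene itself) of how the tag query above reduces to set operations once each tag maps to the set of objects carrying it. The sample data and function names are made up for illustration. In Lucene's query syntax the same query would look roughly like `(+tag:foo +tag:bar -tag:baz) OR (+tag:baz -tag:bar)`, assuming the tags are indexed in a field called `tag`.

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted index: tag -> set of ids of objects with that tag."""
    index = defaultdict(set)
    for doc_id, tags in docs.items():
        for tag in tags:
            index[tag].add(doc_id)
    return index


def foo_bar_not_baz_or_baz_not_bar(index):
    """(foo AND bar AND NOT baz) OR (baz AND NOT bar), as set algebra."""
    foo = index.get("foo", set())
    bar = index.get("bar", set())
    baz = index.get("baz", set())
    return ((foo & bar) - baz) | (baz - bar)


# Hypothetical sample data: object id -> tags on that object.
docs = {
    1: {"foo", "bar"},
    2: {"foo", "bar", "baz"},
    3: {"baz"},
    4: {"foo"},
    5: {"bar", "baz"},
}

index = build_index(docs)
matching = foo_bar_not_baz_or_baz_not_bar(index)
print(sorted(matching))  # [1, 3]
```

Each boolean clause is a cheap set intersection, union, or difference over the posting sets, which is why this kind of query is easy for an inverted index.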
> b) How do I hook up my web app to a Lucene-tag-backend when my web
> app is not written in Java ?
> c) Are there commonly used JSON/XML-RPC etc. wrappers around the
> backend so I can call it from Python/PHP/Ruby ?
Lucene itself is just a JAR file; there is no
"server" as such. However, many projects have built web services
around Lucene. The most interesting of these is the newly donated
Solr project:
http://incubator.apache.org/solr/
I'm starting to prototype with it to replace my current tag system
for my university research project (tagging and annotating library
archives). My current system is partly in Kowari, and partly my own
custom Lucene search server which is a very rudimentary version of
the more sophisticated faceted capabilities that Solr provides. My
front-end is in Ruby on Rails, talking to Kowari over SOAP and to the
Lucene search server over XML-RPC.
Solr uses HTTP GET/POST, and it is driving CNET's faceted search
system, where it is deployed in a distributed fashion.
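Because Solr speaks plain HTTP, a non-Java client only needs to build a URL and parse the response. Here is a rough Python sketch that builds a GET URL for a search. The base URL, the `tag` field name, and the helper name are assumptions for illustration; `q` and `rows` are standard Solr query parameters on the `/select` handler, but check the handler path against your own deployment.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, query, rows=10):
    """Build a GET URL for a Solr search; the query string is URL-encoded."""
    params = {"q": query, "rows": rows}
    return f"{base_url}/select?{urlencode(params)}"


# Assumed base URL and field name; adjust to the actual deployment and schema.
url = solr_select_url("http://localhost:8983/solr", "+tag:foo +tag:bar -tag:baz")
print(url)
```

The resulting URL can be fetched with any HTTP client from Python, PHP, or Ruby, so no language-specific binding is required.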
Erik