[Tagdb] RDBMS, Lucene or both?
ogjunk-tagdb at yahoo.com
Tue Feb 6 09:40:02 GMT 2007
Am I *that* predictable? :) I certainly am when it comes to this. Relational databases are great for certain things, but...
My stuff that interests Nitin:
http://simpy.com/links/search/%252Bsolr%2520%252Blucene%2520username%253Aotis
Stuff that interests Nitin, but Otis doesn't yet have it:
http://simpy.com/links/search/%252Bsolr%2520%252Blucene%2520-username%253Aotis
So, yes, there is a lot of Lucene in there. There is certainly an RDBMS as well, but when it comes to searches, the RDBMS is out of the picture.
Note the related users on the right. Note how similar they are to the "Similar Users" you see on http://www.simpy.com/user/otis. Some more Lucene. If you have a copy of Lucene in Action, see page 186, section 5.7 in chapter 5 (Advanced search techniques).
Otis
----- Original Message ----
From: Nitin Borwankar <nitin at borwankar.com>
To: Ace Jayz <fourtlove at gmail.com>
Cc: tagdb at lists.tagschema.com
Sent: Tuesday, February 6, 2007 1:50:02 AM
Subject: Re: [Tagdb] RDBMS, Lucene or both?
Ace Jayz wrote:
> I've been grappling with a design of a bookmarking/tagging system and
> I'm leaning towards storing the tag data in an RDBMS so that I can
> answer fairly complex relational queries efficiently (will probably
> use a 3-table schema similar to that of Toxi), but I'm thinking about
> storing some data in a Lucene index for efficient free text searching.
Hi Ace,
One suggestion would be to try not to think in terms of a design goal of
being able to "answer fairly complex relational queries efficiently",
because then you are already biased strongly towards an RDBMS.
More important is what kind of queries users want, and it has often
turned out that users want boolean queries.
For example, the kind of query that comes up often in discussions of
del.icio.us is
"gimme all the bookmarks that have tag a and tag b but not tag c", i.e.
set intersection and difference.
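
For illustration, here is a rough sketch of how that query might look
against a Toxi-style three-table schema, using Python's sqlite3 module.
The table and column names (bookmark, tag, tagmap), the sample rows and
the helper names are assumptions made up for this example; they are not
taken from Toxi, del.icio.us or Simpy.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory database with a hypothetical three-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE bookmark (id INTEGER PRIMARY KEY, url TEXT);
        CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE tagmap (bookmark_id INTEGER, tag_id INTEGER);
        INSERT INTO bookmark VALUES (1, 'http://example.com/1');
        INSERT INTO bookmark VALUES (2, 'http://example.com/2');
        INSERT INTO bookmark VALUES (3, 'http://example.com/3');
        INSERT INTO tag VALUES (1, 'a');
        INSERT INTO tag VALUES (2, 'b');
        INSERT INTO tag VALUES (3, 'c');
        -- bookmark 1: a, b    bookmark 2: a, b, c    bookmark 3: a
        INSERT INTO tagmap VALUES (1, 1);
        INSERT INTO tagmap VALUES (1, 2);
        INSERT INTO tagmap VALUES (2, 1);
        INSERT INTO tagmap VALUES (2, 2);
        INSERT INTO tagmap VALUES (2, 3);
        INSERT INTO tagmap VALUES (3, 1);
    """)
    return conn


def bookmarks_with_tags(conn, include, exclude):
    """Ids of bookmarks that carry every tag in `include` and none in `exclude`."""
    include = list(dict.fromkeys(include))  # drop duplicates, keep order
    exclude = list(dict.fromkeys(exclude))
    inc = ",".join("?" * len(include))
    exc = ",".join("?" * len(exclude))
    sql = f"""
        SELECT tm.bookmark_id
        FROM tagmap tm JOIN tag t ON t.id = tm.tag_id
        WHERE t.name IN ({inc})
          AND tm.bookmark_id NOT IN (
              SELECT tm2.bookmark_id
              FROM tagmap tm2 JOIN tag t2 ON t2.id = tm2.tag_id
              WHERE t2.name IN ({exc}))
        GROUP BY tm.bookmark_id
        -- a bookmark qualifies only if it matched all of the included tags
        HAVING COUNT(DISTINCT t.name) = ?
        ORDER BY tm.bookmark_id
    """
    rows = conn.execute(sql, [*include, *exclude, len(include)])
    return [row[0] for row in rows]


print(bookmarks_with_tags(demo_connection(), ["a", "b"], ["c"]))
```

Each included tag adds to the IN list and to the count the HAVING
clause must match, and the exclusion needs its own subquery, so the SQL
has to be assembled per query rather than written once.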
It turns out these are not the kinds of things that scale very well with
an increasing number of terms when represented as SQL-based queries. And
these are the kinds of things that text search engines claim to do very
well.
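
To make the search-engine side concrete, here is a toy inverted index
in Python: each tag maps to the set of bookmark ids that carry it, and
"a and b but not c" becomes a set intersection followed by a set
difference. This is only a sketch of the general idea; the TagIndex
class and its methods are hypothetical names, it is not how Lucene or
Simpy is actually implemented, and it says nothing about how the two
approaches compare in speed.

```python
from collections import defaultdict


class TagIndex:
    """Toy inverted index: tag name -> set of bookmark ids (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, bookmark_id, tags):
        """Record that `bookmark_id` carries each tag in `tags`."""
        for tag in tags:
            self.postings[tag].add(bookmark_id)

    def search(self, include, exclude=()):
        """Bookmarks that have every tag in `include` and none in `exclude`."""
        if not include:
            return set()
        # Intersect starting from the smallest posting set so the
        # intermediate result never grows larger than it has to.
        sets = sorted((self.postings.get(t, set()) for t in include), key=len)
        result = set(sets[0])
        for posting in sets[1:]:
            result &= posting
        # Set difference removes anything carrying an excluded tag.
        for tag in exclude:
            result -= self.postings.get(tag, set())
        return result


index = TagIndex()
index.add(1, ["a", "b"])
index.add(2, ["a", "b", "c"])
index.add(3, ["a"])
print(index.search(["a", "b"], exclude=["c"]))
```

Adding another term to the query adds one more set operation, and the
shape of the lookup stays the same.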
I have yet to do or see an apples-to-apples comparison between the two
technologies, and if anyone has practical experience in this please chime
in. But the point I was making is that the way you frame your question may
bias you strongly *and* not necessarily be what the users want.
Aside from that, apparently Simpy *is* a tagging system based solely on
Lucene.
I am sure Otis will speak up if it is not.
Nitin Borwankar.
> I looked back in the list archives and noticed that a system based on
> a combination of a DB and Lucene has been suggested here before. Has
> anyone on the list implemented such a system? If so, care to share
> your experiences? I've read some posts from Otis G. about his Simpy
> system and I'm curious if it fits into this mold or if it uses Lucene
> exclusively. If Simpy is not an example of a tagging system solely
> based on Lucene, does anyone know of a full-featured tagging system
> that is?
>
> Thanks in advance,
> Ace.
>
>------------------------------------------------------------------------
>
>_______________________________________________
>Tagdb mailing list
>Tagdb at lists.tagschema.com
>http://lists.tagschema.com/mailman/listinfo/tagdb
>
>
--
Nitin Borwankar
Find, Learn, Act ....
Greener, the search engine for the planet
http://greener.com
nitin at borwankar.com
510-872-7066