[Tagdb] Tags and data storage

ogjunk-tagdb at yahoo.com
Wed Mar 22 22:02:12 GMT 2006


Nitin,

Inline answers...

----- Original Message ----

I understand the searching part, but is it possible to do things like 
the following with Lucene?

* show me all the tags used by userid x
* show me the userids for people who have tagged item y with tag z

OG: +yes +yes  ... or.... yes AND yes
OG: stupid joke.
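
In Lucene's query syntax a leading `+` marks a required clause, so each of the two questions above is an AND over field values. A minimal sketch of that idea in plain Python, assuming each tagging event (user, item, tag) is indexed as its own document with `user`, `item` and `tag` fields. The sample data and function names are hypothetical, and this is a toy inverted index, not the Lucene API.

```python
from collections import defaultdict

# Hypothetical tagging events: (user, item, tag). Each event is treated as a
# separate "document" whose position in the list is its document id.
taggings = [
    ("alice", "doc1", "python"),
    ("alice", "doc2", "search"),
    ("bob", "doc1", "python"),
    ("carol", "doc1", "lucene"),
]

# Inverted index: (field, value) -> set of document ids containing that value.
index = defaultdict(set)
for doc_id, (user, item, tag) in enumerate(taggings):
    index[("user", user)].add(doc_id)
    index[("item", item)].add(doc_id)
    index[("tag", tag)].add(doc_id)

def search_all(*clauses):
    """Return ids of documents matching every (field, value) clause,
    roughly what a query like +item:doc1 +tag:python asks for."""
    postings = [index.get(clause, set()) for clause in clauses]
    return set.intersection(*postings) if postings else set()

def tags_used_by(user):
    """Show me all the tags used by userid x."""
    return {taggings[d][2] for d in search_all(("user", user))}

def users_who_tagged(item, tag):
    """Show me the userids for people who have tagged item y with tag z."""
    return {taggings[d][0] for d in search_all(("item", item), ("tag", tag))}
```

With the sample data, `tags_used_by("alice")` gives `{"python", "search"}` and `users_who_tagged("doc1", "python")` gives `{"alice", "bob"}`: both answers come from intersecting posting lists rather than scanning every tagging.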

And how do I update the data as users continue to tag items?

OG: Lucene is a Java library with an API, so you use that API to update the Lucene index.
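
Updating incrementally amounts to adding a document when a user tags an item and deleting it when the tag is removed; no full rebuild is needed. A minimal sketch, assuming a plain-Python inverted index keyed on (field, value). The class and method names are hypothetical stand-ins; in Lucene itself this is done through its `IndexWriter` class.

```python
from collections import defaultdict

class TagIndex:
    """Hypothetical in-memory inverted index over (user, item, tag) taggings."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, value) -> set of tagging keys
        self.docs = set()                 # taggings currently in the index

    def add(self, user, item, tag):
        key = (user, item, tag)
        if key in self.docs:
            return  # already indexed; adding twice is a no-op
        self.docs.add(key)
        for field, value in (("user", user), ("item", item), ("tag", tag)):
            self.postings[(field, value)].add(key)

    def remove(self, user, item, tag):
        key = (user, item, tag)
        if key not in self.docs:
            return
        self.docs.discard(key)
        for field, value in (("user", user), ("item", item), ("tag", tag)):
            bucket = self.postings[(field, value)]
            bucket.discard(key)
            if not bucket:
                del self.postings[(field, value)]  # drop empty posting lists

    def tags_used_by(self, user):
        return {tag for (_, _, tag) in self.postings.get(("user", user), set())}
```

For example, after `idx.add("alice", "doc1", "python")`, `idx.add("alice", "doc2", "search")` and `idx.remove("alice", "doc2", "search")`, `idx.tags_used_by("alice")` returns `{"python"}`.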

Or do I have to maintain two copies of the data, one in the DB and one 
in structured text files indexed by Lucene?

OG: How/where you store the data is up to you.  You could store everything in Lucene; just keep in mind that Lucene has no relations the way an RDBMS does.

Or am I essentially living off a database built on text files?

OG: text files, no.  Inverted indices, yes.

My knowledge of Lucene is limited to text search so pardon the "stupid" 
questions.

OG: I hear the dudes wrote a book about Lucene and provided free code. ;)

Otis


ogjunk-tagdb at yahoo.com wrote:

>Again, what Philipp said.  Except for the MySQL full-text search piece.  Don't go there, unless you LOVE large database files.
>
>Otis
>
>----- Original Message ----
>From: Philipp Keller <phred at citrin.ch>
>To: Joshua Lippiner <jlippiner at yahoo.com>
>Cc: tagdb at lists.tagschema.com
>Sent: Wednesday, March 22, 2006 12:08:09 PM
>Subject: Re: [Tagdb] Tags and data storage
>
>
>  
>
>>Does anyone have any recommendations/thoughts on tag storage?  Are you
>>better off storing an entire list of tags associated with one item
>>in a single field and dealing with searching issues later, or,
>>in the end, do the benefits outweigh the size cost of storing each
>>tag as a new DB row?
>>    
>>
>I once wrote an article, which shows different solutions to the problem
>[1], and I also did performance tests [2]
>
>As Nitin noted: you'll probably run into problems with the denormalized
>version, and I suspect you won't save much space by going for the
>denormalized version anyway.
>
>If you just have a small user base and database, I think you
>shouldn't go for the 3NF solution, because it's hard to deal with the tag
>orphans. The MySQL full-text variant looks good to me.
>
>About the scalability issue Nitin raised: if you have more than 1
>million tagged entries, you'll have to switch from an RDBMS to, say, Lucene
>anyway, no matter how you organize the tags in your DB.
>
>greets
>Philipp
>
>[1] http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html
>[2] http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html
>
>
>
>_______________________________________________
>Tagdb mailing list
>Tagdb at lists.tagschema.com
>http://lists.tagschema.com/mailman/listinfo/tagdb
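
The original question (all tags in one field vs. one row per tag) and Philipp's point about tag orphans in the 3NF layout can be sketched with plain Python structures standing in for tables. The table contents and function names are hypothetical, and this is not the exact schema from Philipp's article.

```python
# Denormalized: one row per item, all tags packed into a single text field.
items_denormalized = {
    "doc1": "python lucene",
    "doc2": "search",
}

def items_with_tag_denormalized(tag):
    # Every lookup has to scan and split every row's tag field.
    return {item for item, tag_field in items_denormalized.items()
            if tag in tag_field.split()}

# Normalized (3NF): a separate tag table plus an item-tag link table.
tags = {1: "python", 2: "lucene", 3: "search"}       # tag_id -> tag name
item_tags = {("doc1", 1), ("doc1", 2), ("doc2", 3)}  # (item, tag_id) links

def items_with_tag_normalized(name):
    tag_ids = {tid for tid, n in tags.items() if n == name}
    return {item for item, tid in item_tags if tid in tag_ids}

def untag(item, name):
    """Remove a tag from an item; the row in the tag table stays behind."""
    for tid, n in tags.items():
        if n == name:
            item_tags.discard((item, tid))

def delete_orphan_tags():
    """Delete tags that are no longer linked to any item (the 'tag orphans')."""
    used = {tid for _, tid in item_tags}
    for tid in [t for t in tags if t not in used]:
        del tags[tid]
```

After `untag("doc2", "search")`, tag 3 still sits in `tags` with no links until `delete_orphan_tags()` runs; that extra cleanup step is the bookkeeping the 3NF layout adds, while the denormalized layout avoids it at the cost of scanning every tag field on lookup.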