[Tagdb] Tags and data storage

Philipp Keller phred at citrin.ch
Fri Mar 24 06:36:52 GMT 2006


> I don't think delicious ever switched from MySQL, at least pre-Y... 
> and I still saw mentions of "rebuilding DB indices" during some down-time periods.
I just read a quote from Joshua at the Carson Summit:
“tags doesn’t map to sql at all. so use partial indexing.” [1]
That suggests that at least part of the data or the querying is handled
by a non-RDBMS system.

> Schema, yeah, I'm sure that went through revisions, plus LOTS of caching 
> (ever noticed numbers are often off on delicious?).
Yeah, they surely have a problem with that. The numbers are off much of
the time. Funnily enough, their "most recently added bookmarks" RSS feed
was once even 1 or 2 days behind :-)


greets
Philipp
[1] http://www.redmonk.com/jgovernor/archives/001262.html

> 
> Otis
> 
> ----- Original Message ----
> From: Philipp Keller <phred at citrin.ch>
> To: Joshua Lippiner <jlippiner at yahoo.com>
> Cc: ogjunk-tagdb at yahoo.com; tagdb at lists.tagschema.com
> Sent: Thursday, March 23, 2006 2:59:35 AM
> Subject: RE: [Tagdb] Tags and data storage
> 
> > For someone starting out with a tagging app and unsure of how it will take
> > off, does it make sense to start with Lucene if they have no working
> > knowledge of how to use it or does it make more sense to start with RDBMS
> > and then move to Lucene as it grows?
> Yeah, that's the big question. You never know how fast it grows, do you?
> In the case of delicious: I'm almost certain they started with MySQL and
> then had to switch to a non-RDBMS system. They had to stop rolling out
> features for about a year, so, yeah, they would have been better off
> starting with a system like Lucene.
> 
> To Otis and Erik: Could one of you write an article about "how do I build
> a tag app using Lucene"? I thought about looking into Lucene and writing
> an article myself, but it would be easier if you did the job, with all
> your knowledge :-)
> 
> greets
> Philipp
> 
> > 
> >  
> > 
> > -----Original Message-----
> > From: tagdb-bounces at lists.tagschema.com
> > [mailto:tagdb-bounces at lists.tagschema.com] On Behalf Of
> > ogjunk-tagdb at yahoo.com
> > Sent: Wednesday, March 22, 2006 2:02 PM
> > To: tagdb at lists.tagschema.com
> > Subject: Re: [Tagdb] Tags and data storage
> > 
> > Nitin,
> > 
> > Inline answers...
> > 
> > ----- Original Message ----
> > 
> > For the searching part I understand but is it possible to do things like the
> > following with Lucene ?
> > 
> > * show me all the tags used by userid x
> > * show me the userids for people who have tagged item y with tag z
> > 
> > OG: +yes +yes  ... or.... yes AND yes
> > OG: stupid joke.
> > 
> > And how do I update the data as the tagging of items by users continues ?
> > 
> > OG: Lucene is a java library with an API, so you use that API to update the
> > Lucene index.
> > 
> > Or do I have to maintain two copies of the data - one in the db and one in
> > structured text files indexed by Lucene ?
> > 
> > OG: How/where you store the data is up to you.  You could store everything
> > in Lucene, but note that Lucene has no relations like an RDBMS does.
> > 
> > Or am I essentially living off a database based on text files ?
> > 
> > OG: text files, no.  Inverted indices, yes.
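
A minimal sketch of the inverted-index idea behind the answers above. This is
plain Python, not Lucene's API; the class TagIndex and its methods are
hypothetical names chosen for illustration. Each tagging event is treated as
one small "document" with user, item and tag fields, and a query is an AND
over the matching posting sets.

```python
from collections import defaultdict


class TagIndex:
    """Toy in-memory inverted index over (user, item, tag) events."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, value) -> set of doc ids
        self.docs = {}                    # doc id -> (user, item, tag)
        self.next_id = 0

    def add(self, user, item, tag):
        """Index one tagging event; this is the incremental update step."""
        doc_id = self.next_id
        self.next_id += 1
        self.docs[doc_id] = (user, item, tag)
        for field, value in (("user", user), ("item", item), ("tag", tag)):
            self.postings[(field, value)].add(doc_id)
        return doc_id

    def search(self, **terms):
        """Return ids of events matching ALL given field=value terms."""
        sets = [self.postings.get((f, v), set()) for f, v in terms.items()]
        return set.intersection(*sets) if sets else set()

    def tags_of_user(self, user):
        """All tags used by a given user."""
        return sorted({self.docs[d][2] for d in self.search(user=user)})

    def users_for(self, item, tag):
        """All users who tagged a given item with a given tag."""
        return sorted({self.docs[d][0] for d in self.search(item=item, tag=tag)})


index = TagIndex()
index.add("x", "y", "z")
index.add("x", "y2", "python")
index.add("bob", "y", "z")
index.add("carol", "y", "other")

print(index.tags_of_user("x"))
print(index.users_for("y", "z"))
```

Both lookups reduce to intersecting posting sets, and new taggings are added
to the index as they arrive, so no second copy of the data in text files is
needed for the index itself.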
> > 
> > My knowledge of Lucene is limited to text search so pardon the "stupid" 
> > questions.
> > 
> > OG: I hear the dudes wrote a book about Lucene and provided free code. ;)
> > 
> > Otis
> > 
> > 
> > ogjunk-tagdb at yahoo.com wrote:
> > 
> > >Again, what Philipp said.  Except for the MySQL full-text search piece.
> > >Don't go there, unless you LOVE large database files.
> > >
> > >Otis
> > >
> > >----- Original Message ----
> > >From: Philipp Keller <phred at citrin.ch>
> > >To: Joshua Lippiner <jlippiner at yahoo.com>
> > >Cc: tagdb at lists.tagschema.com
> > >Sent: Wednesday, March 22, 2006 12:08:09 PM
> > >Subject: Re: [Tagdb] Tags and data storage
> > >
> > >
> > >  
> > >
> > >>Does anyone have any recommendations/thoughts on tag storage?  Are you 
> > >>better off storing the entire list of tags associated with one item 
> > >>in a single field and then dealing with searching issues later, or, 
> > >>in the end, do the benefits of storing each tag as a new DB row 
> > >>outweigh the size issue?
> > >>    
> > >>
> > >I once wrote an article that shows different solutions to the problem 
> > >[1], and I also ran performance tests [2].
> > >
> > >As Nitin noted, you'll probably run into problems with the 
> > >denormalized version, and I suppose it won't save you much space 
> > >either.
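
One kind of problem with the single-field approach can be sketched as follows.
This is an illustration using SQLite from Python, not the schema from the
article; the table name, column names and example URLs are assumptions. Tags
are stored space-separated in one column, and a naive substring search
matches more than intended.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical denormalized schema: all tags of an item in one text field.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, url TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO items (url, tags) VALUES (?, ?)",
    [
        ("http://example.com/a", "art design"),
        ("http://example.com/b", "cart shopping"),
    ],
)

# Naive substring match: 'art' also matches the item tagged 'cart'.
naive = conn.execute(
    "SELECT url FROM items WHERE tags LIKE ? ORDER BY id", ("%art%",)
).fetchall()

# Padding the field with the delimiter restricts the match to whole tags.
exact = conn.execute(
    "SELECT url FROM items WHERE ' ' || tags || ' ' LIKE ? ORDER BY id",
    ("% art %",),
).fetchall()

print(naive)
print(exact)
```

Both queries use a pattern with a leading wildcard, so an ordinary index on
the tags column cannot narrow the search and every row has to be examined.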
> > >
> > >If you only have a small user base and database, then I think you 
> > >shouldn't go for the 3NF solution, because tag orphans are hard to 
> > >deal with. The MySQL fulltext variant looks good to me.
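
To make the orphan issue concrete, here is a minimal sketch of a 3NF layout
with an explicit cleanup step. It uses SQLite from Python for
self-containment; the table names, column names and helper functions are
assumptions for illustration, not the schema from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items   (id INTEGER PRIMARY KEY, url  TEXT UNIQUE);
    CREATE TABLE tags    (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE tagging (item_id INTEGER REFERENCES items(id),
                          tag_id  INTEGER REFERENCES tags(id),
                          PRIMARY KEY (item_id, tag_id));
""")


def tag_item(url, names):
    """Attach tags to an item, creating item and tag rows as needed."""
    conn.execute("INSERT OR IGNORE INTO items (url) VALUES (?)", (url,))
    item_id = conn.execute(
        "SELECT id FROM items WHERE url = ?", (url,)
    ).fetchone()[0]
    for name in names:
        conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))
        tag_id = conn.execute(
            "SELECT id FROM tags WHERE name = ?", (name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO tagging VALUES (?, ?)", (item_id, tag_id)
        )


def untag_item(url, name):
    """Remove one item/tag link; the tag row itself is left in place."""
    conn.execute(
        """DELETE FROM tagging
           WHERE item_id = (SELECT id FROM items WHERE url = ?)
             AND tag_id  = (SELECT id FROM tags  WHERE name = ?)""",
        (url, name),
    )


def delete_orphan_tags():
    """Delete tags that no item references any more; return how many."""
    cur = conn.execute(
        "DELETE FROM tags WHERE id NOT IN (SELECT tag_id FROM tagging)"
    )
    return cur.rowcount


tag_item("http://example.com/a", ["art", "design"])
tag_item("http://example.com/b", ["design"])
untag_item("http://example.com/a", "art")  # 'art' is now an orphan in tags
removed = delete_orphan_tags()
remaining = [r[0] for r in conn.execute("SELECT name FROM tags ORDER BY name")]
print(removed, remaining)
```

The cleanup has to be run by the application, either after every untagging
or periodically; that extra bookkeeping is the cost of the normalized layout.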
> > >
> > >About the scalability issue Nitin noted: If you have more than 1 
> > >million tagged entries, you have to switch from an RDBMS to, say, 
> > >Lucene anyway, no matter how you organize the tags in your DB.
> > >
> > >greets
> > >Philipp
> > >
> > >[1] http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html
> > >[2] http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html
> > >
> > >
> > >
> > >_______________________________________________
> > >Tagdb mailing list
> > >Tagdb at lists.tagschema.com
> > >http://lists.tagschema.com/mailman/listinfo/tagdb


