[Tagdb] Tags and data storage

Nitin Borwankar nitin at borwankar.com
Fri Mar 24 18:17:25 GMT 2006


Philipp Keller wrote:

>>I don't think delicious ever switched from MySQL, at least pre-Y... 
>>and I still saw mentions of "rebuilding DB indices" during some down-time periods.  
>>    
>>
>I just read a quote from Joshua at the Carson Summit: 
>“tags doesn’t map to sql at all. so use partial indexing.” [1]
>  
>
Not sure what is meant by "tags don't map to sql".
A tag is an attribute, so it is handled pretty well by SQL.
If what is meant is that the tag calculus is not supported efficiently 
by SQL's *built-in* predicates and functions,
then I might agree.
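
A minimal sketch of that distinction, assuming a hypothetical one-row-per-tagging table `tagging(user_id, item_id, tag)` in SQLite (the table, column, and function names are made up for illustration and are not any real site's schema). Plain attribute lookups are single SELECTs, while a tag intersection has no built-in predicate and has to be spelled out with GROUP BY / HAVING:

```python
import sqlite3

# Hypothetical schema: one row per (user, item, tag) assignment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tagging (user_id TEXT, item_id TEXT, tag TEXT)")
conn.executemany(
    "INSERT INTO tagging VALUES (?, ?, ?)",
    [
        ("alice", "item1", "sql"),
        ("alice", "item1", "tags"),
        ("alice", "item2", "lucene"),
        ("bob", "item1", "tags"),
        ("bob", "item2", "tags"),
    ],
)


def tags_used_by(user_id):
    """All distinct tags a user has applied: a plain attribute lookup."""
    rows = conn.execute(
        "SELECT DISTINCT tag FROM tagging WHERE user_id = ? ORDER BY tag",
        (user_id,),
    )
    return [row[0] for row in rows]


def users_who_tagged(item_id, tag):
    """All users who put a given tag on a given item."""
    rows = conn.execute(
        "SELECT DISTINCT user_id FROM tagging "
        "WHERE item_id = ? AND tag = ? ORDER BY user_id",
        (item_id, tag),
    )
    return [row[0] for row in rows]


def items_with_all_tags(tags):
    """Tag intersection: items carrying every tag in `tags` (assumed distinct).

    There is no built-in predicate for this, so we count the matching
    distinct tags per item and keep items that matched all of them.
    """
    placeholders = ",".join("?" for _ in tags)
    rows = conn.execute(
        f"SELECT item_id FROM tagging WHERE tag IN ({placeholders}) "
        "GROUP BY item_id HAVING COUNT(DISTINCT tag) = ? ORDER BY item_id",
        (*tags, len(tags)),
    )
    return [row[0] for row in rows]
```

The first two functions are the kind of query SQL handles directly; the third is where the "tag calculus" starts to need extra machinery, and it is the part that tends to get expensive as the table grows.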

Google's MapReduce (http://labs.google.com/papers/mapreduce-osdi04.pdf)
may be useful here.
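
As a rough illustration of the map/reduce idea applied to tags, here is a single-process sketch in plain Python. It is not Google's implementation, and the `(user, item, tag)` record layout and the function names are assumptions made up for this example. The map step emits a `(tag, 1)` pair per tagging, and the reduce step sums the pairs for each tag:

```python
from collections import defaultdict


def map_tagging(record):
    """Map step: emit one (tag, 1) pair for a (user, item, tag) record."""
    _user, _item, tag = record
    yield tag, 1


def reduce_counts(tag, counts):
    """Reduce step: sum all the counts emitted for one tag."""
    return tag, sum(counts)


def tag_counts(records):
    """Run map, group the emitted pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_tagging(record):
            grouped[key].append(value)
    return dict(reduce_counts(tag, counts) for tag, counts in grouped.items())
```

In a real deployment the map and reduce steps would run on many machines over partitions of the data; the grouping loop here only stands in for that shuffle phase.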

Also, what exactly is "partial indexing"?

Nitin

>That sounds as if at least part of the data or querying is handled in a
>non-RDBMS system.
>
>  
>
>>Schema, yeah, I'm sure that went through revisions, plus LOTS of caching 
>>(ever noticed numbers are often off on delicious?).
>>    
>>
>Yeah.. they surely have a problem with that. The numbers are off much
>of the time. The funny thing was that one time even their "most recently
>added bookmarks" RSS feed was about 1 or 2 days behind :-)
>
>
>greets
>Philipp
>[1] http://www.redmonk.com/jgovernor/archives/001262.html
>
>  
>
>>Otis
>>
>>----- Original Message ----
>>From: Philipp Keller <phred at citrin.ch>
>>To: Joshua Lippiner <jlippiner at yahoo.com>
>>Cc: ogjunk-tagdb at yahoo.com; tagdb at lists.tagschema.com
>>Sent: Thursday, March 23, 2006 2:59:35 AM
>>Subject: RE: [Tagdb] Tags and data storage
>>
>>    
>>
>>>For someone starting out with a tagging app and unsure of how it will take
>>>off, does it make sense to start with Lucene if they have no working
>>>knowledge of how to use it or does it make more sense to start with RDBMS
>>>and then move to Lucene as it grows?
>>>      
>>>
>>Yeah, that's the big question. You never know how fast it grows, do you?
>>In the case of delicious: I'm almost certain they started with MySQL and
>>then had to switch to a non-RDBMS system. They had to stop feature
>>rollout for about one year so, yeah, they should have started with a
>>system like Lucene.
>>
>>To Otis and Erik: Can one of you write an article about "how do I build
>>a tag app using Lucene"? I thought about looking into Lucene and
>>writing an article myself, but it'd be easier if one of you did it, with
>>all your knowledge.. :-)
>>
>>greets
>>Philipp
>>
>>    
>>
>>> 
>>>
>>>-----Original Message-----
>>>From: tagdb-bounces at lists.tagschema.com
>>>[mailto:tagdb-bounces at lists.tagschema.com] On Behalf Of
>>>ogjunk-tagdb at yahoo.com
>>>Sent: Wednesday, March 22, 2006 2:02 PM
>>>To: tagdb at lists.tagschema.com
>>>Subject: Re: [Tagdb] Tags and data storage
>>>
>>>Nitin,
>>>
>>>Inline answers...
>>>
>>>----- Original Message ----
>>>
>>>For the searching part I understand but is it possible to do things like the
>>>following with Lucene ?
>>>
>>>* show me all the tags used by userid x
>>>* show me the userids for people who have tagged item y with tag z
>>>
>>>OG: +yes +yes  ... or.... yes AND yes
>>>OG: stupid joke.
>>>
>>>And how do I update the data as the tagging of items by users continues ?
>>>
>>>OG: Lucene is a Java library with an API, so you use that API to update the
>>>Lucene index.
>>>
>>>Or do I have to maintain two copies of the data - one in the db and one in
>>>structured text files indexed by Lucene ?
>>>
>>>OG: How/where you store the data is up to you.  You could store everything
>>>in Lucene; there are no relations in Lucene, as there are in an RDBMS.
>>>
>>>Or am I essentially living off a database based on text files ?
>>>
>>>OG: text files, no.  Inverted indices, yes.
>>>
>>>My knowledge of Lucene is limited to text search so pardon the "stupid" 
>>>questions.
>>>
>>>OG: I hear the dudes wrote a book about Lucene and provided free code. ;)
>>>
>>>Otis
>>>
>>>
>>>ogjunk-tagdb at yahoo.com wrote:
>>>
>>>      
>>>
>>>>Again, what Philipp said.  Except for the MySQL full-text search piece.
>>>>        
>>>>
>>>Don't go there, unless you LOVE large database files.
>>>      
>>>
>>>>Otis
>>>>
>>>>----- Original Message ----
>>>>From: Philipp Keller <phred at citrin.ch>
>>>>To: Joshua Lippiner <jlippiner at yahoo.com>
>>>>Cc: tagdb at lists.tagschema.com
>>>>Sent: Wednesday, March 22, 2006 12:08:09 PM
>>>>Subject: Re: [Tagdb] Tags and data storage
>>>>
>>>>
>>>> 
>>>>
>>>>        
>>>>
>>>>>Does anyone have any recommendations/thoughts on tag storage?  Are you 
>>>>>better off storing the entire list of tags associated with one item 
>>>>>in a single field and then dealing with searching issues later, or, 
>>>>>in the end, do the benefits of storing each tag as a new DB row 
>>>>>outweigh the size issue?
>>>>>   
>>>>>
>>>>>          
>>>>>
>>>>I once wrote an article, which shows different solutions to the problem 
>>>>[1], and I also did performance tests [2]
>>>>
>>>>As Nitin noticed: You'll probably get problems with the denormalized 
>>>>version. I suppose you won't save much space if you go for the 
>>>>denormalized version.
>>>>
>>>>If you just have a small user base and database, then I think you 
>>>>shouldn't do the 3NF solution, because it's hard to deal with the tag 
>>>>orphans. The MySQL fulltext variant looks good in my eyes.
>>>>
>>>>About the scalability issue Nitin raised: If you have more than 1 
>>>>million tagged entries you have to switch from an RDBMS to, say, Lucene 
>>>>anyway, no matter which way you organize tags in your DB.
>>>>
>>>>greets
>>>>Philipp
>>>>
>>>>[1] http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html
>>>>[2] http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html
>>>>
>>>>
>>>>
>>>>_______________________________________________
>>>>Tagdb mailing list
>>>>Tagdb at lists.tagschema.com
>>>>http://lists.tagschema.com/mailman/listinfo/tagdb