[Tagdb] Tags and data storage
Nitin Borwankar
nitin at borwankar.com
Thu Mar 23 17:15:01 GMT 2006
Philipp Keller wrote:
>>For someone starting out with a tagging app and unsure of how it will take
>>off, does it make sense to start with Lucene if they have no working
>>knowledge of how to use it, or does it make more sense to start with an
>>RDBMS and then move to Lucene as it grows?
>>
>>
>Yeah, that's the big question. You never know how fast it grows, do you?
>In the case of delicious: I'm almost certain they started with MySQL and
>then had to switch to a non-RDBMS system. They had to stop feature
>rollout for about a year, so, yeah, they should have started with a
>system like Lucene.
>
>
I am not sure MySQL scaling was the problem delicious had at that point.
They had ~30K users or so (based on hearsay) when that happened.
There were "schema issues", but not much detail was given on the
delicious list.
I suspect the initial schema was the performance bottleneck, and they had
to redesign the schema and move data just when the site went into growth
mode. I would be very surprised if MySQL *with a well-designed schema*
fell over with just ~30K users.
I think the growing number of feeds put a strain on the web infrastructure
as well.
Just IMHO, with no inside info.
Nitin
>To Otis and Erik: Can one of you write an article about "how do I build
>a tag app using Lucene"? I thought about looking into Lucene and
>writing an article myself, but it'd be easier if you did that job with
>all your knowledge. :-)
>
>greets
>Philipp
>
>>-----Original Message-----
>>From: tagdb-bounces at lists.tagschema.com
>>[mailto:tagdb-bounces at lists.tagschema.com] On Behalf Of
>>ogjunk-tagdb at yahoo.com
>>Sent: Wednesday, March 22, 2006 2:02 PM
>>To: tagdb at lists.tagschema.com
>>Subject: Re: [Tagdb] Tags and data storage
>>
>>Nitin,
>>
>>Inline answers...
>>
>>----- Original Message ----
>>
>>I understand the searching part, but is it possible to do things like the
>>following with Lucene?
>>
>>* show me all the tags used by userid x
>>* show me the userids for people who have tagged item y with tag z
>>
>>OG: +yes +yes ... or.... yes AND yes
>>OG: stupid joke.
>>
>>And how do I update the data as users continue tagging items?
>>
>>OG: Lucene is a Java library with an API, so you use that API to update the
>>Lucene index.
>>
>>Or do I have to maintain two copies of the data, one in the DB and one in
>>structured text files indexed by Lucene?
>>
>>OG: How/where you store the data is up to you. You could store everything
>>in Lucene. There are no relations in Lucene, as there are in an RDBMS.
>>
>>Or am I essentially living off a database based on text files?
>>
>>OG: text files, no. Inverted indices, yes.
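To make "inverted indices" a little more concrete, here is a toy sketch in Python. It is not Lucene's API; the class, method and field names are hypothetical. Each tagging event is indexed as a flat document with `user`, `item` and `tag` fields, every field:value term maps to the set of document ids that contain it, and a query such as item:y AND tag:z is just the intersection of two posting sets. That is enough to answer both questions listed above, and new taggings are indexed incrementally as they arrive.

```python
from collections import defaultdict


class TagIndex:
    """Toy inverted index over tagging events (hypothetical, not Lucene's API).

    Each document is one tagging (user, item, tag) stored as flat fields;
    there are no relations, only field:value terms pointing at doc ids.
    """

    def __init__(self):
        self.docs = {}                    # doc id -> stored fields
        self.postings = defaultdict(set)  # (field, value) -> set of doc ids
        self.next_id = 0

    def add(self, user, item, tag):
        """Index one new tagging incrementally and return its doc id."""
        doc_id = self.next_id
        self.next_id += 1
        fields = {"user": user, "item": item, "tag": tag}
        self.docs[doc_id] = fields
        for field, value in fields.items():
            self.postings[(field, value)].add(doc_id)
        return doc_id

    def search(self, **terms):
        """Return doc ids matching ALL field=value terms (boolean AND)."""
        result = None
        for field, value in terms.items():
            ids = self.postings.get((field, value), set())
            result = set(ids) if result is None else result & ids
        return result if result is not None else set()

    def tags_of_user(self, user):
        """'Show me all the tags used by userid x'."""
        return {self.docs[d]["tag"] for d in self.search(user=user)}

    def users_for(self, item, tag):
        """'Show me the userids for people who have tagged item y with tag z'."""
        return {self.docs[d]["user"] for d in self.search(item=item, tag=tag)}


idx = TagIndex()
idx.add("alice", "http://example.com/a", "python")
idx.add("alice", "http://example.com/b", "lucene")
idx.add("bob", "http://example.com/a", "python")
print(sorted(idx.tags_of_user("alice")))                        # ['lucene', 'python']
print(sorted(idx.users_for("http://example.com/a", "python")))  # ['alice', 'bob']
```

A real engine adds term analysis, scoring and on-disk segment storage on top of this; the sketch only shows why relation-free documents can still answer these lookups.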
>>
>>My knowledge of Lucene is limited to text search so pardon the "stupid"
>>questions.
>>
>>OG: I hear the dudes wrote a book about Lucene and provided free code. ;)
>>
>>Otis
>>
>>
>>ogjunk-tagdb at yahoo.com wrote:
>>>Again, what Philipp said. Except for the MySQL full-text search piece.
>>>Don't go there, unless you LOVE large database files.
>>>
>>>Otis
>>>
>>>----- Original Message ----
>>>From: Philipp Keller <phred at citrin.ch>
>>>To: Joshua Lippiner <jlippiner at yahoo.com>
>>>Cc: tagdb at lists.tagschema.com
>>>Sent: Wednesday, March 22, 2006 12:08:09 PM
>>>Subject: Re: [Tagdb] Tags and data storage
>>>
>>>>Does anyone have any recommendations/thoughts on tag storage? Are you
>>>>better off storing the entire list of tags associated with one item
>>>>in a single field and dealing with searching issues later, or, in the
>>>>end, do the benefits of storing each tag as a new DB row outweigh the
>>>>size issue?
>>>>
>>>I once wrote an article that shows different solutions to the problem
>>>[1], and I also did performance tests [2].
>>>
>>>As Nitin noted, you'll probably run into problems with the denormalized
>>>version. I suspect you won't save much space with the denormalized
>>>version either.
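The thread does not spell out which problems the denormalized layout (all of an item's tags in one field) causes, but a typical one is matching individual tags. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL; the table names and sample data are hypothetical. A naive substring search for the tag "web" also hits "webdesign"; padding with spaces fixes the matching but still starts the pattern with a wildcard, so an ordinary index can't be used. The one-row-per-tag layout gives an exact match on an indexed column. (MySQL's FULLTEXT index, mentioned in the thread, is another way around this and is not shown here.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalized: all of an item's tags packed into one space-separated field.
conn.execute("CREATE TABLE items_denorm (url TEXT PRIMARY KEY, tags TEXT)")
conn.executemany(
    "INSERT INTO items_denorm (url, tags) VALUES (?, ?)",
    [("http://example.com/a", "web design"),
     ("http://example.com/b", "webdesign css")],
)

# Naive substring search for the tag "web" also matches "webdesign".
naive = conn.execute(
    "SELECT url FROM items_denorm WHERE tags LIKE ? ORDER BY url", ("%web%",)
).fetchall()

# Padding with spaces restricts matches to whole tags, but like the naive
# query the pattern starts with a wildcard, so a B-tree index can't help.
padded = conn.execute(
    "SELECT url FROM items_denorm WHERE ' ' || tags || ' ' LIKE ? ORDER BY url",
    ("% web %",),
).fetchall()

# Normalized: one row per (item, tag) pair; exact match on an indexed column.
conn.execute("CREATE TABLE item_tags (url TEXT, tag TEXT)")
conn.execute("CREATE INDEX idx_item_tags_tag ON item_tags (tag)")
conn.executemany(
    "INSERT INTO item_tags (url, tag) VALUES (?, ?)",
    [("http://example.com/a", "web"), ("http://example.com/a", "design"),
     ("http://example.com/b", "webdesign"), ("http://example.com/b", "css")],
)
exact = conn.execute(
    "SELECT url FROM item_tags WHERE tag = ? ORDER BY url", ("web",)
).fetchall()

print(naive)   # [('http://example.com/a',), ('http://example.com/b',)]
print(padded)  # [('http://example.com/a',)]
print(exact)   # [('http://example.com/a',)]
```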
>>>
>>>If you just have a small user base and database, then I think you
>>>shouldn't use the 3NF solution, because it's hard to deal with the tag
>>>orphans. The MySQL full-text variant looks good to me.
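In the 3NF solution, "tag orphans" are rows in the tags table that no tagging refers to any more, for example after the last user removes a tag. Below is a minimal sketch of a three-table layout (items, tags and a mapping table) with an orphan cleanup step, again using Python's `sqlite3` as a stand-in for MySQL. All table, column and function names are hypothetical, and `INSERT OR IGNORE` is SQLite syntax (MySQL spells it `INSERT IGNORE`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, url TEXT UNIQUE);
    CREATE TABLE tags  (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    -- Mapping table: one row per (user, item, tag) tagging.
    CREATE TABLE tagmap (
        user_id INTEGER NOT NULL,
        item_id INTEGER NOT NULL REFERENCES items(id),
        tag_id  INTEGER NOT NULL REFERENCES tags(id),
        PRIMARY KEY (user_id, item_id, tag_id)
    );
""")


def tag_item(user_id, url, tag):
    """Record one tagging, creating the item and tag rows on first use."""
    conn.execute("INSERT OR IGNORE INTO items (url) VALUES (?)", (url,))
    conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (tag,))
    conn.execute(
        """INSERT OR IGNORE INTO tagmap (user_id, item_id, tag_id)
           SELECT ?, items.id, tags.id FROM items, tags
           WHERE items.url = ? AND tags.name = ?""",
        (user_id, url, tag),
    )


def untag_item(user_id, url, tag):
    """Remove one tagging; the tag row itself may now be an orphan."""
    conn.execute(
        """DELETE FROM tagmap WHERE user_id = ?
           AND item_id = (SELECT id FROM items WHERE url = ?)
           AND tag_id  = (SELECT id FROM tags  WHERE name = ?)""",
        (user_id, url, tag),
    )


def delete_orphan_tags():
    """Delete tags no longer referenced by any tagging; return rows deleted."""
    cur = conn.execute(
        "DELETE FROM tags WHERE id NOT IN (SELECT tag_id FROM tagmap)"
    )
    return cur.rowcount


tag_item(1, "http://example.com/a", "python")
tag_item(1, "http://example.com/a", "sql")
tag_item(2, "http://example.com/a", "python")
untag_item(1, "http://example.com/a", "sql")

print(delete_orphan_tags())  # 1 ("sql" is no longer used by anyone)
print([r[0] for r in conn.execute("SELECT name FROM tags ORDER BY name")])
# ['python']
```

Whether to run the cleanup after every untag or as a periodic batch job is a design choice the thread leaves open.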
>>>
>>>About the scalability issue Nitin raised: if you have more than 1
>>>million tagged entries, you'll have to switch from an RDBMS to, say,
>>>Lucene anyway, no matter how you organize tags in your DB.
>>>
>>>greets
>>>Philipp
>>>
>>>[1] http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html
>>>[2] http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html
>>>
>>>_______________________________________________
>>>Tagdb mailing list
>>>Tagdb at lists.tagschema.com
>>>http://lists.tagschema.com/mailman/listinfo/tagdb