[Tagdb] Tags and data storage

Nitin Borwankar nitin at borwankar.com
Thu Mar 23 05:14:20 GMT 2006


ogjunk-tagdb at yahoo.com wrote:

>To add to what Erik said - there are Lucene ports for pretty much every major programming language today - Python, Perl, Ruby, C++, C#, even PHP and LISP.  I hear a Bash port is next.

Yes, but when you are in a multi-language environment - which is far 
more common these days - interoperability is more important than having 
a port to a particular language.
I have a client whose backend is written in Java, uses Lucene and 
MySQL.  There are also Perl, Python and PHP components to the system.
XML-HTTP does what's needed.
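As a minimal sketch of that kind of glue, assume a hypothetical Java/Lucene service at http://localhost:8080/tags that answers GET requests with a small XML document. The URL, the `user` parameter and the `<tags><tag count="N">` response format are all made up for illustration and do not come from any real product. A Python component then needs only the standard library:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint exposed by the Java/Lucene backend (an assumption).
BASE_URL = "http://localhost:8080/tags"


def build_query_url(user_id: str) -> str:
    """Build the GET URL asking the backend for all tags used by one user."""
    return BASE_URL + "?" + urllib.parse.urlencode({"user": user_id})


def parse_tags(xml_text: str) -> list[tuple[str, int]]:
    """Parse a hypothetical <tags><tag count="N">name</tag>...</tags> reply."""
    root = ET.fromstring(xml_text)
    return [(tag.text or "", int(tag.get("count", "0")))
            for tag in root.findall("tag")]


def fetch_user_tags(user_id: str) -> list[tuple[str, int]]:
    """Call the backend over HTTP and return (tag, count) pairs."""
    with urllib.request.urlopen(build_query_url(user_id), timeout=5) as resp:
        return parse_tags(resp.read().decode("utf-8"))


if __name__ == "__main__":
    sample = '<tags><tag count="3">lucene</tag><tag count="1">mysql</tag></tags>'
    print(parse_tags(sample))  # [('lucene', 3), ('mysql', 1)]
```

Perl and PHP components can make the same call with their own HTTP client and XML parser, which is the interoperability point: the index lives in one Java process and everything else talks to it over HTTP.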

Nitin

>Otis
>
>
>----- Original Message ----
>From: Nitin Borwankar <nitin at borwankar.com>
>To: ogjunk-tagdb at yahoo.com
>Cc: tagdb at lists.tagschema.com
>Sent: Wednesday, March 22, 2006 7:29:28 PM
>Subject: Re: [Tagdb] Tags and data storage
>
>ogjunk-tagdb at yahoo.com wrote:
>
>>Nitin,
>>
>>Inline answers...
>
>OK, thanks very much - this is very interesting - but, as you might
>expect, it leads to some more questions :-)
>
>a) How does a Lucene-based app perform at the lower end of the
>scale? Is there an overhead, and a threshold above which Lucene makes
>sense?
>
>b) How do I hook up my web app to a Lucene-based tag backend when my web app
>is not written in Java?
>
>c) Are there commonly used JSON/XML-RPC etc. wrappers around the backend
>so I can call it from Python/PHP/Ruby?
>
>Nitin
>
>
>>----- Original Message ----
>>
>>I understand the searching part, but is it possible to do things like
>>the following with Lucene?
>>
>>* show me all the tags used by userid x
>>* show me the userids for people who have tagged item y with tag z
>>
>>OG: +yes +yes  ... or.... yes AND yes
>>OG: stupid joke.
>>
>>And how do I update the data as users continue to tag items?
>>
>>OG: Lucene is a Java library with an API, so you use that API to update the Lucene index.
>>
>>Or do I have to maintain two copies of the data - one in the db and one
>>in structured text files indexed by Lucene?
>>
>>OG: How/where you store the data is up to you.  You could store everything in Lucene; there are no relations in Lucene, as there are in an RDBMS.
>>
>>Or am I essentially living off a database based on text files?
>>
>>OG: text files, no.  Inverted indices, yes.
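A minimal sketch of the inverted-index idea in Python, assuming each tagging event is indexed as one "document" with user, item and tag fields. This layout is a hypothetical illustration, not Lucene's API. The two queries asked about earlier become a single postings lookup and an AND of two postings lists (roughly what +item:y +tag:z expresses in Lucene's query syntax), and new tagging events are added incrementally:

```python
from collections import defaultdict


class TagIndex:
    """Toy inverted index: one document per (user, item, tag) tagging event.

    Postings map (field, term) -> set of document ids. This illustrates the
    idea behind an inverted index; it is not how Lucene is implemented.
    """

    def __init__(self):
        self.docs = []                    # stored fields, indexed by doc id
        self.postings = defaultdict(set)  # (field, term) -> {doc ids}

    def add(self, user, item, tag):
        """Index one tagging event; call this as new tags arrive."""
        doc_id = len(self.docs)
        doc = {"user": user, "item": item, "tag": tag}
        self.docs.append(doc)
        for field, term in doc.items():
            self.postings[(field, term)].add(doc_id)

    def search(self, **required):
        """AND query: ids of documents matching every field=term pair."""
        sets = [self.postings.get((f, t), set()) for f, t in required.items()]
        return set.intersection(*sets) if sets else set()

    def tags_of_user(self, user):
        """All tags used by a given user."""
        return sorted({self.docs[d]["tag"] for d in self.search(user=user)})

    def users_who_tagged(self, item, tag):
        """Users who tagged a given item with a given tag."""
        hits = self.search(item=item, tag=tag)
        return sorted({self.docs[d]["user"] for d in hits})


idx = TagIndex()
idx.add("alice", "item1", "lucene")
idx.add("alice", "item2", "java")
idx.add("bob", "item1", "lucene")
idx.add("bob", "item1", "search")
print(idx.tags_of_user("alice"))                # ['java', 'lucene']
print(idx.users_who_tagged("item1", "lucene"))  # ['alice', 'bob']
```

A real Lucene index adds analysis, on-disk segments and scoring on top of this, but the lookup-and-intersect structure is why such queries are cheap.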
>>
>>My knowledge of Lucene is limited to text search so pardon the "stupid" 
>>questions.
>>
>>OG: I hear the dudes wrote a book about Lucene and provided free code. ;)
>>
>>Otis
>>
>>
>>ogjunk-tagdb at yahoo.com wrote:
>>
>>>Again, what Philipp said.  Except for the MySQL full-text search piece.  Don't go there, unless you LOVE large database files.
>>>
>>>Otis
>>>
>>>----- Original Message ----
>>>From: Philipp Keller <phred at citrin.ch>
>>>To: Joshua Lippiner <jlippiner at yahoo.com>
>>>Cc: tagdb at lists.tagschema.com
>>>Sent: Wednesday, March 22, 2006 12:08:09 PM
>>>Subject: Re: [Tagdb] Tags and data storage
>>>
>>>>Does anyone have any recommendations/thoughts on tag storage?  Are you
>>>>better off storing the entire list of tags associated with an item
>>>>in a single field and dealing with searching issues later, or, in
>>>>the end, do the benefits of storing each tag as a new DB row
>>>>outweigh the size cost?
>>>>  
>>>I once wrote an article which shows different solutions to the problem
>>>[1], and I also did performance tests [2].
>>>
>>>As Nitin noted: you'll probably run into problems with the
>>>denormalized version. I suppose you won't save much space by going
>>>with the denormalized version either.
>>>
>>>If you just have a small user base and database, then I think you
>>>shouldn't use the 3NF solution, because it's hard to deal with tag
>>>orphans. The MySQL full-text variant looks good to me.
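A minimal sketch of why orphans come up in a normalized layout, using Python's standard-library sqlite3 module as a stand-in for MySQL. The three tables (items, tags, and a tagmap linking them) and all their column names are hypothetical, chosen for illustration rather than copied from the schemas in the article [1]. Removing a tag from an item only deletes a tagmap row, so a separate clean-up pass is needed to drop tags nobody uses any more:

```python
import sqlite3

# Hypothetical three-table layout: one row per tag, plus a mapping table.
SCHEMA = """
CREATE TABLE items  (id INTEGER PRIMARY KEY, url TEXT NOT NULL);
CREATE TABLE tags   (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE tagmap (item_id INTEGER NOT NULL REFERENCES items(id),
                     tag_id  INTEGER NOT NULL REFERENCES tags(id),
                     PRIMARY KEY (item_id, tag_id));
"""


def tag_item(conn, item_id, name):
    """Attach a tag to an item, creating the tag row if it is new."""
    conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))
    (tag_id,) = conn.execute(
        "SELECT id FROM tags WHERE name = ?", (name,)).fetchone()
    conn.execute("INSERT OR IGNORE INTO tagmap (item_id, tag_id) VALUES (?, ?)",
                 (item_id, tag_id))


def untag_item(conn, item_id, name):
    """Detach a tag from an item; the tag row itself may now be orphaned."""
    conn.execute("DELETE FROM tagmap WHERE item_id = ? "
                 "AND tag_id = (SELECT id FROM tags WHERE name = ?)",
                 (item_id, name))


def delete_orphan_tags(conn):
    """Remove tags no longer referenced by any item; return rows deleted."""
    cur = conn.execute(
        "DELETE FROM tags WHERE id NOT IN (SELECT tag_id FROM tagmap)")
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO items (id, url) VALUES (1, 'http://example.com/')")
tag_item(conn, 1, "lucene")
tag_item(conn, 1, "mysql")
untag_item(conn, 1, "mysql")
print(delete_orphan_tags(conn))  # 1 ('mysql' was orphaned)
print([r[0] for r in conn.execute("SELECT name FROM tags")])  # ['lucene']
```

The clean-up query has to scan the mapping table, so on a busy site it is usually run periodically rather than on every untag; that extra bookkeeping is the cost being weighed against the single-field approach.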
>>>
>>>About the scalability issue Nitin noted: if you have more than 1
>>>million tagged entries, you'll have to switch from an RDBMS to, say,
>>>Lucene anyway, no matter how you organize tags in your DB.
>>>
>>>greets
>>>Philipp
>>>
>>>[1] http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html
>>>[2] http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html
>>>
>>>
>>>
>>>_______________________________________________
>>>Tagdb mailing list
>>>Tagdb at lists.tagschema.com
>>>http://lists.tagschema.com/mailman/listinfo/tagdb
>>


