[Tagdb] Building tagclusters

Nitin Borwankar nitin at borwankar.com
Mon Jul 18 03:06:37 GMT 2005


Philipp Keller wrote:

>>Now there must be an algorithm to distinguish those "category"-tags and
>>"adjective" (or action, you name it..)-tags.
>>And I think taking into account the distribution, as you mentioned it,
>>should do the work. Actually a distribution of a category-like-tag (such
>>as linux/mac or webdesign for example) should have a "steep derivation"
>>and the distribution of the adjective-like tags should be quite "flat".
>>I gathered some data from delicious so I will run some tests to
>>categorize some tags and will speak up here again..
>>    
>>
>I built an algorithm to distinguish "category"-type tags from
>"adjective"-type tags. I posted it on my blog [1]. Please tell me what you
>think because I think I'm onto something..
>
>  
>

This is fascinating! I am not sure I understand it completely, so here
are a couple of comments.

Comment 1. I am not sure I agree with your method of handling what you
call 'synonyms'.
A large number of connections alone isn't enough to decide, in my
opinion, especially when synonyms and categories have similar weights
in your table.

So let me suggest a possible algorithmic approach: if there is an exact
(or close to exact) correlation between the occurrences of tag A and
tag B, then they are synonyms. Conditional probability might be useful here.
If P(A|B) ~ 1 then A syn B, where 'syn' is like an equivalence relation.

So you'll have to take data from your Del feed and calculate
probabilities based on occurrences.
This doesn't catch the 'Del' == 'del.icio.us' real-world synonym, only
strong correlations between occurrences of strings.
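A minimal sketch of the conditional-probability idea from Comment 1, assuming each del.icio.us post is already available as a set of tag strings (parsing the feed is left out). The helper names, the 0.9 threshold, the minimum-count filter and the toy data are all hypothetical. Because an equivalence relation has to be symmetric, the sketch requires both P(A|B) and P(B|A) to be close to 1; a one-sided P(A|B) ~ 1 only says that B almost always comes with A.

```python
from collections import Counter
from itertools import combinations


def count_occurrences(posts):
    """Count how often each tag, and each unordered pair of tags, appears."""
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in posts:
        unique = sorted(set(tags))  # sorted so each pair has one canonical order
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return tag_counts, pair_counts


def find_synonyms(posts, threshold=0.9, min_count=2):
    """Return tag pairs (a, b) with P(a|b) and P(b|a) both >= threshold.

    threshold and min_count are hypothetical tuning knobs; min_count skips
    rare tags whose probabilities would rest on only one or two posts.
    """
    tag_counts, pair_counts = count_occurrences(posts)
    synonyms = []
    for (a, b), both in pair_counts.items():
        if tag_counts[a] < min_count or tag_counts[b] < min_count:
            continue
        p_a_given_b = both / tag_counts[b]  # P(A|B) = n(A and B) / n(B)
        p_b_given_a = both / tag_counts[a]  # P(B|A) = n(A and B) / n(A)
        if p_a_given_b >= threshold and p_b_given_a >= threshold:
            synonyms.append((a, b))
    return sorted(synonyms)


# Hypothetical toy data: each post is the set of tags one user gave one URL.
posts = [
    {"mac", "osx", "howto"},
    {"mac", "osx"},
    {"mac", "osx", "software"},
    {"webdesign", "web-design", "css"},
    {"webdesign", "web-design"},
    {"css", "howto"},
    {"linux", "howto"},
]

print(find_synonyms(posts))
```

Lowering `threshold` trades precision for recall: at 0.5 the pairs (css, web-design) and (css, webdesign) also pass, although css is arguably a narrower topic rather than a synonym. The sketch still only sees co-occurring strings, so 'Del' and 'del.icio.us' would be matched only if users tended to put both tags on the same posts.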

Comment 2. Perhaps I need to re-read your post a number of times, but I
didn't get from the graph what the derivative is with respect to. The
X-axis has tags, which are not an ordered set, so either I am missing
something or the graph needs tweaking.
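One way to give such a derivative something to be taken with respect to is to order the X-axis by rank: for a given tag, sort its co-occurring tags by how often they appear together and take the difference between neighbouring ranks. The Python sketch below assumes posts are sets of tag strings; the helper names, the toy data and the "largest drop" summary are hypothetical, and this is only one possible reading of the "steep" versus "flat" distinction quoted above, not the method from the blog post.

```python
from collections import Counter


def cooccurrence_profile(posts, tag):
    """Share of `tag`'s co-occurrences held by each neighbour, largest first.

    The list index is the neighbour's rank, which gives the X-axis an order.
    """
    neighbours = Counter()
    for tags in posts:
        tag_set = set(tags)
        if tag in tag_set:
            neighbours.update(tag_set - {tag})
    total = sum(neighbours.values())
    if total == 0:
        return []
    return [count / total for _, count in neighbours.most_common()]


def rank_slopes(profile):
    """Discrete derivative with respect to rank: profile[i + 1] - profile[i]."""
    return [b - a for a, b in zip(profile, profile[1:])]


def max_drop(profile):
    """Largest fall between neighbouring ranks; 0.0 for a perfectly flat profile."""
    return max((a - b for a, b in zip(profile, profile[1:])), default=0.0)


# Hypothetical toy data: "linux" behaves like a category, "cool" like an adjective.
posts = [
    {"linux", "opensource"},
    {"linux", "opensource", "kernel"},
    {"linux", "opensource"},
    {"linux", "opensource", "cool"},
    {"cool", "css"},
    {"cool", "mac"},
    {"cool", "music"},
]

for tag in ("linux", "cool"):
    profile = cooccurrence_profile(posts, tag)
    print(tag, [round(p, 2) for p in profile], round(max_drop(profile), 2))
```

Here "linux" has one dominant neighbour and a large first drop, while "cool" is spread evenly and its slopes are all zero. The largest drop is an arbitrary summary; the share held by the top few ranks would serve the same purpose.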

Anyway, interesting stuff, as always. Keep it up.

Nitin Borwankar



>greets
>Philipp
>
>[1]
>http://www.pui.ch/phred/archives/2005/07/analyzing-tag-connections.html
>
>
>  
>
>>But after removing these "adjective"-like tags, the mincut algorithm
>>should do the work to cluster delicious into different "categories", or
>>what do you think..?
>>IMO that would be a great benefit..
>>
>>greets
>>Philipp
>>
>>
>>
>>_______________________________________________
>>Tagdb mailing list
>>Tagdb at lists.tagschema.com
>>http://lists.tagschema.com/mailman/listinfo/tagdb
>>
>>
>>    
>>
>
>  
>
