[Tagdb] Building tagclusters
Nitin Borwankar
nitin at borwankar.com
Mon Jul 18 03:06:37 GMT 2005
Philipp Keller wrote:
>>Now there must be an algorithm to distinguish those "category"-tags and
>>"adjective" (or action, you name it..)-tags.
>>And I think taking into account the distribution, as you mentioned it,
>>should do the work. Actually a distribution of a category-like-tag (such
>>as linux/mac or webdesign for example) should have a "steep derivation"
>>and the distribution of the adjective-like tags should be quite "flat".
>>I gathered some data from delicious so I will run some tests to
>>categorize some tags and will speak up here again..
>>
>>
>I built an algorithm to distinguish "category"-type tags from
>"adjective"-type tags. I posted it on my blog [1]. Please tell me what you
>think, because I think I'm onto something..
>
>
>
This is fascinating! I am not sure I understand it completely, so just a
couple of comments.
Comment 1. I am not sure I agree with your method of handling what you
call 'synonyms'.
A large number of connections alone isn't enough to decide, in my
opinion, especially when you have
synonyms and categories with similar weights in your table.
So let me suggest a possible algorithmic approach: if there is an exact
(or close to exact) correlation between occurrences of tag A and tag B,
then they are synonyms. Conditional probability might be useful here.
If P(A|B) ~ 1 then A syn B, where 'syn' is like an equivalence relation.
So you'll have to take data from your Del feed and calculate
probabilities based on occurrences.
This doesn't catch the 'Del' == 'del.icio.us' real-world synonym, only
strong correlations between occurrences of strings.
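
A minimal sketch of the conditional-probability test, under some
assumptions that are not in the original post: each bookmark is given as
a set of tags, P(A|B) is estimated as count(A and B) / count(B), and a
pair is only reported when both P(A|B) and P(B|A) clear the threshold
(requiring both directions is an assumption made here so that the
relation comes out symmetric). The names synonym_candidates, threshold
and min_count are hypothetical, and the sample posts are made up for
illustration.

```python
from collections import Counter
from itertools import combinations


def synonym_candidates(posts, threshold=0.9, min_count=2):
    """Return tag pairs whose co-occurrence is close to exact.

    posts: iterable of tag collections, one per bookmark (assumed format).
    threshold: how close to 1 both conditional probabilities must be.
    min_count: ignore tags seen fewer times than this (too little data).
    """
    tag_count = Counter()
    pair_count = Counter()
    for tags in posts:
        tags = set(tags)
        tag_count.update(tags)
        # Sort so that each unordered pair is always counted under one key.
        pair_count.update(combinations(sorted(tags), 2))

    result = []
    for (a, b), n_ab in pair_count.items():
        if tag_count[a] < min_count or tag_count[b] < min_count:
            continue
        p_a_given_b = n_ab / tag_count[b]  # estimate of P(A|B)
        p_b_given_a = n_ab / tag_count[a]  # estimate of P(B|A)
        if p_a_given_b >= threshold and p_b_given_a >= threshold:
            result.append((a, b, p_a_given_b, p_b_given_a))
    return sorted(result)


# Made-up sample data, not taken from the del.icio.us feed.
posts = [
    {"osx", "mac", "software"},
    {"osx", "mac", "apple"},
    {"osx", "mac"},
    {"linux", "software"},
    {"linux", "howto"},
]

for a, b, p_ab, p_ba in synonym_candidates(posts):
    print(f"{a} syn {b}  P({a}|{b})={p_ab:.2f}  P({b}|{a})={p_ba:.2f}")
```

Dropping the second condition gives the one-directional test as literally
stated above; that variant would presumably also flag a narrow tag that
always appears alongside a much broader one.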
Comment 2. Perhaps I need to re-read your post a number of times, but I
didn't get from the graph what the derivative is with respect to. The X-axis
has tags, which are not an ordered set, so either I am missing something
or the graph needs tweaking.
Anyway, interesting stuff, as always. Keep it up.
Nitin Borwankar
>greets
>Philipp
>
>[1]
>http://www.pui.ch/phred/archives/2005/07/analyzing-tag-connections.html
>
>
>
>
>>But after removing these "adjective"-like tags, the mincut algorithm
>>should do the work to cluster delicious into different "categories", or
>>what do you think..?
>>IMO that would be a great benefit..
>>
>>greets
>>Philipp
>>
>>
>>
>>_______________________________________________
>>Tagdb mailing list
>>Tagdb at lists.tagschema.com
>>http://lists.tagschema.com/mailman/listinfo/tagdb
>>
>>
>>
>>
>
>
>