[Tagdb] Multi-Word Tags Vs Single Word Tags

Colin Viebrock cviebrock at tucows.com
Fri Apr 7 14:29:27 GMT 2006


I understand your point, but in English the phrase is "hot dog".

Making users enter "hot_dog" or "hotdog" isn't user-centric.  You're 
forcing them to do something counter-intuitive.

What about allowing them to enter all their tags, separated by commas?  
So, if on my recent trip to London I found a great hot dog restaurant, 
I'd tag it:

	hot dog, trip to london, excellent restaurant

You could then break that into the phrases:

	hot dog
	trip to london
	excellent restaurant
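
A rough sketch of that first step in Python.  The function name 
split_tags is made up for illustration; nothing here comes from Tagdb 
itself, and it assumes commas never appear inside a tag:

```python
def split_tags(raw):
    """Split a comma-separated tag string into trimmed, non-empty phrases."""
    phrases = []
    for piece in raw.split(","):
        # Collapse runs of whitespace and strip the ends.
        phrase = " ".join(piece.split())
        if phrase:  # skip empty entries such as "a,,b" or a trailing comma
            phrases.append(phrase)
    return phrases


print(split_tags("hot dog, trip to london, excellent restaurant"))
# ['hot dog', 'trip to london', 'excellent restaurant']
```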

This is how the user wants to describe their document, so let them.  
But let's try to add some "better" tagging to it.  This is the tricky 
part.  :)

Look for any "stop words" in those phrases, and remove them.  If the 
stop word is in the middle of a phrase, break that phrase into its 
parts.  So the "to" in "trip to london" is removed, and the phrases 
left are:

	hot dog
	trip
	london
	excellent restaurant
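
The stop-word step could look something like this in Python.  The 
STOP_WORDS set is a tiny hypothetical list just for the example (a real 
one would be much longer), and remove_stop_words is an invented name:

```python
# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "on", "and"}


def remove_stop_words(phrases, stop_words=STOP_WORDS):
    """Drop stop words, breaking a phrase wherever one is removed."""
    result = []
    for phrase in phrases:
        current = []  # words collected since the last stop word
        for word in phrase.split():
            if word.lower() in stop_words:
                # A stop word ends the current sub-phrase.
                if current:
                    result.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            result.append(" ".join(current))
    return result


print(remove_stop_words(["hot dog", "trip to london", "excellent restaurant"]))
# ['hot dog', 'trip', 'london', 'excellent restaurant']
```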

You could then break the phrases on whitespace, except if you are 
breaking a compound word.  This would require a lookup list of known 
compound words that shouldn't be broken ... and therefore might be very 
difficult to actually do.

Assuming you could, "hot dog" would stay, but "excellent restaurant" 
would be split, leaving you:

	hot dog
	trip
	london
	excellent
	restaurant
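
If you did have such a lookup list, the whitespace split might be 
sketched like this in Python.  KNOWN_COMPOUNDS is a hypothetical 
hand-made list (building a good one is the hard part), and the function 
names are invented for the example:

```python
# Hypothetical list of compounds that must not be broken apart.
KNOWN_COMPOUNDS = {"hot dog", "ice cream", "new york"}


def split_phrase(phrase, compounds=KNOWN_COMPOUNDS):
    """Split a phrase on whitespace, keeping known compounds together."""
    words = phrase.split()
    # Longest compound, in words, so we know how far ahead to look.
    max_len = max((len(c.split()) for c in compounds), default=1)
    tags = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, down to two words.
        for size in range(min(max_len, len(words) - i), 1, -1):
            candidate = " ".join(words[i:i + size])
            if candidate.lower() in compounds:
                tags.append(candidate)
                i += size
                break
        else:
            # No compound starts here, so the word stands alone.
            tags.append(words[i])
            i += 1
    return tags


def split_phrases(phrases, compounds=KNOWN_COMPOUNDS):
    """Apply split_phrase to every phrase and flatten the result."""
    tags = []
    for phrase in phrases:
        tags.extend(split_phrase(phrase, compounds))
    return tags


print(split_phrases(["hot dog", "trip", "london", "excellent restaurant"]))
# ['hot dog', 'trip', 'london', 'excellent', 'restaurant']
```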

Personally, I'd wonder about the usefulness of "excellent" as a tag 
when it's out of context.  But this was just a mind exercise for me 
anyway.  :)

- Colin


On 6-Apr-06, at 11:29 AM, anand wrote:

>>> Yeah, but what about someone who wants to tag a document with "hot 
>>> dog"?
> This is exactly what I meant when I wrote the following:
> "I would use 'java_rmi' only in case the two words are inseparable and
> have no meaning independently."
>
> 'hot dog' is an entirely different entity which is formed by combining
> hot and dog. Therefore the user would be inclined to use it as a single
> word of the form 'hotdog' or 'hot_dog'. But 'london' 'trip' is formed by
> two different words having the same semantics when used in multi-word
> tags or even as single words.
>
>
> On 4/6/06, Colin Viebrock <colin at tucows.com> wrote:
>>
>>> For instance if I have to
>>> tag a document with tags java and rmi, I would rarely go ahead and
>>> tag it as 'java_rmi' but rather I would tag it as 'java' and 'rmi'.
>>> I would use 'java_rmi' only in case the two words are inseparable and
>>> have no meaning independently.
>>>
>>> Now with multi-word tags users can use entire phrases like 'a trip to
>>> london' to tag items which they would have tagged as 'london' 'trip'
>>> in case of single word tags. This makes the tagged item difficult to
>>> discover, making the tag and thus the user to tagged item relationship
>>> non-social (quite opposite to what a tag is supposed to do).
>>
>> Yeah, but what about someone who wants to tag a document with "hot 
>> dog"?

>> - Colin

>>
>
>
> --
> - Andie


