[Tagdb] Single and multi-word tags / swarming and spreading

Timothy Spalding tspalding at maine.rr.com
Fri Apr 7 11:02:59 GMT 2006


So here's my two cents:

First, this is a usability and social question, not a technical one.  
You can do it either way as fast as you need, so start at the right  
end of the question.

By "social" and "usability" I mean that you need to consider both
what is most obvious and what will produce the social effects you
want. The social effects (I have in mind Michal's "swarming" and other
effects) directly affect the quantity, quality and diversity of the
tag data, which ought to have a multiplier effect on your site's
enjoyability or usefulness, so that's a third consideration. In my
experience, these factors are closely but not perfectly aligned.

Let's start with usability. First, I recommend hitting yourself on  
the head or drinking whiskey until Michal's sentence no longer makes  
any sense: "it feels intuitively simpler to me to tag something  
"London" and "trip", and then search for the union of those tags."

Michal was not born with that intuition. He learned it. And now he's  
so deep in the knowledge that it seems like intuition. (It seems  
intuitive to me to spout Italian numerals when ordering coffee, but  
really Starbucks has been training me for years.) For good or ill,  
most web users have no such intuition. 99.9% would never speak of the  
"union" of two tags, this being some sort of trickle-down from
set-theory talk. Far fewer would have that intuition in terms like "find
all pictures tagged both 'london' and 'trip.'" And of those, few
would have any idea how to do it. Yes, most search engines allow all
sorts of clever boolean logic (+london +trip -"pigeons shitting on me
in trafalgar square"). No, nobody uses that logic.

The same thing goes for any solution that requires users to write  
words or phrases in special ways. Underscores? Periods? Hyphens?  
Camel caps? Alpaca caps? Lower case? Exclusion of any character that  
can be typed? Users see no earthly reason why a tag shouldn't allow  
anything.

The union of "London" and "trip" may seem intuitive to you. But
London isn't Los Angeles, Santa Clara or Tierra del Fuego. Or take
"spring semester." "London" and "trip" make sense on their own;
"spring semester" does not. The union of "spring" and "semester"? How
about the union of "spring," "training," "red" and "sox"? Or shall we
look for the union of "springTraining" and "red_sox"?
Congratulations, you need an "about" page to tell people how to tag,
and your users are all programmers.

As my proof, I offer that in seven months, 30k customers and 3  
million tags, I've never received a complaint about LibraryThing's  
multi-word tags. But when LibraryThing turned "children's literature"  
into "childrens literature"? When the system showed tags in lower  
case? People went BANANAS. "My stuff is now tagged 'childrens  
literature' and 'london'? People will think I'm an idiot!"

From an input perspective then, you need to let people tag however
they want. True, you do need some way to mark breaks between tags.  
I'd go with commas. Commas ARE intuitive, or at least taught from  
grade school—milk, eggs, blueberry muffins, organic asian pears. And  
it's easy to put an example tag list underneath the input box. As a
fall-back, I'd give people a gentle warning if they enter a 50- 
character tag. Chances are their intuitions have been trained and  
they're using spaces as separators.
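
For what it's worth, the input side can be sketched in a few lines of
Python. This is only an illustration, not LibraryThing's code: the
function name parse_tags and the WARN_LENGTH constant are made up here,
and the threshold simply mirrors the 50-character figure above. Tags
are split on commas only and otherwise kept as typed (case,
apostrophes and inner spaces survive; only stray whitespace is
tidied).

```python
WARN_LENGTH = 50  # assumed threshold, mirroring the "50-character tag" remark


def parse_tags(raw):
    """Split a comma-separated tag string, keeping each tag as the user typed it."""
    tags = []
    warnings = []
    for piece in raw.split(","):
        # Trim the ends and collapse runs of whitespace; nothing else is altered.
        tag = " ".join(piece.split())
        if not tag or tag in tags:
            continue  # skip empty entries (e.g. a trailing comma) and exact repeats
        tags.append(tag)
        if len(tag) >= WARN_LENGTH:
            # Gentle warning only; the tag is still accepted.
            warnings.append(
                f"'{tag}' is very long. Did you mean to separate tags with commas?"
            )
    return tags, warnings
```

The warning never rejects anything. A user who really wants a
56-character tag gets to keep it.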

Once you've got the input side, you need to make the search side as  
easy as possible. If someone searches for "london," they should get  
both "london trip" and "london." They should get "London" and  
"LONDON" too. (On LibraryThing about 1% of users tag everything in  
capital letters. Their intuitions appear to have been formed by an  
Apple II.) On a global level, you should determine which is more  
common—London or london—and go with that.
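
A rough sketch of that search behavior, again in Python and again
hypothetical (search_tags, display_form and the usage-count dictionary
are names invented for this example). It matches the query against
whole words in each tag, ignoring case, and picks the most frequently
used spelling for display.

```python
def search_tags(query, all_tags):
    """Return tags containing the query words as a contiguous run, ignoring case."""
    q = query.casefold().split()
    if not q:
        return []
    matches = []
    for tag in all_tags:
        words = tag.casefold().split()
        # Slide a window of len(q) words across the tag and compare.
        if any(words[i:i + len(q)] == q for i in range(len(words) - len(q) + 1)):
            matches.append(tag)
    return matches


def display_form(tag, usage_counts):
    """Pick the most common spelling among the case variants of a tag."""
    variants = {t: n for t, n in usage_counts.items()
                if t.casefold() == tag.casefold()}
    return max(variants, key=variants.get) if variants else tag
```

Matching on whole words is a choice made for this sketch, so that
"london" finds "london trip" but not "londonderry." Substring or
stemmed matching would be equally defensible.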

Whether you force one-word tags or not, you will get semantically  
identical tags "spelled" differently—wwii, ww2, worldwartwo, world  
war II, etc. etc. LibraryThing violates the rules by allowing users  
to combine these tags on the global level (i.e., no individual user's
tags are changed). If your users are passionate like LibraryThing's, I
recommend this approach. In my experience, computing "similar tags"
nets only the most popular "synonyms." At 3 million tags, the system
can't recognize that wwii and world war 2 are the same. I suspect it  
wouldn't at 30 million either.
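
To make the "global level" idea concrete, here is a minimal sketch,
assuming an in-memory alias table. It is not how LibraryThing
implements combining; the TagCombiner class and its method names are
hypothetical. The point is only that combining lives in a separate
lookup, so each user's stored tags stay exactly as typed.

```python
class TagCombiner:
    """Global tag aliases. Individual users' stored tags are never rewritten."""

    def __init__(self):
        self._alias = {}  # casefolded alias -> canonical tag

    def resolve(self, tag):
        """Follow aliases until a tag with no further alias is reached."""
        seen = set()
        current, key = tag, tag.casefold()
        while key in self._alias and key not in seen:
            seen.add(key)
            current = self._alias[key]
            key = current.casefold()
        return current

    def combine(self, alias, canonical):
        """Record that `alias` means the same thing as `canonical`."""
        target = self.resolve(canonical)
        if target.casefold() != alias.casefold():  # refuse to create a loop
            self._alias[alias.casefold()] = target

    def items_for(self, tag, taggings):
        """Items whose tag resolves to the same canonical tag as `tag`.

        `taggings` is an iterable of (user, item, tag) triples, left untouched.
        """
        wanted = self.resolve(tag).casefold()
        return sorted({item for _user, item, t in taggings
                       if self.resolve(t).casefold() == wanted})
```

A global search for "wwii" can then also return items that one user
tagged "ww2" and another "world war II," while each user's own page
still shows the tag as entered.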

Now for the social and statistical side. It seems to me that the  
effects you might want are:

1. "Swarming." Users seeing how others tag and tagging that way
themselves.
2. "Spreading out." Users tagging however they like and creating a
sparser, more diverse web of meaning.

I can see arguments for both sides. "Swarming" looks good initially.
When your service is just starting out, swarming can make it look
used. In the long run, however, you should care more about the
quality of your data than its density. You can just DO more with  
complex tags than you can with simple ones. (This is also why  
LibraryThing doesn't suggest others' tags; although that is less
"usable," it also discourages herd thinking.) As a rule, users who use detailed
tags, like "trip to london" will often add "london" as a separate tag  
anyway.

Lastly, I think that everything above could change if your  
application needed it to. If something about it meant that tagged  
items lived for a short time and had few taggers, encouraging  
"swarming" might trump data complexity, etc.

Tim Spalding
LibraryThing.com