[Tagdb] Single and multi-word tags / swarming and spreading
Timothy Spalding
tspalding at maine.rr.com
Fri Apr 7 11:02:59 GMT 2006
So here's my two cents:
First, this is a usability and social question, not a technical one.
You can do it either way as fast as you need, so start at the right
end of the question.
By "social" and "usability" I mean that you need to consider both
what is most obvious and what will produce the social effects you
want. The social effects (I have in mind Michal's "swarming" and other
effects) directly affect the quantity, quality and diversity of the
tag data, which ought to have a multiplier effect on your site's
enjoyability or usefulness, so that's a third consideration. In my
experience, these factors are closely but not perfectly aligned.
Let's start with usability. First, I recommend hitting yourself on
the head or drinking whiskey until Michal's sentence no longer makes
any sense: "it feels intuitively simpler to me to tag something
"London" and "trip", and then search for the union of those tags."
Michal was not born with that intuition. He learned it. And now he's
so deep in the knowledge that it seems like intuition. (It seems
intuitive to me to spout Italian numerals when ordering coffee, but
really Starbucks has been training me for years.) For good or ill,
most web users have no such intuition. 99.9% would never speak of the
"union" of two tags, this being some sort of trickle-down from set-
theory talk. Far fewer would have that intuition in terms like "find
all pictures tagged both 'london' and 'trip.'" And of those, few
would have any idea how to do it. Yes, most search engines allow all
sorts of clever boolean logic (+london +trip -"pigeons shitting on me
in trafalgar square.") No, nobody uses that logic.
The same thing goes for any solution that requires users to write
words or phrases in special ways. Underscores? Periods? Hyphens?
Camel caps? Alpaca caps? Lower case? Exclusion of any character that
can be typed? Users see no earthly reason why a tag shouldn't allow
anything.
The union of "London" and "trip" may seem intuitive to you. But
London isn't Los Angeles, Santa Clara or Tierra del Fuego. Or take
"spring semester." "London" and "trip" make sense on their own;
"Spring semester" does not. The union of "spring" and "semester"? How
about the union of "spring," "training," "red" and "sox"? Or shall we
look for the union of "springTraining" and "red_sox"?
Congratulations, you need an "about" page to tell people how to tag,
and your users are all programmers.
As my proof, I offer that in seven months, 30k customers and 3
million tags, I've never received a complaint about LibraryThing's
multi-word tags. But when LibraryThing turned "children's literature"
into "childrens literature"? When the system showed tags in lower
case? People went BANANAS. "My stuff is now tagged 'childrens
literature' and 'london'? People will think I'm an idiot!"
From an input perspective then, you need to let people tag however
they want. True, you do need some way to mark breaks between tags.
I'd go with commas. Commas ARE intuitive, or at least taught from
grade school—milk, eggs, blueberry muffins, organic asian pears. And
it's easy to put an example tag list underneath the input box. As a
fall-back, I'd give people a gentle warning if they enter a 50-
character tag. Chances are their intuitions have been trained and
they're using spaces as separators.
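
A minimal sketch of the input side described above, in Python. It is an
illustration only, not LibraryThing's code: the function names and the
LONG_TAG_THRESHOLD constant are hypothetical, and the cutoff of 50 is
taken from the "50-character tag" remark. Tags are split on commas and
otherwise kept exactly as typed, including case, spaces and apostrophes.

```python
LONG_TAG_THRESHOLD = 50  # assumed cutoff, from the "50-character tag" remark


def parse_tags(raw):
    """Split a comma-separated string into tags, keeping each tag as typed."""
    tags = []
    seen = set()
    for piece in raw.split(","):
        tag = piece.strip()  # drop only the whitespace around the commas
        if tag and tag not in seen:  # skip empty pieces and exact repeats
            seen.add(tag)
            tags.append(tag)
    return tags


def long_tag_warnings(tags, threshold=LONG_TAG_THRESHOLD):
    """Return tags long enough to suggest spaces were used as separators."""
    return [tag for tag in tags if len(tag) >= threshold]


shopping = parse_tags("milk, eggs, blueberry muffins, organic asian pears")
suspicious = long_tag_warnings(
    parse_tags("london trip spring semester red sox training pigeons trafalgar")
)
```

Here shopping holds four tags, two of them multi-word, and suspicious
holds the one long tag that deserves a gentle warning rather than a
rejection.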
Once you've got the input side, you need to make the search side as
easy as possible. If someone searches for "london," they should get
both "london trip" and "london." They should get "London" and
"LONDON" too. (On LibraryThing about 1% of users tag everything in
capital letters. Their intuitions appear to have been formed by an
Apple II.) On a global level, you should determine which is more
common—London or london—and go with that.
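
A rough sketch of that search behavior, in Python. The names are
hypothetical, and whole-word matching is an assumption: the post only
says that a search for "london" should return "london trip," "London"
and "LONDON." The second function picks the most common spelling of
each tag for global display.

```python
from collections import Counter


def search_tags(query, tags):
    """Return tags that contain every query word, ignoring case."""
    wanted = query.casefold().split()
    return [
        tag for tag in tags
        if all(word in tag.casefold().split() for word in wanted)
    ]


def preferred_spellings(tag_uses):
    """Map each case-folded tag to its most frequently typed spelling."""
    by_key = {}
    for tag in tag_uses:
        by_key.setdefault(tag.casefold(), Counter())[tag] += 1
    return {key: c.most_common(1)[0][0] for key, c in by_key.items()}


hits = search_tags("london", ["london trip", "London", "LONDON", "paris"])
display = preferred_spellings(["London", "london", "London", "LONDON"])
```

With this data, hits contains the three London tags and display maps
"london" to "London," the spelling used most often.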
Whether you force one-word tags or not, you will get semantically
identical tags "spelled" differently—wwii, ww2, worldwartwo, world
war II, etc. etc. LibraryThing violates the rules by allowing users
to combine these tags on the global level (i.e., no individual users'
tags are changed). If your users are passionate like LibraryThing's I
recommend this approach. In my experience, computing "similar tags"
nets only the most popular "synonyms." At 3 million tags, the system
can't recognize that wwii and world war 2 are the same. I suspect it
wouldn't at 30 million either.
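
One way to picture global combining, sketched in Python. This is not
LibraryThing's implementation; the data layout and function names are
assumptions made for illustration. User-supplied groups map variant
spellings to one canonical tag, the mapping is applied only when global
counts are computed, and each user's own tags are left untouched.

```python
from collections import Counter


def build_alias_map(groups):
    """Map each variant spelling (case-folded) to its group's canonical tag."""
    aliases = {}
    for canonical, variants in groups.items():
        for variant in (canonical, *variants):
            aliases[variant.casefold()] = canonical
    return aliases


def global_tag_counts(user_tags, aliases):
    """Count tag uses site-wide, folding combined variants together."""
    counts = Counter()
    for tags in user_tags.values():
        for tag in tags:
            counts[aliases.get(tag.casefold(), tag)] += 1
    return counts


# Hypothetical data: one combined group and three users.
groups = {"WWII": ["ww2", "worldwartwo", "world war II"]}
user_tags = {
    "alice": ["ww2", "history"],
    "bob": ["world war II"],
    "carol": ["wwii"],
}
counts = global_tag_counts(user_tags, build_alias_map(groups))
```

The global count for "WWII" is 3, while alice still sees her own tag
as "ww2."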
Now for the social and statistical side. It seems to me that the
effects you might want are:
1. "Swarming." Users seeing how others tag and tagging that way
themselves.
2. "Spreading out." Users tagging however they like and creating a
sparser, more diverse web of meaning.
I can see arguments for both sides. "Swarming" looks good initially.
When your service is just starting out, swarming can make it look
used. In the long run, however, you should care more about the
quality of your data than its density. You can just DO more with
complex tags than you can with simple ones. (This is also why
LibraryThing doesn't suggest others' tags; although less "usable" it
also discourages herd thinking.) As a rule, users who use detailed
tags, like "trip to london," will often add "london" as a separate tag
anyway.
Lastly, I think that everything above could change if your
application needed it to. If something about it meant that tagged
items lived for a short time and had few taggers, encouraging
"swarming" might trump data complexity, etc.
Tim Spalding
LibraryThing.com