[Tagdb] search keywords vs tags - automated tagging of docs
Gordon Mohr
gojomo at bitzi.com
Thu Dec 21 00:39:50 GMT 2006
Auto-extracting notable excerpts from a document as 'tags' recalls the
shortcuts taken back when full-text inverted-indexes were too expensive.
Of course, today's tech can handle such 'reduced' forms of documents (by
auto-extraction or manual tag/labelling) in full-text search engines
very easily.
Does this call into question whether any relational schema is needed for
tag systems at all?
Just treat every tagging-event (human or automated) as a fielded text
document and text-index it. The degenerate schema, placing all tags in one
internally-delimited column, doesn't deserve the bad rap it sometimes
gets, if in fact full-text inverted-indexes are the usual way to query.
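
As a rough sketch of that idea (not any particular engine's API), each
tagging-event below is a small fielded record whose tags live in one
whitespace-delimited column, and a toy inverted index maps (field, token)
pairs to event ids. The field names (doc, user, source, tags), the sample
events, and the helper functions build_index and search are all
hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Each tagging-event is one small "document" with named fields.
# Field names and sample values are illustrative assumptions.
events = [
    {"doc": "report.pdf", "user": "system", "source": "auto",
     "tags": "inverted index full-text"},
    {"doc": "report.pdf", "user": "nitin", "source": "human",
     "tags": "project-x read-later"},
    {"doc": "notes.txt", "user": "gordon", "source": "human",
     "tags": "full-text search"},
]


def build_index(events):
    """Map (field, token) -> set of event ids, splitting values on whitespace."""
    index = defaultdict(set)
    for event_id, event in enumerate(events):
        for field, value in event.items():
            for token in value.lower().split():
                index[(field, token)].add(event_id)
    return index


def search(index, **terms):
    """AND together one token per field, e.g. search(index, tags="full-text")."""
    postings = [index.get((field, token.lower()), set())
                for field, token in terms.items()]
    return set.intersection(*postings) if postings else set()


index = build_index(events)
auto_and_human = search(index, tags="full-text")
human_only = search(index, tags="full-text", source="human")
```

Filtering on the source field is one way to navigate the human and
automated tag spaces separately, while omitting it searches the combined
space. A real full-text engine would add tokenization, ranking and
persistence on top of this.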
- Gordon @ Bitzi
Nitin Borwankar wrote:
> OK, Otis,
>
> Glad you brought that up because I wanted to set up the discussion for
> what I call
> "intrinsic tags" vs "extrinsic tags"
>
> Intrinsic tags are like Amazon's SIPs - they are intrinsic
> characteristics of the content
> Intrinsic tags are always a) derived from text in the document b) devoid
> of interpretation or implied meaning
>
> Extrinsic tags i.e. folksonomy tags, are a human description or
> interpretation - so they have many layers of meaning
>
> * tags I apply to a document could be "workflow tags", i.e. "save this for
> later" or "send to Joe"
> * or they could be "descriptors that have global meaning" - "adult
> content"
> * or they could be "descriptors that have group meaning" - "project X
> needs this"
> * or they could be "descriptors that have private meaning" - "summer
> holiday"
>
>
> If you wanted to bootstrap a large corpus of text into a folksonomy
> context, automated tagging would get you in the game and at least allow
> rapid navigation of the whole document space albeit *in a very crude way*.
> But if you wait for the whole doc space to be manually tagged it could
> take a long time or never happen.
> So the question is: would this be a viable way to bootstrap large text
> corpora into a folksonomy context, i.e. make them usable enough that I
> can now find *roughly* what I am looking for and then apply my own tags
> to it.
>
> At all times it would be useful to distinguish between system-generated
> (intrinsic) tags and user-generated (extrinsic) tags and allow
> independent navigation over the separate tagging spaces as well as allow
> navigation over the combined space.
>
> Thoughts?
>