[Tagdb] search keywords vs tags - automated tagging of docs

Gordon Mohr gojomo at bitzi.com
Thu Dec 21 00:39:50 GMT 2006


Auto-extracting notable excerpts from a document as 'tags' recalls the 
shortcuts taken back when full-text inverted-indexes were too expensive.

Of course, today's full-text search engines can very easily handle such 
'reduced' forms of documents, whether produced by auto-extraction or by 
manual tagging/labelling.

Does this call into question whether any relational schema is needed for 
tag systems at all?

Just treat every tagging-event (human or automated) as a fielded text 
document and text-index it. The degenerate schema, placing all tags in 
one internally-delimited column, doesn't deserve the bad rap it 
sometimes gets, if in fact full-text inverted-indexes are the usual way 
to query.
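
To make that concrete, here is a rough sketch of the idea in Python, not 
a real search engine. The field names (doc, user, source, tags), the 
sample events, and the helpers build_index/search are all made up for 
illustration. Each tagging-event is a small fielded document, its tags 
sit in one space-delimited field, and a toy inverted index answers 
"field:term" queries.

```python
from collections import defaultdict

# Hypothetical tagging events. The field names and values are
# illustrative assumptions, not a schema proposed in this thread.
EVENTS = [
    {"doc": "doc1", "user": "alice", "source": "human",
     "tags": "python search todo"},
    {"doc": "doc1", "user": "extractor", "source": "auto",
     "tags": "inverted index search"},
    {"doc": "doc2", "user": "bob", "source": "human",
     "tags": "holiday photos"},
]


def build_index(events):
    """Map each 'field:term' key to the set of event positions holding it."""
    index = defaultdict(set)
    for pos, event in enumerate(events):
        for field, value in event.items():
            # All tags live in one delimited field; splitting on
            # whitespace is the only "schema" we need.
            for term in value.lower().split():
                index[f"{field}:{term}"].add(pos)
    return index


def search(index, events, *terms):
    """Return sorted doc ids of events that match ALL 'field:term' terms."""
    if not terms:
        return []
    postings = [index.get(term.lower(), set()) for term in terms]
    hits = set.intersection(*postings)
    return sorted({events[pos]["doc"] for pos in hits})


INDEX = build_index(EVENTS)

# Any tagging event mentioning "search", human or automated:
print(search(INDEX, EVENTS, "tags:search"))
# Only automated tagging events mentioning "search":
print(search(INDEX, EVENTS, "tags:search", "source:auto"))
```

Because the event's origin is just another indexed field, restricting a 
query to human or automated tags is one extra term rather than a join.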

- Gordon @ Bitzi

Nitin Borwankar wrote:
> OK, Otis,
> 
> Glad you brought that up because I wanted to set up the discussion for 
> what I call
> "intrinsic tags" vs "extrinsic tags"
> 
> Intrinsic tags are like Amazon's SIPs - they are intrinsic 
> characteristics of the content.
> Intrinsic tags are always a) derived from text in the document and 
> b) devoid of interpretation or implied meaning.
> 
> Extrinsic tags i.e. folksonomy tags, are a human description or 
> interpretation - so they have many layers of meaning
> 
> * tags I apply to a document could be "workflow tags", i.e. "save this 
> for later" or "send to Joe"
> * or they could be "descriptors that have global meaning"  - "adult 
> content"
> * or they could be "descriptors that have group meaning" - "project X 
> needs this"
> * or they could be "descriptors that have private meaning" - "summer 
> holiday"
> 
> 
> If you wanted to bootstrap a large corpus of text into a folksonomy 
> context, automated tagging would get you in the game and at least allow 
> rapid navigation of the whole document space albeit *in a very crude way*.
> But if you wait for the whole doc space to be manually tagged, it could 
> take a long time or never happen.
> So the question is would this be a viable way to bootstrap large text 
> corpuses into a folksonomy context, i.e. make them usable enough that I 
> can now find *roughly* what I am looking for and then apply my own tags 
> to it.
> 
> At all times it would be useful to distinguish between system-generated 
> (intrinsic) tags and user-generated (extrinsic) tags and allow 
> independent navigation over the separate tagging spaces as well as 
> navigation over the combined space.
> 
> Thoughts?
> 

