[Tagdb] search keywords vs tags - automated tagging of docs

Nitin Borwankar nitin at borwankar.com
Wed Dec 20 18:10:02 GMT 2006


I have been getting increasingly interested in the vertical search space 
and have been looking at Nutch (www.nutch.org), which is built on top of 
Lucene, the Java text indexing/searching library.

A question comes to mind when I look at tokenization, inverted indexes, 
and so on, which are the bread and butter of IR and text search:

What is the fundamental difference between a set of search keywords as 
typed into a search bar and a set of tags by which I search for something 
on del.icio.us?
It seems to me that if one were to throw out the obvious stop words 
etc., then the set of keywords (tokens) that, say, Lucene generates for 
a document is a good first-order set of (system-generated) tags for the 
document.
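
To make the idea concrete, here is a minimal sketch in Python rather than 
in Lucene itself. It is not Lucene's analyzer: the tokenizer is a plain 
regular expression, the stop word list is a tiny hypothetical one, and 
ranking tokens by raw frequency is my own assumption about what "first 
order" could mean. The function name candidate_tags and the max_tags 
parameter are likewise made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list; a real one would be longer.
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "for", "in", "is", "it",
    "of", "on", "that", "the", "to", "with",
})


def candidate_tags(text, max_tags=5):
    """Return up to max_tags of the most frequent non-stop-word tokens."""
    # Crude tokenization: lowercase, then keep runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Throw out the obvious stop words and count what is left.
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    # Most frequent first; ties broken alphabetically so output is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_tags]]


doc = ("Lucene is a text indexing library. Nutch is built on Lucene "
       "and adds crawling to the indexing.")
print(candidate_tags(doc, max_tags=3))
```

Even on this toy input the weakness shows: once the repeated terms are 
used up, the remaining slots are filled by whichever single-occurrence 
words sort first, which is not how a person would choose tags.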

Any comments or arguments one way or the other?
This has major implications for automated tagging, so I am really 
curious as to why this won't work.

Nitin

 
-- 
Nitin Borwankar
Find, Learn, Act .... Greener
http://greener.com
nitin at borwankar.com
510-872-7066
