[Tagdb] search keywords vs tags - automated tagging of docs
Nitin Borwankar
nitin at borwankar.com
Wed Dec 20 18:10:02 GMT 2006
Increasingly I have been getting interested in the vertical search space
and have been looking at Nutch (www.nutch.org), which is built on top of
Lucene, the Java text indexing/searching library.
A question arises in my mind when I look at tokenization, inverted
indexes, etc., which are the bread and butter of IR and text search:
What is the fundamental difference between a set of search keywords as
typed into a search bar and a set of tags by which I search for something
on del.icio.us?
It seems to me that if one were to throw out the obvious stop words
etc., then the set of keywords (tokens) that, say, Lucene generates for
a document is a good first-order set of (system-generated) tags for the
document.
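To make the idea concrete, here is a rough sketch of what I mean. It does
not use Lucene's own analyzers; a regex tokenizer and a tiny hand-picked
stop word list stand in for them, and the function name candidate_tags,
the max_tags parameter, and the frequency-based ranking are all my own
assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop word list; a real analyzer
# would ship a much longer one.
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "in",
    "is", "it", "of", "on", "or", "that", "the", "to", "with",
})


def candidate_tags(text, max_tags=5):
    """Return up to max_tags candidate tags for a document.

    Tokenize on runs of letters/digits, lowercase, drop stop words,
    then rank the remaining tokens by how often they occur.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    counts = Counter(kept)
    # Most frequent first; ties broken alphabetically so the result
    # is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_tags]]


if __name__ == "__main__":
    doc = ("Lucene is a Java library for text indexing and text search. "
           "Nutch is built on Lucene.")
    print(candidate_tags(doc, max_tags=2))
```

Ranking by raw frequency is only the simplest possible choice here; it is
exactly the step where system-generated tags and human-chosen tags may
start to diverge.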
Any comments or arguments one way or the other?
This has major implications for automated tagging, so I am really
curious as to why this won't work.
Nitin
--
Nitin Borwankar
Find, Learn, Act .... Greener
http://greener.com
nitin at borwankar.com
510-872-7066