[Tagdb] search keywords vs tags - automated tagging of docs
Michal Migurski
mike at teczno.com
Thu Dec 21 00:58:42 GMT 2006
(sending this again, because I think I had the wrong "From:" address
the first time)
An example from a recent client:
"The first patent for a bicycle probably didn't have the word
'bicycle' in it."
Intentionality and hindsight are missing from the inverted index.
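To make that concrete, here is a minimal Python sketch of the proposal quoted below (tokens minus stop words, used as tags). It does not use Lucene's actual analyzers. The stop-word list, the auto_tags helper, and the sample description are all made up for illustration.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real analyzers ship longer ones.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def auto_tags(text):
    """Return the set of lowercase word tokens in text, minus stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}


# Made-up description, worded the way an early patent might describe the machine.
doc = "A velocipede with two wheels in line, propelled by the feet of the rider"
tags = auto_tags(doc)

print(sorted(tags))
print("bicycle" in tags)  # False: the word never occurs in the text
```

Tags generated this way can only be words that already occur in the document. A person tagging the same text years later would likely add "bicycle", which no tokenizer can recover from the text alone.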
-mike.
On Dec 20, 2006, at 10:10 AM, Nitin Borwankar wrote:
> Increasingly I have been getting interested in the vertical search space
> and have been looking at Nutch (www.nutch.org), built on top of Lucene,
> the Java text indexing/searching library.
>
> A question arises in my mind when I look at tokenization and inverted
> indexes etc., which are the bread and butter of IR and text search.
>
> What is the fundamental difference between a set of search keywords as
> typed into a search bar vs. a set of tags by which I search for something
> on del.icio.us?
> It seems to me that if one were to throw out the obvious stop words
> etc., then the set of keywords (tokens) that, say, Lucene generates for
> a document is a good first-order set of (system-generated) tags for the
> document.
>
> Any comments or arguments one way or another?
> This has major implications for automated tagging, so I am really
> curious as to why this won't work.
>
> Nitin
>
>
> --
> Nitin Borwankar
> Find, Learn, Act .... Greener
> http://greener.com
> nitin at borwankar.com
> 510-872-7066
----------------------------------------------------------------
michal migurski- contact info and pgp key:
sf/ca http://mike.teczno.com/contact.html