[Tagdb] search keywords vs tags - automated tagging of docs

Michal Migurski mike at teczno.com
Thu Dec 21 00:58:42 GMT 2006


(sending this again, because I think I had the wrong "From:" address  
the first time)

An example from a recent client:
	"The first patent for a bicycle probably didn't have the word 'bicycle' in it."

Intentionality and hindsight are missing from the inverted index.
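
A minimal sketch of that point, applied to the proposal quoted below (drop the stop words and treat the remaining tokens as tags). The stop-word list, the auto_tags helper, and the sample sentence are all made up for illustration; this is not how Lucene or Nutch tokenizes. Tags derived this way can only ever be words the author happened to use:

```python
import re

# Hypothetical stop-word list for illustration; real indexers ship their own.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "the", "to", "with"}

def auto_tags(text):
    """Return the set of lowercased tokens in text that are not stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}

# Invented sentence standing in for an early patent; not a real quotation.
patent = "An improved velocipede propelled by the feet of the rider with two wheels"

tags = auto_tags(patent)
print(sorted(tags))
print("bicycle" in tags)   # the word a later searcher would actually type
```

A person tagging the same document today could add "bicycle" with hindsight; the token set has no way to.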

-mike.

On Dec 20, 2006, at 10:10 AM, Nitin Borwankar wrote:

> Increasingly I have been getting interested in the vertical search space
> and have been looking at Nutch (www.nutch.org), built on top of Lucene,
> the Java text indexing/searching library.
>
> A question arises in my mind when I look at tokenization, inverted
> indexes, etc., which are the bread and butter of IR and text search.
>
> What is the fundamental difference between a set of search keywords as
> typed into a search bar vs. a set of tags by which I search for something
> on del.icio.us?
> It seems to me that if one were to throw out the obvious stop words
> etc., then the set of keywords (tokens) that, say, Lucene generates for
> a document is a good first-order set of (system-generated) tags for the
> document.
>
> Any comments or arguments one way or another?
> This has major implications for automated tagging, so I am really
> curious as to why this won't work.
>
> Nitin
>
>
> -- 
> Nitin Borwankar
> Find, Learn, Act .... Greener
> http://greener.com
> nitin at borwankar.com
> 510-872-7066
>
> _______________________________________________
> Tagdb mailing list
> Tagdb at lists.tagschema.com
> http://lists.tagschema.com/mailman/listinfo/tagdb
>


----------------------------------------------------------------
michal migurski- contact info and pgp key:
sf/ca            http://mike.teczno.com/contact.html

