[Tagdb] tag-based semantic routing for vertical search

Nitin Borwankar nitin at borwankar.com
Mon Jan 22 00:40:56 GMT 2007


In late December 2006, Jimmy Wales of Wikipedia floated the idea of 
people-powered search funded by Wikia.
http://search.wikia.com/wiki/Search_Wikia
He said:

/"Search is part of the fundamental infrastructure of the Internet. And, 
it is currently broken./

/Why is it broken? It is broken for the same reason that proprietary 
software is always broken: lack of freedom, lack of community, lack of 
accountability, lack of transparency. *Here, we will change all that.*"/

Last week I was at Social Media Club, seated at the same table as Jimmy 
Wales, and the subject was what we would see next year in social media 
technology.  I scribbled on the scratch paper some ideas that had 
been floating around in my head last year, and wrote them up later. 

I am floating them here for critique, although the Nutch and Solr 
communities are also good places to seek comment.  I am starting here 
because tags are involved, and because this is a smaller group with no 
specific software platform as its focus, so this is not off topic here.  
Eventually this may, and I hope it will, end up on the search-l list 
hosted by Wikia and formed around Jimmy's idea.

The basic idea proposed by Jimmy involves a feedback loop where users 
rate the search results à la Digg.  This creates a framework for 
participation for search consumers.  But I submit that to truly create 
an architecture of participation in search there needs to be a framework 
for participation for search *providers*.  This means there needs to be 
a way for search engines to be plugged in to a global infrastructure 
that routes queries to the relevant participating search engines and 
collates results.  User feedback (thumbs up/down) would then cover the 
quality of search results from a given engine as well as the usual 
per-link thumbs up/down.  The big issue is how to bootstrap this behemoth.
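
To make the two levels of feedback concrete, here is a rough sketch of 
how a router might keep the tallies.  The class and method names are 
made up for illustration; nothing like this exists yet, and a real 
system would need identity and anti-gaming checks on top.

```python
from collections import defaultdict


class FeedbackLog:
    """Hypothetical store for thumbs up/down at two levels."""

    def __init__(self):
        # (engine name, result link) -> net votes for that link
        self.link_votes = defaultdict(int)
        # engine name -> net votes for that engine's results overall
        self.engine_votes = defaultdict(int)

    def vote_link(self, engine, link, up):
        """Record a thumbs up (up=True) or down for one result link."""
        self.link_votes[(engine, link)] += 1 if up else -1

    def vote_engine(self, engine, up):
        """Record a thumbs up or down for an engine as a whole."""
        self.engine_votes[engine] += 1 if up else -1


log = FeedbackLog()
log.vote_link("greener", "http://example.org/solar", up=True)
log.vote_engine("greener", up=False)
```

The two tallies are kept separate on purpose here: a user can like one 
link and still mark the engine down for that query.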

So here is a simple (simplistic?) idea.  Let's assume that each search 
engine describes itself via a collection of tags (say the top 1000 
keywords by term frequency in its index).  Note that we are assuming 
(for now) that these are vertical search engines, so while 1000 keywords 
may not be a very large number, it is a reasonable first guess for 
bootstrapping a *vertical* search engine.
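
As a minimal sketch of how an engine might derive such a tag list, 
assuming it can iterate over the text of its indexed documents: the 
tokenizer, the stop-word list and the function name top_keywords are 
all placeholders of mine, not part of any existing engine.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real engine would
# use a fuller one suited to its vertical.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for"}


def top_keywords(documents, n=1000):
    """Return up to n most frequent terms across documents, as tags."""
    counts = Counter()
    for text in documents:
        # Crude tokenizer: lowercase, split on anything non-alphanumeric.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return [term for term, _ in counts.most_common(n)]


docs = [
    "Solar panels and solar inverters for the home",
    "Wind turbines and solar power",
]
tags = top_keywords(docs, n=3)  # "solar" comes first: it occurs 3 times
```

Raw term frequency is what the proposal says; something like TF-IDF 
against a general corpus might pick more distinctive tags, but that is 
a refinement for later.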

So now assume that N vertical search engines have registered themselves 
with a central entity that we call, for placeholder reasons, a 
"semantic router", which routes keyword queries to the right search 
engine.  How does it do the routing? 

For early alpha purposes assume that there is simple keyword matching 
between the query and each search engine's 1000-keyword vector.  Where 
the intersection is non-empty, we route the query to that engine.
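
The matching rule above can be sketched in a few lines.  This assumes 
each engine's tags are held as a set of lowercase words and that the 
query is split on whitespace; the engine names and the function name 
route are invented for the example.

```python
def route(query, engines):
    """Return names of engines whose tags share a term with the query.

    engines maps an engine name to its set of lowercase tags.
    """
    terms = set(query.lower().split())
    # A non-empty set intersection is truthy, so this keeps exactly the
    # engines with at least one tag in common with the query.
    return [name for name, tags in engines.items() if terms & tags]


engines = {
    "greener": {"solar", "wind", "recycling"},
    "recipes": {"pasta", "baking", "vegan"},
}
matches = route("solar water heater", engines)
```

A query that matches no engine's tags simply gets an empty list back; 
what the router should do in that case is an open question.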

The OpenSearch format is the glue that holds all this together.  Each 
engine submits its keywords in the <Tags> element of its OpenSearch 
Description Document, and its query URL in the same document.  
Search results come back in OpenSearch RSS format and are collated at 
the "semantic router".
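
Here is a rough sketch of both halves, registration and collation, 
using only the Python standard library.  The description document and 
its URL are invented for the example, and the function names 
(register, build_query_url, collate) are mine.  Collation here is just 
a merge with duplicate links dropped; ranking is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

OS_NS = {"os": "http://a9.com/-/spec/opensearch/1.1/"}
RSS_TYPE = "application/rss+xml"

# Hypothetical description document for a made-up engine.
DESCRIPTION = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>GreenExample</ShortName>
  <Description>Hypothetical vertical engine for environmental topics</Description>
  <Tags>solar wind recycling compost</Tags>
  <Url type="application/rss+xml"
       template="http://search.example.org/?q={searchTerms}"/>
</OpenSearchDescription>"""


def register(description_xml):
    """Extract (tag set, RSS query template) from a description document."""
    root = ET.fromstring(description_xml)
    tag_text = root.findtext("os:Tags", default="", namespaces=OS_NS)
    tags = set(tag_text.lower().split())
    template = None
    for url in root.findall("os:Url", OS_NS):
        if url.get("type") == RSS_TYPE:
            template = url.get("template")
            break
    return tags, template


def build_query_url(template, query):
    """Fill the {searchTerms} placeholder with the URL-encoded query."""
    return template.replace("{searchTerms}", quote_plus(query))


def collate(feeds):
    """Merge RSS items from several engines, dropping duplicate links.

    feeds maps an engine name to an RSS 2.0 document string.
    """
    seen, merged = set(), []
    for engine, rss in feeds.items():
        for item in ET.fromstring(rss).iter("item"):
            link = item.findtext("link")
            if link and link not in seen:
                seen.add(link)
                merged.append({"engine": engine,
                               "title": item.findtext("title"),
                               "link": link})
    return merged


tags, template = register(DESCRIPTION)
url = build_query_url(template, "solar panels")
```

Only the {searchTerms} parameter is handled; other template parameters 
such as {startPage} would need the same treatment.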

Note that for now there is no RDF or any of that stuff usually 
associated with the word "semantic"; we are just doing crude tag matching. 

Aside from the usual worries (tag spam, gaming scores, ...), what 
fundamental structural issues do you see with this approach?

[Tag spam can be countered by having search providers be identified 
strongly via site certificates from a small set of known certificate 
providers.  The assumption is that there is a Wikipedia like trusted 
inner circle that collaboratively filters a lot of damage such as score 
gaming.]

What I am looking for is whether there is something fundamentally broken 
about the idea of having queries routed to a co-operating collection of 
search engines via tag matching.  The other issues should be considered 
out of scope for now.

Any takers for blowing holes in this idea?

-- 
Nitin Borwankar
Find, Learn, Act .... 
Greener, the search engine for the planet
http://greener.com
nitin at borwankar.com
510-872-7066


