[Tagdb] tag-based semantic routing for vertical search
Nitin Borwankar
nitin at borwankar.com
Mon Jan 22 00:40:56 GMT 2007
In late December 2006, Jimmy Wales of Wikipedia floated the idea of
people-powered search funded by Wikia.
http://search.wikia.com/wiki/Search_Wikia
He said:
/"Search is part of the fundamental infrastructure of the Internet. And,
it is currently broken./
/Why is it broken? It is broken for the same reason that proprietary
software is always broken: lack of freedom, lack of community, lack of
accountability, lack of transparency. *Here, we will change all that.*/
Last week I was at Social Media Club, seated at the same table as Jimmy
Wales, and the subject was what we would see next year in social media
technology. I scribbled on the scratch paper some ideas that had
been floating around in my head last year and then wrote them up later.
I am floating them here for critique, although the Nutch and Solr
communities are also good places to seek comment. I am starting here
because tags are involved and this is a smaller group without any
specific software platform as its focus, so this is not off topic here.
Eventually this may, and I hope it will, end up on the search-l list
hosted by Wikia and formed around Jimmy's idea.
The basic idea proposed by Jimmy involves a feedback loop where users
rate the search results à la Digg. This creates a framework for
participation for search consumers. But I submit that to truly create
an architecture of participation in search there also needs to be a
framework for participation for search *providers*. This means there
needs to be a way for search engines to be plugged in to a global
infrastructure that routes queries to the relevant participating search
engines and collates the results. User feedback (thumbs up/down) would
then apply at two levels: to the quality of search results from a given
engine as a whole, and to individual links. The big issue is how to
bootstrap this behemoth.
So here is a simple (simplistic?) idea. Let's assume that each search
engine describes itself via a collection of tags (say, the top 1000
keywords by term frequency in its index). Note that we are assuming
(for now) that these are vertical search engines, so while 1000 keywords
may not be a very large number, it is a reasonable first guess for
bootstrapping a *vertical* search engine.
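As a minimal sketch of how an engine might derive such a tag list,
assuming it can iterate over the text of its indexed documents: the
function name describe_engine, the tokenizer and the tiny stopword list
below are all hypothetical, and a real engine would read term
frequencies straight from its index.

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on"}

def describe_engine(documents, top_n=1000):
    """Return the top_n most frequent non-stopword terms across documents."""
    counts = Counter()
    for text in documents:
        # Crude tokenizer: lowercase runs of letters and digits.
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    # most_common orders by descending count.
    return [term for term, _ in counts.most_common(top_n)]
```

For a vertical engine, the hope is that the resulting list is dominated
by domain vocabulary rather than general-purpose words.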
So now assume that N vertical search engines have registered themselves
with a central entity that we call, as a placeholder, a "semantic
router", which routes keyword queries to the right search engines. How
does it do the routing?
For early alpha purposes, assume simple keyword matching between the
query and each search engine's 1000-keyword vector. Wherever the
intersection is non-empty, we route the query to that engine.
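The matching rule can be sketched in a few lines, assuming the router
holds each registered engine's tags as a set. The function name route
and the engine names in the usage note are hypothetical, and ordering
engines by overlap size is my own addition for illustration; the rule
as stated only requires a non-empty intersection.

```python
def route(query, engines):
    """Return names of engines whose tags share at least one term with the query.

    engines maps an engine name to an iterable of its tags.
    """
    terms = set(query.lower().split())
    hits = []
    for name, tags in engines.items():
        overlap = terms & set(tags)
        if overlap:  # non-empty intersection: this engine gets the query
            hits.append((name, len(overlap)))
    # Assumption: larger overlap first, ties broken by engine name.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return [name for name, _ in hits]
```

With engines "energy" tagged {solar, power} and "greener" tagged
{solar, wind, recycling}, the query "solar power" goes to both, while a
query with no matching term goes nowhere.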
The OpenSearch format is the glue that holds all this together. Keywords
are submitted in the <Tags> element of the OpenSearch Description
Document, and the query URL template is submitted in the <Url> element
of the same document.
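To make the registration step concrete, here is a rough sketch of how
the router might read an engine's description document, assuming the
OpenSearch 1.1 namespace, space-delimited Tags, and a Url element of
type application/rss+xml. The example document, the search.example.org
host, and the helper names parse_description and build_query_url are
made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

NS = {"os": "http://a9.com/-/spec/opensearch/1.1/"}

# Hypothetical description document for an imaginary vertical engine.
EXAMPLE = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example Green Search</ShortName>
  <Description>Imaginary vertical engine for environmental topics.</Description>
  <Tags>solar wind recycling compost</Tags>
  <Url type="application/rss+xml"
       template="http://search.example.org/?q={searchTerms}&amp;format=rss"/>
</OpenSearchDescription>"""

def parse_description(xml_text):
    """Extract the engine name, tag set and RSS query template."""
    root = ET.fromstring(xml_text)
    tags = root.findtext("os:Tags", default="", namespaces=NS).split()
    template = None
    for url in root.findall("os:Url", NS):
        if url.get("type") == "application/rss+xml":
            template = url.get("template")
            break
    name = root.findtext("os:ShortName", default="", namespaces=NS)
    return {"name": name, "tags": set(tags), "template": template}

def build_query_url(template, query):
    """Substitute the URL-encoded query for the {searchTerms} placeholder."""
    return template.replace("{searchTerms}", quote_plus(query))
```

Only the {searchTerms} placeholder is handled here; other template
parameters would need their own substitution.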
Search results in OpenSearch RSS format are collated at the "semantic
router".
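One possible collation strategy, sketched under the assumption that
each engine answers with a plain RSS 2.0 channel of items: interleave
the feeds round-robin and drop links already seen. Nothing above
prescribes this particular merge; the names parse_items and collate are
hypothetical, and ranking by user feedback would replace the
round-robin order later.

```python
import xml.etree.ElementTree as ET
from itertools import zip_longest

def parse_items(rss_text, engine):
    """Return the title and link of each RSS item, labeled with its engine."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.findall("./channel/item"):
        items.append({
            "engine": engine,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items

def collate(feeds):
    """Merge per-engine results round-robin, keeping the first copy of each link.

    feeds maps an engine name to the RSS text it returned.
    """
    per_engine = [parse_items(text, name) for name, text in feeds.items()]
    seen, merged = set(), []
    # zip_longest pads shorter result lists with None.
    for row in zip_longest(*per_engine):
        for item in row:
            if item is None or item["link"] in seen:
                continue
            seen.add(item["link"])
            merged.append(item)
    return merged
```

Keeping the engine name on every merged item is what would let
feedback be recorded per engine as well as per link.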
Note that for now there is no RDF or any of that stuff usually
associated with the word "semantic"; we are just doing crude tag matching.
Aside from the usual worries (tag spam, gaming scores, ...), what
fundamental structural issues do you see with this approach?
[Tag spam can be countered by having search providers be identified
strongly via site certificates from a small set of known certificate
providers. The assumption is that there is a Wikipedia-like trusted
inner circle that collaboratively filters out a lot of damage such as
score gaming.]
What I am looking for is whether there is something fundamentally broken
about the idea of having queries routed to a co-operating collection of
search engines via tag matching. Other issues are out of scope for now.
Any takers for blowing holes in this idea?
--
Nitin Borwankar
Find, Learn, Act ....
Greener, the search engine for the planet
http://greener.com
nitin at borwankar.com
510-872-7066