Anshum wrote:
Hey Andrzej,
Could you tell me what research suggests this, and why it is this way?
By my calculation, the average load on each server would go down, since I
would know which server to query for a given index term instead of querying
all servers for every term.
I'm looking for a solution wherein I could break up the index based on any
criterion and know which index to query for any input (and not query indexes
that would return zero results).
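A minimal Python sketch of the term-based routing described in the question. It assumes each term is assigned to exactly one server by a stable hash; the names `server_for_term`, `NUM_SERVERS` and the crc32-based assignment are hypothetical, not Lucene API. It only counts which servers a query must contact. It does not model the cost of combining posting lists for multi-term queries, nor how evenly work ends up spread across servers.

```python
import zlib

NUM_SERVERS = 4  # hypothetical cluster size


def server_for_term(term: str, num_servers: int = NUM_SERVERS) -> int:
    # Hypothetical assignment: a stable hash of the term picks the one server
    # that holds the term's complete posting list. zlib.crc32 is used rather
    # than hash() because str hashing is randomized between Python runs.
    return zlib.crc32(term.encode("utf-8")) % num_servers


def servers_term_partitioned(query_terms, num_servers: int = NUM_SERVERS) -> set:
    # Term partitioning: only the owners of the query's terms are contacted.
    return {server_for_term(t, num_servers) for t in query_terms}


def servers_document_partitioned(query_terms, num_servers: int = NUM_SERVERS) -> set:
    # Document partitioning: each server indexes a subset of the documents,
    # so any of them may hold matches and all must be queried
    # (query_terms does not affect the answer).
    return set(range(num_servers))


query = ["distributed", "retrieval"]
print(sorted(servers_term_partitioned(query)))      # one or two server ids
print(sorted(servers_document_partitioned(query)))  # [0, 1, 2, 3]
```

Contacting fewer servers per query does not by itself guarantee higher throughput; that depends on whether the term-to-server assignment leaves some servers much busier than others.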
* Ricardo Baeza-Yates, Carlos Castillo, Flavio Junqueira, Vassilis
Plachouras, Fabrizio Silvestri, 2007: Challenges on Distributed Web
Retrieval: "The disadvantage of term partitioning is having to build
initially the entire global index. This does not scale well, and it is
not useful in actual large scale Web search engines. There are, however,
some advantages of this approach in the query processing phase. Webber
et al. show that term partitioning results in lower utilization of
resources [49]. More specifically, it significantly reduces the number of
disk accesses and the volume of data exchanged. Document partitioning
however is still better in terms of throughput, because of an uneven
distribution of work load in term partitioning."
* Claudine Badue, Ricardo Baeza-Yates, 2001: Distributed Query
Processing Using Partitioned Inverted Files (note that their conclusion
that global partitioning is more efficient than local partitioning rests
on the crucial assumption that the load can be distributed efficiently.
Other papers indicate that this is a very complex issue).
* Claudine Badue, Ramurti Barbosa, Paulo Golgher: Distributed Processing
of Conjunctive Queries. This paper evaluates the bottlenecks in an
engine with local index partitioning.
* Justin Zobel, Alistair Moffat, 2006: Inverted Files for Text Search
Engines
* Claudio Lucchese, Salvatore Orlando, Raffaele Perego, Fabrizio
Silvestri, 2006: Mining Query Logs to Optimize Index Partitioning in
Parallel Web Search Engines
* Ronny Lempel, Shlomo Moran, 2002: Optimizing Result Prefetching in Web
Search Engines with Segmented Indices
... and quite a few other papers that I don't remember now. Please search
for "distributed IR" on ACM or Citeseer.
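The throughput argument in the Baeza-Yates et al. excerpt can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a reproduction of any cited experiment: work is counted as posting entries scanned, the query log, posting-list lengths and term-to-server assignment are all hypothetical values, and network and merge costs are ignored. Total work is the same in both layouts; what differs is how it is spread, and throughput is limited by the busiest server.

```python
from collections import Counter

NUM_SERVERS = 4

# Hypothetical posting-list lengths (number of documents containing each term).
POSTINGS = {"lucene": 1000, "index": 400, "solr": 200, "nutch": 200}

# Hypothetical skewed query log: (query terms, number of times issued).
QUERY_LOG = [
    (("lucene",), 80),
    (("lucene", "index"), 10),
    (("solr",), 5),
    (("nutch",), 5),
]

# Hypothetical term -> server assignment for the term-partitioned index.
TERM_OWNER = {"lucene": 0, "index": 1, "solr": 2, "nutch": 3}


def term_partitioned_load(log):
    # The single owner of a term scans that term's entire posting list.
    load = Counter()
    for terms, freq in log:
        for t in terms:
            load[TERM_OWNER[t]] += freq * POSTINGS[t]
    return [load[s] for s in range(NUM_SERVERS)]


def document_partitioned_load(log):
    # Assume each server holds 1/NUM_SERVERS of every posting list, so every
    # server scans its slice of every query term.
    load = [0.0] * NUM_SERVERS
    for terms, freq in log:
        for t in terms:
            for s in range(NUM_SERVERS):
                load[s] += freq * POSTINGS[t] / NUM_SERVERS
    return load


def imbalance(load):
    # Busiest server relative to the average; 1.0 means perfectly even.
    return max(load) / (sum(load) / len(load))


tp = term_partitioned_load(QUERY_LOG)
dp = document_partitioned_load(QUERY_LOG)
print(tp, imbalance(tp))  # [90000, 4000, 1000, 1000] 3.75
print(dp, imbalance(dp))  # [24000.0, 24000.0, 24000.0, 24000.0] 1.0
```

In this toy setup the server owning the popular term does 3.75 times the average work, so in the term-partitioned layout it saturates first while the other three sit mostly idle.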
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com