specify a minimum number of the optional BooleanClauses which must be
satisfied.
I haven't seen the implementation of setMinimumNumberShouldMatch but
it seems a bit different than what is intended with the WAND operator,
which can take any real number as threshold θ
As stated in the paper:
WAND(X1,w1, . . . Xk,wk, θ) is true iff X 1≤i≤k and SUM(xiwi)≥ θ
where xi is the indicator variable for Xi, that is xi = 1, if Xi is
true 0, otherwise.
Observe that WAND can be used to implement AND
and OR via
AND(X1,X2, . . .Xk) ≡ WAND(X1, 1,X2, 1, . . . Xk, 1, k),
and
OR(X1,X2, . ..Xk) ≡ WAND(X1, 1,X2, 1, . ..Xk, 1, 1).
What I find interesting is the idea of using a first pass using the
upper bound (maximal) contribution of a term on any document score and
the dynamic setting of the threshold θ to skip or to fully evaluate a
document..
As stated in the paper:
"Given this setup our preliminary scoring consists of evaluating
for each document d
WAND(X1,UB1,X2,UB2, . . .,Xk,UBk, θ),
where Xi is an indicator variable for the presence of query term i in
document d and the threshold θ is varied during
the algorithm as explained below. If WAND evaluates to true, then the
document d undergoes a full evaluation.
The threshold θ is set dynamically by the algorithm based on the
minimum score m among the top n results found so
far, where n is the number of requested documents. The larger the
threshold, the more documents will be skipped
and thus we will need to compute full scores for fewer documents."
I think its worth a try...
-- Joaquin
On Mon, Nov 16, 2009 at 2:54 AM, Andrzej Bialecki wrote:
J. Delgado wrote:
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
J. Delgado wrote:
Here is the link to the paper.
http://cis.poly.edu/westlab/papers/cntdstrb/p426-broder.pdf
A more recent application of the use and extension of the WAND operator for
indexing of Boolean expressions:
http://ilpubs.stanford.edu:8090/927/2/wand_vldb.pdf
-- Joaquin
On Sun, Nov 15, 2009 at 11:12 PM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:
The functionality of WAND (weak AND) is already implemented in Lucene, if I understand it correctly - this is the BooleanQuery.setMinShouldMatch(int). Lucene implements this probably differently from the algorithm described in the paper, so there may be still some benefits from comparing the algorithms in Lucene's BooleanScorer[2] with this one ...http://cis.poly.edu/westlab/papers/cntdstrb/p426-broder.pdf
A more recent application of the use and extension of the WAND operator for
indexing of Boolean expressions:
http://ilpubs.stanford.edu:8090/927/2/wand_vldb.pdf
-- Joaquin
On Sun, Nov 15, 2009 at 11:12 PM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:
Hey Joaquin,
The mailing list strips off attachments. Can you please upload it somewhere
and give us the link?
On Mon, Nov 16, 2009 at 12:35 PM, J. Delgado <joaquin.delgado@gmail.com
The mailing list strips off attachments. Can you please upload it somewhere
and give us the link?
On Mon, Nov 16, 2009 at 12:35 PM, J. Delgado <joaquin.delgado@gmail.com
wrote:
Please find attached the paper on "Efficient Query Evaluation using a
Two-Level Retrieval Process". I believe that such approach may improve the
way Lucene/Solr evaluates queries today.
Please find attached the paper on "Efficient Query Evaluation using a
Two-Level Retrieval Process". I believe that such approach may improve the
way Lucene/Solr evaluates queries today.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
