Thanks for the clarification...
While studying the Similarity documentation in more depth, I discovered
something that troubles me a little. The idf is calculated using the
following formula:
log(numDocInIndex / (numDocWithTerm_t + 1)) + 1
While I agree this is fine for most applications, it does not quite fit mine.
numDocWithTerm_t is really numDocWith_t.text_in_field_t.field, that is, the
number of documents containing the term's text in the term's field. That's
fine with me. The problem is the other one, numDocInIndex: I would like to
use numDocInIndex_having_t.field instead, that is, the number of documents
that have the term's field at all. The reason is, again, that I want the
similarity score to be really meaningful. I have 'classes' of documents in
the same index:
Document1: MeaningA="something here",ContentA="searchable text 1"
Document2: MeaningB="something else",ContentB="searchable text 2"
...
I have an unequal number of "A" and "B" documents. The same query text will
be sent against ContentA and ContentB at the same time. Since there are more
documents in class B than in class A, the idf for each field should use a
different numDocInIndex value. Is there a good way to achieve that?
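To make the concern concrete, here is a minimal sketch in plain Java (no Lucene classes involved). It only evaluates the formula quoted above with two different denominators. The class name IdfPerClass and all the document counts are made up for illustration.

```java
class IdfPerClass {
    // idf as given by the formula quoted above:
    // log(numDocs / (docFreq + 1)) + 1
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        // Hypothetical counts, chosen only for illustration.
        int docsInIndex = 1000;      // class A and class B documents together
        int docsWithContentA = 100;  // documents that have a ContentA field
        int docsWithContentB = 900;  // documents that have a ContentB field
        int docFreq = 9;             // docs containing the term, same in both fields

        // Current behaviour: both fields share the index-wide document count.
        System.out.println("index-wide idf:     " + idf(docFreq, docsInIndex));

        // Desired behaviour: each field uses the size of its own class.
        System.out.println("idf within class A: " + idf(docFreq, docsWithContentA));
        System.out.println("idf within class B: " + idf(docFreq, docsWithContentB));
    }
}
```

With the index-wide count, a term found in 9 of the 100 "A" documents gets the same idf as a term found in 9 of the 900 "B" documents, although it is relatively much more common within class A. Using the per-class count gives the class A term a lower idf, which is the behaviour I am after.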
Thanks for all your help,
Jp
-----Original Message-----
From: Doug Cutting
Sent: Wednesday, May 04, 2005 5:10 PM
To: [email protected]
Subject: Re: PerFieldSimilarity
Robichaud, Jean-Philippe wrote:
How cool, I did not know that... that may help me... If I understand you
correctly, I can create a boolean query where each "clause" uses a different
similarity?
Yes. That would look something like:
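BooleanQuery booleanQuery = new BooleanQuery();
// TermQuery takes a Term (field, text), not two strings.
TermQuery clause1 = new TermQuery(new Term("foo", "bar")) {
  // Use a custom Similarity for this clause only.
  public Similarity getSimilarity(Searcher searcher) {
    return new DefaultSimilarity() {
      // Ignore document frequency: constant idf for this clause.
      public float idf(int docFreq, int numDocs) { return 1.0f; }
    };
  }
};
booleanQuery.add(clause1, true, false);  // required, not prohibited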
...
Doug
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]