Thanks for your reply. I thought I had solved the issue: according to Uwe,
the queries without the coord function were reasonably comparable, but now
you have actually reopened it.
So, I need to be sure I'm making them comparable and I would like to ask the
My BooleanQueries all have a similar structure. Important: they contain only
TermQueries. There are always 3 fields, but the number of terms can vary...
this is an example of a BooleanQuery (sorry for the syntax):
If it is not clear what the BooleanQueries look like, I can print some of
them for you. They have the same number of fields but a different number of
terms.
1- Do you still think QueryNorm is not an issue? Funny, because in the
documentation I can read:
QueryNorm(q) is a normalizing factor used to make scores between queries
comparable. This factor does not affect document ranking (since all ranked
documents are multiplied by the same factor), but rather just attempts to
make scores from different queries (or even different indexes) comparable.
From the documentation, it seems I can compare queries.
But as you are always using the same type of query (TermQuery), the
QueryNorm should not change, so that is no issue at all. It differs if you
have a variable number of Boolean clauses; then the QueryNorm can help you
make the queries comparable. But if you always have the same-looking BQ with
the exact same number of TQs in it (only different terms), it's not an issue
at all. In all other cases, the QueryNorm helps to compare, e.g., a BQ with
5 TQ clauses with another BQ that has 8 TQ clauses.
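
To make the 5-clause versus 8-clause case concrete, here is a minimal sketch
of the queryNorm formula as it is documented for the default TF-IDF
Similarity: queryNorm = 1 / sqrt(sumOfSquaredWeights). It is plain Java, not
Lucene's own code. The class name, the helper methods and the idf value of
2.0 are assumptions made up for illustration, and every term is given the
same idf only to isolate the effect of the clause count.

```java
import java.util.Arrays;

// Illustrative sketch only: mirrors the documented formula, not Lucene's implementation.
final class QueryNormSketch {

    // sumOfSquaredWeights = queryBoost^2 * sum over terms of (idf(t) * termBoost(t))^2
    static double sumOfSquaredWeights(double queryBoost, double[] idfs, double[] termBoosts) {
        double sum = 0.0;
        for (int i = 0; i < idfs.length; i++) {
            double weight = idfs[i] * termBoosts[i];
            sum += weight * weight;
        }
        return queryBoost * queryBoost * sum;
    }

    // queryNorm(q) = 1 / sqrt(sumOfSquaredWeights)
    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    // Hypothetical helper: an array of n copies of the same value.
    static double[] filled(int n, double value) {
        double[] a = new double[n];
        Arrays.fill(a, value);
        return a;
    }

    public static void main(String[] args) {
        double idf = 2.0; // assumed idf, identical for every term
        double norm5 = queryNorm(sumOfSquaredWeights(1.0, filled(5, idf), filled(5, 1.0)));
        double norm8 = queryNorm(sumOfSquaredWeights(1.0, filled(8, idf), filled(8, 1.0)));
        System.out.println("queryNorm with 5 TQ clauses: " + norm5);
        System.out.println("queryNorm with 8 TQ clauses: " + norm8);
    }
}
```

Under these assumptions the query with 8 clauses gets the smaller factor,
which is how the norm tries to put the two queries on a common scale.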
2- I don't think I'm using query boosts. Are they enabled by default in the
Query boosts are only active if you do TermQuery.setBoost(anything != 1.0f).
3- FieldNorm is not mentioned in the Similarity class. How can I disable it?
Should I disable it? Is it an issue?
FieldNorm should not be a problem, as it is an index-time feature. So the
same document always has the same FieldNorm (which is a combination of the
length norm and the index-time document boost). If two queries hit the same
document, the scores for this document should be comparable, as the
FieldNorm is the same in both cases.
See point 6) in the Similarity docs: norm(t,d)
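
As a rough illustration of why the FieldNorm does not depend on the query,
here is a small sketch of norm(t,d) as the Similarity docs describe it for
the default implementation: document boost times field boost times a length
norm of 1 / sqrt(numTerms). The class and method names are made up for this
example, and the one-byte encoding Lucene applies when storing norms (which
loses some precision) is deliberately left out.

```java
// Illustrative sketch only: follows the documented formula, not Lucene's implementation.
final class FieldNormSketch {

    // Default length norm: shorter fields get a larger value.
    static double lengthNorm(int numTermsInField) {
        return 1.0 / Math.sqrt(numTermsInField);
    }

    // norm(t,d) = docBoost * fieldBoost * lengthNorm(field); no query input at all.
    static double fieldNorm(double docBoost, double fieldBoost, int numTermsInField) {
        return docBoost * fieldBoost * lengthNorm(numTermsInField);
    }

    public static void main(String[] args) {
        // Hypothetical document: no boosts, 100 terms in the field.
        double norm = fieldNorm(1.0, 1.0, 100);
        // The value is fixed when the document is indexed, so any two queries
        // that hit this document are multiplied by the same factor.
        System.out.println("fieldNorm seen by query A: " + norm);
        System.out.println("fieldNorm seen by query B: " + norm);
    }
}
```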
4- If I'm not wrong, Uwe told me I can compute comparable cosine similarity
even with documents of different length. Tf and Idf are unbounded, and my
docs have different lengths. Can't I measure the similarity between query
and doc vectors anyway?
The field norm normalizes that. So where is the problem?
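
For reference, a minimal sketch of plain cosine similarity between a query
vector and a document vector, showing that dividing by the vector lengths
removes the effect of document size. This is the textbook formula, not what
Lucene computes internally: as far as I know, Lucene's practical scoring
replaces the exact Euclidean document norm with the field norm, so treat this
only as the idealized case. All names and the toy weight vectors are
invented for the example.

```java
// Illustrative sketch only: textbook cosine similarity on toy weight vectors.
final class CosineSketch {

    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // cosine(q, d) = (q . d) / (|q| * |d|)
    static double cosine(double[] q, double[] d) {
        return dot(q, d) / (Math.sqrt(dot(q, q)) * Math.sqrt(dot(d, d)));
    }

    public static void main(String[] args) {
        double[] query = {1.0, 2.0, 0.0};
        double[] shortDoc = {2.0, 1.0, 1.0};
        // Same term proportions as shortDoc, but every weight doubled ("longer" document).
        double[] longDoc = {4.0, 2.0, 2.0};
        System.out.println("cosine(query, shortDoc) = " + cosine(query, shortDoc));
        System.out.println("cosine(query, longDoc)  = " + cosine(query, longDoc));
    }
}
```

Both calls give the same value up to floating-point rounding, because the
length of the document vector cancels out in the division.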
5- Again, I've been told I can compare queries, and from the documentation
I can see that the queryNorm factor normalizes all queries. But you are
saying I should manually normalize them somehow? It is not clear.
It only matters when the queries differ (e.g. the number of Boolean clauses
differs, or the type of query differs).