weightage of each word according to precedence in document

A Z
Feb 4, 2012 at 10:12 am
Hi Ian,

Sorry for the late reply.

It is a simple search with the default Similarity only.
It gives the same score to every document that contains both tokens, abcd and pqrst;
a document in which abcd comes earlier gets no extra weight.

Here is the output with the score and searcher.explain():


Query: content:abcd^10.0 content:pqrst^5.0

title -> pqrst uvwx abcd ::: content -> pqrst uvwx abcd ::: Score -> 0.6175326

Searcher.explain -> 0.6175326 = (MATCH) sum of:
  0.46281427 = (MATCH) weight(content:abcd^10.0 in 0), product of:
    0.92562854 = queryWeight(content:abcd^10.0), product of:
      10.0 = boost
      1.0 = idf(docFreq=4, maxDocs=5)
      0.092562854 = queryNorm
    0.5 = (MATCH) fieldWeight(content:abcd in 0), product of:
      1.0 = tf(termFreq(content:abcd)=1)
      1.0 = idf(docFreq=4, maxDocs=5)
      0.5 = fieldNorm(field=content, doc=0)
  0.15471835 = (MATCH) weight(content:pqrst^5.0 in 0), product of:
    0.37843326 = queryWeight(content:pqrst^5.0), product of:
      5.0 = boost
      0.81767845 = idf(docFreq=5, maxDocs=5)
      0.092562854 = queryNorm
    0.40883923 = (MATCH) fieldWeight(content:pqrst in 0), product of:
      1.0 = tf(termFreq(content:pqrst)=1)
      0.81767845 = idf(docFreq=5, maxDocs=5)
      0.5 = fieldNorm(field=content, doc=0)

title -> abcd pqrst uvwx ::: content -> abcd pqrst uvwx ::: Score -> 0.6175326

Searcher.explain -> 0.6175326 = (MATCH) sum of:
  0.46281427 = (MATCH) weight(content:abcd^10.0 in 1), product of:
    0.92562854 = queryWeight(content:abcd^10.0), product of:
      10.0 = boost
      1.0 = idf(docFreq=4, maxDocs=5)
      0.092562854 = queryNorm
    0.5 = (MATCH) fieldWeight(content:abcd in 1), product of:
      1.0 = tf(termFreq(content:abcd)=1)
      1.0 = idf(docFreq=4, maxDocs=5)
      0.5 = fieldNorm(field=content, doc=1)
  0.15471835 = (MATCH) weight(content:pqrst^5.0 in 1), product of:
    0.37843326 = queryWeight(content:pqrst^5.0), product of:
      5.0 = boost
      0.81767845 = idf(docFreq=5, maxDocs=5)
      0.092562854 = queryNorm
    0.40883923 = (MATCH) fieldWeight(content:pqrst in 1), product of:
      1.0 = tf(termFreq(content:pqrst)=1)
      0.81767845 = idf(docFreq=5, maxDocs=5)
      0.5 = fieldNorm(field=content, doc=1)

title -> pqrst uvwx lmn abcd ::: content -> pqrst uvwx lmn abcd ::: Score -> 0.6175326

Searcher.explain -> 0.6175326 = (MATCH) sum of:
  0.46281427 = (MATCH) weight(content:abcd^10.0 in 3), product of:
    0.92562854 = queryWeight(content:abcd^10.0), product of:
      10.0 = boost
      1.0 = idf(docFreq=4, maxDocs=5)
      0.092562854 = queryNorm
    0.5 = (MATCH) fieldWeight(content:abcd in 3), product of:
      1.0 = tf(termFreq(content:abcd)=1)
      1.0 = idf(docFreq=4, maxDocs=5)
      0.5 = fieldNorm(field=content, doc=3)
  0.15471835 = (MATCH) weight(content:pqrst^5.0 in 3), product of:
    0.37843326 = queryWeight(content:pqrst^5.0), product of:
      5.0 = boost
      0.81767845 = idf(docFreq=5, maxDocs=5)
      0.092562854 = queryNorm
    0.40883923 = (MATCH) fieldWeight(content:pqrst in 3), product of:
      1.0 = tf(termFreq(content:pqrst)=1)
      0.81767845 = idf(docFreq=5, maxDocs=5)
      0.5 = fieldNorm(field=content, doc=3)

title -> pqrst abcd uvwx lmn ::: content -> pqrst abcd uvwx lmn ::: Score -> 0.6175326

Searcher.explain -> 0.6175326 = (MATCH) sum of:
  0.46281427 = (MATCH) weight(content:abcd^10.0 in 4), product of:
    0.92562854 = queryWeight(content:abcd^10.0), product of:
      10.0 = boost
      1.0 = idf(docFreq=4, maxDocs=5)
      0.092562854 = queryNorm
    0.5 = (MATCH) fieldWeight(content:abcd in 4), product of:
      1.0 = tf(termFreq(content:abcd)=1)
      1.0 = idf(docFreq=4, maxDocs=5)
      0.5 = fieldNorm(field=content, doc=4)
  0.15471835 = (MATCH) weight(content:pqrst^5.0 in 4), product of:
    0.37843326 = queryWeight(content:pqrst^5.0), product of:
      5.0 = boost
      0.81767845 = idf(docFreq=5, maxDocs=5)
      0.092562854 = queryNorm
    0.40883923 = (MATCH) fieldWeight(content:pqrst in 4), product of:
      1.0 = tf(termFreq(content:pqrst)=1)
      0.81767845 = idf(docFreq=5, maxDocs=5)
      0.5 = fieldNorm(field=content, doc=4)

title -> pqrst uvwx lmn ::: content -> pqrst uvwx lmn ::: Score -> 0.07735918

Searcher.explain -> 0.07735918 = (MATCH) product of:
  0.15471835 = (MATCH) sum of:
    0.15471835 = (MATCH) weight(content:pqrst^5.0 in 2), product of:
      0.37843326 = queryWeight(content:pqrst^5.0), product of:
        5.0 = boost
        0.81767845 = idf(docFreq=5, maxDocs=5)
        0.092562854 = queryNorm
      0.40883923 = (MATCH) fieldWeight(content:pqrst in 2), product of:
        1.0 = tf(termFreq(content:pqrst)=1)
        0.81767845 = idf(docFreq=5, maxDocs=5)
        0.5 = fieldNorm(field=content, doc=2)
  0.5 = coord(1/2)
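
The explain output above can be re-derived by hand, which also shows why term position has no effect: none of the factors (boost, idf, queryNorm, tf, fieldNorm, coord) depends on where a term sits in the field. The following is a minimal plain-Java sketch, not Lucene code. It assumes the classic DefaultSimilarity formulas idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(termFreq), and it takes fieldNorm = 0.5 straight from the output rather than recomputing it. The class and method names are made up for illustration.

```java
public class ExplainArithmetic {

    // Assumed DefaultSimilarity idf: 1 + ln(maxDocs / (docFreq + 1)).
    static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // queryNorm = 1 / sqrt(sum of (boost * idf)^2 over all query terms).
    static double queryNorm(double[] boosts, double[] idfs) {
        double sumOfSquares = 0.0;
        for (int i = 0; i < boosts.length; i++) {
            double w = boosts[i] * idfs[i];
            sumOfSquares += w * w;
        }
        return 1.0 / Math.sqrt(sumOfSquares);
    }

    // One term's contribution: queryWeight * fieldWeight.
    static double termScore(double boost, double idf, double queryNorm,
                            int termFreq, double fieldNorm) {
        double queryWeight = boost * idf * queryNorm;
        double fieldWeight = Math.sqrt(termFreq) * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    // Score for the query content:abcd^10.0 content:pqrst^5.0 against a
    // document that contains pqrst once and, optionally, abcd once.
    static double score(boolean containsAbcd) {
        double idfAbcd = idf(4, 5);   // docFreq=4, maxDocs=5
        double idfPqrst = idf(5, 5);  // docFreq=5, maxDocs=5
        double norm = queryNorm(new double[] {10.0, 5.0},
                                new double[] {idfAbcd, idfPqrst});
        double fieldNorm = 0.5;       // copied from the explain output

        double sum = termScore(5.0, idfPqrst, norm, 1, fieldNorm);
        int matched = 1;
        if (containsAbcd) {
            sum += termScore(10.0, idfAbcd, norm, 1, fieldNorm);
            matched++;
        }
        double coord = matched / 2.0; // fraction of query terms matched
        return sum * coord;
    }

    public static void main(String[] args) {
        System.out.println("both terms: " + score(true));
        System.out.println("pqrst only: " + score(false));
    }
}
```

The two results should land close to the 0.6175326 and 0.07735918 reported above; small differences in the last digits are expected because this sketch uses double arithmetic.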

On Mon, Jan 30, 2012 at 2:59 PM, Ian Lea wrote:

They all give exactly the same score, even the 3rd doc which doesn't
contain abcd at all? Surprising. What does searcher.explain() say?
Is this a simple search with default Similarity or is there stuff
you're not telling us?

--
Ian.

On Sat, Jan 28, 2012 at 4:44 AM, A Z wrote:
Hi Ian,

Thanks for your reply.

Even when I boost each term while searching (abcd with a boost factor of 10
and pqrst with a boost factor of 5), it still gives the same score for all
documents.

Query: content:abcd^10.0 content:pqrst^5.0

title -> pqrst uvwx abcd ::: content -> pqrst uvwx abcd ::: Score -> 0.40883923
title -> abcd pqrst uvwx ::: content -> abcd pqrst uvwx ::: Score -> 0.40883923
title -> pqrst uvwx lmn ::: content -> pqrst uvwx lmn ::: Score -> 0.40883923
title -> pqrst uvwx lmn abcd ::: content -> pqrst uvwx lmn abcd ::: Score -> 0.40883923
title -> pqrst abcd uvwx lmn ::: content -> pqrst abcd uvwx lmn ::: Score -> 0.40883923

Thanks
On Wed, Jan 25, 2012 at 8:38 PM, Ian Lea wrote:

If you want particular search terms to be more important than others
you can use boosting. See
http://lucene.apache.org/java/3_5_0/queryparsersyntax.html#Boosting a Term

If you want the order of matched terms to matter, see PhraseQuery or
SpanQuery. The latter is more flexible. See
http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/ for a
good writeup.
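
To make "order of matched terms" concrete: an ordered match only counts when the terms appear in the query's order, optionally with a limited gap between them. The plain-Java sketch below illustrates that idea on the example documents from this thread. It is a conceptual stand-in only, not how PhraseQuery or SpanQuery actually compute slop, and inOrderWithin is a hypothetical helper, not a Lucene API.

```java
import java.util.Arrays;
import java.util.List;

public class OrderedMatchSketch {

    // True if `first` occurs before `second` with at most `maxGap`
    // other tokens between them. Hypothetical helper for illustration.
    static boolean inOrderWithin(String text, String first, String second, int maxGap) {
        List<String> tokens = Arrays.asList(text.trim().split("\\s+"));
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) {
                continue;
            }
            // Look only as far ahead as the allowed gap permits.
            for (int j = i + 1; j < tokens.size() && j - i - 1 <= maxGap; j++) {
                if (tokens.get(j).equals(second)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] docs = {
            "pqrst uvwx abcd",
            "abcd pqrst uvwx",
            "pqrst uvwx lmn",
            "pqrst uvwx lmn abcd",
            "pqrst abcd uvwx lmn"
        };
        for (String doc : docs) {
            System.out.println(doc + " -> " + inOrderWithin(doc, "abcd", "pqrst", 0));
        }
    }
}
```

With the query order abcd then pqrst and no gap allowed, only "abcd pqrst uvwx" qualifies, which is the kind of distinction a plain boosted term query cannot make.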

And you can of course use combinations of everything.


--
Ian.


On Tue, Jan 24, 2012 at 5:08 PM, A Z wrote:
Hi,

How can we assign a custom score to each token/word?

For example, I have these documents:

1 pqrst uvwx abcd
2 abcd pqrst uvwx
3 pqrst uvwx lmn
4 pqrst uvwx lmn abcd
5 pqrst abcd uvwx lmn

Now I am searching for: abcd pqrst

So it should give a higher score to the 2nd document than to the 1st
document.

What I want is:

document 1: pqrst has more weight than uvwx, which in turn has more weight than abcd
document 2: abcd has more weight than pqrst, which in turn has more weight than uvwx
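
The weighting asked for here can be sketched outside Lucene to pin down what is meant: each query term contributes more the earlier it appears in the document. The plain-Java example below assumes a 1 / (1 + position) decay, which is an arbitrary choice for illustration and not something Lucene's default scoring does; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class PositionWeightSketch {

    // Hypothetical decay: first token weighs 1.0, second 1/2, third 1/3, ...
    static double positionWeight(int position) {
        return 1.0 / (1 + position);
    }

    // Sum the position weight of the first occurrence of each query term;
    // terms that do not occur contribute nothing.
    static double score(String document, List<String> queryTerms) {
        List<String> tokens = Arrays.asList(document.trim().split("\\s+"));
        double total = 0.0;
        for (String term : queryTerms) {
            int position = tokens.indexOf(term);
            if (position >= 0) {
                total += positionWeight(position);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("abcd", "pqrst");
        String[] docs = {
            "pqrst uvwx abcd",
            "abcd pqrst uvwx",
            "pqrst uvwx lmn",
            "pqrst uvwx lmn abcd",
            "pqrst abcd uvwx lmn"
        };
        for (int i = 0; i < docs.length; i++) {
            System.out.println((i + 1) + " " + docs[i] + " -> " + score(docs[i], query));
        }
    }
}
```

Under this assumed decay, document 2 scores higher than document 1 because both query terms sit nearer the start, while document 3 only earns the weight for pqrst.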
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org