FAQ
I have seen different versions of Lucene's ranking function
in the similarity document and on the Lucene user list.

Since I need document-document similarities, I issue the
document itself directly as the query.
I found that issuing "computer computer" to Lucene gives a
different result from issuing it to a standard VSM. The latter
treats "computer computer" as "computer", but Lucene
does not.
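
To make the difference concrete, here is a small sketch. It is not
Lucene's real scoring code: the function names, the sample document,
and the "one clause per term occurrence" model are assumptions for
illustration only, and idf, boosts, and coordination factors are left out.
It compares a textbook cosine score with a score that sums one
unit-weight clause per occurrence of a query term.

```python
import math
from collections import Counter


def cosine_score(query_terms, doc_terms):
    """Textbook VSM: cosine between raw term-frequency vectors."""
    q = Counter(query_terms)
    d = Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    d_norm = math.sqrt(sum(v * v for v in d.values()))
    if q_norm == 0 or d_norm == 0:
        return 0.0
    return dot / (q_norm * d_norm)


def per_clause_score(query_terms, doc_terms):
    """Assumed simplification, not Lucene's implementation: every
    occurrence of a query term is a separate clause with weight 1,
    clause scores are summed, and the sum is divided by the square
    root of the number of clauses."""
    d = Counter(doc_terms)
    d_norm = math.sqrt(sum(v * v for v in d.values()))
    if not query_terms or d_norm == 0:
        return 0.0
    clause_sum = sum(d[t] / d_norm for t in query_terms)
    return clause_sum / math.sqrt(len(query_terms))


# Hypothetical sample document, used only for this illustration.
doc = ["computer", "science", "computer"]

once = ["computer"]
twice = ["computer", "computer"]

vsm_once = cosine_score(once, doc)
vsm_twice = cosine_score(twice, doc)
clause_once = per_clause_score(once, doc)
clause_twice = per_clause_score(twice, doc)
```

Under these assumptions the cosine score is the same for both queries,
because repeating the only query term does not change the direction of
the query vector. The per-clause score for "computer computer" is
larger than for "computer" by a factor of sqrt(2).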

To state my question more clearly, I have written
a more formal document

http://www.cs.virginia.edu/~xj3a/lucene_ranking.pdf

so that there is no ambiguity about the formulas.

I am not sure whether I understand this correctly, but the
main cause seems to be Lucene's query parser. By default it
assumes that each term appears once. If a term is issued multiple
times in the query string, it leads to some unexpected
results.

For detailed information, please refer to the link above.

thanks

xiangyu jin

group: java-user
category: lucene
posted: Nov 30, '04 at 4:19p