FAQ
I would like to know if there is a simple way to force Lucene to adopt the
simple cosine similarity of the term frequency vectors of the documents and
the query for ranking the result. In practice the score sc_i of the document
i should be given by:

sc_i = (D_i*Q)/(|D_i|*|Q|)

where D_i = vector of the term frequencies of document i;
Q = vector of the term frequencies of the Query;
* = scalar product;
= norm of the vector (the square root of the sum of the squares
of the entries of the vector).

I wasn't able to find a way to evaluate |D_i|.

Thank you

Claudio


======================================
Ing. Claudio Gennaro, PhD
ISTI (Information Science and Technology Institute)
Consiglio Nazionale delle Ricerche Area della Ricerca di Pisa (room I14)
Via G. Moruzzi 1
56124 Pisa - ITALY
phone: +39 050 315 3077
mobile: +39 328 92 16 734
fax: +39 050 315 3464 or +39 050 315 2810
e-mail: claudio.gennaro@isti.cnr.it



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 1 of 1 | next ›
Discussion Overview
groupjava-user @
categorieslucene
postedAug 13, '09 at 4:30p
activeAug 13, '09 at 4:30p
posts1
users1
websitelucene.apache.org

1 user in discussion

Claudio Gennaro: 1 post

People

Translate

site design / logo © 2022 Grokbase