simple cosine similarity of the term frequency vectors of the documents and

the query for ranking the result. In practice the score sc_i of the document

i should be given by:

sc_i = (D_i*Q)/(|D_i|*|Q|)

where D_i = vector of the term frequencies of document i;

Q = vector of the term frequencies of the Query;

* = scalar product;

= norm of the vector (the square root of the sum of the squares

I wasn't able to find a way to evaluate |D_i|.

Thank you

Claudio

======================================

Ing. Claudio Gennaro, PhD

ISTI (Information Science and Technology Institute)

Consiglio Nazionale delle Ricerche Area della Ricerca di Pisa (room I14)

Via G. Moruzzi 1

56124 Pisa - ITALY

phone: +39 050 315 3077

mobile: +39 328 92 16 734

fax: +39 050 315 3464 or +39 050 315 2810

e-mail: claudio.gennaro@isti.cnr.it

---------------------------------------------------------------------

To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org

For additional commands, e-mail: java-user-help@lucene.apache.org