I am pretty new to Lucene. Certainly, it is very exciting. I need to

implement a new Similarity class based on the Term Vector Space Model given

in http://www.miislita.com/term-vector/term-vector-3.html

Although that model is similar to Lucene’s model

(http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html),

I am having hard time to extend the Similarity class to calculate that

model.

In that model, “tf” is multiplied with Idf for all terms in the index, but

in Lucene “tf” is calculated only for terms in the given Query. Because of

that effect, the norm calculation should also include “idf” for all terms.

Lucene calculates the norm, during indexing, by “just” counting the number

of terms per document. In the web formula (in miislita.com), a document norm

is calculated after multiplying “tf” and “idf”.

FYI: I could implement “idf” according to miisliat.com formula, but not the

“tf” and “norm”

Could you please comment me how I can implement a new Similarity class that

will fit in the Lucene’s architecture, but still implement the vector space

model given in miislita.com

Thanks a lot for your comments,

Dharma

--

View this message in context: http://www.nabble.com/Vector-Space-Model%3A-New-Similarity-Implementation-Issues-tp15696719p15696719.html

Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------

To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org

For additional commands, e-mail: java-user-help@lucene.apache.org