I have my 25 indexes of 1.8GB each read with MultiReader.
I try to get the document frequency of all the terms in specific documents
and it takes quite a long time - a document with 1000 terms takes around
4:30 minutes to calculate all the document frequencies of its terms - and
there are longer documents than that.
Since I have quite a lot of documents to process (around 12000) - it'll take
forever.
My function of getting the document frequency is listed below (it's for one
single term - but it's called for all the terms in the document term vector.
public int getdocumentfrequency (String termstr) throws Exception
{
Term term=new Term("contents", termstr);
TermEnum termenum=multireader.terms(term);
int freq=termenum.docFreq();
return freq;
}
Is there a better (i.e. faster) way to get all the document frequencies of a
specific document?
thanks a lot,
Nir.
--
View this message in context: http://www.nabble.com/docFreq-takes-long-time-to-execute-in-a-multiple-index-environment-tf4221604.html#a12009334
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org