Hi,
I'm developing a search application with two types of documents:
1. Documents that need to be indexed and queried against
2. Documents that will never show up in search results, but their content
needs to contribute to the global term frequency table
In other words, the application needs Type 2 documents only for the purpose
of constructing an
IDF<http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/search/Similarity.html#formula_idf>table.
I understand I can accomplish this by indexing both types of documents and
using a special indicator field to make sure that Type 2 documents never
show up in search results. However, this approach seems to be a waste of
resources with all the indexing overhead for Type 2 documents. Is there a
more efficient way for doing this?
Thanks,
Xiyang