Hi All,

This is my first question on this forum. I am fairly familiar with Lucene
and am using version 2.9.4 in my project (without Solr). I have the following
question about using the synonym filter.

While indexing content, I use the following analyzer chain:

[Analyzer1] == StandardTokenizer --> StandardFilter --> LowerCaseFilter --> StopFilter --> PorterStemFilter

When searching with MoreLikeThis, I use a similar analyzer with a synonym
filter added:

[Analyzer2] == StandardTokenizer --> StandardFilter --> LowerCaseFilter --> StopFilter --> SynonymFilter --> PorterStemFilter
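To make the difference between the two chains concrete, here is a dependency-free sketch that models them as plain Java list transformations. It is NOT Lucene code: the tokenizer is a rough regex split, and the stop list, the synonym map (standing in for WordNetSynonymEngine) and the one-letter "stemmer" (standing in for PorterStemFilter) are all hypothetical simplifications. It keeps the stage order above, so synonyms are looked up on the unstemmed token and then stemmed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Dependency-free model of the two analyzer chains (NOT Lucene code).
 * The stop list, synonym map and stemmer are hypothetical stand-ins.
 */
public class AnalyzerChainSketch {
    // Hypothetical stop list (StopFilter would use a real English list).
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "of", "and", "is");

    // Hypothetical synonym map standing in for WordNetSynonymEngine.
    static final Map<String, List<String>> SYNONYMS = Map.of(
        "car", List.of("auto", "automobile"),
        "fast", List.of("quick", "rapid"));

    // Crude stand-in for PorterStemFilter: strips one trailing "s". Not the Porter algorithm.
    static String stem(String token) {
        return token.length() > 3 && token.endsWith("s")
            ? token.substring(0, token.length() - 1) : token;
    }

    /** withSynonyms=false models Analyzer1, true models Analyzer2. */
    static List<String> analyze(String text, boolean withSynonyms) {
        List<String> out = new ArrayList<>();
        for (String raw : text.split("[^A-Za-z]+")) {       // rough StandardTokenizer
            if (raw.isEmpty()) continue;
            String token = raw.toLowerCase(Locale.ROOT);     // LowerCaseFilter
            if (STOP_WORDS.contains(token)) continue;        // StopFilter
            out.add(stem(token));                            // PorterStemFilter (last stage)
            if (withSynonyms) {
                // SynonymFilter runs before stemming, so the lookup uses the unstemmed token
                for (String syn : SYNONYMS.getOrDefault(token, List.of())) {
                    out.add(stem(syn));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The fast car";
        System.out.println("Analyzer1: " + analyze(text, false));
        System.out.println("Analyzer2: " + analyze(text, true));
    }
}
```

For "The fast car" this prints [fast, car] for Analyzer1 and [fast, quick, rapid, car, auto, automobile] for Analyzer2. A flat list ignores token positions; as I understand it, a real synonym filter emits the synonyms at the same position as the original word, but the extra terms still end up in the analyzed output either way.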

*Scenario 1: Analyzer 1 for indexing and searching*
If I index documents A, B and C with Analyzer1 and then run MoreLikeThis on
document D, also with Analyzer1 (not Analyzer2), to find similar documents in
the index, I get the following results:
A matched 40%
B matched 20%
C matched 5%

*Scenario 2: Analyzer 2 for indexing and searching*
My problem is that as soon as I use Analyzer2 (with the synonym filter) both
to index and to search for documents similar to D, all my results get a
boost and become:
A matched 60%
B matched 40%
C matched 25%

*Scenario 3: Analyzer 1 for indexing and Analyzer 2 for searching*
But if I use Analyzer1 for indexing and Analyzer2 for searching, my results
drop sharply:
A matched 15%
B matched 11%
C matched 2%

When I dug into why the match percentages went down, I found that searching
with the synonym analyzer yields many more interesting terms
[moreLikeThis.retrieveInterestingTerms(reader)]. Most of these synonym words
match across all the documents, which brings down their tf and idf and
results in lower match percentages for the documents.
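To check this reasoning against Lucene's scoring, here is a small sketch of the DefaultSimilarity factors as documented for 2.9: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), coord = matched clauses / total clauses, and queryNorm = 1 / sqrt(sum of squared idf over all query clauses). It scores a flat OR query of term clauses (the shape of query MoreLikeThis builds) against one document. Assumptions: boosts and field-length norms are 1, and all statistics (document A, the doc frequencies, the synonym terms) are made up for illustration.

```java
import java.util.List;
import java.util.Map;

/**
 * Sketch of DefaultSimilarity-style scoring for a flat OR query of term
 * clauses. Assumptions: all boosts and field-length norms are 1, and the
 * document/term statistics below are hypothetical.
 */
public class ScoreSketch {
    static double tf(int freq) { return Math.sqrt(freq); }

    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    static double score(List<String> queryTerms, Map<String, Integer> docTermFreqs,
                        Map<String, Integer> docFreqs, int numDocs) {
        double sumOfSquaredWeights = 0.0;
        double raw = 0.0;
        int overlap = 0;
        for (String term : queryTerms) {
            double idf = idf(docFreqs.getOrDefault(term, 0), numDocs);
            sumOfSquaredWeights += idf * idf;          // every clause counts toward queryNorm
            Integer freq = docTermFreqs.get(term);
            if (freq != null) {                        // only matching clauses add to the score
                raw += tf(freq) * idf * idf;
                overlap++;
            }
        }
        double queryNorm = 1.0 / Math.sqrt(sumOfSquaredWeights);
        double coord = overlap / (double) queryTerms.size();  // fraction of clauses matched
        return coord * queryNorm * raw;
    }

    static final int NUM_DOCS = 10;
    static final Map<String, Integer> DOC_FREQS =
        Map.of("car", 4, "engine", 3, "auto", 6, "automobile", 2, "motor", 5);
    // Hypothetical document A: contains the original words but none of the synonyms.
    static final Map<String, Integer> DOC_A = Map.of("car", 3, "engine", 2);

    public static void main(String[] args) {
        double plain = score(List.of("car", "engine"), DOC_A, DOC_FREQS, NUM_DOCS);
        double expanded = score(List.of("car", "engine", "auto", "automobile", "motor"),
                                DOC_A, DOC_FREQS, NUM_DOCS);
        System.out.printf("plain query: %.4f%nexpanded query: %.4f%n", plain, expanded);
    }
}
```

With these made-up numbers the expanded query scores document A lower even though A matches exactly the same original terms: the unmatched synonym clauses lower coord and enlarge the queryNorm denominator. Raising a synonym's doc frequency in DOC_FREQS lowers its idf, which is a way to explore the idf effect described above.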

*So my questions are:*
1. Is it correct to use an analyzer without the synonym filter for indexing
and one with the synonym filter for searching?
2. Is there some other setting I am missing that causes all the match
percentages to go down?

My MoreLikeThis search settings are:

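// index is the IndexReader over the documents being compared
MoreLikeThis mlt = new MoreLikeThis(index);
SynonymEngine engine = new WordNetSynonymEngine(new File("PATH"));

// PorterSynonymStandardAnalyzer is Analyzer2 (the chain with SynonymFilter)
mlt.setAnalyzer(new PorterSynonymStandardAnalyzer(engine));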
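Since the number of interesting terms is what changes between the scenarios, here is a simplified, self-contained model of how MoreLikeThis picks them, based on my reading of the 2.9 contrib source: each candidate term is scored as (tf in the source text) * idf, terms below minTermFreq or minDocFreq are dropped, terms missing from the index (docFreq 0) are skipped, and only the top maxQueryTerms are kept. The parameters mirror the setMinTermFreq, setMinDocFreq and setMaxQueryTerms setters, but this is a sketch, not the real class: single field, no word-length or stop-word checks, and the token lists and doc frequencies are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Simplified model of MoreLikeThis "interesting term" selection.
 * Single field, no word-length or stop-word checks; statistics are made up.
 */
public class InterestingTermsSketch {
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    static List<String> interestingTerms(List<String> tokens, Map<String, Integer> docFreqs,
                                         int numDocs, int minTermFreq, int minDocFreq,
                                         int maxQueryTerms) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) tf.merge(t, 1, Integer::sum);  // term freq in the source doc

        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            if (e.getValue() < minTermFreq) continue;   // too rare in the source doc
            if (df < minDocFreq || df == 0) continue;   // too rare in, or missing from, the index
            scores.put(e.getKey(), e.getValue() * idf(df, numDocs));
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int c = Double.compare(scores.get(b), scores.get(a));  // highest score first
            return c != 0 ? c : a.compareTo(b);                    // deterministic tie-break
        });
        return new ArrayList<>(ranked.subList(0, Math.min(maxQueryTerms, ranked.size())));
    }

    static final int NUM_DOCS = 10;
    static final Map<String, Integer> DOC_FREQS =
        Map.of("car", 4, "engine", 3, "wheel", 2, "auto", 6, "motor", 5);
    static final List<String> TOKENS_ANALYZER1 =
        List.of("car", "car", "engine", "engine", "wheel", "wheel");
    // Same text through a synonym chain; "automobile" is not in DOC_FREQS, so it is skipped.
    static final List<String> TOKENS_ANALYZER2 =
        List.of("car", "auto", "automobile", "car", "auto", "automobile",
                "engine", "motor", "engine", "motor", "wheel", "wheel");

    public static void main(String[] args) {
        System.out.println(interestingTerms(TOKENS_ANALYZER1, DOC_FREQS, NUM_DOCS, 2, 1, 25));
        System.out.println(interestingTerms(TOKENS_ANALYZER2, DOC_FREQS, NUM_DOCS, 2, 1, 25));
    }
}
```

Here the first call returns [wheel, engine, car] and the second [wheel, engine, car, motor, auto]: the synonyms inherit the source word's tf, so any that exist somewhere in the index join the query, and with a low maxQueryTerms they would compete with the original words for slots.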

Thanks in advance


Discussion Overview
group: java-user @
posted: May 9, '11 at 4:15a
active: May 9, '11 at 4:15a

1 user in discussion

Saurabh Gokhale: 1 post


