I am using lucene 2.9.3 (via Solr 1.4.1) on windows and am trying to
understand ShingleFilter. I wrote the following code and find that if
I provide more words than the actual phrase indexed in the field, then
the search on that field fails (no score found with debugQuery=true).
Here is an example to reproduce, with field names:
Id: 1
title_1: Nina Simone
title_2: I put a spell on you
Query (dismax) with:
- “Nina Simone I put” <- Fails i.e. no score shown from title_1
search (using debugQuery)
- “Nina Simone” <- SUCCESS
But, when I used Solr’s Field Analysis with the ‘shingle’ field (given
below) and tried “Nina Simone I put”, it succeeds. It’s only during
the query that no score is provided. I also checked ‘parsedquery’ and
it shows disjunctionMaxQuery issuing the string “Nina_Simone Simone_I
I_put” to the title_1 field.
title_1 and title_2 fields are of type ‘shingle’, defined as:
<fieldType name="shingle" class="solr.TextField"
positionIncrementGap="100" indexed="true" stored="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ShingleFilterFactory"
maxShingleSize="2" outputUnigrams="false"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ShingleFilterFactory"
maxShingleSize="2" outputUnigrams="false"/>
</analyzer>
</fieldType>
Note that I also have a catchall field which is text. I have qf set
to: 'id^2 catchall' and pf set to: 'title_1^1.5 title_2^1.2'
If I am missing something or doing something wrong please let me know.
-Ethan
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]