Hi, I'm running Solr 3.2 with edismax under Tomcat 6 via Drupal.
I'm having some problems writing a query that matches a specific field on several words. I have implemented an AJAX search that basically takes whatever is in a form field and attempts to match documents. I'm not having much luck, though. The first word always matches correctly, but as soon as I enter the second word I start losing matches, and the third word doesn't give any matches at all.
The title field that I'm searching contains a product name that may or may not consist of several words.
The requirement is that the search should be progressive, i.e. as the user inputs words I should always return results that contain all of the words entered. I also have to correct bad input such as an erroneous space in the product name, e.g. "product name" instead of "productname".
I'm wondering if there isn't an easier way to query Solr? Ideally I'd want to say "give me all docs that have the following text in their titles". Is that possible?
I'd really appreciate any help!
Regards,
Christopher Cato
[Solr-user] Need help with troublesome wildcard query
Briggs Thompson at Jul 7, 2011 at 9:17 pm
Hello Christopher,
Can you provide the exact query sent to Solr for the one word query and also
the two word query? The field type definition for your title field would be
useful too.
From what I understand, Solr should be able to handle your use case. I am
guessing it is a problem with how the field is defined assuming the query is
correct.
Briggs Thompson
Christopher Cato at Jul 8, 2011 at 1:04 pm
Hi Briggs. Thanks for taking the time. I have the query nearly working now. Currently this is how it looks when it matches on the title "Super Technocrane 30" and others with similar names:
INFO: [] webapp=/solr path=/select/ params={qf=title^40.0&hl.fl=title&wt=json&rows=10&fl=*,score&start=0&q=(title:*super*+AND+*technocran*)+OR+(title:*super*+AND+*technocran)&qt=standard&fq=type:product+AND+language:sv} hits=3 status=0 QTime=1
Adding another letter stops it matching:
INFO: [] webapp=/solr path=/select/ params={qf=title^40.0&hl.fl=title&wt=json&rows=10&fl=*,score&start=0&q=(title:*super*+AND+*technocrane*)+OR+(title:*super*+AND+*technocrane)&qt=standard&fq=type:product+AND+language:sv} hits=0 status=0 QTime=0
The field type definitions are as follows:
<field name="title" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         add enablePositionIncrements=true in both the index and query
         analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
    />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            splitOnCaseChange="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
    />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
There is also a type definition called text_ws. Should I use that instead and change text to text_ws in the field definition for title?
<!-- A text field that only splits on whitespace for exact matching of words -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
Regards,
Christopher Cato
Head of Technology
-----------------------------------
MiniMedia
Phone: +46761927603
www.minimedia.se
Briggs Thompson at Jul 8, 2011 at 2:58 pm
Hey Chris,
Removing the ORs in each query might help narrow down the problem, but I
suggest you run this through the query analyzer in order to see where it is
dropping out. It is a great tool for troubleshooting issues like these.
I see a few things here.
- For leading wildcard queries, you should include the
  ReversedWildcardFilterFactory. Check out the documentation here:
  http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ReversedWildcardFilterFactory
- Your results might be dropping out because you are trying to do wildcard
  searches on a stemmed field. Wildcard searches on a stemmed field are
  counter-intuitive because if you index "computers", it may stem to "comput",
  in which case a wildcard query for "computer*" would not match.
- If you want to support both stemming and wildcard searches, I suggest
  creating a copy field with an un-stemmed field type definition.
Don't forget that if you modify your field type definition, you need to
re-index.
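A minimal sketch of the copy-field suggestion above, for illustration only. The names `text_unstemmed` and `title_unstemmed` are hypothetical and not from the thread; the factories are the same ones already used in the posted "text" type, minus the Snowball stemmer. The stemming explanation would be consistent with the posted logs, where `*technocran*` matches but `*technocrane*` does not.

```xml
<!-- Hypothetical un-stemmed type: whitespace tokenizing and lowercasing only,
     so the indexed terms keep their full spelling (e.g. "technocrane"). -->
<fieldType name="text_unstemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field that receives a copy of the title at index time. -->
<field name="title_unstemmed" type="text_unstemmed" indexed="true" stored="false"/>
<copyField source="title" dest="title_unstemmed"/>
```

Wildcard queries would then target `title_unstemmed` rather than `title`. As far as I know, wildcard terms are not run through the analyzer in Solr 3.2, so the client would probably need to lowercase them before sending the query.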
In response to your question about text_ws, this is just a different field
type definition that essentially splits on whitespace. You should use that
if that is the search logic you want, but it probably isn't. Check
out the documentation on each of the tokenizers and filter factories in your
"text" field type and see what you need and what you don't to satisfy your
use cases.
Hope that helps,
Briggs Thompson
Christopher Cato at Jul 8, 2011 at 4:45 pm
Hi Briggs, thanks for being patient with me!
Yeah, I saw I had a typo there in the OR clause. Fixed it but still no perfect results.
I'm looking at the analysis.jsp page and can't really figure it out. Feeling a bit overwhelmed by all the output. I also don't know how to check if stemming is used for the title field.
Theoretically, given the field type I'm using and also given that "super technocrane 30" is the title of one of the docs - how would one write the query so that it finds that doc if the user searches for "super techn" or "super technocrane"? Right now it stops matching in the middle of the word "technocrane" or rather after the "r".
Darnit, I just want to return all docs that contain the search terms either as whole words or parts of words.
Is it possible?
Regards,
Christopher
Erick Erickson at Jul 8, 2011 at 5:00 pm
Yeah, the analysis page takes a bit of getting used to, but it's well
worth the time. Be sure to check the "verbose" box. Taking some time
to understand what it's telling you is one of the best investments
you'll make.
Your "parts of words" is the issue. One approach is to use ngrams or
edgengrams. Here's a writeup about edgengrams from Lucid:
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
It's written for autosuggest, but you get the idea. If the "partial" words
might not be at the start of a word, then ngrams are a possibility....
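As a rough illustration of the edge n-gram approach mentioned above, a field type along these lines might work. This is a sketch under assumptions, not the declaration from the linked blog post: the type name `edge_text` and the gram sizes are made up for the example.

```xml
<!-- Hypothetical edge n-gram type. At index time each token is expanded into
     its leading prefixes ("technocrane" -> "t", "te", "tec", ...), so a plain
     term query for a partial word can match without wildcards. -->
<fieldType name="edge_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Gram sizes are illustrative; tune them to the longest expected word. -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-gramming at query time: the user's partial word is looked up as-is. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

If matches in the middle of a word are needed as well, `solr.NGramFilterFactory` could be swapped in for the edge variant, at the cost of a noticeably larger index.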
Your problem is one of those
conceptually-simple-but-annoyingly-difficult-to-implement
ones that takes far longer to fully understand/implement than
it seems like it should.
Best
Erick
Christopher Cato at Jul 8, 2011 at 5:37 pm
Thanks for that pointer, that's really more what I want to do. And actually, EdgeNGrams is stuck somewhere in the back of my head :) Yes, simple at first thought but not as easy to implement as I have discovered.
Well, so how do I implement something like this? I took the field type declaration from that blog post and added it to my schema.xml within the fieldtypes part.
So, if I understand it all correctly, all I have to do now is add a new field with the newly added field type, add a copyField from the original title field, change the query to use the new field, and restart / reindex. Or am I missing something?
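The wiring described above might look roughly like this in schema.xml. It is only a sketch: `title_edge` is a hypothetical field name, and `edge_text` stands in for whatever name the edge n-gram field type was given when it was added to the schema.

```xml
<!-- Hypothetical field using the newly added edge n-gram type.
     stored="false" because the original title field already stores the text. -->
<field name="title_edge" type="edge_text" indexed="true" stored="false"/>

<!-- Copy the raw title into the new field at index time. -->
<copyField source="title" dest="title_edge"/>

<!-- After restarting and re-indexing, a query without wildcards could be
     sent against the new field, for example:
     q=title_edge:super AND title_edge:techn -->
```

Note that copyField copies the original input text rather than the analyzed tokens, so the stemming on `title` should not carry over into the new field.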
//Christopher
8 jul 2011 kl. 18.59 skrev Erick Erickson:Yeah, the analysis page takes a bit of getting used to, but it's well
worth the time. Be sure to check the "verbose" box. Taking some time
to understand what it's telling you is one of the best investments
you'll make.
Your "parts of words" is the issue. One approach is to use ngrams or
edgengrams. Here's a writeup about edgengrams from Lucid:
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
it's written for autosuggest, but you get the idea. If "partial" words
could be not at the start then ngrams are a possibility....
Your problem is one of those
conceptually-simple-but-annoyingly-difficult-to-implement
ones that takes far longer to fully understand/implement than
it seems like it should.
Best
Erick
On Fri, Jul 8, 2011 at 12:44 PM, Christopher Cato
wrote:Hi Briggs, thanks for being patient with me!
Yeah, I saw I had a typo there in the OR clause. Fixed it but still no perfect results.
I'm looking at the analysis.jsp page and can't really figure it out. Feeling a bit overwhelmed by all the output. I also don't know how to check if stemming is used for the title field.
Theoretically, given the field type I'm using and also given that "super technocrane 30" is the title of one of the docs - how would one write the query so that it finds that doc if the user searches for "super techn" or "super technocrane"? Right now it stops matching in the middle of the word "technocrane" or rather after the "r".
Darnit, I just want to return all docs that contain the search terms either as whole words or parts of words.
Is it possible?
Regards,
Christopher
8 jul 2011 kl. 16.57 skrev Briggs Thompson:Hey Chris,
Removing the ORs in each query might help narrow down the problem, but I
suggest you run this through the query analyzer in order to see where it is
dropping out. It is a great tool for troubleshooting issues like these.
I see a few things here.
- for leading wildcard queries, you should include the
reverseWildcardFilterFactory. Check out the documentation here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ReversedWildcardFilterFactory
- Your result might get dropped out because you are trying to do wildcard
searches on a stemmed field. Wildcard searches on a stemmed field is
counter-intuitive because if you index "computers", it may stem to "comput",
in which wildcard query of "computer*" would not match.
- If you want to support stemming and wildcard searches, I suggest
creating a copy field with an un-stemmed field type definition.
Don't forget if you modify your field type definition, you need to
re-index.
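A sketch of what the copy-field suggestion could look like in schema.xml, assuming a new field named `title_exact` and a type named `text_unstemmed` (both names are made up for this example). The chain is deliberately simplified: no stemmer, plus `solr.ReversedWildcardFilterFactory` at index time for leading wildcards. The attribute values are illustrative and may need tuning.

```xml
<!-- Hypothetical names: text_unstemmed, title_exact -->
<fieldType name="text_unstemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Also index reversed tokens so leading wildcards stay cheap -->
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <!-- No stemmer here either, so "technocrane" stays "technocrane" -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Unstemmed copy of title, used only for wildcard matching -->
<field name="title_exact" type="text_unstemmed" indexed="true" stored="false"/>
<copyField source="title" dest="title_exact"/>
```

Wildcard queries would then target `title_exact` (for example `title_exact:technocrane*`), while the stemmed `title` field keeps serving ordinary keyword searches. As far as I know, Solr 3.x does not run wildcard terms through the analyzer, so the client would need to lowercase them itself.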
In response to your question about text_ws, this is just a different field
type definition that essentially splits on whitespace. You should use that
if that is what the desired search logic is, but it probably isn't. Check
out the documentation on each of the tokenizers and filter factories in your
"text" field type and see what you need and what you don't to satisfy your
use cases.
Hope that helps,
Briggs Thompson
On Fri, Jul 8, 2011 at 9:03 AM, Christopher Cato <christopher.cato@minimedia.se> wrote:
Hi Briggs. Thanks for taking the time. I have the query nearly working now,
currently this is how it looks when it matches on the title "Super
Technocrane 30" and others with similar names:
INFO: [] webapp=/solr path=/select/
params={qf=title^40.0&hl.fl=title&wt=json&rows=10&fl=*,score&start=0&q=(title:*super*+AND+*technocran*)+OR+(title:*super*+AND+*technocran)&qt=standard&fq=type:product+AND+language:sv}
hits=3 status=0 QTime=1
Adding another letter stops it matching:
INFO: [] webapp=/solr path=/select/
params={qf=title^40.0&hl.fl=title&wt=json&rows=10&fl=*,score&start=0&q=(title:*super*+AND+*technocrane*)+OR+(title:*super*+AND+*technocrane)&qt=standard&fq=type:product+AND+language:sv}
hits=0 status=0 QTime=0
The field type definitions are as follows:
<field name="title" type="text" indexed="true" stored="true"
       termVectors="true" omitNorms="true"/>
<fieldType name="text" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
            synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         add enablePositionIncrements=true in both the index and query
         analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            splitOnCaseChange="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
There is also a type definition that is called text_ws, should I use that
instead and change text to text_ws in the field definition for title?
<!-- A text field that only splits on whitespace for exact matching of
     words -->
<fieldType name="text_ws" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
Best regards,
Christopher Cato
Head of Technology
-----------------------------------
MiniMedia
Phone: +46761927603
www.minimedia.se
On 7 Jul 2011 at 23:16, Briggs Thompson wrote:
Hello Christopher,
Can you provide the exact query sent to Solr for the one word query and also
the two word query? The field type definition for your title field would be
useful too.
From what I understand, Solr should be able to handle your use case. I am
guessing it is a problem with how the field is defined assuming the query is
correct.
Briggs Thompson
On Thu, Jul 7, 2011 at 12:22 PM, Christopher Cato <christopher.cato@minimedia.se> wrote:
Hi, I'm running Solr 3.2 with edismax under Tomcat 6 via Drupal.
I'm having some problems writing a query that matches a specific field on
several words. I have implemented an AJAX search that basically takes
whatever is in a form field and attempts to match documents. I'm not having
much luck though. First word always matches correctly but as soon as I enter
the second word I'm losing matches, the third word doesn't give any matches
at all.
The title field that I'm searching contains a product name that may or
may not have several words.
The requirement is that the search should be progressive i.e. as the user
inputs words I should always return results that contain all of the words
entered. I also have to correct bad input like an erroneous space in the
product name ex. "product name" instead of "productname".
I'm wondering if there isn't an easier way to query Solr? Ideally I'd want
to say "give me all docs that have the following text in its titles" Is
that possible?
I'd really appreciate any help!
Regards,
Christopher Cato
Erick Erickson at Jul 8, 2011 at 7:16 pm
Nope, that should do it (although I haven't tried that
exact set of steps). But you do have to reindex
from scratch....
Best
Erick
On Fri, Jul 8, 2011 at 1:36 PM, Christopher Cato wrote:
Thanks for that pointer, that's really more what I want to do. And actually, EdgeNGrams have been stuck somewhere in the back of my head :) Yes, simple at first thought but not as easy to implement, as I have discovered.
Well, so how do I implement something like this? I took the fieldtype declaration from that blog post and added it to my schema.xml within the fieldtypes part.
So, if I get it all correctly, all I have to do now is add a new field with the newly added fieldtype, add a copyField from the original title field, change the query to use the new field, and restart / reindex. Or am I missing something?
//Christopher
On 8 Jul 2011 at 18:59, Erick Erickson wrote:
Yeah, the analysis page takes a bit of getting used to, but it's well
worth the time. Be sure to check the "verbose" box. Taking some time
to understand what it's telling you is one of the best investments
you'll make.
Your "parts of words" is the issue. One approach is to use ngrams or
edgengrams. Here's a writeup about edgengrams from Lucid:
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
it's written for autosuggest, but you get the idea. If "partial" words
could be not at the start then ngrams are a possibility....
Your problem is one of those
conceptually-simple-but-annoyingly-difficult-to-implement
ones that takes far longer to fully understand/implement than
it seems like it should.
Best
Erick
Christopher Cato at Jul 8, 2011 at 8:20 pm
And don't you know, that EdgeNGram analyzer did the trick. Added the fieldtype, added a new field based on it, copyfielded the old title to it, reindexed and hey - it works brilliantly :)
And you were right, the analysis output does make sense once it actually matches something :D
Thanks a million!
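For anyone landing on this thread later, the steps described above could look roughly like the following in schema.xml. This is a sketch under assumptions, not the exact declaration from the blog post: the names `text_edge` and `title_edge` and the gram sizes are invented for illustration.

```xml
<!-- Hypothetical names: text_edge, title_edge; gram sizes are illustrative -->
<fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "technocrane" is indexed as t, te, tec, ... up to technocrane -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <!-- Query terms are only lowercased, so "techn" matches the indexed gram "techn" -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- New field fed from the original title via copyField -->
<field name="title_edge" type="text_edge" indexed="true" stored="false"/>
<copyField source="title" dest="title_edge"/>
```

With a setup like this, a plain query such as `title_edge:(super AND techn)` needs no wildcards at all. As Erick notes, a schema change of this kind only takes effect after a full reindex.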
Best regards,
Christopher Cato
Head of Technology
-----------------------------------
MiniMedia
Phone: +46761927603
www.minimedia.se
Discussion Overview
| group | solr-user |
| categories | lucene |
| posted | Jul 7, '11 at 4:22p |
| active | Jul 8, '11 at 8:20p |
| posts | 9 |
| users | 3 |
| website | lucene.apache.org... |
