[Solr-user] Need help with troublesome wildcard query

Christopher Cato
Jul 7, 2011 at 4:22 pm
Hi, I'm running Solr 3.2 with edismax under Tomcat 6 via Drupal.

I'm having some problems writing a query that matches a specific field on several words. I have implemented an AJAX search that basically takes whatever is in a form field and attempts to match documents. I'm not having much luck though. The first word always matches correctly, but as soon as I enter the second word I'm losing matches, and the third word doesn't give any matches at all.

The title field that I'm searching contains a product name that may or may not have several words.

The requirement is that the search should be progressive, i.e. as the user inputs words I should always return results that contain all of the words entered. I also have to correct bad input like an erroneous space in the product name, e.g. "product name" instead of "productname".

I'm wondering if there isn't an easier way to query Solr? Ideally I'd want to say "give me all docs that have the following text in their titles". Is that possible?


I'd really appreciate any help!


Regards,
Christopher Cato
8 responses

  • Briggs Thompson at Jul 7, 2011 at 9:17 pm
    Hello Christopher,

    Can you provide the exact query sent to Solr for the one word query and also
    the two word query? The field type definition for your title field would be
    useful too.
    From what I understand, Solr should be able to handle your use case. I am
    guessing it is a problem with how the field is defined assuming the query is
    correct.

    Briggs Thompson
  • Christopher Cato at Jul 8, 2011 at 1:04 pm
    Hi Briggs. Thanks for taking the time. I have the query nearly working now. Currently, this is how it looks when it matches on the title "Super Technocrane 30" and others with similar names:

    INFO: [] webapp=/solr path=/select/ params={qf=title^40.0&hl.fl=title&wt=json&rows=10&fl=*,score&start=0&q=(title:*super*+AND+*technocran*)+OR+(title:*super*+AND+*technocran)&qt=standard&fq=type:product+AND+language:sv} hits=3 status=0 QTime=1

    Adding another letter stops it matching:

    INFO: [] webapp=/solr path=/select/ params={qf=title^40.0&hl.fl=title&wt=json&rows=10&fl=*,score&start=0&q=(title:*super*+AND+*technocrane*)+OR+(title:*super*+AND+*technocrane)&qt=standard&fq=type:product+AND+language:sv} hits=0 status=0 QTime=0

    The field type definitions are as follows:

    <field name="title" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
             add enablePositionIncrements=true in both the index and query
             analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="1"
                catenateNumbers="1"
                catenateAll="0"
                splitOnCaseChange="1"
                preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="0"
                catenateNumbers="0"
                catenateAll="0"
                splitOnCaseChange="1"
                preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>


    There is also a field type called text_ws. Should I use that instead, changing text to text_ws in the field definition for title?

    <!-- A text field that only splits on whitespace for exact matching of words -->
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>




    Regards,

    Christopher Cato
    Technical Manager
    -----------------------------------
    MiniMedia
    Phone: +46761927603
    www.minimedia.se

  • Briggs Thompson at Jul 8, 2011 at 2:58 pm
    Hey Chris,
    Removing the ORs in each query might help narrow down the problem, but I
    suggest you run this through the query analyzer to see where the match is
    dropping out. It is a great tool for troubleshooting issues like these.

    I see a few things here.

    - For leading wildcard queries, you should include the
    ReversedWildcardFilterFactory. Check out the documentation here:
    http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ReversedWildcardFilterFactory
    - Your results might be dropping out because you are running wildcard
    searches against a stemmed field. Wildcard terms are not run through the
    analyzer, so they are not stemmed: if you index "computers", it may be
    stemmed to "comput", in which case a wildcard query of "computer*" would
    not match.
    - If you want to support both stemming and wildcard searches, I suggest
    creating a copy field with an un-stemmed field type definition.

    Don't forget if you modify your field type definition, you need to
    re-index.
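
    A minimal sketch of the un-stemmed copy field suggested above, assuming a
    schema.xml like the one posted earlier in the thread. The names
    text_unstemmed and title_unstemmed are hypothetical, and the
    ReversedWildcardFilterFactory attribute values are illustrative rather
    than tuned for this index. If the English stemmer indexes "technocrane"
    as "technocran" (an assumption the analysis page would confirm), that
    would be consistent with the logs, where *technocran* matches and
    *technocrane* does not.

    ```xml
    <!-- Hypothetical un-stemmed type: same tokenizer as "text", but no stemmer,
         so the indexed terms keep their full spelling for wildcard matching. -->
    <fieldType name="text_unstemmed" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Index-time only: also indexes reversed tokens so that queries with a
             leading wildcard can be rewritten into cheaper prefix-style lookups. -->
        <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
                maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Hypothetical search-only copy of title; the original field keeps its stemming. -->
    <field name="title_unstemmed" type="text_unstemmed" indexed="true" stored="false"/>
    <copyField source="title" dest="title_unstemmed"/>
    ```

    Because wildcard terms bypass the analyzer in this Solr version, the
    client would likely need to lowercase them itself before querying
    title_unstemmed.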

    In response to your question about text_ws, this is just a different field
    type definition that essentially splits on whitespace. You should use it
    only if that is the search logic you want, but it probably isn't. Check
    out the documentation on each of the tokenizers and filter factories in your
    "text" field type and see which ones you need, and which you don't, to
    satisfy your use cases.

    Hope that helps,
    Briggs Thompson

  • Christopher Cato at Jul 8, 2011 at 4:45 pm
    Hi Briggs, thanks for being patient with me!

    Yeah, I saw I had a typo there in the OR clause. I fixed it, but the results still aren't perfect.
    I'm looking at the analysis.jsp page and can't really figure it out. I'm feeling a bit overwhelmed by all the output. I also don't know how to check whether stemming is used for the title field.

    Theoretically, given the field type I'm using and also given that "super technocrane 30" is the title of one of the docs - how would one write the query so that it finds that doc if the user searches for "super techn" or "super technocrane"? Right now it stops matching in the middle of the word "technocrane" or rather after the "r".

    Darnit, I just want to return all docs that contain the search terms either as whole words or parts of words.
    Is it possible?

    Regards,
    Christopher

  • Erick Erickson at Jul 8, 2011 at 5:00 pm
    Yeah, the analysis page takes a bit of getting used to, but it's well
    worth the time. Be sure to check the "verbose" box. Taking some time
    to understand what it's telling you is one of the best investments
    you'll make.

    Your "parts of words" requirement is the issue. One approach is to use
    ngrams or edgengrams. Here's a writeup about edgengrams from Lucid:
    http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

    It's written for autosuggest, but you get the idea. If the "partial" words
    might not be at the start of a word, then ngrams are a possibility....
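
    As a rough illustration of the edge n-gram approach, here is a minimal
    schema.xml sketch. The type name text_edge and the gram sizes are
    assumptions made for this example; they are not taken from the blog post
    or from this thread. Grams are produced at index time only, so an indexed
    token such as "technocrane" also yields "t", "te", ... "techn", and so on,
    and a plain (non-wildcard) query term like "techn" can then match it.

    ```xml
    <!-- Hypothetical prefix-matching type: edge n-grams at index time only. -->
    <fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Emits every leading substring of each token, 1 to 25 characters long. -->
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
      </analyzer>
      <analyzer type="query">
        <!-- No n-gram filter here: the typed text is matched as-is against the indexed grams. -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    ```

    For matches in the middle of a word, solr.NGramFilterFactory (which takes
    the same minGramSize and maxGramSize attributes) could be swapped in for
    the edge variant, at the cost of a noticeably larger index.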

    Your problem is one of those
    conceptually-simple-but-annoyingly-difficult-to-implement
    ones that takes far longer to fully understand/implement than
    it seems like it should.

    Best
    Erick

  • Christopher Cato at Jul 8, 2011 at 5:37 pm
    Thanks for that pointer, that's really more what I want to do. And actually, EdgeNGrams is stuck somewhere in the back of my head :) Yes, simple at first thought but not as easy to implement as I have discovered.

    Well, so how do I implement something like this? I took the fieldType declaration from that blog post and added it to my schema.xml within the fieldTypes part.

    So, if I understand it correctly, all I have to do now is add a new field with the newly added fieldType, add a copyField from the original title field, change the query to use the new field, and restart / reindex. Or am I missing something?
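
    The steps above can be sketched as a schema.xml fragment. This is only a sketch under assumptions: the names text_edge and title_edge and the gram sizes are hypothetical, chosen for illustration, and are not taken from the blog post or from the actual schema.

    ```xml
    <!-- Goes with the other field types. "text_edge" is a hypothetical name. -->
    <fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Index every prefix of each word: t, te, tec, ... technocrane -->
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
      </analyzer>
      <analyzer type="query">
        <!-- No n-gramming at query time: the typed text is matched as-is -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Goes with the other fields. Not stored, because "title" already holds the text. -->
    <field name="title_edge" type="text_edge" indexed="true" stored="false"/>

    <!-- Goes next to the other copyField declarations. -->
    <copyField source="title" dest="title_edge"/>
    ```

    With something like this in place, a query such as title_edge:(super AND techn) should need no wildcards, because the prefix "techn" is itself an indexed term. Words longer than maxGramSize would not be indexed in full by this sketch, so the limit may need adjusting.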

    //Christopher


    On 8 Jul 2011 at 18:59, Erick Erickson wrote:
    Yeah, the analysis page takes a bit of getting used to, but it's well
    worth the time. Be sure to check the "verbose" box. Taking some time
    to understand what it's telling you is one of the best investments
    you'll make.

    Your "parts of words" is the issue. One approach is to use ngrams or
    edgengrams. Here's a writeup about edgengrams from Lucid:
    http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

    It's written for autosuggest, but you get the idea. If the "partial" match
    might not start at the beginning of a word, then ngrams are a possibility....
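
    For the case where the typed fragment can sit anywhere inside a word, a minimal sketch of an n-gram field type might look like the following. The name text_ngram and the gram sizes are assumptions made for illustration; plain n-grams usually grow the index considerably more than edge n-grams do.

    ```xml
    <!-- "text_ngram" is a hypothetical name; gram sizes are illustrative only. -->
    <fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Index every substring of 2 to 15 characters, so "nocran" also matches "technocrane" -->
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
      </analyzer>
      <analyzer type="query">
        <!-- The query text is only lowercased, then looked up among the indexed grams -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    ```

    In this sketch, query terms shorter than minGramSize or longer than maxGramSize would find nothing, so the two limits have to be chosen with the expected input in mind.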

    Your problem is one of those
    conceptually-simple-but-annoyingly-difficult-to-implement
    ones that takes far longer to fully understand/implement than
    it seems like it should.

    Best
    Erick

    On Fri, Jul 8, 2011 at 12:44 PM, Christopher Cato wrote:
    Hi Briggs, thanks for being patient with me!

    Yeah, I saw I had a typo there in the OR clause. Fixed it but still no perfect results.
    I'm looking at the analysis.jsp page and can't really figure it out. Feeling a bit overwhelmed by all the output. I also don't know how to check if stemming is used for the title field.

    Theoretically, given the field type I'm using and also given that "super technocrane 30" is the title of one of the docs - how would one write the query so that it finds that doc if the user searches for "super techn" or "super technocrane"? Right now it stops matching in the middle of the word "technocrane" or rather after the "r".

    Darnit, I just want to return all docs that contain the search terms either as whole words or parts of words.
    Is it possible?

    Regards,
    Christopher

    On 8 Jul 2011 at 16:57, Briggs Thompson wrote:
    Hey Chris,
    Removing the ORs in each query might help narrow down the problem, but I
    suggest you run this through the query analyzer in order to see where it is
    dropping out. It is a great tool for troubleshooting issues like these.

    I see a few things here.

    - For leading wildcard queries, you should include the
    ReversedWildcardFilterFactory. Check out the documentation here:
    http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ReversedWildcardFilterFactory
    - Your result might get dropped because you are trying to do wildcard
    searches on a stemmed field. Wildcard searches on a stemmed field are
    counter-intuitive: if you index "computers", it may stem to "comput",
    in which case a wildcard query of "computer*" would not match.
    - If you want to support both stemming and wildcard searches, I suggest
    creating a copy field with an un-stemmed field type definition.
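
    The last point could look roughly like the fragment below: a copy of the title analyzed without the stemmer, so that wildcard terms are compared against unstemmed tokens. The names text_unstemmed and title_unstemmed are hypothetical, and the filter chain is deliberately reduced to a minimum.

    ```xml
    <!-- "text_unstemmed" and "title_unstemmed" are hypothetical names. -->
    <fieldType name="text_unstemmed" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- No SnowballPorterFilterFactory here, so "technocrane" is indexed unchanged -->
      </analyzer>
    </fieldType>

    <field name="title_unstemmed" type="text_unstemmed" indexed="true" stored="false"/>
    <copyField source="title" dest="title_unstemmed"/>
    ```

    A wildcard query would then target the copy, for example title_unstemmed:technocrane*. As far as I know, wildcard terms are not run through the analyzer in this Solr version, so the client would probably have to lowercase them itself.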

    Don't forget if you modify your field type definition, you need to
    re-index.

    In response to your question about text_ws, this is just a different field
    type definition that essentially splits on whitespace. You should use that
    if that is what the desired search logic is, but it probably isn't. Check
    out the documentation on each of the tokenizers and filter factories in your
    "text" field type and see what you need and what you don't to satisfy your
    use cases.

    Hope that helps,
    Briggs Thompson


    On Fri, Jul 8, 2011 at 9:03 AM, Christopher Cato <
    christopher.cato@minimedia.se> wrote:
    Hi Briggs. Thanks for taking the time. I have the query nearly working now,
    currently this is how it looks when it matches on the title "Super
    Technocrane 30" and others with similar names:

    INFO: [] webapp=/solr path=/select/
    params={qf=title^40.0&hl.fl=title&wt=json&rows=10&fl=*,score&start=0&q=(title:*super*+AND+*technocran*)+OR+(title:*super*+AND+*technocran)&qt=standard&fq=type:product+AND+language:sv}
    hits=3 status=0 QTime=1

    Adding another letter stops it matching:

    INFO: [] webapp=/solr path=/select/
    params={qf=title^40.0&hl.fl=title&wt=json&rows=10&fl=*,score&start=0&q=(title:*super*+AND+*technocrane*)+OR+(title:*super*+AND+*technocrane)&qt=standard&fq=type:product+AND+language:sv}
    hits=0 status=0 QTime=0

    The field type definitions are as follows:

    <field name="title" type="text" indexed="true" stored="true"
           termVectors="true" omitNorms="true"/>

    <fieldType name="text" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory"
                    mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory"
                synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
             add enablePositionIncrements=true in both the index and query
             analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="1"
                catenateNumbers="1"
                catenateAll="0"
                splitOnCaseChange="1"
                preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"
                protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory"
                    mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="0"
                catenateNumbers="0"
                catenateAll="0"
                splitOnCaseChange="1"
                preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"
                protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>


    There is also a type definition that is called text_ws, should I use that
    instead and change text to text_ws in the field definition for title?

    <!-- A text field that only splits on whitespace for exact matching of
         words -->
    <fieldType name="text_ws" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>




    Regards,

    Christopher Cato
    Technical Manager
    -----------------------------------
    MiniMedia
    Phone: +46761927603
    www.minimedia.se

  • Erick Erickson at Jul 8, 2011 at 7:16 pm
    Nope, that should do it (although I haven't tried that
    exact set of steps). But you do have to reindex
    from scratch....


    Best
    Erick

  • Christopher Cato at Jul 8, 2011 at 8:20 pm
    And don't you know, that EdgeNGram analyzer did the trick. Added the fieldtype, added a new field based on it, copyfielded the old title to it, reindexed and hey - it works brilliantly :)

    And you were right, the analysis output does make sense once it actually matches something :D

    Thanks a million!


    Regards,

    Christopher Cato
    Technical Manager
    -----------------------------------
    MiniMedia
    Phone: +46761927603
    www.minimedia.se

