Hi All

I have a requirement to implement queries that involve phrase proximity.
For example, a user should be able to search "ab cd" w/5 "de fg", where both
phrases, each as a whole, should be within 5 words of each other. For this I
implemented a query parser that makes use of nested span queries, so the above
query would be parsed as

spanNear([spanNear([Contents:ab, Contents:cd], 0, true),
spanNear([Contents:de, Contents:fg], 0, true)], 5, false)
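
As a rough illustration of what the nested query above is meant to express, here is a small plain-Python model of span matching over token positions. It is not Lucene code: the function names (`term_spans`, `span_near`, `phrase`, `phrases_within`) are hypothetical, and the slop rule is a simplified assumption (unmatched positions between non-overlapping spans must total at most `slop`).

```python
from itertools import product


def term_spans(tokens, term):
    """Return (start, end) spans, end exclusive, for each occurrence of term."""
    return [(i, i + 1) for i, tok in enumerate(tokens) if tok == term]


def span_near(clause_spans, slop, in_order):
    """Combine one span per clause when the gaps between them total <= slop.

    Simplified model: combinations with overlapping spans are skipped.
    """
    matches = set()
    for combo in product(*clause_spans):
        ordered = sorted(combo)
        if in_order and list(combo) != ordered:
            continue  # clauses must appear in the order given
        if any(a[1] > b[0] for a, b in zip(ordered, ordered[1:])):
            continue  # overlapping spans are not handled by this model
        start, end = ordered[0][0], ordered[-1][1]
        covered = sum(e - s for s, e in combo)
        if (end - start) - covered <= slop:
            matches.add((start, end))
    return sorted(matches)


def phrase(tokens, words):
    """Exact phrase: ordered span-near with slop 0 over the individual terms."""
    return span_near([term_spans(tokens, w) for w in words], 0, True)


def phrases_within(tokens, left, right, distance):
    """Both phrases, each as a whole, within `distance` words, in any order."""
    return span_near([phrase(tokens, left), phrase(tokens, right)],
                     distance, False)


tokens = "ab cd x y z de fg".split()
print(phrases_within(tokens, ["ab", "cd"], ["de", "fg"], 5))  # → [(0, 7)]
```

In this model the two phrases are three positions apart, so a distance of 5 matches and a distance of 2 does not. Real span scoring and matching in Lucene is more involved, so this only sketches the intended semantics.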

Queries like this seem to work really well when the phrases are short, but
when the phrases are long this doesn't work. Now my question: is there any
limitation of SpanNearQuery that prevents handling long phrases in this
way?

Please help.

Regards
Ahsan


  • Otis Gospodnetic at Feb 23, 2011 at 6:50 pm
    Hi,

    What do you mean by "this doesn't work fine"? Does it not work correctly or is
    it slow or ...

    I was going to suggest you look at Surround QP, but it looks like you already
    did that. Wouldn't it be better to get Surround QP to work?

    Otis
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
    Lucene ecosystem search :: http://search-lucene.com/


    ----- Original Message ----
    From: Ahsan |qbal <ahsan.iqbal023@gmail.com>
    To: solr-user@lucene.apache.org
    Sent: Tue, February 22, 2011 10:59:26 AM
    Subject: Question about Nested Span Near Query

  • Ahsan |qbal at Feb 24, 2011 at 5:26 am
    Hi

    It didn't search (meaning no results were found even though results exist).
    One observation is that it works well even with long phrases, but when a
    long phrase contains stop words and the same stop word occurs two or more
    times in the phrase, Solr can't find it with the query parsed in this way.
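
To check whether a failing phrase really contains a repeated term, a short helper like the one below can list each token's positions. This is a sketch in plain Python, assuming simple lowercasing and whitespace splitting rather than the analyzer actually configured in Solr; the function names are hypothetical, and the sample phrase is the one quoted later in this thread.

```python
from collections import defaultdict


def positions_by_term(text):
    """Map each lowercased, whitespace-separated token to its positions."""
    positions = defaultdict(list)
    for pos, token in enumerate(text.lower().split()):
        positions[token].append(pos)
    return dict(positions)


def repeated_terms(text):
    """Keep only the terms that occur at more than one position."""
    return {t: p for t, p in positions_by_term(text).items() if len(p) > 1}


PHRASE = ("evaluation of loan and lease portfolios for purposes "
          "of assessing the adequacy of")
print(repeated_terms(PHRASE))  # → {'of': [1, 8, 12]}
```

Here only "of" repeats, at three positions, which fits the observation about repeated stop words.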

  • Ahsan |qbal at Feb 24, 2011 at 2:27 pm
    Hi

    To narrow down the issue, I indexed a single document containing one of the
    sample queries (given below) that was causing the issue.

    *"evaluation of loan and lease portfolios for purposes of assessing the
    adequacy of" *

    Now when I perform a search query (*TextContents:"evaluation of loan and
    lease portfolios for purposes of assessing the adequacy of"*) the parsed
    query is

    *spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([Contents:evaluation,
    Contents:of], 0, true), Contents:loan], 0, true), Contents:and], 0, true),
    Contents:lease], 0, true), Contents:portfolios], 0, true), Contents:for], 0,
    true), Contents:purposes], 0, true), Contents:of], 0, true),
    Contents:assessing], 0, true), Contents:the], 0, true), Contents:adequacy],
    0, true), Contents:of], 0, true)*

    and the search is not successful.
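
The parsed query above folds the phrase pairwise, so a 13-term phrase ends up 12 levels deep. Lucene's SpanNearQuery accepts an array of clauses, so the same phrase could also be expressed as one flat span-near over all terms. The sketch below only builds the two string forms for comparison, in plain Python with hypothetical helper names; whether the flat form avoids the failure described here is an untested assumption.

```python
def span_term(field, term):
    """String form of a single span term, e.g. Contents:loan."""
    return f"{field}:{term}"


def span_near(clauses, slop, in_order):
    """String form of a span-near over already formatted clauses."""
    return f"spanNear([{', '.join(clauses)}], {slop}, {str(in_order).lower()})"


def left_nested_phrase(field, terms):
    """Fold the terms pairwise, as in the parsed query above."""
    query = span_term(field, terms[0])
    for term in terms[1:]:
        query = span_near([query, span_term(field, term)], 0, True)
    return query


def flat_phrase(field, terms):
    """One span-near holding every term as a direct clause."""
    return span_near([span_term(field, t) for t in terms], 0, True)


terms = ("evaluation of loan and lease portfolios for purposes "
         "of assessing the adequacy of").split()
print(left_nested_phrase("Contents", terms).count("spanNear("))  # → 12
print(flat_phrase("Contents", terms).count("spanNear("))         # → 1
```

Comparing the results of both shapes on the single-document index could help tell whether the nesting itself is involved.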

    If I remove '*evaluation*' from the start OR '*assessing the adequacy of*'
    from the end, it works fine. The issue seems to occur on relatively long
    phrases, but I have not been able to find a pattern, and it's really
    mind-boggling because I thought this issue might be due to a large position
    list, but this is a single document with one phrase. So it's definitely not
    related to the size of the index.

    Any ideas what's going on?
  • Bill Bell at Feb 24, 2011 at 3:25 pm
    Send schema and document in XML format and I'll look at it

    Bill Bell
    Sent from mobile

  • Ahsan |qbal at Feb 24, 2011 at 3:58 pm
    Hi

    Schema and document are attached.
  • Ahsan |qbal at Feb 28, 2011 at 9:39 am
    Hi Bill

    Any update?
  • William Bell at Mar 2, 2011 at 2:06 am
    I am not 100% sure, but why did you not use the standard config for "text"?

    <fieldType name="text" class="solr.TextField"
               positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory"
                synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
             add enablePositionIncrements=true in both the index and query
             analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>
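
The comment in the config above says that `enablePositionIncrements="true"` leaves a 'gap' where a stop word was removed. A small plain-Python model of that behaviour is sketched below; it assumes lowercasing and whitespace splitting only, the function name `analyze` is hypothetical, and it is not the Solr analysis chain itself.

```python
def analyze(text, stopwords, enable_position_increments=True):
    """Return (term, position) pairs after lowercasing and stop word removal.

    With enable_position_increments=True a removed stop word still consumes
    a position, leaving a gap; with False the surviving terms close up.
    """
    result = []
    position = -1
    for token in text.lower().split():
        if token in stopwords:
            if enable_position_increments:
                position += 1  # keep the hole left by the stop word
            continue
        position += 1
        result.append((token, position))
    return result


print(analyze("evaluation of loan", {"of"}))
# → [('evaluation', 0), ('loan', 2)]
print(analyze("evaluation of loan", {"of"}, enable_position_increments=False))
# → [('evaluation', 0), ('loan', 1)]
```

In this model, with gaps enabled, "evaluation" and "loan" are no longer adjacent, so an ordered span-near with slop 0 built only from the surviving terms would need those positions taken into account to match.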


    You are using:

    <fieldtype name="text" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"
                   luceneMatchVersion="LUCENE_29" />
        <filter class="solr.StandardFilterFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <!--
        <filter class="solr.StopFilterFactory" luceneMatchVersion="LUCENE_29"/>
        <filter class="solr.EnglishPorterFilterFactory"/>
        -->
      </analyzer>
    </fieldtype>


    Can you try a more standard approach?

    solr.WhitespaceTokenizerFactory
    solr.LowerCaseFilterFactory

    ??

    Thanks.


Discussion Overview
group: solr-user
categories: lucene
posted: Feb 22, '11 at 4:00p
active: Mar 2, '11 at 2:06a
posts: 8
users: 3
website: lucene.apache.org...
