FAQ
Hi all,

I've been having some problems using RangeQuery. I have a simple Query which
is essentially "document.field < AB". Field values are:

"" // Empty string
"A SPACE"
"A123456"
"ABC"

Now I expected to find the first three of the four values (and I do with
another commercial search engine product I've worked with). With Lucene I
get nothing. Part of the problem I think is that there are some issues with
case here. Changing my query to "document.field < ab" returns:

"A123456"

Now I would have expected "A SPACE" to get returned, and I was really
surprised that "" wasn't returned. I'm guessing that "" wasn't returned
because no term in the field passed the query criteria, and empty string is
not considered a term.

How should I go about getting what I expect? What is going on here?

Thanks much,

James





--
To unsubscribe, e-mail:
For additional commands, e-mail:

Search Discussions

  • James Ricci at Jun 6, 2002 at 11:41 pm
    I'm replying to my own message because I think I now understand the problem,
    and part of it is, in my opinion, a bad implementation of RangedQuery.

    When you create a ranged query and omit the lower term, my expectation would
    be that I would find everything less than the upper term. Now if I pass
    false for the inclusive term, then I would expect that I would find all
    terms less than the upper term excluding the upper term itself.

    What is happening in the case of lower_term=null, upper_term=x,
    inclusive=false is that empty strings are being excluded because inclusive
    is set false, and the implementation of RangedQuery creates a default lower
    term of Term(fieldName, ""). Since it's not inclusive, it excludes "". This
    isn't what I intended, and I don't think it's what most people would imagine
    RangedQuery would do in the case I've mentioned.

    I equate lower=null, upper=x, inclusive=false to Field < x. lower=null,
    upper=x, inclusive=true would be Field <= x. In both cases, the only
    difference should be whether or not Field = x is true for the query.

    I'm still quite new to Lucene, so maybe I'm wrong about all this because I
    just don't understand it well enough. If so, could someone tell me where
    I've gone astray?

    Thanks much,

    James

    PS: The rest of the problems I had below I was able to fix by changing how
    the fields were tokenized and indexed.
    -----Original Message-----
    From: James Ricci
    Sent: Thursday, June 06, 2002 11:16 AM
    To: 'lucene-user@jakarta.apache.org'
    Subject: Question about RangeQuery and strings...

    Hi all,

    I've been having some problems using RangeQuery. I have a simple Query
    which is essentially "document.field < AB". Field values are:

    "" // Empty string
    "A SPACE"
    "A123456"
    "ABC"

    Now I expected to find the first three of the four values (and I do with
    another commercial search engine product I've worked with). With Lucene I
    get nothing. Part of the problem I think is that there are some issues
    with case here. Changing my query to "document.field < ab" returns:

    "A123456"

    Now I would have expected "A SPACE" to get returned, and I was really
    surprised that "" wasn't returned. I'm guessing that "" wasn't returned
    because no term in the field passed the query criteria, and empty string
    is not considered a term.

    How should I go about getting what I expect? What is going on here?

    Thanks much,

    James


    --
    To unsubscribe, e-mail:
    For additional commands, e-mail:
  • Otis Gospodnetic at Jun 11, 2002 at 2:59 pm
    James,

    I haven't used RangeQueries, but what you describe does sound confusing
    to me. I'll enter it as a bug, just so this information doesn't get
    lost, because I am not certain that this is really a bug, even though
    it sounds like one to me.

    Thanks,
    Otis


    --- James Ricci wrote:
    I'm replying to my own message because I think I now understand the
    problem,
    and part of it is, in my opinion, a bad implementation of
    RangedQuery.

    When you create a ranged query and omit the lower term, my
    expectation would
    be that I would find everything less than the upper term. Now if I
    pass
    false for the inclusive term, then I would expect that I would find
    all
    terms less than the upper term excluding the upper term itself.

    What is happening in the case of lower_term=null, upper_term=x,
    inclusive=false is that empty strings are being excluded because
    inclusive
    is set false, and the implementation of RangedQuery creates a default
    lower
    term of Term(fieldName, ""). Since it's not inclusive, it excludes
    "". This
    isn't what I intended, and I don't think it's what most people would
    imagine
    RangedQuery would do in the case I've mentioned.

    I equate lower=null, upper=x, inclusive=false to Field < x.
    lower=null,
    upper=x, inclusive=true would be Field <= x. In both cases, the only
    difference should be whether or not Field = x is true for the query.

    I'm still quite new to Lucene, so maybe I'm wrong about all this
    because I
    just don't understand it well enough. If so, could someone tell me
    where
    I've gone astray?

    Thanks much,

    James

    PS: The rest of the problems I had below I was able to fix by
    changing how
    the fields were tokenized and indexed.
    -----Original Message-----
    From: James Ricci
    Sent: Thursday, June 06, 2002 11:16 AM
    To: 'lucene-user@jakarta.apache.org'
    Subject: Question about RangeQuery and strings...

    Hi all,

    I've been having some problems using RangeQuery. I have a simple Query
    which is essentially "document.field < AB". Field values are:

    "" // Empty string
    "A SPACE"
    "A123456"
    "ABC"

    Now I expected to find the first three of the four values (and I do with
    another commercial search engine product I've worked with). With Lucene I
    get nothing. Part of the problem I think is that there are some issues
    with case here. Changing my query to "document.field < ab" returns:

    "A123456"

    Now I would have expected "A SPACE" to get returned, and I was really
    surprised that "" wasn't returned. I'm guessing that "" wasn't returned
    because no term in the field passed the query criteria, and empty string
    is not considered a term.

    How should I go about getting what I expect? What is going on here?

    Thanks much,

    James


    --
    To unsubscribe, e-mail:
    For additional commands, e-mail:

    __________________________________________________
    Do You Yahoo!?
    Yahoo! - Official partner of 2002 FIFA World Cup
    http://fifaworldcup.yahoo.com

    --
    To unsubscribe, e-mail:
    For additional commands, e-mail:

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedJun 6, '02 at 3:17p
activeJun 11, '02 at 2:59p
posts3
users2
websitelucene.apache.org

2 users in discussion

James Ricci: 2 posts Otis Gospodnetic: 1 post

People

Translate

site design / logo © 2022 Grokbase