Hi all,

I read postings about searching for empty fields but did not find any cases of a successful search using the query language syntax itself (-myField:[* TO *], for example). I saw that other techniques, such as using a filter, were used to get around this query-string limitation. Given that the latest postings on this topic are a few years old, I am wondering whether there have been any changes in Lucene query syntax to support searching for empty fields. Has anyone successfully searched for empty fields with recent Lucene releases?

Thanks

Jason

  • Findbestopensource at Jul 15, 2011 at 4:53 am
    Hi Jason,

    The easiest way would be to index a default value, say EMPTY, in the field
    whenever it would otherwise be empty, and then search for that string to
    find the records with an empty field.

    Regards
    Aditya
    www.findbestopensource.com


  • Trejkaz at Jul 15, 2011 at 6:11 am

    On Fri, Jul 15, 2011 at 10:02 AM, Trieu, Jason T wrote:
    > I read postings about searching for empty fields but did not find any
    > cases of a successful search using the query language syntax itself
    > (-myField:[* TO *], for example).

    We have been using: -myField:*

    You would need to call setAllowLeadingWildcard(true) on the QueryParser,
    and it isn't exactly fast, but it works for us.

    The suggestion to use a magic token is a good idea, though I would put
    it in a separate field called "has" or something... so you can do:

    has:title (same results as title:* but quicker to run)

    The crappy thing is that to actually detect if there are any tokens in
    the field you need to make a TokenStream which can be used to read the
    first token and then rewind again. I'm not sure if there is such a
    thing in Lucene at the moment. We had to write it ourselves but we
    were on a considerably older version at the time.

    TX

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
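
The "has" field idea above can be sketched in plain Java. This is only an illustration, not Lucene code: the class `HasFieldSketch` and the method `nonEmptyFields` are hypothetical names, and "non-empty" is approximated as "contains a non-whitespace character", whereas a real analyzer may produce no tokens even for non-blank text. At index time, each returned name would be added as a term of the separate "has" field, so that has:title can stand in for title:*.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HasFieldSketch {
    // Hypothetical helper: returns the names of the fields whose value contains
    // at least one non-whitespace character. This approximates "the field has
    // at least one token"; a real analyzer may tokenize differently.
    public static List<String> nonEmptyFields(Map<String, String> fields) {
        List<String> has = new ArrayList<>();
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            String value = entry.getValue();
            if (value != null && !value.trim().isEmpty()) {
                has.add(entry.getKey());
            }
        }
        return has;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Searching for empty fields");
        doc.put("body", "   ");   // blank: treated as empty
        doc.put("author", null);  // missing: treated as empty
        System.out.println(nonEmptyFields(doc)); // prints [title]
    }
}
```

With such a marker field in place, documents with an empty title would be those that do not match has:title.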
  • Uwe Schindler at Jul 15, 2011 at 6:46 am
    Hi,

    > The crappy thing is that to actually detect if there are any tokens in
    > the field you need to make a TokenStream which can be used to read the
    > first token and then rewind again. I'm not sure if there is such a
    > thing in Lucene at the moment. We had to write it ourselves but we
    > were on a considerably older version at the time.

    CachingTokenFilter plugged over any other TokenStream.

    Uwe


  • Trejkaz at Jul 15, 2011 at 7:00 am

    On Fri, Jul 15, 2011 at 4:45 PM, Uwe Schindler wrote:
    >> The crappy thing is that to actually detect if there are any tokens in
    >> the field you need to make a TokenStream which can be used to read the
    >> first token and then rewind again. I'm not sure if there is such a
    >> thing in Lucene at the moment. We had to write it ourselves but we
    >> were on a considerably older version at the time.
    >
    > CachingTokenFilter plugged over any other TokenStream.

    Ah, quite right. If you can afford the memory it will eat (or if your
    documents are all relatively small), CachingTokenFilter will work. I
    think in our case it caused an OutOfMemoryError for larger character
    streams, which is why we ended up falling back to one which only cached
    the first token.

    TX
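
The "cache only the first token" approach can be sketched as follows. This is a plain-Java illustration under stated assumptions, not the code described in the thread and not a Lucene class: `FirstTokenPeek` is a hypothetical name, and `java.util.Iterator<String>` stands in for a token source, whereas a real Lucene TokenStream has a different API and would need its own wrapper. The point is only that answering "is there at least one token?" requires buffering a single token, so memory use does not grow with the size of the stream.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;

public class FirstTokenPeek implements Iterator<String> {
    private final Iterator<String> source;
    private boolean peeked;        // true once the first token has been requested
    private boolean hasFirst;      // true if the source produced a first token
    private boolean firstPending;  // true while the buffered token is not yet returned
    private String first;          // the single buffered token

    public FirstTokenPeek(Iterator<String> source) {
        this.source = source;
    }

    // Pulls at most one token from the source, exactly once.
    private void peek() {
        if (!peeked) {
            peeked = true;
            if (source.hasNext()) {
                first = source.next();
                hasFirst = true;
                firstPending = true;
            }
        }
    }

    /** Reports whether the source has any token, without losing that token. */
    public boolean hasAnyToken() {
        peek();
        return hasFirst;
    }

    @Override
    public boolean hasNext() {
        peek();
        return firstPending || source.hasNext();
    }

    @Override
    public String next() {
        peek();
        if (firstPending) {
            firstPending = false;
            String token = first;
            first = null; // release the buffered token
            return token;
        }
        return source.next(); // throws NoSuchElementException when exhausted
    }

    public static void main(String[] args) {
        FirstTokenPeek tokens =
                new FirstTokenPeek(Arrays.asList("quick", "brown", "fox").iterator());
        System.out.println(tokens.hasAnyToken()); // prints true
        while (tokens.hasNext()) {
            System.out.println(tokens.next());    // prints quick, brown, fox in order
        }
        FirstTokenPeek empty = new FirstTokenPeek(Collections.<String>emptyIterator());
        System.out.println(empty.hasAnyToken());  // prints false
    }
}
```

The check can be made before indexing to decide whether to add the marker term, and the same wrapper is then consumed as the full token sequence.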


Discussion Overview
group: java-user @
categories: lucene
posted: Jul 15, '11 at 12:03a
active: Jul 15, '11 at 7:00a
posts: 5
users: 4
website: lucene.apache.org
