FAQ
According to the documentation for Lucene's Query Parser Syntax, the
tilde operator provides a proximity search.

For instance, "Harry Hallows"~6 should match the text 'Harry Potter
and the Deathly Hallows'.

And while this is fine for two token phrases, I was wondering if it
worked well for subexpressions:

( ( "alien" "ufo" "crash site" ) ( "Roswell" "NM" "New Mexico" ) ) ~10

Will this match "alien ship crashed in Roswell" in addition to "ufo
spotted hovering over New Mexico"?

I've been searching through the Lucene groups
(http://www.gossamer-threads.com/lists/lucene/java-user/) for span and
proximity, but have come up with nothing about such syntax or
capability.

Additionally, I tried entering expressions containing a tilde into
Luke and then doing an explain structure.

Surprisingly, I didn't see a difference between a simple case like "A
B" and "A B"~10 in the explanation.

Is this a problem with Luke, or did I possibly miss something trivial?

Anyhow, the more important question is: Is it possible to do proximity
searches on lists of terms?

Thanks,
-Walt Stoneburner, <[email protected]>
http://www.wwco.com/~wls/blog/

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Search Discussions

  • Mark Miller at May 9, 2007 at 4:04 pm
    The ~ syntax can only be applied to a single phrase, i.e. "the greate
    one"~3. This sets the slop allowed for the phrase to be 3. The defintion of
    slop can be found by searching the archive, but it will not easily allow for
    what you are looking for. The Lucene query language has not yet recived Span
    systax support. You might want to check out Surround in contrib which is an
    alternate syntax with support for Spans. I also have a parser that adds
    proximity support...I am currently sitting on version 1.0 waiting for some
    free time to release. My syntax consists of binary operators including a
    proximity operator. Precedence works as expected (being a binary op sytax)
    and a search for (alien ufo crash site ) ~10 (Roswell NM New Mexico ) would
    make sure that each term on the left is within 10 of each term on the right
    (if you default space operator was AND).

    -Mark

    On 5/9/07, Walt Stoneburner wrote:

    According to the documentation for Lucene's Query Parser Syntax, the
    tilde operator provides a proximity search.

    For instance, "Harry Hallows"~6 should match the text 'Harry Potter
    and the Deathly Hallows'.

    And while this is fine for two token phrases, I was wondering if it
    worked well for subexpressions:

    ( ( "alien" "ufo" "crash site" ) ( "Roswell" "NM" "New Mexico" ) ) ~10

    Will this match "alien ship crashed in Roswell" in addition to "ufo
    spotted hovering over New Mexico"?

    I've been searching through the Lucene groups
    (http://www.gossamer-threads.com/lists/lucene/java-user/) for span and
    proximity, but have come up with nothing about such syntax or
    capability.

    Additionally, I tried entering expressions containing a tilde into
    Luke and then doing an explain structure.

    Surprisingly, I didn't see a difference between a simple case like "A
    B" and "A B"~10 in the explanation.

    Is this a problem with Luke, or did I possibly miss something trivial?

    Anyhow, the more important question is: Is it possible to do proximity
    searches on lists of terms?

    Thanks,
    -Walt Stoneburner, <[email protected]>
    http://www.wwco.com/~wls/blog/

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: [email protected]
    For additional commands, e-mail: [email protected]

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedMay 9, '07 at 1:18p
activeMay 9, '07 at 4:04p
posts2
users2
websitelucene.apache.org

2 users in discussion

Mark Miller: 1 post Walt Stoneburner: 1 post

People

Translate

site design / logo © 2023 Grokbase