Hi,

I'm seeing some strange behavior in the way the QueryParser handles
consecutive backslash characters. I know that backslash is the escape
character in Lucene, and so I would expect "\\\\" to match fields that
have two consecutive backslashes, but this does not seem to be the
case.

The fields I'm searching are UNC paths, e.g. "\\192.168.0.15\public".
The only way I can get my query to find the record containing that
value is to type "FieldName:\\\192.168.0.15\\public" (three backslashes).
Why is the third backslash character not treated as an escape? Is it
just that any backslash that is preceded by a backslash is interpreted
as a literal backslash character, regardless of whether the "escape"
backslash was itself escaped?

I can code around this, but it seems inconsistent with the way that
escape characters usually work. Is this a bug, or is it intentional,
or am I missing something?

Thanks,
Jeff

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Erik Hatcher at Jul 20, 2005 at 7:38 pm

    On Jul 19, 2005, at 11:19 AM, Jeff Davis wrote:

    > I can code around this, but it seems inconsistent with the way that
    > escape characters usually work. Is this a bug, or is it intentional,
    > or am I missing something?

    I've waited until I had a chance to experiment with this before
    replying. I say that this is a bug. There is a private method in
    QueryParser called discardEscapeChar (shown below). I copied it to a
    JUnit test case and gave it this assert:

    assertEquals("\\\\\\\\192.168.0.15\\\\public",
                 discardEscapeChar("\\\\192.168.0.15\\\\public"));

    This test fails with:

    Expected:\\\\192.168.0.15\\public
    Actual :\192.168.0.15\public

    Which is wrong in my opinion. (though my head hurts thinking about
    metaescaping backslashes in Java code to make this a proper test)

    The bug is isolated to the discardEscapeChar() method where it eats
    too many backslashes. Could you have a shot at tweaking that method
    to do the right thing and submit a patch?

    private String discardEscapeChar(String input) {
        char[] caSource = input.toCharArray();
        char[] caDest = new char[caSource.length];
        int j = 0;
        for (int i = 0; i < caSource.length; i++) {
            // A backslash is dropped unless the previous character is also a
            // backslash, even when that previous backslash was itself escaped.
            if ((caSource[i] != '\\') || (i > 0 && caSource[i-1] == '\\')) {
                caDest[j++] = caSource[i];
            }
        }
        return new String(caDest, 0, j);
    }

    Erik
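
To see why three backslashes matched in the original report, the method quoted above can be run on its own. The following is a minimal sketch, assuming the method is copied unchanged into a standalone class; the class name and the `main` method are illustrative and not part of Lucene.

```java
// Sketch: the quoted rule in a standalone class, so its behaviour can be
// observed outside QueryParser. Class name and main are illustrative only.
final class OldDiscardEscapeCharDemo {

    // Same logic as the quoted method: a backslash is dropped unless the
    // character before it is also a backslash.
    static String discardEscapeChar(String input) {
        char[] caSource = input.toCharArray();
        char[] caDest = new char[caSource.length];
        int j = 0;
        for (int i = 0; i < caSource.length; i++) {
            if ((caSource[i] != '\\') || (i > 0 && caSource[i - 1] == '\\')) {
                caDest[j++] = caSource[i];
            }
        }
        return new String(caDest, 0, j);
    }

    public static void main(String[] args) {
        // Java literals double every backslash; the trailing comments show
        // the characters the strings actually contain.
        String three = "\\\\\\192.168.0.15\\\\public";   // \\\192.168.0.15\\public
        String four = "\\\\\\\\192.168.0.15\\\\public";  // \\\\192.168.0.15\\public

        // Three leading backslashes come out as two, which is the UNC prefix.
        System.out.println(discardEscapeChar(three));
        // Four leading backslashes come out as three, one more than intended.
        System.out.println(discardEscapeChar(four));
    }
}
```

Only the first backslash of each run is discarded, so a run of n backslashes yields n - 1 rather than n / 2.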


  • Eyal at Jul 20, 2005 at 8:56 pm
    I think this should work:

    (Written in C# originally - so someone please check if it compiles - I don't
    have a java compiler here)

    private String discardEscapeChar(String input)
    {
        char[] caSource = input.toCharArray();
        char[] caDest = new char[caSource.length];
        int j = 0;

        for (int i = 0; i < caSource.length; i++)
        {
            if (caSource[i] == '\\')
            {
                // Skip the escape character; stop if it was the last character.
                if (caSource.length == ++i)
                    break;
            }
            // Copy the current character (the escaped one, if a backslash preceded it).
            caDest[j++] = caSource[i];
        }
        return new String(caDest, 0, j);
    }


    Regarding your unit test, I think it's wrong:
    assertEquals("\\\\\\\\192.168.0.15\\\\public",
                 discardEscapeChar("\\\\192.168.0.15\\\\public"));
    It should be:
    assertEquals("\\\\192.168.0.15\\\\public",
                 discardEscapeChar("\\\\\\\\192.168.0.15\\\\public"));

    I would also suggest adding the following:
    String s = "\\\\some.host.name\\dir+:+-!():^[]{}~*?";
    assertEquals(s, discardEscapeChar(escape(s)));

    Eyal
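
The round-trip check suggested above can be tried without Lucene on the classpath. This is a rough sketch under stated assumptions: `escape` here is a hypothetical stand-in for the `escape` call in that test, its list of special characters is a guess rather than Lucene's actual list, and the unescape loop follows the approach of the proposed fix but uses a `StringBuilder`. The test string is written with plain `{}`, since `\{` is not a valid Java escape sequence.

```java
// Sketch of an escape/unescape round trip. Names are illustrative only.
final class EscapeRoundTripSketch {

    // Hypothetical set of characters to escape; an assumption for this
    // sketch, not the list used by Lucene.
    private static final String SPECIALS = "\\+-!():^[]{}~*?\"";

    // Prefix every special character with a backslash.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Drop each escape backslash and copy the character after it verbatim.
    static String discardEscapeChar(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\') {
                if (++i == input.length()) {
                    break; // a trailing lone backslash is dropped
                }
                c = input.charAt(i);
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "\\\\some.host.name\\dir+:+-!():^[]{}~*?";
        String escaped = escape(s);
        System.out.println(escaped);
        // Reports whether unescaping the escaped text restores the input.
        System.out.println(discardEscapeChar(escaped).equals(s));
    }
}
```

The round trip holds for any input because every backslash in the escaped text is either an escape marker or the character that a marker protects.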
  • Jeff Davis at Jul 20, 2005 at 9:18 pm
    That fix works perfectly, as far as I can tell.

    As for the unit test, it should actually be:
    assertEquals("\\\\192.168.0.15\\public",
                 discardEscapeChar("\\\\\\\\192.168.0.15\\\\public"));

    Jeff
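
The disagreement over the expected value comes from two stacked escaping layers: the Java string literal halves the backslashes once, and the query unescape halves them again. The sketch below counts backslashes at each layer. It uses a regex one-liner as the unescape step, which is a swapped-in technique rather than the loop posted in the thread (unlike that loop, it leaves a trailing lone backslash in place); the class and method names are illustrative.

```java
// Sketch: counting backslashes through the Java-literal and query layers.
final class BackslashLayersSketch {

    // Replace each backslash-plus-character pair with the character alone.
    static String unescape(String queryText) {
        return queryText.replaceAll("\\\\(.)", "$1");
    }

    static int countBackslashes(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\\') {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Layer 1: 12 backslashes in the source literal become 6 at runtime.
        String queryText = "\\\\\\\\192.168.0.15\\\\public";
        // Layer 2: unescaping the query text leaves 3, the UNC path itself.
        String term = unescape(queryText);

        System.out.println(queryText + " -> " + countBackslashes(queryText) + " backslashes");
        System.out.println(term + " -> " + countBackslashes(term) + " backslashes");
    }
}
```

Read this way, the expected literal in the corrected assertion needs six backslashes in source (four, then two), which is the form given in the final reply.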

