Hi all.

I have been using the new query parser framework fairly heavily,
although our use case is largely for *generating* queries rather than
parsing them - the intermediate query nodes happened to be a very good
model for doing this without all the usual nightmares of thinking
about the escape syntax, and without having to think about how each
query is encoded, which is the usual drawback of using Query objects
directly.

But I have some questions.

1. Is it intentional that query nodes do not implement equals()? I
had rather a lot of overhead when writing unit tests due to being
unable to use it - it's either (a) define a Matcher for every single
QueryNode class, or (b) toString() it and perform some sanitisation
(which is what we're doing.)

2. Is there a plan to introduce a QuerySyntaxFormatter interface as
a counterpart to QuerySyntaxParser, for generating the same query
format using the nodes that would have been generated when parsing it
(obviously with a small change in format in some situations)?

3. I have been parsing a lot of boolean queries, and have noticed
that there is *always* a GroupQueryNode around any BooleanQueryNode.
Is this really required, given that BooleanQueryNode is already
implicitly a grouping type of query?

4. If GroupQueryNode is specifically a cue to whether the user
specified parentheses or not (i.e. if it is supposed to be cosmetic,
for the purposes of getting back to what the user typed in) then why
is it that "tag:a tag:b" and "tag:(a b)" both parse to the same node
structure (making it impossible to figure out which the user actually
used)?

Daniel



--
Daniel Noll                   Forensic and eDiscovery Software
Senior Developer              The world's most advanced
Nuix                          email data analysis
http://nuix.com/              and eDiscovery software

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Adriano Crestani at May 3, 2010 at 5:12 am
    Hi Daniel,

    > 1. Is it intentional that query nodes do not implement equals()? I
    > had rather a lot of overhead when writing unit tests due to being
    > unable to use it - it's either (a) define a Matcher for every single
    > QueryNode class, or (b) toString() it and perform some sanitisation
    > (which is what we're doing.)

    Good point! QueryNodes are data objects, so it makes sense to override
    their equals() method. But first we need to define what QueryNode
    equality means. Should two nodes be considered equal if they represent
    syntactically or semantically the same query? For example, an ORQueryNode
    created from the query <a OR b OR c> will not have the same child ordering
    as one created from <b OR c OR a>, so they are syntactically not equal, but
    they are semantically equal, because the order of the OR operands (usually)
    does not matter when the query is executed. I say "usually" because that
    depends on the Query object implementation built from the ORQueryNode.
    For this reason, I vote for defining two query nodes as equal if they are
    syntactically equal.

    I also vote for excluding query node tags from the equality check, because
    they are not meant to represent the query structure, but to attach extra
    info to the node, which is usually used for communication between
    processors.
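
The equality proposed above can be sketched roughly as follows. This is only an illustration: SketchNode is a hypothetical stand-in class, not the Lucene QueryNode API. Child order takes part in the comparison and tags do not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a query node; NOT the Lucene QueryNode API.
final class SketchNode {
    final String type;                // e.g. "OR" or "FIELD"
    final String text;                // leaf payload such as "a", or null
    final List<SketchNode> children;  // ordered: order is part of syntactic equality
    final Map<String, Object> tags = new HashMap<>(); // processor scratch data

    SketchNode(String type, String text, SketchNode... children) {
        this.type = type;
        this.text = text;
        this.children = new ArrayList<>(Arrays.asList(children));
    }

    static SketchNode leaf(String text) {
        return new SketchNode("FIELD", text);
    }

    static SketchNode or(SketchNode... children) {
        return new SketchNode("OR", null, children);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchNode)) return false;
        SketchNode other = (SketchNode) o;
        // Tags are deliberately left out of the comparison.
        return type.equals(other.type)
            && Objects.equals(text, other.text)
            && children.equals(other.children); // List.equals compares in order
    }

    @Override
    public int hashCode() {
        // Must use the same fields as equals(), so tags are excluded here too.
        return Objects.hash(type, text, children);
    }
}
```

With this definition, or(a, b, c) equals another or(a, b, c) even if one of them carries tags, while or(b, c, a) is a different node.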


    > 2. Is there a plan to introduce a QuerySyntaxFormatter interface as
    > a counterpart to QuerySyntaxParser, for generating the same query
    > format using the nodes that would have been generated when parsing it
    > (obviously with a small change in format in some situations)?

    I actually never liked how QueryNode -> query string conversion is done
    today, using the QueryNode.toQueryString(...) method. A QueryNode shouldn't
    be responsible for converting itself back to the string format, because
    different SyntaxParsers may create, e.g., an ORQueryNode from either
    <OR(a, b)> or <a OR b> syntax, so what should
    orQueryNode.toQueryString(...) return? So a QuerySyntaxFormatter makes
    sense. Now we need to start working on what this interface should look
    like, so SyntaxParser implementors can start implementing equivalent
    QuerySyntaxFormatters.
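
To make the argument concrete, here is a minimal sketch of one node rendered by two formatters, one per syntax. All names in it (OrTerms, OrFormatter, InfixOrFormatter, PrefixOrFormatter) are made up for illustration; no such interface exists in Lucene as far as this thread establishes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal OR node holding plain term strings; not a Lucene class.
final class OrTerms {
    final List<String> terms;

    OrTerms(String... terms) {
        this.terms = Arrays.asList(terms);
    }
}

// Hypothetical formatter contract: one implementation per query syntax.
interface OrFormatter {
    String format(OrTerms node);
}

// Infix syntax: a OR b
final class InfixOrFormatter implements OrFormatter {
    @Override
    public String format(OrTerms node) {
        return String.join(" OR ", node.terms);
    }
}

// Prefix syntax: OR(a, b)
final class PrefixOrFormatter implements OrFormatter {
    @Override
    public String format(OrTerms node) {
        return "OR(" + String.join(", ", node.terms) + ")";
    }
}
```

The node itself carries no knowledge of either syntax, which is the separation being argued for: the choice of output format lives entirely in the formatter that pairs with a given parser.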

    > 3. I have been parsing a lot of boolean queries, and have noticed
    > that there is *always* a GroupQueryNode around any BooleanQueryNode.
    > Is this really required, given that BooleanQueryNode is already
    > implicitly a grouping type of query?
    >
    > 4. If GroupQueryNode is specifically a cue to whether the user
    > specified parentheses or not (i.e. if it is supposed to be cosmetic,
    > for the purposes of getting back to what the user typed in) then why
    > is it that "tag:a tag:b" and "tag:(a b)" both parse to the same node
    > structure (making it impossible to figure out which the user actually
    > used)?

    Yes, it's created when parentheses are present in the query. The standard
    query processors need to know where parentheses were typed so they can
    enforce Lucene operator precedence, which is not trivial and relies on
    conditions about whether or not the user typed the parentheses.

    StandardSyntaxParser generates different query node trees for
    <tag:a tag:b> and <tag:(a b)>, one with a GroupQueryNode and the other
    without. However, after the query node tree is sent through the
    StandardQueryNodeProcessorPipeline, the tree is optimized and the
    GroupQueryNodes are usually removed.

    Best Regards,
    Adriano Crestani
  • Daniel Noll at May 4, 2010 at 1:27 am

    On Mon, May 3, 2010 at 15:11, Adriano Crestani wrote:
    > I actually never liked how QueryNode -> query string is done today, using
    > QueryNode.toQueryString(...) method. A QueryNode shouldn't be responsible
    > for converting itself back to the string format, because different
    > SyntaxParser(s) may create, e.g., an ORQueryNode from a <OR(a, b)> or <a OR
    > b> syntax, so what should orQueryNode.toQueryString(...) return? So a
    > QuerySyntaxFormatter makes sense, now we need to start working on how this
    > interface should look like, so SyntaxParser implementors can start
    > implementing equivalent QuerySyntaxFormatter(s).

    Essentially I have started doing this for the few queries we are
    already building programmatically (full support isn't in there yet for
    anything a user might type in though.)

    The interface itself is dead simple:

    public interface SyntaxFormatter {
        CharSequence format(QueryNode node, CharSequence field);
    }

    Internal to our particular implementation I have a
    PartialQueryFormatter<N extends QueryNode> interface which I implement
    for each type of query and have been slowly building these up. Most
    of the tricky implementation has been making it spit out an
    aesthetically pleasing format, and what is aesthetically pleasing to
    people will wildly differ so I'm imagining that any future
    StandardSyntaxFormatter which appears in Lucene will have options for
    a bunch of things (e.g. do you prefer to group booleans under a single
    field or not, do you put spaces inside parentheses, do you use + style
    booleans or OR/AND style, ...)
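
A rough sketch of the per-node-type arrangement described above, with one cosmetic option. Everything here is an assumption made for illustration: the FNode, FTerm and FOr types stand in for real query nodes, and the PartialFormatter signature is a guess, since the post names PartialQueryFormatter<N extends QueryNode> without showing it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node types standing in for a QueryNode hierarchy.
interface FNode {}

final class FTerm implements FNode {
    final String field;
    final String text;

    FTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }
}

final class FOr implements FNode {
    final List<FNode> children;

    FOr(FNode... children) {
        this.children = Arrays.asList(children);
    }
}

// One partial formatter per node type (assumed signature).
interface PartialFormatter<N extends FNode> {
    String format(N node, DispatchingFormatter parent);
}

final class DispatchingFormatter {
    private final Map<Class<?>, PartialFormatter<?>> byType = new HashMap<>();
    final boolean spacesInsideParens; // one example of a cosmetic option

    DispatchingFormatter(boolean spacesInsideParens) {
        this.spacesInsideParens = spacesInsideParens;
        register(FTerm.class, (n, f) -> n.field + ":" + n.text);
        register(FOr.class, (n, f) -> {
            List<String> parts = new ArrayList<>();
            for (FNode child : n.children) {
                parts.add(f.format(child)); // recurse through the dispatcher
            }
            String pad = f.spacesInsideParens ? " " : "";
            return "(" + pad + String.join(" OR ", parts) + pad + ")";
        });
    }

    <N extends FNode> void register(Class<N> type, PartialFormatter<N> formatter) {
        byType.put(type, formatter);
    }

    // Looks up the partial formatter by the node's exact class and delegates to it.
    @SuppressWarnings("unchecked")
    String format(FNode node) {
        PartialFormatter<FNode> formatter = (PartialFormatter<FNode>) byType.get(node.getClass());
        if (formatter == null) {
            throw new IllegalArgumentException("No formatter for " + node.getClass().getSimpleName());
        }
        return formatter.format(node, this);
    }
}
```

Formatting an FOr over tag:a and tag:b gives (tag:a OR tag:b) with the option off and ( tag:a OR tag:b ) with it on. Escaping and the other preferences mentioned above are left out of the sketch.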

    >> 3. I have been parsing a lot of boolean queries, and have noticed
    >> that there is *always* a GroupQueryNode around any BooleanQueryNode.
    >> Is this really required, given that BooleanQueryNode is already
    >> implicitly a grouping type of query?
    >>
    >> 4. If GroupQueryNode is specifically a cue to whether the user
    >> specified parentheses or not (i.e. if it is supposed to be cosmetic,
    >> for the purposes of getting back to what the user typed in) then why
    >> is it that "tag:a tag:b" and "tag:(a b)" both parse to the same node
    >> structure (making it impossible to figure out which the user actually
    >> used)?
    >
    > Yes, it's created when parentheses are defined. The standard query
    > processors needs to know where parentheses were typed, so they can enforce
    > Lucene operator precedence, which is not that trivial and rely on some
    > conditions on whether the user typed or not the parentheses.

    I see, so from my perspective where I am manually creating an
    OrQueryNode - the node is already a group so I didn't insert any
    GroupQueryNode. And if I understand correctly, not inserting one
    isn't actually a problem either (correct formatting code has to
    generate the right parentheses whether it came from the user or not.)

    > StandardSyntaxParser generate <tag:a tag:b> and <tag:(a b)> different query
    > node trees for these two queries, one with GroupQueryNode and the other
    > without. However, after the query node tree is sent through the
    > StandardQueryNodeProcessorPipeline, the query node tree is optimized and
    > usually GroupQueryNode(s) are removed.

    Aha. That explains why I had to write my own little piece of code to
    strip them out again, because my code doesn't go through the rest of
    the pipeline.
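
For readers in the same position, stripping group nodes from a tree that never went through the pipeline might look roughly like this. It is a sketch on a hypothetical TreeNode class, where the label "GROUP" plays the role of GroupQueryNode; it is not the code from this post and not what the Lucene pipeline does internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical tree node; the label "GROUP" plays the role of GroupQueryNode.
final class TreeNode {
    final String label;
    final List<TreeNode> children;

    TreeNode(String label, TreeNode... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    boolean isGroup() {
        return "GROUP".equals(label);
    }

    @Override
    public String toString() {
        // Leaves print as their label; inner nodes as label[child, child, ...].
        return children.isEmpty() ? label : label + children;
    }
}

final class GroupStripper {
    // Returns a copy of the tree in which every single-child group node
    // has been replaced by its (recursively stripped) child.
    static TreeNode strip(TreeNode node) {
        if (node.isGroup() && node.children.size() == 1) {
            return strip(node.children.get(0));
        }
        List<TreeNode> stripped = new ArrayList<>();
        for (TreeNode child : node.children) {
            stripped.add(strip(child));
        }
        return new TreeNode(node.label, stripped.toArray(new TreeNode[0]));
    }
}
```

The input tree is left untouched and a new tree is returned, so the parenthesised form is still available if a formatter later wants to know what the user typed.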

    It doesn't explain why these two queries generate the same node tree, however:

    tag:a AND (tag:b OR tag:c)

    tag:a AND tag:(b OR c)

    For me these both parse with a "group" around the "or" node. This is
    probably fine anyway, as I don't really want to encourage the former
    way of formatting it as the latter is more concise. Actually it could
    even be...

    tag:(a AND (b OR c))

    But I don't think my formatting logic is quite smart enough for that yet.
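
One possible way to get that last form, sketched under simplifying assumptions: if every term in the tree uses the same field, write the field once in front of the whole expression. The Expr and FieldFactoringFormatter classes are hypothetical, and escaping, NOT and +/- style operators are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical expression tree: a field:text leaf, or AND/OR over children.
final class Expr {
    final String op;           // "AND", "OR", or null for a leaf
    final String field;        // leaf only
    final String text;         // leaf only
    final List<Expr> children; // empty for a leaf

    private Expr(String op, String field, String text, List<Expr> children) {
        this.op = op;
        this.field = field;
        this.text = text;
        this.children = children;
    }

    static Expr term(String field, String text) {
        return new Expr(null, field, text, new ArrayList<Expr>());
    }

    static Expr and(Expr... children) {
        return new Expr("AND", null, null, Arrays.asList(children));
    }

    static Expr or(Expr... children) {
        return new Expr("OR", null, null, Arrays.asList(children));
    }
}

final class FieldFactoringFormatter {
    static String format(Expr root) {
        Set<String> fields = new HashSet<>();
        collectFields(root, fields);
        if (root.op != null && fields.size() == 1) {
            // Every leaf shares one field, so write it once up front.
            return fields.iterator().next() + ":(" + body(root, false) + ")";
        }
        return body(root, true);
    }

    private static void collectFields(Expr e, Set<String> out) {
        if (e.op == null) {
            out.add(e.field);
            return;
        }
        for (Expr child : e.children) {
            collectFields(child, out);
        }
    }

    // Renders an expression; nested boolean children get parentheses.
    private static String body(Expr e, boolean withField) {
        if (e.op == null) {
            return withField ? e.field + ":" + e.text : e.text;
        }
        List<String> parts = new ArrayList<>();
        for (Expr child : e.children) {
            String rendered = body(child, withField);
            parts.add(child.op == null ? rendered : "(" + rendered + ")");
        }
        return String.join(" " + e.op + " ", parts);
    }
}
```

For the example above, the tree for tag:a AND (tag:b OR tag:c) comes out as tag:(a AND (b OR c)), while a tree mixing fields falls back to writing the field on every term.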

    Daniel



Discussion Overview
group: java-user @
categories: lucene
posted: May 2, '10 at 11:47p
active: May 4, '10 at 1:27a
posts: 3
users: 2
website: lucene.apache.org

2 users in discussion

Daniel Noll: 2 posts
Adriano Crestani: 1 post
