Grokbase Groups Lucene dev April 2010
Hey everyone,

My organization uses our own homebrew QueryParser class, unrelated to
Lucene's JavaCC-based QueryParser, to parse our queries. We don't currently
use anything from Solr. Our QueryParser class has gotten quite cumbersome,
and I'm looking into alternatives. Grammar-based parsing seems like the way
to go, but I've got some questions:

- ANTLR seems to be very well-supported and well-liked, but I see that
Lucene's QueryParser and StandardTokenizer use JavaCC. Does anyone have
experience writing a Lucene or Solr parser using ANTLR? Any thoughts on
whether it would be helpful to stick with JavaCC, or problematic to use
ANTLR, in light of Lucene's default usage of JavaCC?
- Any experience using ANTLR for tokenization?
- I was told that Solr might be componentizing its query parsing in such a
way that we might be able to use that instead of a homebrew grammar-based
solution. However, I haven't found anything written about that. I don't know
much about Solr's query parsing, other than what I saw looking at
QParser.java and QParserPlugin.java: it seems that one can plug in any
parser needed. That doesn't really help us, as our goal is to simplify our
parsing logic. Is there any way to structure our query parsing logic without
needing to write a grammar from scratch, whether it's a Solr component or
something else?

In a nutshell, I'm trying to get a sense of the best practices in this
situation (namely, custom query parsing that's getting very complex) before
I dive into implementing a solution.

Thanks!
Tavi

  • Earwin Burrfoot on Apr 27, 2010 at 9:30 am
    We use ANTLR for query parsing. It works well for the lazy guys :)
    On Tue, Apr 27, 2010 at 06:17, Tavi Nathanson wrote:
    --
    Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
    Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
    ICQ: 104465785
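
The question above about structuring parsing logic around a grammar can be illustrated without a parser generator. What follows is a minimal sketch, not code from this thread and not part of Lucene, Solr, or ANTLR: a hand-written recursive-descent parser in which each method mirrors one rule of a small grammar. The class name MiniQueryParser, the toy grammar (AND/OR with equal precedence, parentheses, and opaque field:value terms), and the s-expression output are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not a Lucene, Solr, or ANTLR class. */
public class MiniQueryParser {
    private final List<String> tokens;
    private int pos = 0;

    private MiniQueryParser(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Parses a query and returns its tree as an s-expression string. */
    public static String parse(String input) {
        MiniQueryParser p = new MiniQueryParser(tokenize(input));
        String tree = p.query();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + p.tokens.get(p.pos));
        }
        return tree;
    }

    // Splits on whitespace and treats each parenthesis as its own token.
    private static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        String spaced = input.replace("(", " ( ").replace(")", " ) ").trim();
        if (!spaced.isEmpty()) {
            for (String t : spaced.split("\\s+")) {
                out.add(t);
            }
        }
        return out;
    }

    // query := clause (("AND" | "OR") clause)*
    // Both operators share one precedence level and associate to the left.
    private String query() {
        String left = clause();
        while (pos < tokens.size()
                && (tokens.get(pos).equals("AND") || tokens.get(pos).equals("OR"))) {
            String op = tokens.get(pos++);
            String right = clause();
            left = "(" + op + " " + left + " " + right + ")";
        }
        return left;
    }

    // clause := "(" query ")" | TERM
    // A TERM is any other token, e.g. "lucene" or "title:lucene".
    private String clause() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of query");
        }
        String t = tokens.get(pos++);
        if (t.equals("(")) {
            String inner = query();
            if (pos >= tokens.size() || !tokens.get(pos).equals(")")) {
                throw new IllegalArgumentException("Missing ')'");
            }
            pos++; // consume ")"
            return inner;
        }
        if (t.equals(")") || t.equals("AND") || t.equals("OR")) {
            throw new IllegalArgumentException("Unexpected token: " + t);
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(parse("title:lucene AND (body:antlr OR body:javacc)"));
    }
}
```

Because each grammar rule maps to one method, adding a construct (say, a NOT prefix or phrase quoting) means adding or extending one method rather than threading new cases through a monolithic parser. A generator such as ANTLR or JavaCC produces code of roughly this shape from a grammar file; a real implementation would build Lucene Query objects instead of strings.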

