FAQ
Hi,

I am coding a *local pdf search engine* in Java.(If someone did it before,
could you please give some tips?) So I need query parse.

Assume I want to search for "hello user" in the document.

*Q1*. I have 4 kinds of queries in my program. They are:

1. Match Exact words or phases. e.g. "hello user", "hello users", etc. can
be the search results. Note: the words must be in right order.

2. Match Exact words or phases, but whole words only. e.g. only "hello
user" can be the search result. Note: the words must be in right order.

3. Match Any words. e.g. "hello", "user", "users", etc. can be the search
results.

4. Match Any words or phases, but whole words only. e.g. only "hello",
"user", "hello user" can be the search results.

My question: what the query expressions are for these 4 kind that i can use
it to parse?

2nd expression is: "hello user".

4th expression is: "hello" OR "user".

Correct? And what about the other 2?

*Q2*. Then I use the following code to get context of the search result.
Note: the result must be ordered by relevance.

Query q = new QueryParser(Version.LUCENE_30,"field",
analyzer).parse(expression);
QueryScorer scorer = new QueryScorer(query);
Highlighter highlighter = new Highlighter(scorer);
highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer, 500));

And there are something wrong in line 2 for the 4 kinds of queries. How can
I modify the code to get the snippet of the content?

Thanks.

Search Discussions

  • Gong Li at Feb 8, 2011 at 5:22 pm
    Hi,

    I am coding a *local pdf search engine* in Java.(If someone did it before,
    could you please give some tips?) So I need query parse.

    Assume I want to search for "hello user" in the document.

    *Q1*. I have 4 kinds of queries in my program. They are:

    1. Match Exact words or phases. e.g. "hello user", "hello users", etc. can
    be the search results. Note: the words must be in right order.

    2. Match Exact words or phases, but whole words only. e.g. only "hello
    user" can be the search result. Note: the words must be in right order.

    3. Match Any words. e.g. "hello", "user", "users", etc. can be the search
    results.

    4. Match Any words or phases, but whole words only. e.g. only "hello",
    "user", "hello user" can be the search results.

    My question: what the query expressions are for these 4 kind that i can use
    it to parse?

    2nd expression is: "hello user".

    4th expression is: "hello" OR "user".

    Correct? And what about the other 2?

    *Q2*. Then I use the following code to get context of the search result.
    Note: the result must be ordered by relevance.

    Query q = new QueryParser(Version.LUCENE_30,"field",
    analyzer).parse(expression);
    QueryScorer scorer = new QueryScorer(query);
    Highlighter highlighter = new Highlighter(scorer);
    highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer, 500));

    And there are something wrong in line 2 for the 4 kinds of queries. How can
    I modify the code to get the snippet of the content?

    Thanks.
  • Ian Lea at Feb 9, 2011 at 10:22 am
    http://lucene.apache.org/java/3_0_3/queryparsersyntax.html should have
    answers to your query formulation questions. You might also like to
    consider building the queries yourself programatically and using Span
    queries. They are good for specifying slop and order and so on. There
    is useful info on the Lucid Imagination site - google "lucid
    imagination span query".

    Sorry, can't help you with highlighting except to say there was a
    recent thread on it. Or you could post again with something more
    specific than "there is something wrong".


    --
    Ian.

    On Tue, Feb 8, 2011 at 4:36 PM, Gong Li wrote:
    Hi,

    I am coding a *local pdf search engine* in Java.(If someone did it before,
    could you please give some tips?) So I need query parse.

    Assume I want to search for "hello user" in the document.

    *Q1*. I have 4 kinds of queries in my program. They are:

    1. Match Exact words or phases. e.g. "hello user", "hello users", etc. can
    be the search results. Note: the words must be in right order.

    2. Match Exact words or phases, but whole words only. e.g. only "hello
    user" can be the search result. Note: the words must be in right order.

    3. Match Any words. e.g. "hello", "user", "users", etc. can be the search
    results.

    4. Match Any words or phases, but whole words only. e.g. only "hello",
    "user", "hello user" can be the search results.

    My question: what the query expressions are for these 4 kind that i can use
    it to parse?

    2nd expression is: "hello user".

    4th expression is: "hello" OR "user".

    Correct? And what about the other 2?

    *Q2*. Then I use the following code to get context of the search result.
    Note: the result must be ordered by relevance.

    Query q = new QueryParser(Version.LUCENE_30,"field",
    analyzer).parse(expression);
    QueryScorer scorer = new QueryScorer(query);
    Highlighter highlighter = new Highlighter(scorer);
    highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer, 500));

    And there are something wrong in line 2 for the 4 kinds of queries. How can
    I modify the code to get the snippet of the content?

    Thanks.
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedFeb 8, '11 at 4:37p
activeFeb 9, '11 at 10:22a
posts3
users2
websitelucene.apache.org

2 users in discussion

Gong Li: 2 posts Ian Lea: 1 post

People

Translate

site design / logo © 2022 Grokbase