FAQ
You need to call rewrite on the query to expand it then give that version to the highlighter - see the package javadocs.
http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highlight/package-summary.html#package_description


Cheers
Mark




----- Original Message ----
From: "Sertic Mirko, Bedag" <Mirko.Sertic@bedag.ch>
To: java-user@lucene.apache.org
Sent: Thursday, 11 September, 2008 9:34:13
Subject: AW: Search with multiple wildcards

Ok, i gave it a try, but i ran into this TooManyClauses Exception. I see that
3ildcard queries are expanded before they are processed, and I see that i can
set the clauses count to Integer.MAXVALUE, and queries can consume a lot of memory,
but one final thing is still open: does a wildcard query work together with
the Lucene Highlighter? I tried it, but I only got an empty result. Without
wildcards, the highlighter works pretty smooth!

Regards
Mirko

-----Ursprüngliche Nachricht-----
Von: Erick Erickson
Gesendet: Mittwoch, 10. September 2008 18:15
An: java-user@lucene.apache.org
Betreff: Re: Search with multiple wildcards

Of course you can construct your own BooleanQuery
programmatically.

It's relatively easy, just try it.
On Wed, Sep 10, 2008 at 11:52 AM, Sertic Mirko, Bedag wrote:

Jep, this is what i have read.

do I need to use the query parser, or can I create a query by the api?
Is there an example available?

Thanks a lot
Mirko

-----Ursprüngliche Nachricht-----
Von: Erick Erickson
Gesendet: Mittwoch, 10. September 2008 16:45
An: java-user@lucene.apache.org
Betreff: Re: Search with multiple wildcards

Is this what you're referring to?

Lucene supports single and multiple character wildcard searches within
single terms (not within phrase queries).
(from http://lucene.apache.org/java/docs/queryparsersyntax.html)

I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
friend here, download a copy and try it <G>. Be sure on the search tab to
specify StandardAnalyzer or some such, rather than keywordanalyzer.

The phrase is trying to point out that a phrase query does NOT respect
wildcards. That is, submitting
"ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
sure that

+field:ab* +field:bc* +field:cd*

will work just fine. The key here is "within single terms", which I think
of
as
"within a single term query". You can add as many TermQuerys as you want.
See the query documentation for how to submit phrase queries.

Best
Erick

On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag
<Mirko.Sertic@bedag.ch
wrote:
Hi

Thank you for your quick response:-)

Of course I need to use the * character :-) But I have read somewhere in
the documentation that leading wildcards are not supported, and only one
wildcard term per query. Is this limitation resolved in the current version?
Regards
Mirko

-----Ursprüngliche Nachricht-----
Von: Erick Erickson
Gesendet: Mittwoch, 10. September 2008 15:47
An: java-user@lucene.apache.org
Betreff: Re: Search with multiple wildcards

Sure, but you'll have to set the leading wildcard option,
which I've forgotten the exact call, but it's in the docs.

And use * rather than % <G>.

But wildcards are tricky, especially the TooManyClauses
exception. You might want to peruse the archive for wildcard
posts...

Best
Erick

On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
wrote:
Hi@all



Is it possible to do a search with multiple wildcards in one query, for
instance "%MANAGE%" AND "CORE%"? Is there a code example available?



Thanks a lot

Mirko


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

  • Sertic Mirko, Bedag at Sep 11, 2008 at 11:08 am
    Ok, one final question:

    If i query for "*ll*", the query is expanded to ("hallo" or "alle" or ...), so the
    Highligter will highlight the words "hallo" or "alle". But how can i highlight only
    the original query, so only the "ll"? Is this possible?

    Thanks a lot
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: mark harwood
    Gesendet: Donnerstag, 11. September 2008 11:20
    An: java-user@lucene.apache.org
    Betreff: Re: AW: Search with multiple wildcards

    You need to call rewrite on the query to expand it then give that version to the highlighter - see the package javadocs.
    http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highlight/package-summary.html#package_description


    Cheers
    Mark




    ----- Original Message ----
    From: "Sertic Mirko, Bedag" <Mirko.Sertic@bedag.ch>
    To: java-user@lucene.apache.org
    Sent: Thursday, 11 September, 2008 9:34:13
    Subject: AW: Search with multiple wildcards

    Ok, i gave it a try, but i ran into this TooManyClauses Exception. I see that
    3ildcard queries are expanded before they are processed, and I see that i can
    set the clauses count to Integer.MAXVALUE, and queries can consume a lot of memory,
    but one final thing is still open: does a wildcard query work together with
    the Lucene Highlighter? I tried it, but I only got an empty result. Without
    wildcards, the highlighter works pretty smooth!

    Regards
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: Erick Erickson
    Gesendet: Mittwoch, 10. September 2008 18:15
    An: java-user@lucene.apache.org
    Betreff: Re: Search with multiple wildcards

    Of course you can construct your own BooleanQuery
    programmatically.

    It's relatively easy, just try it.
    On Wed, Sep 10, 2008 at 11:52 AM, Sertic Mirko, Bedag wrote:

    Jep, this is what i have read.

    do I need to use the query parser, or can I create a query by the api?
    Is there an example available?

    Thanks a lot
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: Erick Erickson
    Gesendet: Mittwoch, 10. September 2008 16:45
    An: java-user@lucene.apache.org
    Betreff: Re: Search with multiple wildcards

    Is this what you're referring to?

    Lucene supports single and multiple character wildcard searches within
    single terms (not within phrase queries).
    (from http://lucene.apache.org/java/docs/queryparsersyntax.html)

    I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
    friend here, download a copy and try it <G>. Be sure on the search tab to
    specify StandardAnalyzer or some such, rather than keywordanalyzer.

    The phrase is trying to point out that a phrase query does NOT respect
    wildcards. That is, submitting
    "ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
    sure that

    +field:ab* +field:bc* +field:cd*

    will work just fine. The key here is "within single terms", which I think
    of
    as
    "within a single term query". You can add as many TermQuerys as you want.
    See the query documentation for how to submit phrase queries.

    Best
    Erick

    On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag
    <Mirko.Sertic@bedag.ch
    wrote:
    Hi

    Thank you for your quick response:-)

    Of course I need to use the * character :-) But I have read somewhere in
    the documentation that leading wildcards are not supported, and only one
    wildcard term per query. Is this limitation resolved in the current version?
    Regards
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: Erick Erickson
    Gesendet: Mittwoch, 10. September 2008 15:47
    An: java-user@lucene.apache.org
    Betreff: Re: Search with multiple wildcards

    Sure, but you'll have to set the leading wildcard option,
    which I've forgotten the exact call, but it's in the docs.

    And use * rather than % <G>.

    But wildcards are tricky, especially the TooManyClauses
    exception. You might want to peruse the archive for wildcard
    posts...

    Best
    Erick

    On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
    wrote:
    Hi@all



    Is it possible to do a search with multiple wildcards in one query, for
    instance "%MANAGE%" AND "CORE%"? Is there a code example available?



    Thanks a lot

    Mirko


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Mark harwood at Sep 11, 2008 at 1:32 pm
    Is this possible?
    Not currently, the highlighter works with a list of words (or words AND phrases using the new span support) and highlights those.
    To do anything else would require the higlighter to faithfully re-implement much of the logic in all of the different query types (fuzzy, wildcard, regex etc etc) which is much more challenging/difficult to maintain.



    ----- Original Message ----
    From: "Sertic Mirko, Bedag" <Mirko.Sertic@bedag.ch>
    To: java-user@lucene.apache.org
    Sent: Thursday, 11 September, 2008 12:07:36
    Subject: AW: AW: Search with multiple wildcards

    Ok, one final question:

    If i query for "*ll*", the query is expanded to ("hallo" or "alle" or ...), so the
    Highligter will highlight the words "hallo" or "alle". But how can i highlight only
    the original query, so only the "ll"? Is this possible?

    Thanks a lot
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: mark harwood
    Gesendet: Donnerstag, 11. September 2008 11:20
    An: java-user@lucene.apache.org
    Betreff: Re: AW: Search with multiple wildcards

    You need to call rewrite on the query to expand it then give that version to the highlighter - see the package javadocs.
    http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highlight/package-summary.html#package_description


    Cheers
    Mark




    ----- Original Message ----
    From: "Sertic Mirko, Bedag" <Mirko.Sertic@bedag.ch>
    To: java-user@lucene.apache.org
    Sent: Thursday, 11 September, 2008 9:34:13
    Subject: AW: Search with multiple wildcards

    Ok, i gave it a try, but i ran into this TooManyClauses Exception. I see that
    3ildcard queries are expanded before they are processed, and I see that i can
    set the clauses count to Integer.MAXVALUE, and queries can consume a lot of memory,
    but one final thing is still open: does a wildcard query work together with
    the Lucene Highlighter? I tried it, but I only got an empty result. Without
    wildcards, the highlighter works pretty smooth!

    Regards
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: Erick Erickson
    Gesendet: Mittwoch, 10. September 2008 18:15
    An: java-user@lucene.apache.org
    Betreff: Re: Search with multiple wildcards

    Of course you can construct your own BooleanQuery
    programmatically.

    It's relatively easy, just try it.
    On Wed, Sep 10, 2008 at 11:52 AM, Sertic Mirko, Bedag wrote:

    Jep, this is what i have read.

    do I need to use the query parser, or can I create a query by the api?
    Is there an example available?

    Thanks a lot
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: Erick Erickson
    Gesendet: Mittwoch, 10. September 2008 16:45
    An: java-user@lucene.apache.org
    Betreff: Re: Search with multiple wildcards

    Is this what you're referring to?

    Lucene supports single and multiple character wildcard searches within
    single terms (not within phrase queries).
    (from http://lucene.apache.org/java/docs/queryparsersyntax.html)

    I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
    friend here, download a copy and try it <G>. Be sure on the search tab to
    specify StandardAnalyzer or some such, rather than keywordanalyzer.

    The phrase is trying to point out that a phrase query does NOT respect
    wildcards. That is, submitting
    "ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
    sure that

    +field:ab* +field:bc* +field:cd*

    will work just fine. The key here is "within single terms", which I think
    of
    as
    "within a single term query". You can add as many TermQuerys as you want.
    See the query documentation for how to submit phrase queries.

    Best
    Erick

    On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag
    <Mirko.Sertic@bedag.ch
    wrote:
    Hi

    Thank you for your quick response:-)

    Of course I need to use the * character :-) But I have read somewhere in
    the documentation that leading wildcards are not supported, and only one
    wildcard term per query. Is this limitation resolved in the current version?
    Regards
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: Erick Erickson
    Gesendet: Mittwoch, 10. September 2008 15:47
    An: java-user@lucene.apache.org
    Betreff: Re: Search with multiple wildcards

    Sure, but you'll have to set the leading wildcard option,
    which I've forgotten the exact call, but it's in the docs.

    And use * rather than % <G>.

    But wildcards are tricky, especially the TooManyClauses
    exception. You might want to peruse the archive for wildcard
    posts...

    Best
    Erick

    On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
    wrote:
    Hi@all



    Is it possible to do a search with multiple wildcards in one query, for
    instance "%MANAGE%" AND "CORE%"? Is there a code example available?



    Thanks a lot

    Mirko


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Matthew Hall at Sep 11, 2008 at 1:41 pm
    Well, you could certainly manipulate your search string, removing the
    wildcard punctuations, and then use that for what you pass to the
    highlighter.

    That should give you the functionality you are looking for.


    -Matt
    mark harwood wrote:
    Is this possible?
    Not currently, the highlighter works with a list of words (or words AND phrases using the new span support) and highlights those.
    To do anything else would require the higlighter to faithfully re-implement much of the logic in all of the different query types (fuzzy, wildcard, regex etc etc) which is much more challenging/difficult to maintain.



    ----- Original Message ----
    From: "Sertic Mirko, Bedag" <Mirko.Sertic@bedag.ch>
    To: java-user@lucene.apache.org
    Sent: Thursday, 11 September, 2008 12:07:36
    Subject: AW: AW: Search with multiple wildcards

    Ok, one final question:

    If i query for "*ll*", the query is expanded to ("hallo" or "alle" or ...), so the
    Highligter will highlight the words "hallo" or "alle". But how can i highlight only
    the original query, so only the "ll"? Is this possible?

    Thanks a lot
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: mark harwood
    Gesendet: Donnerstag, 11. September 2008 11:20
    An: java-user@lucene.apache.org
    Betreff: Re: AW: Search with multiple wildcards

    You need to call rewrite on the query to expand it then give that version to the highlighter - see the package javadocs.
    http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highlight/package-summary.html#package_description


    Cheers
    Mark




    ----- Original Message ----
    From: "Sertic Mirko, Bedag" <Mirko.Sertic@bedag.ch>
    To: java-user@lucene.apache.org
    Sent: Thursday, 11 September, 2008 9:34:13
    Subject: AW: Search with multiple wildcards

    Ok, i gave it a try, but i ran into this TooManyClauses Exception. I see that
    3ildcard queries are expanded before they are processed, and I see that i can
    set the clauses count to Integer.MAXVALUE, and queries can consume a lot of memory,
    but one final thing is still open: does a wildcard query work together with
    the Lucene Highlighter? I tried it, but I only got an empty result. Without
    wildcards, the highlighter works pretty smooth!

    Regards
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: Erick Erickson
    Gesendet: Mittwoch, 10. September 2008 18:15
    An: java-user@lucene.apache.org
    Betreff: Re: Search with multiple wildcards

    Of course you can construct your own BooleanQuery
    programmatically.

    It's relatively easy, just try it.

    On Wed, Sep 10, 2008 at 11:52 AM, Sertic Mirko, Bedag <Mirko.Sertic@bedag.ch
    wrote:
    Jep, this is what i have read.

    do I need to use the query parser, or can I create a query by the api?
    Is there an example available?

    Thanks a lot
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: Erick Erickson
    Gesendet: Mittwoch, 10. September 2008 16:45
    An: java-user@lucene.apache.org
    Betreff: Re: Search with multiple wildcards

    Is this what you're referring to?

    Lucene supports single and multiple character wildcard searches within
    single terms (not within phrase queries).
    (from http://lucene.apache.org/java/docs/queryparsersyntax.html)

    I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
    friend here, download a copy and try it <G>. Be sure on the search tab to
    specify StandardAnalyzer or some such, rather than keywordanalyzer.

    The phrase is trying to point out that a phrase query does NOT respect
    wildcards. That is, submitting
    "ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
    sure that

    +field:ab* +field:bc* +field:cd*

    will work just fine. The key here is "within single terms", which I think
    of
    as
    "within a single term query". You can add as many TermQuerys as you want.
    See the query documentation for how to submit phrase queries.

    Best
    Erick

    On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag
    <Mirko.Sertic@bedag.ch
    wrote:

    Hi

    Thank you for your quick response:-)

    Of course I need to use the * character :-) But I have read somewhere in
    the documentation that leading wildcards are not supported, and only one
    wildcard term per query. Is this limitation resolved in the current version?
    Regards
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: Erick Erickson
    Gesendet: Mittwoch, 10. September 2008 15:47
    An: java-user@lucene.apache.org
    Betreff: Re: Search with multiple wildcards

    Sure, but you'll have to set the leading wildcard option,
    which I've forgotten the exact call, but it's in the docs.

    And use * rather than % <G>.

    But wildcards are tricky, especially the TooManyClauses
    exception. You might want to peruse the archive for wildcard
    posts...

    Best
    Erick

    On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
    wrote:

    Hi@all



    Is it possible to do a search with multiple wildcards in one query, for
    instance "%MANAGE%" AND "CORE%"? Is there a code example available?



    Thanks a lot

    Mirko



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Mark harwood at Sep 11, 2008 at 1:59 pm
    That should give you the functionality you are looking for.
    If I understand your suggestion correctly, It won't. The Highlighter uses a tokenized version of the document text.

    Simplistically it does the following psuedo code:

    for all tokens in documentTokenStream,
    if(queryTermsSet.contains(token))
    output "<b>"+token+"</b">
    else
    output token

    NOT

    for all tokens in query string
    fullDocumentString.replaceAll(queryStringToken, "<b>"+queryStringToken+"</b>"

    So in the given example while you suggest manipulating "ll" to be in the query string, you cannot make "ll" appear as a token in documentTokenStream.

    Actually the Highlighter logic is a fair bit more involved than this (especially when using SpanQueryScorer) but the basis of it is there in the above pseudo code.





    ----- Original Message ----
    From: Matthew Hall <mhall@informatics.jax.org>
    To: java-user@lucene.apache.org
    Sent: Thursday, 11 September, 2008 14:40:26
    Subject: Re: AW: AW: Search with multiple wildcards

    Well, you could certainly manipulate your search string, removing the
    wildcard punctuations, and then use that for what you pass to the
    highlighter.

    That should give you the functionality you are looking for.


    -Matt
    mark harwood wrote:
    Is this possible?
    Not currently, the highlighter works with a list of words (or words AND phrases using the new span support) and highlights those.
    To do anything else would require the higlighter to faithfully re-implement much of the logic in all of the different query types (fuzzy, wildcard, regex etc etc) which is much more challenging/difficult to maintain.



    ----- Original Message ----
    From: "Sertic Mirko, Bedag" <Mirko.Sertic@bedag.ch>
    To: java-user@lucene.apache.org
    Sent: Thursday, 11 September, 2008 12:07:36
    Subject: AW: AW: Search with multiple wildcards

    Ok, one final question:

    If i query for "*ll*", the query is expanded to ("hallo" or "alle" or ...), so the
    Highligter will highlight the words "hallo" or "alle". But how can i highlight only
    the original query, so only the "ll"? Is this possible?

    Thanks a lot
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: mark harwood
    Gesendet: Donnerstag, 11. September 2008 11:20
    An: java-user@lucene.apache.org
    Betreff: Re: AW: Search with multiple wildcards

    You need to call rewrite on the query to expand it then give that version to the highlighter - see the package javadocs.
    http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highlight/package-summary.html#package_description


    Cheers
    Mark




    ----- Original Message ----
    From: "Sertic Mirko, Bedag" <Mirko.Sertic@bedag.ch>
    To: java-user@lucene.apache.org
    Sent: Thursday, 11 September, 2008 9:34:13
    Subject: AW: Search with multiple wildcards

    Ok, i gave it a try, but i ran into this TooManyClauses Exception. I see that
    3ildcard queries are expanded before they are processed, and I see that i can
    set the clauses count to Integer.MAXVALUE, and queries can consume a lot of memory,
    but one final thing is still open: does a wildcard query work together with
    the Lucene Highlighter? I tried it, but I only got an empty result. Without
    wildcards, the highlighter works pretty smooth!

    Regards
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: Erick Erickson
    Gesendet: Mittwoch, 10. September 2008 18:15
    An: java-user@lucene.apache.org
    Betreff: Re: Search with multiple wildcards

    Of course you can construct your own BooleanQuery
    programmatically.

    It's relatively easy, just try it.

    On Wed, Sep 10, 2008 at 11:52 AM, Sertic Mirko, Bedag <Mirko.Sertic@bedag.ch
    wrote:
    Jep, this is what i have read.

    do I need to use the query parser, or can I create a query by the api?
    Is there an example available?

    Thanks a lot
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: Erick Erickson
    Gesendet: Mittwoch, 10. September 2008 16:45
    An: java-user@lucene.apache.org
    Betreff: Re: Search with multiple wildcards

    Is this what you're referring to?

    Lucene supports single and multiple character wildcard searches within
    single terms (not within phrase queries).
    (from http://lucene.apache.org/java/docs/queryparsersyntax.html)

    I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
    friend here, download a copy and try it <G>. Be sure on the search tab to
    specify StandardAnalyzer or some such, rather than keywordanalyzer.

    The phrase is trying to point out that a phrase query does NOT respect
    wildcards. That is, submitting
    "ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
    sure that

    +field:ab* +field:bc* +field:cd*

    will work just fine. The key here is "within single terms", which I think
    of
    as
    "within a single term query". You can add as many TermQuerys as you want.
    See the query documentation for how to submit phrase queries.

    Best
    Erick

    On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag
    <Mirko.Sertic@bedag.ch
    wrote:

    Hi

    Thank you for your quick response:-)

    Of course I need to use the * character :-) But I have read somewhere in
    the documentation that leading wildcards are not supported, and only one
    wildcard term per query. Is this limitation resolved in the current version?
    Regards
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: Erick Erickson
    Gesendet: Mittwoch, 10. September 2008 15:47
    An: java-user@lucene.apache.org
    Betreff: Re: Search with multiple wildcards

    Sure, but you'll have to set the leading wildcard option,
    which I've forgotten the exact call, but it's in the docs.

    And use * rather than % <G>.

    But wildcards are tricky, especially the TooManyClauses
    exception. You might want to peruse the archive for wildcard
    posts...

    Best
    Erick

    On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
    wrote:

    Hi@all



    Is it possible to do a search with multiple wildcards in one query, for
    instance "%MANAGE%" AND "CORE%"? Is there a code example available?



    Thanks a lot

    Mirko



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Matthew Hall at Sep 11, 2008 at 2:05 pm
    Ah.. that's a darn good point..

    Though, that second bit of code you have there could be used at display
    time for him to get the functionality that he wants. You could also
    modify it somewhat, and apply it against the displayable part of the hit
    he's getting back rather than the individual tokens.

    This if of course only assuming that this functionality would be used
    after a contains type search was detected.

    Considering that's the only real usecase for a technique like this, I'm
    thinking its probably more trouble than its worth for his case.



    mark harwood wrote:
    That should give you the functionality you are looking for.
    If I understand your suggestion correctly, It won't. The Highlighter uses a tokenized version of the document text.

    Simplistically it does the following psuedo code:

    for all tokens in documentTokenStream,
    if(queryTermsSet.contains(token))
    output "<b>"+token+"</b">
    else
    output token

    NOT

    for all tokens in query string
    fullDocumentString.replaceAll(queryStringToken, "<b>"+queryStringToken+"</b>"

    So in the given example while you suggest manipulating "ll" to be in the query string, you cannot make "ll" appear as a token in documentTokenStream.

    Actually the Highlighter logic is a fair bit more involved than this (especially when using SpanQueryScorer) but the basis of it is there in the above pseudo code.





    ----- Original Message ----
    From: Matthew Hall <mhall@informatics.jax.org>
    To: java-user@lucene.apache.org
    Sent: Thursday, 11 September, 2008 14:40:26
    Subject: Re: AW: AW: Search with multiple wildcards

    Well, you could certainly manipulate your search string, removing the
    wildcard punctuations, and then use that for what you pass to the
    highlighter.

    That should give you the functionality you are looking for.


    -Matt
    mark harwood wrote:
    Is this possible?
    Not currently, the highlighter works with a list of words (or words AND phrases using the new span support) and highlights those.
    To do anything else would require the higlighter to faithfully re-implement much of the logic in all of the different query types (fuzzy, wildcard, regex etc etc) which is much more challenging/difficult to maintain.



    ----- Original Message ----
    From: "Sertic Mirko, Bedag" <Mirko.Sertic@bedag.ch>
    To: java-user@lucene.apache.org
    Sent: Thursday, 11 September, 2008 12:07:36
    Subject: AW: AW: Search with multiple wildcards

    Ok, one final question:

    If i query for "*ll*", the query is expanded to ("hallo" or "alle" or ...), so the
    Highligter will highlight the words "hallo" or "alle". But how can i highlight only
    the original query, so only the "ll"? Is this possible?

    Thanks a lot
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: mark harwood
    Gesendet: Donnerstag, 11. September 2008 11:20
    An: java-user@lucene.apache.org
    Betreff: Re: AW: Search with multiple wildcards

    You need to call rewrite on the query to expand it then give that version to the highlighter - see the package javadocs.
    http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highlight/package-summary.html#package_description


    Cheers
    Mark




    ----- Original Message ----
    From: "Sertic Mirko, Bedag" <Mirko.Sertic@bedag.ch>
    To: java-user@lucene.apache.org
    Sent: Thursday, 11 September, 2008 9:34:13
    Subject: AW: Search with multiple wildcards

    Ok, i gave it a try, but i ran into this TooManyClauses Exception. I see that
    3ildcard queries are expanded before they are processed, and I see that i can
    set the clauses count to Integer.MAXVALUE, and queries can consume a lot of memory,
    but one final thing is still open: does a wildcard query work together with
    the Lucene Highlighter? I tried it, but I only got an empty result. Without
    wildcards, the highlighter works pretty smooth!

    Regards
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: Erick Erickson
    Gesendet: Mittwoch, 10. September 2008 18:15
    An: java-user@lucene.apache.org
    Betreff: Re: Search with multiple wildcards

    Of course you can construct your own BooleanQuery
    programmatically.

    It's relatively easy, just try it.

    On Wed, Sep 10, 2008 at 11:52 AM, Sertic Mirko, Bedag <Mirko.Sertic@bedag.ch

    wrote:

    Jep, this is what i have read.

    do I need to use the query parser, or can I create a query by the api?
    Is there an example available?

    Thanks a lot
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: Erick Erickson
    Gesendet: Mittwoch, 10. September 2008 16:45
    An: java-user@lucene.apache.org
    Betreff: Re: Search with multiple wildcards

    Is this what you're referring to?

    Lucene supports single and multiple character wildcard searches within
    single terms (not within phrase queries).
    (from http://lucene.apache.org/java/docs/queryparsersyntax.html)

    I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
    friend here, download a copy and try it <G>. Be sure on the search tab to
    specify StandardAnalyzer or some such, rather than keywordanalyzer.

    The phrase is trying to point out that a phrase query does NOT respect
    wildcards. That is, submitting
    "ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
    sure that

    +field:ab* +field:bc* +field:cd*

    will work just fine. The key here is "within single terms", which I think
    of
    as
    "within a single term query". You can add as many TermQuerys as you want.
    See the query documentation for how to submit phrase queries.

    Best
    Erick

    On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag
    <Mirko.Sertic@bedag.ch

    wrote:

    Hi

    Thank you for your quick response:-)

    Of course I need to use the * character :-) But I have read somewhere in
    the documentation that leading wildcards are not supported, and only one
    wildcard term per query. Is this limitation resolved in the current
    version?

    Regards
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: Erick Erickson
    Gesendet: Mittwoch, 10. September 2008 15:47
    An: java-user@lucene.apache.org
    Betreff: Re: Search with multiple wildcards

    Sure, but you'll have to set the leading wildcard option,
    which I've forgotten the exact call, but it's in the docs.

    And use * rather than % <G>.

    But wildcards are tricky, especially the TooManyClauses
    exception. You might want to peruse the archive for wildcard
    posts...

    Best
    Erick

    On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
    wrote:


    Hi@all



    Is it possible to do a search with multiple wildcards in one query, for
    instance "%MANAGE%" AND "CORE%"? Is there a code example available?



    Thanks a lot

    Mirko




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Sertic Mirko, Bedag at Sep 11, 2008 at 3:14 pm
    Ok, i see the problems.

    I will talk to my customer about this requirement. Perhaps he doesn't need it anymore.

    Again, thanks a lot to all, you saved my day!

    Regards
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: Matthew Hall
    Gesendet: Donnerstag, 11. September 2008 16:05
    An: java-user@lucene.apache.org
    Betreff: Re: AW: AW: Search with multiple wildcards

    Ah.. that's a darn good point..

    Though, that second bit of code you have there could be used at display
    time for him to get the functionality that he wants. You could also
    modify it somewhat, and apply it against the displayable part of the hit
    he's getting back rather than the individual tokens.

    This if of course only assuming that this functionality would be used
    after a contains type search was detected.

    Considering that's the only real usecase for a technique like this, I'm
    thinking its probably more trouble than its worth for his case.



    mark harwood wrote:
    That should give you the functionality you are looking for.
    If I understand your suggestion correctly, It won't. The Highlighter uses a tokenized version of the document text.

    Simplistically it does the following psuedo code:

    for all tokens in documentTokenStream,
    if(queryTermsSet.contains(token))
    output "<b>"+token+"</b">
    else
    output token

    NOT

    for all tokens in query string
    fullDocumentString.replaceAll(queryStringToken, "<b>"+queryStringToken+"</b>"

    So in the given example while you suggest manipulating "ll" to be in the query string, you cannot make "ll" appear as a token in documentTokenStream.

    Actually the Highlighter logic is a fair bit more involved than this (especially when using SpanQueryScorer) but the basis of it is there in the above pseudo code.





    ----- Original Message ----
    From: Matthew Hall <mhall@informatics.jax.org>
    To: java-user@lucene.apache.org
    Sent: Thursday, 11 September, 2008 14:40:26
    Subject: Re: AW: AW: Search with multiple wildcards

    Well, you could certainly manipulate your search string, removing the
    wildcard punctuations, and then use that for what you pass to the
    highlighter.

    That should give you the functionality you are looking for.


    -Matt
    mark harwood wrote:
    Is this possible?
    Not currently, the highlighter works with a list of words (or words AND phrases using the new span support) and highlights those.
    To do anything else would require the higlighter to faithfully re-implement much of the logic in all of the different query types (fuzzy, wildcard, regex etc etc) which is much more challenging/difficult to maintain.



    ----- Original Message ----
    From: "Sertic Mirko, Bedag" <Mirko.Sertic@bedag.ch>
    To: java-user@lucene.apache.org
    Sent: Thursday, 11 September, 2008 12:07:36
    Subject: AW: AW: Search with multiple wildcards

    Ok, one final question:

    If i query for "*ll*", the query is expanded to ("hallo" or "alle" or ...), so the
    Highligter will highlight the words "hallo" or "alle". But how can i highlight only
    the original query, so only the "ll"? Is this possible?

    Thanks a lot
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: mark harwood
    Gesendet: Donnerstag, 11. September 2008 11:20
    An: java-user@lucene.apache.org
    Betreff: Re: AW: Search with multiple wildcards

    You need to call rewrite on the query to expand it then give that version to the highlighter - see the package javadocs.
    http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highlight/package-summary.html#package_description


    Cheers
    Mark




    ----- Original Message ----
    From: "Sertic Mirko, Bedag" <Mirko.Sertic@bedag.ch>
    To: java-user@lucene.apache.org
    Sent: Thursday, 11 September, 2008 9:34:13
    Subject: AW: Search with multiple wildcards

    Ok, i gave it a try, but i ran into this TooManyClauses Exception. I see that
    3ildcard queries are expanded before they are processed, and I see that i can
    set the clauses count to Integer.MAXVALUE, and queries can consume a lot of memory,
    but one final thing is still open: does a wildcard query work together with
    the Lucene Highlighter? I tried it, but I only got an empty result. Without
    wildcards, the highlighter works pretty smooth!

    Regards
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: Erick Erickson
    Gesendet: Mittwoch, 10. September 2008 18:15
    An: java-user@lucene.apache.org
    Betreff: Re: Search with multiple wildcards

    Of course you can construct your own BooleanQuery
    programmatically.

    It's relatively easy, just try it.

    On Wed, Sep 10, 2008 at 11:52 AM, Sertic Mirko, Bedag <Mirko.Sertic@bedag.ch

    wrote:

    Jep, this is what i have read.

    do I need to use the query parser, or can I create a query by the api?
    Is there an example available?

    Thanks a lot
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: Erick Erickson
    Gesendet: Mittwoch, 10. September 2008 16:45
    An: java-user@lucene.apache.org
    Betreff: Re: Search with multiple wildcards

    Is this what you're referring to?

    Lucene supports single and multiple character wildcard searches within
    single terms (not within phrase queries).
    (from http://lucene.apache.org/java/docs/queryparsersyntax.html)

    I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
    friend here, download a copy and try it <G>. Be sure on the search tab to
    specify StandardAnalyzer or some such, rather than keywordanalyzer.

    The phrase is trying to point out that a phrase query does NOT respect
    wildcards. That is, submitting
    "ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
    sure that

    +field:ab* +field:bc* +field:cd*

    will work just fine. The key here is "within single terms", which I think
    of
    as
    "within a single term query". You can add as many TermQuerys as you want.
    See the query documentation for how to submit phrase queries.

    Best
    Erick

    On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag
    <Mirko.Sertic@bedag.ch

    wrote:

    Hi

    Thank you for your quick response:-)

    Of course I need to use the * character :-) But I have read somewhere in
    the documentation that leading wildcards are not supported, and only one
    wildcard term per query. Is this limitation resolved in the current
    version?

    Regards
    Mirko

    -----Ursprüngliche Nachricht-----
    Von: Erick Erickson
    Gesendet: Mittwoch, 10. September 2008 15:47
    An: java-user@lucene.apache.org
    Betreff: Re: Search with multiple wildcards

    Sure, but you'll have to set the leading wildcard option,
    which I've forgotten the exact call, but it's in the docs.

    And use * rather than % <G>.

    But wildcards are tricky, especially the TooManyClauses
    exception. You might want to peruse the archive for wildcard
    posts...

    Best
    Erick

    On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
    wrote:


    Hi@all



    Is it possible to do a search with multiple wildcards in one query, for
    instance "%MANAGE%" AND "CORE%"? Is there a code example available?



    Thanks a lot

    Mirko




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedSep 11, '08 at 9:21a
activeSep 11, '08 at 3:14p
posts7
users3
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase