FAQ
Folks,
My users require wildcard searches. Sometimes their search phrases contain
spaces. I am having trouble trying to implement a wildcard search on strings
containing spaces, so if the term includes spaces I force a literal search
by adding double quotes to the search term.
So the search string for 'Dublin' becomes search term (Dublin*)
whereas search string 'Dublin City' becomes ("Dublin City")


If I use (Dublin City*) I get all instances of Dublin OR City in the results
which is not what I am looking for.

Is there any way I can combine the wildcard search and the literal?

Heres my existing code. Its in c# with Lucene.Net

//if input has spaces we do a literal search
if (sSearchQuery.IndexOf(" ") < 0)
{
sSearchQuery = "(" + sSearchQuery + "*)";
}
else
{
sSearchQuery = "(\"" + sSearchQuery + "\")";
}
IndexSearcher searcher = new IndexSearcher(sIndexLocation);
Hits oHitColl = searcher.Search(oParser.Parse(sSearchQuery));

Thanks folks
--
View this message in context: http://www.nabble.com/Wildcard-and-Literal-Searches-combined-tp18089950p18089950.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

  • Jon Loken at Jun 24, 2008 at 12:53 pm
    Hi,

    I posed a similar question on 09 May 08.

    The response was as below. I did not go down this route however, as a
    wild carded 'exact' phrase is in a way contradictory.
    Regards
    Jon



    Hi,

    Here's a searchable mailing list archive:
    http://www.gossamer-threads.com/lists/lucene/java-user/

    As regards the wildcard phrase queries, here's one way I think you could
    do it, but it's a bit of extra work. If you're using QueryParser, you'd
    have to override the "getFieldQuery" method to use span queries instead
    of phrase queries.

    A phrase query can be implemented as a span query with a span or "slop"

    factor of 1. So, once you have the PhraseQuery object, you would:

    1. Extract the terms
    2. For each one, check if it contains a "*" or a "?"
    3. If it does, create a WildcardQuery using that term, and re-write it
    using IndexReader.rewrite method. This expands the wildcard query into
    all it's matches.
    4. Create an array of SpanTermQuery objects, (one SpanTermQuery for each
    term that matched you wildcard); then add that array to a SpanOrQuery.
    5. Repeat 2 to 4 for each wildcard term in the phrase.
    6. Finally (!), create a SpanNearQuery, adding all the original terms in
    order, but substituting your SpanOrQuerys for the wildcard terms. Use a
    slop of 1, and set the inOrder flag to true.

    So, essentially, you'd end up with: (you'll have to excuse me if I
    haven't rendered the span queries correctly as strings here - but this
    should give the general idea...)

    spanNear[boiler (spanOr[replacement replacing])]

    So it will accept *either* "replacement" or "replacing" adjacent to
    "boiler", which is what you want.

    As you can see, it's a bit of work - but if you add this functionality
    to the QueryParser, you'll can re-use it a lot!

    Hope that helps!

    -JB






    -----Original Message-----
    From: mick l
    Sent: 24 June 2008 13:29
    To: java-user@lucene.apache.org
    Subject: Wildcard and Literal Searches combined


    Folks,
    My users require wildcard searches. Sometimes their search phrases
    contain spaces. I am having trouble trying to implement a wildcard
    search on strings containing spaces, so if the term includes spaces I
    force a literal search by adding double quotes to the search term.
    So the search string for 'Dublin' becomes search term (Dublin*) whereas
    search string 'Dublin City' becomes ("Dublin City")


    If I use (Dublin City*) I get all instances of Dublin OR City in the
    results which is not what I am looking for.

    Is there any way I can combine the wildcard search and the literal?

    Heres my existing code. Its in c# with Lucene.Net

    //if input has spaces we do a literal search if (sSearchQuery.IndexOf("
    ") < 0) { sSearchQuery = "(" + sSearchQuery + "*)"; } else {
    sSearchQuery = "(\"" + sSearchQuery + "\")"; } IndexSearcher searcher =
    new IndexSearcher(sIndexLocation); Hits oHitColl =
    searcher.Search(oParser.Parse(sSearchQuery));

    Thanks folks
    --
    View this message in context:
    http://www.nabble.com/Wildcard-and-Literal-Searches-combined-tp18089950p
    18089950.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ______________________________________________
    This email has been scanned by Netintelligence
    http://www.netintelligence.com/email



    BiP Solutions Limited is a company registered in Scotland with Company Number SC086146 and VAT number 383030966 and having its registered office at Park House, 300 Glasgow Road, Shawfield, Glasgow, G73 1SQ ****************************************************************************
    This e-mail (and any attachment) is intended only for the attention of the addressee(s). Its unauthorised use, disclosure, storage or copying is not permitted. If you are not the intended recipient, please destroyall copies and inform the sender by return e-mail.
    This e-mail (whether you are the sender or the recipient) may be monitored, recorded and retained by BiP Solutions Ltd.
    E-mail monitoring/ blocking software may be used, and e-mail content may be read at any time. You have a responsibility to ensure laws are not broken when composing or forwarding e-mails and their contents.
    ****************************************************************************

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Mick l at Jun 24, 2008 at 1:22 pm
    Cheers,
    Out of interest, how did you go about it in the end?


    jloken wrote:
    Hi,

    I posed a similar question on 09 May 08.

    The response was as below. I did not go down this route however, as a
    wild carded 'exact' phrase is in a way contradictory.
    Regards
    Jon



    Hi,

    Here's a searchable mailing list archive:
    http://www.gossamer-threads.com/lists/lucene/java-user/

    As regards the wildcard phrase queries, here's one way I think you could
    do it, but it's a bit of extra work. If you're using QueryParser, you'd
    have to override the "getFieldQuery" method to use span queries instead
    of phrase queries.

    A phrase query can be implemented as a span query with a span or "slop"

    factor of 1. So, once you have the PhraseQuery object, you would:

    1. Extract the terms
    2. For each one, check if it contains a "*" or a "?"
    3. If it does, create a WildcardQuery using that term, and re-write it
    using IndexReader.rewrite method. This expands the wildcard query into
    all it's matches.
    4. Create an array of SpanTermQuery objects, (one SpanTermQuery for each
    term that matched you wildcard); then add that array to a SpanOrQuery.
    5. Repeat 2 to 4 for each wildcard term in the phrase.
    6. Finally (!), create a SpanNearQuery, adding all the original terms in
    order, but substituting your SpanOrQuerys for the wildcard terms. Use a
    slop of 1, and set the inOrder flag to true.

    So, essentially, you'd end up with: (you'll have to excuse me if I
    haven't rendered the span queries correctly as strings here - but this
    should give the general idea...)

    spanNear[boiler (spanOr[replacement replacing])]

    So it will accept *either* "replacement" or "replacing" adjacent to
    "boiler", which is what you want.

    As you can see, it's a bit of work - but if you add this functionality
    to the QueryParser, you'll can re-use it a lot!

    Hope that helps!

    -JB






    -----Original Message-----
    From: mick l
    Sent: 24 June 2008 13:29
    To: java-user@lucene.apache.org
    Subject: Wildcard and Literal Searches combined


    Folks,
    My users require wildcard searches. Sometimes their search phrases
    contain spaces. I am having trouble trying to implement a wildcard
    search on strings containing spaces, so if the term includes spaces I
    force a literal search by adding double quotes to the search term.
    So the search string for 'Dublin' becomes search term (Dublin*) whereas
    search string 'Dublin City' becomes ("Dublin City")


    If I use (Dublin City*) I get all instances of Dublin OR City in the
    results which is not what I am looking for.

    Is there any way I can combine the wildcard search and the literal?

    Heres my existing code. Its in c# with Lucene.Net

    //if input has spaces we do a literal search if (sSearchQuery.IndexOf("
    ") < 0) { sSearchQuery = "(" + sSearchQuery + "*)"; } else {
    sSearchQuery = "(\"" + sSearchQuery + "\")"; } IndexSearcher searcher =
    new IndexSearcher(sIndexLocation); Hits oHitColl =
    searcher.Search(oParser.Parse(sSearchQuery));

    Thanks folks
    --
    View this message in context:
    http://www.nabble.com/Wildcard-and-Literal-Searches-combined-tp18089950p
    18089950.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ______________________________________________
    This email has been scanned by Netintelligence
    http://www.netintelligence.com/email



    BiP Solutions Limited is a company registered in Scotland with Company
    Number SC086146 and VAT number 383030966 and having its registered office
    at Park House, 300 Glasgow Road, Shawfield, Glasgow, G73 1SQ
    ****************************************************************************
    This e-mail (and any attachment) is intended only for the attention of the
    addressee(s). Its unauthorised use, disclosure, storage or copying is not
    permitted. If you are not the intended recipient, please destroyall copies
    and inform the sender by return e-mail.
    This e-mail (whether you are the sender or the recipient) may be
    monitored, recorded and retained by BiP Solutions Ltd.
    E-mail monitoring/ blocking software may be used, and e-mail content may
    be read at any time. You have a responsibility to ensure laws are not
    broken when composing or forwarding e-mails and their contents.
    ****************************************************************************

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

    --
    View this message in context: http://www.nabble.com/Wildcard-and-Literal-Searches-combined-tp18089950p18090884.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Jon Loken at Jun 24, 2008 at 1:35 pm
    Our approach is:
    If the keyword is a single word, then append the star (e.g. replace ->
    replace*)
    If the keyword is a phrase containing one of more spaces, then it is
    treated as an exact phrase (e.g. replace this -> 'replace this')

    Regards
    Jon

    -----Original Message-----
    From: mick l
    Sent: 24 June 2008 14:22
    To: java-user@lucene.apache.org
    Subject: RE: Wildcard and Literal Searches combined


    Cheers,
    Out of interest, how did you go about it in the end?


    jloken wrote:
    Hi,

    I posed a similar question on 09 May 08.

    The response was as below. I did not go down this route however, as a
    wild carded 'exact' phrase is in a way contradictory.
    Regards
    Jon



    Hi,

    Here's a searchable mailing list archive:
    http://www.gossamer-threads.com/lists/lucene/java-user/

    As regards the wildcard phrase queries, here's one way I think you
    could do it, but it's a bit of extra work. If you're using
    QueryParser, you'd have to override the "getFieldQuery" method to use
    span queries instead of phrase queries.

    A phrase query can be implemented as a span query with a span or "slop"
    factor of 1. So, once you have the PhraseQuery object, you would:

    1. Extract the terms
    2. For each one, check if it contains a "*" or a "?"
    3. If it does, create a WildcardQuery using that term, and re-write it
    using IndexReader.rewrite method. This expands the wildcard query into
    all it's matches.
    4. Create an array of SpanTermQuery objects, (one SpanTermQuery for
    each term that matched you wildcard); then add that array to a
    SpanOrQuery.
    5. Repeat 2 to 4 for each wildcard term in the phrase.
    6. Finally (!), create a SpanNearQuery, adding all the original terms
    in order, but substituting your SpanOrQuerys for the wildcard terms.
    Use a slop of 1, and set the inOrder flag to true.

    So, essentially, you'd end up with: (you'll have to excuse me if I
    haven't rendered the span queries correctly as strings here - but this
    should give the general idea...)

    spanNear[boiler (spanOr[replacement replacing])]

    So it will accept *either* "replacement" or "replacing" adjacent to
    "boiler", which is what you want.

    As you can see, it's a bit of work - but if you add this functionality
    to the QueryParser, you'll can re-use it a lot!

    Hope that helps!

    -JB






    -----Original Message-----
    From: mick l
    Sent: 24 June 2008 13:29
    To: java-user@lucene.apache.org
    Subject: Wildcard and Literal Searches combined


    Folks,
    My users require wildcard searches. Sometimes their search phrases
    contain spaces. I am having trouble trying to implement a wildcard
    search on strings containing spaces, so if the term includes spaces I
    force a literal search by adding double quotes to the search term.
    So the search string for 'Dublin' becomes search term (Dublin*)
    whereas search string 'Dublin City' becomes ("Dublin City")


    If I use (Dublin City*) I get all instances of Dublin OR City in the
    results which is not what I am looking for.

    Is there any way I can combine the wildcard search and the literal?

    Heres my existing code. Its in c# with Lucene.Net

    //if input has spaces we do a literal search if
    (sSearchQuery.IndexOf("
    ") < 0) { sSearchQuery = "(" + sSearchQuery + "*)"; } else {
    sSearchQuery = "(\"" + sSearchQuery + "\")"; } IndexSearcher searcher
    = new IndexSearcher(sIndexLocation); Hits oHitColl =
    searcher.Search(oParser.Parse(sSearchQuery));

    Thanks folks
    --
    View this message in context:
    http://www.nabble.com/Wildcard-and-Literal-Searches-combined-tp1808995
    0p
    18089950.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.

    BiP Solutions Limited is a company registered in Scotland with Company Number SC086146 and VAT number 383030966 and having its registered office at Park House, 300 Glasgow Road, Shawfield, Glasgow, G73 1SQ ****************************************************************************
    This e-mail (and any attachment) is intended only for the attention of the addressee(s). Its unauthorised use, disclosure, storage or copying is not permitted. If you are not the intended recipient, please destroyall copies and inform the sender by return e-mail.
    This e-mail (whether you are the sender or the recipient) may be monitored, recorded and retained by BiP Solutions Ltd.
    E-mail monitoring/ blocking software may be used, and e-mail content may be read at any time. You have a responsibility to ensure laws are not broken when composing or forwarding e-mails and their contents.
    ****************************************************************************

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Erick Erickson at Jun 24, 2008 at 1:36 pm
    Do you require that the words be right next to each other? You can,
    of course, set your default to AND (it's OR unless you change it
    explicitly). That'll give you documents that have both Dublin and City.
    You don't need wildcards at all in this case.

    If you require exact matches, you can use PhraseQuery or SpanQuery
    to get words right next to each other. That is, "dublin city" would
    be found but not "dublin is a very large city". You can set slop
    (the distance between terms that counts as a hit) variously, and you
    can also specify whether order is important. Again, you don't need
    wildcards at all (and shouldn't use them unless you really need to).

    Beware that phrase queries do NOT go through wildcard expansion. That
    is, "Dubli* city" (as a PhraseQuery) doesn't do what you might think.

    I'd really look over the query syntax carefully for more insights. See
    http://lucene.apache.org/java/docs/queryparsersyntax.html

    Also note that SpanQueries are something you construct yourself, you
    don't send text through the query parser and magically get SpanQueries
    back.

    Best
    Erick
    On Tue, Jun 24, 2008 at 8:28 AM, mick l wrote:


    Folks,
    My users require wildcard searches. Sometimes their search phrases contain
    spaces. I am having trouble trying to implement a wildcard search on
    strings
    containing spaces, so if the term includes spaces I force a literal search
    by adding double quotes to the search term.
    So the search string for 'Dublin' becomes search term (Dublin*)
    whereas search string 'Dublin City' becomes ("Dublin City")


    If I use (Dublin City*) I get all instances of Dublin OR City in the
    results
    which is not what I am looking for.

    Is there any way I can combine the wildcard search and the literal?

    Heres my existing code. Its in c# with Lucene.Net

    //if input has spaces we do a literal search
    if (sSearchQuery.IndexOf(" ") < 0)
    {
    sSearchQuery = "(" + sSearchQuery + "*)";
    }
    else
    {
    sSearchQuery = "(\"" + sSearchQuery + "\")";
    }
    IndexSearcher searcher = new IndexSearcher(sIndexLocation);
    Hits oHitColl = searcher.Search(oParser.Parse(sSearchQuery));

    Thanks folks
    --
    View this message in context:
    http://www.nabble.com/Wildcard-and-Literal-Searches-combined-tp18089950p18089950.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Chris Hostetter at Jun 25, 2008 at 9:21 pm
    : My users require wildcard searches. Sometimes their search phrases contain
    : spaces. I am having trouble trying to implement a wildcard search on strings
    : containing spaces, so if the term includes spaces I force a literal search
    : by adding double quotes to the search term.
    : So the search string for 'Dublin' becomes search term (Dublin*)
    : whereas search string 'Dublin City' becomes ("Dublin City")

    : If I use (Dublin City*) I get all instances of Dublin OR City in the results
    : which is not what I am looking for.

    my naive understanding of your problem makes me think you should
    completely avoid using the QueryParser. Index this field as "UNTOKENIZED"
    (or using an Analyzer that does no tokenization, but perhaps lowercases if
    that's what you want) and at query time construct a WildCardQuery directly
    off of the users input (lowercases first if that what you did at index
    time)

    ...but i could be missunderstanding your goal.


    -Hoss


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedJun 24, '08 at 12:29p
activeJun 25, '08 at 9:21p
posts6
users4
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase