Hi all,

I need to replace some DB queries with Lucene because of response time
issues. In this particular case I need to combine a range query on a
field with a prefix query. I'm trying to build and test my query in Luke
before migrating it to Java, so far without success.

I need to find all names in the range from those starting with, for
example, "A Balladeer" to those starting with "A Perfect Circle" in the
name field. The sort field is sortName (same content as name, but
untokenized for sorting).

I tried the following in Luke, which should give me a few hundred docs:

name:["A Balladeer*" TO "A Perfect Circle*"] - 0 results, although there
should be some
name:["A Balladeer*" TO "B*"] - >10k results, but it also returns results
which have a string in the middle or at the end starting with A

I then tried the sortName (untokenized) field instead:
sortName:["A Balladeer*" TO "B*"] - 25 results, all starting with A
(I guess because it's untokenized), but again far fewer than expected

I tried a couple more (stupid) things with little success. I googled
around, but I'm kind of stuck here, so I'm asking the list: how can I
search the name/sortName fields for a range between "A Balladeer*" and "A
Perfect Circle*" and get back only entries that start with those
terms? Is there a way to accomplish that in Java and try it in Luke?

And is there a way to sort result sets in Luke?

Cheers,
Thomas

--
Thomas Becker



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

  • Ian Lea at Jul 25, 2008 at 9:14 am
    Hi


    Are you sure your range queries should have wildcard asterisks on the
    end? That looks odd to me, and I don't know what the effect would be.

    I'd also prefer everything in lower case, but maybe you've got the
    right analyzers being used consistently in the indexing and searching
    chains.


    --
    Ian.


  • Thomas Becker at Jul 25, 2008 at 9:24 am
    Hi Ian,

    No, the wildcards should not be necessary. That was just the last of
    several attempts. I know the exact content of both fields in my range
    query. The case is as the Java code found it, but the analyzer will
    lowercase it anyway. I'm trying the SimpleAnalyzer since all the others
    seem to omit single-character terms.

    name:("A Balladeer") (translated to name:("a balladeer") by the analyzer)
    gives me the doc I expect, with doc ID 13002.
    name:("A Perfect Circle") gives me doc ID 35833, as expected.

    However:

    name:["A Balladeer" TO "A Perfect Circle"] gives zero results. I also
    tried it with braces around the terms and similar stupid things, even
    though they shouldn't be needed in a range query.

    I'm kinda clueless.

    Cheers,
    Thomas
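
One possible reading of the results reported so far, sketched in plain Java rather than against the Lucene API: a range query compares each indexed term, as a string, with the two bounds. The sketch assumes the query parser passes the quoted bounds through as single lowercased strings and that the name field is split into lowercase word terms. `TermRangeSketch`, `tokenize` and `matchesRange` are hypothetical names used only for illustration, and "Living in America" is a made-up document value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TermRangeSketch {

    // Rough stand-in for a lowercasing, letter-based analyzer (an assumption,
    // not Lucene code): lowercase the text and split on anything but a-z.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // A document matches if ANY single term lies between the bounds
    // (inclusive), using plain lexicographic string comparison.
    static boolean matchesRange(List<String> terms, String lower, String upper) {
        for (String term : terms) {
            if (term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> perfect = tokenize("A Perfect Circle");
        System.out.println(perfect);

        // "a" sorts before "a balladeer", while "perfect" and "circle" sort
        // after "a perfect circle", so no term falls inside the bounds.
        System.out.println(matchesRange(perfect, "a balladeer", "a perfect circle"));

        // With "b*" as the upper bound, any word beginning with "a" is in
        // range, wherever it appears in the name.
        System.out.println(matchesRange(tokenize("Living in America"), "a balladeer*", "b*"));
    }
}
```

Under these assumptions the tokenized field cannot answer a whole-name range, which would be consistent with both the zero hits and the >10k hits described in the first post. A field that keeps the whole name as one term is the natural candidate for this kind of query.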

    --
    Thomas Becker
    Senior JEE Developer

    net mobile AG
    Zollhof 17
    40221 Düsseldorf
    GERMANY

    Phone: +49 211 97020-195
    Fax: +49 211 97020-949
    Mobile: +49 173 5146567 (private)
    E-Mail: mailto:thomas.becker@net-m.de
    Internet: http://www.net-m.de

    Registergericht: Amtsgericht Düsseldorf, HRB 48022
    Vorstand: Theodor Niehues (Vorsitzender), Frank Hartmann,
    Kai Markus Kulas, Dieter Plassmann
    Vorsitzender des
    Aufsichtsrates: Dr. Michael Briem


  • Thomas Becker at Jul 25, 2008 at 9:28 am
    By the way, I tried the wildcards because I found something on Google
    that mentioned wildcards together with StartsWith queries.

  • Daniel rosher at Jul 25, 2008 at 9:36 am
    Hi Thomas,

    I think one solution would be similar to the autocomplete function I've
    implemented in Solr. You can set it up in Solr as follows:

    FieldType:
    <fieldType name="autocomplete" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])"
                replacement="" replace="all"/>
        <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20"
                minGramSize="1"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])"
                replacement="" replace="all"/>
      </analyzer>
    </fieldType>

    This can then match on the whole string OR a leading part of the
    string. If you use the QueryParser, you would not be using the query
    part of the analyzer above, but I've included it for completeness. The
    core of it, as far as wildcard search is concerned, is the
    EdgeNGramFilterFactory.
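
To make the effect of the index-time chain concrete, here is a minimal plain-Java imitation of it, not Solr code: the whole value is kept as one token, lowercased, stripped of everything outside a-z, and then every leading substring between the minimum and maximum gram size is emitted. `EdgeNGramSketch` and `indexTerms` are hypothetical names, and the sketch assumes grams are taken from the front of the token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class EdgeNGramSketch {

    // Imitates keyword tokenizer -> lowercase -> strip non a-z -> front edge
    // n-grams. This is an illustration only, not the Solr implementation.
    static List<String> indexTerms(String value, int minGram, int maxGram) {
        String token = value.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Gram sizes 1..20 as in the field type above.
        System.out.println(indexTerms("A Balladeer", 1, 20));
        // prints [a, ab, aba, abal, aball, aballa, aballad, aballade, aballadee, aballadeer]
    }
}
```

Because every prefix of the normalized name ends up as its own term, an exact term lookup or a term range can stand in for a wildcard query.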

    Field:
    <field name="name" type="autocomplete" indexed="true" stored="false"
           required="false"/>

    Your queries can then become, e.g.:

    name:["aballadeer" TO "aperfectcirclf"] -- i.e. without wildcards.

    Note that you'd need to do the work of the query analyzer up front, i.e.
    lowercase the input and remove any non a-z chars. Additionally, the '*'
    on the start term would need to be removed, AND the '*' on the end term
    would also need to be removed and, if it was present, the last char
    increased by one. In this case 'e' becomes 'f'.
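
The up-front normalization described above could look roughly like the following plain-Java sketch. `RangeBounds` and its methods are hypothetical helpers, not part of Lucene or Solr, and the sketch only builds the query string. It makes no attempt to handle a bound ending in 'z', where the incremented character falls outside a-z.

```java
import java.util.Locale;

class RangeBounds {

    // Mirrors the lowercase + pattern-replace steps of the field type:
    // lowercase, then drop everything outside a-z (this also drops '*').
    static String normalize(String raw) {
        return raw.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
    }

    // Start term: normalizing is enough, a trailing '*' simply disappears.
    static String lowerBound(String raw) {
        return normalize(raw);
    }

    // End term: if a trailing '*' was present, increase the last character
    // by one so that longer terms sharing the prefix still sort below it.
    static String upperBound(String raw) {
        boolean hadWildcard = raw.trim().endsWith("*");
        String term = normalize(raw);
        if (!hadWildcard || term.isEmpty()) {
            return term;
        }
        char last = term.charAt(term.length() - 1);
        return term.substring(0, term.length() - 1) + (char) (last + 1);
    }

    // Assembles the range query in query-parser syntax.
    static String toQuery(String field, String from, String to) {
        return field + ":[\"" + lowerBound(from) + "\" TO \"" + upperBound(to) + "\"]";
    }

    public static void main(String[] args) {
        System.out.println(toQuery("name", "A Balladeer*", "A Perfect Circle*"));
        // prints name:["aballadeer" TO "aperfectcirclf"]
    }
}
```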

    I think you'd find this a much more efficient solution than using
    wildcards, which can be a performance bottleneck.

    Regards,
    Dan

    Daniel Rosher
    Developer
    www.thehotonlinenetwork.com
    d: 0207 3489 912
    t: 0845 4680 568
    f: 0845 4680 868
    Beaumont House, Kensington Village, Avonmore Road, London, W14 8TS




Discussion Overview
group: java-user
categories: lucene
posted: Jul 25, '08 at 8:54a
active: Jul 25, '08 at 9:36a
posts: 5
users: 3
website: lucene.apache.org
