Hi Thomas,

I think one solution would be similar to the autocomplete function I've
implemented in Solr. You can configure it as follows:

<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])"
            replacement="" replace="all"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20"
            minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])"
            replacement="" replace="all"/>
  </analyzer>
</fieldType>

This can then match on the whole string OR a leading part of the string.
If you go through the Lucene QueryParser you won't be using the query
part of the analyzer above, but I've included it for completeness. The
core of it, as far as replacing wildcard search goes, is the
EdgeNGramFilterFactory.
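
To make the index side more concrete, here is a rough plain-Java sketch of
what that analyzer chain does to a single name. It does not use Solr or
Lucene at all; the class and method names (EdgeNGramSketch, normalize,
edgeNGrams) are made up for illustration, and it only assumes the settings
shown in the fieldType above (lowercase, strip non a-z, leading n-grams).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustration only: imitates the index-time analyzer chain without Solr. */
final class EdgeNGramSketch {

    // LowerCaseFilter + PatternReplaceFilter step: lowercase, then drop
    // every character outside a-z (spaces, digits, punctuation).
    static String normalize(String raw) {
        return raw.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
    }

    // EdgeNGramFilter step: every leading substring of the normalized
    // value whose length lies between minGram and maxGram.
    static List<String> edgeNGrams(String raw, int minGram, int maxGram) {
        String term = normalize(raw);
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, term.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }
}
```

Calling EdgeNGramSketch.edgeNGrams("A Balladeer", 1, 20) gives the terms
"a", "ab", "aba" and so on up to "aballadeer", which is why a plain term
or range lookup can stand in for a prefix wildcard.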

<field name="name" type="autocomplete" indexed="true" stored="false"/>

Your queries can then become e.g.:

name:["aballadeer" TO "aperfectcirclf"] -- i.e. without wildcards.

Note that you'd need to do the work of the query analyzer up front, i.e.
lowercase the input and remove any non a-z chars. Additionally, the '*'
needs to be removed from both the start term and the end term, and if
the end term had a '*', its last char needs to be increased by one. In
this case 'e' becomes 'f'.
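
The up-front work described above could look roughly like this in Java. It
is only a sketch of the string handling, not a Lucene API: the class and
method names (RangeBoundsSketch, lowerBound, upperBound, rangeQuery) are
invented for the example, and the handling of a trailing 'z' is my own
assumption since the steps above don't cover it.

```java
import java.util.Locale;

/** Illustration only: builds the range query string by hand. */
final class RangeBoundsSketch {

    // Same work as the query analyzer: lowercase and keep only a-z.
    // This also removes any '*' from the input.
    static String normalize(String raw) {
        return raw.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
    }

    // Start term: just the normalized input.
    static String lowerBound(String raw) {
        return normalize(raw);
    }

    // End term: if the input ended with '*', bump the last char by one
    // so the range covers everything starting with the original prefix.
    static String upperBound(String raw) {
        boolean hadWildcard = raw.trim().endsWith("*");
        String term = normalize(raw);
        if (!hadWildcard || term.isEmpty()) {
            return term;
        }
        // Assumption: for a trailing 'z' this yields '{', which is not
        // a-z but still sorts directly after 'z'.
        char bumped = (char) (term.charAt(term.length() - 1) + 1);
        return term.substring(0, term.length() - 1) + bumped;
    }

    // Assembles the query in the form shown earlier in this mail.
    static String rangeQuery(String field, String from, String to) {
        return field + ":[\"" + lowerBound(from) + "\" TO \""
                + upperBound(to) + "\"]";
    }
}
```

With the inputs from the original question,
RangeBoundsSketch.rangeQuery("name", "A Balladeer*", "A Perfect Circle*")
produces name:["aballadeer" TO "aperfectcirclf"].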

I think you'd find this a much more efficient solution than using
wildcards, which can be a performance bottleneck.

On Fri, 2008-07-25 at 10:53 +0200, Thomas Becker wrote:
Hi all,

I need to replace some db queries with lucene due to response time
issues for sure. In this special case I need to do a range query on a
field and a prefix query. I'm trying to prepare and try my query in luke
with no success before migrating it to java.

I need to find all names starting with for example "A Balladeer" to "A
Perfect Circle" in the name field. The sort field is sortName (same
content as name, but untokenized for sorting).

I tried the following in luke which should give me a few hundred docs:

name:["A Balladeer*" TO "A Perfect Circle*"] - 0 results, although there
should be some
name:["A Balladeer*" TO "B*" - >10k results, but also returns results
which have a string in the middle or end starting with A

I tried using sortName (untokenized) field instead:
sortName:["A Balladeer*" TO "B*" - 25 results, all starting with A*
(guess since it's untokenized), but far less than expected again

Tried a couple more (stupid) things with little success. I googled
around, but I'm kinda stuck here, so I'm asking the list. How can I
search all name/sortName fields in a range between "A Balladeer*" TO "A
Perfect Circle*" and get back only terms that start with those
terms? Is there a way to accomplish that in Java and try it in luke?

And is there a way to sort resultsets in luke?

Daniel Rosher
d: 0207 3489 912

t: 0845 4680 568

f: 0845 4680 868


Beaumont House, Kensington Village, Avonmore Road, London, W14 8TS

Discussion Overview
group: java-user
posted: Jul 25, '08 at 8:54a
active: Jul 25, '08 at 9:36a


