This is really no problem at all. Use ICU's RuleBasedBreakIterator (RBBI)
to identify runs of numbers in your query string, then replace them with
the normalized version. You will need the ICU jar (ICU4J) for this.
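import com.ibm.icu.text.BreakIterator;
import com.ibm.icu.text.RuleBasedBreakIterator;
import java.text.NumberFormat; // com.ibm.icu.text.NumberFormat works too
import java.util.Locale;

String userQuery = "Potter 19,99";
Locale locale = new Locale("nl");
RuleBasedBreakIterator bi =
    (RuleBasedBreakIterator) BreakIterator.getWordInstance(locale);
NumberFormat nf = NumberFormat.getNumberInstance(locale);
bi.setText(userQuery);
int start = bi.first();
int end = bi.next();
StringBuilder normalizedQuery = new StringBuilder();
while (end != BreakIterator.DONE) {
    // If the segment is a number, parse it with the user's locale and
    // append the parsed value (Number.toString() uses '.' as the decimal
    // separator). nf.parse throws the checked ParseException, so the
    // enclosing method has to declare or catch it.
    int status = bi.getRuleStatus();
    if (status >= RuleBasedBreakIterator.WORD_NUMBER
            && status < RuleBasedBreakIterator.WORD_NUMBER_LIMIT) {
        normalizedQuery.append(nf.parse(userQuery.substring(start, end)));
    } else {
        normalizedQuery.append(userQuery.substring(start, end));
    }
    start = end;
    end = bi.next();
}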
After this code runs:
System.out.println(userQuery);        // Potter 19,99
System.out.println(normalizedQuery);  // Potter 19.99
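
If adding the ICU jar is not an option, roughly the same normalization can be sketched with JDK classes only. Note that this swaps the technique: instead of RBBI rule status it uses a regular expression to find digit runs, which is cruder than ICU's word segmentation. The class name QueryNumberNormalizer, the method normalize and the pattern NUMBER_RUN are made up for this illustration.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QueryNumberNormalizer {
    // Assumption: a "number" is a run of digits that may contain '.' or ','
    // between digits. This is a simplification, not ICU's segmentation rules.
    private static final Pattern NUMBER_RUN = Pattern.compile("\\d+(?:[.,]\\d+)*");

    static String normalize(String userQuery, Locale locale) {
        NumberFormat nf = NumberFormat.getNumberInstance(locale);
        Matcher m = NUMBER_RUN.matcher(userQuery);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one unchanged.
            out.append(userQuery, last, m.start());
            String token = m.group();
            ParsePosition pos = new ParsePosition(0);
            Number n = nf.parse(token, pos);
            // Replace the token only if the whole token parsed as a number.
            boolean parsedFully = n != null && pos.getIndex() == token.length();
            out.append(parsedFully ? n.toString() : token);
            last = m.end();
        }
        out.append(userQuery, last, userQuery.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "Potter 19.99"
        System.out.println(normalize("Potter 19,99", new Locale("nl")));
    }
}
```

Like the loop above, this rewrites every token that parses as a number, so zero-padded codes or version-like strings in a query may come out differently than the user typed them.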
On Fri, Mar 27, 2009 at 2:54 AM, Marcel Overdijk wrote:
That would make sense, yes.
But the problem is that I have one general query field. I don't know
whether the user entered a String or a number, or what he meant. Is 2008
a year (a number) or part of an address String?
Or maybe he's combining things, like "Potter 19,99".
Robert Muir wrote:
Marcel,
I'd suggest parsing and displaying numbers in a locale-sensitive way with
NumberFormat (be sure to supply the correct locale), and keeping them in
the index in one consistent form (i.e. 19.99).
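
A minimal sketch of that suggestion, using only java.text: parse what the user typed with their locale, keep a locale-neutral string for the index, and format it back for display. The class and method names (LocaleNumbers, toIndexForm, toDisplayForm) are hypothetical, and the cast to DecimalFormat assumes the JDK returns a DecimalFormat for the locale, which the Javadoc does not strictly guarantee.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class LocaleNumbers {
    /** Returns the locale-neutral form (e.g. "19.99"), or null if input is not a number. */
    static String toIndexForm(String input, Locale locale) {
        // Assumption: getNumberInstance returns a DecimalFormat for this locale.
        DecimalFormat df = (DecimalFormat) NumberFormat.getNumberInstance(locale);
        df.setParseBigDecimal(true); // avoid double rounding for prices
        ParsePosition pos = new ParsePosition(0);
        Number n = df.parse(input, pos);
        // Reject partial parses such as "19,99abc" and special values like infinity.
        if (!(n instanceof BigDecimal) || pos.getIndex() != input.length()) {
            return null;
        }
        return ((BigDecimal) n).toPlainString();
    }

    /** Formats an indexed value (e.g. "19.99") for display in the user's locale. */
    static String toDisplayForm(String indexed, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(new BigDecimal(indexed));
    }

    public static void main(String[] args) {
        Locale nl = new Locale("nl");
        System.out.println(toIndexForm("19,99", nl));   // prints "19.99"
        System.out.println(toDisplayForm("19.99", nl)); // prints "19,99"
    }
}
```

One thing to watch: in the Dutch locale '.' is the grouping separator, so as far as I know a Dutch-locale parser will read "19.99" as 1999 rather than 19.99. If users may type either form, that case needs its own handling.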
On Thu, Mar 26, 2009 at 6:03 PM, Marcel Overdijk wrote:
Thanks for your reply.
It's indeed a webapp with an HTML front-end.
I agree that letting end-users enter a Lucene query might not be what you
want.
I will probably be using an "all" index which indexes all fields of my
entity. So in the book example that includes book title, isbn, price,
author.firstname, author.lastname.
The end-user will have a Quick Search option in which he/she can enter a
query string.
E.g. "Potter" when searching for Harry Potter books, or "19,99" / "19.99"
for books with a price of 19.99.
So I actually don't know which field the user is searching.
This is also my use case for introducing Lucene/Hibernate Search.
I don't want multiple LIKEs in a SQL query.
Cheers,
Marcel
Erick Erickson wrote:
What does the front end look like? Is it a web page or a custom app? And
do you expect your users to actually enter the field name? I'd be
reluctant to allow any but the geekiest of users to enter the Lucene
syntax (i.e. the field names). Users shouldn't know anything about the
underlying structure. Not to mention the headaches if you ever want to
change it.
So, let's assume an HTML page. *You* know what the underlying field
is no matter what the label on the entry field says, so you should be
able to construct the query with the proper field names.
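
As a rough illustration of that idea, the application can keep its own table from translated form labels to index field names and build the field:value query itself. Everything named here (FieldQueryBuilder, buildQuery, the label table and the "title" field) is hypothetical; only the "price" field and the field:value form come from this thread. Escaping of Lucene special characters is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class FieldQueryBuilder {
    // Hypothetical mapping from translated UI labels to index field names.
    private static final Map<String, String> LABEL_TO_FIELD = new LinkedHashMap<>();
    static {
        LABEL_TO_FIELD.put("prijs", "price"); // Dutch
        LABEL_TO_FIELD.put("preis", "price"); // German
        LABEL_TO_FIELD.put("titel", "title"); // Dutch and German
    }

    /** Builds a field:value query string; unknown labels are passed through as-is. */
    static String buildQuery(String label, String value) {
        String key = label.toLowerCase(Locale.ROOT);
        String field = LABEL_TO_FIELD.getOrDefault(key, label);
        // Quote multi-word values so they stay attached to the field.
        boolean needsQuotes = value.chars().anyMatch(Character::isWhitespace);
        return field + ":" + (needsQuotes ? "\"" + value + "\"" : value);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("prijs", "19.99")); // prints "price:19.99"
    }
}
```

Supporting another language then means adding entries to the table, much like adding a resource bundle for the front end, while the index itself keeps a single set of field names.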
Or I don't understand your problem at all, which is not unusual <G>.
Best
Erick
On Thu, Mar 26, 2009 at 5:32 PM, Marcel Overdijk wrote:
First of all, I'm new to Lucene. I'm experimenting with it right now in
combination with Hibernate Search.
What I'm wondering is whether I can index numbers in an i18n-aware way.
E.g. I have a Book entity with a price attribute.
A book with a price of 19.99 can be found by searching for price:19.99.
The thing is, Dutch users will search for 19,99 (different decimal
symbol).
How can this be handled?
Furthermore, Dutch users will search for something like prijs:19,99.
Can this be done with aliases or something similar? The problem is that
one day I may want to support German as well.
The front-end app can be translated by simply adding i18n resource
bundles.
Is something like this also possible for searching within Lucene?
Cheers,
Marcel
--
View this message in context:
http://www.nabble.com/i18n-numbers-tp22731528p22731528.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
--
Robert Muir
[email protected]