It won't do what I need. I may have something like:

"All-In-One is located in 92226-4446 and has an E-A-R"

I want it to be tokenized as follows:

all
one
located
92226
4446
E-A-R

Right now it is being tokenized like this:

all
one
located
92226-4446
E-A-R

That's the type of information you give when you ask the question the
first time (not to be a pompous ass or anything <g>). The problem is
that your zip code is matched by the NUM rule:
<NUM: (<ALPHANUM> <P> <HAS_DIGIT>
     | <HAS_DIGIT> <P> <ALPHANUM>
     | <ALPHANUM> (<P> <HAS_DIGIT> <P> <ALPHANUM>)+
     | <HAS_DIGIT> (<P> <ALPHANUM> <P> <HAS_DIGIT>)+
     | <ALPHANUM> <P> <HAS_DIGIT> (<P> <ALPHANUM> <P> <HAS_DIGIT>)+
     | <HAS_DIGIT> <P> <ALPHANUM> (<P> <HAS_DIGIT> <P> <ALPHANUM>)+
      )
>
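
A rough way to see why the zip code lands in NUM is to approximate the rule with regular expressions. This is only a sketch: the definitions of ALPHANUM, HAS_DIGIT and P below are assumptions (ASCII letters and digits, and the punctuation set _ - / . ,), and the class name NumRuleSketch is made up for illustration. The real grammar also covers non-ASCII letters and digits.

```java
import java.util.regex.Pattern;

// Regex approximation of the NUM rule. ASCII-only, so narrower than the real grammar.
class NumRuleSketch {
    // Assumed sub-token definitions; they are not shown in the post above.
    static final String ALPHANUM = "[A-Za-z0-9]+";
    static final String HAS_DIGIT = "[A-Za-z0-9]*[0-9][A-Za-z0-9]*";
    static final String P = "[_\\-/.,]";

    // The six alternatives, in the order they appear in the rule.
    static final String[] ALTERNATIVES = {
        ALPHANUM + P + HAS_DIGIT,
        HAS_DIGIT + P + ALPHANUM,
        ALPHANUM + "(" + P + HAS_DIGIT + P + ALPHANUM + ")+",
        HAS_DIGIT + "(" + P + ALPHANUM + P + HAS_DIGIT + ")+",
        ALPHANUM + P + HAS_DIGIT + "(" + P + ALPHANUM + P + HAS_DIGIT + ")+",
        HAS_DIGIT + P + ALPHANUM + "(" + P + HAS_DIGIT + P + ALPHANUM + ")+"
    };

    // True if the whole text matches any alternative from index `from` onward.
    // Passing from = 2 simulates a rule with the first two alternatives removed.
    static boolean matchesNum(String text, int from) {
        for (int i = from; i < ALTERNATIVES.length; i++) {
            if (Pattern.matches(ALTERNATIVES[i], text)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesNum("92226-4446", 0)); // all six alternatives
        System.out.println(matchesNum("92226-4446", 2)); // first two removed
        System.out.println(matchesNum("1-800-555", 2));  // three segments
    }
}
```

Under these assumptions, "92226-4446" is matched only by the first two alternatives, since the remaining four all need at least three segments. A three-segment string such as "1-800-555" still matches without them.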

You could try removing the first two OR options. Other than that, it
gets tricky. And if you remove them, then other things they would
normally match (other than zip codes) will no longer be matched.

- Mark
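
A different approach, not the one suggested in the reply, is to leave the grammar alone and split ZIP+4 tokens after tokenizing. The sketch below is plain Java on a list of strings. The class ZipPlusFourSplitter and its split method are hypothetical names, not part of Lucene. In a real analyzer this logic would presumably live in a custom TokenFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not a Lucene class: splits ZIP+4 tokens in a token list.
class ZipPlusFourSplitter {
    // Exactly five digits, a hyphen, and four digits.
    private static final Pattern ZIP_PLUS_FOUR = Pattern.compile("\\d{5}-\\d{4}");

    // Returns a new list in which every ZIP+4 token is replaced by its two halves.
    static List<String> split(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (ZIP_PLUS_FOUR.matcher(token).matches()) {
                int dash = token.indexOf('-');
                out.add(token.substring(0, dash));
                out.add(token.substring(dash + 1));
            } else {
                // Anything else, such as E-A-R, passes through unchanged.
                out.add(token);
            }
        }
        return out;
    }
}
```

Because only tokens of the exact form 12345-6789 are split, other strings that NUM normally matches are left as they are.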


Discussion Overview
group: java-user
categories: lucene
posted: Jan 12, '07 at 1:13a
active: Jan 13, '07 at 1:01a
posts: 6
users: 3
website: lucene.apache.org