Figures...I don't even think removing those pieces from OR will
work...that will just skip both pieces because they will appear as pure
numbers. What you want is a bit tricky. I will think about it if someone
else doesn't chime in...it is difficult to recognize one token and then
return it as two, though not impossible of course...

- Mark
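
Mark's point about recognizing one token and then returning it as two can be sketched outside of Lucene with a small buffering iterator. This is only an illustration under stated assumptions: `ZipSplittingIterator` and `drain` are hypothetical names, the code uses plain `java.util` types rather than the Lucene TokenStream/TokenFilter API, and it assumes the upstream tokenizer hands over `92226-4446` as a single token.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper (not part of Lucene): wraps a stream of tokens and
// splits any token shaped like 12345-6789 into its two numeric halves.
public class ZipSplittingIterator implements Iterator<String> {
    private final Iterator<String> input;
    // Holds the second half of a split token until the next call to next().
    private final Deque<String> pending = new ArrayDeque<>();

    public ZipSplittingIterator(Iterator<String> input) {
        this.input = input;
    }

    @Override
    public boolean hasNext() {
        return !pending.isEmpty() || input.hasNext();
    }

    @Override
    public String next() {
        if (!pending.isEmpty()) {
            return pending.removeFirst();
        }
        String token = input.next();
        if (token.matches("\\d{5}-\\d{4}")) {
            // Queue the four-digit extension and return the five-digit part first.
            pending.addLast(token.substring(6));
            return token.substring(0, 5);
        }
        return token;
    }

    // Convenience method: collects every token the iterator produces.
    public static List<String> drain(Iterator<String> tokens) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = new ZipSplittingIterator(tokens);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

The same buffering idea (remember the leftover piece, emit it on the following call) is what a custom filter placed after the tokenizer would need to do.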

Van Nguyen wrote:
It won't do what I need. I may have something like:

"All-In-One is located in 92226-4446 and has an E-A-R"

I want it to be tokenized as follows:

all
one
located
92226
4446
E-A-R

Right now, it is tokenizing it like this:

all
one
located
92226-4446
E-A-R



-----Original Message-----
From: Erick Erickson
Sent: Thursday, January 11, 2007 6:11 PM
To: java-user@lucene.apache.org
Subject: Re: Modifying StandardAnalyzer

Would it be simpler just to modify the input with a regex rather than
risk messing with StandardAnalyzer? Or wouldn't that do what you need?

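
A minimal sketch of the regex approach suggested above, assuming the text can be rewritten before it reaches the analyzer. `ZipPreprocessor` and `splitZips` are hypothetical names used only for illustration, and the pattern is a guess at what "looks like a zip code" should mean (five digits, a hyphen, and an optional four-digit extension). It only replaces the hyphen with a space, so hyphenated words such as E-A-R are left for the analyzer to handle as before.

```java
import java.util.regex.Pattern;

// Hypothetical pre-processing step (not part of Lucene): run this on the raw
// text before handing it to the analyzer.
public class ZipPreprocessor {
    // Five digits, a hyphen, then either a four-digit extension or nothing.
    // The lookbehind and lookahead keep it from firing inside longer digit runs.
    private static final Pattern ZIP =
        Pattern.compile("(?<!\\d)(\\d{5})-((?:\\d{4})?)(?!\\d)");

    // Replaces the hyphen in zip-like runs with a space; other text is untouched.
    public static String splitZips(String text) {
        return ZIP.matcher(text).replaceAll("$1 $2");
    }

    public static void main(String[] args) {
        System.out.println(
            splitZips("All-In-One is located in 92226-4446 and has an E-A-R"));
    }
}
```

Note that a bare `92626-` becomes `92626 ` with a trailing space, which a whitespace-aware tokenizer should then ignore.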
On 1/11/07, Van Nguyen wrote:

Hi,



I need to modify the StandardAnalyzer so that it will tokenize zip codes
that look like this:



92626-2646



I think the part I need to modify is in here - specifically:



<HAS_DIGIT> <P> <ALPHANUM>



// floating point, serial, model numbers, ip addresses, etc.

// every other segment must have at least one digit
<NUM: (<ALPHANUM> <P> <HAS_DIGIT>
     | <HAS_DIGIT> <P> <ALPHANUM>
     | <HAS_DIGIT> <M>
     | <HAS_DIGIT> (<P> <HAS_DIGIT>)+ <M>
     | <LETTER> (<P> <LETTER>)+
     | <ALPHANUM> (<P> <HAS_DIGIT> <P> <ALPHANUM>)+
     | <HAS_DIGIT> (<P> <ALPHANUM> <P> <HAS_DIGIT>)+
     | <ALPHANUM> <P> <HAS_DIGIT> (<P> <ALPHANUM> <P> <HAS_DIGIT>)+
     | <HAS_DIGIT> <P> <ALPHANUM> (<P> <HAS_DIGIT> <P> <ALPHANUM>)+ )


Is there a way to keep that line so that the StandardAnalyzer works as
is - but tokenize anything that looks like



(<HAS_DIGITS> <P>) | (<HAS_DIGITS> <P> <HAS_DIGITS>) or even better:



(<DIGIT><DIGIT><DIGIT><DIGIT><DIGIT><P>) |
(<DIGIT><DIGIT><DIGIT><DIGIT><DIGIT><P><DIGIT><DIGIT><DIGIT><DIGIT>)

I have zip codes that look like 92626, 92626-, and 92626-2646.



I've tried adding both of those lines to the "SKIP" section, but to no
avail.




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Discussion Overview
group: java-user
categories: lucene
posted: Jan 12, '07 at 1:13a
active: Jan 13, '07 at 1:01a
posts: 6
users: 3
website: lucene.apache.org
