Actually, the start position of each token is stored in the "normal"
Lucene index (in the *.prx files), not using payloads.
Payloads are entirely for per-token extensibility (ie, core Lucene
doesn't use them by default): you'd have to create your own analyzer
to attach payloads to tokens, and then do something with them at
search time.
So I suggested you could store the end position of each token into the
Payload, but then you'd need to implement a Query class to use this
during searching.
Mike
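
The payload idea above can be sketched without touching Lucene itself. The following is a minimal plain-Java model, not Lucene API: the class EndPositionPayloadSketch, the Posting type, and the helpers encodeEnd/decodeEnd are hypothetical names, and the "payload" is simply a 4-byte array holding the end position of the token. A real implementation would need a custom analyzer to write the bytes and a custom Query to read them back.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model (no Lucene dependency). All names here are illustrative.
class EndPositionPayloadSketch {

    // One occurrence of a term: its start position plus an opaque payload.
    static final class Posting {
        final int start;
        final byte[] payload;
        Posting(int start, byte[] payload) { this.start = start; this.payload = payload; }
    }

    // Assumption: the payload is the end position as a 4-byte big-endian int.
    static byte[] encodeEnd(int end) { return ByteBuffer.allocate(4).putInt(end).array(); }
    static int decodeEnd(byte[] payload) { return ByteBuffer.wrap(payload).getInt(); }

    static void add(Map<String, List<Posting>> idx, String term, int start, int end) {
        idx.computeIfAbsent(term, k -> new ArrayList<>()).add(new Posting(start, encodeEnd(end)));
    }

    // AAA BBB WORD1 WORD2 EEE FFF GGG, with SYN covering positions 2..3.
    static Map<String, List<Posting>> index() {
        Map<String, List<Posting>> idx = new HashMap<>();
        String[] words = {"AAA", "BBB", "WORD1", "WORD2", "EEE", "FFF", "GGG"};
        for (int i = 0; i < words.length; i++) add(idx, words[i], i, i);
        add(idx, "SYN", 2, 3);
        return idx;
    }

    // A phrase matches if each term starts right after the previous term ends.
    static boolean matchesPhrase(Map<String, List<Posting>> idx, String... terms) {
        if (terms.length == 0) return false;
        for (Posting first : idx.getOrDefault(terms[0], Collections.<Posting>emptyList())) {
            if (matchesFrom(idx, terms, 1, decodeEnd(first.payload) + 1)) return true;
        }
        return false;
    }

    private static boolean matchesFrom(Map<String, List<Posting>> idx, String[] terms,
                                       int i, int expectedStart) {
        if (i == terms.length) return true;
        for (Posting p : idx.getOrDefault(terms[i], Collections.<Posting>emptyList())) {
            if (p.start == expectedStart
                    && matchesFrom(idx, terms, i + 1, decodeEnd(p.payload) + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Posting>> idx = index();
        System.out.println(matchesPhrase(idx, "SYN", "WORD2"));      // false
        System.out.println(matchesPhrase(idx, "BBB", "SYN", "EEE")); // true
    }
}
```

With the end position available, "SYN WORD2" no longer matches, while "BBB SYN EEE" does.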
Sumukh wrote:
Thanks for your suggestion Michael and thanks to Uwe for clarifying.
Payload is currently used to store only the start positions.
What I gathered from your suggestion is that we could possibly
store the end position, or span, or some other complex
encoding in order to store the extra information.
Am I right?
--Sumukh
Michael McCandless-2 wrote:
Since Lucene doesn't represent/store end position for a token, I don't
think the index can properly represent SYN spanning two positions?
I suppose you could encode this into payloads, and create a custom
query that would look at the payload to enforce the constraint.
Or, if you switch to doing SYN expansion only at runtime (not adding
it to the index), that might work.
Mike
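
The runtime alternative can be sketched as a rewrite of the query terms before they are matched against an index that contains only the original words. This is a plain-Java illustration under assumptions: QueryTimeSynonymSketch and expand are hypothetical names, the synonym table is a simple map, and the sketch replaces SYN with its phrase rather than searching for both forms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model (no Lucene dependency). All names here are illustrative.
class QueryTimeSynonymSketch {

    // Replace each query term that has a multi-word expansion with that
    // expansion; all other terms pass through unchanged.
    static List<String> expand(List<String> queryTerms, Map<String, List<String>> synonyms) {
        List<String> out = new ArrayList<>();
        for (String term : queryTerms) {
            out.addAll(synonyms.getOrDefault(term, Collections.singletonList(term)));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms = new HashMap<>();
        synonyms.put("SYN", Arrays.asList("WORD1", "WORD2"));

        // The phrase query "BBB SYN EEE" becomes "BBB WORD1 WORD2 EEE".
        System.out.println(expand(Arrays.asList("BBB", "SYN", "EEE"), synonyms));
    }
}
```

Because SYN never enters the index, every indexed token still occupies exactly one position and ordinary phrase matching stays correct.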
Uwe Schindler wrote:
I think his problem is that "SYN" is a synonym for the phrase "WORD1
WORD2". Using these positions, a phrase like "SYN WORD2" would also match
(or other problems in queries that depend on order of words).
Uwe
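
The mismatch Uwe describes can be reproduced with a small plain-Java model of a positional index. This is a sketch, not Lucene code: PhrasePositionSketch, index, and matchesPhrase are hypothetical names, and a phrase is taken to match when its terms sit at consecutive start positions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Plain-Java model (no Lucene dependency). All names here are illustrative.
class PhrasePositionSketch {

    // AAA BBB WORD1 WORD2 EEE FFF GGG, with SYN stacked on WORD1 at position 2.
    static Map<String, Set<Integer>> index() {
        String[] terms = {"AAA", "BBB", "WORD1", "SYN", "WORD2", "EEE", "FFF", "GGG"};
        int[] positions = {0, 1, 2, 2, 3, 4, 5, 6};
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int i = 0; i < terms.length; i++) {
            postings.computeIfAbsent(terms[i], k -> new HashSet<>()).add(positions[i]);
        }
        return postings;
    }

    // Only start positions are known, so term i must sit at start + i.
    static boolean matchesPhrase(Map<String, Set<Integer>> postings, String... terms) {
        if (terms.length == 0) return false;
        for (int start : postings.getOrDefault(terms[0], Collections.<Integer>emptySet())) {
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                ok = postings.getOrDefault(terms[i], Collections.<Integer>emptySet())
                             .contains(start + i);
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> idx = index();
        System.out.println(matchesPhrase(idx, "WORD1", "WORD2")); // true, as intended
        System.out.println(matchesPhrase(idx, "SYN", "WORD2"));   // true, a false match
        System.out.println(matchesPhrase(idx, "SYN", "EEE"));     // false, a missed match
    }
}
```

The same start-only positions produce both a false match ("SYN WORD2") and a missed match ("SYN EEE").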
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

-----Original Message-----
From: Michael McCandless
Sent: Monday, March 02, 2009 4:07 PM
To: [email protected]
Subject: Re: Indexing synonyms for multiple words
Shouldn't WORD2's position be 1 more than your SYN?
Ie, don't you want these positions?:
WORD1 2
WORD2 3
SYN 2
The position is the starting position of the token; Lucene doesn't
store an ending position.
Mike
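
The positions listed above follow directly from the position increments. A minimal plain-Java sketch, assuming the position counter starts at -1 and each token adds its increment (PositionIncrementSketch, Tok, sampleStream, and positions are hypothetical names, not Lucene classes):

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model (no Lucene dependency). All names here are illustrative.
class PositionIncrementSketch {

    static final class Tok {
        final String term;
        final int posInc; // 0 means "same position as the previous token"
        Tok(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    // The example sentence with SYN injected after WORD1 at increment 0.
    static List<Tok> sampleStream() {
        List<Tok> stream = new ArrayList<>();
        stream.add(new Tok("AAA", 1));
        stream.add(new Tok("BBB", 1));
        stream.add(new Tok("WORD1", 1));
        stream.add(new Tok("SYN", 0));
        stream.add(new Tok("WORD2", 1));
        stream.add(new Tok("EEE", 1));
        return stream;
    }

    // Running sum of increments gives each token's start position.
    static List<String> positions(List<Tok> stream) {
        List<String> out = new ArrayList<>();
        int pos = -1;
        for (Tok t : stream) {
            pos += t.posInc;
            out.add(t.term + " " + pos);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : positions(sampleStream())) {
            System.out.println(line);
        }
    }
}
```

SYN shares position 2 with WORD1, and WORD2 lands at position 3, one more than SYN.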
Sumukh wrote:
Hi,
I'm fairly new to Lucene. I'd like to know how we can index synonyms for
multiple words.
This is the scenario:
Consider a sentence: AAA BBB WORD1 WORD2 EEE FFF GGG.
Now assume the two words combined WORD1 WORD2 can be replaced by another
word SYN.
If I place SYN after WORD1 with positionIncrement set to 0, WORD2 will
follow SYN, which is incorrect; and the other way round if I place it
after WORD2.
If any of you have solved a similar problem, I'd be thankful if you could
shed some light on the solution.
Regards,
Sumukh
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
View this message in context:
http://www.nabble.com/Indexing-synonyms-for-multiple-words-tp22289069p22300656.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.