Hi,

I'm fairly new to Lucene. I'd like to know how we can index synonyms for
multiple words.

This is the scenario:

Consider a sentence: AAA BBB WORD1 WORD2 EEE FFF GGG.

Now assume that the two-word phrase WORD1 WORD2 can be replaced by a single
word, SYN.

If I place SYN after WORD1 with positionIncrement set to 0, WORD2 will
which is incorrect; and the other way round if I place it after WORD2.

If any of you have solved a similar problem, I'd be thankful if you could
shed some light on the solution.

Regards,
Sumukh

•  at Mar 2, 2009 at 2:51 pm
This has been discussed on the user list, so try searching the archives there.

See: http://wiki.apache.org/lucene-java/MailingListArchives

I don't remember the results, but...

Best
Erick
On Mon, Mar 2, 2009 at 9:13 AM, Sumukh wrote:

•  at Mar 2, 2009 at 3:07 pm
Shouldn't WORD2's position be 1 more than your SYN?

I.e., don't you want these positions?

WORD1 2
WORD2 3
SYN 2

The position is the starting position of the token; Lucene doesn't
store an ending position.

Mike
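To make the table concrete, here is a minimal sketch in plain Java (no Lucene classes; `TokenPositions`, `Token` and `positions` are hypothetical names) of how a stream of position increments turns into the start positions the index records. It assumes 0-based positions, which is what the table above uses: SYN, emitted right after WORD1 with an increment of 0, lands on WORD1's position, and WORD2 lands one position later.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation (no Lucene classes): turns position increments into
// the absolute start positions an index would record for each token.
public class TokenPositions {

    // A token's text plus its increment relative to the previous token.
    public static final class Token {
        final String text;
        final int positionIncrement;

        public Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    // Returns "text@position" strings. The running position starts at -1 and
    // each increment is added, so the first token (increment 1) lands at 0.
    public static List<String> positions(List<Token> stream) {
        List<String> out = new ArrayList<>();
        int position = -1;
        for (Token t : stream) {
            position += t.positionIncrement;
            out.add(t.text + "@" + position);
        }
        return out;
    }

    public static void main(String[] args) {
        // SYN is emitted right after WORD1 with increment 0, so it is stacked on WORD1.
        List<Token> stream = List.of(
                new Token("AAA", 1), new Token("BBB", 1),
                new Token("WORD1", 1), new Token("SYN", 0),
                new Token("WORD2", 1), new Token("EEE", 1));
        System.out.println(positions(stream));
        // prints [AAA@0, BBB@1, WORD1@2, SYN@2, WORD2@3, EEE@4]
    }
}
```

Only start positions come out of this; nothing in the stream says that SYN covers two positions, which is the limitation discussed in the rest of the thread.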

•  at Mar 2, 2009 at 3:39 pm
I think his problem is that "SYN" is a synonym for the phrase "WORD1
WORD2". With these positions, a phrase like "SYN WORD2" would also match
(and other queries that depend on word order would have similar problems).

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]
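Uwe's objection can be reproduced with a toy positional index that, like Lucene, keeps only a start position per token. This is a plain-Java sketch, not Lucene's own phrase query implementation; `NaivePhraseMatch`, `index` and `matches` are hypothetical. With SYN stacked on WORD1, the phrase "SYN WORD2" matches even though the text never says that, while "SYN EEE", which the synonym should allow, does not.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy positional index: term -> set of start positions. There are no end
// positions, mirroring what Lucene stores for each token.
public class NaivePhraseMatch {

    public static Map<String, Set<Integer>> index(String[] terms, int[] positions) {
        Map<String, Set<Integer>> idx = new HashMap<>();
        for (int i = 0; i < terms.length; i++) {
            idx.computeIfAbsent(terms[i], k -> new HashSet<>()).add(positions[i]);
        }
        return idx;
    }

    // Exact phrase: the i-th phrase term must start at (first term's start + i).
    public static boolean matches(Map<String, Set<Integer>> idx, List<String> phrase) {
        for (int start : idx.getOrDefault(phrase.get(0), Set.of())) {
            boolean ok = true;
            for (int i = 1; i < phrase.size() && ok; i++) {
                ok = idx.getOrDefault(phrase.get(i), Set.of()).contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // SYN stacked on WORD1 (increment 0), as in the original question.
        Map<String, Set<Integer>> idx = index(
                new String[] {"AAA", "BBB", "WORD1", "SYN", "WORD2", "EEE", "FFF", "GGG"},
                new int[]    {0,     1,     2,       2,     3,       4,     5,     6});
        System.out.println(matches(idx, List.of("SYN", "WORD2"))); // true: spurious match
        System.out.println(matches(idx, List.of("SYN", "EEE")));   // false: intended match missed
    }
}
```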
-----Original Message-----
From: Michael McCandless
Sent: Monday, March 02, 2009 4:07 PM
To: [email protected]
Subject: Re: Indexing synonyms for multiple words

•  at Mar 2, 2009 at 4:42 pm
Since Lucene doesn't represent/store end position for a token, I don't
think the index can properly represent SYN spanning two positions?

I suppose you could encode this into payloads, and create a custom
query that would look at the payload to enforce the constraint.
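A minimal in-memory sketch of that constraint, assuming each token's end position (exclusive) has already been read back from its payload. `SpanAwarePhraseMatch` and `Occurrence` are hypothetical names, and the Lucene side (a custom Query that reads payloads at search time) is not shown. The only rule is that each phrase term must start exactly where the previous one ended, so SYN, covering positions 2 and 3, is followed by EEE rather than WORD2.

```java
import java.util.List;

// Toy model of the payload idea: every occurrence carries an end position
// (exclusive) in addition to its start, as a payload could.
public class SpanAwarePhraseMatch {

    public static final class Occurrence {
        final String term;
        final int start;
        final int end; // exclusive; start + 1 for ordinary tokens

        public Occurrence(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    // A phrase matches if each term starts exactly where the previous one ended.
    public static boolean matches(List<Occurrence> doc, List<String> phrase) {
        return extend(doc, phrase, 0, -1);
    }

    // Tries to match phrase[i..] starting at expectedStart; a negative
    // expectedStart means "any start" (used for the first phrase term).
    private static boolean extend(List<Occurrence> doc, List<String> phrase,
                                  int i, int expectedStart) {
        if (i == phrase.size()) {
            return true;
        }
        for (Occurrence o : doc) {
            boolean startOk = expectedStart < 0 || o.start == expectedStart;
            if (startOk && o.term.equals(phrase.get(i))
                    && extend(doc, phrase, i + 1, o.end)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Occurrence> doc = List.of(
                new Occurrence("BBB", 1, 2), new Occurrence("WORD1", 2, 3),
                new Occurrence("SYN", 2, 4), // SYN covers WORD1 WORD2
                new Occurrence("WORD2", 3, 4), new Occurrence("EEE", 4, 5));
        System.out.println(matches(doc, List.of("SYN", "WORD2")));          // false
        System.out.println(matches(doc, List.of("SYN", "EEE")));            // true
        System.out.println(matches(doc, List.of("WORD1", "WORD2", "EEE"))); // true
    }
}
```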

Or, if you switch to doing SYN expansion only at runtime (not adding
it to the index), that might work.

Mike
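The runtime alternative can be sketched the same way: the index keeps only the original words, and a hypothetical synonym table rewrites SYN into WORD1 WORD2 before the phrase is matched. `QueryTimeExpansion`, `expand` and `matches` are illustrative names in plain Java, not a Lucene API; token positions are simply list indices.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Sketch of query-time expansion: the index holds only the original words,
// and a hypothetical synonym table rewrites SYN into its multi-word form.
public class QueryTimeExpansion {

    // Replaces each query term that has an expansion with the expansion's words.
    public static List<String> expand(List<String> query, Map<String, List<String>> synonyms) {
        List<String> out = new ArrayList<>();
        for (String term : query) {
            out.addAll(synonyms.getOrDefault(term, List.of(term)));
        }
        return out;
    }

    // Exact phrase match against the document's token sequence,
    // where each token's position is simply its index in the list.
    public static boolean matches(List<String> doc, List<String> phrase) {
        return Collections.indexOfSubList(doc, phrase) >= 0;
    }

    public static void main(String[] args) {
        List<String> doc = List.of("AAA", "BBB", "WORD1", "WORD2", "EEE", "FFF", "GGG");
        Map<String, List<String>> synonyms = Map.of("SYN", List.of("WORD1", "WORD2"));

        List<String> q = expand(List.of("SYN", "EEE"), synonyms);
        System.out.println(q + " " + matches(doc, q)); // [WORD1, WORD2, EEE] true
        System.out.println(matches(doc, expand(List.of("SYN", "WORD2"), synonyms))); // false
    }
}
```

This rewrite only finds documents that contain WORD1 WORD2. If SYN itself can also appear in documents, the query would need to accept either form, for example by OR-ing the original phrase with the expanded one.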

•  at Mar 3, 2009 at 1:14 am
Thanks for your suggestion, Michael, and thanks to Uwe for clarifying.

Payloads are currently used to store only the start positions.
What I gathered from your suggestion is that we could also
store the end position, the span, or some other more complex
encoding in the payload to carry the extra information.
Am I right?

--Sumukh

--
View this message in context: http://www.nabble.com/Indexing-synonyms-for-multiple-words-tp22289069p22300656.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

•  at Mar 3, 2009 at 3:41 pm
Actually, the start position of each token is stored in the "normal"
Lucene index (in the *.prx files), not in payloads.

Payloads exist purely for per-token extensibility (i.e., core Lucene
doesn't use them by default): you'd have to create your own analyzer
to attach payloads to tokens, and then do something with them at
search time.

So I was suggesting that you store the end position of each token in its
payload, but then you'd need to implement a Query class that uses it
during searching.

Mike
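If the end position (or, equivalently, the span length) goes into a payload, it has to be turned into bytes when the analyzer emits the token and decoded again inside the custom Query. Below is a minimal sketch of that encoding using `java.nio.ByteBuffer`; `EndPositionPayload` is a hypothetical helper, and the code that hands these bytes to Lucene is left out because the payload API differs between Lucene versions.

```java
import java.nio.ByteBuffer;

// Sketch: packing a token's end position into payload bytes and reading it
// back at search time. Not a Lucene API; the bytes would be attached to the
// token by a custom analyzer and read back by a custom Query.
public class EndPositionPayload {

    // Big-endian 4-byte encoding of the end position.
    public static byte[] encode(int endPosition) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(endPosition).array();
    }

    // Inverse of encode: reads the first 4 bytes as a big-endian int.
    public static int decode(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt();
    }

    public static void main(String[] args) {
        byte[] synPayload = encode(4); // SYN starts at 2 and ends (exclusive) at 4
        System.out.println(synPayload.length + " " + decode(synPayload)); // 4 4
    }
}
```

A fixed 4-byte encoding keeps the sketch simple; since the start position is already in the index, storing only the span length (often 1 or 2) would take less space.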

•  at Mar 2, 2009 at 3:27 pm
Hi,

I'm fairly new to Lucene. I'd like to know how we can index synonyms for
multiple words.

This is the scenario:

Consider a sentence: AAA BBB WORD1 WORD2 EEE FFF GGG.

Now assume that the two-word phrase WORD1 WORD2 can be replaced by a single
word, SYN.

If I place SYN after WORD1 with positionIncrement set to 0, WORD2 will
which is incorrect; and the other way round if I place it after WORD2.

If any of you have solved a similar problem, I'd be thankful if you could
shed some light on the solution.

Regards,
Sumukh

Discussion Overview
Group: java-user; category: lucene; posted Mar 2, 2009 at 2:25 pm; last active Mar 3, 2009 at 3:41 pm; 8 posts from 4 users; website: lucene.apache.org
