Grokbase Groups Lucene dev June 2016
[ https://issues.apache.org/jira/browse/LUCENE-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337696#comment-15337696 ]

Michael McCandless commented on LUCENE-6336:
--------------------------------------------

We could explore field collapsing / grouping, but that's probably tricky to do with early termination (see LUCENE-7341), and it's somewhat wasteful ... it seems better to dedup once at indexing time? And if it's a simple wrapper around the dictionary, other suggesters could just use that too.
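
To make the "dedup once at indexing time" idea concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: `Entry` and `dedupKeepMax` are hypothetical names standing in for a dictionary entry and for the wrapper step, and payloads are left out because the thread does not say which payload should survive when duplicates differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupDictionarySketch {
    /** Hypothetical stand-in for one dictionary entry (not a Lucene class). */
    public static final class Entry {
        public final String term;
        public final long weight;

        public Entry(String term, long weight) {
            this.term = term;
            this.weight = weight;
        }
    }

    /**
     * Collapses entries that share the same term and keeps the one with the
     * highest weight. Terms come out in the order they were first seen.
     */
    public static List<Entry> dedupKeepMax(Iterable<Entry> source) {
        Map<String, Entry> best = new LinkedHashMap<>();
        for (Entry e : source) {
            // Keep whichever of the stored and the incoming entry weighs more.
            best.merge(e.term, e, (stored, incoming) ->
                    incoming.weight > stored.weight ? incoming : stored);
        }
        return new ArrayList<>(best.values());
    }

    public static void main(String[] args) {
        List<Entry> raw = List.of(
                new Entry("English", 98),
                new Entry("English", 100),
                new Entry("Engineering", 7),
                new Entry("English", 99));
        for (Entry e : dedupKeepMax(raw)) {
            System.out.println(e.term + " " + e.weight);
        }
    }
}
```

A pass like this holds every distinct term in memory, so a real wrapper around a large dictionary might need a sort-based or on-disk variant instead.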

AnalyzingInfixSuggester needs duplicate handling
------------------------------------------------

Key: LUCENE-6336
URL: https://issues.apache.org/jira/browse/LUCENE-6336
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 4.10.3, 5.0
Reporter: Jan Høydahl
Labels: lookup, suggester
Attachments: LUCENE-6336.patch


Spinoff from LUCENE-5833, but otherwise unrelated.
This concerns {{AnalyzingInfixSuggester}}, which is backed by a Lucene index and stores the payload and score together with the suggestion text.
I did some testing with Solr, producing the DocumentDictionary from an index with multiple documents containing the same text, but with random weights between 0 and 100. The result was identical duplicate suggestions, sorted by weight:
{code}
{
  "suggest":{"languages":{
    "engl":{
      "numFound":101,
      "suggestions":[{
          "term":"<b>Engl</b>ish",
          "weight":100,
          "payload":"0"},
        {
          "term":"<b>Engl</b>ish",
          "weight":99,
          "payload":"0"},
        {
          "term":"<b>Engl</b>ish",
          "weight":98,
          "payload":"0"},
---etc all the way down to 0---
{code}
I also reproduced the same behavior in AnalyzingInfixSuggester directly. So some duplicate removal is needed here, either while building the local suggest index or during lookup. Only the highest-weight suggestion for a given term should be returned.
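
The lookup-time alternative could look roughly like the sketch below, again in plain Java rather than against the Lucene API. `Suggestion` and `topUnique` are hypothetical names. The sketch assumes the results arrive already sorted by descending weight, as in the output above, so the first occurrence of a term is also its highest-weight one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LookupDedupSketch {
    /** Hypothetical stand-in for one lookup result (not a Lucene class). */
    public static final class Suggestion {
        public final String term;
        public final long weight;

        public Suggestion(String term, long weight) {
            this.term = term;
            this.weight = weight;
        }
    }

    /**
     * Returns up to n suggestions with distinct terms. Assumes the input is
     * sorted by descending weight, so the first hit for a term is kept and
     * later, lower-weight duplicates are skipped.
     */
    public static List<Suggestion> topUnique(List<Suggestion> sortedByWeightDesc, int n) {
        Set<String> seen = new HashSet<>();
        List<Suggestion> out = new ArrayList<>();
        for (Suggestion s : sortedByWeightDesc) {
            if (out.size() >= n) {
                break; // enough unique terms collected
            }
            if (seen.add(s.term)) { // add() returns false for a repeated term
                out.add(s);
            }
        }
        return out;
    }
}
```

With this approach the lookup has to fetch more than n raw hits to fill n unique slots, and how many more is not known in advance; with 101 copies of the same text, as in the report, the duplicates alone could fill the raw result list.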



Posted: Jun 18, '16 at 9:49a