
Jingkei Ly commented on LUCENE-2557:

> I don't understand why we need to average any idfs? This seems really costly, and I think in general the idea of fuzzy is to find misspellings.
I agree that fuzzy is meant to find misspellings, but I don't think it should favour misspellings above an exact match. I think the reasoning behind averaging the IDFs (I based that on comments in LUCENE-329) is that, in the absence of an IDF from the exact match, an average of the terms you do know is better than nothing. Perhaps there is a better heuristic for that case, though.
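
To make the heuristic above concrete, here is a minimal sketch of one way to read it: every term the fuzzy query expands to shares a single IDF, taken from the exact query term when it is in the index and otherwise averaged over the expanded terms. This is not the code from the attached patch. The class name, method names, and the `docFreqs` map are hypothetical, and the `idf` formula is the classic `1 + ln(numDocs / (docFreq + 1))` used by Lucene 3.x's DefaultSimilarity.

```java
import java.util.Map;

public class FuzzyIdfSketch {

    // Classic Lucene 3.x DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Hypothetical helper: picks one idf to share across all fuzzy expansions.
    // docFreqs maps each expanded term (found in the index) to its document frequency.
    static double sharedIdf(String queryTerm, Map<String, Integer> docFreqs, int numDocs) {
        Integer exactDocFreq = docFreqs.get(queryTerm);
        if (exactDocFreq != null) {
            // The exact term is indexed: use its idf for every expansion.
            return idf(exactDocFreq, numDocs);
        }
        if (docFreqs.isEmpty()) {
            return 0.0; // nothing matched, so there is nothing to score
        }
        // No exact match: fall back to the average idf of the terms we do know.
        double sum = 0.0;
        for (int docFreq : docFreqs.values()) {
            sum += idf(docFreq, numDocs);
        }
        return sum / docFreqs.size();
    }
}
```

With a shared IDF, a rare misspelling no longer gains an advantage over the exact term purely because it occurs in fewer documents.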

> Furthermore, I don't understand why it's important for the idf whether the query term exists in the index or not, because the query itself could be misspelled.
I think it's a fair assumption that users are searching for specific terms (+fore:john +sur:smith), so it is unlikely that they would have a misspelling in the original query. If they did misspell it and got erroneous results, it would be immediately clear that the cause is a misspelt query.

FuzzyQuery - fuzzy terms and misspellings are ranked higher than exact matches

Key: LUCENE-2557
Project: Lucene - Java
Issue Type: Bug
Components: Query/Scoring
Affects Versions: 3.0.2
Reporter: Jingkei Ly
Attachments: idf-scoring-test-case.patch, LUCENE-2557.patch

The FuzzyQuery often causes misspellings to be ranked higher than the exact match, which seems to be an undesirable property generally.
For example, in an index of surnames, if I search using a FuzzyQuery for "smith", misspellings such as "smiith" or "smiht" appear near the top of the search results, ahead of documents that match "smith".
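
The discussion on this issue points to IDF as the cause: a misspelling occurs in far fewer documents than the correct surname, so it gets a larger IDF. The sketch below illustrates that effect in isolation. It assumes the classic `1 + ln(numDocs / (docFreq + 1))` formula from Lucene 3.x's DefaultSimilarity and ignores every other scoring factor. The class name, method names, and document frequencies are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IdfRankingSketch {

    // Classic Lucene 3.x DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Orders terms by descending idf, i.e. rarest term first.
    static List<String> rankByIdf(Map<String, Integer> docFreqs, int numDocs) {
        List<String> terms = new ArrayList<>(docFreqs.keySet());
        terms.sort(Comparator.comparingDouble((String t) -> -idf(docFreqs.get(t), numDocs)));
        return terms;
    }

    public static void main(String[] args) {
        // Illustrative document frequencies in a hypothetical index of 10,000 surnames.
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("smith", 500);
        docFreqs.put("smiith", 2);
        docFreqs.put("smiht", 1);

        for (String term : rankByIdf(docFreqs, 10_000)) {
            System.out.println(term + " idf=" + idf(docFreqs.get(term), 10_000));
        }
    }
}
```

Running `main` lists the two misspellings ahead of "smith", which mirrors the ordering described in the report.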

Discussion Overview
group: dev @
posted: Jul 23, '10 at 4:23p
active: Jul 26, '10 at 4:26p

1 user in discussion

Mark Harwood (JIRA): 15 posts


