Lucene dev mailing list (Grokbase archive), July 2010

Robert Muir commented on LUCENE-2557:

So here is an option for this issue: we could reword the whole issue as 'improve FuzzyQuery defaults'.

If we were to do this, I would suggest the following at a minimum:
* Instead of a default distance of 0.5 (from the QueryParser) when no distance is provided (e.g. ~0.6), calculate one that will perform well and never brute-force compare all the terms.
* Instead of defaulting the max expansions to BooleanQuery's max clause count (1024), use a more reasonable number of expansions by default (such as 50).
* Instead of the current rewrite, use a rewrite similar to FuzzyLikeThis. Maybe we don't even need to average docFreq across all 50 terms; the top 5 or so may be sufficient.
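The second and third suggestions can be sketched outside of Lucene in plain Java. This is a minimal, hypothetical sketch, not Lucene's API: the `Candidate` class, the helper names, and the similarity and docFreq values are all assumptions for illustration. It keeps only the 50 most similar expansions, averages docFreq over the top 5 of them (the FuzzyLikeThis-style idea), and gives every expansion that one shared IDF, using the classic DefaultSimilarity formula idf = 1 + ln(numDocs / (docFreq + 1)). The first suggestion (choosing a distance that avoids brute-force term comparison) is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not Lucene's API: cap fuzzy expansions and give every
 * expanded term one shared IDF, in the spirit of FuzzyLikeThis.
 */
public class FuzzyDefaultsSketch {

    /** A candidate expansion (assumed inputs): term, similarity to the query term, docFreq. */
    static final class Candidate {
        final String term;
        final float similarity;
        final int docFreq;

        Candidate(String term, float similarity, int docFreq) {
            this.term = term;
            this.similarity = similarity;
            this.docFreq = docFreq;
        }
    }

    static final int MAX_EXPANSIONS = 50; // proposed default instead of 1024
    static final int DOC_FREQ_TOP_K = 5;  // average docFreq over only the top few terms

    /** Classic DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(double docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (docFreq + 1.0));
    }

    /** Keeps the maxExpansions most similar candidates; ties are broken by term text. */
    static List<Candidate> topExpansions(List<Candidate> all, int maxExpansions) {
        List<Candidate> sorted = new ArrayList<>(all);
        sorted.sort((a, b) -> {
            int bySimilarity = Float.compare(b.similarity, a.similarity); // descending
            return bySimilarity != 0 ? bySimilarity : a.term.compareTo(b.term);
        });
        return new ArrayList<>(sorted.subList(0, Math.min(maxExpansions, sorted.size())));
    }

    /** Average docFreq of the first topK expansions (already sorted by similarity). */
    static double sharedDocFreq(List<Candidate> expansions, int topK) {
        int n = Math.min(topK, expansions.size());
        if (n == 0) {
            return 0.0;
        }
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += expansions.get(i).docFreq;
        }
        return (double) sum / n;
    }

    /** Term weight: the similarity acts as a boost on the IDF shared by all expansions. */
    static double weight(Candidate c, double sharedIdf) {
        return c.similarity * sharedIdf;
    }

    public static void main(String[] args) {
        int numDocs = 10_000; // hypothetical index size
        List<Candidate> all = new ArrayList<>();
        all.add(new Candidate("smith", 1.0f, 1000)); // exact match, common term
        all.add(new Candidate("smiith", 0.8f, 1));   // rare misspelling
        all.add(new Candidate("smiht", 0.6f, 2));    // rare misspelling
        all.add(new Candidate("smyth", 0.8f, 40));

        List<Candidate> expansions = topExpansions(all, MAX_EXPANSIONS);
        double sharedIdf = idf(sharedDocFreq(expansions, DOC_FREQ_TOP_K), numDocs);
        for (Candidate c : expansions) {
            System.out.printf("%-7s weight=%.3f%n", c.term, weight(c, sharedIdf));
        }
    }
}
```

Because every expansion shares one IDF, the ordering among expanded terms is driven by similarity alone, so "smith" outweighs "smiith" no matter how rare the misspelling is (tf and norms aside).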

If we were to do something like this, maybe we could improve performance and behavior instead of making tradeoffs.

FuzzyQuery - fuzzy terms and misspellings are ranked higher than exact matches

Key: LUCENE-2557
Project: Lucene - Java
Issue Type: Bug
Components: Query/Scoring
Affects Versions: 3.0.2
Reporter: Jingkei Ly
Attachments: idf-scoring-test-case.patch, LUCENE-2557.patch

FuzzyQuery often causes misspellings to be ranked higher than the exact match, which seems generally undesirable.
For example, in an index of surnames, if I search using a FuzzyQuery for "smith", misspellings such as "smiith" or "smiht" appear near the top of the search results, ahead of documents that match "smith".
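One plausible reading of why this happens, assuming classic (Lucene 3.x) TF-IDF scoring: each expanded term is scored as its own TermQuery, the term's IDF enters the score roughly squared (once in the query weight, once in the field weight), and the term's boost is its similarity rescaled as (similarity - minSimilarity) / (1 - minSimilarity). That rescaling is our understanding of the 3.0-era code and should be treated as an assumption. A rare misspelling gets a much larger IDF than the common exact term, which can outweigh its smaller boost. The index size, docFreqs and similarities below are made up; tf, norms and queryNorm are ignored.

```java
/**
 * Back-of-the-envelope comparison of per-term score contributions under
 * classic TF-IDF scoring (tf, field norms and queryNorm ignored).
 * The index size, docFreqs and similarities are hypothetical.
 */
public class FuzzyIdfExample {

    /** Classic DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    /** Assumed 3.x-style boost: similarity rescaled so an exact match (1.0) gets 1.0. */
    static double fuzzyBoost(double similarity, double minSimilarity) {
        return (similarity - minSimilarity) / (1.0 - minSimilarity);
    }

    /** Approximate contribution of one expanded term: boost * idf^2. */
    static double contribution(double similarity, double minSimilarity, int docFreq, int numDocs) {
        double termIdf = idf(docFreq, numDocs);
        return fuzzyBoost(similarity, minSimilarity) * termIdf * termIdf;
    }

    public static void main(String[] args) {
        int numDocs = 10_000;       // hypothetical surname index
        double minSimilarity = 0.5; // QueryParser's default fuzzy distance

        double exact = contribution(1.0, minSimilarity, 1000, numDocs); // "smith": common
        double typo = contribution(0.8, minSimilarity, 1, numDocs);     // "smiith": rare

        System.out.printf("smith:  %.2f%n", exact);
        System.out.printf("smiith: %.2f%n", typo);
        System.out.printf("ratio:  %.1f%n", typo / exact);
    }
}
```

With these made-up numbers the misspelling's contribution comes out roughly five times that of the exact match, even though its boost is only 0.6.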

Discussion Overview
group: dev @
posted: Jul 23, '10 at 4:23p
active: Jul 26, '10 at 4:26p
1 user in discussion: Mark Harwood (JIRA), 15 posts