[ https://issues.apache.org/jira/browse/LUCENE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661314#action_12661314 ]

Robert Muir commented on LUCENE-1513:

Otis, the discussion was on java-user.

Again, I apologize for the messy code. As mentioned there, my setup is very specific to exactly what I am doing, and this code is in no way ready. But since I'm currently pretty busy with other things at work, I just wanted to put something up here for anyone else who is interested.

There are the issues you mentioned, and also some I mentioned on java-user: for example, how to handle updates to indexes that introduce new terms (they must be added to the auxiliary index), or even whether an auxiliary index is the best approach.

The general idea is that instead of enumerating all terms to find matching terms, the deletion neighborhood as described in the paper is used. This way search time is not linear in the number of terms. Yes, you are correct that it can only guarantee matches up to an edit distance of k, which is fixed at index time. Perhaps this should be configurable, but I hardcoded k=1 for simplicity. I think that covers something like 80% of typos...
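To make the deletion neighborhood idea concrete, here is a minimal sketch in plain Java. It is not the code from fastSSfuzzy.zip; the class and method names (DeletionNeighborhood, of, isCandidate) are made up for illustration. It builds the set of strings reachable from a term by deleting up to k characters, and treats two terms as candidates when their sets share an element.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only; not the attached implementation.
public class DeletionNeighborhood {

    /** Returns the term itself plus every string obtained by deleting up to k characters. */
    public static Set<String> of(String term, int k) {
        Set<String> result = new LinkedHashSet<>();
        result.add(term);
        Set<String> frontier = new LinkedHashSet<>();
        frontier.add(term);
        for (int depth = 0; depth < k; depth++) {
            Set<String> next = new LinkedHashSet<>();
            for (String s : frontier) {
                for (int i = 0; i < s.length(); i++) {
                    // Delete the character at position i.
                    String shorter = s.substring(0, i) + s.substring(i + 1);
                    if (result.add(shorter)) {
                        next.add(shorter);
                    }
                }
            }
            frontier = next;
        }
        return result;
    }

    /** Two terms are candidates if their deletion neighborhoods share an element. */
    public static boolean isCandidate(String a, String b, int k) {
        Set<String> neighborhoodOfA = of(a, k);
        for (String s : of(b, k)) {
            if (neighborhoodOfA.contains(s)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(of("cat", 1));
        System.out.println(isCandidate("cat", "cart", 1));
        System.out.println(isCandidate("ab", "ba", 1));
    }
}
```

In an index-based setup, the neighborhood of each indexed term would presumably be stored at index time, so a query only needs to look up the handful of strings in its own neighborhood rather than scan every term. Note that "ab" and "ba" come out as candidates even though their Levenshtein distance is 2, which is why candidates still have to be verified.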

As I mentioned on the list, another idea is to implement FastSS (not the wC variant) with deletion positions, maybe by using payloads. This would require more space but would eliminate the candidate verification step. It might also be nice to have some of their other algorithms available, such as the block-based one.

fastss fuzzyquery

Key: LUCENE-1513
URL: https://issues.apache.org/jira/browse/LUCENE-1513
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/*
Reporter: Robert Muir
Priority: Minor
Attachments: fastSSfuzzy.zip

Code for doing fuzzy queries with the FastSSwC algorithm.
FuzzyIndexer: given a Lucene field, it enumerates all terms and creates an auxiliary offline index for fuzzy queries.
FastFuzzyQuery: similar to FuzzyQuery, except that it queries the auxiliary index to retrieve a candidate list. This list is then verified with the Levenshtein algorithm.
Sorry, but the code is a bit messy... what I'm actually using is very different from this, so it's pretty much untested. But at least you can see what's going on or fix it up.
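
The verification step mentioned for FastFuzzyQuery can be sketched roughly as follows. This is an illustrative stand-alone example, not the attached code: the names CandidateVerifier, levenshtein and verify are hypothetical, and the candidate list is passed in directly rather than read from an auxiliary index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the attached implementation.
public class CandidateVerifier {

    /** Classic two-row dynamic-programming Levenshtein distance. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Swap rows so prev always holds the last completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Keeps only the candidates whose real edit distance to the query is at most k. */
    public static List<String> verify(String query, List<String> candidates, int k) {
        List<String> verified = new ArrayList<>();
        for (String candidate : candidates) {
            if (levenshtein(query, candidate) <= k) {
                verified.add(candidate);
            }
        }
        return verified;
    }
}
```

Because the candidate list is small compared to the full term dictionary, running the quadratic distance computation on it should be cheap relative to enumerating every term.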

Discussion Overview
group: java-dev
posted: Jan 6, '09 at 6:03p
active: Jan 7, '09 at 1:30a


