[ https://issues.apache.org/jira/browse/LUCENE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661302#action_12661302 ]

Otis Gospodnetic commented on LUCENE-1513:

I feel like I missed some FastSS discussion on the list.... was there one?

I took a quick look at the paper and the code. Is the following the general idea:
# index "fuzzy"/"misspelled" terms in addition to the normal terms (=> larger index, slower indexing). How much fuzziness one wants to allow or handle is decided at index time.
# rewrite the query to include variations/misspellings of each term and use that to search (=> more clauses, slower than normal search, but faster than the "normal" fuzzy query whose speed depends on the number of indexed terms)
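To make the two steps above concrete, here is a minimal sketch of how I read the deletion-neighborhood idea. It is not the code from fastSSfuzzy.zip, and the class and method names (DeletionNeighborhoodSketch, neighborhood, buildIndex, candidates) are made up for illustration. It assumes the "fuzzy" variants are simply every string you get by deleting up to k characters from a term, and it uses a plain in-memory map in place of a Lucene index.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the classes from the attached patch.
final class DeletionNeighborhoodSketch {

  /** All strings obtained from term by deleting at most k characters (term itself included). */
  static Set<String> neighborhood(String term, int k) {
    Set<String> result = new HashSet<>();
    result.add(term);
    Set<String> frontier = new HashSet<>(result);
    for (int depth = 0; depth < k; depth++) {
      Set<String> next = new HashSet<>();
      for (String s : frontier) {
        for (int i = 0; i < s.length(); i++) {
          next.add(s.substring(0, i) + s.substring(i + 1)); // drop character i
        }
      }
      result.addAll(next);
      frontier = next;
    }
    return result;
  }

  /** Index time: map every variant to the original terms that produce it. k is fixed here. */
  static Map<String, Set<String>> buildIndex(Iterable<String> terms, int k) {
    Map<String, Set<String>> index = new HashMap<>();
    for (String term : terms) {
      for (String variant : neighborhood(term, k)) {
        index.computeIfAbsent(variant, key -> new HashSet<>()).add(term);
      }
    }
    return index;
  }

  /** Query time: every term that shares at least one variant with the query. */
  static Set<String> candidates(Map<String, Set<String>> index, String query, int k) {
    Set<String> found = new HashSet<>();
    for (String variant : neighborhood(query, k)) {
      Set<String> terms = index.get(variant);
      if (terms != null) {
        found.addAll(terms);
      }
    }
    return found;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> index = buildIndex(Arrays.asList("lucene", "lucent", "search"), 1);
    System.out.println(candidates(index, "lucne", 1));
  }
}
```

The number of lookups depends on the query length and k, not on the number of indexed terms, which is where the speedup over the normal fuzzy query would come from. Sharing a variant only bounds the distance loosely, so the candidates still have to be checked against the real edit distance.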

Quick code comments:
* Need to add ASL
* Need to replace tabs with 2 spaces and fix the formatting in FuzzyHitCollector
* No @author
* Unit test if possible
* Should FastSSwC not be able to take a variable K?
* Should variables named after types (e.g. "set" in public static String getNeighborhoodString(Set<String> set) { ) be renamed, so they describe what's in them instead? (easier to understand API?)

fastss fuzzyquery

Key: LUCENE-1513
URL: https://issues.apache.org/jira/browse/LUCENE-1513
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/*
Reporter: Robert Muir
Priority: Minor
Attachments: fastSSfuzzy.zip

Code for doing fuzzy queries with the FastSSwC algorithm.
FuzzyIndexer: given a Lucene field, it enumerates all terms and creates an auxiliary offline index for fuzzy queries.
FastFuzzyQuery: similar to FuzzyQuery, except that it queries the auxiliary index to retrieve a candidate list. This list is then verified with the Levenshtein algorithm.
Sorry, but the code is a bit messy... what I'm actually using is very different from this, so it's pretty much untested. But at least you can see what's going on or fix it up.
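As a rough illustration of the verification step described for FastFuzzyQuery, here is a sketch that filters a candidate list by Levenshtein distance. It is not taken from the attachment: CandidateVerifierSketch, levenshtein, verify and the maxEdits threshold are hypothetical names, and it assumes the acceptance rule is a plain maximum edit count (the patch may well use a similarity score instead).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the classes from the attached patch.
final class CandidateVerifierSketch {

  /** Standard dynamic-programming Levenshtein distance, keeping only two rows. */
  static int levenshtein(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j; // distance from the empty prefix of a
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
        curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
      }
      int[] tmp = prev; // swap rows for the next iteration
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  /** Keep only the candidates within maxEdits (an assumed threshold) of the query term. */
  static List<String> verify(Iterable<String> candidates, String query, int maxEdits) {
    List<String> accepted = new ArrayList<>();
    for (String candidate : candidates) {
      if (levenshtein(query, candidate) <= maxEdits) {
        accepted.add(candidate);
      }
    }
    return accepted;
  }

  public static void main(String[] args) {
    System.out.println(verify(Arrays.asList("lucene", "lucent"), "lucne", 1));
  }
}
```

The full distance computation only runs on the short candidate list, not on every term in the field.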

Discussion Overview
group: java-dev
posted: Jan 6, '09 at 6:03p
active: Jan 7, '09 at 1:30a


