I don't think that is the case. You will have a single deletion
neighborhood: the set of unique terms in the field is the union of
the deletion dictionaries of each source term.

For example, suppose document A has field 'X' with value best,
document B has value jest, and k == 1.

A will generate best, est, bst, bet, bes; B will generate jest, est, jst, jet, jes.

So field FieldXFuzzy contains
(est:AB, best:A, bst:A, bet:A, bes:A, jest:B, jst:B, jet:B, jes:B)
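
A minimal Python sketch of the example above, assuming each neighborhood includes the unmodified term (zero deletions). The function names are hypothetical and not Lucene API. It also counts postings, which bears on the storage question: each document contributes exactly the postings of its own neighborhood, while a shared variant such as est collapses into a single term.

```python
from collections import defaultdict
from itertools import combinations


def deletion_neighborhood(word, k):
    """All strings obtainable from `word` by deleting at most k characters."""
    variants = set()
    for d in range(k + 1):
        for positions in combinations(range(len(word)), d):
            variants.add("".join(c for i, c in enumerate(word) if i not in positions))
    return variants


def build_fuzzy_field(docs, k):
    """Map each deletion variant to the set of doc ids whose field value produced it."""
    field = defaultdict(set)
    for doc_id, value in docs.items():
        for variant in deletion_neighborhood(value, k):
            field[variant].add(doc_id)
    return field


field_x_fuzzy = build_fuzzy_field({"A": "best", "B": "jest"}, k=1)
for term in sorted(field_x_fuzzy):
    print(term, "".join(sorted(field_x_fuzzy[term])))

# Unique terms vs. total postings (doc ids summed over all terms).
print(len(field_x_fuzzy), sum(len(ids) for ids in field_x_fuzzy.values()))
```

Here 9 unique terms carry 10 postings, the same 5 + 5 postings the two neighborhoods would need if stored separately.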

I don't think the storage requirement is any greater doing it this way.


3.2.1 Indexing
For all words in a dictionary and a given number of edit operations
k, FastSS generates all variant spellings recursively and saves them
as tuples of type $v' \in U_d(v, k) \rightarrow (v, x)$, where $v$ is
a dictionary word and $x$ a list of deletion positions.
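
A sketch of this recursive indexing step, assuming the deletion positions x are recorded as ascending 0-based indices into the original dictionary word (the paper's exact encoding of x may differ). `variants_with_positions` and `build_index` are hypothetical names.

```python
from collections import defaultdict


def variants_with_positions(word, k):
    """Yield (variant, positions) for every way of deleting up to k characters.

    positions: ascending 0-based indices into the ORIGINAL word
    (an assumed convention, not necessarily the paper's).
    """
    def rec(start, remaining, deleted):
        # Emit the variant produced by the deletions chosen so far.
        variant = "".join(c for i, c in enumerate(word) if i not in deleted)
        yield variant, tuple(deleted)
        if remaining == 0:
            return
        # Only delete after the last deleted index, so each position set appears once.
        for i in range(start, len(word)):
            deleted.append(i)
            yield from rec(i + 1, remaining - 1, deleted)
            deleted.pop()

    yield from rec(0, k, [])


def build_index(dictionary, k):
    """Index: variant v' -> list of (dictionary word v, deletion positions x)."""
    index = defaultdict(list)
    for v in dictionary:
        for variant, x in variants_with_positions(v, k):
            index[variant].append((v, x))
    return index


index = build_index(["best", "jest"], k=1)
print(index["est"])
```

Keeping one tuple per position set, rather than one per distinct string, is what lets retrieval compare deletion positions later. For example, "aa" yields "a" twice, once with (0,) and once with (1,).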

Theorem 5. The index uses $O(n m^{k+1})$ space, as it stores all the
variants for $n$ dictionary words of length $m$ with $k$ mismatches.
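
One way to see where the bound comes from, assuming k is treated as a constant and each stored tuple costs O(m) (a variant of length at most m plus at most k ≤ m deletion positions):

```latex
\begin{aligned}
\text{tuples per word} &= \sum_{i=0}^{k} \binom{m}{i}
  \le \sum_{i=0}^{k} m^{i} \le (k+1)\, m^{k} = O(m^{k}) \\
\text{total space} &= n \cdot O(m^{k}) \cdot O(m) = O(n\, m^{k+1})
\end{aligned}
```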


3.2.2 Retrieval
For a query $p$ and edit distance $k$, first generate the neighborhood
$U_d(p, k)$. Then compare the words in the neighborhood with the index
and find matching candidates. Compare the deletion positions of each
candidate with the deletion positions in $U_d(p, k)$, using Theorem 4.
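
A sketch of retrieval. Theorem 4 is not reproduced in this post, so this stand-in verifies candidates with an ordinary Levenshtein distance instead of comparing deletion positions. That swap is an assumption, not FastSS's verification step. The candidate filter itself cannot miss a match: if ed(p, v) ≤ k, then p and v share a string reachable by at most k deletions from each, so v is reached through some variant in U_d(p, k). All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def deletion_variants(word, k):
    """Set of strings reachable from `word` by deleting at most k characters."""
    out = set()
    for d in range(k + 1):
        for pos in combinations(range(len(word)), d):
            out.add("".join(c for i, c in enumerate(word) if i not in pos))
    return out


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (or match)
        prev = cur
    return prev[-1]


def build_index(dictionary, k):
    """Map each deletion variant to the dictionary words that produce it."""
    index = defaultdict(set)
    for v in dictionary:
        for variant in deletion_variants(v, k):
            index[variant].add(v)
    return index


def search(index, p, k):
    # Step 1: generate U_d(p, k) and collect every word sharing a variant.
    candidates = set()
    for variant in deletion_variants(p, k):
        candidates |= index.get(variant, set())
    # Step 2: verify candidates (stand-in for the Theorem 4 position check).
    return {v for v in candidates if levenshtein(p, v) <= k}


index = build_index(["best", "jest", "test", "rest", "beast", "bus"], k=1)
print(sorted(search(index, "best", 1)))
print(sorted(search(index, "bst", 1)))
```

The second query shows why verification is needed. bus becomes a candidate for bst through the shared variant bs, but its edit distance from bst is 2, so only best survives.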

Discussion Overview
Group: dev @
Category: lucene
Posted: Jan 6, '09 at 6:03p
Active: Feb 19, '10 at 9:28p
Posts: 18
Users: 3
Website: lucene.apache.org
