I'd like to search fuzzily, but not on a full term.
E.g. I have the text "Merlot del Ticino" and I'd like
"mer", "merr", "melo", ... to match.

If I use FuzzyQuery, only "merlot" and "merlott" hit. What query combination should I use?

Thx
Clemens


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Uwe Schindler at May 2, 2011 at 11:50 am
    Hi,

    You can pass an integer to FuzzyQuery that defines how many leading
    characters are treated as a fixed prefix. All terms must match this prefix
    exactly, and the rest of each term is matched fuzzily.

    Uwe

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de
  • Clemens Wyss at May 2, 2011 at 12:13 pm
    I tried this too, but unfortunately I only get hits when the search term is at least as long as the word to be looked up.

    E.g.:
    ...
    Directory directory = new RAMDirectory();
    IndexWriter indexWriter = new IndexWriter( directory, IndexManager.getIndexingAnalyzer( LOCALE_DE ),
            IndexWriter.MaxFieldLength.UNLIMITED );

    Document document = new Document();
    document.add( new Field( "test", "Merlot",
            Field.Store.YES, Field.Index.ANALYZED ) );
    indexWriter.addDocument( document );

    IndexReader indexReader = indexWriter.getReader();
    IndexSearcher searcher = new IndexSearcher( indexReader );

    Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.6f, 1 );
    TopDocs result = searcher.search( q, 10 );
    Assert.assertEquals( 1, result.totalHits );
    ...
  • Clemens Wyss at May 2, 2011 at 9:01 pm
    Is it the combination of FuzzyQuery and Term that makes the search go for "word boundaries"?
  • Clemens Wyss at May 3, 2011 at 8:57 am
    Sorry for coming back to my issue. Can anybody explain why my "simple" unit test below fails? Any hint/help appreciated.

    Directory directory = new RAMDirectory();
    IndexWriter indexWriter = new IndexWriter( directory, new StandardAnalyzer( Version.LUCENE_31 ), IndexWriter.MaxFieldLength.UNLIMITED );
    Document document = new Document();
    document.add( new Field( "test", "Merlot", Field.Store.YES, Field.Index.ANALYZED ) );
    indexWriter.addDocument( document );
    IndexReader indexReader = indexWriter.getReader();
    IndexSearcher searcher = new IndexSearcher( indexReader );
    Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f, 0, 10 );
    // or Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f);
    TopDocs result = searcher.search( q, 10 );
    Assert.assertEquals( 1, result.totalHits );

    - Clemens
  • Ian Lea at May 3, 2011 at 9:06 am
    Mer != mer. The latter will be what is indexed because
    StandardAnalyzer calls LowerCaseFilter.

    --
    Ian.
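Ian's point about case can be illustrated without Lucene. The sketch below is a plain-Java stand-in, not Lucene code: `indexTerms` and `prefixLookup` are hypothetical helpers, and the whitespace split plus lowercasing only approximates what an analyzer with a lowercase filter does at index time. The query term, by contrast, is used as typed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

class LowercaseLookupDemo {
    // Stand-in for index-time analysis (assumption): split on whitespace and lowercase.
    static TreeSet<String> indexTerms(String text) {
        TreeSet<String> terms = new TreeSet<>();
        for (String token : text.split("\\s+")) {
            terms.add(token.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // Stand-in for a lookup that compares the raw query text against indexed terms.
    static List<String> prefixLookup(TreeSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = indexTerms("Merlot del Ticino");
        // The raw query "Mer" is compared as typed; the lowercased one is not.
        System.out.println(prefixLookup(terms, "Mer"));
        System.out.println(prefixLookup(terms, "Mer".toLowerCase(Locale.ROOT)));
    }
}
```

In this simulation the capitalised query finds nothing, while the lowercased one finds "merlot". Whether lowercasing alone is enough for a fuzzy query is a separate question.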

  • Clemens Wyss at May 3, 2011 at 9:10 am
    Unfortunately lowercasing doesn't help.
    Also, doesn't the FuzzyQuery ignore casing?
  • Ian Lea at May 3, 2011 at 9:22 am
    I'd assumed that FuzzyQuery wouldn't ignore case but I could be wrong.
    What would be the edit distance between "mer" and "merlot"? Would it
    be less than 1.5 which I reckon would be the value of length(term)*0.5
    as detailed in the javadocs? Seems unlikely, but I don't really know
    anything about the Levenshtein (edit distance) algorithm as used by
    FuzzyQuery. Wouldn't a PrefixQuery be more appropriate here?


    --
    Ian.
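The edit-distance question can be checked with a few lines of plain Java. This is a minimal sketch of the textbook Levenshtein algorithm, not Lucene's own implementation. The class name `EditDistanceDemo` is made up for illustration, and the threshold simply applies the length(term)*0.5 rule of thumb quoted above to the query term.

```java
class EditDistanceDemo {
    // Classic dynamic-programming Levenshtein distance:
    // d[i][j] = edits needed to turn the first i chars of a into the first j chars of b.
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all i characters
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all j characters
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        String query = "mer";
        int distance = levenshtein(query, "merlot");
        double threshold = query.length() * 0.5; // rule of thumb from the message above
        System.out.println("distance=" + distance
                + ", threshold=" + threshold
                + ", similar=" + (distance < threshold));
    }
}
```

Turning "mer" into "merlot" takes three insertions, so the distance is 3. That is well above 1.5, which would explain why a short query never fuzzy-matches a much longer term.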
  • Clemens Wyss at May 3, 2011 at 9:26 am
    Regarding PrefixQuery: I'd like the combination of prefix and fuzzy ;-) because people could also type "menlo" or "märl", and in any of these cases I'd like to get a hit on Merlot (for suggesting Merlot).
  • Ian Lea at May 3, 2011 at 9:45 am
    Then why not do that? Add a PrefixQuery and a FuzzyQuery to a
    BooleanQuery and use that.
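
The logic of that combination can be sketched in plain Java, without Lucene on the classpath. This is only an illustration under stated assumptions: the class `PrefixOrFuzzySketch` and its methods are hypothetical names, the indexed term is taken to be already lowercased, and normalising the edit distance by the shorter string is an assumed approximation of how a fuzzy score could be computed, not a statement about FuzzyQuery's internals.

```java
import java.util.Locale;

final class PrefixOrFuzzySketch {

    /** Levenshtein distance, computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                curr[j] = Math.min(substitution, Math.min(prev[j], curr[j - 1]) + 1);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    /** 1.0 for identical strings; falls to 0 or below as the strings diverge (assumed normalisation). */
    static float similarity(String query, String term) {
        int shorter = Math.min(query.length(), term.length());
        if (shorter == 0) {
            return query.length() == term.length() ? 1.0f : 0.0f;
        }
        return 1.0f - (float) editDistance(query, term) / shorter;
    }

    /** "Should match either clause": a prefix hit OR a sufficiently similar term. */
    static boolean matches(String query, String indexedTerm, float minSimilarity) {
        String q = query.toLowerCase(Locale.ROOT); // the indexed term is assumed lowercased
        return indexedTerm.startsWith(q) || similarity(q, indexedTerm) >= minSimilarity;
    }
}
```

Under these assumptions, "mer" against "merlot" needs three insertions, so the fuzzy clause alone scores it at 0, while the prefix clause accepts it. "merlott" is one edit away and passes the fuzzy clause at a threshold of 0.5.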


    --
    Ian.

    On Tue, May 3, 2011 at 10:25 AM, Clemens Wyss wrote:
    PrefixQuery
    I'd like the combination of prefix and fuzzy ;-) because people could also type "menlo" or "märl" and in any of these cases I'd like to get a hit on Merlot (for suggesting Merlot)
    -----Original Message-----
    From: Ian Lea
    Sent: Tuesday, May 3, 2011 11:22
    To: java-user@lucene.apache.org
    Subject: Re: "fuzzy prefix" search

    I'd assumed that FuzzyQuery wouldn't ignore case, but I could be wrong.
    What would be the edit distance between "mer" and "merlot"? Would it be
    less than 1.5, which I reckon would be the value of length(term)*0.5 as
    detailed in the javadocs? Seems unlikely, but I don't really know anything
    about the Levenshtein (edit distance) algorithm as used by FuzzyQuery.
    Wouldn't a PrefixQuery be more appropriate here?


    --
    Ian.

    On Tue, May 3, 2011 at 10:10 AM, Clemens Wyss <clemensdev@mysign.ch>
    wrote:
    Unfortunately lowercasing doesn't help.
    Also, doesn't the FuzzyQuery ignore casing?
    -----Original Message-----
    From: Ian Lea
    Sent: Tuesday, May 3, 2011 11:06
    To: java-user@lucene.apache.org
    Subject: Re: "fuzzy prefix" search

    Mer != mer.  The latter will be what is indexed because
    StandardAnalyzer calls LowerCaseFilter.

    --
    Ian.


    On Tue, May 3, 2011 at 9:56 AM, Clemens Wyss
    <clemensdev@mysign.ch>
    wrote:
    Sorry for coming back to my issue. Can anybody explain why my "simple"
    unit test below fails? Any hint/help appreciated.

    Directory directory = new RAMDirectory();
    IndexWriter indexWriter = new IndexWriter( directory,
        new StandardAnalyzer( Version.LUCENE_31 ),
        IndexWriter.MaxFieldLength.UNLIMITED );
    Document document = new Document();
    document.add( new Field( "test", "Merlot",
        Field.Store.YES, Field.Index.ANALYZED ) );
    indexWriter.addDocument( document );
    IndexReader indexReader = indexWriter.getReader();
    IndexSearcher searcher = new IndexSearcher( indexReader );
    Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f, 0, 10 );
    // or: Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f );
    TopDocs result = searcher.search( q, 10 );
    Assert.assertEquals( 1, result.totalHits );

    - Clemens
    -----Original Message-----
    From: Clemens Wyss
    Sent: Monday, May 2, 2011 23:01
    To: java-user@lucene.apache.org
    Subject: AW: "fuzzy prefix" search

    Is it the combination of FuzzyQuery and Term which makes the
    search go for "word boundaries"?
  • Otis Gospodnetic at May 3, 2011 at 11:36 am
    Hi,

    I didn't read this thread closely, but just in case:
    * Is this something you can handle with synonyms?
    * If this is for English and you are trying to handle typos, there is a list of
    common English misspellings out there that you could use for this perhaps.
    * Have you considered n-gramming your tokens? Not sure if this would help,
    didn't read messages/examples closely enough, but you may want to look at this
    if you haven't done so yet.

    Otis
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
    Lucene ecosystem search :: http://search-lucene.com/


  • Clemens Wyss at May 3, 2011 at 4:58 pm
    What does a simple Analyzer look like that just "n-grams" the docs/fields?

    class SimpleNGramAnalyzer extends Analyzer
    {
        @Override
        public TokenStream tokenStream( String fieldName, Reader reader )
        {
            EdgeNGramTokenFilter... ???
        }
    }
  • Otis Gospodnetic at May 3, 2011 at 7:31 pm
    Clemens,

    Something a la:

    public TokenStream tokenStream (String fieldName, Reader r) {
        return new EdgeNGramTokenFilter(new KeywordTokenizer(r),
            EdgeNGramTokenFilter.Side.FRONT, 1, 4);
    }


    Check out page 265 of Lucene in Action 2.
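
To see why indexing front edge n-grams turns a prefix search into an exact term lookup, here is a rough plain-Java sketch. It is not Lucene code: `EdgeNGramLookup`, `indexGrams` and `hit` are hypothetical names, and the lowercasing step is an assumption that is not part of the filter chain shown above.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class EdgeNGramLookup {

    /** Collects the lowercased prefixes of value with lengths minGram..maxGram. */
    static Set<String> indexGrams(String value, int minGram, int maxGram) {
        String v = value.toLowerCase(Locale.ROOT); // assumption: grams are lowercased at index time
        Set<String> grams = new HashSet<>();
        int longest = Math.min(maxGram, v.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(v.substring(0, len));
        }
        return grams;
    }

    /** A query hits when it equals one of the indexed grams exactly. */
    static boolean hit(Set<String> grams, String query) {
        return grams.contains(query.toLowerCase(Locale.ROOT));
    }
}
```

With grams of length 1 to 4, "Merlot" is indexed as m, me, mer and merl in this sketch. A query of "Mer" then hits, while "merlo" misses because it is longer than the largest gram, and a misspelled prefix such as "menl" misses because the lookup is exact. A larger maximum gram size or a fuzzy comparison against the grams would be needed for those cases.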

    Otis
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
    Lucene ecosystem search :: http://search-lucene.com/


  • Clemens Wyss at May 3, 2011 at 8:31 pm
    But doesn't the KeywordTokenizer extract single words out of the stream? I would like to create n-grams on the stream (field content) as it is...
    -----Original Message-----
    From: Otis Gospodnetic
    Sent: Tuesday, May 3, 2011 21:31
    To: java-user@lucene.apache.org
    Subject: Re: AW: AW: "fuzzy prefix" search

    Clemens,

    Something a la:

    public TokenStream tokenStream(String fieldName, Reader r) {
        // Front edge n-grams (1 to 4 characters) of whatever the tokenizer emits
        return new EdgeNGramTokenFilter(new KeywordTokenizer(r),
            EdgeNGramTokenFilter.Side.FRONT, 1, 4);
    }


    Check out page 265 of Lucene in Action 2.

    Otis
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
    Lucene ecosystem search :: http://search-lucene.com/
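
On the question of whether KeywordTokenizer splits the stream into words: it emits the entire field value as a single token, so the edge n-gram filter in the snippet above would operate on the whole phrase. The sketch below only simulates that behaviour in plain Java and does not call Lucene, since constructor signatures differ between Lucene versions. The class and method names (`KeywordEdgeNGramDemo`, `keywordTokens`, `frontEdgeNGrams`, `analyze`) are hypothetical, and it assumes that front edge n-grams are simply the prefixes of a token between the minimum and maximum gram length.

```java
import java.util.ArrayList;
import java.util.List;

public class KeywordEdgeNGramDemo {

    // Simulates a keyword-style tokenizer: the whole input becomes one token.
    public static List<String> keywordTokens(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add(text);
        return tokens;
    }

    // Returns the prefixes of token whose length lies in [minGram, maxGram].
    public static List<String> frontEdgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int limit = Math.min(maxGram, token.length());
        for (int len = minGram; len <= limit; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Tokenize as a single keyword, then expand that token into edge n-grams.
    public static List<String> analyze(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (String token : keywordTokens(text)) {
            grams.addAll(frontEdgeNGrams(token, minGram, maxGram));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Same gram sizes as the snippet above: 1 to 4 characters.
        System.out.println(analyze("Merlot del Ticino", 1, 4));
        // A larger maximum lets the grams run across word boundaries.
        System.out.println(analyze("Merlot del Ticino", 3, 8));
    }
}
```

With the 1 to 4 setting only the first four characters of the phrase are covered. Raising the maximum gram size is what would let the grams continue past the first space.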


  • Otis Gospodnetic at May 3, 2011 at 9:12 pm
    Clemens - that's just an example. Stick another tokenizer in there, like
    WhitespaceTokenizer, for example.

    Otis
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
    Lucene ecosystem search :: http://search-lucene.com/
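
For comparison, here is a rough plain-Java simulation of what swapping in a whitespace-based tokenizer would produce. It does not use Lucene classes. The names `WhitespaceEdgeNGramDemo`, `whitespaceTokens`, `frontEdgeNGrams` and `analyze` are hypothetical, and the gram sizes 1 to 4 are taken from the earlier snippet.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceEdgeNGramDemo {

    // Simulates whitespace tokenization: one token per run of non-space characters.
    public static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Returns the prefixes of token whose length lies in [minGram, maxGram].
    public static List<String> frontEdgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int limit = Math.min(maxGram, token.length());
        for (int len = minGram; len <= limit; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Split on whitespace first, then expand every word into its own edge n-grams.
    public static List<String> analyze(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (String token : whitespaceTokens(text)) {
            grams.addAll(frontEdgeNGrams(token, minGram, maxGram));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Merlot del Ticino", 1, 4));
    }
}
```

Each word gets its own set of prefixes, so a query can match the start of any word in the field, but no gram ever spans two words.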


  • Clemens Wyss at May 4, 2011 at 6:08 am
    I know this is just an example.
    But even the WhitespaceAnalyzer takes the words apart, which I don't want. I would like the phrases as they are (at most 3 words, e.g. "Merlot del Ticino", ...) to be n-grammed, so I want n-grams such as:
    Mer
    Merl
    Merlo
    Merlot
    Merlot
    Merlot d
    ...

    Regards
    Clemens
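
The list above amounts to every prefix of the whole phrase from three characters upward (the second "Merlot" presumably carries a trailing space). As a minimal sketch of how such prefixes could serve the original goal of suggesting "Merlot" for a mistyped query, the plain-Java code below generates them and compares a query against each one using Levenshtein distance. It does not use Lucene. The names `PhrasePrefixSuggest`, `phrasePrefixes`, `levenshtein` and `matches` are hypothetical, and the lowercasing and the threshold of one edit are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class PhrasePrefixSuggest {

    // All prefixes of the whole phrase with at least minGram characters.
    public static List<String> phrasePrefixes(String phrase, int minGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= phrase.length(); len++) {
            grams.add(phrase.substring(0, len));
        }
        return grams;
    }

    // Classic Levenshtein distance (insert, delete, substitute) with a two-row table.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True if the lowercased query is within maxEdits of some prefix of the phrase.
    public static boolean matches(String query, String phrase, int minGram, int maxEdits) {
        String q = query.toLowerCase();
        for (String gram : phrasePrefixes(phrase.toLowerCase(), minGram)) {
            if (levenshtein(q, gram) <= maxEdits) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String phrase = "Merlot del Ticino";
        // "m\u00e4rl" is "marl" written with an a-umlaut.
        String[] queries = {"mer", "merr", "melo", "menlo", "m\u00e4rl", "chianti"};
        for (String q : queries) {
            System.out.println(q + " -> " + matches(q, phrase, 3, 1));
        }
    }
}
```

Comparing against every prefix is linear in the phrase length per candidate, which is fine for a sketch but would need an index-side structure (such as indexed edge n-grams) for a large vocabulary.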
  • Erick Erickson at May 4, 2011 at 11:48 am
    Shingles won't do that either, so I suspect you'll have to write a custom
    tokenizer.

    Best
    Erick
    On Wed, May 4, 2011 at 2:07 AM, Clemens Wyss wrote:
    I know this is just an example.
    But even the WhitespaceAnalyzer takes the words apart, which I don't want. I would like the phrases as they are (maximum 3 words, e.g. "Merlot del Ticino", ...) to be n-grammed. Hence I want to have the n-grams:
    "Mer"
    "Merl"
    "Merlo"
    "Merlot"
    "Merlot " (with the trailing space)
    "Merlot d"
    ...

    Regards
    Clemens
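
    For illustration only, here is a minimal plain-Java sketch of the output
    such a custom tokenizer would need to produce: the leading n-grams of the
    whole phrase, with spaces kept. It does not use any Lucene API; the class
    name PhraseEdgeNGrams and the parameters minGram and maxGram are
    assumptions made up for this example. In a real analyzer the same loop
    would have to live inside a tokenizer or token filter.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    /** Hypothetical helper, not part of Lucene: edge n-grams over an untokenized phrase. */
    final class PhraseEdgeNGrams {

        /**
         * Returns the prefixes of phrase with lengths minGram..maxGram,
         * capped at the phrase length. Spaces are kept, so the phrase
         * is never split into words.
         */
        static List<String> of(String phrase, int minGram, int maxGram) {
            if (minGram < 1 || maxGram < minGram) {
                throw new IllegalArgumentException("need 1 <= minGram <= maxGram");
            }
            List<String> grams = new ArrayList<>();
            int longest = Math.min(maxGram, phrase.length());
            for (int len = minGram; len <= longest; len++) {
                grams.add(phrase.substring(0, len));
            }
            return grams;
        }

        public static void main(String[] args) {
            // Brackets make the trailing space in "Merlot " visible.
            for (String gram : of("Merlot del Ticino", 3, 8)) {
                System.out.println("[" + gram + "]");
            }
        }
    }
    ```

    With minGram = 3 and maxGram = 8 this yields exactly the six n-grams
    listed above, including the one that ends in a space.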
  • Otis Gospodnetic at May 4, 2011 at 8:00 pm
    We do have EdgeNGramTokenizer if that is what you are after.
    See how Solr uses it here:
    http://search-lucene.com/c/Solr:/src/java/org/apache/solr/analysis/EdgeNGramTokenizerFactory.java||EdgeNGramTokenizer


    Otis
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
    Lucene ecosystem search :: http://search-lucene.com/
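
    As a rough sketch of why edge n-grams help with the "fuzzy prefix" goal
    of this thread: once the leading n-grams of a phrase are available, a
    partial query can be compared against the prefixes of similar length
    using edit distance, instead of against the whole term. The code below is
    plain Java and does not use Lucene or Solr; FuzzyPrefix, matches and
    maxEdits are hypothetical names, and this is not a description of how
    FuzzyQuery or Solr's suggester work internally.

    ```java
    import java.util.Locale;

    /** Hypothetical helper, not a Lucene class: fuzzy match of a partial query against the start of a phrase. */
    final class FuzzyPrefix {

        /** Levenshtein distance (insert, delete, substitute cost 1 each), two-row dynamic programming. */
        static int editDistance(String a, String b) {
            int[] prev = new int[b.length() + 1];
            int[] curr = new int[b.length() + 1];
            for (int j = 0; j <= b.length(); j++) {
                prev[j] = j;
            }
            for (int i = 1; i <= a.length(); i++) {
                curr[0] = i;
                for (int j = 1; j <= b.length(); j++) {
                    int subst = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                    curr[j] = Math.min(subst, Math.min(prev[j] + 1, curr[j - 1] + 1));
                }
                int[] tmp = prev;
                prev = curr;
                curr = tmp;
            }
            return prev[b.length()];
        }

        /**
         * True if some prefix of phrase is within maxEdits of query, ignoring case.
         * Only prefixes whose length differs from the query length by at most
         * maxEdits can qualify, so only those are checked.
         * Note: very short queries match almost anything when maxEdits >= 1.
         */
        static boolean matches(String query, String phrase, int maxEdits) {
            String q = query.toLowerCase(Locale.ROOT);
            String p = phrase.toLowerCase(Locale.ROOT);
            int shortest = Math.max(0, q.length() - maxEdits);
            int longest = Math.min(p.length(), q.length() + maxEdits);
            for (int len = shortest; len <= longest; len++) {
                if (editDistance(q, p.substring(0, len)) <= maxEdits) {
                    return true;
                }
            }
            return false;
        }

        public static void main(String[] args) {
            String phrase = "Merlot del Ticino";
            // "m\u00e4rl" is "maerl" written with a-umlaut, escaped to keep the source ASCII-only.
            for (String q : new String[] {"mer", "merr", "melo", "menlo", "m\u00e4rl", "chianti"}) {
                System.out.println(q + " -> " + matches(q, phrase, 1));
            }
        }
    }
    ```

    With maxEdits = 1 the queries "mer", "merr", "melo", "menlo" and "märl"
    all match "Merlot del Ticino", while "chianti" does not. By contrast,
    editDistance("mer", "merlot") is 3, as large as the query itself, which
    is consistent with the whole-term fuzzy match failing for short inputs.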


  • Clemens Wyss at May 5, 2011 at 6:27 am
    What I am looking for is the autosuggestion implemented here (@solr)

    http://search-lucene.com/m/0QBv41ssGlh/suggestion&subj=Auto+Suggest

    How "easily" can I switch from plain Lucene to Solr?
    Or (even better), can I just make use of "solr-suggestion"?

    Clemens
    -----Ursprüngliche Nachricht-----
    Von: Otis Gospodnetic
    Gesendet: Mittwoch, 4. Mai 2011 22:00
    An: java-user@lucene.apache.org
    Betreff: Re: AW: AW: AW: AW: "fuzzy prefix" search

    We do have EdgeNGramTokenizer if that is what you are after.
    See how Solr uses it here:
    http://search-
    lucene.com/c/Solr:/src/java/org/apache/solr/analysis/EdgeNGramTokenizer
    Factory.java||EdgeNGramTokenizer


    Otis
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem
    search :: http://search-lucene.com/


    ----- Original Message ----
    From: Clemens Wyss <clemensdev@mysign.ch>
    To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
    Sent: Wed, May 4, 2011 2:07:40 AM
    Subject: AW: AW: AW: AW: "fuzzy prefix" search

    I know this is just an example.
    But even the WhitespaceAnalyzer takes the words apart, which I don't
    want. I would like the phrases as they are (maximum 3 words, e.g.
    "Merlot del Ticino",
    ...) to be n-gram-ed. I hence want to have the n-grams.
    Mer
    Merl
    Merlo
    Merlot
    Merlot
    Merlot d
    ...

    Regards
    Clemens
    -----Ursprüngliche Nachricht-----
    Von: Otis Gospodnetic
    Gesendet: Dienstag, 3. Mai 2011 23:12
    An: java-user@lucene.apache.org
    Betreff: Re: AW: AW: AW: "fuzzy prefix" search

    Clemens - that's just an example. Stick another tokenizer in
    there, like WhitespaceTokenizer in there, for example.

    Otis
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene
    ecosystem search :: http://search-lucene.com/


    ----- Original Message ----
    From: Clemens Wyss <clemensdev@mysign.ch>
    To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
    Sent: Tue, May 3, 2011 4:31:14 PM
    Subject: AW: AW: AW: "fuzzy prefix" search

    But doesn't the KeyWordTokenizer extract single words out oft he
    stream? I would like to create n-grams on the stream (field
    content) as
    it
    is...
    -----Ursprüngliche Nachricht-----
    Von: Otis Gospodnetic
    Gesendet: Dienstag, 3. Mai 2011 21:31
    An: java-user@lucene.apache.org
    Betreff: Re: AW: AW: "fuzzy prefix" search

    Clemens,

    Something a la:

    public TokenStream tokenStream (String fieldName, Reader r) {
    return nw EdgeNGramTokenFilter(new KeywordTokenizer(r),
    EdgeNGramTokenFilter.Side.FRONT, 1, 4); }


    Check out page 265 of Lucene in Action 2.

    Otis
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
    Lucene ecosystem search :: http://search-lucene.com/


    ----- Original Message ----
    From: Clemens Wyss <clemensdev@mysign.ch>
    To: "java-user@lucene.apache.org" <java-
    user@lucene.apache.org>
    Sent: Tue, May 3, 2011 12:57:39 PM
    Subject: AW: AW: "fuzzy prefix" search

    How does an simple Analyzer look that just "n-grams" the
    docs/fields.
    class SimpleNGramAnalyzer extends Analyzer { @Override
    public TokenStream tokenStream ( String fieldName, Reader
    reader )
    {
    EdgeNGramTokenFilter... ???
    }
    }
    -----Ursprüngliche Nachricht-----
    Von: Otis Gospodnetic
    Gesendet: Dienstag, 3. Mai 2011 13:36 > > > > An:
    java-user@lucene.apache.org > > > > Betreff: Re: AW: "fuzzy
    prefix" search > > > >
    Hi,

    I didn't read this thread closely, but just in case:
    * Is this something you can handle with synonyms?
    * If this is for English and you are trying to handle typos,
    there is a
    list
    of
    common English misspellings out there that you could use
    for
    this
    perhaps.
    * Have you considered n-gramming your tokens? Not sure if
    this would
    help,
    didn't read messages/examples closely enough, but you may
    want to
    look at
    this if you haven't done so yet.

    Otis
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
    Lucene ecosystem
    search :: http://search-lucene.com/


    ----- Original Message ----
    From: Clemens Wyss <clemensdev@mysign.ch>
    To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
    Sent: Tue, May 3, 2011 5:25:30 AM
    Subject: AW: "fuzzy prefix" search

    > PrefixQuery

    I'd like the combination of prefix and fuzzy ;-) because people could
    also type "menlo" or "märl" and in any of these cases I'd like to get
    a hit on Merlot (for suggesting Merlot)
    -----Original Message-----
    From: Ian Lea
    Sent: Tuesday, May 3, 2011 11:22
    To: java-user@lucene.apache.org
    Subject: Re: "fuzzy prefix" search

    I'd assumed that FuzzyQuery wouldn't ignore case, but I could be wrong.

    What would be the edit distance between "mer" and "merlot"? Would
    it be less than 1.5, which I reckon would be the value of
    length(term)*0.5 as detailed in the javadocs? Seems unlikely, but
    I don't really know anything about the Levenshtein (edit distance)
    algorithm as used by FuzzyQuery.

    Wouldn't a PrefixQuery be more appropriate here?
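
    Ian's question can be checked by hand. The sketch below is plain Java,
    not Lucene code. It computes the Levenshtein distance and then applies
    the formula similarity = 1 - distance / min(length), which is an
    assumption about how Lucene 3.x scores fuzzy candidates when the prefix
    length is 0. The class and method names are made up for illustration.

```java
final class FuzzySimilaritySketch {

    // Classic dynamic-programming Levenshtein distance:
    // the minimum number of single-character inserts, deletes and substitutions.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // ASSUMPTION: similarity is 1 - distance / length of the shorter string
    // (prefix length 0). This mirrors how the thread describes FuzzyQuery,
    // it is not copied from the Lucene sources.
    static float similarity(String query, String term) {
        int shorter = Math.min(query.length(), term.length());
        if (shorter == 0) {
            return query.length() == term.length() ? 1.0f : 0.0f;
        }
        return 1.0f - (float) editDistance(query, term) / shorter;
    }

    public static void main(String[] args) {
        System.out.println(editDistance("mer", "merlot"));
        System.out.println(similarity("mer", "merlot"));
        System.out.println(similarity("merlott", "merlot"));
    }
}
```

    Under that assumption, "mer" against the indexed term "merlot" needs
    three insertions, so the similarity drops to 0 and a minimum similarity
    of 0.5 cannot match, whereas "merlott" is one edit away and scores well
    above 0.5. That would be consistent with the observation that only
    near-full-length terms hit.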


    --
    Ian.

    On Tue, May 3, 2011 at 10:10 AM, Clemens Wyss <clemensdev@mysign.ch> wrote:
    Unfortunately lowercasing doesn't help.
    Also, doesn't the FuzzyQuery ignore casing?
    -----Original Message-----
    From: Ian Lea
    Sent: Tuesday, May 3, 2011 11:06
    To: java-user@lucene.apache.org
    Subject: Re: "fuzzy prefix" search

    Mer != mer. The latter will be what is indexed because
    StandardAnalyzer calls LowerCaseFilter.

    --
    Ian.


    On Tue, May 3, 2011 at 9:56 AM, Clemens Wyss <clemensdev@mysign.ch> wrote:
    Sorry for coming back to my issue. Can anybody explain why my "simple"
    unit test below fails? Any hint/help appreciated.

    Directory directory = new RAMDirectory();
    IndexWriter indexWriter = new IndexWriter( directory,
        new StandardAnalyzer( Version.LUCENE_31 ),
        IndexWriter.MaxFieldLength.UNLIMITED );
    Document document = new Document();
    document.add( new Field( "test", "Merlot", Field.Store.YES, Field.Index.ANALYZED ) );
    indexWriter.addDocument( document );
    IndexReader indexReader = indexWriter.getReader();
    IndexSearcher searcher = new IndexSearcher( indexReader );
    Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f, 0, 10 );
    // or: Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f );
    TopDocs result = searcher.search( q, 10 );
    Assert.assertEquals( 1, result.totalHits );

    - Clemens
    -----Original Message-----
    From: Clemens Wyss
    Sent: Monday, May 2, 2011 23:01
    To: java-user@lucene.apache.org
    Subject: AW: "fuzzy prefix" search

    Is it the combination of FuzzyQuery and Term which makes the
    search go for "word boundaries"?
    -----Original Message-----
    From: Clemens Wyss
    Sent: Monday, May 2, 2011 14:13
    To: java-user@lucene.apache.org
    Subject: AW: "fuzzy prefix" search

    I tried this too, but unfortunately I only get hits when the search
    term is at least as long as the word to be looked up.

    E.g.:
    ...
    Directory directory = new RAMDirectory();
    IndexWriter indexWriter = new IndexWriter( directory,
        IndexManager.getIndexingAnalyzer( LOCALE_DE ),
        IndexWriter.MaxFieldLength.UNLIMITED );
    Document document = new Document();
    document.add( new Field( "test", "Merlot", Field.Store.YES, Field.Index.ANALYZED ) );
    indexWriter.addDocument( document );

    IndexReader indexReader = indexWriter.getReader();
    IndexSearcher searcher = new IndexSearcher( indexReader );
    Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.6f, 1 );
    TopDocs result = searcher.search( q, 10 );
    Assert.assertEquals( 1, result.totalHits );
    ...
    -----Original Message-----
    From: Uwe Schindler
    Sent: Monday, May 2, 2011 13:50
    To: java-user@lucene.apache.org
    Subject: RE: "fuzzy prefix" search

    Hi,

    You can pass an integer to FuzzyQuery which defines the number of
    characters that are seen as prefix. So all terms must match this
    prefix and the rest of each term is matched using fuzzy.

    Uwe

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de
    -----Original Message-----
    From: Clemens Wyss
    Sent: Monday, May 02, 2011 1:47 PM
    To: java-user@lucene.apache.org
    Subject: "fuzzy prefix" search

    I'd like to search fuzzily but not on a full term.
    E.g.
    I have a text "Merlot del Ticino"
    I'd like
    "mer", "merr", "melo", ... to match.

    If I use FuzzyQuery only "merlot", "merlott" hit. What
    Query-combination should I use?

    Thx
    Clemens


  • Erick Erickson at May 6, 2011 at 12:37 pm
    Well, Solr officially uses Lucene, but you'll do disappointingly
    little Java coding. Which some people think is a plus :).

    The biggest issue will be making really, really sure that your
    schema.xml file in Solr reflects your use in the Lucene code.
    Actually, I'd swallow the blue pill and just make the leap to
    Solr. I'd hazard the guess that it really depends upon the amount
    of customized Lucene code you've written in the analyzers/tokenizers
    area. But even those are pretty transferable to Solr as plugins.

    Your Lucene background will help a lot when using Solr, BTW...

    As to your second question, you could just grab the Solr code and
    see if you can use/adapt the suggester code... I confess I have no
    idea what that involves; I haven't been in that code.

    I made the Lucene->Solr transition, and found after a while that
    I didn't really want to go back FWIW.

    Best
    Erick
    On Thu, May 5, 2011 at 2:26 AM, Clemens Wyss wrote:
    What I am looking for is the autosuggestion implemented here (@solr)

    http://search-lucene.com/m/0QBv41ssGlh/suggestion&subj=Auto+Suggest

    How "easily" can I switch from plain Lucene to Solr?
    Or (even better), can I just make use of "solr-suggestion"?

    Clemens
    -----Original Message-----
    From: Otis Gospodnetic
    Sent: Wednesday, May 4, 2011 22:00
    To: java-user@lucene.apache.org
    Subject: Re: AW: AW: AW: AW: "fuzzy prefix" search

    We do have EdgeNGramTokenizer if that is what you are after.
    See how Solr uses it here:
    http://search-lucene.com/c/Solr:/src/java/org/apache/solr/analysis/EdgeNGramTokenizerFactory.java||EdgeNGramTokenizer


    Otis
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem
    search :: http://search-lucene.com/


    ----- Original Message ----
    From: Clemens Wyss <clemensdev@mysign.ch>
    To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
    Sent: Wed, May 4, 2011 2:07:40 AM
    Subject: AW: AW: AW: AW: "fuzzy prefix" search

    I know this is just an example.
    But even the WhitespaceAnalyzer takes the words apart, which I don't
    want. I would like the phrases as they are (maximum 3 words, e.g.
    "Merlot del Ticino", ...) to be n-gram-ed. I hence want to have the
    n-grams:
    "Mer"
    "Merl"
    "Merlo"
    "Merlot"
    "Merlot "
    "Merlot d"
    ...

    Regards
    Clemens
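
    The n-gram list above can be reproduced without any tokenizer at all.
    The following is a minimal plain-Java sketch, not the Lucene
    EdgeNGramTokenFilter: it treats the whole field value as a single token
    and emits its front-edge n-grams. The class name, method name and the
    chosen gram sizes are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class EdgeNGramSketch {

    // Returns the prefixes of `text` whose length lies in [minGram, maxGram],
    // shortest first. Whitespace is kept, so a phrase is never split into words.
    static List<String> frontEdgeNGrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, text.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(text.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Prints one n-gram per line, in brackets so trailing spaces stay visible.
        for (String gram : frontEdgeNGrams("Merlot del Ticino", 3, 8)) {
            System.out.println("[" + gram + "]");
        }
    }
}
```

    Note that the upper bound matters: with a maximum gram size of 4, as in
    the snippet quoted elsewhere in this thread, nothing longer than "Merl"
    would be produced, so the maximum has to cover the longest prefix a
    user is expected to type.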
    -----Original Message-----
    From: Otis Gospodnetic
    Sent: Tuesday, May 3, 2011 23:12
    To: java-user@lucene.apache.org
    Subject: Re: AW: AW: AW: "fuzzy prefix" search

    Clemens - that's just an example. Stick another tokenizer in
    there, like WhitespaceTokenizer, for example.

    Otis
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene
    ecosystem search :: http://search-lucene.com/



  • Clemens Wyss at May 6, 2011 at 1:28 pm
    Our Lucene/indexing architecture allows any number of index data providers to register and put their data into a central index. As every provider can (and does) add its own specific fields to its documents, there is no single schema. And yes, this all works, except for the problem I have with the "term index" for term autosuggestions.
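
    The autosuggestion behaviour asked for in this thread (inputs such as
    "mer", "melo", "menlo" or "märl" should suggest "Merlot del Ticino") can
    be sketched in plain Java as a fuzzy comparison against the front
    prefixes of a phrase. This is only an in-memory illustration of the
    idea, under the assumption that one edit is tolerated. It is not how
    Lucene or the Solr suggester implement it, and all names are
    hypothetical.

```java
import java.util.Locale;

final class FuzzyPrefixSketch {

    // Levenshtein distance via two rolling rows of the DP table.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True if `query` is within `maxEdits` edits of some front prefix of `phrase`.
    // Only prefixes whose length differs from the query by at most `maxEdits`
    // can qualify, so only those are tried. Comparison is case-insensitive.
    static boolean fuzzyPrefixMatch(String query, String phrase, int maxEdits) {
        String q = query.toLowerCase(Locale.ROOT);
        String p = phrase.toLowerCase(Locale.ROOT);
        int shortest = Math.max(1, q.length() - maxEdits);
        int longest = Math.min(p.length(), q.length() + maxEdits);
        for (int len = shortest; len <= longest; len++) {
            if (editDistance(q, p.substring(0, len)) <= maxEdits) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String phrase = "Merlot del Ticino";
        for (String input : new String[] {"mer", "merr", "melo", "menlo", "chianti"}) {
            System.out.println(input + " -> " + fuzzyPrefixMatch(input, phrase, 1));
        }
    }
}
```

    In index terms this corresponds roughly to storing the front-edge
    n-grams of each phrase as terms and matching the typed input against
    them with a small edit tolerance, rather than running a fuzzy query
    against the full-length term.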
    -----Original Message-----
    From: Erick Erickson
    Sent: Friday, May 6, 2011 14:37
    To: java-user@lucene.apache.org
    Subject: Re: AW: AW: AW: AW: "fuzzy prefix" search

    Well, Solr officially uses Lucene, but you'll do disappointingly little Java
    coding. Which some people think is a plus :).

    The biggest issue will be making really, really sure that your schema.xml file in
    Solr reflects your use in the Lucene code. Actually, I'd swallow the blue pill
    and just make the leap to Solr. I'd hazard the guess that it really depends
    upon the amount of customized Lucene code you've written in the
    analyzers/tokenizers area. But even those are pretty transferable to Solr as
    plugins.

    Your Lucene background will help a lot when using Solr, BTW...

    As to your second question, you could just grab the Solr code and see if you
    can use/adapt the suggester code.... I confess I have no idea what that
    involves, haven't been in that code..

    I made the Lucene->Solr transition, and found after a while that I didn't really
    want to go back FWIW.

    Best
    Erick
    On Thu, May 5, 2011 at 2:26 AM, Clemens Wyss wrote:
    What I am looking for is the autosuggestion implemented here (@solr)

    http://search-lucene.com/m/0QBv41ssGlh/suggestion&subj=Auto+Suggest

    How "easily" can I switch from plain Lucene to Solr?
    Or (even better), can I just make use of "solr-suggestion"?

    Clemens
    -----Original Message-----
    From: Otis Gospodnetic
    Sent: Wednesday, May 4, 2011 22:00
    To: java-user@lucene.apache.org
    Subject: Re: AW: AW: AW: AW: "fuzzy prefix" search

    We do have EdgeNGramTokenizer if that is what you are after.
    See how Solr uses it here:
    http://search-lucene.com/c/Solr:/src/java/org/apache/solr/analysis/EdgeNGramTokenizerFactory.java||EdgeNGramTokenizer


    Otis
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene
    ecosystem search :: http://search-lucene.com/


    ----- Original Message ----
    From: Clemens Wyss <clemensdev@mysign.ch>
    To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
    Sent: Wed, May 4, 2011 2:07:40 AM
    Subject: AW: AW: AW: AW: "fuzzy prefix" search

    I know this is just an example.
    But even the WhitespaceAnalyzer takes the words apart, which I
    don't want. I would like the phrases as they are (maximum 3 words, e.g.
    "Merlot del Ticino", ...) to be n-gram-ed. I hence want to have the
    n-grams:
    "Mer"
    "Merl"
    "Merlo"
    "Merlot"
    "Merlot "
    "Merlot d"
    ...

    Regards
    Clemens
    -----Original Message-----
    From: Otis Gospodnetic
    Sent: Tuesday, May 3, 2011 23:12
    To: java-user@lucene.apache.org
    Subject: Re: AW: AW: AW: "fuzzy prefix" search

    Clemens - that's just an example. Stick another tokenizer in
    there, like WhitespaceTokenizer, for example.

    Otis
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene
    ecosystem search :: http://search-lucene.com/


    ----- Original Message ----
    From: Clemens Wyss <clemensdev@mysign.ch>
    To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
    Sent: Tue, May 3, 2011 4:31:14 PM
    Subject: AW: AW: AW: "fuzzy prefix" search

    But doesn't the KeywordTokenizer extract single words out of
    the stream? I would like to create n-grams on the stream (field
    content) as it is...
    -----Original Message-----
    From: Otis Gospodnetic
    Sent: Tuesday, May 3, 2011 21:31
    To: java-user@lucene.apache.org
    Subject: Re: AW: AW: "fuzzy prefix" search

    Clemens,

    Something a la:

    public TokenStream tokenStream(String fieldName, Reader r) {
      return new EdgeNGramTokenFilter(new KeywordTokenizer(r),
          EdgeNGramTokenFilter.Side.FRONT, 1, 4);
    }


    Check out page 265 of Lucene in Action 2.

    Otis
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
    Lucene ecosystem search :: http://search-lucene.com/


    ----- Original Message ----
    From: Clemens Wyss <clemensdev@mysign.ch>
    To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
    Sent: Tue, May 3, 2011 12:57:39 PM
    Subject: AW: AW: "fuzzy prefix" search

    What does a simple Analyzer look like that just "n-grams" the
    docs/fields?

    class SimpleNGramAnalyzer extends Analyzer {
      @Override
      public TokenStream tokenStream(String fieldName, Reader reader) {
        EdgeNGramTokenFilter... ???
      }
    }
    -----Original Message-----
    From: Otis Gospodnetic
    Sent: Tuesday, May 3, 2011 13:36
    To: java-user@lucene.apache.org
    Subject: Re: AW: "fuzzy prefix" search

    Hi,

    I didn't read this thread closely, but just in case:
    * Is this something you can handle with synonyms?
    * If this is for English and you are trying to handle typos, there is a
      list of common English misspellings out there that you could use for
      this perhaps.
    * Have you considered n-gramming your tokens? Not sure if this would
      help, didn't read messages/examples closely enough, but you may want
      to look at this if you haven't done so yet.

    Otis
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
    Lucene ecosystem search :: http://search-lucene.com/


    ----- Original Message ----
    From: Clemens Wyss <clemensdev@mysign.ch>
    To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
    Sent: Tue, May 3, 2011 5:25:30 AM
    Subject: AW: "fuzzy prefix" search

    PrefixQuery
    I'd like the combination of prefix and fuzzy ;-) because people could
    also type "menlo" or "märl", and in any of these cases I'd like to get
    a hit on Merlot (for suggesting Merlot).
    -----Original Message-----
    From: Ian Lea
    Sent: Tuesday, May 3, 2011 11:22
    To: java-user@lucene.apache.org
    Subject: Re: "fuzzy prefix" search

    I'd assumed that FuzzyQuery wouldn't ignore case but I could be wrong.
    What would be the edit distance between "mer" and "merlot"? Would it be
    less than 1.5, which I reckon would be the value of length(term)*0.5 as
    detailed in the javadocs? Seems unlikely, but I don't really know
    anything about the Levenshtein (edit distance) algorithm as used by
    FuzzyQuery.

    Wouldn't a PrefixQuery be more appropriate here?


    --
    Ian.

    On Tue, May 3, 2011 at 10:10 AM, Clemens Wyss <clemensdev@mysign.ch> wrote:
    Unfortunately lowercasing doesn't help.
    Also, doesn't the FuzzyQuery ignore casing?
    -----Original Message-----
    From: Ian Lea
    Sent: Tuesday, May 3, 2011 11:06
    To: java-user@lucene.apache.org
    Subject: Re: "fuzzy prefix" search

    Mer != mer. The latter will be what is indexed because
    StandardAnalyzer calls LowerCaseFilter.

    --
    Ian.


  • Biedermann,S.,Fa. Post Direkt at May 3, 2011 at 9:43 am
    Have you tried

    Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.499f);


    Sven


    -----Original Message-----
    From: Clemens Wyss
    Sent: Tuesday, May 3, 2011 10:57
    To: java-user@lucene.apache.org
    Subject: AW: "fuzzy prefix" search

    Sorry for coming back to my issue. Can anybody explain why my "simple" unit test below fails? Any hint/help appreciated.

    Directory directory = new RAMDirectory();
    IndexWriter indexWriter = new IndexWriter( directory, new StandardAnalyzer( Version.LUCENE_31 ), IndexWriter.MaxFieldLength.UNLIMITED );
    Document document = new Document();
    document.add( new Field( "test", "Merlot", Field.Store.YES, Field.Index.ANALYZED ) );
    indexWriter.addDocument( document );
    IndexReader indexReader = indexWriter.getReader();
    IndexSearcher searcher = new IndexSearcher( indexReader );
    Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f, 0, 10 );
    // or Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f);
    TopDocs result = searcher.search( q, 10 );
    Assert.assertEquals( 1, result.totalHits );

    - Clemens
    -----Original Message-----
    From: Clemens Wyss
    Sent: Monday, May 2, 2011 23:01
    To: java-user@lucene.apache.org
    Subject: AW: "fuzzy prefix" search

    Is it the combination of FuzzyQuery and Term which makes the search go
    for "word boundaries"?
  • Biedermann,S.,Fa. Post Direkt at May 3, 2011 at 10:00 am
    I had a look into the 3.0 implementation.

    The calculation of the similarity is

    1 - (edit distance / min(string 1 length, string 2 length))

    as opposed to the Levenshtein in the spellchecker:

    1 - (edit distance / max(string 1 length, string 2 length))


    So, the similarity is 1 - (3 / min(3, 6)) = 0.

    --
    Sven
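
    To make the arithmetic concrete, here is a small plain-Java sketch of both normalizations for "mer" against "merlot". It is not Lucene's code: `FuzzySimilaritySketch` and its method names are made up for illustration, the two formulas are simply the ones quoted above, and both strings are assumed to be non-empty.

```java
/** Illustrative sketch only; this is not Lucene's FuzzyQuery code. */
class FuzzySimilaritySketch {

    /** Classic dynamic-programming Levenshtein distance (insert, delete, substitute). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** 1 - distance / min(length): the normalization described for the 3.0 fuzzy code. */
    static float similarityByMin(String a, String b) {
        return 1.0f - (float) editDistance(a, b) / Math.min(a.length(), b.length());
    }

    /** 1 - distance / max(length): the normalization described for the spellchecker. */
    static float similarityByMax(String a, String b) {
        return 1.0f - (float) editDistance(a, b) / Math.max(a.length(), b.length());
    }

    public static void main(String[] args) {
        System.out.println(editDistance("mer", "merlot"));     // three insertions
        System.out.println(similarityByMin("mer", "merlot"));
        System.out.println(similarityByMax("mer", "merlot"));
    }
}
```

    Under these assumptions the min-based score for "mer" vs. "merlot" is 0, so no positive minimum similarity can be met, while the max-based score is 0.5. With the uppercase "Mer" the edit distance grows to 4 and the min-based score becomes negative.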


  • Clemens Wyss at May 3, 2011 at 10:12 am
    Is this calculation intended or a bug?
  • Biedermann,S.,Fa. Post Direkt at May 3, 2011 at 11:22 am
    I don't know.

    But changing it now would cause trouble in many applications...

    For our applications we reimplemented fuzzy query so that we can pass along an org.apache.lucene.search.spell.StringDistance instance that holds the similarity algorithm of choice.


    --

    Sven
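
    The idea of handing the similarity measure in from outside can be sketched in plain Java. This is not the reimplementation described above, which is not shown in the thread: `PluggableFuzzySketch`, the `Similarity` interface (a hypothetical stand-in for a distance strategy) and the `prefixShare` measure are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only: a term filter with a pluggable similarity measure. */
class PluggableFuzzySketch {

    /** Hypothetical stand-in for a similarity strategy; scores lie in [0, 1]. */
    interface Similarity {
        float score(String query, String term);
    }

    /** Share of the query that the term reproduces as a common prefix. */
    static float prefixShare(String query, String term) {
        int n = Math.min(query.length(), term.length());
        int common = 0;
        while (common < n && query.charAt(common) == term.charAt(common)) {
            common++;
        }
        return query.isEmpty() ? 0f : (float) common / query.length();
    }

    /** Keeps the terms whose score against the query reaches minSimilarity. */
    static List<String> select(String query, List<String> terms,
                               Similarity similarity, float minSimilarity) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (similarity.score(query, term) >= minSimilarity) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("merlot", "malbec", "mercurey");
        // The measure is passed in, so it can be swapped without touching select().
        System.out.println(select("mer", terms, PluggableFuzzySketch::prefixShare, 0.6f));
    }
}
```

    Swapping in a different measure, for example one based on edit distance normalized by the longer string, only means passing another `Similarity` to `select`.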


  • Meghana at Nov 28, 2011 at 6:53 pm
    Hi Uwe,
    I need to do something similar. Can you please tell me how I can pass an
    integer in my fuzzy search query?
    For example, I am searching with q=major~0.6 and I want to match terms after
    the prefix "maj". How can I pass an integer to do that?

    Thanks.



  • Uwe Schindler at Nov 28, 2011 at 7:18 pm
    Hi Meghana,

    You can only do that by directly instantiating the FuzzyQuery, not via
    parsed queries.

    Uwe

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de
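
A minimal sketch of what building the query directly could involve: the `~0.6` suffix carries only the minimum similarity, so the prefix length has to be supplied by the calling code. The helper below is hypothetical and uses plain Java only. It splits input such as `major~0.6` into the three values that the `FuzzyQuery( Term, float, int )` constructor shown earlier in this thread takes; the default similarity and the way the prefix length is chosen are assumptions.

```java
// Hypothetical helper, not part of Lucene: prepares the arguments for
// new FuzzyQuery( new Term( field, text ), minSimilarity, prefixLength ).
class FuzzyInput {
    final String text;          // term text without the "~..." suffix
    final float minSimilarity;  // value after the tilde, or the default
    final int prefixLength;     // number of leading characters that must match

    FuzzyInput(String text, float minSimilarity, int prefixLength) {
        this.text = text;
        this.minSimilarity = minSimilarity;
        this.prefixLength = prefixLength;
    }

    // Splits "term~similarity"; the prefix length comes from the caller
    // because the query string has no place for it.
    static FuzzyInput parse(String input, float defaultMinSimilarity, int prefixLength) {
        int tilde = input.lastIndexOf('~');
        String text = tilde < 0 ? input : input.substring(0, tilde);
        float minSimilarity = (tilde < 0 || tilde == input.length() - 1)
                ? defaultMinSimilarity
                : Float.parseFloat(input.substring(tilde + 1));
        // The prefix cannot be negative or longer than the term itself.
        int prefix = Math.max(0, Math.min(prefixLength, text.length()));
        return new FuzzyInput(text, minSimilarity, prefix);
    }

    public static void main(String[] args) {
        FuzzyInput in = parse("major~0.6", 0.5f, 3);
        // With Lucene 3.x on the classpath these values would be passed on as:
        // new FuzzyQuery( new Term( "test", in.text ), in.minSimilarity, in.prefixLength )
        System.out.println(in.text + " " + in.minSimilarity + " " + in.prefixLength);
    }
}
```

For `major~0.6` with a prefix length of 3, only indexed terms starting with "maj" would then be considered as fuzzy candidates.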

    -----Original Message-----
    From: meghana
    Sent: Friday, November 25, 2011 11:20 AM
    To: java-user@lucene.apache.org
    Subject: RE: "fuzzy prefix" search

    Hi Uwe,
    I need to do something similar. Can you please tell me how I can pass an
    integer in my fuzzy search query?
    For example, I am searching with q=major~0.6 and I want to match terms after
    the prefix "maj". How can I pass an integer to do that?

    Thanks.




Discussion Overview
group: java-user
categories: lucene
posted: May 2, '11 at 11:47a
active: Nov 28, '11 at 7:18p
posts: 27
users: 7
website: lucene.apache.org
