Hi,

I have two questions about this GREAT tool (framework, library...
"whatever").
I decided to add a spell checker to my applications, started reading some
papers, and "found out" about the Lucene project...

Anyway, I got it working, but I would like to know:

1. Why must I pass a Directory object (it is mandatory) to the constructor of
SpellChecker?
2. Suppose my dictionary contains these entries:

"The Lord of the Rings: The Two Towers"
"The Lord of the Rings: The Fellowship of the Ring"
"The Lord of the Rings: The Return of the King"

I want to know how to code something so that, when the user queries
"The Lord of the Rings: The Two Towers", the application suggests:

"The Lord of the Rings: The Fellowship of the Ring"
"The Lord of the Rings: The Return of the King"

Is this possible using Lucene alone?

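################ My Test Class ######################
import java.io.IOException;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.FSDirectory;

public class MyTestClass {
    public static void main(String[] args) throws IOException {
        SpellChecker spell = new SpellChecker(FSDirectory.getDirectory(".")); // why this... ?!!
        spell.indexDictionary(new Dicionario());

        // Ask for up to 5 suggestions for the first command-line argument.
        String[] suggestions = spell.suggestSimilar(args[0], 5);
        for (String suggestion : suggestions) {
            System.out.println("Suggested : " + suggestion);
        }
    }
}
###############################################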



############### My Dictionary ######################
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class Dicionario implements org.apache.lucene.search.spell.Dictionary {

    // Returns the entries that the SpellChecker will index.
    public Iterator getWordsIterator() {
        List<String> lista = new ArrayList<String>();
        lista.add("peter");
        lista.add("spider man 3");
        lista.add("johnny depp");
        lista.add("the edge");
        lista.add("monk");
        lista.add("arnold schwarzenegger");
        return lista.iterator();
    }
}
###############################################

Thanks in advance... :D
--
View this message in context: http://www.nabble.com/Questions-about-use-of-SpellChecker%3A-Constructor-and-Simillarity...-tp16559731p16559731.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

  • Mathieu Lecarme at Apr 8, 2008 at 3:22 pm
    Use ShingleFilter.

    I'm working on a broader SpellChecker; I'll post a third patch soon.
    https://admin.garambrogne.net/projets/revuedepresse/browser/trunk/src/java

    M.

  • Karl Wettin at Apr 8, 2008 at 4:35 pm

    dreampeppers99 wrote:
    > 1. Why must I pass a Directory object (it is mandatory) to the constructor of
    > SpellChecker?

    Mainly because it is a nasty piece of code. But it does a good job.

    > 2. Suppose my dictionary contains these entries:
    >
    > "The Lord of the Rings: The Two Towers"
    > "The Lord of the Rings: The Fellowship of the Ring"
    > "The Lord of the Rings: The Return of the King"
    >
    > I want to know how to code something so that, when the user queries
    > "The Lord of the Rings: The Two Towers", the application suggests:
    >
    > "The Lord of the Rings: The Fellowship of the Ring"
    > "The Lord of the Rings: The Return of the King"
    >
    > Is this possible using Lucene alone?

    There are no typos in your example, so you don't really even need a spell
    checker for that. Using OR clauses in your query would be enough.
    Perhaps you want to combine them with a variant using MUST clauses that has
    a bit more boost than the OR clauses.



    karl

  • Leandro at Apr 8, 2008 at 4:50 pm


    >> 1. Why must I pass a Directory object (it is mandatory) to the constructor of
    >> SpellChecker?
    > Mainly because it is a nasty piece of code. But it does a good job.

    Thanks.
    How can we suggest this (adding a normal constructor with no parameters) to
    the team?

    >> 2. Suppose my dictionary contains these entries:
    >> "The Lord of the Rings: The Two Towers"
    >> "The Lord of the Rings: The Fellowship of the Ring"
    >> "The Lord of the Rings: The Return of the King"
    >>
    >> I want to know how to code something so that, when the user queries
    >> "The Lord of the Rings: The Two Towers", the application suggests:
    >> "The Lord of the Rings: The Fellowship of the Ring"
    >> "The Lord of the Rings: The Return of the King"
    >>
    >> Is this possible using Lucene alone?
    > There are no typos in your example, so you don't really even need a spell
    > checker for that. Using OR clauses in your query would be enough.

    I guess not, because the user will enter "The Lord of the Rings: The Return
    of the King" and the system should respond with:

    Similar:
    The Lord of the Rings: The Two Towers
    The Lord of the Rings: The Fellowship of the Ring

    I can't see how to do that using just OR clauses.
    For example:

    name like '%the%'
    or
    name like '%Lord%'
    or
    name like '%of%'
    or
    name like '%the%'
    or
    name like '%Rings%'

    would produce far too many results, besides performing poorly...

    > Perhaps you want to combine them with a variant using MUST clauses that has
    > a bit more boost than the OR clauses.
    >
    > karl

    Thanks so much, Karl!!!
  • Karl Wettin at Apr 8, 2008 at 5:24 pm

    Leandro wrote:
    >>> 1. Why must I pass a Directory object (it is mandatory) to the constructor of
    >>> SpellChecker?
    >> Mainly because it is a nasty piece of code. But it does a good job.
    > How can we suggest this (adding a normal constructor with no parameters) to
    > the team?

    Sorry, I misunderstood your question. See the other reply.

    >> There are no typos in your example, so you don't really even need a spell
    >> checker for that. Using OR clauses in your query would be enough.
    > I guess not, because the user will enter "The Lord of the Rings: The Return
    > of the King" and the system should respond with:
    >
    > Similar:
    > The Lord of the Rings: The Two Towers
    > The Lord of the Rings: The Fellowship of the Ring
    >
    > I can't see how to do that using just OR clauses.
    > For example:
    >
    > name like '%the%'
    > or
    > name like '%Lord%'
    > or
    > name like '%of%'
    > or
    > name like '%the%'
    > or
    > name like '%Rings%'
    >
    > would produce far too many results, besides performing poorly...

    Are you sure about that? Did you benchmark? Can we see the results?




    karl

  • Leandro at Apr 8, 2008 at 5:29 pm

    > Sorry, I misunderstood your question. See the other reply.

    Yes, I got it. Thanks.

    > Are you sure about that? Did you benchmark? Can we see the results?

    Hey man, take it easy, I was just guessing. But I think using ShingleFilter
    will help.
  • Karl Wettin at Apr 8, 2008 at 5:33 pm

    Leandro wrote:
    >> Sorry, I misunderstood your question. See the other reply.
    > Yes, I got it. Thanks.
    >> Are you sure about that? Did you benchmark? Can we see the results?
    > Hey man, take it easy, I was just guessing. But I think using ShingleFilter
    > will help.

    I'm cool :) I just think you are overcomplicating things.


    karl


  • Leandro at Apr 8, 2008 at 5:47 pm



    > I'm cool :) I just think you are overcomplicating things.

    Yes... I can use pairs of words and OR.
    Suppose I query against these titles:

    The Lord of Rings: Return of King
    The Lord of Rings: Fellowship
    The Lord of Rings: The Two towers
    The Lord of Weapons
    The Lord of War

    Suppose a user searches for "The Lord of Rings Return of King":

    WHERE
    name like '%the lord%' or
    name like '%lord of%' or
    name like '%of rings%' or
    name like '%rings return%' or
    name like '%return of%' or
    name like '%of king%'

    That will return every row... so the question now is which 'ranking' is best...
    Anyway, you have all helped me so much, THANKS SO MUCH!!!
    (I won't speak badly of the SpellChecker constructor anymore.)
  • Mathieu Lecarme at Apr 8, 2008 at 6:04 pm

    >> I'm cool :) I just think you are overcomplicating things.
    > Yes... I can use pairs of words and OR.
    > Suppose I query against these titles:
    >
    > The Lord of Rings: Return of King
    > The Lord of Rings: Fellowship
    > The Lord of Rings: The Two towers
    > The Lord of Weapons
    > The Lord of War
    >
    > Suppose a user searches for "The Lord of Rings Return of King":
    > WHERE
    > name like '%the lord%' or
    > name like '%lord of%' or
    > name like '%of rings%' or
    > name like '%rings return%' or
    > name like '%return of%' or
    > name like '%of king%'

    Lucene syntax is prettier.
    With the movie title indexed as "title", with LowerCaseFilter:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;

    // Each two-word term only matches if the field was indexed as shingles.
    BooleanQuery bq = new BooleanQuery();
    bq.add(new TermQuery(new Term("title", "the lord")), Occur.SHOULD);
    bq.add(new TermQuery(new Term("title", "lord of")), Occur.SHOULD);
    bq.add(new TermQuery(new Term("title", "of rings")), Occur.SHOULD);
    bq.add(new TermQuery(new Term("title", "rings return")), Occur.SHOULD);
    bq.add(new TermQuery(new Term("title", "return of")), Occur.SHOULD);
    bq.add(new TermQuery(new Term("title", "of king")), Occur.SHOULD);

    > That will return every row... so the question now is which 'ranking' is
    > best...
    > Anyway, you have all helped me so much, THANKS SO MUCH!!!
    > (I won't speak badly of the SpellChecker constructor anymore.)

    The more words match, the better the score.
    You should use a threshold (number of common words / number of words) or
    something like that to exclude titles that are too far off.

    M.
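
To make the threshold idea concrete, here is a minimal plain-Java sketch. It does not use Lucene and is not how Lucene scores documents. The class and method names (OverlapThresholdSketch, words, overlap, similar), the 0.6 cutoff, and the choice to divide by the number of distinct words in the query are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: keeps only titles that share enough words with the query.
final class OverlapThresholdSketch {

    // Lower-case the text and collect its distinct words.
    static Set<String> words(String text) {
        String[] parts = text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        Set<String> set = new HashSet<>(Arrays.asList(parts));
        set.remove(""); // split() can yield an empty leading token
        return set;
    }

    // Fraction of the query's distinct words that also occur in the title.
    static double overlap(String query, String title) {
        Set<String> queryWords = words(query);
        if (queryWords.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(queryWords);
        common.retainAll(words(title));
        return (double) common.size() / queryWords.size();
    }

    // Titles whose overlap with the query reaches the threshold, in input order.
    static List<String> similar(String query, List<String> titles, double threshold) {
        List<String> kept = new ArrayList<>();
        for (String title : titles) {
            if (overlap(query, title) >= threshold) {
                kept.add(title);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "The Lord of Rings: Return of King",
                "The Lord of Rings: Fellowship",
                "The Lord of Rings: The Two towers",
                "The Lord of Weapons",
                "The Lord of War");
        // With a 0.6 cutoff the three "Rings" titles are kept (overlap 1.0, 4/6, 4/6)
        // and the other two are dropped (overlap 3/6).
        for (String title : similar("The Lord of Rings Return of King", titles, 0.6)) {
            System.out.println(title);
        }
    }
}
```

The cutoff is a tuning knob: a lower value lets "The Lord of War" back in, a higher one keeps only near-identical titles.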


  • Mathieu Lecarme at Apr 8, 2008 at 5:02 pm

    On 8 Apr 2008, at 18:34, Karl Wettin wrote:
    > dreampeppers99 wrote:
    >> 1. Why must I pass a Directory object (it is mandatory) to the constructor of
    >> SpellChecker?
    > Mainly because it is a nasty piece of code. But it does a good job.

    Because SpellChecker uses a Directory to store its data. It can be an
    FSDirectory, a RAMDirectory, ...

    >> 2. Suppose my dictionary contains these entries:
    >> "The Lord of the Rings: The Two Towers"
    >> "The Lord of the Rings: The Fellowship of the Ring"
    >> "The Lord of the Rings: The Return of the King"
    >> I want to know how to code something so that, when the user queries
    >> "The Lord of the Rings: The Two Towers", the application suggests:
    >> "The Lord of the Rings: The Fellowship of the Ring"
    >> "The Lord of the Rings: The Return of the King"
    >> Is this possible using Lucene alone?
    > There are no typos in your example, so you don't really even need a
    > spell checker for that. Using OR clauses in your query would be
    > enough. Perhaps you want to combine them with a variant using MUST
    > clauses that has a bit more boost than the OR clauses.

    A classic OR query will also match shuffled data: "The king of lord got
    a ring" would match.
    With shingles, you will match titles in the right word order.

    M.
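
A small plain-Java sketch may help show why shingles preserve word order. It only imitates what a shingle filter produces (adjacent word pairs) and does not call Lucene's ShingleFilter. The class and method names (ShingleSketch, shingles, sharedShingles) and the simple tokenization are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration of two-word shingles; not Lucene's ShingleFilter.
final class ShingleSketch {

    // Lower-case the text, split on non-alphanumerics, and pair adjacent words.
    static List<String> shingles(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                tokens.add(word);
            }
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            result.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return result;
    }

    // Number of distinct shingles the two texts have in common.
    static int sharedShingles(String a, String b) {
        Set<String> common = new LinkedHashSet<>(shingles(a));
        common.retainAll(shingles(b));
        return common.size();
    }

    public static void main(String[] args) {
        String query = "The Lord of the Rings: The Return of the King";
        // A sibling title shares five shingles with the query.
        System.out.println(sharedShingles(query, "The Lord of the Rings: The Two Towers"));
        // The shuffled sentence shares four single words but only one shingle ("the king").
        System.out.println(sharedShingles(query, "The king of lord got a ring"));
    }
}
```

Matching on word pairs instead of single words is what separates the related titles from text that merely reuses the same words in another order.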


  • Leandro at Apr 8, 2008 at 5:18 pm

    >> Mainly because it is a nasty piece of code. But it does a good job.
    > Because SpellChecker uses a Directory to store its data. It can be an
    > FSDirectory, a RAMDirectory, ...

    Perfect explanation!!!
    So using a RAMDirectory is better (performance-wise):

    spell = new SpellChecker(FSDirectory.getDirectory("."));
    spell = new SpellChecker(new RAMDirectory());

    The second is better (faster) for a small amount of data...
    Thanks so much, now I understand. This should be in the official
    documentation...

    > A classic OR query will also match shuffled data: "The king of lord got a
    > ring" would match.
    > With shingles, you will match titles in the right word order.

    ShingleFilter will split the title into pairs of words... so I can use those
    with OR.
    (Good one... I'll try this.)


    Thanks so much!!!
  • Karl Wettin at Apr 8, 2008 at 5:21 pm

    Mathieu Lecarme wrote:
    > On 8 Apr 2008, at 18:34, Karl Wettin wrote:
    >> dreampeppers99 wrote:
    >>> 2. Suppose my dictionary contains these entries:
    >>> "The Lord of the Rings: The Two Towers"
    >>> "The Lord of the Rings: The Fellowship of the Ring"
    >>> "The Lord of the Rings: The Return of the King"
    >>> I want to know how to code something so that, when the user queries
    >>> "The Lord of the Rings: The Two Towers", the application suggests:
    >>> "The Lord of the Rings: The Fellowship of the Ring"
    >>> "The Lord of the Rings: The Return of the King"
    >>> Is this possible using Lucene alone?
    >> There are no typos in your example, so you don't really even need a
    >> spell checker for that. Using OR clauses in your query would be
    >> enough. Perhaps you want to combine them with a variant using MUST
    >> clauses that has a bit more boost than the OR clauses.
    > A classic OR query will also match shuffled data: "The king of lord got a
    > ring" would match.
    > With shingles, you will match titles in the right word order.

    Appending a SHOULD clause containing a phrase or span query with a bit
    of boost also works.



    karl
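
The effect of an extra boosted phrase clause can be sketched in plain Java. This is a simplified stand-in, not Lucene's scoring and not its PhraseQuery: the score here is just the number of distinct query words found in the title, plus a fixed bonus when the query words appear in the title as one contiguous run. The names (PhraseBoostSketch, tokens, score) and the bonus of 2.0 are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified stand-in for "OR clauses plus a boosted phrase clause".
final class PhraseBoostSketch {

    // Lower-case the text and split it into words, keeping order and repeats.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                out.add(word);
            }
        }
        return out;
    }

    // One point per distinct query word present in the title (the OR part),
    // plus phraseBoost if the whole query occurs in order and contiguously.
    static double score(String query, String title, double phraseBoost) {
        List<String> queryTokens = tokens(query);
        List<String> titleTokens = tokens(title);
        Set<String> matched = new LinkedHashSet<>(queryTokens);
        matched.retainAll(titleTokens);
        double score = matched.size();
        if (!queryTokens.isEmpty()
                && Collections.indexOfSubList(titleTokens, queryTokens) >= 0) {
            score += phraseBoost;
        }
        return score;
    }

    public static void main(String[] args) {
        String query = "The Lord of the Rings";
        // All four distinct words match and the phrase is present: 4 + 2 = 6.0
        System.out.println(score(query, "The Lord of the Rings: The Two Towers", 2.0));
        // All four words match but in a different order: 4.0
        System.out.println(score(query, "Rings of the Lord", 2.0));
        // Three words match, no phrase: 3.0
        System.out.println(score(query, "The Lord of War", 2.0));
    }
}
```

Shuffled titles still match through the OR part, but the in-order title ranks first because of the bonus.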

