Hi friends

I have just started using Lucene, and this is how I want to use it:

I have documents that each contain a user's name as one field.
I have a sentence that may contain the name of some user.
I search the index for the sentence using the searcher.
If the sentence contains a user's name, Lucene lists that user's document
at the top.

Now I want to determine the position in the sentence where the name was
found.

I am using fuzzy query matching by appending the character '~' to the
sentence I am searching for.
This means I cannot use the find function of the String class as is to
get the position of the match.

Thanks in advance

--
Rohit Banga


  • Uwe Schindler at Feb 6, 2010 at 2:43 pm
    There are two contrib packages for highlighting in the Lucene distribution: highlighter and fast-vector-highlighter.

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de

    -----Original Message-----
    From: Rohit Banga
    Sent: Saturday, February 06, 2010 2:27 PM
    To: java-user@lucene.apache.org
    Subject: hit highlighting in lucene

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Rohit Banga at Feb 7, 2010 at 10:33 am
    But what about the case where I am using fuzzy query matching? Then the
    highlighter package does not work.
  • Uwe Schindler at Feb 7, 2010 at 10:41 am
    It works.

  • Simon Willnauer at Feb 7, 2010 at 10:53 am
    Rohit,
    what kind of problems are you facing when using fuzzy query and highlighting?
    Could you give us more details and maybe a small code snippet which
    isolates your problem?

    simon
  • Rohit Banga at Feb 7, 2010 at 10:59 am
    // list of cities that have been indexed
    // each city name is a document
    public static final String[] names = {
        "New Delhi", "Bangalore", "Hyderabad", "Mumbai", "Chennai", "Kolkata",
        "Ahmedabad", "Kanpur", "Guwahati", "Roorkee", "Dehradun", "Lucknow",
        "Bhopal", "Jaipur", "Jodhpur", "Thiruvanthapuram", "Jammu", "Srinagar",
        "Raipur", "Pathankot", "Meerut", "Muzaffarnagar", "Agra", "Jhansi",
        "Gandhinagar", "Nasik", "Nagpur", "Calicut", "Trichi", "Bharatpur",
        "Nainital"
    };

    // I am using the standard analyzer
    void highLightWords(String qStr) throws Exception {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        TokenStream stream = analyzer.tokenStream("name", new StringReader(qStr));

        // exact-term query: only tokens equal to "mumbai" are highlighted
        TermQuery tq = new TermQuery(new Term("name", "mumbai"));
        QueryScorer scorer = new QueryScorer(tq);
        Highlighter highlighter = new Highlighter(scorer);

        String fragment = highlighter.getBestFragment(stream, qStr);
        System.out.println("\nfragment found: " + fragment);
    }

    // invoking the above function
    luceneTest.highLightWords("some unimportant text here Mumbai some unimportant text there~");

    fragment found: some unimportant text here <B>Mumbai</B> some unimportant text there~

    But when I change Mumbai to Mumbhai, Lucene still returns hits for the
    correct document while searching, yet the fragment is not found by the
    above function:

    luceneTest.highLightWords("some unimportant text here Mumbhai some unimportant text there~");

    The fragment is null.
  • Simon Willnauer at Feb 7, 2010 at 11:15 am
    Try
    Query tq = new FuzzyQuery(new Term("name","mumbai"));
    instead of
    TermQuery tq = new TermQuery(new Term("name","mumbai"));

    simon
  • Rohit Banga at Feb 7, 2010 at 12:13 pm
    It works!!! :)

    Could you also offer a suggestion for the following?

    Please have a look at the code above. It contains a list of cities that have
    been added to the index.

    // this is the code for indexing
    void indexCities() throws Exception {
        IndexWriter writer = new IndexWriter(FSDirectory.open(index_directory),
                new StandardAnalyzer(Version.LUCENE_CURRENT), true,
                IndexWriter.MaxFieldLength.LIMITED);

        // one document per city name, stored and analyzed in the "name" field
        for (int i = 0; i < names.length; ++i) {
            Document doc = new Document();
            doc.add(new Field("name", names[i], Field.Store.YES, Field.Index.ANALYZED));
            writer.addDocument(doc);
        }

        writer.optimize();
        writer.close();
    }

    If I try
    Query tq = new FuzzyQuery(new Term("name", "new delhi"));

    I get null, because "new" and "delhi" are considered separately.

    How should I change the analyzer so that "new delhi" is treated as a single term?
    Basically, I am using Lucene to find the names of all cities in the string.
    Because there may be spelling mistakes, fuzzy matching helps: I get the
    document with the closest matching city name as the top hit.
    But since I also want to identify where in the query the match occurred, I
    am using a hit highlighter. Do I need to modify my analyzer to group "new" and
    "delhi" into a phrase?

    Sorry for the noob question :(
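
The thread ends without a reply to this last question. To make the underlying idea concrete, here is a minimal sketch that uses plain Java only and no Lucene classes: it slides a window of consecutive tokens over the sentence and compares each window with the city name by Levenshtein edit distance, returning the character offsets of the first close match. Everything in it (FuzzyNameLocator, Span, locate, the maxEdits parameter) is a hypothetical name chosen for illustration. It is not how Lucene's FuzzyQuery or Highlighter work internally, and the simple letter/digit tokenizer is only an assumed stand-in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not part of Lucene): finds where a name fuzzily occurs in a sentence. */
final class FuzzyNameLocator {

    /** Character offsets [start, end) into the original sentence. */
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /** Classic Levenshtein distance computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    /** Splits on anything that is not a letter or digit, keeping each token's offsets. */
    static List<Span> tokenize(String text) {
        List<Span> tokens = new ArrayList<>();
        int start = -1;
        for (int i = 0; i <= text.length(); i++) {
            boolean inToken = i < text.length() && Character.isLetterOrDigit(text.charAt(i));
            if (inToken && start < 0) start = i;
            if (!inToken && start >= 0) { tokens.add(new Span(start, i)); start = -1; }
        }
        return tokens;
    }

    /**
     * Returns the first window of consecutive tokens whose text is within
     * maxEdits of the name (case-insensitive), or null if there is none.
     * The window spans as many tokens as the name has words.
     */
    static Span locate(String sentence, String name, int maxEdits) {
        String target = name.toLowerCase(Locale.ROOT);
        int words = tokenize(name).size();
        if (words == 0) return null;
        List<Span> tokens = tokenize(sentence);
        for (int i = 0; i + words <= tokens.size(); i++) {
            int start = tokens.get(i).start;
            int end = tokens.get(i + words - 1).end;
            String candidate = sentence.substring(start, end).toLowerCase(Locale.ROOT);
            if (editDistance(candidate, target) <= maxEdits) return new Span(start, end);
        }
        return null;
    }

    public static void main(String[] args) {
        String s1 = "some unimportant text here Mumbhai some unimportant text there";
        Span one = locate(s1, "Mumbai", 1);
        System.out.println(s1.substring(one.start, one.end)); // prints "Mumbhai"

        String s2 = "I flew from New Dehli to Agra";
        Span two = locate(s2, "New Delhi", 2);
        System.out.println(s2.substring(two.start, two.end)); // prints "New Dehli"
    }
}
```

With maxEdits set to 0 the same call behaves like an exact match and returns null for "Mumbhai", which mirrors the TermQuery behaviour reported earlier in the thread. Comparing whole token windows is what lets a two-word name such as "New Delhi" be treated as one unit here; whether the equivalent is best done in Lucene through the analyzer or through the query is left open, as it is in the thread.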



Discussion Overview
group: java-user
categories: lucene
posted: Feb 6, '10 at 1:27p
active: Feb 7, '10 at 12:13p
posts: 8
users: 3
website: lucene.apache.org
