Hi,
I'm using the standard tokenizer for both indexing and searching.
My indexed value is like "emp-id Aq234 kaith creating document
for search".
I can get search results for the query CONTENT:"emp-id" by using hits =
indexSearcher.search(*query*).
But if I try to get the term frequency of that term (CONTENT:"emp-id") by
using indexreader.termdocs(new Term("CONTENT","emp-id")).freq(), 0 results
are returned.
I think I get a result in the first case because of the analyzer, and no
result in the second case (term freq) because no analyzer is applied.
Is that right?
How do I get the correct term frequency for that term?


Thanks & Regards
RSK

  • Liu_andy2 at Jun 20, 2007 at 10:04 am
    You are right!
    "emp-id" will be split into two terms, CONTENT:"emp" and CONTENT:"id",
    by the standard tokenizer at both indexing and search time. But a term
    written directly (CONTENT:"emp-id") is not analyzed, so it is not split
    and matches nothing in the index.
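
The mismatch described above can be reproduced with a small toy model. This is only a sketch in plain Java and does not call Lucene: the `analyze` method is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters, which roughly mimics how "emp-id" is broken up here, and `index` and `termFreq` are illustrative names, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: this is NOT Lucene's StandardTokenizer or IndexReader.
class TermLookupSketch {

    // Hypothetical stand-in for an analyzer: lowercase the text, then
    // split on every run of characters that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Build a term -> frequency map for one document from analyzed tokens.
    static Map<String, Integer> index(String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : analyze(text)) {
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    // Exact lookup of a raw term; the argument is NOT analyzed.
    static int termFreq(Map<String, Integer> freqs, String rawTerm) {
        return freqs.getOrDefault(rawTerm, 0);
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs =
            index("emp-id Aq234 kaith creating document for search");
        System.out.println(termFreq(freqs, "emp-id")); // 0: never indexed as one term
        System.out.println(termFreq(freqs, "emp"));    // 1
        System.out.println(termFreq(freqs, "id"));     // 1
    }
}
```

In this model the raw lookup for "emp-id" finds nothing because only "emp" and "id" were ever stored, which is the same shape of problem as the zero term frequency in the question.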

    Andy

    -----Original Message-----
    From: SK R
    Sent: Wednesday, June 20, 2007 5:24 PM
    To: java-user@lucene.apache.org
    Subject: zero termfreq for some search strings with special characters

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • SK R at Jun 20, 2007 at 11:53 am
    Hi,
    Thanks for your reply.
    But how do I get the termfreq of that term ("emp-id")? Does Lucene have
    any other way to handle this?
    I would appreciate any solution to this problem.

    Regards
    SenthilKumaran

  • Erick Erickson at Jun 20, 2007 at 1:24 pm
    You don't. You don't have an actual term "emp-id" in your index. You
    have "emp" and "id". So "emp-id" isn't a term.

    If you really want to control this sort of thing, and none of the
    stock analyzers work exactly as you require, you need to write
    your own Analyzer that breaks the stream however you want, and
    use *that* analyzer at index and search time. Then looking at
    termfreq will work as you expect.
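
As a rough illustration of the idea above, here is a toy model in plain Java (it does not use Lucene). It assumes a hypothetical custom analysis that lowercases and splits on whitespace only, so "emp-id" stays a single token, and it applies that same analysis to the lookup string. All class and method names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only; none of these names are Lucene classes.
class CustomAnalyzerSketch {

    // Hypothetical custom analysis: lowercase and split on whitespace only,
    // so a hyphenated value such as "emp-id" survives as one token.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Index side: count tokens produced by the custom analysis.
    static Map<String, Integer> index(String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : analyze(text)) {
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    // Lookup side: run the SAME analysis on the input before the lookup.
    static int termFreq(Map<String, Integer> freqs, String queryText) {
        List<String> tokens = analyze(queryText);
        if (tokens.size() != 1) {
            throw new IllegalArgumentException("expected a single term: " + queryText);
        }
        return freqs.getOrDefault(tokens.get(0), 0);
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs =
            index("emp-id Aq234 kaith creating document for search");
        System.out.println(termFreq(freqs, "emp-id")); // 1
        System.out.println(termFreq(freqs, "emp"));    // 0
    }
}
```

The point of the sketch is the symmetry: because indexing and lookup share one analysis step, the stored term and the looked-up term agree, and the frequency is no longer zero.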

    PerFieldAnalyzerWrapper will allow you to treat different fields
    differently, which may help if you want one sort of behavior for
    one field in your documents and different behavior for others.
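
The per-field idea can be sketched the same way. The toy model below is plain Java and is not the PerFieldAnalyzerWrapper API: it only shows the concept of a default analysis plus per-field overrides. The field name "TITLE" and every identifier in the code are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy model of per-field analysis; this does not use Lucene.
class PerFieldSketch {

    // Lowercase the text and split it on the given regular expression.
    static List<String> splitOn(String text, String regex) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split(regex)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Default analysis: split on any non-alphanumeric character.
    static final Function<String, List<String>> DEFAULT =
        text -> splitOn(text, "[^a-z0-9]+");

    // Alternative analysis: split on whitespace only, keeping hyphens.
    static final Function<String, List<String>> WHITESPACE =
        text -> splitOn(text, "\\s+");

    // Per-field overrides; fields not listed fall back to DEFAULT.
    static final Map<String, Function<String, List<String>>> PER_FIELD =
        new HashMap<>();
    static {
        PER_FIELD.put("CONTENT", WHITESPACE);
    }

    static List<String> analyze(String field, String text) {
        return PER_FIELD.getOrDefault(field, DEFAULT).apply(text);
    }

    public static void main(String[] args) {
        System.out.println(analyze("CONTENT", "emp-id Aq234")); // [emp-id, aq234]
        System.out.println(analyze("TITLE", "emp-id Aq234"));   // [emp, id, aq234]
    }
}
```

Here only the CONTENT field keeps "emp-id" whole, while any other field still gets the default splitting.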

    Best
    Erick

Discussion Overview
group: java-user
category: lucene
posted: Jun 20, '07 at 9:24a
active: Jun 20, '07 at 1:24p
posts: 4
users: 3
website: lucene.apache.org

3 users in discussion

SK R: 2 posts, Liu_andy2: 1 post, Erick Erickson: 1 post
