Hi Everyone,

I've been searching the archive without success to answer this one: is it
possible to specify one Similarity class per field, just like we can do with
an analyzer? I know I can change the similarity of the searcher, but that
forces me to break some complex queries into separate chunks and sum the
scores "by hand" rather than having the fast internal implementation do the
job. What I would really like is something like PerFieldAnalyzerWrapper,
but for similarity. Is this possible?

Jp

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


  • Erik Hatcher at May 3, 2005 at 11:42 pm

    I'm interested in what your use case is in desiring this. What
    factors would you vary per field? The only factor that seems to make
    sense is lengthNorm which is computed at indexing time and does allow
    per-field tweaking. A custom Similarity subclass could be used to
    affect the lengthNorm based on the field name parameter.

    Erik
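
A minimal sketch of the per-field lengthNorm idea Erik describes. It is written as standalone Java so it runs without Lucene on the classpath; the method mirrors the shape of lengthNorm(String fieldName, int numTerms), and the default value 1/sqrt(numTerms) follows DefaultSimilarity. The field names "title" and "body" and the choice to return 1.0 for titles are assumptions made for illustration only. In a real index the same logic would live in a Similarity subclass set on the IndexWriter.

```java
// Standalone sketch: does not depend on Lucene.
// Field names "title" and "body" are hypothetical.
final class PerFieldLengthNorm {

    /** Default length normalization: 1 / sqrt(numTerms). */
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Chooses a length norm based on the field name. */
    static float lengthNorm(String fieldName, int numTerms) {
        if ("title".equals(fieldName)) {
            // Assumed policy: long titles are not penalized.
            return 1.0f;
        }
        return defaultLengthNorm(numTerms);
    }

    public static void main(String[] args) {
        System.out.println("title, 16 terms: " + lengthNorm("title", 16));
        System.out.println("body, 16 terms:  " + lengthNorm("body", 16));
    }
}
```

Because lengthNorm is applied at indexing time, a change like this only takes effect for documents indexed after the change.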


  • Robichaud, Jean-Philippe at May 4, 2005 at 8:12 pm
    I have an application where I use Lucene to retrieve "made up" documents,
    just like many people do. In my case, I need the score to be meaningful,
    really meaningful. For certain fields, the similarity should be a
    frequency count with no idf factor; for others, the idf should be the real
    idf; for others again, the idf should be equal to sqrt(idf). Again, I can
    change the similarity of the reader at run time and issue specific queries,
    summing the scores myself, but that is pretty inefficient. A ScoreObject
    (http://mail-archives.apache.org/mod_mbox/lucene-java-user/200504.mbox/%3c42
    [email protected]%3e) would save me a little bit, but that's
    another topic.

    I understand that Lucene's objective is to be a generic search engine
    rather than a semantic/special-purpose IR system, but it is so close to
    being one that it is too tempting to use it as is.

    Jp
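
The three idf behaviours described above can be sketched as a lookup from field name to idf function. This is standalone Java, not Lucene API: the class, the interface, and the field names "countField", "idfField" and "sqrtIdfField" are all hypothetical, and the "real idf" is assumed to be the default formula log(numDocs / (docFreq + 1)) + 1.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch: all names here are hypothetical, none come from Lucene.
final class PerFieldIdf {

    /** An idf function of document frequency and collection size. */
    interface IdfFunction {
        double apply(int docFreq, int numDocs);
    }

    /** Assumed "real" idf: log(numDocs / (docFreq + 1)) + 1. */
    static double defaultIdf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    static final IdfFunction DEFAULT = PerFieldIdf::defaultIdf;

    static final Map<String, IdfFunction> BY_FIELD = new HashMap<>();
    static {
        // Pure frequency count: idf contributes nothing.
        BY_FIELD.put("countField", (docFreq, numDocs) -> 1.0);
        // Real idf.
        BY_FIELD.put("idfField", DEFAULT);
        // Dampened idf: sqrt(idf).
        BY_FIELD.put("sqrtIdfField",
                (docFreq, numDocs) -> Math.sqrt(defaultIdf(docFreq, numDocs)));
    }

    /** Looks up the idf function for a field, falling back to the default. */
    static double idf(String field, int docFreq, int numDocs) {
        return BY_FIELD.getOrDefault(field, DEFAULT).apply(docFreq, numDocs);
    }

    public static void main(String[] args) {
        for (String field : new String[] {"countField", "idfField", "sqrtIdfField"}) {
            System.out.println(field + ": " + idf(field, 9, 1000));
        }
    }
}
```

A table like this only describes the desired numbers; wiring them into scoring still requires attaching a Similarity to the relevant part of the query.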


  • Doug Cutting at May 4, 2005 at 8:46 pm

    Robichaud, Jean-Philippe wrote:
    > Again, I can change the similarity of the reader at run-time and issue
    > specific queries, summing the score myself, but that is pretty inefficient.

    You can also specify a Similarity implementation per Query node in a
    complex query, e.g.:

    // Classes are from org.apache.lucene.search.
    BooleanQuery query = new BooleanQuery() {
      // Use a custom Similarity for this query node only.
      public Similarity getSimilarity(Searcher searcher) {
        return new DefaultSimilarity() {
          // ... override Similarity methods here ...
        };
      }
    };

    Doug

  • Robichaud, Jean-Philippe at May 4, 2005 at 8:52 pm
    How cool, I did not know that... that may help me... If I understand you
    correctly, I can create a boolean query where each "clause" uses a
    different similarity?

    Thanks,

    Jp


  • Doug Cutting at May 4, 2005 at 9:11 pm

    Robichaud, Jean-Philippe wrote:
    > How cool, I did not know that... that may help me... If I understand you
    > correctly, I can create a boolean query where each "clause" uses a
    > different similarity?

    Yes. That would look something like:

    BooleanQuery booleanQuery = new BooleanQuery();
    // A Term is (field name, term text).
    TermQuery clause1 = new TermQuery(new Term("foo", "bar")) {
      public Similarity getSimilarity(Searcher searcher) {
        return new DefaultSimilarity() {
          // Ignore idf for this clause only.
          public float idf(int docFreq, int numDocs) { return 1.0f; }
        };
      }
    };
    booleanQuery.add(clause1, true, false);  // required, not prohibited
    ...

    Doug


  • Robichaud, Jean-Philippe at May 5, 2005 at 5:09 pm
    Thanks for the clarification...

    While studying the Similarity documentation in more depth, I discovered
    something that is troubling me a little. The idf is calculated using the
    following formula:

    log(numDocInIndex / (numDocWithTerm_t + 1)) + 1

    While I agree this is fine for most applications, it does not quite fit mine.
    numDocWithTerm_t is really numDocWith_t.text_in_field_t.field. That's fine
    with me; the problem is the other term, numDocInIndex. I would like to use
    numDocInIndex_having_t.field instead. The reason is, again, that I want the
    similarity score to be really meaningful. I have 'classes' of documents in
    the same index:
    Document1: MeaningA="something here",ContentA="searchable text 1"
    Document2: MeaningB="something else",ContentB="searchable text 2"
    ...

    I have an unequal number of "A" and "B" documents. The same query text will
    be sent against ContentA and ContentB at the same time. Since there are more
    documents in class B than in class A, the idf should use a different
    numDocInIndex value for each class. Is there a good way to achieve that?

    Thanks for all your help,

    Jp
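
To make the effect concrete, here is a small worked sketch of the formula above with an index-wide document count versus a per-class count. It is standalone arithmetic, not a Lucene integration: the counts (100 class A documents, 900 class B documents, a term found in 9 documents of each class) are invented for illustration, and it assumes the application tracks the per-class counts itself, for example at indexing time.

```java
// Standalone sketch: the document counts below are made-up illustration values.
final class PerClassIdf {

    /** idf = log(numDocs / (docFreq + 1)) + 1, as in the formula quoted above. */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int docsWithContentA = 100;   // hypothetical number of class A documents
        int docsWithContentB = 900;   // hypothetical number of class B documents
        int numDocsInIndex = docsWithContentA + docsWithContentB;
        int docFreq = 9;              // term appears in 9 documents of each class

        // Index-wide count: both fields get the same idf.
        System.out.println("index-wide idf: " + idf(docFreq, numDocsInIndex));

        // Per-class count: the term is relatively more common in class A,
        // so its idf there is lower than in class B.
        System.out.println("ContentA idf:   " + idf(docFreq, docsWithContentA));
        System.out.println("ContentB idf:   " + idf(docFreq, docsWithContentB));
    }
}
```

With the same document frequency, the smaller class yields the smaller idf, which is the asymmetry the index-wide numDocInIndex hides.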



Discussion Overview
group: java-user
categories: lucene
posted: May 3, '05 at 9:58p
active: May 5, '05 at 5:09p
posts: 7
users: 3
website: lucene.apache.org
