Hello,

I ran into some very strange behavior in Lucene 1.9: boost factors under 1.3
do not affect the result score! I wrote a simple test to isolate the
issue:

Writing test index
Creating 4 documents with the same KEY and boosts of default, 1.1, 1.2, and 1.3

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public static void writeTestIndex() throws IOException {

    // opening index writer (true = create a new index, overwriting any existing one)
    IndexWriter writer = new IndexWriter("C:\\a_temp", new StandardAnalyzer(), true);

    Document currentDocument = null;

    // creating and adding document with DEFAULT boost (1.0)
    currentDocument = new Document();
    currentDocument.add(new Field("KEY", "AA", Field.Store.YES, Field.Index.UN_TOKENIZED));
    currentDocument.add(new Field("BOOST_FACTOR", "1", Field.Store.YES, Field.Index.UN_TOKENIZED));
    writer.addDocument(currentDocument);

    // creating and adding document with 1.1 boost
    currentDocument = new Document();
    currentDocument.add(new Field("KEY", "AA", Field.Store.YES, Field.Index.UN_TOKENIZED));
    currentDocument.add(new Field("BOOST_FACTOR", "1.1", Field.Store.YES, Field.Index.UN_TOKENIZED));
    currentDocument.setBoost(1.1f);
    writer.addDocument(currentDocument);

    // creating and adding document with 1.2 boost
    currentDocument = new Document();
    currentDocument.add(new Field("KEY", "AA", Field.Store.YES, Field.Index.UN_TOKENIZED));
    currentDocument.add(new Field("BOOST_FACTOR", "1.2", Field.Store.YES, Field.Index.UN_TOKENIZED));
    currentDocument.setBoost(1.2f);
    writer.addDocument(currentDocument);

    // creating and adding document with 1.3 boost
    currentDocument = new Document();
    currentDocument.add(new Field("KEY", "AA", Field.Store.YES, Field.Index.UN_TOKENIZED));
    currentDocument.add(new Field("BOOST_FACTOR", "1.3", Field.Store.YES, Field.Index.UN_TOKENIZED));
    currentDocument.setBoost(1.3f);
    writer.addDocument(currentDocument);

    // optimizing and closing IndexWriter
    writer.optimize();
    writer.close();
}


Test Search
Searching for the KEY value, which is the same in all 4 documents

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public static void testIndex() throws IOException {

    // opening IndexSearcher
    IndexSearcher searcher = new IndexSearcher("C:\\a_temp");

    // searching for KEY
    Hits hits = searcher.search(new TermQuery(new Term("KEY", "AA")));

    // listing documents and their BOOST_FACTOR field
    // (logger is assumed to be a logger field defined elsewhere in the class)
    if (null != hits) {
        logger.debug("Listing results: ");
        for (int i = 0; i < hits.length(); i++) {
            Document doc = hits.doc(i);
            logger.debug("BOOST_FACTOR field: " + doc.get("BOOST_FACTOR") + " Score: " + hits.score(i));
        }
    }

    // closing IndexSearcher
    searcher.close();
}

Output

BOOST_FACTOR field: 1.3 Score: 0.9710705
BOOST_FACTOR field: 1 Score: 0.7768564
BOOST_FACTOR field: 1.1 Score: 0.7768564
BOOST_FACTOR field: 1.2 Score: 0.7768564

Boosts of 1.1 and 1.2 did not affect the score of the last 2 documents!
The document with a boost of 1.3 jumped to the top, but the rest were returned
in the order they were added to the index.

What am I missing here? I thought the document score would reflect every level
of boost, not just 1.3 and above. Please help.
--
View this message in context: http://www.nabble.com/Boosting-Documents-and-score-calculation-tf2159899.html#a5968287
Sent from the Lucene - Java Users forum at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


  • Chris Hostetter at Aug 24, 2006 at 11:04 pm
    First off, when trying to make sense of scores you should always use
    either a HitCollector or one of the TopDocs methods of the Searcher
    interface -- otherwise the "normalize if greater than 1" logic of the Hits
    class might confuse you.
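A minimal sketch of the "normalize if greater than 1" step, assuming Hits divides every score by the top raw score only when that top score exceeds 1.0. The class and method names are hypothetical and this is not Lucene's source; it only illustrates why raw scores at or below 1.0 pass through unchanged while others are silently rescaled.

```java
import java.util.Arrays;

/**
 * Hypothetical helper illustrating the "normalize if greater than 1" step
 * attributed to the Hits class. Not Lucene source code.
 */
final class HitsNormalizationSketch {

    /** Rescales best-first raw scores only when the top raw score exceeds 1.0. */
    static float[] normalizeLikeHits(float[] rawScores) {
        float[] scores = Arrays.copyOf(rawScores, rawScores.length);
        if (scores.length > 0 && scores[0] > 1.0f) {
            float scoreNorm = 1.0f / scores[0];
            for (int i = 0; i < scores.length; i++) {
                scores[i] *= scoreNorm;
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // Raw scores at or below 1.0 are returned unchanged.
        System.out.println(Arrays.toString(normalizeLikeHits(new float[] {0.9710705f, 0.7768564f})));
        // Raw scores above 1.0 are divided by the top score.
        System.out.println(Arrays.toString(normalizeLikeHits(new float[] {2.0f, 1.0f}))); // [1.0, 0.5]
    }
}
```

Only the absolute values change under this rescaling; the ranking stays the same.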

    Second: Searcher.explain(Query,int) is your friend ... it will help you
    understand exactly where your scores are coming from.
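A hand calculation of roughly what explain() should report for this query, assuming Lucene's DefaultSimilarity formulas: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), queryNorm = 1 / sqrt(sumOfSquaredWeights), and a query boost of 1. For a single TermQuery the queryNorm cancels one factor of idf, so the score reduces to idf × norm. With 4 documents that all contain KEY:AA once, idf = 1 + ln(4/5) ≈ 0.7768564, which is the tied score in the output, and 0.9710705 is 1.25 times that value (assuming the 1.3 boost was stored as 1.25). The class and method names are hypothetical; explain() on the real index remains the authoritative breakdown.

```java
/**
 * Hypothetical hand calculation of a single-TermQuery score under the
 * DefaultSimilarity formulas assumed in the text. Not Lucene source code.
 */
final class TermScoreSketch {

    /** idf = 1 + ln(numDocs / (docFreq + 1)) */
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    /** tf = sqrt(freq) */
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    /**
     * score = queryNorm * tf * idf * idf * norm, with a query boost of 1.
     * The only weight in the query is idf, so queryNorm = 1 / sqrt(idf^2).
     */
    static float termQueryScore(int freq, int docFreq, int numDocs, float decodedNorm) {
        float idf = idf(docFreq, numDocs);
        float queryNorm = (float) (1.0 / Math.sqrt(idf * idf));
        return queryNorm * tf(freq) * idf * idf * decodedNorm;
    }

    public static void main(String[] args) {
        // 4 documents, each containing KEY:AA exactly once.
        System.out.println(termQueryScore(1, 4, 4, 1.0f));  // norm decoded as 1.0
        System.out.println(termQueryScore(1, 4, 4, 1.25f)); // norm decoded as 1.25
    }
}
```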

    Third: index time document boosts are folded into the "norm" value for
    that field (along with any index time field boosts and the length norm)
    ... these norms are "encoded" as a single byte, which can result in a loss
    of precision, so it wouldn't be too surprising if boosts of 1.0, 1.1,
    and 1.2 all encoded as the same value. (You can use
    Similarity.decodeNorm(Similarity.encodeNorm(some_float)) to see exactly
    how much precision is lost for any given float value.)
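To see the precision loss without building an index, here is a sketch of a one-byte float encoding modeled on the bit arithmetic Lucene uses for norms as I understand it: shift the float's bits right by 21, keeping the exponent and only the top two stored mantissa bits, then subtract a fixed exponent offset. Between 1.0 and 2.0 it can only represent 1.0, 1.25, 1.5 and 1.75, and the dropped bits are truncated, not rounded. It is a reimplementation for illustration with hypothetical names, not a substitute for calling Similarity.encodeNorm/decodeNorm. Because KEY is an untokenized single-term field, its length norm under DefaultSimilarity is 1, so the stored norm is just the document boost.

```java
/**
 * Hypothetical one-byte float encoding modeled on Lucene's norm encoding.
 * Keeps the exponent and the top two stored mantissa bits; lower bits are dropped.
 */
final class NormPrecisionSketch {

    // Exponent offset applied to the shifted bits, so the byte covers a fixed range around 1.0.
    private static final int OFFSET = (63 - 15) << 3;

    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        // Drop the low 21 bits: keeps sign, the 8-bit exponent and 2 stored mantissa bits.
        int smallFloat = bits >> 21;
        if (smallFloat <= OFFSET) {
            // Non-positive values map to 0; tiny positive values map to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallFloat >= OFFSET + 0x100) {
            return (byte) -1; // too large: clamp to the largest code (0xFF)
        }
        return (byte) (smallFloat - OFFSET);
    }

    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        // Shift the byte back into place and restore the offset; dropped bits come back as zeros.
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        for (float boost : new float[] {1.0f, 1.1f, 1.2f, 1.3f}) {
            // For a one-term untokenized field the length norm is 1, so norm == boost.
            System.out.println(boost + " -> " + decode(encode(boost)));
        }
    }
}
```

Running main prints `1.0 -> 1.0`, `1.1 -> 1.0`, `1.2 -> 1.0` and `1.3 -> 1.25`: the 1.1 and 1.2 boosts land in the same byte as the default boost, and 1.3 is truncated to 1.25, which matches the ratio 0.9710705 / 0.7768564 = 1.25 in the output.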



    : Date: Thu, 24 Aug 2006 10:06:35 -0700 (PDT)
    : From: AlexeyG <[email protected]>
    : Reply-To: [email protected]
    : To: [email protected]
    : Subject: Boosting Documents and score calculation



    -Hoss



Discussion Overview
group: java-user @
categories: lucene
posted: Aug 24, '06 at 5:07p
active: Aug 24, '06 at 11:04p
posts: 2
users: 2
website: lucene.apache.org

2 users in discussion

AlexeyG: 1 post
Chris Hostetter: 1 post
