Hi there



Long time user (and reader), first time poster :)



Just wondering if anyone knows of a way to bubble up (ie, increase the
score on) items which are newer - either via putting a date field in the
document, or some kind of timestamp / tick count.



I have found some references to doing it in the Java version, but I
can't find many of the classes (FieldScoreQuery, CustomScoreQuery)
in the .NET version - I assume they are new in the Java version. I
looked into replacing the Query, Weight, Scorer set, modelling it off
the Lucene.Net.Search.Spans stuff... but nothing so far.



Am I even looking in the right place? Is it even possible?



I'm after something with a long, flat tail (so I guess I'm going to have
to write something custom regardless) - eg stuff which is 1 day old gets
a boost of 10, 1 week old is 5, 1 month is 2, over three months is 0 etc
(with a floating scale on those - think "the long tail" diagram
(http://en.wikipedia.org/wiki/The_Long_Tail), but have it hit 0 when it
flattens out.)



Anyone done this?



Thanks heaps.



Nic Wise

Lead .NET Developer, TopGear.com redevelopment.

BBC Worldwide.
This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately.

Please note that the BBC monitors e-mails sent or received. Further communication will signify your consent to this

This e-mail has been sent by one of the following wholly-owned subsidiaries of the BBC:

BBC Worldwide, Registration Number: 1420028 England, Registered Address: Woodlands, 80 Wood Lane, London W12 0TT
BBC World, Registration Number: 04514407 England, Registered Address: Woodlands, 80 Wood Lane, London W12 0TT
BBC World Distribution Limited, Registration Number: 04514408, Registered Address: Woodlands, 80 Wood Lane, London W12 0TT


  • DIGY at Jan 15, 2008 at 8:23 pm
    Hi Nic,

    CustomScoreQuery and FieldScoreQuery were introduced in Lucene 2.2.
    If you don't want to wait for Lucene.Net 2.2, a quick solution may be to
    write a custom sort function and re-sort the returned results, for example
    using the "timestamp" field and the Score() method of the Hits class.

    DIGY

  • Nic Wise at Jan 16, 2008 at 11:43 am
    Thanks! Do you have a URL or some sample code for how to write a custom
    sort function? I am wanting it to influence the results (as a boost
    does), not really sort by this one field.

    Eg, if I start out with a relevance of 0.1 and 0.9, if the 0.1 one was
    put in today, it may end up being 0.7, and if the 0.9 one was 3 months
    old, it may be down-graded to 0.72 - so it's not just a pure sort....

    I tried a quick google (and will continue once I have this new laptop
    built up), but couldn't find much. Is there a snippet somewhere?

    Thanks heaps!

  • DIGY at Jan 16, 2008 at 5:51 pm
    Hi Nic,

    What I meant was something like the code below.

    DIGY


    using System.Collections.Generic;
    using Lucene.Net.Documents;
    using Lucene.Net.Search;

    public List<Document> Sort(Hits hits)
    {
        List<Document> retList = new List<Document>();
        List<float> scores = new List<float>();

        // Copy the documents and their raw scores out of the Hits object.
        for (int i = 0; i < hits.Length(); i++)
        {
            scores.Add(hits.Score(i));
            retList.Add(hits.Doc(i));
        }

        // BUBBLE SORT, O(n^2), ordering by adjusted score, highest first.
        // This is one of the worst sorting algorithms you can ever
        // find. Replace it with an intelligent one.
        bool anotherPassNeeded = true;
        while (anotherPassNeeded)
        {
            anotherPassNeeded = false;
            for (int i = 0; i < retList.Count - 1; i++)
            {
                if (CalcScore(retList[i].GetField("field2").StringValue(), scores[i]) <
                    CalcScore(retList[i + 1].GetField("field2").StringValue(), scores[i + 1]))
                {
                    anotherPassNeeded = true;

                    // Swap the scores and the documents together so they stay aligned.
                    float fTemp = scores[i];
                    scores[i] = scores[i + 1];
                    scores[i + 1] = fTemp;

                    Document doc = retList[i];
                    retList[i] = retList[i + 1];
                    retList[i + 1] = doc;
                }
            }
        }
        return retList;
    }


    // Your custom score function.
    // Placeholder: scales the score by the character code of the
    // field's last character.
    float CalcScore(string FieldValue, float OriginalScore)
    {
        char lastChar = FieldValue[FieldValue.Length - 1];
        return OriginalScore * lastChar;
    }

  • Nic Wise at Jan 23, 2008 at 10:20 am
    Hi everyone

    I did pretty much what you recommended, DIGY. Thanks - I didn't think of
    loading it all into a collection and sorting it, too used to that being
    expensive to do (which it is from a DB, but not from Lucene it appears!)

    We used a linear gradient for the boost. I'd prefer to use a
    curve, but this works, and it's easy.

    I've appended my code below, in case it's of use to someone else. Some
    notes on it:

    We deal with an IMetadataDocument, which is really just a wrapper around
    the Lucene document's fields, with the score added.

    There are a few things (eg CurrentTicks) which are inited twice. This is
    because we call them from unit tests.

    CalcScore returns the score multiplied by a factor between 2 (newest)
    and 0.5 (six months old or more).

    Hope it's of help to someone!


    Thanks!

    Nic


    /// <summary>
    /// Perform a search against the index.
    /// </summary>
    /// <param name="searchTerms">The query string to parse and run.</param>
    /// <returns>The matching documents, ordered by recency-adjusted score.</returns>
    private IList<IMetadataDocument> PerformSearch(string searchTerms)
    {
        IndexReader reader = GetIndexReader();

        Query query = queryParser.Parse(searchTerms);

        Hits hitsFound = searcher.Search(query);

        IList<IMetadataDocument> sortedDocuments =
            BubbleUpMoreCurrentDocuments(hitsFound);

        return sortedDocuments;
    }


    private long CurrentTicks = DateTime.Now.Ticks;
    private long TicksSixMonthsAgo = DateTime.Now.AddMonths(-6).Ticks;

    private IList<IMetadataDocument> BubbleUpMoreCurrentDocuments(Hits hitsFound)
    {
        CurrentTicks = DateTime.Now.Ticks;
        TicksSixMonthsAgo = DateTime.Now.AddMonths(-6).Ticks;

        List<IMetadataDocument> docs = new List<IMetadataDocument>();

        for (int i = 0; i < hitsFound.Length(); i++)
        {
            docs.Add(InterfaceFromLuceneDocument(hitsFound.Doc(i),
                CalcScore(hitsFound.Doc(i).Get("datecreated"), hitsFound.Score(i))));
        }

        docs.Sort(CompareDocuments);

        return docs;
    }

    private static int CompareDocuments(IMetadataDocument x, IMetadataDocument y)
    {
        // Note the negated comparison at the end: we want the list
        // ordered from biggest score to smallest. Nulls sort last.
        if (x == null)
        {
            if (y == null)
            {
                // If x is null and y is null, they're equal.
                return 0;
            }
            else
            {
                // If x is null and y is not null, x sorts after y.
                return 1;
            }
        }
        else
        {
            // If x is not null...
            if (y == null)
            {
                // ...and y is null, x sorts before y.
                return -1;
            }
            else
            {
                return -x.SearchScore.CompareTo(y.SearchScore);
            }
        }
    }



    // Your custom score function
    public float CalcScore(string FieldValue, float OriginalScore)
    {
        long fieldTicks = Int64.Parse(FieldValue);

        float scoreModifier = 0;
        float MinScoreModifier = 0.5f;
        float MaxScoreModifier = 2;

        if (fieldTicks < TicksSixMonthsAgo)
        {
            scoreModifier = MinScoreModifier;
        }
        else if (fieldTicks > CurrentTicks)
        {
            scoreModifier = MaxScoreModifier;
        }
        else
        {
            long fieldTickOffset = CurrentTicks - fieldTicks;
            long tickRange = CurrentTicks - TicksSixMonthsAgo;

            // General formula: gradient * x + offset
            // gradient = -(range of score (which is 2 to 0.5, so 1.5)
            //            / range of ticks (which is 0..SixMonths))
            // x is the field tick offset - how far back our value is
            //   from the current time (or 0 on the X axis)
            // offset is the maximum we can go
            scoreModifier =
                (-(((MaxScoreModifier - MinScoreModifier) / tickRange) * fieldTickOffset))
                + MaxScoreModifier;
        }

        // If we had 0.5 as the original, things close to now return
        // close to 1, and things close to (or more than) 6 months old
        // return close to 0.25.
        return scoreModifier * OriginalScore;
    }
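
Nic mentions that he would have preferred a curve to the linear gradient. One possible shape, sketched here under assumptions, is an exponential decay that is rescaled so it reaches the minimum modifier exactly at the cutoff and stays flat afterwards. The class name RecencyBoost, the 30-day decay constant, and the 180-day cutoff (standing in for "six months") are all hypothetical choices for illustration; only the 2 to 0.5 modifier range comes from the code above. It has no dependency on Lucene.Net.

```csharp
using System;

// Hypothetical helper for illustration; not part of Lucene.Net or of Nic's code.
public static class RecencyBoost
{
    // Returns a multiplier that starts at maxModifier for brand-new items,
    // decays exponentially, and is exactly minModifier from cutoffDays onwards.
    public static double Modifier(
        double ageDays,
        double maxModifier = 2.0,   // same range as the linear version above
        double minModifier = 0.5,
        double cutoffDays = 180.0,  // assumed stand-in for "six months"
        double decayDays = 30.0)    // assumed decay constant; tune to taste
    {
        if (ageDays <= 0) return maxModifier;           // new or future-dated
        if (ageDays >= cutoffDays) return minModifier;  // the flat tail

        // Subtract the value the raw exponential has at the cutoff and
        // renormalise, so the curve runs from 1 at age 0 to 0 at the cutoff.
        double tail = Math.Exp(-cutoffDays / decayDays);
        double shape = (Math.Exp(-ageDays / decayDays) - tail) / (1.0 - tail);

        return minModifier + (maxModifier - minModifier) * shape;
    }

    // Tick-based wrapper mirroring the CalcScore signature used in the thread.
    public static float CalcScore(long fieldTicks, long nowTicks, float originalScore)
    {
        double ageDays = TimeSpan.FromTicks(nowTicks - fieldTicks).TotalDays;
        return (float)(Modifier(ageDays) * originalScore);
    }
}
```

With these assumed defaults a 30-day-old item gets a modifier of roughly 1.05, so most of the boost is spent in the first few weeks, which is closer to the "long tail" shape asked for in the original question than a straight line is.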




  • Michael Garski at Jan 16, 2008 at 6:15 pm
    Nic,

    You could also accomplish this in a hit collector - reading the value of a stored field and adjusting the score as necessary. We take that approach here for a few searches. If you inherit from TopDocCollector you can modify the score before collecting the hit and it will sort the results for you.

    Michael
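
The idea Michael describes, adjusting each score as the hit is collected and keeping only the best N, can be sketched without any Lucene.Net types. Everything here is an assumption for illustration: AdjustedTopN is a hypothetical stand-in for a collector, the adjust delegate stands in for "read the stored date field and modify the score", and PriorityQueue requires .NET 6 or later. In Lucene.Net itself the adjustment would instead live in a class inheriting from TopDocCollector, as Michael says.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a hit collector; it does not use Lucene.Net types.
public sealed class AdjustedTopN
{
    private readonly int capacity;
    private readonly Func<int, float, float> adjust;

    // Min-heap keyed on adjusted score: the weakest kept hit is always on top.
    private readonly PriorityQueue<int, float> heap = new PriorityQueue<int, float>();

    public AdjustedTopN(int capacity, Func<int, float, float> adjust)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        this.capacity = capacity;
        this.adjust = adjust ?? throw new ArgumentNullException(nameof(adjust));
    }

    // Called once per matching document with its raw relevance score.
    public void Collect(int docId, float rawScore)
    {
        float score = adjust(docId, rawScore);

        if (heap.Count < capacity)
        {
            heap.Enqueue(docId, score);
        }
        else if (heap.TryPeek(out _, out float lowest) && score > lowest)
        {
            // Replace the weakest kept hit with this stronger one.
            heap.EnqueueDequeue(docId, score);
        }
    }

    // Returns the kept hits as (docId, adjusted score), best first.
    public List<KeyValuePair<int, float>> Results()
    {
        var results = new List<KeyValuePair<int, float>>();
        foreach ((int docId, float score) in heap.UnorderedItems)
        {
            results.Add(new KeyValuePair<int, float>(docId, score));
        }
        results.Sort((a, b) => b.Value.CompareTo(a.Value));
        return results;
    }
}
```

Compared with loading every hit into a list and sorting it, this keeps at most N entries in memory at a time, which matters mainly when a query matches far more documents than are ever shown.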


Discussion Overview
subject: Bubbling up "newer" records
group: lucene-net-user
categories: lucene
posted: Jan 15, '08 at 5:46p
active: Jan 23, '08 at 10:20a
posts: 6
users: 3
website: lucene.apache.org

3 users in discussion

Nic Wise: 3 posts DIGY: 2 posts Michael Garski: 1 post
