Hello Everyone,

The open relevance viewer requires a pluggable engine system, and I think
Lucene would be a great place to start. I am very new to the field of IR,
Lucene, and Solr (still cutting my teeth). My plan is to write a plugin that
submits a query to a Lucene/Solr instance somewhere. The results from that
instance will let an end user judge each topic as "Relevant", "Not Relevant",
or "Skip This". Recall and precision will be recorded, along with additional
metrics and end-user settings. Does anyone have a public archive I can pull
from? I previously read that someone had indexed part of the ASF email
archive; if so, can I use it as a base to begin work?
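
To make that concrete, here is a minimal Java sketch of the bookkeeping such
a plugin would need. All class and method names here are hypothetical
(invented for illustration, not an existing API); it turns a list of
per-result judgments into precision and recall:

import java.util.List;

/** Hypothetical sketch: metrics over the per-result judgments described above. */
public class JudgmentMetrics {

    /** One end-user judgment per returned result. */
    public enum Judgment { RELEVANT, NOT_RELEVANT, SKIPPED }

    /** Precision over what was actually assessed; skipped results are excluded. */
    public static double precision(List<Judgment> judgments) {
        int relevant = 0, assessed = 0;
        for (Judgment j : judgments) {
            if (j == Judgment.SKIPPED) continue;
            assessed++;
            if (j == Judgment.RELEVANT) relevant++;
        }
        return assessed == 0 ? 0.0 : (double) relevant / assessed;
    }

    /** Recall additionally needs the total number of relevant documents in
        the collection, which is exactly why a pre-judged corpus matters. */
    public static double recall(List<Judgment> judgments, int totalRelevant) {
        int relevant = 0;
        for (Judgment j : judgments) {
            if (j == Judgment.RELEVANT) relevant++;
        }
        return totalRelevant == 0 ? 0.0 : (double) relevant / totalRelevant;
    }
}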

I am open to suggestions, questions or redirection.

Thanks,
--Dan


  • Omar Alonso at Oct 5, 2010 at 5:21 pm
    Binary relevance assessments (yes/no) were done in the early days. Now, most experiments use some sort of "graded" relevance:

    - Relevant, somewhat relevant, not relevant
    - Likert scale: strongly disagree, disagree, neither agree nor disagree, agree, strongly agree
    - Numeric range: 1 to 5 (1=irrelevant, 5=excellent). 1 to 10 works as well.

    I personally like the numeric range and not so much the labels. Labels can be very confusing.

    o.
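
    As a sketch of how such a graded scale might be modeled (the enum and its
    values are illustrative only, not an existing API), a scale can carry both
    a label and a numeric value, so the UI can present either view:

    public enum GradedRelevance {
        IRRELEVANT(1), MARGINAL(2), FAIR(3), GOOD(4), EXCELLENT(5);

        private final int score;

        GradedRelevance(int score) { this.score = score; }

        /** Numeric value for metrics such as nDCG that expect graded gains. */
        public int score() { return score; }
    }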

  • Dan Cardin at Oct 7, 2010 at 12:42 pm
    Hello Omar,

    Thank you for the feedback.

    --Dan

  • Grant Ingersoll at Oct 11, 2010 at 12:09 pm

    On Oct 5, 2010, at 1:20 PM, Omar Alonso wrote:

    I personally like the numeric range and not so much the labels. Labels can be very confusing.
    You still end up w/ labels, as you need to say what 1 means and what 5 means...

    Longer term, I think it would be useful to support the various kinds of judgments. I'd also add that sometimes it is useful to simply say whether the current set of results (say, the top ten) is relevant or not. This allows for very quick judgments at the cost of some granularity.

    -Grant
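
    One way to support both per-document and whole-result-set judgments in a
    single model, as a hedged sketch (all types here are invented for
    illustration, not part of any existing codebase):

    /** A judgment against either one document or an entire result page. */
    public class Judgment {
        public enum Scope { SINGLE_DOCUMENT, RESULT_SET }

        private final Scope scope;
        private final String targetId; // a document id, or an id for the judged result set
        private final double grade;    // normalized to 0.0 .. 1.0

        public Judgment(Scope scope, String targetId, double grade) {
            this.scope = scope;
            this.targetId = targetId;
            this.grade = grade;
        }

        public Scope scope() { return scope; }
        public String targetId() { return targetId; }
        public double grade() { return grade; }
    }

    A set-level judgment trades granularity for speed, as described above: one
    click covers an entire page of results.
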
  • Dan Cardin at Oct 11, 2010 at 12:23 pm
    Grant,

    Thanks for the input on scoring the result set returned as a whole.

    Would it be best to have the following scoring system?
    0 - Not relevant
    0 < X < 1 - Some relevance (the granularity labels would be settable via
    configuration and could be optional)
    1 - Relevant
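
    As an illustration of how configurable intermediate labels could map onto
    that 0..1 range (a sketch; the class and the example labels are
    hypothetical):

    /** Maps an ordered, configurable label list onto the range [0, 1]. */
    public class GradeScale {
        private final java.util.List<String> labels;

        public GradeScale(java.util.List<String> labels) {
            this.labels = labels; // e.g. ["Not relevant", "Somewhat relevant", "Relevant"]
        }

        /** First label -> 0.0, last -> 1.0, intermediate labels spaced evenly. */
        public double scoreOf(String label) {
            int i = labels.indexOf(label);
            if (i < 0) throw new IllegalArgumentException("Unknown label: " + label);
            return labels.size() == 1 ? 1.0 : (double) i / (labels.size() - 1);
        }
    }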

    Cheers,
    --Dan
