Hello All,

I am trying to flesh out the data importing component, so if you have any
ideas or feedback, add them to the wiki or reply to this email.

1. Should the Open Relevance Viewer be capable of importing text and
images?
2. What are some standard formats used for corpora and their annotation
sets?

Is the objective of the Open Relevance Viewer to provide a crowd-sourcing
tool whose data can be annotated, with the annotated data then used to
determine the performance of machine learning techniques/algorithms? Or
is it to provide a generic crowd-sourcing tool for academics, government, and
industry to annotate data with? Or am I missing the point?

Wiki Entry
https://cwiki.apache.org/confluence/display/ORP/Open+Relevance+Viewer

Thanks,
--Dan

  • Grant Ingersoll at Sep 20, 2010 at 1:31 pm

    On Sep 15, 2010, at 10:24 PM, Dan Cardin wrote:

    Hello All,

    I am trying to flesh out the data importing component, so if you have any
    ideas or feedback, add them to the wiki or reply to this email.

    1. Should the Open Relevance Viewer be capable of importing text and
    images?
    2. What are some standard formats used for corpora and their annotation
    sets?
    I don't think there are any. Corpora, you can assume, are already indexed by the engine. TREC is probably the standard for judgments, but there are other ways.
    Is the objective of the Open Relevance Viewer to provide a crowd-sourcing
    tool whose data can be annotated, with the annotated data then used to
    determine the performance of machine learning techniques/algorithms? Or
    is it to provide a generic crowd-sourcing tool for academics, government, and
    industry to annotate data with? Or am I missing the point?
    Here's my view of what we need:

    Tool that does a couple of things:

    1. User can enter queries and then judge the results (as deep as they want, but at a minimum the top 10). All aspects of what they do are captured (the query, the results, the judgments)
    2. User can give a whole set of queries (e.g. the TREC ones) and provide judgments. Capture info as always
    3. System should be search-engine agnostic, with a well-defined interface that allows people to plug in an implementation for their search engine. In other words, it should be just as easy to judge Google as it is Solr/Lucene. (A rough sketch of such an interface follows this list.)
    4. System should be able to give metrics on the results of both an individual user run and also, if others have done the run, inter-annotator runs. Metrics, at a minimum, are: precision, recall, MAP, and for a multiple-user setup, inter-annotator agreement. Potentially also mean reciprocal rank for known-item searches (user could specify up front)
    5. Should be able to export judgments, etc. to TREC format and other formats (CSV, XML)
    6. Presumably, there should be an admin-only area which restricts access to the configuration, etc. It's likely the case that the metrics should be for admins only, since you don't want end users to be influenced by the results of others
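
    As a rough, non-authoritative sketch of what the pluggable engine interface from point 3 and the per-run metric from point 4 could look like (all names here are illustrative, not an agreed ORP API), something along these lines in Java:

        import java.util.List;

        // Illustrative adapter so judging Google is as easy as judging Solr/Lucene.
        interface SearchEngineAdapter {
            String getName();
            List<SearchResult> search(String query, int depth) throws Exception;
        }

        class SearchResult {
            final String docId;
            final String title;
            final String snippet;
            SearchResult(String docId, String title, String snippet) {
                this.docId = docId;
                this.title = title;
                this.snippet = snippet;
            }
        }

        class Metrics {
            // Average precision over one ranked result list, given binary judgments
            // aligned with the ranking (true = relevant). Note: this divides by the
            // relevant documents seen in the list; a full MAP would divide by all
            // known relevant documents for the topic.
            static double averagePrecision(List<Boolean> judgments) {
                double sum = 0.0;
                int relevantSeen = 0;
                for (int rank = 1; rank <= judgments.size(); rank++) {
                    if (judgments.get(rank - 1)) {
                        relevantSeen++;
                        sum += (double) relevantSeen / rank;
                    }
                }
                return relevantSeen == 0 ? 0.0 : sum / relevantSeen;
            }
        }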

    Longer term, if we had something to support things like HCIR and other tests, that would be great.

    For now, 1-6 is a good start, IMO.

    -Grant
  • Dan Cardin at Sep 20, 2010 at 7:09 pm
    Grant,

    Thanks for the feedback!

    I have a couple of questions.

    1. How are the annotations used for judgments obtained? A separate file,
    specified by the user?
    2. Can you provide me with a direct link to the TREC format?


    --Dan
  • Itamar Syn-Hershko at Sep 21, 2010 at 6:55 pm
    Addressing most of the recent discussion below...
    On 16/9/2010 4:24 AM, Dan Cardin wrote:
    1. Should the Open Relevance Viewer be capable of importing text and
    images?
    Corpora, IMO, should be text only and index-ready (e.g. no special
    parsing required). This is what I assumed in Orev, as well (see below).
    Is the objective of the Open Relevance Viewer to provide a crowd-sourcing
    tool whose data can be annotated, with the annotated data then used to
    determine the performance of machine learning techniques/algorithms? Or
    is it to provide a generic crowd-sourcing tool for academics, government, and
    industry to annotate data with? Or am I missing the point?
    This tool should be, as Grant and Mark mentioned, engine agnostic. It
    should provide those interested with tools to judge the
    effectiveness of different engines, and of different methods with the
    same engine.

    Hence, the most basic implementation should know how to handle many corpora
    and topics for more than one (natural) language, and the crowd-sourcing
    portion of it is where a user can create judgments - e.g. view a
    document from a corpus side by side with a topic, and mark "Relevant",
    "Non-relevant" (or "Skip this").

    This basic implementation, after several hundred human-hours, will
    result in packages containing corpora, topics and judgments for several
    languages. This can then be used as a basis for more sophisticated parts
    of the project, where relevance ranking of actual query results,
    TREC-like testing, MAP/MRR and user behavior tracking are just examples.
    In other words, IMHO Grant's view goes a bit too far for this stage,
    where there's still a lot of fundamental work to do.

    Robert, from the discussion we had a while ago I gathered you are
    thinking the same?

    Once such data exists in a central system, importing corpora and topics,
    and exporting them back with judgments in various formats (TREC, CLEF,
    FIRE) can be done fairly easily. We just need to make sure that the system
    stores all data correctly.

    Sorry for bringing this up again, but I think I pretty much did most of
    that work already, so there's no need for redundant effort. In Orev I have
    already spec'd and implemented all of the above. What is missing is a
    better GUI and user management. I suggest you have a look at it and at
    its DB schema: http://github.com/synhershko/Orev/blob/master/Orev.png
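
    As a very rough illustration of the kind of entities such a schema has to
    cover (this is hypothetical and deliberately simplified, not Orev's actual
    tables), think of something like:

        // Hypothetical core entities: documents grouped into corpora, topics per
        // language, and judgments tying a user to a (topic, document) pair.
        class Corpus   { String id; String language; }
        class Document { String id; String corpusId; String text; }
        class Topic    { String id; String language; String title; String description; }
        class Judgment { String userId; String topicId; String docId; int relevance; }
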
    How are the annotations used for judgments obtained? A separate file, specified by the user?
    If a tool like Orev is used, then this data can be pulled directly
    from its DB by the actual test tools (if separate).
    Can you provide me with a direct link to the TREC format?
    http://trec.nist.gov/pubs/trec1/papers/01.txt

    But if we are not going to base data storage on the FS, there's no need
    to stick to a particular format except when exporting judgments...
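
    For reference, and hedging a bit since (as Mark notes) qrels variants exist,
    the commonly used TREC qrels layout is one judgment per line: topic number,
    an unused "iteration" field, document ID, and a relevance value, whitespace
    separated (the document IDs below are made up for illustration):

        51 0 FT911-3928 1
        51 0 FT911-4017 0

    A tiny export helper along those lines (the method name is illustrative,
    not part of ORP or Orev) could be:

        // Hypothetical formatter: one TREC-style qrels line per judgment.
        static String toQrelsLine(String topicId, String docId, boolean relevant) {
            return String.format("%s 0 %s %d", topicId, docId, relevant ? 1 : 0);
        }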

    Itamar
  • Grant Ingersoll at Sep 21, 2010 at 8:24 pm

    On Sep 21, 2010, at 2:54 PM, Itamar Syn-Hershko wrote:

    Addressing most of the recent discussion below...
    On 16/9/2010 4:24 AM, Dan Cardin wrote:
    1. Should the Open Relevance Viewer be capable of importing text and
    images?
    Corpora, IMO, should be text only and index-ready (e.g. no special parsing required). This is what I assumed in Orev, as well (see below).
    I'm not sure it needs to be text/index-ready. That can mean different things to different engines. Our first requirement is that we have corpora with a known revision/signature so that all people are using the same raw content. What the engine chooses to do with it is up to the engine. Can we provide tools that help it be text/index-ready? Of course, but that is not a requirement, in my view.
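
    To make "known revision/signature" concrete, a minimal sketch (class and
    method names are illustrative, and SHA-256 is just one reasonable choice of
    digest) might be:

        import java.io.FileInputStream;
        import java.io.InputStream;
        import java.security.MessageDigest;

        // Sketch: digest a corpus archive so everyone can confirm they are
        // judging the same raw content.
        public class CorpusSignature {
            public static String sha256(String corpusArchivePath) throws Exception {
                MessageDigest digest = MessageDigest.getInstance("SHA-256");
                InputStream in = new FileInputStream(corpusArchivePath);
                try {
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        digest.update(buf, 0, n);
                    }
                } finally {
                    in.close();
                }
                StringBuilder hex = new StringBuilder();
                for (byte b : digest.digest()) {
                    hex.append(String.format("%02x", b));
                }
                return hex.toString();
            }
        }
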
    Is the objective of the Open Relevance Viewer to provide a crowd-sourcing
    tool whose data can be annotated, with the annotated data then used to
    determine the performance of machine learning techniques/algorithms? Or
    is it to provide a generic crowd-sourcing tool for academics, government, and
    industry to annotate data with? Or am I missing the point?
    This tool should be, as Grant and Mark mentioned, engine agnostic. It should provide those interested with tools to judge the effectiveness of different engines, and of different methods with the same engine.

    Hence, the most basic implementation should know how to handle many corpora and topics for more than one (natural) language, and the crowd-sourcing portion of it is where a user can create judgments - e.g. view a document from a corpus side by side with a topic, and mark "Relevant", "Non-relevant" (or "Skip this").

    This basic implementation, after several hundred human-hours, will result in packages containing corpora, topics and judgments for several languages. This can then be used as a basis for more sophisticated parts of the project, where relevance ranking of actual query results, TREC-like testing, MAP/MRR and user behavior tracking are just examples. In other words, IMHO Grant's view goes a bit too far for this stage, where there's still a lot of fundamental work to do.

    Robert, from the discussion we had a while ago I gathered you are thinking the same?

    Once such data exists in a central system, importing corpora and topics, and exporting them back with judgments in various formats (TREC, CLEF, FIRE) can be done fairly easily. We just need to make sure that the system stores all data correctly.

    Sorry for bringing this up again, but I think I pretty much did most of that work already, so there's no need for redundant effort. In Orev I have already spec'd and implemented all of the above. What is missing is a better GUI and user management. I suggest you have a look at it and at its DB schema: http://github.com/synhershko/Orev/blob/master/Orev.png
    You should supply a patch and build instructions, etc.
    How are the annotations used for judgments obtained? A separate file, specified by the user?
    If a tool like Orev is used, then this data can be pulled directly from its DB by the actual test tools (if separate).
    Can you provide me with a direct link to the TREC format?
    http://trec.nist.gov/pubs/trec1/papers/01.txt

    But if we are not going to base data storage on the FS, there's no need to stick to a particular format except when exporting judgments...
    Right
  • Itamar Syn-Hershko at Sep 21, 2010 at 9:07 pm

    On 21/9/2010 10:24 PM, Grant Ingersoll wrote:
    I'm not sure it needs to be text/index-ready. That can mean different things to different engines. Our first requirement is that we have corpora with a known revision/signature so that all people are using the same raw content. What the engine chooses to do with it is up to the engine. Can we provide tools that help it be text/index-ready? Of course, but that is not a requirement, in my view.
    I hear what you say. I suggest we use Orev as the starting point
    for discussions, assuming my point of view from the last e-mail is
    acceptable, and build up from there.
    You should supply a patch and build instructions, etc.
    Where to? All sources are available directly from GitHub through git or
    HTTP (tarball: http://github.com/synhershko/Orev/tarball/master). It
    isn't complete yet, so I think it's better not to fork it at this point.

    As far as build instructions go, you'll need a C# compiler - either a
    full-blown IDE (MS DevEnv, and there is a free Express version too) or a
    command-line compiler (which I think comes with the .NET Framework
    itself). I haven't tested on Mono, but it might work too.

    Itamar.
  • Mark Bennett at Sep 21, 2010 at 11:35 pm
    Coming in a bit late here on this topic. A few caveats:

    1: "standard" qrels format... not in practice:

    The qrels format has many variants. When we dug into things for our
    pharmaceutical client's relevancy project we encountered various formats;
    I'm not sure there is a canonical list.

    And another relevancy trial for a large tech firm used letter grades, which
    I thought actually had some benefits.

    2: Document summaries and titles vary, and this impacts judgments

    When documents are presented, they often include summaries. In some setups
    the summaries come from a search engine, and each engine can present
    different summaries. This impacts the judgments of the SMEs (Subject
    Matter Experts).

    Yes, ideally there would be a standard set of metadata, but in some setups
    that's not the case. Comparing auto-generated summaries is a related,
    though certainly different type of test.

    And in this case titles would be well defined. However, if two search
    engines index a set of documents that are in various PDF, Office or HTML
    formats, they can sometimes have different metadata too. This is a function
    of the filters being used. I realize the initial intent here is to use TEXT
    assets, so no filtering is needed.

    3: It'd be nice to have enough stats to judge the judges!

    As SMEs get more familiar with the process, their judgments can drift. Or
    they get bored or tired, etc. A good statistician can probably correct for
    this, if the stats are there. Ideally we'd like to see the SME repeat some
    tests or complete a standardized task. So, for example, are timestamps and
    cardinal positions tracked for the judges?
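
    As a hedged sketch of the kind of per-judgment record that would make
    "judging the judges" possible later (field names are hypothetical, not an
    agreed schema):

        // Hypothetical per-event record: enough context to study drift,
        // fatigue and position bias after the fact.
        class JudgmentEvent {
            String judgeId;       // who judged (self-declared human or automated)
            String topicId;       // which topic/query
            String docId;         // which document
            int rankPosition;     // cardinal position shown to the judge
            int grade;            // binary or letter grade mapped to an int
            long timestampMillis; // when the judgment was made
        }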

    4: Background info on the judges would be nice

    Maybe self-declared; that might even be fine. One category might be
    "automated judge". I know this seems like the wrong direction, since we
    want humans, but some trials also employ automated tools. At least knowing
    which is which would be nice.

    5: Can they intentionally re-judge? Or pick up where they left off?

    We've seen workflow issues.

    --
    Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
    Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
