FAQ
OK, so how do we get this started? Seems like there are a lot of
collections out there we could use. Also, we can crawl. Seems the
tricky part is getting judgments.

Thoughts?

-Grant


  • Andrzej Bialecki at Jul 31, 2009 at 11:24 pm

    Grant Ingersoll wrote:
    OK, so how do we get this started? Seems like there are a lot of
    collections out there we could use. Also, we can crawl. Seems the
    tricky part is getting judgments.
    I think we should establish first what kind of relevance judgments we
    want to collect:

    1. given a corpus and a query, define an ordered list of top-N
    documents that are relevant to the query. This is our baseline. Getting
    this sort of information is very time-consuming and subjective.

    2. given a corpus, a query and a list of top-N results obtained from a
    real search, define which results are relevant and how they should be
    ordered. The reviewed list of top-N results then becomes the initial
    approximation of our baseline. Calculate a distance metric between the
    real and reviewed results, and adjust ranking to maximize this metric.
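
    A minimal sketch of what such a distance/quality metric could look
    like in Java (the class name, method name, and the precision-style
    formula are illustrative assumptions, not a settled design):

        import java.util.List;
        import java.util.Set;

        public class JudgedPrecision {
            /** Fraction of the top-n real results that the reviewers
             *  marked relevant; 1.0 means every returned document was
             *  judged relevant. */
            public static double precisionAtN(List<String> realResults,
                                              Set<String> judgedRelevant,
                                              int n) {
                int cutoff = Math.min(n, realResults.size());
                if (cutoff == 0) return 0.0;
                int hits = 0;
                for (int i = 0; i < cutoff; i++) {
                    if (judgedRelevant.contains(realResults.get(i))) hits++;
                }
                return (double) hits / cutoff;
            }
        }

    A ranking tuner would then adjust parameters to maximize this value
    over a set of queries.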

    The second scenario could be handled by a webapp, which could present
    the following areas of functionality:

    * corpus selection and browsing

    * searching using the selected search impl and its ranking parameters,
    and storing tuples of <corpus, impl, query, results> (a sketch of such
    a tuple follows after this list)

    * review of the results (marking relevant / non-relevant, reordering),
    and saving of tuples <corpus, impl, query, reviewed results>

    * calculation of distance metrics.

    * adjustment of ranking parameters for a given search implementation.
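
    The stored tuple could be as simple as the following Java sketch
    (all field names are assumptions for illustration):

        import java.util.List;

        public class JudgmentTuple {
            public final String corpusId;    // which corpus was searched
            public final String searchImpl;  // search impl + ranking params
            public final String query;
            public final List<String> results;          // ids, returned order
            public final List<String> reviewedResults;  // ids after review

            public JudgmentTuple(String corpusId, String searchImpl,
                                 String query, List<String> results,
                                 List<String> reviewedResults) {
                this.corpusId = corpusId;
                this.searchImpl = searchImpl;
                this.query = query;
                this.results = results;
                this.reviewedResults = reviewedResults;
            }
        }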

    --
    Best regards,
    Andrzej Bialecki <><
    ___. ___ ___ ___ _ _   __________________________________
    [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
    ___|||__||  \|  ||  |  Embedded Unix, System Integration
    http://www.sigram.com Contact: info at sigram dot com
  • Simon Willnauer at Aug 1, 2009 at 6:28 pm

    On Sat, Aug 1, 2009 at 1:23 AM, Andrzej Bialecki wrote:
    Grant Ingersoll wrote:
    OK, so how do we get this started?  Seems like there are a lot of
    collections out there we could use.  Also, we can crawl.  Seems the tricky
    part is getting judgments.
    I think we should establish first what kind of relevance judgments we want
    to collect:
    This looks like two different things.
    One thing is deciding what we use to get "a" collection of documents -
    a corpus. It seems to be a very good idea to me to create a
    heterogeneous collection of documents such as Wikipedia to kick off
    ORP. I guess we do not need a huge collection of documents to get
    started, right?!
    @Grant: I might have missed something but do we have a list of
    available collections on some wiki page?! Would be great to have
    something like that.
    Once we get this project going we can start building various
    collections from all kinds of areas. I found it interesting that all
    collections I have seen are built from large documents, but with the
    advent of mobile devices collections could also be built from
    "data-records" like SMS, address records, image metadata, and audio
    metadata, where the text/document is relatively small. I found that
    searching on such "small" documents puts different requirements on
    scoring parameters than web search...

    Another thing is what we do with these collections. I kind of like the
    idea of having something like a webapp that is able to perform corpus
    selection, distance measurement, etc.
    I wanna extend Andrzej's list and throw out some random thoughts...

    - It would be nice to have something like an intermediate
    representation of a corpus that can be plugged into a relevance
    measurement app / webapp.

    - Such a relevance measurement app should be able to work on top of
    custom search applications. There could be an API which gives
    applications access to the corpus for indexing and lets them search
    this corpus through the API (see the sketch below). I can imagine
    lots of use cases where users want to judge their custom search
    engine against a corpus and compare the results.
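
    A minimal sketch of what such a corpus API could look like in Java
    (every name below is a hypothetical illustration, not an agreed
    design):

        import java.util.Iterator;
        import java.util.List;

        /** Hands corpus documents to a custom engine for indexing. */
        public interface CorpusAccess {
            Iterator<CorpusDocument> documents(String corpusId);
        }

        /** Implemented by the custom search application under test. */
        interface EvaluatedEngine {
            /** Return doc ids for the query, best match first. */
            List<String> search(String query, int topN);
        }

        /** Minimal document carrier. */
        class CorpusDocument {
            public final String id;
            public final String text;
            CorpusDocument(String id, String text) {
                this.id = id;
                this.text = text;
            }
        }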


    simon (in the middle of moving his apartment)
  • Grant Ingersoll at Aug 5, 2009 at 2:41 am

    On Aug 1, 2009, at 2:27 PM, Simon Willnauer wrote:

    On Sat, Aug 1, 2009 at 1:23 AM, Andrzej Bialecki wrote:
    Grant Ingersoll wrote:
    OK, so how do we get this started? Seems like there are a lot of
    collections out there we could use. Also, we can crawl. Seems the
    tricky part is getting judgments.
    I think we should establish first what kind of relevance judgments
    we want to collect:
    This looks like two different things.
    One thing is deciding what we use to get "a" collection of documents -
    a corpus. It seems to be a very good idea to me to create a
    heterogeneous collection of documents such as Wikipedia to kick off
    ORP. I guess we do not need a huge collection of documents to get
    started, right?!
    @Grant: I might have missed something but do we have a list of
    available collections on some wiki page?! Would be great to have
    something like that.
    Not yet, we have some on Mahout.
  • Andrzej Bialecki at Aug 5, 2009 at 8:41 am

    Grant Ingersoll wrote:
    On Aug 1, 2009, at 2:27 PM, Simon Willnauer wrote:

    On Sat, Aug 1, 2009 at 1:23 AM, Andrzej Bialecki wrote:
    Grant Ingersoll wrote:
    OK, so how do we get this started? Seems like there are a lot of
    collections out there we could use. Also, we can crawl. Seems the
    tricky part is getting judgments.
    I think we should establish first what kind of relevance judgments
    we want to collect:
    This looks like two different things.
    One thing is deciding what we use to get "a" collection of documents -
    a corpus. It seems to be a very good idea to me to create a
    heterogeneous collection of documents such as Wikipedia to kick off
    ORP. I guess we do not need a huge collection of documents to get
    started, right?!
    @Grant: I might have missed something but do we have a list of
    available collections on some wiki page?! Would be great to have
    something like that.
    Not yet, we have some on Mahout.
    This link may be of interest to us: http://evaluatir.org . Lucene
    results are there, although they are disappointingly low.


    --
    Best regards,
    Andrzej Bialecki <><
    ___. ___ ___ ___ _ _   __________________________________
    [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
    ___|||__||  \|  ||  |  Embedded Unix, System Integration
    http://www.sigram.com Contact: info at sigram dot com
  • Peter Skomoroch at Jul 31, 2009 at 11:27 pm
    Mechanical Turk has built-in tasks for evaluating search relevance.

    Seed queries could start with the AOL search logs or Wikipedia traffic
    logs?

    Pete

    Sent from my iPhone
    On Jul 31, 2009, at 7:01 PM, Grant Ingersoll wrote:

    OK, so how do we get this started? Seems like there are a lot of
    collections out there we could use. Also, we can crawl. Seems the
    tricky part is getting judgments.

    Thoughts?

    -Grant
