On Oct 16, 2009, at 8:30 PM, Omar Alonso wrote:
1- We can start by paying between 2 and 5 cents per document/query
pair (or document/topic) on a short data set (say 200 docs). That
should be on the order of $25 (assuming 2 cents and 5 turkers per
assignment + AMZN fee).
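Working that number out (a back-of-the-envelope sketch in Python; the 10%
Amazon commission is an assumption here, so check the current MTurk fee
schedule):

  # Rough HIT cost using the numbers above.
  docs = 200            # document/query (or document/topic) pairs
  workers_per_pair = 5  # redundant judgments per pair
  reward = 0.02         # USD per assignment

  worker_cost = docs * workers_per_pair * reward
  amzn_fee = worker_cost * 0.10     # assumed 10% commission
  print("Estimated total: $%.2f" % (worker_cost + amzn_fee))  # ~$22, i.e. on the order of $25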
It also depends on how many experiments one would like to run. My
suggestion would be to run 2 or 3 experiments with some small data
sets for, say, $100 to see what kind of response we get back, and then
think about something else at a larger scale.
While I realize $100 isn't a lot, we simply don't have a budget for
such experiments, and the point of ORP is to be able to do this in the
community. I suppose we could ask the ASF board for the money, but I
don't think we are ready for that anyway. I very much have an "If you
build it, they will come" mentality, so I know that if we can just get
bootstrapped with some data, some queries, and a way to collect
relevance judgments, we can get people interested.
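One concrete target for collecting those judgments (a minimal sketch,
assuming we standardize on TREC-style qrels lines; the file name and doc id
below are illustrative only, not an agreed ORP convention):

  # Append one judgment as a TREC-style qrels line:
  # topic-id, iteration, doc-id, relevance
  def append_qrel(path, topic_id, doc_id, relevance, iteration=0):
      with open(path, "a") as qrels:
          qrels.write("%s %d %s %d\n" % (topic_id, iteration, doc_id, relevance))

  # e.g. a worker judged doc "PMID-19890123" relevant (1) to topic 101
  append_qrel("orp.qrels", "101", "PMID-19890123", 1)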
I have some tips on how to run crowdsourcing for relevance
evaluation here: http://wwwcsif.cs.ucdavis.edu/~alonsoom/ExperimentDesign.pdf
2- If the goal is to have everything open source (gold set +
relevance judgments), we need to produce a new data set from
scratch. Also, what is the goal here? What is the domain? Enterprise
search? Ad-hoc retrieval?
Yes. I think the primary goal of ORP is to give people within Lucene
a way to judge relevance that doesn't require us to purchase datasets,
just like the contrib/benchmarker gives us a way to talk about
performance. So, while it may evolve to be more, I'd be happy with a
simple, fixed collection at this point. Wikipedia is OK, but in my
experience, there are often only a few good answers for a query to
begin with, so it's harder to judge recall, but that doesn't mean it
isn't usable.
I know there are a lot of issues around curating a good collection,
but I'd like to be pragmatic and just ask: what can we arrive at in a
reasonable amount of time that best approximates what someone doing,
say, genetic/biopharma research might do? Just getting a raw dataset
like PubMed on a given day seems like a good first step; then we can
work to clean it up and generate queries on it.
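A rough sketch of that first step (assuming the public NCBI E-utilities
esearch/efetch endpoints; the date query and retmax are placeholders):

  import urllib.request, urllib.parse, re

  base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

  # Find PubMed ids for one day's records (placeholder date and cap).
  q = urllib.parse.urlencode({"db": "pubmed",
                              "term": "2009/10/16[PDAT]",
                              "retmax": "200"})
  ids_xml = urllib.request.urlopen(base + "esearch.fcgi?" + q).read().decode()
  pmids = re.findall(r"<Id>(\d+)</Id>", ids_xml)

  # Fetch the matching abstracts as plain text for later cleanup.
  q = urllib.parse.urlencode({"db": "pubmed", "id": ",".join(pmids),
                              "rettype": "abstract", "retmode": "text"})
  text = urllib.request.urlopen(base + "efetch.fcgi?" + q).read().decode()
  with open("pubmed-sample.txt", "w") as out:
      out.write(text)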
In summary, I would start with something small (English only,
Creative Commons or Wikipedia). Build a few experiments and see the
results. Then expand to more data sets and also make it multilingual.
Agreed. I'm not too worried about multilingual just yet, but it is a
good long-term goal.
--- On Fri, 10/16/09, Grant Ingersoll wrote:
From: Grant Ingersoll <firstname.lastname@example.org>
Subject: Re: OpenRelevance and crowdsourcing
Date: Friday, October 16, 2009, 3:38 PM
It sounds interesting; can you elaborate more on what you
had in mind?
A few questions come to mind:
1. Cost associated w/ Turk.
2. What dataset would you use?
On Oct 16, 2009, at 5:49 PM, Omar Alonso wrote:
I would like to know if there is interest in trying
some experiments on Mechanical Turk for the OpenRelevance
project. I've done TREC and INEX on MTurk, and it is a good
platform for trying relevance experiments.