[OpenRelevance-dev] dataset collection

Tommaso Teofili
Feb 21, 2011 at 4:26 pm
Hi ORPers,
I have to use and evaluate a machine learning system for clustering
documents so I am wondering if there is any available dataset used within
ORP I could use.
BTW, is the ORV already in place? May I give you any help with the
development/design of ORP system?
Regards,
Tommaso

5 responses

  • Itamar Syn-Hershko at Feb 27, 2011 at 7:10 pm
    Hi Tommaso,


    Grant posted a while ago about the ASF mail archive being available from
    a cloud store: http://search-lucene.com/m/9udC12n9y5A.


    The Orev application (open-relevance viewer) is still under development.
    I'm scheduled to resume work on it in a few weeks, and can actually use
    some help and feedback. What I have so far is also on github
    (https://github.com/synhershko/Orev), and is coded in .NET. I'm planning
    to complete this in .NET (and there's still plenty to do, also in terms
    of design), unless someone wishes to pick it up and do the actual coding
    in Java with me assisting (I'm more fluent with .NET).


    Itamar.

  • Tommaso Teofili at Mar 1, 2011 at 3:56 pm
    Hello Itamar,

    2011/2/27 Itamar Syn-Hershko <itamar@code972.com>
    > Grant posted a while ago about the ASF mail archive being available from a
    > cloud store: http://search-lucene.com/m/9udC12n9y5A.

    Thanks! I'll try them out :)

    > The Orev application (open-relevance viewer) is still under development.
    > I'm scheduled to resume work on it in a few weeks, and can actually use some
    > help and feedback. What I have so far is also on github
    > (https://github.com/synhershko/Orev), and is coded in .NET. I'm planning to
    > complete this in .NET (and there's still plenty to do, also in terms of
    > design), unless someone wishes to pick it up and do the actual coding in
    > Java with me assisting (I'm more fluent with .NET).

    I've had only minor experience with .NET, but if design decisions happen on
    this mailing list I'll be happy to help.
    Cheers,
    Tommaso
  • Otis Gospodnetic at Mar 2, 2011 at 2:13 am
    Itamar,

    Would you happen to have a screenshot that shows what Orev looks like?

    Otis
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
    Lucene ecosystem search :: http://search-lucene.com/


  • Itamar Syn-Hershko at Mar 27, 2011 at 11:42 pm
    Otis, I nearly forgot you :)

    Attached are the screenshots.

    The flow is quite obvious, but as a service here's the architecture
    description: https://github.com/synhershko/Orev/blob/master/Orev.png

    Honestly there's still quite a lot to do, but the basics are already
    there. The idea is to be able to handle several corpora per language,
    and to be able to have more than one language in the system. Also, we
    should be able to remove judgments based on users, and to have an
    overall smart system of detecting poor judgments (e.g. keep judgments
    from a new user in standby until they have judged a few dozen documents).
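
A minimal sketch of the standby idea described above, written in Python rather than Orev's .NET. It is not taken from the Orev code: the `Judgment` and `JudgmentStore` names, the boolean `relevant` field, and the threshold value are all hypothetical, chosen only to illustrate holding back a new user's judgments and removing judgments per user.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumption: "a few dozen" is modelled as 24; the thread gives no exact number.
MIN_JUDGMENTS = 24


@dataclass(frozen=True)
class Judgment:
    user: str
    topic: str
    doc: str
    relevant: bool


class JudgmentStore:
    """Keeps judgments grouped by user so they can be held back or removed."""

    def __init__(self, min_judgments=MIN_JUDGMENTS):
        self.min_judgments = min_judgments
        self._by_user = defaultdict(list)

    def add(self, judgment):
        self._by_user[judgment.user].append(judgment)

    def remove_user(self, user):
        """Drop every judgment made by one user."""
        self._by_user.pop(user, None)

    def active(self):
        """Judgments from users who have reached the threshold."""
        return [j for js in self._by_user.values()
                if len(js) >= self.min_judgments for j in js]

    def standby(self):
        """Judgments from users still below the threshold."""
        return [j for js in self._by_user.values()
                if len(js) < self.min_judgments for j in js]
```

Because standby status is computed from the per-user count, a user's earlier judgments become active automatically once they pass the threshold; nothing has to be re-entered.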

    There are quite a few questions that come to mind (and some were raised
    before), such as what is the ideal way of scoring (boolean, 1..5, other
    system), whether we should trust one judgment per doc/topic or
    cross-check several judgments for the same pair, and so on.
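
One possible way to combine these options, sketched in Python under stated assumptions: judgments arrive as `(topic, doc, user, score)` tuples on a 1..5 scale, several users may judge the same doc/topic pair, and the median is used as the consensus. The function names, the tuple layout, and the choice of median are illustrative assumptions, not something the thread or Orev settles on.

```python
from collections import defaultdict
from statistics import median


def aggregate(judgments, min_judges=1):
    """Combine judgments given as (topic, doc, user, score) tuples.

    Returns {(topic, doc): consensus score}, where the consensus is the
    median of all scores for the pair. Pairs judged by fewer than
    min_judges distinct users are left out.
    """
    scores = defaultdict(dict)
    for topic, doc, user, score in judgments:
        # A later judgment from the same user replaces the earlier one.
        scores[(topic, doc)][user] = score
    return {pair: median(by_user.values())
            for pair, by_user in scores.items()
            if len(by_user) >= min_judges}


def to_boolean(score, threshold=3):
    """Collapse a 1..5 grade to relevant / not relevant (threshold is assumed)."""
    return score >= threshold
```

Setting `min_judges=1` corresponds to trusting a single judgment per doc/topic; a higher value requires agreement data from several users before a pair is used. Storing graded scores and collapsing them with `to_boolean` only when needed keeps both scoring styles available.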

    My plan is to give this some attention in a few weeks. As you can see,
    it's quite a crucial part of my work on Hebrew search. Your thoughts /
    cooperation appreciated.
  • Itamar Syn-Hershko at Mar 27, 2011 at 11:46 pm
    Mailman seems to have dropped the images. Anyone interested, feel free to
    email me privately for them.


    Forgot to mention: the template is ASP.NET MVC's default template. At
    this point the valuable part is in the code...


    Itamar.

