While the MRUnit discussion draws to its natural conclusion, I would like
to bring up another point that is well aligned with it. Patrick Hunt
raised this idea earlier today, and I believe it deserves further
elaboration.

A number of testing projects for Hadoop and Hadoop-related components
have come to life over the last year or two. Among them are MRUnit,
PigUnit, YCSB, Herriot, and perhaps a few more. They all focus on more
or less the same problem, e.g. validation of Hadoop or on-top-of-Hadoop
components, or application-level testing for Hadoop. However, the fact
that they are spread across a wide variety of projects seems to confuse
and mislead Hadoop users.

How about incubating a bigger Hadoop (Pig, Oozie, HBase) testing
project that will take care of developing and supporting common (where
possible) tools, frameworks, and the like? Please feel free to share
your thoughts :)
--
Take care,
Konstantin (Cos) Boudnik

On Tue, Feb 15, 2011 at 10:44, Eric Sammer wrote:
I've started the wiki page proposal for Incubator for mrunit. I'll ping
people off list for mentoring. Much appreciated for all the help!
On Tue, Feb 15, 2011 at 1:36 PM, Nigel Daley wrote:

I'm happy to help mentor as well.

Cheers,
Nige
On Feb 11, 2011, at 11:52 AM, Patrick Hunt wrote:

On Fri, Feb 11, 2011 at 9:44 AM, Mattmann, Chris A (388J)
wrote:
Guys, BTW, if you need help or a mentor in Apache Incubator-ville for
MRUnit, I would be happy to help.
I was going to suggest the same thing (mrunit to incubator). I would
also be happy to be a mentor.

Patrick


  • Mattmann, Chris A (388J) at Feb 15, 2011 at 10:14 pm
    Sounds good to me, Cos. I'm fine to help/mentor with either one that ends up standing when the dust clears :)

    Cheers,
    Chris

    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    Chris Mattmann, Ph.D.
    Senior Computer Scientist
    NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
    Office: 171-266B, Mailstop: 171-246
    Email: [email protected]
    WWW: http://sunset.usc.edu/~mattmann/
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    Adjunct Assistant Professor, Computer Science Department
    University of Southern California, Los Angeles, CA 90089 USA
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  • Eric Sammer at Feb 15, 2011 at 10:15 pm
    I think this is a good idea. The only caveat is that it may make sense
    to split such an effort into two components: one for testing Hadoop
    and the projects themselves, and one for testing end-user applications
    and libraries. Performance testing tools like YCSB are probably in the
    former camp, while MRUnit is in the latter, for instance. I think it's
    important to have separate artifacts to minimize uber-jar issues (or
    contrib-like situations where release cycles are coupled).


    --
    Eric Sammer
    twitter: esammer
    data: www.cloudera.com
  • Konstantin Boudnik at Feb 15, 2011 at 11:55 pm

    On Tue, Feb 15, 2011 at 14:15, Eric Sammer wrote:
    I think this is a good idea. The only thing I think is that it may make
    sense to split such an effort into two components: one for the testing of
    Hadoop and the projects themselves and one to test end user applications and
    I expect to see an even greater number of components, to be honest,
    e.g. a harness to run stack testing (which, as has been discussed with
    the HBase folks, might utilize YCSB artifacts). That doesn't
    invalidate the purpose of a central Hadoop testing project, or
    whatever we might call it.
    libraries. Performance testing tools like YCSB are probably more in the
    former camp where mrunit is the latter, as a for instance. I think it's
    important to have separate artifacts to minimize uber-jar issues (or
    contrib-like situations where release cycles are coupled).
    Having separate artifacts/release cycles would be pretty important for
    another reason too: test artifacts might undergo significant changes
    between releases of a product, thus requiring different versions of
    such validating artifacts for differently composed Hadoop stacks.
    Uber-jars have proven to be inflexible and a pain to deal with.

    Cos
  • Steve Loughran at Feb 16, 2011 at 11:38 am

    On 15/02/11 21:58, Konstantin Boudnik wrote:
    I think it would be good, though specific projects will need/have
    their own testing needs. I'd expect the focus for testing
    redistributables to be on helping Hadoop users test their stuff
    against subsets of data, rather than the hadoop-*-dev problem of
    "stressing the Hadoop stack once your latest patch is applied".

    That said, the whole problem of qualifying an OS, Java release and
    cluster is something we'd expect most end-user teams to have to do;
    right now terasort is the main stress test.
  • Konstantin Boudnik at Feb 16, 2011 at 7:50 pm
    Steve.

    If the project under discussion provided a common harness into which
    such a test artifact (think of a Maven artifact, for example) clicks
    and is executed automatically, with all needed tools and dependencies
    resolved for you, would that be appealing to end users?

    As Joep said, this "...will reduce the effort to take any (set of) changes
    from development into production." Take it one step further: when your
    cluster is 'assembled' you need to validate it (on top of a concrete
    OS, etc.). Is it desirable to follow an N-step process to bring about
    whatever testing workload you need, or would you prefer to simply do
    something like:

    wget http://workloads.internal.mydomain.com/stackValidations/v12.4.pom \
    && mvn verify

    and check the results later on?
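As a hedged sketch of what such a downloadable validation artifact might look like (every coordinate below is invented for illustration, not taken from this thread or any real repository), the POM could pull the test workload in as an ordinary dependency and bind it to Maven's verify phase through the failsafe plugin:

```xml
<!-- Hypothetical stackValidations/v12.4.pom; all groupIds/artifactIds are invented -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example.stackvalidation</groupId>
  <artifactId>hadoop-stack-smoke</artifactId>
  <version>12.4</version>

  <dependencies>
    <!-- the validation workload itself, resolved from a repository -->
    <dependency>
      <groupId>org.example.stackvalidation</groupId>
      <artifactId>smoke-workload</artifactId>
      <version>12.4</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <!-- failsafe runs the integration tests during 'mvn verify' -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-failsafe-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>integration-test</goal>
              <goal>verify</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
```

With something along these lines, the "all dependencies resolved for you" part falls out of Maven's normal dependency resolution rather than any bespoke machinery.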

    These are going to be the same tools that developers use for their
    own tasks, although the worksets will be different. So what?

    Cos
  • Ian Holsman at Feb 17, 2011 at 1:46 pm
    I'm not sure it makes sense to put all the testing packages under an
    umbrella different from the one covering the code they test.
    While there might be commonalities in building a test harness, I
    would think that each testing tool needs deep knowledge of the
    internals of the tool it is testing; as such, it needs someone with
    the experience to code it.

    I don't see what the advantage of combining PigUnit and, say, MRUnit
    would be, for example.
    --I
  • Konstantin Boudnik at Feb 17, 2011 at 5:34 pm

    On Thu, Feb 17, 2011 at 05:45, Ian Holsman wrote:
    I'm not sure it makes sense to  all the testing packages under a different umbrella that covers the code they test.
    While there might be commonalities building a test harness, I would think that each testing tool would need to have deep knowledge of the tool's internals that it is testing. as such it would need someone with the experience to code it.
    That's pretty much true indeed if you are talking about tests for a
    project, or for tightly coupled projects such as Herriot in Hadoop.
    Speaking of tools, there are some benefits though. Say, PigUnit and
    MRUnit are both xUnit frameworks. The former allows you to run Pig
    jobs in local and cluster mode; the latter validates MR jobs without
    the need to fire up a cluster.
    I don't see what advantage combining PigUnit & say 'MRUnit' would be for example.
    Don't you think a Pig user would benefit if Pig scripts could be
    tested against MRUnit, which gives you a flavor of a cluster
    environment without one? Now, do you think it is likely that someone
    will go to great lengths to make such an effort and build such a
    bridge right now?

    Cos
  • Eric Yang at Feb 17, 2011 at 6:37 pm
    The biggest hurdle in Hadoop adoption is that there is no easy way to
    set up a pseudo-distributed cluster on a developer's machine. People
    are steering off course to build additional simulation and validation
    tools. In practice, those tools don't provide nearly enough insight
    into the things that can go wrong in a real cluster. For example, if
    a Pig job uses HBaseStorage for accessing data, there is not a single
    hint that hbase-site.xml needs to be in a jar file on the Pig
    classpath for the job to distribute the HBase environment to the MR
    cluster. Regardless of how good the simulation tools are, they are
    limited to a siloed environment. What we can do to improve the
    integration is to have a set of installable packages that integrate
    well across the Hadoop ecosystem on the developer's machine.
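To make the hbase-site.xml pitfall concrete, here is a hedged sketch (the file contents, hostname, and Pig invocation are illustrative assumptions, not taken from the thread) of bundling the HBase client configuration into a jar so the job can ship it:

```shell
# Hypothetical layout: the HBase client config that must reach the MR cluster.
mkdir -p conf-bundle
cat > conf-bundle/hbase-site.xml <<'EOF'
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com</value>
  </property>
</configuration>
EOF

# A jar is just a zip archive, so create one without needing a JDK here.
(cd conf-bundle && python3 -m zipfile -c ../hbase-conf.jar hbase-site.xml)

# Then put the jar on the Pig classpath so it gets distributed with the
# job, e.g. (illustrative invocation):
#   pig -Dpig.additional.jars=hbase-conf.jar myscript.pig
```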

    This is similar to the situation of the vast majority of developers
    on the LAMP stack: people don't start by compiling their own Apache
    server and MySQL server for development and testing. They start by
    installing binary packages and getting their work tested on real
    software.

    Hence, Hadoop developers bear the responsibility of testing release
    packages, while end-user developers should be responsible for
    certifying the integrated system on their own clusters. There is
    already a list of tools to validate a cluster, like Terasort or
    GridMix 1, 2, 3.

    I think the bigger concern is that the Hadoop ecosystem does not have
    a standard method of linking dependencies. HBase depends on
    ZooKeeper, and Pig depends on Hadoop and HBase. Then Pig decided to
    put the hadoop-core jar inside its own jar file. Chukwa depends on
    Pig, HBase, Hadoop, and ZooKeeper. The version incompatibility is
    probably what is driving people nuts. Hence, there is a new proposal
    on how to integrate across the Hadoop ecosystem. I urge project
    owners to review the proposal and provide feedback.

    The proposal is located at:

    https://issues.apache.org/jira/secure/attachment/12470823/deployment.pdf

    The related jiras are:

    https://issues.apache.org/jira/browse/HADOOP-6255
    https://issues.apache.org/jira/browse/PIG-1857

    There are plans to file more jiras for related projects. The
    integration would also be a lot easier if all related projects were
    using Maven for dependency management.

    Regards,
    Eric
  • Konstantin Boudnik at Feb 17, 2011 at 7:21 pm
    Eric.

    I am sure that packaging Hadoop, and the applications working
    directly with Hadoop, is highly needed (although there's always the
    tricky question of how many platforms you plan to provide packaging
    for, etc.). What we are discussing here, however, is software
    testing, not packaging, nor integration issues between packaged bits.

    If you want to, please start a separate discussion, to avoid steering
    this thread away and mixing the issues.

    Cos
    I think the bigger concern is that Hadoop ecosystem does not have a standard
    method in linking dependencies.  Hbase depends on Zookeeper, and Pig depends
    on Hadoop and Hbase.  Then pig decided to put hadoop-core jar in it's own
    jar file.  Chukwa depends on pig + hbase + hadoop and zookeeper.  The
    version incompatibility is probably what driving people nuts.  Hence, there
    is a new proposal on how to integrate among hadoop ecosystem.  I urge
    project owners to review the proposal and provide feedbacks.

    The proposal is located at:

    https://issues.apache.org/jira/secure/attachment/12470823/deployment.pdf

    The related jiras are:

    https://issues.apache.org/jira/browse/HADOOP-6255
    https://issues.apache.org/jira/browse/PIG-1857

    There are plans to file more jiras for related projects.  The integration
    would also be a lot easier if all related projects are using maven for
    dependency management.

    Regards,
    Eric
    On 2/17/11 9:33 AM, "Konstantin Boudnik" wrote:
    On Thu, Feb 17, 2011 at 05:45, Ian Holsman wrote:
    I'm not sure it makes sense to  all the testing packages under a different
    umbrella that covers the code they test.
    While there might be commonalities building a test harness, I would think
    that each testing tool would need to have deep knowledge of the tool's
    internals that it is testing. as such it would need someone with the
    experience to code it.
    That's pretty much true indeed if you are talking about tests for a
    project or closely tightened projects such as Herriot in Hadoop.
    Speaking of tools there are some benefits though. Say, PigUnit and
    MRUnit are both xUnit frameworks. The former allows you to run Pig
    jobs in local and cluster mode. The latter is to validate MB jobs
    without a need to fire up a cluster.
    I don't see what advantage combining PigUnit & say 'MRUnit' would be for
    example.
    Don't you think Pig user would benefit if Pig scripts can be tested
    against MRUnit which gives you a flavor of cluster environment without
    one? Now, do you think it is likely that someone will go great lengths
    to make such an effort and build such a bridge right now?

    Cos
    --I
    On Feb 16, 2011, at 2:50 PM, Konstantin Boudnik wrote:

    Steve.

    If the project under discussion will provide a common harness where such a
    test
    artifact (think of a Maven artifact for example) will click and will be
    executed automatically with all needed tools and dependencies resolved for
    you
    - would it be appealing for end-users' cause?

    As Joep said this "...will reduce the effort to take any (set of ) changes
    from development into production." Take it one step further: when your
    cluster
    is 'assembled' you need to validate it (on top of a concrete OS, etc.); is
    it
    desirable to follow N-steps process to bring about whatever testing
    work-load
    you need or you'd prefer to simply do something like:

    wget http://workloads.internal.mydomain.com/stackValidations/v12.4.pom \
    && mvn verify

    and check the results later on?

    These gonna be the same tools that dev. use for their tasks although
    worksets
    will be different. So what?

    Cos
    On Wed, Feb 16, 2011 at 11:37AM, Steve Loughran wrote:
    On 15/02/11 21:58, Konstantin Boudnik wrote:
    While MrUnit discussion draws to its natural conclusion I would like
    to bring up another point which might be well aligned with that
    discussion. Patrick Hunt has brought up this idea earlier today and I
    believe it has to be elaborated further.

    A number of testing projects, both for Hadoop and for Hadoop-related
    components, were brought to life over the last year or two. Among those are
    MRUnit, PigUnit, YCSB, Herriot, and perhaps a few more. They all
    focus on more or less the same problem, e.g. validation of Hadoop or
    of on-top-of-Hadoop components, or application-level testing for Hadoop.
    However, the fact that they are all spread across a wide variety of
    projects seems to confuse/mislead Hadoop users.

    How about incubating a bigger Hadoop (Pig, Oozie, HBase) testing
    project which will take care of the development and support of common
    (where possible) tools, frameworks, and the like? Please feel free to
    share your thoughts :)
    share your thoughts :)
    --
    I think it would be good, though specific projects will need/have their
    own testing needs - I'd expect the focus for testing redistributables to
    be on helping Hadoop users test their stuff against subsets of data,
    rather than on the hadoop-*-dev problem of "stressing the Hadoop stack once
    your latest patch is applied".

    That said, the whole problem of qualifying an OS, Java release, and
    cluster is something we'd expect most end-user teams to have to do
    - right now terasort is the main stress test.
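    That terasort-style qualification step can be sketched as a small shell
    wrapper. The row count, HDFS paths, and examples-jar location below are
    assumptions, and the script defaults to a dry run that only prints the
    commands; set DRY_RUN=0 on a real cluster with $HADOOP_HOME configured.

    ```shell
    #!/bin/sh
    # Sketch: terasort round-trip as a cluster smoke test.
    ROWS=${ROWS:-1000000}                          # illustrative data size
    JAR=${JAR:-$HADOOP_HOME/hadoop-examples.jar}   # assumed jar location
    DRY_RUN=${DRY_RUN:-1}                          # 1 = just print commands
    CMDS=""

    step() {
      tool=$1
      CMDS="$CMDS $tool"                 # remember which tool we invoked
      if [ "$DRY_RUN" = "1" ]; then
        echo "would run: hadoop jar $JAR $*"
      else
        hadoop jar "$JAR" "$@" || { echo "FAILED: $tool" >&2; exit 1; }
      fi
    }

    step teragen "$ROWS" /benchmarks/tera/in
    step terasort /benchmarks/tera/in /benchmarks/tera/out
    step teravalidate /benchmarks/tera/out /benchmarks/tera/report
    echo "smoke test steps:$CMDS"
    ```

    Any non-zero exit from a step aborts the run, so a misconfigured grid
    fails fast instead of "running something at least once".
    
    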
  • Steve Loughran at Feb 18, 2011 at 4:26 pm

    On 17/02/2011 18:36, Eric Yang wrote:

    I think the bigger concern is that the Hadoop ecosystem does not have a standard
    method for linking dependencies. HBase depends on ZooKeeper, and Pig depends
    on Hadoop and HBase. Then Pig decided to put the hadoop-core jar in its own
    jar file. Chukwa depends on Pig + HBase + Hadoop and ZooKeeper. The
    version incompatibility is probably what is driving people nuts. Hence, there
    is a new proposal on how to integrate among Hadoop ecosystem projects. I urge
    project owners to review the proposal and provide feedback.

    The proposal is located at:

    https://issues.apache.org/jira/secure/attachment/12470823/deployment.pdf

    The related jiras are:

    https://issues.apache.org/jira/browse/HADOOP-6255
    https://issues.apache.org/jira/browse/PIG-1857

    There are plans to file more jiras for related projects.
    I would focus on RPMs/debs and then say "local VM for testing",
    because that gets people used to deploying on Linux from day one, rather
    than developing on Windows and being surprised that things work
    differently in production. You still have the problem of debugging
    remotely to a VM in the VLAN, but that's tractable.
    The integration
    would also be a lot easier if all related projects used Maven for
    dependency management.
    Do you mean Maven the tool or the Maven repo format?
  • Eric Yang at Feb 18, 2011 at 5:50 pm

    On 2/18/11 8:24 AM, "Steve Loughran" wrote:
    On 17/02/2011 18:36, Eric Yang wrote:

    I think the bigger concern is that the Hadoop ecosystem does not have a standard
    method for linking dependencies. HBase depends on ZooKeeper, and Pig depends
    on Hadoop and HBase. Then Pig decided to put the hadoop-core jar in its own
    jar file. Chukwa depends on Pig + HBase + Hadoop and ZooKeeper. The
    version incompatibility is probably what is driving people nuts. Hence, there
    is a new proposal on how to integrate among Hadoop ecosystem projects. I urge
    project owners to review the proposal and provide feedback.

    The proposal is located at:

    https://issues.apache.org/jira/secure/attachment/12470823/deployment.pdf

    The related jiras are:

    https://issues.apache.org/jira/browse/HADOOP-6255
    https://issues.apache.org/jira/browse/PIG-1857

    There are plans to file more jiras for related projects.
    I would focus on RPMs/debs and then say "local VM for testing",
    because that gets people used to deploying on Linux from day one, rather
    than developing on Windows and being surprised that things work
    differently in production. You still have the problem of debugging
    remotely to a VM in the VLAN, but that's tractable.
    Packaging, VM testing, scale, then deployment automation. My goal is
    "deployment automation", and this requires polishing each step along the
    way. For a Hadoop testing project, it depends on the type of testing that you
    are focused on. That will make it easier to understand the scope and apply
    an implementation.
    The integration
    would also be a lot easier if all related projects used Maven for
    dependency management.
    Do you mean Maven the tool or the Maven repo format?
    I meant both: it is a little easier to manage dependencies with the Maven build
    tool than with Ivy, and deploying jar files to a Maven repository is also an
    essential means of integrating projects more closely.

    Regards,
    Eric
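    Eric's point about Maven-based version management can be sketched as a shared
    parent-pom fragment that pins ecosystem versions in one place; the version
    numbers below are illustrative only, not a recommendation:

    ```xml
    <!-- Illustrative parent-pom fragment: downstream projects inherit
         these pins instead of bundling their own hadoop-core jar. -->
    <dependencyManagement>
      <dependencies>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-core</artifactId>
          <version>0.20.2</version>   <!-- illustrative -->
        </dependency>
        <dependency>
          <groupId>org.apache.zookeeper</groupId>
          <artifactId>zookeeper</artifactId>
          <version>3.3.2</version>    <!-- illustrative -->
        </dependency>
        <dependency>
          <groupId>org.apache.hbase</groupId>
          <artifactId>hbase</artifactId>
          <version>0.90.1</version>   <!-- illustrative -->
        </dependency>
      </dependencies>
    </dependencyManagement>
    ```

    A downstream pom then declares each dependency without a version and picks
    up the managed one, so the whole stack agrees on a single set of jars.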
  • Allen Wittenauer at Feb 17, 2011 at 5:14 pm

    On Feb 16, 2011, at 11:50 AM, Konstantin Boudnik wrote:
    As Joep said, this "...will reduce the effort to take any (set of) changes
    from development into production." Take it one step further: when your cluster
    is 'assembled' you need to validate it (on top of a concrete OS, etc.); is it
    desirable to follow an N-step process to bring up whatever testing workload
    you need, or would you prefer to simply do something like:

    wget http://workloads.internal.mydomain.com/stackValidations/v12.4.pom \
    && mvn verify

    and check the results later on?
    We need a definition of what 'validate' means. Is it "does stuff run"? In practice, that's useless. Even horribly misconfigured grids can usually run something at least once.
  • Rottinghuis, Joep at Feb 16, 2011 at 6:23 pm
    +1
    Having a coherent approach for system-level testing increases confidence in the various Hadoop releases and will reduce the effort to take any (set of) changes from development into production.
    The more automated and formalized system testing, the better!

    Thanks,

    Joep
    ________________________________________
    From: [email protected] [[email protected]] On Behalf Of Konstantin Boudnik [[email protected]]
    Sent: Tuesday, February 15, 2011 1:58 PM
    To: [email protected]
    Subject: Hadoop testing project [Was: [VOTE] Abandon mrunit MapReduce contrib]

  • Aaron Kimball at Feb 17, 2011 at 7:28 pm
    Working to develop code as a client of Hadoop is a path full of landmines.
    The more tools we can provide to users to improve the quality of their code,
    the better. I think it is important, though, to draw a clear distinction
    between tools intended for different audiences. Talking about system testing
    tools for Hadoop release/QA processes is good, but one of the benefits I see
    of spinning MRUnit (designed for client app developers) out of the Hadoop
    project at large is to increase its usability. Conflating it with a system
    testing tool (for release engineers) would not fulfill that need.

    As long as the new project can release several distinct artifacts in a way
    that makes their intent clear to the user community, I'm in favor of
    gathering as many perspectives on Hadoop testing under one "roof" as
    possible.

    - Aaron

    On Wed, Feb 16, 2011 at 10:19 AM, Rottinghuis, Joep
    wrote:
    +1
    Having a coherent approach for system-level testing increases confidence in
    the various Hadoop releases and will reduce the effort to take any (set
    of) changes from development into production.
    The more automated and formalized system testing, the better!

    Thanks,

    Joep
  • Konstantin Boudnik at Feb 17, 2011 at 8:03 pm
    On Thu, Feb 17, 2011 at 11:27, Aaron Kimball wrote:
    Working to develop code as a client of Hadoop is a path full of landmines.
    The more tools we can provide to users to improve the quality of their code,
    the better. I think it is important, though, to draw a clear distinction
    between tools intended for different audiences. Talking about system testing
    tools for Hadoop release/QA processes is good, but one of the benefits I see
    of spinning MRUnit (designed for client app developers) out of the Hadoop
    project at large is to increase its usability. Conflating it with a system
    testing tool (for release engineers) would not fulfill that need.
    Yup, they are different all right.
    As long as the new project can release several distinct artifacts in a way
    that makes their intent clear to the user community, I'm in favor of
    gathering as many perspectives on Hadoop testing under one "roof" as
    possible.
    That's the goal.
