Copying common-dev.

Summarizing the discussion below: what should the tools layout be after Mavenization?

Option #1: Have hadoop-tools at the top level, i.e.:
trunk/
  hadoop-tools/
    hadoop-distcp/

Pros:
- Cleaner layout.
- In the future, tools could be released separately from Hadoop releases.

Cons: Difficult to maintain.

Option #2: Keep a tools aggregator module under MapReduce/HDFS/Common, depending on which of those projects the tools rely on. For example:
hadoop-mapreduce-project/
  hadoop-mr-tools/
    hadoop-distcp/

Pros: Easy to maintain.
Cons: Still has tight coupling with related projects.
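For concreteness, the difference is mainly in which parent POM lists the tools aggregator. A rough sketch (module names are illustrative, nothing is settled yet):

Option #1 adds to trunk/pom.xml:
  <module>hadoop-tools</module>

Option #2 adds to hadoop-mapreduce-project/pom.xml:
  <module>hadoop-mr-tools</module>

Either way, the aggregator POM itself would just list the individual tools, e.g. <module>hadoop-distcp</module>.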

Personally, I'm fine with either of the above options. Looking for suggestions and hoping to reach a consensus on this.

Thanks
Amareshwari

On 8/30/11 12:10 AM, "Allen Wittenauer" wrote:



I have a feeling this discussion should get moved to common-dev or even to general.

My #1 question is whether tools is basically contrib reborn. If not, what makes it different?
On Aug 29, 2011, at 1:43 AM, Amareshwari Sri Ramadasu wrote:

Some questions on making hadoop-tools top level under trunk,

1. Should the patches for tools be created against Hadoop Common?
2. What will happen to the tools test automation? Will it run as part of Hadoop Common tests?
3. Will it introduce a dependency from MapReduce to Common? Or is this taken care of in the Mavenization?


Thanks
Amareshwari

On 8/26/11 10:17 PM, "Alejandro Abdelnur" wrote:

Please don't add more Mavenization work for us (eventually I want to go back to coding).

Given that Hadoop is already Mavenized, the patch should be Mavenized.

What will have to be done in addition (besides Mavenizing distcp) is to create a hadoop-tools module at the root level and, within it, a hadoop-distcp module.

The hadoop-tools POM will look pretty much like the hadoop-common-project
POM.

The hadoop-distcp POM should follow the hadoop-common POM patterns.
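As a rough sketch, assuming it mirrors the hadoop-common-project aggregator (the parent coordinates and version here are my guesses, not settled):

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-main</artifactId>
    <version>0.23.0-SNAPSHOT</version>
  </parent>
  <artifactId>hadoop-tools</artifactId>
  <packaging>pom</packaging>
  <name>Apache Hadoop Tools</name>
  <!-- aggregator only: no sources of its own, just the list of tool modules -->
  <modules>
    <module>hadoop-distcp</module>
  </modules>
</project>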

Thanks.

Alejandro

On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
amarsri@yahoo-inc.com> wrote:
Agree with Mithun and Robert. DistCp and Tools restructuring are separate
tasks. Since DistCp code is ready to be committed, it need not wait for the
Tools separation from MR/HDFS.
I would say it can go into contrib as the patch is now, and when the tools
restructuring happens it would be just an svn mv. If there are no issues
with this proposal I can commit the code tomorrow.

Thanks
Amareshwari

On 8/26/11 7:45 PM, "Robert Evans" wrote:

I agree with Mithun. They are related but this goes beyond distcpv2 and
should not block distcpv2 from going in. It would be very nice, however, to
get the layout settled soon so that we all know where to find something when
we want to work on it.

Also, +1 for Alejandro's suggestion; I also prefer to keep tools at the trunk level.

Even though HDFS, Common, and MapReduce, and perhaps soon tools, are separate modules right now, there is still tight coupling between the different pieces, especially with tests. IMO, until we can reduce that coupling we should treat building and testing Hadoop as a single project instead of trying to keep them separate.

--Bobby

On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <mithun.radhakrishnan@yahoo.com>
wrote:

Would it be acceptable if retooling of tools/ were taken up separately? It
sounds to me like this might be a distinct (albeit related) task.

Mithun


________________________________
From: Giridharan Kesavan <gkesavan@hortonworks.com>
To: mapreduce-dev@hadoop.apache.org
Sent: Friday, August 26, 2011 12:04 PM
Subject: Re: DistCpV2 in 0.23

+1 to Alejandro's

I prefer to keep the hadoop-tools at trunk level.

-Giri

On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <tucu@cloudera.com>
wrote:
I'd suggest putting hadoop-tools either at the trunk/ level, or having a tools aggregator module for HDFS and another for Common.

I personally would prefer trunk/.

Thanks.

Alejandro

On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
amarsri@yahoo-inc.com> wrote:
Agree. It should be a separate Maven module (and the patch puts it as a separate Maven module now). A top level for Hadoop tools is nice to have, but it becomes hard to maintain until the patch automation runs the tests under tools. Currently we often see changes in HDFS affecting the RAID tests in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.

I propose we can have something like the following:

trunk/
- hadoop-mapreduce
  - hadoop-mr-client
  - hadoop-yarn
  - hadoop-tools
    - hadoop-streaming
    - hadoop-archives
    - hadoop-distcp

Thoughts?

@Eli and @JD, we did not replace the old legacy distcp because this is really a complete rewrite, and we did not want to remove it until users are familiar with the new one.

On 8/26/11 12:51 AM, "Todd Lipcon" wrote:

Maybe a separate top level for hadoop-tools? Stuff like RAID could go in there as well, i.e. tools that are downstream of MR and/or HDFS.

On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <
mahadev@hortonworks.com>
wrote:
+1 for a separate module in hadoop-mapreduce-project. I think hadoop-mapreduce-client might not be the right place for it. We might have to pick a new Maven module under hadoop-mapreduce-project that could host streaming/distcp/hadoop archives.

thanks
mahadev

On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <
tucu@cloudera.com>
wrote:
Agree, it should be a separate maven module.

And it should be under hadoop-mapreduce-client, right?

And now that we are on the topic, the same should go for streaming, no?
Thanks.

Alejandro

On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <todd@cloudera.com>
wrote:
On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <eli@cloudera.com>
wrote:
Nice work! I definitely think this should go in 23 and 20x.

Agree with JD that it should be in the core code, not contrib. If it's going to be maintained then we should put it in the core code.
Now that we're all mavenized, though, a separate Maven module and artifact does make sense IMO - i.e. "hadoop jar hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"

-Todd
--
Todd Lipcon
Software Engineer, Cloudera


--
Todd Lipcon
Software Engineer, Cloudera


--
-Giri


  • Vinod Kumar Vavilapalli at Aug 30, 2011 at 12:43 pm
    As long as hadoop-tools is in some directory at some depth under trunk, the release of hadoop-tools is tied to the release of core.

    So we actually have these two options instead:
    (1) Separate source tree (http://svn.apache.org/repos/asf/hadoop/tools)
    -- Sources at tools/trunk/hadoop-distcp
    -- Each tool will work with a specific version of Hadoop core.
    -- Releases can really be separate
    (2) Same source tree: trunk/
    -- Sources at either (2.1) trunk/hadoop-tools or (2.2) trunk/hadoop-mapreduce-project/hadoop-mr-tools/hadoop-distcp/
    -- Given the release isn't decoupled anyway, either will work. (2.2) is preferable if building MapReduce builds the tools also.

    +Vinod


  • Amareshwari Sri Ramadasu at Sep 6, 2011 at 7:14 am
    + Copying common-dev.

    On 9/6/11 10:58 AM, "Mithun Radhakrishnan" wrote:

    I'm leaning towards creating a trunk/hadoop-tools/hadoop-distcp (etc.). I'm hoping that's going to be acceptable to this forum. This way, moving it out to a separate source tree should be easier.

    It would be nice to have clarity on how tools will be dealt with. It'd be convenient to have distcp in trunk. (It's tiny and useful.) On the other hand, that might open the door to adding too much and complicating the build/release. I'd appreciate advice on which way is best.

    In the meantime, I'll align the distcpv2 pom.xml with the maven-ized version of things, as per Tucu's suggestions.
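    Roughly, that means giving the distcp POM the same shape as the other leaf modules. A sketch, assuming hadoop-project as the shared parent (the coordinates are my assumption, per the hadoop-common pattern):

    <project xmlns="http://maven.apache.org/POM/4.0.0">
      <modelVersion>4.0.0</modelVersion>
      <parent>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-project</artifactId>
        <version>0.23.0-SNAPSHOT</version>
      </parent>
      <artifactId>hadoop-distcp</artifactId>
      <packaging>jar</packaging>
      <name>Apache Hadoop DistCp</name>
      <!-- leaf module: the actual distcp sources and tests live here -->
    </project>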

    Mithun



  • Arun C Murthy at Sep 6, 2011 at 7:20 am
    +1

  • Vinod Kumar Vavilapalli at Sep 6, 2011 at 4:31 pm

    On Tue, Sep 6, 2011 at 10:58 AM, Mithun Radhakrishnan wrote:

    I'm leaning towards creating a trunk/hadoop-tools/hadoop-distcp (etc.). I'm
    hoping that's going to be acceptable to this forum. This way, moving it out
    to a separate source tree should be easier.

    +1 for moving forward with this proposal.

    We still need to answer Amareshwari's question (2), which she asked some time back, about the automated code compilation and test execution of the tools module. Right now we have separate automated builds for common, hdfs and mapreduce. If we go with the above proposal, we need to set up automated builds for the tools modules and possibly tie the related JIRA/Jenkins emails to the common-project lists.


    It would be nice to have clarity on how tools will be dealt with. It'd be convenient to have distcp in trunk. (It's tiny and useful.) On the other hand, that might open the door to adding too much and complicating the build/release. I'd appreciate advice on which way is best.

    In the meantime, I'll align the distcpv2 pom.xml with the maven-ized
    version of things, as per Tucu's suggestions.
    +1


    Thanks,
    +Vinod



  • Allen Wittenauer at Sep 6, 2011 at 5:11 pm

    On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
    We still need to answer Amareshwari's question (2), which she asked some time back, about the automated code compilation and test execution of the tools module.
    My #1 question is whether tools is basically contrib reborn. If not, what makes it different?

    I'm still waiting for this answer as well.

    Until then, I would be pretty much against a tools module. Changing the name of the dumping ground doesn't make it any less of a dumping ground.
  • Eli Collins at Sep 6, 2011 at 11:33 pm

    On Tue, Sep 6, 2011 at 10:11 AM, Allen Wittenauer wrote:
    On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
    We still need to answer Amareshwari's question (2), which she asked some time back, about the automated code compilation and test execution of the tools module.
    My #1 question is whether tools is basically contrib reborn. If not, what makes it different?

    I'm still waiting for this answer as well.

    Until then, I would be pretty much against a tools module. Changing the name of the dumping ground doesn't make it any less of a dumping ground.
    IMO if the tools module only gets stuff like distcp that's maintained, then it's not contrib; if it contains all the stuff from the current MR contrib, then tools is just a re-labeling of contrib. Given that this proposal only covers moving distcp to tools, it doesn't sound like contrib to me.

    Thanks,
    Eli
  • Allen Wittenauer at Sep 6, 2011 at 11:46 pm

    On Sep 6, 2011, at 4:32 PM, Eli Collins wrote:

    IMO if the tools module only gets stuff like distcp that's maintained, then it's not contrib; if it contains all the stuff from the current MR contrib, then tools is just a re-labeling of contrib. Given that this proposal only covers moving distcp to tools, it doesn't sound like contrib to me.
    At one point, everything in contrib was maintained. So I guess the big question is: what are the gating criteria for something to get into tools?
  • Eric Yang at Sep 7, 2011 at 1:38 am
    Option #2, proposed by Amareshwari, seems like the better proposal. We don't want to repeat the contrib history again with hadoop-tools. Having a generic module like hadoop-tools increases the risk of accumulating dead code. It would be better to categorize the HDFS- or MapReduce-specific tools in their respective subcategories. It is also easier to manage from a package/deployment perspective.

    regards,
    Eric
  • Alejandro Abdelnur at Sep 7, 2011 at 1:56 am
    Eric,

    Personally I'm fine either way.

    Still, I fail to see why generic vs. categorized tools would increase or reduce the risk of dead code, or how they would make packaging and deployment harder or easier.

    Would you please explain this?

    Thanks.

    Alejandro
  • Vinod Kumar Vavilapalli at Sep 7, 2011 at 1:33 pm
    There are a bunch of so-called tools in hadoop-mapreduce-project/src/tools - DistCp, HadoopArchives, Rumen etc. And contrib projects are in src/contrib in all of the common, hdfs and mapred source trees. Not sure how the distinction was ever made.

    The last time we had a discussion about moving contrib projects out of the core, we didn't reach any consensus - *http://s.apache.org/HadoopContribDiscussion*. Do we want to revive that discussion now? Or do we want to keep the status quo, imitate the source structure of the present-day tools and contrib, but move them to appropriate Maven modules, and have that discussion separately?

    I personally prefer the latter, given the length and the eventual failure of the previous discussion.

    HADOOP-7590 is a related issue where the src location of contribs like gridmix, streaming etc. is being talked about. I suppose that issue and this thread ought to converge.

    Thanks,
    +Vinod
  • Eric Yang at Sep 7, 2011 at 5:50 pm
    MapReduce and HDFS are distinct functions of Hadoop. They are loosely coupled. If we have a tools aggregator module, it will not have as clear and distinct a function as the other Hadoop modules. Hence, it is possible for a tool to depend on both HDFS and MapReduce. If something breaks in the tools module, it is unclear which subproject is responsible for maintaining it. Therefore, it is safer to send tools to the Incubator or Apache Extras rather than deposit utility tools in a tools subcategory. There are many short-lived projects that attempt to associate themselves with Hadoop but are not maintained. It would be better to spin off those utility projects than to use Hadoop as a dumping ground.

    In the previous discussion about removing contrib, most people were in favor of doing so, and only a few contrib owners were reluctant to remove it. Even fewer people have participated in restoring the functionality of broken contrib projects. History speaks for itself. -1 (non-binding) for hadoop-tools.

    regards,
    Eric
  • Amareshwari Sri Ramadasu at Sep 8, 2011 at 4:35 am
    It is good to have the hadoop-tools module separate. But as I asked before, we need to answer some questions here. I'm trying to answer them myself; comments are welcome.
    1. Should the patches for tools be created against Hadoop Common?
    Here I meant: should the Hadoop Common mailing list be used, or should we have a separate mailing list for tools? I agree with Vinod here that we can tie it to the Hadoop Common JIRA/mailing lists.
    2. What will happen to the tools test automation? Will it run as part of the Hadoop Common tests?
    Jenkins nightly/patch builds for Hadoop tools can run as part of Hadoop Common if we use the Hadoop Common mailing list for this.
    Also, I propose that every patch build of HDFS and MAPREDUCE should run the tools tests as well, to make sure nothing is broken. That would ease maintenance of the hadoop-tools module. I presume the tools tests should not take much time (something like no more than 30 minutes).
    3. Will it introduce a dependency from MapReduce to Common? Or is this taken care of in the Mavenization?
    I'm not sure whether the Mavenization can take care of this.

    Thanks
    Amareshwari

    On 9/8/11 9:13 AM, "Rottinghuis, Joep" wrote:

    Does a separate hadoop-tools module imply that there will be a separate Jenkins build as well?

    Thanks,

    Joep
    ________________________________________
    From: Alejandro Abdelnur [tucu@cloudera.com]
    Sent: Wednesday, September 07, 2011 11:35 AM
    To: mapreduce-dev@hadoop.apache.org
    Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

    Makes sense
    On Wed, Sep 7, 2011 at 11:32 AM, wrote:

    +1 for a separate hadoop-tools module. However, if a tool is broken at release time and no one comes forward to fix it, it should be removed. (i.e. unlike contrib modules, where build and test failures were tolerated.)

    - milind
    On 9/7/11 11:27 AM, "Mahadev Konar" wrote:

    I like the idea of having tools as a separate module, and I don't think that it will be a dumping ground unless we choose to make it one.

    +1 for hadoop tools module under trunk.

    thanks
    mahadev

    On Wed, Sep 7, 2011 at 11:18 AM, Alejandro Abdelnur <tucu@cloudera.com>
    wrote:
    Agreed, we should not have a dumping ground. IMO, what would go into hadoop-tools (i.e. distcp, streaming, and someone could argue for FsShell as well) are effectively Hadoop CLI utilities. Having them in a separate module rather than in the core modules (common, hdfs, mapreduce) does not mean that they are secondary things, just modularization. Also, it will help get those tools to use the public interfaces of the core modules, and when we finally have a clean hadoop-client layer, those tools should only depend on that.

    Finally, the fact that tools would end up under trunk/hadoop-tools does not prevent the HDFS and MAPREDUCE packaging from bundling the same or different tools.

    +1 for hadoop-tools/ (not binding)

    Thanks.

  • Rottinghuis, Joep at Sep 9, 2011 at 5:26 am
    If hadoop-tools will be built as part of hadoop-common, then none of these tools should be allowed to have a dependency on hdfs or mapreduce. The converse is also true: when tools do have such a dependency, they cannot be built as part of hadoop-common. We cannot have circular dependencies like that.

    That is probably obvious, but I'm just saying...
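    For illustration, a hadoop-distcp POM would carry dependencies roughly like the following (artifact names assumed from the mavenized trunk; versions would come from the parent's dependencyManagement), which is exactly why it cannot live inside the hadoop-common reactor:

    <dependencies>
      <!-- a tool like distcp sits downstream of all three core modules -->
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
      </dependency>
    </dependencies>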

    Joep
    ________________________________________
    From: Amareshwari Sri Ramadasu [amarsri@yahoo-inc.com]
    Sent: Wednesday, September 07, 2011 9:33 PM
    To: mapreduce-dev@hadoop.apache.org
    Cc: common-dev@hadoop.apache.org
    Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

    It is good to have the hadoop-tools module separately. But as I asked before, we need to answer some questions here. I'm trying to answer them myself. Comments are welcome.
    1. Should the patches for tools be created against Hadoop Common?
    Here, I meant: should the Hadoop Common mailing list be used, or should we have a separate mailing list for Tools? I agree with Vinod here that we can tie it to the Hadoop Common JIRA/mailing lists.
    2. What will happen to the tools test automation? Will it run as part of Hadoop Common tests?
    Jenkins nightly/patch builds for Hadoop tools can run as part of Hadoop Common if we use the Hadoop Common mailing list for this.
    Also, I propose that every patch build of HDFS and MAPREDUCE should also run the tools tests, to make sure nothing is broken. That would ease the maintenance of the hadoop-tools module. I presume the tools tests should not take much time (something like not more than 30 minutes).
    3. Will it introduce a dependency from MapReduce to Common? Or is this taken care of in Mavenization?
    I'm not sure whether Mavenization can take care of this.

    Thanks
    Amareshwari
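
    One way the nightly/patch builds could pick up the tools tests without
    a separate job is the full reactor build from trunk. A sketch of a
    trunk/pom.xml modules list (module names assumed, not the actual POM):

        <modules>
          <module>hadoop-common-project</module>
          <module>hadoop-hdfs-project</module>
          <module>hadoop-mapreduce-project</module>
          <module>hadoop-tools</module>
        </modules>
        <!-- Listing hadoop-tools in the top-level reactor means a plain
             "mvn test" from trunk also runs the tools tests. -->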

    On 9/8/11 9:13 AM, "Rottinghuis, Joep" wrote:

    Does a separate hadoop-tools module imply that there will be a separate Jenkins build as well?

    Thanks,

    Joep
    ________________________________________
    From: Alejandro Abdelnur [tucu@cloudera.com]
    Sent: Wednesday, September 07, 2011 11:35 AM
    To: mapreduce-dev@hadoop.apache.org
    Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

    Makes sense
    On Wed, Sep 7, 2011 at 11:32 AM, wrote:

    +1 for separate hadoop-tools module. However, if a tool is broken at
    release time, and no one comes forward to fix it, it should be removed.
    (i.e., unlike contrib modules, where build and test failures were
    tolerated.)

    - milind
    On 9/7/11 11:27 AM, "Mahadev Konar" wrote:

    I like the idea of having tools as a separate module and I don't think
    that it will be a dumping ground unless we choose to make it one.

    +1 for hadoop tools module under trunk.

    thanks
    mahadev

    On Wed, Sep 7, 2011 at 11:18 AM, Alejandro Abdelnur <tucu@cloudera.com>
    wrote:
    Agreed, we should not have a dumping ground. IMO, what would go into
    hadoop-tools (i.e. distcp, streaming, and someone could argue for
    FsShell as well) are effectively Hadoop CLI utilities. Having them in
    a separate module rather than in the core modules (common, hdfs,
    mapreduce) does not mean that they are secondary things, just
    modularization. It will also help get those tools to use the public
    interfaces of the core modules, and when we finally have a clean
    hadoop-client layer, those tools should depend only on that.

    Finally, the fact that tools would end up under trunk/hadoop-tools
    does not prevent the HDFS and MAPREDUCE packaging from bundling the
    same or different tools.

    +1 for hadoop-tools/ (not binding)

    Thanks.
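
    A sketch of the end state Alejandro describes, with a tool depending
    only on a client-facing artifact; note the hadoop-client artifact
    named here is hypothetical at this point:

        <!-- hypothetical hadoop-tools/hadoop-distcp/pom.xml fragment -->
        <dependencies>
          <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${project.version}</version>
          </dependency>
        </dependencies>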

    On Wed, Sep 7, 2011 at 10:50 AM, Eric Yang wrote:

    MapReduce and HDFS are distinct functions of Hadoop. They are loosely
    coupled. If we have a tools aggregator module, it will not have as
    clear and distinct a function as the other Hadoop modules. Hence, it
    is possible for a tool to depend on both HDFS and MapReduce. If
    something breaks in the tools module, it is unclear which subproject
    is responsible for maintaining it. Therefore, it is safer to send
    tools to the Incubator or Apache Extras rather than deposit the
    utility tools in a tools subcategory. There are many short-lived
    projects that attempt to associate themselves with Hadoop but are not
    being maintained. It would be better to spin off those utility
    projects than use Hadoop as a dumping ground.

    In the previous discussion about removing contrib, most people were in
    favor of doing so, and only a few contrib owners were reluctant. Even
    fewer people have participated in restoring the functionality of
    broken contrib projects. History speaks for itself.
    -1 (non-binding) for hadoop-tools.

    regards,
    Eric

  • Vinod Kumar Vavilapalli at Sep 12, 2011 at 1:51 pm
    Alright, I think we've discussed enough on this, and everybody seems
    to agree about a top-level hadoop-tools module.

    Time to get into action. I've filed HADOOP-7624. Amareshwari, we can
    track the rest of the implementation-related details and your specific
    questions there.

    Thanks everyone for putting in your thoughts here.
    +Vinod

  • Alejandro Abdelnur at Oct 18, 2011 at 7:42 pm
    Following up on this one: the hadoop-tools/ module is already in
    trunk, so the DistCp v2 addition can start.

    Thanks.

    Alejandro
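
    Concretely, the addition would amount to a new module entry in the
    hadoop-tools aggregator plus a child POM along these lines (a sketch;
    the artifact ids, parent, and version are assumed, not the committed
    POMs):

        <!-- hadoop-tools/pom.xml (aggregator) -->
        <modules>
          <module>hadoop-distcp</module>
        </modules>

        <!-- hadoop-tools/hadoop-distcp/pom.xml -->
        <project xmlns="http://maven.apache.org/POM/4.0.0">
          <modelVersion>4.0.0</modelVersion>
          <parent>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-tools</artifactId>
            <version>0.23.0-SNAPSHOT</version>
          </parent>
          <artifactId>hadoop-distcp</artifactId>
          <name>Apache Hadoop DistCp</name>
          <packaging>jar</packaging>
        </project>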
