Following up on the Hadoop Common mavenization (HADOOP-6671), I've just posted
a patch for HDFS mavenization (HDFS-2096).

The HADOOP-6671 patch integrates all feedback received in the JIRA and, IMO,
it is ready for prime time.

In order not to break HDFS and MAPRED, which are still Ant based, there are 2
patches, HDFS-2196 & MAPREDUCE-2741, that make some corrections in the Ivy
configuration so it works correctly with the Hadoop Common JAR (built/published
by the Mavenized build).

HDFS-2096 is not 100% ready: some testcases are failing and native code
testing is not wired, but everything else (compile, test, package, tar,
binary, jdiff, etc.) is wired.

* https://issues.apache.org/jira/browse/HADOOP-6671
* https://issues.apache.org/jira/browse/HDFS-2196
* https://issues.apache.org/jira/browse/MAPREDUCE-2741
* https://issues.apache.org/jira/browse/HDFS-2096

I know these are big changes and we'll have some hiccups, but the benefits
are big (running testcases is faster, the build works easily from IDEs, and a
Maven build system can be understood by anybody who knows Maven).

Keeping the patches current is time-consuming, so it would be great if we
could get in the ones that are ready (HADOOP-6671, HDFS-2196,
MAPREDUCE-2741) so we can focus on the rest of the Mavenization work.

Thanks.

Alejandro


  • Rottinghuis, Joep at Jul 29, 2011 at 2:11 am
    Alejandro,

    Are you considering the use case where people want to locally build a consistent set of common, hdfs, and mapreduce without the downstream projects depending on published Maven SNAPSHOTS?
    I'm working to get this going on 0.22 right now (see HDFS-843, HDFS-2214, and I'll have to file two equivalent bugs on mapreduce).

    Part of the problem is that the assumption was that people always compile hdfs against hadoop-common-0.xyz-SNAPSHOT.
    When applying one patch at a time from Jira attachments that may be fine.

    If I set up a Jenkins build I will want to make sure that hadoop-common builds first with a new build number (not a snapshot), then hdfs against that same build number, then mapreduce against hadoop-common and hdfs.
    Otherwise you can get a situation where the mapreduce build is still running while the hadoop-common build has already produced a new snapshot build.

    Local caching in ~/.m2 and ~/.ivy2 repos makes this situation even more complex.

    Having the ability to build without Internet connectivity is not just for laptops on the go. In corporate environments one may not want a build server to have Internet connectivity.
    In that case should one do a build on a machine with connectivity first and then fork-lift the ~/.m2/repository directory over?
    Should any hadoop-common, hadoop-hdfs and hadoop-mapreduce artifacts be purged in that case (since they should be rebuilt locally)?
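
    (For reference, one way to do that purge on a forklifted box, assuming the standard org.apache.hadoop coordinates and the default cache locations, would be something like:)

    rm -rf ~/.m2/repository/org/apache/hadoop    # Maven-built Hadoop artifacts
    rm -rf ~/.ivy2/cache/org.apache.hadoop       # Ivy-cached Hadoop artifacts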

    Thanks,

    Joep

    -----Original Message-----
    From: Alejandro Abdelnur
    Sent: Thursday, July 28, 2011 4:41 PM
    To: general@hadoop.apache.org
    Subject: follow up Hadoop mavenization work

    Following up with Hadoop Common mavenization (HADOOP-6671) I've just posted a patch for HDFS mavenization (HDFS-2096)

    The HADOOP-6671 patch integrates all feedback received in the JIRA and, IMO, it is ready for prime time.

    In order not to break HDFS and MAPRED, which are still Ant based, there are 2 patches, HDFS-2196 & MAPREDUCE-2741, that make some corrections in the Ivy configuration so it works correctly with the Hadoop Common JAR (built/published by the Mavenized build).

    HDFS-2096 is not 100% ready: some testcases are failing and native code testing is not wired, but everything else (compile, test, package, tar, binary, jdiff, etc.) is wired.

    * https://issues.apache.org/jira/browse/HADOOP-6671
    * https://issues.apache.org/jira/browse/HDFS-2196
    * https://issues.apache.org/jira/browse/MAPREDUCE-2741
    * https://issues.apache.org/jira/browse/HDFS-2096

    I know these are big changes and we'll have some hiccups, but the benefits are big (running testcases is faster, it easily works from IDEs, Maven build system can easily be understood by anybody that knows Maven).

    Keeping the patches current is time-consuming, because of this, it would be great if we can get in the ones ready (HADOOP-6671, HDFS-2196,
    MAPREDUCE-2741) so we can focus on the rest of the Mavenization work.

    Thanks.

    Alejandro
  • Alejandro Abdelnur at Jul 29, 2011 at 3:01 pm
    Hi Joep,

    Regarding the question in your first paragraph, this is a byproduct of using Maven.

    Regarding using a particular build: if you are using SNAPSHOT versions it
    means you want to be on the 'head', i.e. you are in development mode. Of
    course you can, as you indicate, pin to a particular SNAPSHOT artifact by
    using its exact timestamp, but the latter is something you normally would not
    do. If I had to do something like that I would use some kind of nano
    version (the problem with this is the ordering) or another version qualifier.
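
    (Purely as an illustration of what pinning means; the repository path and version below are assumptions, not real coordinates:)

    # The timestamped snapshot builds you could pin to are listed in the snapshot
    # repo's maven-metadata.xml for the artifact, e.g.:
    curl -s https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-common/0.23.0-SNAPSHOT/maven-metadata.xml
    # A pinned dependency would then use a version such as 0.23.0-20110729.164500-12
    # (made-up timestamp/build number) instead of the floating 0.23.0-SNAPSHOT.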

    Both Ant/Ivy & Maven download and cache the JARs of dependencies (and, in the
    case of Maven, the plugin JARs used for the build). After you do a full
    build with all profiles active (to make sure all plugin JARs have been
    used), you can tar ~/.m2 and use it when in disconnected mode. Another
    thing you can do (recommended) is to set up a Maven repo proxy in your
    disconnected network and configure all your devel boxes to use the proxy.
    You could seed the proxy with a full ~/.m2 or by doing a clean build (no
    ~/.m2 on your devel box) through the proxy.
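
    (Roughly, the tar-the-repo flow looks like this; the profile name is illustrative, use whatever profiles the build actually defines:)

    mvn clean install -Pnative          # warm the cache with all profiles active
    tar czf m2-repo.tar.gz -C ~ .m2     # carry this archive to the disconnected network
    tar xzf m2-repo.tar.gz -C ~         # ...unpack it there...
    mvn --offline clean install         # ...and build without touching the network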

    Finally, if you are building from trunk/ you are using ALL artifacts produced
    by the whole project; if you are building from trunk/hdfs/ or trunk/mapred/
    you are using previously published artifacts (installed in ~/.m2 or deployed
    to a Maven repo) for all the modules that are not part of the current build.
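
    (In command form the difference is roughly:)

    (cd trunk && mvn install)          # whole tree: modules use each other's in-build artifacts
    (cd trunk/hdfs && mvn compile)     # single project: hadoop-common is resolved from ~/.m2
                                       # or a remote Maven repo, so it must be published first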

    Hope this clarifies your concerns.

    Thanks.

    Alejandro
    On Thu, Jul 28, 2011 at 7:10 PM, Rottinghuis, Joep wrote:

    Alejandro,

    Are you trying the use-case when people will want to locally build a
    consistent set of common, hdfs, and mapreduce without the downstream
    projects depending on published Maven SNAPSHOTS?
    I'm working to get this going on 0.22 right now (see HDFS-843, HDFS-2214,
    and I'll have to file two equivalent bugs on mapreduce).

    Part of the problem is that the assumption was that people always compile
    hdfs against hadoop-common-0.xyz-SNAPSHOT.
    When applying one patch at a time from Jira attachments that may be fine.

    If I set up a Jenkins build I will want to make sure that first
    hadoop-common builds with a new build number (not snapshot), then hdfs
    against that same build number, then mapreduce against hadoop-common and
    hdfs.
    Otherwise you can get a situation when the mapreduce build is still running
    and hadoop-common build has already produced a new snapshot build.

    Local caching in ~/.m2 and ~/.ivy2 repos makes this situation even more
    complex.

    Having the ability to build without Internet connectivity is not just for
    laptops on the go. For corporate environments one may not want to have a
    build server have Internet connectivity.
    In that case should one do a build on a machine with connectivity first and
    then fork-lift the ~/.m2/repository directory over?
    Should any hadoop-common, hadoop-hdfs and hadoop-mapreduce artifacts be
    purged in that case (since they should be rebuilt locally)?

    Thanks,

    Joep

    -----Original Message-----
    From: Alejandro Abdelnur
    Sent: Thursday, July 28, 2011 4:41 PM
    To: general@hadoop.apache.org
    Subject: follow up Hadoop mavenization work

    Following up with Hadoop Common mavenization (HADOOP-6671) I've just posted
    a patch for HDFS mavenization (HDFS-2096)

    The HADOOP-6671 patch integrates all feedback received in the JIRA and,
    IMO, it is ready for prime time.

    In order not to break HDFS and MAPRED, which are still Ant based, there are 2
    patches, HDFS-2196 & MAPREDUCE-2741, that make some corrections in the Ivy
    configuration so it works correctly with the Hadoop Common JAR (built/published
    by the Mavenized build).

    HDFS-2096 is not 100% ready: some testcases are failing and native code
    testing is not wired, but everything else (compile, test, package, tar,
    binary, jdiff, etc.) is wired.

    * https://issues.apache.org/jira/browse/HADOOP-6671
    * https://issues.apache.org/jira/browse/HDFS-2196
    * https://issues.apache.org/jira/browse/MAPREDUCE-2741
    * https://issues.apache.org/jira/browse/HDFS-2096

    I know these are big changes and we'll have some hiccups, but the benefits
    are big (running testcases is faster, it easily works from IDEs, Maven build
    system can easily be understood by anybody that knows Maven).

    Keeping the patches current is time-consuming, because of this, it would be
    great if we can get in the ones ready (HADOOP-6671, HDFS-2196,
    MAPREDUCE-2741) so we can focus on the rest of the Mavenization work.

    Thanks.

    Alejandro
  • Scott Carey at Jul 29, 2011 at 9:16 pm

    On 7/29/11 8:00 AM, "Alejandro Abdelnur" wrote:
    Hi Joep,

    Regarding your first paragraph question, this is a byproduct of using
    maven.
    It should be possible to clean your local repo, then type 'mvn compile' on
    the top level pom (aggregator of common, mapreduce, hdfs) and have that
    succeed without ever publishing any SNAPSHOTs from projects contained in
    the aggregator.

    Otherwise the project is misconfigured, or there is a buggy plugin in use.

    Likewise for all other lifecycle goals prior to 'install'. These should
    succeed without any prior published SNAPSHOTS from modules contained in
    the build.

    However, if you build only a submodule from the submodule pom (for example,
    moving down into hdfs), it will require a published snapshot of
    dependencies such as common.
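
    (Concretely, something like the following, assuming the standard local-repo layout and submodule directory name:)

    rm -rf ~/.m2/repository/org/apache/hadoop   # start with no published Hadoop snapshots
    mvn compile                                 # from the top-level aggregator pom: should succeed
    (cd hdfs && mvn compile)                    # from the submodule alone: fails until a
                                                # hadoop-common snapshot is installed/deployed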
  • Rottinghuis, Joep at Jul 30, 2011 at 12:11 am
    Sounds good. I will give this a try on trunk after the patch merges in.

    Thanks,

    Joep

    -----Original Message-----
    From: Scott Carey
    Sent: Friday, July 29, 2011 2:18 PM
    To: general@hadoop.apache.org
    Subject: Re: follow up Hadoop mavenization work


    On 7/29/11 8:00 AM, "Alejandro Abdelnur" wrote:

    Hi Joep,

    Regarding your first paragraph question, this is a byproduct of using
    maven.
    It should be possible to clean your local repo, then type 'mvn compile' on the top level pom (aggregator of common, mapreduce, hdfs) and have that succeed without ever publishing any SNAPSHOTs from projects contained in the aggregator.

    Otherwise the project is misconfigured, or there is a buggy plugin in use.

    Likewise for all other lifecycle goals prior to 'install'. These should succeed without any prior published SNAPSHOTS from modules contained in the build.

    However, if you build only a submodule from the submodule pom, (example, move down into hdfs), it will require a published snapshot from dependencies such as common.
  • Steve Loughran at Jul 29, 2011 at 3:33 pm

    On 29/07/11 03:10, Rottinghuis, Joep wrote:
    Alejandro,

    Are you trying the use-case when people will want to locally build a consistent set of common, hdfs, and mapreduce without the downstream projects depending on published Maven SNAPSHOTS?
    I'm working to get this going on 0.22 right now (see HDFS-843, HDFS-2214, and I'll have to file two equivalent bugs on mapreduce).

    Part of the problem is that the assumption was that people always compile hdfs against hadoop-common-0.xyz-SNAPSHOT.
    When applying one patch at a time from Jira attachments that may be fine.

    If I set up a Jenkins build I will want to make sure that first hadoop-common builds with a new build number (not snapshot), then hdfs against that same build number, then mapreduce against hadoop-common and hdfs.
    Otherwise you can get a situation when the mapreduce build is still running and hadoop-common build has already produced a new snapshot build.

    Local caching in ~/.m2 and ~/.ivy2 repos makes this situation even more complex.
    One option here is to set up more than one virtual machine (the CentOS 6.0
    minimal installs are pretty lightweight) and delegate work to these Jenkins
    instances, forcing different branches onto different virtual hosts and
    having Jenkins build things serially on a single machine. That ensures a
    strict order and isolates you. You can even have Ant targets to purge the
    repository caches.

    I have some CentOS VMs set up to do release work on my desktop, as it
    ensures that I never release under-development code; the functional test
    runs don't interfere with my desktop test runs, and I can keep editing
    the code. It works OK if you have enough RAM and HDD to spare.

    -steve
  • Rottinghuis, Joep at Jul 30, 2011 at 12:10 am
    Thanks for the replies.

    To elaborate on why I want to build on a server w/o Internet access:
    The build should not reach out to the Internet and grab jars from unverified sources w/o an md5 hash check, etc.
    The resulting code will run on a large production cluster with sensitive/private data. From a compliance and risk perspective I want to be able to control which jars get pulled in from where.

    Manual verification of ~/.m2, then tar.gz and scp to the build server, is an acceptable workaround.
    A Maven proxy simply bypasses the firewalls, which are there for good reason.
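
    (Roughly, that workaround looks like the following; the host name is a placeholder:)

    find ~/.m2/repository -name '*.jar' -exec shasum {} \; > m2-manifest.txt   # review/verify this list
    tar czf m2-repo.tar.gz -C ~ .m2
    scp m2-repo.tar.gz m2-manifest.txt buildserver:                            # fork-lift to the build server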

    Looking forward to trying this all on trunk after the patch is committed. Until then I'll work on making this work on 0.22.

    Thanks,

    Joep

    -----Original Message-----
    From: Steve Loughran
    Sent: Friday, July 29, 2011 8:32 AM
    To: general@hadoop.apache.org
    Subject: Re: follow up Hadoop mavenization work
    On 29/07/11 03:10, Rottinghuis, Joep wrote:
    Alejandro,

    Are you trying the use-case when people will want to locally build a consistent set of common, hdfs, and mapreduce without the downstream projects depending on published Maven SNAPSHOTS?
    I'm working to get this going on 0.22 right now (see HDFS-843, HDFS-2214, and I'll have to file two equivalent bugs on mapreduce).

    Part of the problem is that the assumption was that people always compile hdfs against hadoop-common-0.xyz-SNAPSHOT.
    When applying one patch at a time from Jira attachments that may be fine.

    If I set up a Jenkins build I will want to make sure that first hadoop-common builds with a new build number (not snapshot), then hdfs against that same build number, then mapreduce against hadoop-common and hdfs.
    Otherwise you can get a situation when the mapreduce build is still running and hadoop-common build has already produced a new snapshot build.

    Local caching in ~/.m2 and ~/.ivy2 repos makes this situation even more complex.
    One option here is to set up >1 virtual machine (The centos 6.0 minimal are pretty lightweight) and delegate work to these jenkins instances, forcing different branches into different virtual hosts, and jenkins to build stuff serially on a single machine. That ensures a strict order and isolates you. You can even have ant targets to purge the repository caches.

    I have some Centos VMs set up to do release work on my desktop as it ensures that I never release under-development code; the functional test runs don't interfere with my desktop test runs, and I can keep editing the code. It works OK, if you have enough RAM and HDD to spare

    -steve
  • Alejandro Abdelnur at Jul 30, 2011 at 12:18 am
    Joep,

    Ivy & Maven pull JARs from the Maven repos you specify.

    Maven verifies checksums, and I assume Ivy does too.

    You could turn your verified ~/.m2 into a Maven proxy and switch off
    fetching JARs that are not found in the proxy cache.

    Bottom line: for your concerns Ivy and Maven are equally good or bad.

    Thanks.

    Alejandro
    On Fri, Jul 29, 2011 at 5:09 PM, Rottinghuis, Joep wrote:

    Thanks for the replies.

    To elaborate on why I want to build on a server w/o Internet access:
    Build should not reach out to Internet and grab jars from unverified
    sources w/o md5 hash check etc.
    The resulting code will run on a large production cluster with
    sensitive/private data. From a compliance and risk perspective I want to be
    able to control which jars get pulled in from where.

    Manual verification of ~/.m2, tar.gz and scp to build server is an
    acceptable workaround.
    Maven proxy simply bypasses the firewalls which were there for good reason.

    Looking forward to try this all on trunk after patch is committed. Until
    then I'll work on making this function on 0.22.

    Thanks,

    Joep

    -----Original Message-----
    From: Steve Loughran
    Sent: Friday, July 29, 2011 8:32 AM
    To: general@hadoop.apache.org
    Subject: Re: follow up Hadoop mavenization work
    On 29/07/11 03:10, Rottinghuis, Joep wrote:
    Alejandro,

    Are you trying the use-case when people will want to locally build a
    consistent set of common, hdfs, and mapreduce without the downstream
    projects depending on published Maven SNAPSHOTS?
    I'm working to get this going on 0.22 right now (see HDFS-843, HDFS-2214,
    and I'll have to file two equivalent bugs on mapreduce).
    Part of the problem is that the assumption was that people always compile
    hdfs against hadoop-common-0.xyz-SNAPSHOT.
    When applying one patch at a time from Jira attachments that may be fine.

    If I set up a Jenkins build I will want to make sure that first
    hadoop-common builds with a new build number (not snapshot), then hdfs
    against that same build number, then mapreduce against hadoop-common and
    hdfs.
    Otherwise you can get a situation when the mapreduce build is still
    running and hadoop-common build has already produced a new snapshot build.
    Local caching in ~/.m2 and ~/.ivy2 repos makes this situation even more
    complex.

    One option here is to set up >1 virtual machine (The centos 6.0 minimal are
    pretty lightweight) and delegate work to these jenkins instances, forcing
    different branches into different virtual hosts, and jenkins to build stuff
    serially on a single machine. That ensures a strict order and isolates you.
    You can even have ant targets to purge the repository caches.

    I have some Centos VMs set up to do release work on my desktop as it
    ensures that I never release under-development code; the functional test
    runs don't interfere with my desktop test runs, and I can keep editing the
    code. It works OK, if you have enough RAM and HDD to spare

    -steve
  • Rottinghuis, Joep at Jul 30, 2011 at 1:45 am
    Agreed.

    I was not arguing against Mavenization. I was merely explaining what my use case is and why.

    Thanks,

    Joep

    -----Original Message-----
    From: Alejandro Abdelnur
    Sent: Friday, July 29, 2011 5:17 PM
    To: general@hadoop.apache.org
    Subject: Re: follow up Hadoop mavenization work

    Joep,

    Ivy & Maven pull JARs from the maven repos you specify.

    Maven verifies checksums and I assume Ivy does.

    You could turn your verified ~/.m2 into a Maven proxy and switch fetching JARs not found in the proxy cache.

    Bottom line, for you concerns Ivy and Maven are equally good or bad.

    Thanks.

    Alejandro
    On Fri, Jul 29, 2011 at 5:09 PM, Rottinghuis, Joep wrote:

    Thanks for the replies.

    To elaborate on why I want to build on a server w/o Internet access:
    Build should not reach out to Internet and grab jars from unverified
    sources w/o md5 hash check etc.
    The resulting code will run on a large production cluster with
    sensitive/private data. From a compliance and risk perspective I want
    to be able to control which jars get pulled in from where.

    Manual verification of ~/.m2, tar.gz and scp to build server is an
    acceptable workaround.
    Maven proxy simply bypasses the firewalls which were there for good reason.

    Looking forward to try this all on trunk after patch is committed.
    Until then I'll work on making this function on 0.22.

    Thanks,

    Joep

    -----Original Message-----
    From: Steve Loughran
    Sent: Friday, July 29, 2011 8:32 AM
    To: general@hadoop.apache.org
    Subject: Re: follow up Hadoop mavenization work
    On 29/07/11 03:10, Rottinghuis, Joep wrote:
    Alejandro,

    Are you trying the use-case when people will want to locally build a
    consistent set of common, hdfs, and mapreduce without the downstream
    projects depending on published Maven SNAPSHOTS?
    I'm working to get this going on 0.22 right now (see HDFS-843,
    HDFS-2214,
    and I'll have to file two equivalent bugs on mapreduce).
    Part of the problem is that the assumption was that people always
    compile
    hdfs against hadoop-common-0.xyz-SNAPSHOT.
    When applying one patch at a time from Jira attachments that may be fine.

    If I set up a Jenkins build I will want to make sure that first
    hadoop-common builds with a new build number (not snapshot), then hdfs
    against that same build number, then mapreduce against hadoop-common
    and hdfs.
    Otherwise you can get a situation when the mapreduce build is still
    running and hadoop-common build has already produced a new snapshot build.
    Local caching in ~/.m2 and ~/.ivy2 repos makes this situation even
    more
    complex.

    One option here is to set up >1 virtual machine (The centos 6.0
    minimal are pretty lightweight) and delegate work to these jenkins
    instances, forcing different branches into different virtual hosts,
    and jenkins to build stuff serially on a single machine. That ensures a strict order and isolates you.
    You can even have ant targets to purge the repository caches.

    I have some Centos VMs set up to do release work on my desktop as it
    ensures that I never release under-development code; the functional
    test runs don't interfere with my desktop test runs, and I can keep
    editing the code. It works OK, if you have enough RAM and HDD to spare

    -steve
  • Steve Loughran at Aug 2, 2011 at 10:31 am

    On 30/07/11 01:17, Alejandro Abdelnur wrote:
    Joep,

    Ivy & Maven pull JARs from the maven repos you specify.

    Maven verifies checksums and I assume Ivy does.
    by default? If so, that's an improvement.
    You could turn your verified ~/.m2 into a Maven proxy and switch fetching
    JARs not found in the proxy cache.

    Bottom line, for you concerns Ivy and Maven are equally good or bad.
    If you could set them up to retrieve checksums from elsewhere, life would
    be better. Even if the ASF won't host GPL or LGPL artifacts, there's no
    reason why we couldn't host the checksums.
  • Ted Dunning at Jul 30, 2011 at 5:53 pm
    Actually this is a prime case for the proxy. All of the major maven proxies
    allow you to lock down their contents after filling them however you like.
    This actually provides better control than the .m2 process since you can
    easily wind up with multiple versions of the .m2 directory on different
    build machines. Once you have locked down the maven proxy, you don't even
    have to allow it to touch the internet and it can provide version
    standardization across all of your build machines.
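
    (One concrete way to wire that up, with placeholder names for the proxy URL and settings file; a settings file whose <mirrors> section contains a single <mirror> with <mirrorOf>*</mirrorOf> pointing at the internal proxy sends every dependency request through it:)

    # URL and path below are placeholders for your Nexus/Archiva/etc. instance
    mvn -s /etc/maven/locked-down-settings.xml clean install
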
    On Fri, Jul 29, 2011 at 5:09 PM, Rottinghuis, Joep wrote:

    Thanks for the replies.

    To elaborate on why I want to build on a server w/o Internet access:
    Build should not reach out to Internet and grab jars from unverified
    sources w/o md5 hash check etc.
    The resulting code will run on a large production cluster with
    sensitive/private data. From a compliance and risk perspective I want to be
    able to control which jars get pulled in from where.

    Manual verification of ~/.m2, tar.gz and scp to build server is an
    acceptable workaround.
    Maven proxy simply bypasses the firewalls which were there for good reason.

    Looking forward to try this all on trunk after patch is committed. Until
    then I'll work on making this function on 0.22.

    Thanks,

    Joep

    -----Original Message-----
    From: Steve Loughran
    Sent: Friday, July 29, 2011 8:32 AM
    To: general@hadoop.apache.org
    Subject: Re: follow up Hadoop mavenization work
    On 29/07/11 03:10, Rottinghuis, Joep wrote:
    Alejandro,

    Are you trying the use-case when people will want to locally build a
    consistent set of common, hdfs, and mapreduce without the downstream
    projects depending on published Maven SNAPSHOTS?
    I'm working to get this going on 0.22 right now (see HDFS-843, HDFS-2214,
    and I'll have to file two equivalent bugs on mapreduce).
    Part of the problem is that the assumption was that people always compile
    hdfs against hadoop-common-0.xyz-SNAPSHOT.
    When applying one patch at a time from Jira attachments that may be fine.

    If I set up a Jenkins build I will want to make sure that first
    hadoop-common builds with a new build number (not snapshot), then hdfs
    against that same build number, then mapreduce against hadoop-common and
    hdfs.
    Otherwise you can get a situation when the mapreduce build is still
    running and hadoop-common build has already produced a new snapshot build.
    Local caching in ~/.m2 and ~/.ivy2 repos makes this situation even more
    complex.

    One option here is to set up >1 virtual machine (The centos 6.0 minimal are
    pretty lightweight) and delegate work to these jenkins instances, forcing
    different branches into different virtual hosts, and jenkins to build stuff
    serially on a single machine. That ensures a strict order and isolates you.
    You can even have ant targets to purge the repository caches.

    I have some Centos VMs set up to do release work on my desktop as it
    ensures that I never release under-development code; the functional test
    runs don't interfere with my desktop test runs, and I can keep editing the
    code. It works OK, if you have enough RAM and HDD to spare

    -steve
  • Steve Loughran at Aug 2, 2011 at 10:24 am

    On 30/07/11 01:09, Rottinghuis, Joep wrote:
    Thanks for the replies.

    To elaborate on why I want to build on a server w/o Internet access:
    Build should not reach out to Internet and grab jars from unverified sources w/o md5 hash check etc.
    Automated hash checking is flawed for various reasons:
    - older versions of M2 didn't do the check; you have to build with
    --strict-checksums to force that check in (see the command sketch after this list)
    - some artifacts have crept into the repository with bad checksums (see
    below), which Ivy finds, as it does checksum verification
    - verifying checksums from the same HTTP server that served up the file
    doesn't prevent malicious attacks; verifying against an HTTPS server
    managed by the ASF would
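
    (Command sketch for the first point:)

    mvn --strict-checksums clean install    # or the short form:
    mvn -C clean install                    # fail the build on a checksum mismatch
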
    The resulting code will run on a large production cluster with sensitive/private data. From a compliance and risk perspective I want to be able to control which jars get pulled in from where.

    Manual verification of ~/.m2, tar.gz and scp to build server is an acceptable workaround.
    The way to verify that the artifacts are valid is to go through the release
    notes of every artifact you depend on, check their (signed) release notes,
    and check that the checksum you've got on the downloaded artifact
    matches.

    Even then you are vulnerable to "the bad POM attack": POM checksums
    aren't included in release notes, so someone could put a POM up there
    that declares a dependency on a non-ASF artifact containing malicious
    code. Unless you know the exact dependency tree of your entire
    application, you are vulnerable here.

    -Steve

    Internally I keep all our dependencies under SCM, set up Ivy to build
    offline only with a strict conflict manager (which halts the build if
    there are inconsistent versions), then tune the ivy.xml files to exclude
    the old versions. I do verify the checksums of ASF releases, and examine
    the dependency graph to see if there's anything in there I don't
    recognise, though I don't decompile every JAR for review.
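
    (On the Maven side, a quick way to see that dependency graph; Ivy has equivalent reports:)

    mvn dependency:tree     # prints the resolved dependency graph, transitive dependencies included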

    ---------- Forwarded message ----------
    From: Steve Loughran <steve.loughran@gmail.com>
    Date: 10 September 2010 13:09
    Subject: bad checksums in activemq-protobuf-1.1.pom
    To: repository@apache.org


    http://mirrors.ibiblio.org/pub/mirrors/maven2/org/apache/activemq/protobuf/activemq-protobuf/1.1/activemq-protobuf-1.1.pom

    http://mirrors.ibiblio.org/pub/mirrors/maven2/org/apache/activemq/protobuf/activemq-protobuf/1.1/activemq-protobuf-1.1.pom.sha1
    says 255bd0c7703022d85da7416f87802a11053de120

    but shasum activemq-protobuf-1.1.pom
    c92f02aa8a96139ff4274e8c80701bb8f4bd7c1e activemq-protobuf-1.1.pom
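
    (The mismatch is easy to reproduce from the command line with the URLs quoted above:)

    curl -sO http://mirrors.ibiblio.org/pub/mirrors/maven2/org/apache/activemq/protobuf/activemq-protobuf/1.1/activemq-protobuf-1.1.pom
    curl -s  http://mirrors.ibiblio.org/pub/mirrors/maven2/org/apache/activemq/protobuf/activemq-protobuf/1.1/activemq-protobuf-1.1.pom.sha1
    shasum activemq-protobuf-1.1.pom    # compare the local digest with the published .sha1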
  • Tom White at Aug 1, 2011 at 4:58 am
    HADOOP-6671 has now received two +1s (one from Eric Yang and one from
    me), so I would like to commit it on Tuesday at 16:00 GMT
    (http://s.apache.org/6nx). I'll also update the Jenkins jobs for
    running test-patch and performing the nightly build.

    For developers this change will mean that you need to use Maven to
    build Hadoop Common. The build instructions are listed in the
    BUILDING.txt file in the patch, as well as at
    http://wiki.apache.org/hadoop/HowToContribute and
    http://s.apache.org/wb.

    Note that HDFS and MapReduce will still use Ant for building, but
    follow-on JIRAs HDFS-2096 and MAPREDUCE-2607 will introduce Maven to
    those builds in the near future. (In terms of staging, it makes sense
    for MAPREDUCE-2607 to go in after MAPREDUCE-279, since the MR2 work
    uses Maven to build its new modules, so the Mavenization of MapReduce
    should build on that work.)

    Thanks,
    Tom
    On Thu, Jul 28, 2011 at 4:41 PM, Alejandro Abdelnur wrote:
    Following up with Hadoop Common mavenization (HADOOP-6671) I've just posted
    a patch for HDFS mavenization (HDFS-2096)

    The HADOOP-6671 patch integrates all feedback received in the JIRA and, IMO,
    it is ready for prime time.

    In order not to break HDFS and MAPRED, which are still Ant based, there are 2
    patches, HDFS-2196 & MAPREDUCE-2741, that make some corrections in the Ivy
    configuration so it works correctly with the Hadoop Common JAR (built/published
    by the Mavenized build).

    HDFS-2096 is not 100% ready: some testcases are failing and native code
    testing is not wired, but everything else (compile, test, package, tar,
    binary, jdiff, etc.) is wired.

    * https://issues.apache.org/jira/browse/HADOOP-6671
    * https://issues.apache.org/jira/browse/HDFS-2196
    * https://issues.apache.org/jira/browse/MAPREDUCE-2741
    * https://issues.apache.org/jira/browse/HDFS-2096

    I know these are big changes and we'll have some hiccups, but the benefits
    are big (running testcases is faster, it easily works from IDEs, Maven build
    system can easily be understood by anybody that knows Maven).

    Keeping the patches current is time-consuming, because of this, it would be
    great if we can get in the ones ready (HADOOP-6671, HDFS-2196,
    MAPREDUCE-2741) so we can focus on the rest of the Mavenization work.

    Thanks.

    Alejandro
  • Tom White at Aug 2, 2011 at 4:17 pm

    On Sun, Jul 31, 2011 at 9:57 PM, Tom White wrote:
    HADOOP-6671 has now received two +1s (one from Eric Yang and one from
    me), so I would like to commit it on Tuesday at 16:00 GMT
    (http://s.apache.org/6nx). I'll also update the Jenkins jobs for
    running test-patch and performing the nightly build.
    I'm going to go ahead and commit the patch now.

    Tom
    For developers this change will mean that you need to use Maven to
    build Hadoop Common. The build instructions are listed in the
    BUILDING.txt file in the patch, as well as at
    http://wiki.apache.org/hadoop/HowToContribute and
    http://s.apache.org/wb.

    Note that HDFS and MapReduce will still use Ant for building, but
    follow on JIRAs HDFS-2096 and MAPREDUCE-2607 will introduce Maven to
    those builds in the near future. (In terms of staging, it makes sense
    for MAPREDUCE-2607 to go in after MAPREDUCE-279, since the MR2 work
    uses Maven to build its new modules, so the Mavenization of MapReduce
    should build on that work.)

    Thanks,
    Tom
    On Thu, Jul 28, 2011 at 4:41 PM, Alejandro Abdelnur wrote:
    Following up with Hadoop Common mavenization (HADOOP-6671) I've just posted
    a patch for HDFS mavenization (HDFS-2096)

    The HADOOP-6671 patch integrates all feedback received in the JIRA and, IMO,
    it is ready for prime time.

    In order not to break HDFS and MAPRED, which are still Ant based, there are 2
    patches, HDFS-2196 & MAPREDUCE-2741, that make some corrections in the Ivy
    configuration so it works correctly with the Hadoop Common JAR (built/published
    by the Mavenized build).

    HDFS-2096 is not 100% ready: some testcases are failing and native code
    testing is not wired, but everything else (compile, test, package, tar,
    binary, jdiff, etc.) is wired.

    * https://issues.apache.org/jira/browse/HADOOP-6671
    * https://issues.apache.org/jira/browse/HDFS-2196
    * https://issues.apache.org/jira/browse/MAPREDUCE-2741
    * https://issues.apache.org/jira/browse/HDFS-2096

    I know these are big changes and we'll have some hiccups, but the benefits
    are big (running testcases is faster, it easily works from IDEs, Maven build
    system can easily be understood by anybody that knows Maven).

    Keeping the patches current is time-consuming, because of this, it would be
    great if we can get in the ones ready (HADOOP-6671, HDFS-2196,
    MAPREDUCE-2741) so we can focus on the rest of the Mavenization work.

    Thanks.

    Alejandro
  • Tom White at Aug 2, 2011 at 5:38 pm
    HADOOP-6671 is now in trunk. After you do an update you will need to
    use Maven 3 to build Common. There are instructions at

    https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-common/BUILDING.txt
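
    (A minimal sketch of the new flow, assuming Maven defaults and the directory layout implied by the URL above; BUILDING.txt has the authoritative goals and profiles:)

    svn up
    cd hadoop-common
    mvn clean install      # requires Maven 3, per the note above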

    HDFS and MapReduce still use Ant, and the instructions for
    cross-project builds are in
    http://wiki.apache.org/hadoop/HowToContribute.

    Please send any questions/ideas for improvements, etc to the dev list.

    Thanks,
    Tom
    On Tue, Aug 2, 2011 at 9:16 AM, Tom White wrote:
    On Sun, Jul 31, 2011 at 9:57 PM, Tom White wrote:
    HADOOP-6671 has now received two +1s (one from Eric Yang and one from
    me), so I would like to commit it on Tuesday at 16:00 GMT
    (http://s.apache.org/6nx). I'll also update the Jenkins jobs for
    running test-patch and performing the nightly build.
    I'm going to go ahead and commit the patch now.

    Tom
    For developers this change will mean that you need to use Maven to
    build Hadoop Common. The build instructions are listed in the
    BUILDING.txt file in the patch, as well as at
    http://wiki.apache.org/hadoop/HowToContribute and
    http://s.apache.org/wb.

    Note that HDFS and MapReduce will still use Ant for building, but
    follow on JIRAs HDFS-2096 and MAPREDUCE-2607 will introduce Maven to
    those builds in the near future. (In terms of staging, it makes sense
    for MAPREDUCE-2607 to go in after MAPREDUCE-279, since the MR2 work
    uses Maven to build its new modules, so the Mavenization of MapReduce
    should build on that work.)

    Thanks,
    Tom
    On Thu, Jul 28, 2011 at 4:41 PM, Alejandro Abdelnur wrote:
    Following up with Hadoop Common mavenization (HADOOP-6671) I've just posted
    a patch for HDFS mavenization (HDFS-2096)

    The HADOOP-6671 patch integrates all feedback received in the JIRA and, IMO,
    it is ready for prime time.

    In order not to break HDFS and MAPRED, which are still Ant based, there are 2
    patches, HDFS-2196 & MAPREDUCE-2741, that make some corrections in the Ivy
    configuration so it works correctly with the Hadoop Common JAR (built/published
    by the Mavenized build).

    HDFS-2096 is not 100% ready: some testcases are failing and native code
    testing is not wired, but everything else (compile, test, package, tar,
    binary, jdiff, etc.) is wired.

    * https://issues.apache.org/jira/browse/HADOOP-6671
    * https://issues.apache.org/jira/browse/HDFS-2196
    * https://issues.apache.org/jira/browse/MAPREDUCE-2741
    * https://issues.apache.org/jira/browse/HDFS-2096

    I know these are big changes and we'll have some hiccups, but the benefits
    are big (running testcases is faster, it easily works from IDEs, Maven build
    system can easily be understood by anybody that knows Maven).

    Keeping the patches current is time-consuming, because of this, it would be
    great if we can get in the ones ready (HADOOP-6671, HDFS-2196,
    MAPREDUCE-2741) so we can focus on the rest of the Mavenization work.

    Thanks.

    Alejandro
