FAQ
I'm a newbie and I am confused by the Hadoop releases.
I thought 0.21.0 was the latest and greatest release, the one I
should be using, but I noticed that 0.20.203 was released
recently and that 0.21.X is marked "unstable, unsupported".

Should I be using 0.20.203?
----
T. "Kuro" Kurosaka
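The confusion in the question comes from the numbering. Compared component by component, 0.21.0 sorts above 0.20.203.0, yet the replies in this thread recommend 0.20.203.0 as the stable release. The minimal Python sketch below assumes a hand-maintained table of the status labels described in this thread. The names `parse_version`, `RELEASES`, `highest_numbered` and `highest_stable` are hypothetical and are not part of any Hadoop tool. It shows why "highest version number" and "recommended release" give different answers:

```python
# Hypothetical helpers for illustration only; not part of Hadoop.

def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as "0.20.203.0" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))

# Status labels as described in this thread (July 2011), entered by hand.
RELEASES = {
    "0.20.2": "older release (0.20.203.0 improves on it)",
    "0.20.203.0": "stable",
    "0.21.0": "unstable, unsupported",
}

def highest_numbered(releases: dict) -> str:
    """Return the release with the largest version number, ignoring status."""
    return max(releases, key=parse_version)

def highest_stable(releases: dict) -> str:
    """Return the largest version number among releases labeled "stable"."""
    stable = [v for v, status in releases.items() if status == "stable"]
    return max(stable, key=parse_version)

if __name__ == "__main__":
    print(highest_numbered(RELEASES))  # 0.21.0
    print(highest_stable(RELEASES))    # 0.20.203.0
```

Comparing tuples of integers instead of raw strings is deliberate: as plain strings, "0.9" compares greater than "0.10" because the comparison goes character by character, while the tuples (0, 9) and (0, 10) order correctly.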


  • Arun C Murthy at Jul 14, 2011 at 11:43 pm
    Hi,

    0.20.203 is the latest stable release, and it includes a ton of features (security: Kerberos-based authentication) and fixes. It's currently deployed on over 50k machines at Yahoo, too.
    So, yes, I'd encourage you to use 0.20.203. We, the community, are currently working on hadoop-0.23 and hope to get it out soon.

    thanks,
    Arun
    On Jul 14, 2011, at 4:33 PM, Teruhiko Kurosaka wrote:

    Should I be using 0.20.203?
  • Isaac Dooley at Jul 15, 2011 at 1:13 pm
    Will 0.23 include Kerberos authentication? Will this finally unite the Yahoo and Apache branches?

  • Jonathan Coveney at Jul 15, 2011 at 2:33 pm
    Isaac: there is no longer a Yahoo branch. They are committing all of their code
    to Apache.

  • Owen O'Malley at Jul 14, 2011 at 11:46 pm

    On Jul 14, 2011, at 4:33 PM, Teruhiko Kurosaka wrote:

    Should I be using 0.20.203?
    Yes. I apologize for the confusing release numbering, but the best release to use is 0.20.203.0. It includes security, job limits, and many other improvements over 0.20.2 and 0.21.0. Unfortunately, it doesn't have the new sync support, so it isn't suitable for use with HBase. Most large clusters use a separate version of HDFS for HBase.

    -- Owen
  • Adarsh Sharma at Jul 15, 2011 at 4:49 am
    Hadoop releases are issued from time to time. But one more thing related to
    Hadoop usage:

    There are many providers that offer a distribution of Hadoop:

    1. Apache Hadoop
    2. Cloudera
    3. Yahoo

    etc.
    Which distribution is best for production use?
    I think Cloudera's is the best among them.


    Best Regards,
    Adarsh
  • Robert Evans at Jul 15, 2011 at 2:36 pm
    Adarsh,

    Yahoo! no longer has its own distribution of Hadoop. It has been merged into the 0.20.2XX line, so 0.20.203 is what Yahoo! is running internally right now, and we are moving toward 0.20.204, which should be out soon. I am not an expert on Cloudera, so I cannot really map its releases to the Apache releases, but their distro is based on Apache Hadoop with a few bug fixes and perhaps a few features, such as append, added on top; you would need to talk to Cloudera about the exact details. For the most part they are all very similar. What you need to think about most is support: there are several companies that can sell you support if you want or need it. You also need to weigh features against stability. The 0.20.203 release has been tested on a lot of machines by many different groups, but it may be missing some features that are needed in some situations.

    --Bobby


  • Michael Segel at Jul 15, 2011 at 2:58 pm
    Unfortunately the picture is a bit more confusing.

    Yahoo! is now HortonWorks. Their stated goal is to not have their own derivative release but to sell commercial support for the official Apache release.
    So those selling commercial support are:
    *Cloudera
    *HortonWorks
    *MapRTech
    *EMC (reselling MapRTech, but had announced their own)
    *IBM (not sure what they are selling exactly... still seems like smoke and mirrors...)
    *DataStax

    So while you can use the Apache release, it may not make sense for your organization to do so. (Said as I don the flame-retardant suit...)

    The issue is that, outside of HortonWorks, which has stated that it will support the official Apache release, everything else is a derivative work of Apache Hadoop. From what I have seen, Cloudera's release is the closest to the Apache release.

    Like I said, things are getting interesting.

    HTH

    -Mike


  • Owen O'Malley at Jul 15, 2011 at 4:08 pm

    On Jul 15, 2011, at 7:58 AM, Michael Segel wrote:

    So while you can use the Apache release, it may not make sense for your organization to do so. (Said as I don the flame retardant suit...)
    I obviously disagree. *grin* Apache Hadoop 0.20.203.0 is the most stable and best-tested release. It has been deployed on Yahoo's 40,000 Hadoop machines, in clusters of up to 4,500 machines, and has been used extensively to run production workloads. We are actively working to make installing and deploying Apache Hadoop easier.

    In terms of commercial support, HortonWorks is absolutely supporting the Apache releases. IBM is also supporting the Apache releases:

    http://davidmenninger.ventanaresearch.com/2011/05/18/ibm-chooses-hadoop-unity-not-shipping-the-elephant/

    So lack of commercial support isn't a problem...

    -- Owen
  • Tom Deutsch at Jul 15, 2011 at 4:38 pm
    One quick clarification: IBM GA'd a product called BigInsights in 2Q. It
    faithfully uses the Hadoop stack and many related projects, but provides
    a number of compatible extensions based on customer requests.
    It's not appropriate to say any more on this list, but the info on it is all
    publicly available.


    ------------------------------------------------
    Tom Deutsch
    Program Director
    CTO Office: Information Management
    Hadoop Product Manager / Customer Exec
    IBM
    3565 Harbor Blvd
    Costa Mesa, CA 92626-1420
    tdeutsch@us.ibm.com




  • Rita at Jul 16, 2011 at 3:53 pm
    I am curious about the IBM product BigInsights. Where can we download it? It
    seems we have to register to download it?

    --
    --- Get your facts first, then you can distort them as you please.--
  • Steve Loughran at Jul 17, 2011 at 7:35 pm

    On 16/07/2011 16:53, Rita wrote:
    I am curious about the IBM product BigInsights. Where can we download it? It
    seems we have to register to download it?
    I think you have to pay to use it.
  • Tom Deutsch at Jul 18, 2011 at 10:30 am
    Hi Rita - I want to make sure we are honoring the purpose/approach of this
    list. So you are welcome to ping me for information, but let's take this
    discussion off the list at this point.

  • Michael Segel at Jul 18, 2011 at 7:10 pm
    Tom,

    I'm not sure that you're really honoring the purpose and approach of this list.

    I mean, on the one hand, you're not under any obligation to respond or participate on the list, and I can respect that. You're not in an S&D role, so you're not 'customer facing' and not used to dealing with these types of questions.

    On the other hand, you're not being free with your information. So when this type of question comes up, it becomes very easy to discount IBM as a release or source provider for commercial support.

    Without information, I'm afraid that I may have to make recommendations to my clients that may be out of date.

    There is even some speculation from analysts that recent comments from IBM are more of an indication that IBM is still not ready for prime time.

    I'm sorry you're not in a position to detail your offering.

    Maybe by September you'll be ready, and you could then talk to our CHUG?

    -Mike


  • Jeff Schmitz at Jul 18, 2011 at 7:31 pm
    Most people are using CH3 (if you need some features from another
    distro, use that instead):

    http://www.cloudera.com/hadoop/

    I wonder if the Cloudera people realize that CH3 was a pretty happening
    punk band back in the day (if not, they do now = )

    http://en.wikipedia.org/wiki/Channel_3_%28band%29

    cheers -


    Jeffery Schmitz
    Projects and Technology
    3737 Bellaire Blvd Houston, Texas 77001
    Tel: +1-713-245-7326 Fax: +1 713 245 7678
    Email: Jeff.Schmitz@shell.com
    Intergalactic Proton Powered Electrical Tentacled Advertising Droids!





  • Michael Segel at Jul 18, 2011 at 8:50 pm
    Well that's CDH3. :-)

    And yes, that's because up until the past month, other releases with commercial support didn't exist.

    Now there are more players as we see the movement from leading-edge to mainstream adopters.


  • Rita at Jul 19, 2011 at 12:02 am
    I made the big mistake of using the latest version, 0.21.0, and found a bunch
    of bugs, so I got pissed off at HDFS. Then, after reading this thread, it
    seems I should have used 0.20.x.

    I really wish we could fix this on the website by marking 0.21.0 as unstable.


    On Mon, Jul 18, 2011 at 4:50 PM, Michael Segel wrote:


    Well that's CDH3. :-)

    And yes, that's because up until the past month... other releases didn't
    exist with commercial support.

    Now there are more players as we look at the movement from leading edge to
    mainstream adopters.


    Subject: RE: Which release to use?
    Date: Mon, 18 Jul 2011 14:30:39 -0500
    From: Jeff.Schmitz@shell.com
    To: common-user@hadoop.apache.org


    Most people are using CH3 - if you need some features from another
    distro, use that -

    http://www.cloudera.com/hadoop/

    I wonder if the Cloudera people realize that CH3 was a pretty happening
    punk band back in the day (if not they do now = )

    http://en.wikipedia.org/wiki/Channel_3_%28band%29

    cheers -


    Jeffery Schmitz
    Projects and Technology
    3737 Bellaire Blvd Houston, Texas 77001
    Tel: +1-713-245-7326 Fax: +1 713 245 7678
    Email: Jeff.Schmitz@shell.com
    Intergalactic Proton Powered Electrical Tentacled Advertising Droids!





    -----Original Message-----
    From: Michael Segel
    Sent: Monday, July 18, 2011 2:10 PM
    To: common-user@hadoop.apache.org
    Subject: RE: Which release to use?


    Tom,

    I'm not sure that you're really honoring the purpose and approach of
    this list.

    I mean on the one hand, you're not under any obligation to respond or
    participate on the list. And I can respect that. You're not in an S&D
    role so you're not 'customer facing' and not used to having to deal with
    these types of questions.

    On the other, you're not being free with your information. So when this
    type of question comes up, it becomes very easy to discount IBM as a
    release or source provider for commercial support.

    Without information, I'm afraid that I may have to make recommendations
    to my clients that may be out of date.

    There is even some speculation from analysts that recent comments from
    IBM are more of an indication that IBM is still not ready for prime
    time.

    I'm sorry you're not in a position to detail your offering.

    Maybe by September you'll be ready and can talk to our CHUG?

    -Mike


    To: common-user@hadoop.apache.org
    Subject: Re: Which release to use?
    From: tdeutsch@us.ibm.com
    Date: Sat, 16 Jul 2011 10:29:55 -0700

    Hi Rita - I want to make sure we are honoring the purpose/approach of this
    list. So you are welcome to ping me for information, but let's take this
    discussion off the list at this point.

    From: Rita <rmorgan466@gmail.com>
    Date: 07/16/2011 08:53 AM
    Please respond to: common-user@hadoop.apache.org
    To: common-user@hadoop.apache.org
    Subject: Re: Which release to use?

    I am curious about the IBM product BigInsights. Where can we download it?
    It seems we have to register to download it?


  • Allen Wittenauer at Jul 19, 2011 at 12:12 am

    On Jul 18, 2011, at 5:01 PM, Rita wrote:

    I made the big mistake by using the latest version, 0.21.0 and found bunch
    of bugs so I got pissed off at hdfs. Then, after reading this thread it
    seems I should of used 0.20.x .

    I really wish we can fix this on the website, stating 0.21.0 as unstable.

    It is stated in a few places on the website that 0.21 isn't stable:

    http://hadoop.apache.org/common/releases.html#23+August%2C+2010%3A+release+0.21.0+available

    "It has not undergone testing at scale and should not be considered stable or suitable for production."

    ... and ...

    http://hadoop.apache.org/common/releases.html#Download

    "0.21.X - unstable, unsupported, does not include security"

    and it isn't in the stable directory on the Apache download mirrors.
  • Rita at Jul 19, 2011 at 1:03 am
    I am a dimwit.

  • Allen Wittenauer at Jul 19, 2011 at 1:12 am

    On Jul 18, 2011, at 6:02 PM, Rita wrote:

    I am a dimwit.

    We are conditioned by marketing that a higher number is always better. Experience tells us that this is not necessarily true.
  • Arun C Murthy at Jul 15, 2011 at 5:07 pm
    Apache Hadoop is a volunteer-driven, open-source project. The contributors to Apache Hadoop, both individuals and folks across a diverse set of organizations, are committed to driving the project forward and making timely releases - see the discussion on hadoop-0.23 with a raft of newer features such as HDFS Federation, NextGen MapReduce and plans for HA NameNode etc.

    As with most successful projects, there are several options for commercial support for Hadoop or its derivatives.

    However, Apache Hadoop thrived before there was any commercial support (I've personally been involved in over 20 releases of Apache Hadoop and deployed them while at Yahoo), and I'm sure it will continue to in this new world order.

    We, the Apache Hadoop community, are committed to keeping Apache Hadoop 'free', providing support to our users and moving it forward at a rapid rate.

    Arun
    On Jul 15, 2011, at 7:58 AM, Michael Segel wrote:


    Unfortunately the picture is a bit more confusing.

    Yahoo! is now HortonWorks. Their stated goal is to not have their own derivative release but to sell commercial support for the official Apache release.
    So those selling commercial support are:
    *Cloudera
    *HortonWorks
    *MapRTech
    *EMC (reselling MapRTech, but had announced their own)
    *IBM (not sure what they are selling exactly... still seems like smoke and mirrors...)
    *DataStax

    So while you can use the Apache release, it may not make sense for your organization to do so. (Said as I don the flame retardant suit...)

    The issue is that outside of HortonWorks which is stating that they will support the official Apache release, everything else is a derivative work of Apache's Hadoop. From what I have seen, Cloudera's release is the closest to the Apache release.

    Like I said, things are getting interesting.

    HTH

    -Mike


    From: evans@yahoo-inc.com
    To: common-user@hadoop.apache.org
    Date: Fri, 15 Jul 2011 07:35:45 -0700
    Subject: Re: Which release to use?

    Adarsh,

    Yahoo! no longer has its own distribution of Hadoop. It has been merged into the 0.20.2XX line, so 0.20.203 is what Yahoo is running internally right now, and we are moving towards 0.20.204, which should be out soon. I am not an expert on Cloudera, so I cannot really map its releases to the Apache releases, but their distro is based on Apache Hadoop with a few bug fixes and maybe a few features like append added on top of it; you need to talk to Cloudera about the exact details. For the most part they are all very similar. You need to think most about support; there are several companies that can sell you support if you want/need it. You also need to think about features vs. stability. The 0.20.203 release has been tested on a lot of machines by many different groups, but may be missing some features that are needed in some situations.

    --Bobby


    On 7/14/11 11:49 PM, "Adarsh Sharma" wrote:

    Hadoop releases are issued from time to time. But one more thing related to
    Hadoop usage:

    There are many providers that offer a distribution of Hadoop:

    1. Apache Hadoop
    2. Cloudera
    3. Yahoo

    etc.
    Which distribution is best among them for production use?
    I think Cloudera's is the best among them.


    Best Regards,
    Adarsh
    Owen O'Malley wrote:
    On Jul 14, 2011, at 4:33 PM, Teruhiko Kurosaka wrote:

    I'm a newbie and I am confused by the Hadoop releases.
    I thought 0.21.0 is the latest & greatest release that I
    should be using but I noticed 0.20.203 has been released
    lately, and 0.21.X is marked "unstable, unsupported".

    Should I be using 0.20.203?
    Yes, I apologize for confusing release numbering, but the best release to use is 0.20.203.0. It includes security, job limits, and many other improvements over 0.20.2 and 0.21.0. Unfortunately, it doesn't have the new sync support so it isn't suitable for using with HBase. Most large clusters use a separate version of HDFS for HBase.

    -- Owen
  • Steve Loughran at Jul 15, 2011 at 9:07 pm

    Arun makes a good point which is that the Apache project depends on
    contributions from the community to thrive. That includes

    -bug reports
    -patches to fix problems
    -more tests
    -documentation improvements: more examples, more on getting started,
    troubleshooting, etc.

    If there's something lacking in the codebase, and you think you can fix
    it, please do so. Helping with the documentation is a good start, as it
    can be improved, and you aren't going to break anything.

    Once you get into changing the code, you'll end up working with the head
    of whichever branch you are targeting.

    The other area everyone can contribute to is testing. Yes, Y! and FB can
    test at scale, yes, other people can test large clusters too - but nobody
    has a network that looks like yours but you. And Hadoop does care about
    network configurations. Testing beta and release candidate builds in
    your infrastructure helps verify that the final release will work on
    your site, so you don't end up getting all the phone calls about
    something not working.
  • Jeff Schmitz at Jul 18, 2011 at 1:31 pm
    Steve,

    I read your blog - nice post. I believe EMC is selling the Greenplum
    solution as an appliance.

    Cheers -

    Jeffery

  • Michael Segel at Jul 18, 2011 at 6:34 pm
    EMC has inked a deal with MapRTech to resell MapRTech's release and support services.
    Does this mean that they are going to stop selling their own release on Greenplum? Maybe not in the near future; however,
    a Greenplum appliance may not generate the customer transactions that their reselling of MapR will.

    It sounds like they are hedging their bets and are taking an 'IBM' approach.

  • M. C. Srivas at Jul 19, 2011 at 1:20 am
    Mike,

    Just a minor inaccuracy in your email. Here's setting the record straight:

    1. MapR directly sells their distribution of Hadoop. Support is from MapR.
    2. EMC also sells the MapR distribution, for use on any hardware. Support is
    from EMC worldwide.
    3. EMC also sells a Hadoop appliance, which has the MapR distribution
    specially built for it. Support is from EMC.

    4. MapR also has a free, unlimited, unrestricted version called M3, which
    has the same 2-5x performance, management and stability improvements, and
    includes NFS. It is not crippleware, and the unlimited, unrestricted, free
    use does not expire on any date.

    Hope that clarifies what MapR is doing.

    thanks & regards,
    Srivas.

  • Michael Segel at Jul 19, 2011 at 1:51 am

    Srivas,

    I'm sorry, I thought I was being clear that I was only addressing EMC and not MapR directly.
    I was responding to the post about EMC selling a Greenplum appliance. I wanted to point out that EMC will resell MapR's release along with their own (EMC) support.

    The point I was trying to make was that, with respect to derivatives of Hadoop, I believe that MapR has a more compelling story than either EMC or DataStax. IMHO replacing Java HDFS with either Greenplum or Cassandra has a limited market. When a company looks at an M/R solution, cost and performance are going to be at the top of the list. MapR isn't cheap, but if you look at the features in M5, if they work, then you have a very compelling reason to look at their release. Some of the people I spoke to when I was in Santa Clara were in the beta program. They indicated that MapR did what they claimed.

    Things are definitely starting to look interesting.

    -Mike
  • Joe Stein at Jul 19, 2011 at 2:01 am
    So, last I checked, this list was about Apache Hadoop, not about derivative works.

    The Cloudera team has always been diligent (you rock) about redirecting questions on non-Apache CDH releases to their own list for answers.

    I commend those supporting Apache releases of Hadoop too, very cool!!!

    But yeah, even I have to ask what the latest release will be. Is there going to be a single Hadoop release, or a continued branch that Horton maintains and will only support?

    There is something to be said for releasing from trunk, which gets everyone on the same page towards our common goals. You can pin the "state the obvious" paper on my back, but I kinda feel it had to be said.

    One love, Apache Hadoop!

    /*
    Joe Stein
    http://www.medialets.com
    Twitter: @allthingshadoop
    */
    On Jul 18, 2011, at 9:51 PM, Michael Segel wrote:



    Date: Mon, 18 Jul 2011 18:19:38 -0700
    Subject: Re: Which release to use?
    From: mcsrivas@gmail.com
    To: common-user@hadoop.apache.org

    Mike,

    Just a minor inaccuracy in your email. Here's setting the record straight:

    1. MapR directly sells their distribution of Hadoop. Support is from MapR.
    2. EMC also sells the MapR distribution, for use on any hardware. Support is
    from EMC worldwide.
    3. EMC also sells a Hadoop appliance, which has the MapR distribution
    specially built for it. Support is from EMC.

    4. MapR also has a free, unlimited, unrestricted version called M3, which
    has the same 2-5x performance, management and stability improvements, and
    includes NFS. It is not crippleware, and the unlimited, unrestricted, free
    use does not expire on any date.

    Hope that clarifies what MapR is doing.

    thanks & regards,
    Srivas.
    Srivas,

    I'm sorry, I thought I was being clear in that I was only addressing EMC and not MapR directly.
    I was responding to post about EMC selling a Greenplum appliance. I wanted to point out that EMC will resell MapR's release along with their own (EMC) support.

    The point I was trying to make was that with respect to derivatives of Hadoop, I believe that MapR has a more compelling story than either EMC or DataStax. IMHO replacing Java HDFS w either GreenPlum or Cassandra has a limited market. When a company is going to look at a M/R solution cost and performance are going to be at the top of the list. MapR isn't cheap but if you look at the features in M5, if they work, then you have a very compelling reason to look at their release. Some of the people I spoke to when I was in Santa Clara were in the beta program. They indicated that MapR did what they claimed.

    Things are definitely starting to look interesting.

    -Mike
    On Mon, Jul 18, 2011 at 11:33 AM, Michael Segel
    wrote:
    EMC has inked a deal with MapRTech to resell their release and support
    services for MapRTech.
    Does this mean that they are going to stop selling their own release on
    Greenplum? Maybe not in the near future, however,
    a Greenplum appliance may not get the customer transaction that their
    reselling of MapR will generate.

    It sounds like they are hedging their bets and are taking an 'IBM'
    approach.

    Subject: RE: Which release to use?
    Date: Mon, 18 Jul 2011 08:30:59 -0500
    From: Jeff.Schmitz@shell.com
    To: common-user@hadoop.apache.org

    Steve,

    I read your blog nice post - I believe EMC is selling the Greenplumb
    solution as an appliance -

    Cheers -

    Jeffery

    -----Original Message-----
    From: Steve Loughran
    Sent: Friday, July 15, 2011 4:07 PM
    To: common-user@hadoop.apache.org
    Subject: Re: Which release to use?
    On 15/07/2011 18:06, Arun C Murthy wrote:
    Apache Hadoop is a volunteer driven, open-source project. The
    contributors to Apache Hadoop, both individuals and folks across a
    diverse set of organizations, are committed to driving the project
    forward and making timely releases - see discussion on hadoop-0.23 with
    a raft newer features such as HDFS Federation, NextGen MapReduce and
    plans for HA NameNode etc.
    As with most successful projects there are several options for
    commercial support to Hadoop or its derivatives.
    However, Apache Hadoop thrived before there was any commercial
    support (I've personally been involved in over 20 releases of Apache
    Hadoop and deployed them while at Yahoo), and I'm sure it will in this
    new world order.
    We, the Apache Hadoop community, are committed to keeping Apache
    Hadoop 'free', providing support to our users, and moving it forward at
    a rapid rate.
    Arun makes a good point, which is that the Apache project depends on
    contributions from the community to thrive. That includes:

    -bug reports
    -patches to fix problems
    -more tests
    -documentation improvements: more examples, more on getting started,
    troubleshooting, etc.

    If there's something lacking in the codebase, and you think you can fix
    it, please do so. Helping with the documentation is a good start, as it
    can be improved, and you aren't going to break anything.

    Once you get into changing the code, you'll end up working with the head
    of whichever branch you are targeting.

    The other area everyone can contribute to is testing. Yes, Y! and FB can
    test at scale; yes, other people can test large clusters too; but nobody
    has a network that looks like yours but you. And Hadoop does care about
    network configurations. Testing beta and release candidate releases in
    your infrastructure helps verify that the final release will work on
    your site, and that you don't end up getting all the phone calls about
    something not working.
  • Arun Murthy at Jul 19, 2011 at 3:07 am
    Joe,

    The dev community is currently gearing up for hadoop-0.23 off trunk.

    0.23 is a massive step forward, with HDFS Federation, NextGen
    MapReduce and possibly others such as wire-compat and HA NameNode.

    In a couple of weeks I plan to create the 0.23 branch off trunk, and we
    will then spend all our energies stabilizing & pushing the release out.
    Please see my note to general@ for more details.

    Arun
    On Jul 18, 2011, at 7:01 PM, Joe Stein wrote:

    So, last I checked, this list was about Apache Hadoop, not about derivative works.

    The Cloudera team has always been diligent (you rock) about redirecting questions on non-Apache CDH releases to their list for answers.

    I commend those supporting Apache releases of Hadoop too, very cool!!!

    But yeah, even I have to ask what the latest release will be. Is there going to be a single Hadoop release, or a continued branch that Horton maintains and will only support?

    There is something to be said for a release from trunk that gets everyone on the same page towards our common goals. You can pin the "state the obvious" paper on my back, but I kinda feel it had to be said.

    One love, Apache Hadoop!

    /*
    Joe Stein
    http://www.medialets.com
    Twitter: @allthingshadoop
    */
  • Joe Stein at Jul 19, 2011 at 3:20 am
    Arun,

    Thanks for the update.

    Again, I hate to have to play the part of captain obvious.

    Glad to hear the same contiguous mantra for this next release. I think sometimes the plebeians (of which I am one) need that affirmation.

    One love, Apache Hadoop!

    /*
    Joe Stein
    http://www.medialets.com
    Twitter: @allthingshadoop
    */
    On Jul 18, 2011, at 11:06 PM, Arun Murthy wrote:

    Joe,

    The dev community is currently gearing up for hadoop-0.23 off trunk.

    0.23 is a massive step forward, with HDFS Federation, NextGen
    MapReduce and possibly others such as wire-compat and HA NameNode.

    In a couple of weeks I plan to create the 0.23 branch off trunk, and we
    will then spend all our energies stabilizing & pushing the release out.
    Please see my note to general@ for more details.

    Arun
  • Rita at Jul 19, 2011 at 11:44 am
    Arun,

    I second Joe's comment.
    Thanks for giving us a heads up.
    I will wait patiently until 0.23 is considered stable.


    On Mon, Jul 18, 2011 at 11:19 PM, Joe Stein wrote:
    Arun,

    Thanks for the update.

    Again, I hate to have to play the part of captain obvious.

    Glad to hear the same contiguous mantra for this next release. I think
    sometimes the plebeians (of which I am one) need that affirmation.

    One love, Apache Hadoop!

    /*
    Joe Stein
    http://www.medialets.com
    Twitter: @allthingshadoop
    */


    --
    --- Get your facts first, then you can distort them as you please.--
  • Steve Loughran at Jul 19, 2011 at 11:51 am

    On 19/07/11 12:44, Rita wrote:
    Arun,

    I second Joe's comment.
    Thanks for giving us a heads up.
    I will wait patiently until 0.23 is considered stable.
    API-wise, 0.21 is better. I know that as I'm working with 0.20.203 right
    now, and it is a step backwards.

    Regarding future releases, the best way to get them stable is to participate
    in release testing in your own infrastructure. Nothing else will find
    the problems unique to your setup of hardware, network and software.
  • Vitalii Tymchyshyn at Jul 19, 2011 at 12:10 pm

    On 19.07.11 14:50, Steve Loughran wrote:
    On 19/07/11 12:44, Rita wrote:
    Arun,

    I second Joe's comment.
    Thanks for giving us a heads up.
    I will wait patiently until 0.23 is considered stable.
    API-wise, 0.21 is better. I know that as I'm working with 0.20.203
    right now, and it is a step backwards.

    Regarding future releases, the best way to get them stable is to
    participate in release testing in your own infrastructure. Nothing
    else will find the problems unique to your setup of hardware, network
    and software.
    My little Hadoop adoption story (or why I won't test 0.23):
    I am among those who think that the latest release is the supported one,
    and so we went the 0.21 way.
    BTW: I've tried to find a release roadmap, but could not find
    anything up to date.
    We are using HDFS without Map/Reduce.
    As far as I can see now, 0.21 is nowhere near beta quality, with non-working
    new features like the backup node or append. Also, there is no option for
    such unlucky people to back off to 0.20 (at least, a "hadoop downgrade"
    search does not give any good results).
    I have already filed 5 tickets in Jira, 3 of them with patches. On two
    there is no activity at all; on the other three, the latest
    non-autogenerated message is an answer (and it is over 3 weeks old).
    I sent a few messages to this list and one to hdfs-user. No answers.
    With this level of project activity, I can't afford to test something that
    has not yet reached even 0.21's quality level. If I have any problems, I
    can't afford to wait for months to be heard.
    I am more or less stable on my own patched 0.21 for now, and will either
    move forward if I see more project activity or move somewhere else
    if it becomes "less stable".

    Best regards, Vitalii Tymchyshyn
  • Steve Loughran at Jul 15, 2011 at 9:01 pm

    On 15/07/2011 15:58, Michael Segel wrote:
    Unfortunately the picture is a bit more confusing.

    Yahoo! is now HortonWorks. Their stated goal is to not have their own derivative release but to sell commercial support for the official Apache release.
    So those selling commercial support are:
    *Cloudera
    *HortonWorks
    *MapRTech
    *EMC (reselling MapRTech, but had announced their own)
    *IBM (not sure what they are selling exactly... still seems like smoke and mirrors...)
    *DataStax
    + Amazon, indirectly, that do their own derivative work of some release
    of Hadoop (which version is it based on?)

    I've used 0.21, which was the first with the new APIs and, with MRUnit,
    has the best test framework. For my small-cluster uses, it worked well.
    (oh, and I didn't care about security)
  • Mark Kerzner at Jul 15, 2011 at 9:26 pm
    Steve,

    This is so well said. Do you mind if I repeat it here?
    http://shmsoft.blogspot.com/2011/07/hadoop-commercial-support-options.html

    Thank you,
    Mark
    On Fri, Jul 15, 2011 at 4:00 PM, Steve Loughran wrote:
    On 15/07/2011 15:58, Michael Segel wrote:


    Unfortunately the picture is a bit more confusing.

    Yahoo! is now HortonWorks. Their stated goal is to not have their own
    derivative release but to sell commercial support for the official Apache
    release.
    So those selling commercial support are:
    *Cloudera
    *HortonWorks
    *MapRTech
    *EMC (reselling MapRTech, but had announced their own)
    *IBM (not sure what they are selling exactly... still seems like smoke and
    mirrors...)
    *DataStax
    + Amazon, indirectly, that do their own derivative work of some release of
    Hadoop (which version is it based on?)

    I've used 0.21, which was the first with the new APIs and, with MRUnit, has
    the best test framework. For my small-cluster uses, it worked well. (oh, and
    I didn't care about security)

  • Michael Segel at Jul 15, 2011 at 10:01 pm
    See, I knew there was something that I forgot.

    It all goes back to the question ... 'which release to use'...

    2 years ago it was a very simple decision. Now, not so much. :-)

    And while Arun and Owen work for a vendor, I do not, and I try to follow each company and its offering.

    As Hadoop goes mainstream, the question of which vendor to choose gets interesting.
    Just like in the 1990s during the database vendor wars, it looks like the vendor who has the best sales force and PR will win.
    (Not necessarily the best product.)

    JMHO

    -Mike

    Date: Fri, 15 Jul 2011 16:25:55 -0500
    Subject: Re: Which release to use?
    From: markkerzner@gmail.com
    To: common-user@hadoop.apache.org

    Steve,

    This is so well said. Do you mind if I repeat it here?
    http://shmsoft.blogspot.com/2011/07/hadoop-commercial-support-options.html

    Thank you,
    Mark

  • Tom Deutsch at Jul 17, 2011 at 8:07 pm
    There are two release levels: one is free, but most of our customers want our additional engineering, so they use Enterprise Edition (which is not free).

    Happy to answer questions off list.

    ---------------------------------------
    Sent from my Blackberry so please excuse typing and spelling errors.


    ----- Original Message -----
    From: Steve Loughran [stevel@apache.org]
    Sent: 07/17/2011 08:34 PM CET
    To: common-user@hadoop.apache.org
    Subject: Re: Which release to use?


    On 16/07/2011 16:53, Rita wrote:
    I am curious about the IBM product BigInsights. Where can we download it? It
    seems we have to register to download it?
    I think you have to pay to use it
  • Michael Segel at Jul 18, 2011 at 1:53 am
    Well, I'm sort of curious as to what is in the 'free' version that differentiates it from the Apache release.

    Earlier you wrote that IBM was faithful to the Apache release, plus a few 'extras'. (I think I can find your exact quote, and I'm sorry I'm paraphrasing your statements.)

    This raises two questions...

    1) What is IBM providing to your 'customers' to justify the uplift or premium for IBM's brand name?

    2) If your release includes components which are not part of the Apache release, is it Apache Hadoop, or is it considered a derivative?

    The interesting thing about #2 is that I don't know if anything, or what, represents Hadoop. I mean, if you take an earlier release of Hadoop like 0.20.2, where the current one is 0.20.203, and apply a subset of patches that have been committed to Apache, is this Apache Hadoop, or a derivative work since you are not 100% at the latest release? Note: This is a broader question than just what IBM is releasing; it is about what is meant by saying Hadoop or derived from Hadoop. Clearly DataStax and MapR are derivatives. Cloudera? This goes back to the OP's question, 'Which release to use?'...

    And I have to apologize if I seem a bit skeptical of what IBM has to say. When IBM first entered with an announced Hadoop release, it was only for 32-bit JVMs and only on IBM's JVM.
    The last I heard, IBM's upsell was a configuration tool, which, for anyone who has built more than one cloud/cluster, is pretty much worthless.

    So it would be interesting to see what IBM is really offering in this space.

    HTH

    -Mike

    Subject: Re: Which release to use?
    From: tdeutsch@us.ibm.com
    Date: Sun, 17 Jul 2011 14:07:20 -0600
    To: common-user@hadoop.apache.org

    There are two release levels: one is free, but most of our customers want our additional engineering, so they use Enterprise Edition (which is not free).

    Happy to answer questions off list.

    ---------------------------------------
    Sent from my Blackberry so please excuse typing and spelling errors.

