Howl is a table management system built to provide metadata and
storage management across data processing tools in Hadoop (Pig, Hive,
MapReduce, ...). You can learn more details at http://wiki.apache.org/pig/Howl.
For the last six months the code has been hosted at github. The
Howl team would like to move the project into the Apache Incubator.
You can see the proposal for the project at http://wiki.apache.org/incubator/HowlProposal.

In order to be accepted as an Incubator project Howl needs a
Sponsoring project. I propose that we, the Pig project, sponsor
Howl. By sponsoring Howl we are saying that we believe it is a good
fit for the ASF and that we will assist the Howl project to succeed.
You can read full details of sponsoring a project at http://incubator.apache.org/incubation/Roles_and_Responsibilities.html#Sponsor.

Our bylaws don't explicitly cover such a vote, but I think lazy
majority should be reasonable. All votes are welcome; PMC member
votes will be binding.

Clearly I'm +1.

Alan.


  • Jeff Hammerbacher at Feb 2, 2011 at 10:08 pm
    Awesome! Huge +1.
  • Edward Capriolo at Feb 2, 2011 at 11:12 pm

    I do think it is a great idea for Hive, Pig, and MapReduce to share a
    metastore. However, I am not sure I agree with the approach. IMHO Howl
    should be a Hive subproject.

    "The initial release of Howl will allow interoperability of data
    between Pig, Map Reduce, and Hive"
    I believe this should instead read "The initial release of Howl should
    support Hive"; at that point Hive should remove the /metastore code
    from inside Hive and depend on Howl.

    I say this because Hive is very actively reworking the metastore right
    now for security, a new type of views, and indexes. I feel that if the
    metastore branches from Hive as Howl, getting the two entities back
    together will be difficult. My fear is having 99% of the same code base
    shared between Hive and Howl but no compatibility between the two.
  • Alan Gates at Feb 3, 2011 at 5:17 am
    Edward,

    I understand your concern with having a copy of the metastore code in
    Howl. However, let's separate code from governance. The reason Howl
    has a copy of Hive's metastore is not because we're proposing it for
    the Incubator, it is because in the course of developing it over the
    last six months we've found that Howl development needs to move much
    faster than Hive development can. This is appropriate, since Hive is
    a mature product and has at least one large customer that runs code in
    production very soon after it is checked in. Thus the Hive community
    is rightly cautious about checking in changes to the metastore. Howl,
    on the other hand, is new and innovating quickly, so it likes to get
    things checked in quickly. Over the last six months every patch Howl
    has made to the Hive metastore code has made it back into Hive code.
    But it generally takes a few weeks or more to get in.

    Whether Howl is a Hive subproject or an Incubator project it faces the
    same dilemma. The only other alternative that was suggested was to
    have Howl extern the metastore code from Hive and keep its patches in
    its build and apply them at build time. But this is very fragile,
    since any changes in the Hive metastore code could invalidate all
    those patches. We know that this is not sustainable in the long run,
    which is why the proposal calls out the need to resolve this one way
    or another as the project matures.

    As far as reaching an end state where Hive and Howl are not
    compatible, we would view that as a failure for Howl. The goal for
    Howl is to be a metastore for Pig, MapReduce, and Hive, not just 2 out
    of 3. So we have a strong motivation to maintain that compatibility.

    In terms of governance, given that we have significant contributions
    coming from members of the Pig team, the Hive team, and the core
    Hadoop team it seemed that giving Howl its own space in the Incubator
    made more sense than adding it as a subproject of any one of those
    teams.

    Alan.
  • Santhosh Srinivasan at Feb 3, 2011 at 6:12 am
    +1 for Howl as an incubator project.

    -----Original Message-----
    From: Alan Gates
    Sent: Wednesday, February 02, 2011 9:17 PM
    To: user@pig.apache.org
    Cc: user@hive.apache.org
    Subject: Re: [VOTE] Sponsoring Howl as an Apache Incubator project

    Edward,

    I understand your concern with having a copy of the metastore code in Howl. However, let's separate code from governance. The reason Howl has a copy of Hive's metastore is not because we're proposing it for the Incubator, it is because in the course of developing it over the last six months we've found that Howl development needs to move much faster than Hive development can. This is appropriate, since Hive is a mature product and has at least one large customer that runs code in production very soon after it is checked in. Thus the Hive community is rightly cautious about checking in changes to the metastore. Howl, on the other hand, is new and innovating quickly, so it likes to get things checked in quickly. Over the last six months every patch Howl has made to the Hive metastore code has made it back into Hive code. But it generally takes a few weeks or more to get in.

    Whether Howl is a Hive subproject or an Incubator project it faces the same dilemma. The only other alternative that was suggested was to have Howl extern the metastore code from Hive and keep its patches in its build and apply them at build time. But this is very fragile, since any changes in the Hive metastore code could invalidate all those patches. We know that this is not sustainable in the long run, which is why the proposal calls out the need to resolve this one way or another as the project matures.

    As far as reaching an end state where Hive and Howl are not compatible, we would view that as a failure for Howl. The goal for Howl is to be a metastore for Pig, MapReduce, and Hive, not just 2 out 3. So we have a strong motivation to maintain that compatibility.

    In terms of governance, given that we have significant contributions coming from members of the Pig team, the Hive team, and the core Hadoop team it seemed that giving Howl its own space in the Incubator made more sense than adding it as a subproject of any one of those teams.

    Alan.
  • Edward Capriolo at Feb 3, 2011 at 4:10 pm

    Alan,

    I see your points. I agree with you and I am +1.

    (incubator/subproject is not important to me)

    You mentioned that Hive is cautious about checking changes into the
    metastore. I would not say we (Hive) are cautious. Hive is getting
    pulled by many people in many directions (this is a good thing). But
    the people who can technically review patches might be
    burdened at times by the number of them.

    Ideally, I would think Hive committers are going to be active (and
    probably would have commit access) on Howl. Or is it going to be the
    burden of Howl to track Pig and Hive until Hive drops /metastore and
    begins using Howl? I am just curious about what you think the timeline
    looks like (i.e., how long Howl will be in the incubator; a rough guess,
    of course).

    Thank you,
    Edward
  • Alan Gates at Feb 3, 2011 at 4:37 pm

    I hope that Hive committers do become active in Howl, and we will be
    starting with Paul as a committer and John as a mentor. At least so
    far the Howl developers have taken up the burden of tracking the
    changes in Hive, since, as you mention, Hive committers are busy and
    Howl developers have had the motivation to get it done.

    As far as how long it will take, prognostication has never been my
    strength. But I would think it would take at least a year for Howl to
    mature to the point that Hive would be willing to trust it as its
    metastore or its development would slow to the point that it could
    pull the metastore code from Hive.

    Alan.
  • Julien Le Dem at Feb 2, 2011 at 11:50 pm
    +1


  • Olga Natkovich at Feb 2, 2011 at 11:06 pm
    +1

    -----Original Message-----
    From: Alan Gates
    Sent: Wednesday, February 02, 2011 1:19 PM
    To: user@pig.apache.org
    Subject: [VOTE] Sponsoring Howl as an Apache Incubator project

    Howl is a table management system built to provide metadata and
    storage management across data processing tools in Hadoop (Pig, Hive,
    MapReduce, ...). You can learn more details at http://wiki.apache.org/pig/Howl
    . For the last six months the code has been hosted at github. The
    Howl team would like to move the project into the Apache Incubator.
    You can see the proposal for the project at http://wiki.apache.org/incubator/HowlProposal
    .

    In order to be accepted as an Incubator project Howl needs a
    Sponsoring project. I propose that we, the Pig project, sponsor
    Howl. By sponsoring Howl we are saying that we believe it is a good
    fit for the ASF and that we will assist the Howl project to succeed.
    You can read full details of sponsoring a project at http://incubator.apache.org/incubation/Roles_and_Responsibilities.html#Sponsor
    .

    Our bylaws don't explicitly cover such a vote, but I think lazy
    majority should be reasonable. All votes are welcome, PMC member
    votes will be binding.

    Clearly I'm +1.

    Alan.
  • Daniel Dai at Feb 2, 2011 at 11:16 pm
    +1
  • Benjamin Reed at Feb 2, 2011 at 11:27 pm
    +1
  • Thejas M Nair at Feb 2, 2011 at 11:32 pm
    +1
    -Thejas



  • Milind Bhandarkar at Feb 3, 2011 at 12:57 am
    I feel that Howl should start as a contrib to Hadoop, and move to be a subproject of Hadoop once there is sufficient adoption, rather than going the incubator way. My reasons are as follows:

    1. Howl is aimed at providing abstractions for facilitating interoperability between various systems built *on top of Hadoop*, and should not limit itself to Pig, Hive, and native MapReduce. So, any system that is hadoop compatible should be able to use Howl as a metadata store.

    2. Having Howl as contrib of Hadoop will ensure that the input and output formats, compression codecs, underlying storage APIs etc remain in sync from release to release, and users do not have to worry about whether version x of Howl is compatible with version y of Hadoop or not.

    3. Pig, Hive, Cascading, .. are all already dependent on Hadoop. Including Howl as Hadoop contrib means they do not add any more dependencies.

    4. The roadmap of Howl includes authentication and authorization support. It is a standard industry practice that metadata security mechanisms match those for data security. Thus, a significant amount of code can be shared with Hadoop's authorization and authentication.

    5. Hadoop-compatible file systems provide an abstraction over underlying storage systems. Howl currently provides a table abstraction over the file system. In future, when Hadoop provides blockpool abstraction (as part of federation), Howl will be able to take advantage of that and optimize.

    6. Howl roadmap currently does not contain multi-tenancy features such as quotas. Since there is a strong correlation between number of tables, number of partitions in Howl and number of directories and files in HDFS, it could be streamlined if Howl is part of Hadoop.

    Thoughts ?

    - milind

    ---
    Milind Bhandarkar
    mbhandarkar@linkedin.com
  • Alan Gates at Feb 3, 2011 at 4:59 am
    I see a couple blockers that prevent this from being a contrib project
    of Hadoop:

    1) The Hadoop project is actively trying to remove the contrib
    projects it has, see http://tinyurl.com/6yl25jz. I doubt it's
    interested in any new ones.

    2) The Hadoop project is producing a release every 2 or 3 years
    currently. As a new project Howl will be wanting to release every 2
    or 3 months for a while. Being tied to something as slow moving as
    Hadoop for releases would make it hard for Howl to get releases out the
    door.

    Alan.
    On Feb 2, 2011, at 4:57 PM, Milind Bhandarkar wrote:

    I feel that Howl should start as a contrib to Hadoop, and move to be
    a subproject of Hadoop once there is sufficient adoption, rather
    than going the incubator way. My reasons are as follows:

    1. Howl is aimed at providing abstractions for facilitating
    interoperability between various systems built *on top of Hadoop*,
    and should not limit itself to Pig, Hive, and native MapReduce. So,
    any system that is hadoop compatible should be able to use Howl as a
    metadata store.

    2. Having Howl as contrib of Hadoop will ensure that the input and
    output formats, compression codecs, underlying storage APIs etc
    remain in sync from release to release, and users do not have to
    worry about whether version x of Howl is compatible with version y
    of Hadoop or not.

    3. Pig, Hive, Cascading, .. are all already dependent on Hadoop.
    Including Howl as Hadoop contrib means they do not add any more
    dependencies.

    4. The roadmap of Howl includes authentication and authorization
    support. It is a standard industry practice that metadata security
    mechanisms match those for data security. Thus, a significant code
    can be shared with hadoop's authorization and authentication.

    5. Hadoop-compatible file systems provide an abstraction over
    underlying storage systems. Howl currently provides a table
    abstraction over the file system. In future, when Hadoop provides
    blockpool abstraction (as part of federation), Howl will be able to
    take advantage of that and optimize.

    6. Howl roadmap currently does not contain multi-tenancy features
    such as quotas. Since there is a strong correlation between number
    of tables, number of partitions in Howl and number of directories
    and files in HDFS, it could be streamlined if Howl is part of Hadoop.

    Thoughts ?

    - milind

    ---
    Milind Bhandarkar
    mbhandarkar@linkedin.com

  • Milind Bhandarkar at Feb 3, 2011 at 9:23 pm
    Alan,

    1. Contribs are being removed from Hadoop due to (a) inactivity and (b) test failures. Since Howl will be actively worked on and well tested as a production deployment, I am sure it will not be objected to.

    2. That was when Yahoo! was producing its own distribution, and thus had no dependency on Apache releases. With the recent announcements, that would change, no?

    - milind
    On Feb 2, 2011, at 8:58 PM, Alan Gates wrote:

    I see a couple of blockers that prevent this from being a contrib project of Hadoop:

    1) The Hadoop project is actively trying to remove the contrib projects it has, see http://tinyurl.com/6yl25jz. I doubt it's interested in any new ones.

    2) The Hadoop project is producing a release every 2 or 3 years currently. As a new project, Howl will want to release every 2 or 3 months for a while. Being tied to something as slow-moving as Hadoop for releases would make it hard for Howl to get releases out the door.

    Alan.

    ---
    Milind Bhandarkar
    mbhandarkar@linkedin.com
  • Richard Ding at Feb 3, 2011 at 1:06 am
    +1


  • Ashutosh Chauhan at Feb 3, 2011 at 4:55 pm
    +1
  • Jay Booth at Feb 3, 2011 at 4:55 pm
    Food for thought, what if the metastore were moved to Howl more
    aggressively? It seems like the end state everyone's aiming for is
    that Hive and Pig share Howl as a metastore layer, which makes all
    kinds of sense. Would it increase the chances of long-term success
    if you guys just went for it and introduced the Hive->Howl dependency
    as soon as possible? It would probably create some short-term
    disruption, but it could be healthier for Howl, assuming that things
    were worked out: design choices could be validated faster, you would
    have that end-to-end "it works" thing going, etc.
  • John Sichi at Feb 3, 2011 at 7:49 pm
    Besides the fact that the refactoring required is significant, I don't think this is possible to do quickly since:

    1) Hive (unlike Pig) requires a metastore

    2) Hive releases can't depend on an incubator project

    It's worth pointing out that Howl is already using Hive's CLI+DDL (not just metastore). That's a huge amount of code. In biological terms, Howl has the same DNA as Hive (plus some new Howl-specific genes on a separate plugin chromosome), but only a subset of the Hive genes are expressed when running Howl; the rest are just junk DNA from Howl's perspective.

    It's not clear yet that refactoring is worth the effort even in the end state. We can achieve the desired compatibility by keeping the current approach but removing the Hive code copy from Howl, instead creating a dependency from Howl to Hive. In this case, graduating to become a Hive subproject might be the correct exit from the incubator.

    If we do go ahead with pulling the metastore out of Hive, it might make most sense for Howl to become its own TLP rather than a subproject.

    In the incubator proposal, we have mentioned these issues, but we've attempted to avoid prejudicing any decision. Instead, we'd like to assess the pros and cons (including effort required and impact expected) for both approaches as part of the incubation process.

    I don't have any voting rights on Pig but obviously I'm +1 on the proposal for incubation.

    JVS
    On Feb 3, 2011, at 8:52 AM, Jay Booth wrote:

    Food for thought, what if the metastore were moved to Howl more
    aggressively? It seems like the end state everyone's aiming for is
    that Hive and Pig share Howl as a metastore layer, which makes all
    kinds of sense.. would it increase the chances of long term success
    if you guys just went for it and introduced the Hive->Howl dependency
    as soon as possible? It would probably create some short term
    disruption but it could be more healthy for Howl assuming that things
    were worked out, design choices could be validated faster, you have
    that end-to-end "it works" thing going, etc.
  • Jeff Hammerbacher at Feb 3, 2011 at 9:15 pm
    Hey,

    If we do go ahead with pulling the metastore out of Hive, it might make
    most sense for Howl to become its own TLP rather than a subproject.

    Yes, I did not read the proposal closely enough. I think an end state as a
    TLP makes more sense for Howl than as a Pig subproject. I'd really love to
    see Howl replace the metastore in Hive and it would be more natural to do so
    as a TLP than as a Pig subproject--especially since the current Howl
    repository is literally a fork of Hive.

    In the incubator proposal, we have mentioned these issues, but we've
    attempted to avoid prejudicing any decision. Instead, we'd like to assess
    the pros and cons (including effort required and impact expected) for both
    approaches as part of the incubation process.

    Glad the issues are being considered.

    Later,
    Jeff
  • Yongqiang he at Feb 3, 2011 at 10:18 pm
    I am interested in some numbers on the lines of code (or files)
    changed in Howl but not in Hive.
    Can anyone give some information here?

    Thanks
    Yongqiang
    On Thu, Feb 3, 2011 at 1:15 PM, Jeff Hammerbacher wrote:
    Hey,
    If we do go ahead with pulling the metastore out of Hive, it might make
    most sense for Howl to become its own TLP rather than a subproject.
    Yes, I did not read the proposal closely enough. I think an end state as a
    TLP makes more sense for Howl than as a Pig subproject. I'd really love to
    see Howl replace the metastore in Hive and it would be more natural to do so
    as a TLP than as a Pig subproject--especially since the current Howl
    repository is literally a fork of Hive.
    In the incubator proposal, we have mentioned these issues, but we've
    attempted to avoid prejudicing any decision.  Instead, we'd like to assess
    the pros and cons (including effort required and impact expected) for both
    approaches as part of the incubation process.
    Glad the issues are being considered.
    Later,
    Jeff
  • Ashutosh Chauhan at Feb 3, 2011 at 10:18 pm
    There are none as of today. In the past, whenever we needed
    changes, we made them in a separate branch in Howl, and once they were
    committed to the Hive repo, we pulled them into our trunk and dropped
    the branch.

    Ashutosh
    On Thu, Feb 3, 2011 at 13:41, yongqiang he wrote:
    I am interested in some numbers around the lines of code changes (or
    files of changes) which are in Howl but not in Hive?
    Can anyone give some information here?

    Thanks
    Yongqiang
  • John Sichi at Feb 3, 2011 at 10:53 pm
    But Howl does layer on some additional code, right?

    https://github.com/yahoo/howl/tree/howl/howl

    JVS
    On Feb 3, 2011, at 1:49 PM, Ashutosh Chauhan wrote:

    There are none as of today. In the past, whenever we had to have
    changes, we do it in a separate branch in Howl and once those get
    committed to hive repo, we pull it over in our trunk and drop the
    branch.

    Ashutosh
  • Alan Gates at Feb 3, 2011 at 11:12 pm
    Yes, it adds Input and Output formats for MapReduce and load and store
    functions for Pig. In the future we expect it will continue to add
    more layers.

    Alan.
    On Feb 3, 2011, at 2:49 PM, John Sichi wrote:

    But Howl does layer on some additional code, right?

    https://github.com/yahoo/howl/tree/howl/howl

    JVS
  • John Sichi at Feb 4, 2011 at 1:02 am
    I forgot about the serde dependencies...can you add those to the Initial Source note in [[HowlProposal]] just for completeness?

    JVS
    On Feb 3, 2011, at 3:11 PM, Alan Gates wrote:

    Yes, it adds Input and Output formats for MapReduce and load and store functions for Pig. In the future we expect it will continue to add more layers.

    Alan.
  • Alan Gates at Feb 4, 2011 at 1:10 am
    Are you referring to the serde jar or to any particular serdes we are
    making use of?

    Alan.
    On Feb 3, 2011, at 4:30 PM, John Sichi wrote:

    I forgot about the serde dependencies...can you add those to the
    Initial Source note in [[HowlProposal]] just for completeness?

    JVS
  • John Sichi at Feb 4, 2011 at 3:17 pm

    On Feb 3, 2011, at 5:09 PM, Alan Gates wrote:

    Are you referring to the serde jar or to any particular serdes we are making use of?

    Both (see below).

    JVS

    ----

    [jsichi@dev1066 ~/open/howl/howl/howl/src/java/org/apache/hadoop/hive/howl] ls
    cli/ common/ data/ mapreduce/ pig/ rcfile/
    [jsichi@dev1066 ~/open/howl/howl/howl/src/java/org/apache/hadoop/hive/howl] grep serde */*
    common/HowlUtil.java:import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
    common/HowlUtil.java:import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
    rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde.Constants;
    rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.ColumnProjectionUtils;
    rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.SerDe;
    rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.SerDeException;
    rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
    rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe;
    rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.columnar.ColumnarStruct;
    rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
    rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector;
    rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
    rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.StructField;
    rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
    rcfile/RCFileInputDriver.java: private SerDe serde;
    rcfile/RCFileInputDriver.java: struct = (ColumnarStruct)serde.deserialize(bytesRefArray);
    rcfile/RCFileInputDriver.java: serde = new ColumnarSerDe();
    rcfile/RCFileInputDriver.java: serde.initialize(context.getConfiguration(), howlProperties);
    rcfile/RCFileInputDriver.java: oi = (StructObjectInspector) serde.getObjectInspector();
    rcfile/RCFileMapReduceInputFormat.java:import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
    rcfile/RCFileMapReduceOutputFormat.java:import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
    rcfile/RCFileMapReduceRecordReader.java:import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
    rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde.Constants;
    rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.SerDe;
    rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.SerDeException;
    rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe;
    rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
    rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector;
    rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
    rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
    rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo;
    rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.typeinfo.MapTypeInfo;
    rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo;
    rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo;
    rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
    rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
    rcfile/RCFileOutputDriver.java: /** The serde for serializing the HowlRecord to bytes writable */
    rcfile/RCFileOutputDriver.java: private SerDe serde;
    rcfile/RCFileOutputDriver.java: return serde.serialize(value.getAll(), objectInspector);
    rcfile/RCFileOutputDriver.java: serde = new ColumnarSerDe();
    rcfile/RCFileOutputDriver.java: serde.initialize(context.getConfiguration(), howlProperties);

    ----

    Howl, howl, howl, howl! O! you are men of stones:
    Had I your tongues and eyes, I'd use them so
    That heaven's vaults should crack
  • Ashutosh Chauhan at Feb 3, 2011 at 11:13 pm
    What I am referring to is the metastore/ dir of Hive, the part of the Hive
    code which Howl cares about most. The other Howl code is for additional
    functionality that Howl provides (none of which lives in the metastore/
    dir); it is in the howl/ dir. There are a few build file changes, but they
    are trivial.

    Ashutosh
    On Thu, Feb 3, 2011 at 14:49, John Sichi wrote:
    But Howl does layer on some additional code, right?

    https://github.com/yahoo/howl/tree/howl/howl

    JVS
  • Alex Boisvert at Feb 3, 2011 at 11:33 pm

    On Thu, Feb 3, 2011 at 11:38 AM, John Sichi wrote:

    Besides the fact that the refactoring required is significant, I don't
    think this is possible to do quickly since:

    1) Hive (unlike Pig) requires a metastore

    2) Hive releases can't depend on an incubator project

    I'm not sure what you mean by "can't depend on an incubator project" here.
    AFAIK, there is no policy at Apache that projects should not depend on
    incubator projects. Can you clarify what you mean and why you think such a
    restriction exists?

    alex
  • John Sichi at Feb 4, 2011 at 1:02 am
    I was going off of what I read in HADOOP-3676 (which lacks a reference as well). But I guess if a release can be made from the incubator, then it's not a blocker.

    JVS
  • Alex Boisvert at Feb 4, 2011 at 1:03 am
    Hi John,

    Just to clarify where I was going with my line of questioning: there's no
    Apache policy that prevents dependencies on incubator projects, whether it's
    releases, snapshots or even home-made hacked-together packaging of an
    incubator project. It's been done before, and as long as the incubator
    code's IP has been cleared and the packaging isn't represented as an
    official release if it isn't so, there's nothing wrong with doing that.

    Now, whether the project chooses to use and release with an incubator
    dependency is a matter of judgment (and ultimately a vote by committers if
    there is no consensus). I just wanted to make sure there were no incorrect
    assumptions made.

    alex

  • John Sichi at Feb 4, 2011 at 3:18 pm
    Got it, thanks for the correction.

    JVS
  • Alan Gates at Feb 8, 2011 at 6:11 pm
    With 8 +1 votes and no -1s, the vote passes.

    Alan.
