Grokbase Groups Pig dev August 2010
Twitter hosted this month's Pig contributor meeting.
Developers from Yahoo, Twitter, LinkedIn, RichRelevance, and Cloudera were
present.

1. Howl
First, Alan Gates demoed Howl, a project whose goal is to provide a table
management service for all of Hadoop. The vision is that ultimately you will
be able to write data using regular MapReduce, Pig, or Hive, and read it
using any of those three, with full support of a partition-aware metadata
store that will tell you what data is available, what its schema is, etc.,
reusing a single table abstraction.

Currently, tables are created using (a restricted subset of) Hive DDL
statements; a Howl CLI will be created for this, which will enforce the
restricted subset.
Writing to a table is currently supported from Pig and MapReduce. Reading can
already be done using all three.

At the moment, a single Pig store statement can only store into a single
partition; adding the ability to "spray" across partitions is on the roadmap.
This, and a good API for interacting with the metastore, are the two areas
that were identified as good opportunities for the wider developer community
to get involved with the project. The source code is on GitHub, and is at
the moment synchronized with the development trunk manually; Yahoo folks
will look into changing this.

Security is a concern, and Yahoo will be working on it. Making it possible
for Hive to write to the tables is at the moment not as high a priority as
the other items listed; it would basically involve just writing a Hive SerDe
(an equivalent of Pig's StoreFunc).

2. Azkaban presentation
Russell Jurney and Richard Park from LinkedIn presented the workflow
management tool open-sourced by LinkedIn, called Azkaban. It allows you to
declare job dependencies, has a web interface for launching and monitoring
jobs, etc. It has a special exec mode for Pig that lets you set some
Pig-specific options on a per-job basis. It does not currently have
triggering or job-instance parameter substitution (it does have job-level
parameter substitution). When asked what Pig could do to make life
easier for Azkaban, the two things Richard identified were registering jars
through the grunt command line and a way to monitor the running job -- both
of these are already in trunk, so we're in pretty good shape for 0.8.

3. Piggybank discussion
Kevin Weil led a discussion of the piggybank. There are a few problems with
it -- it's released on the Pig schedule, and has quite a few barriers to
submission that are, anecdotally at least, preventing people from
contributing. Several options were discussed, with the group finally
settling on starting a community-curated GitHub project for piggybank. It
will have a number of committers from different companies, and will aim to
make it easy for folks to contribute (all contribs will still have to have
tests, and be Apache 2.0-licensed). More details will be forthcoming as we
figure them out. Initially this project will be seeded with the current
Piggybank functions some time after 0.8 is branched. The initial list of
committers is Kevin Weil (Twitter), Dmitriy Ryaboy (Twitter), Carl Steinbach
(Cloudera), and Russell Jurney (LinkedIn). Yahoo will also nominate someone.
Please send us any thoughts you might have on this subject. It was suggested
that a lot of common code might be shared with Hive UDFs, which have the
same problems as Piggybank does, and that perhaps the project can be another
collaboration point between the projects. It is not clear how that would
work; Carl will talk to other Hive people.

4. Pig 0.9
So far the items on the list for 0.9 are: a better type propagation /
resolution story and documentation, perhaps a different parser (ANTLR?), some
performance tweaks, and map types with fixed-type values. Much is still to be
decided.

The next contributor meeting will be hosted by LinkedIn in October.

-Dmitriy


  • Jeff Zhang at Aug 26, 2010 at 7:56 am
    Wonderful, Dmitriy. It's a pity that I missed the contributor meeting.
    Were any slides shared?


    --
    Best Regards

    Jeff Zhang
  • Russell Jurney at Aug 26, 2010 at 5:18 pm
    Slides about Azkaban and Pig:
    http://www.slideshare.net/rjurney/azkaban-pig-5057793
  • Alan Gates at Aug 26, 2010 at 6:36 pm

    Jeff,

    We don't want to exclude our contributors who don't happen to live in
    the San Francisco Bay Area. If we could include you via Skype or some
    other technology we'd be happy to set it up on our end. Do you think
    something like that would work for you?

    Alan.
  • Jeff Zhang at Aug 27, 2010 at 1:15 am
    Alan,

    That's great; next time I will try to join the contributor meeting.

    --
    Best Regards

    Jeff Zhang
  • Jeff Zhang at Aug 27, 2010 at 1:17 am
    BTW, Dmitriy actually invited me to join this meeting through
    Skype, but it's a pity that I had no time to join it this time.

    --
    Best Regards

    Jeff Zhang

Discussion Overview
group: dev @
categories: pig, hadoop
posted: Aug 26, '10 at 3:33a
active: Aug 27, '10 at 1:17a
posts: 6
users: 4
website: pig.apache.org
