August 8th, 2010

- Yongqiang He gave a presentation about his work on index support in
Hive.
- Slides are available here: http://files.meetup.com/1658206/Hive%20Index.pptx
- John Sichi talked about his work on filter-pushdown optimizations.
This is applicable to the HBase storage handler and the new index
infrastructure.
- Pradeep Kamath gave an update on progress with Howl.
- The Howl source code is available on GitHub here: http://github.com/yahoo/howl
- Starting to work on security for Howl. For the first iteration the
plan is to base it on DFS permissions.
- General agreement that we should aim to desupport pre-0.20.0 versions
of Hadoop in Hive 0.7.0. This will allow us to remove the shim layer and
will make it easier to transition to the new mapreduce APIs (a sketch of
the shim pattern follows these notes). But we also want to get a better
idea of how many users are stuck on pre-0.20 versions of Hadoop.
- Remove Thrift generated code from repository.
- Pro: reduce noise in diffs during reviews.
- Con: requires developers to install Thrift compiler.
- Discussed moving the documentation from the wiki to version control.
- Probably not practical to maintain the trunk version of the docs on
the wiki and roll over to version control at release time, so the trunk
version of the docs will be maintained in vcs.
- It was agreed that feature patches should include updates to the
docs, but it is also acceptable to file a doc ticket if there is time
pressure to commit.
- Will maintain an errata page on the wiki for collecting
updates/corrections from users. These notes will be rolled into the
documentation in vcs on a monthly basis.
- The next meeting will be held in September at Cloudera's office in Palo
Alto.
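
To make the shim discussion above concrete, here is a minimal sketch of the
general pattern (hypothetical names, not Hive's actual shim classes): a small
interface hides the calls that differ between Hadoop versions, one
implementation exists per supported Hadoop line, and a loader picks the right
one at runtime. Desupporting pre-0.20 Hadoop would mean deleting the old
implementations while keeping the interface and loader for future versions.

    // Illustrative sketch only; class and method names are hypothetical.
    import org.apache.hadoop.util.VersionInfo;

    interface HadoopVersionShim {
      // One method per call whose name, package, or signature differs across versions.
      String getJobTrackerConfKey();
    }

    class Hadoop20Shim implements HadoopVersionShim {
      public String getJobTrackerConfKey() {
        return "mapred.job.tracker";   // the 0.20-era configuration key
      }
    }

    class ShimLoader {
      // Selects an implementation based on the Hadoop version found on the classpath.
      static HadoopVersionShim getShim() {
        String version = VersionInfo.getVersion();   // e.g. "0.20.2"
        if (version.startsWith("0.17") || version.startsWith("0.18")
            || version.startsWith("0.19")) {
          // These are the branches (and shim classes) the proposal would delete.
          throw new IllegalStateException("Hadoop " + version + " is no longer supported");
        }
        return new Hadoop20Shim();
      }
    }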


  • John Sichi at Aug 30, 2010 at 7:05 pm
    As Carl mentioned in the meeting notes above, there was agreement at the last Hive contributor meeting that we should drop support for pre-0.20 Hadoop versions in Hive trunk. This means that starting with the Hive 0.7 release, Hadoop 0.20 or later will be required. Anyone stuck on an earlier Hadoop version will need to remain on Hive 0.6 and backport any patches they need from trunk.

    There are two major benefits to this:

    * we can finally move from mapred to mapreduce APIs across all of Hive

    * we'll enjoy a significant reduction in code maintenance and testing overhead (not to mention commit latency) for Hive contributors and committers

    Note that although we'll delete the pre-0.20 shim implementations, we will still keep the generic shim mechanism itself in place so that we can continue to support multiple Hadoop API versions as new ones are released in the future.

    For those who were not present at the contributor meeting, please speak up if you have an opinion on this.

    JVS
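
    As a rough illustration of the mapred-to-mapreduce change John describes
    above, here is the same trivial mapper written against both APIs. This is
    generic Hadoop 0.20 word-count-style code, not Hive code, and only a
    sketch of the API difference:

        import java.io.IOException;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapred.MapReduceBase;
        import org.apache.hadoop.mapred.OutputCollector;
        import org.apache.hadoop.mapred.Reporter;

        // Old org.apache.hadoop.mapred API: results go through an OutputCollector.
        class OldApiMapper extends MapReduceBase
            implements org.apache.hadoop.mapred.Mapper<LongWritable, Text, Text, IntWritable> {
          private static final IntWritable ONE = new IntWritable(1);
          public void map(LongWritable key, Text value,
                          OutputCollector<Text, IntWritable> out, Reporter reporter)
              throws IOException {
            for (String word : value.toString().split("\\s+")) {
              out.collect(new Text(word), ONE);
            }
          }
        }

        // New org.apache.hadoop.mapreduce API: results go through a Context object,
        // and the base class is abstract rather than an interface.
        class NewApiMapper
            extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, IntWritable> {
          private static final IntWritable ONE = new IntWritable(1);
          @Override
          protected void map(LongWritable key, Text value, Context context)
              throws IOException, InterruptedException {
            for (String word : value.toString().split("\\s+")) {
              context.write(new Text(word), ONE);
            }
          }
        }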

  • Ashutosh Chauhan at Aug 30, 2010 at 11:53 pm
    * we can finally move from mapred to mapreduce APIs across all of Hive
    This will be useful for Howl, since we are primarily working with the
    mapreduce API. In Howl, we have already wrapped RCFile to make it work
    with the mapreduce API, which we might be able to contribute back.

    Ashutosh
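
    A generic sketch of the wrapping technique Ashutosh mentions (adapting an
    old-API mapred record reader to the new mapreduce API) follows. This is
    not Howl's actual wrapper, just the general shape of such an adapter:

        import java.io.IOException;
        import org.apache.hadoop.mapreduce.InputSplit;
        import org.apache.hadoop.mapreduce.TaskAttemptContext;

        // Adapts an org.apache.hadoop.mapred.RecordReader to the
        // org.apache.hadoop.mapreduce.RecordReader contract.
        class OldToNewRecordReader<K, V> extends org.apache.hadoop.mapreduce.RecordReader<K, V> {
          private final org.apache.hadoop.mapred.RecordReader<K, V> oldReader;
          private K key;
          private V value;

          OldToNewRecordReader(org.apache.hadoop.mapred.RecordReader<K, V> oldReader) {
            this.oldReader = oldReader;
          }

          @Override
          public void initialize(InputSplit split, TaskAttemptContext context) {
            // The wrapped reader is assumed to have been constructed from an
            // equivalent mapred InputSplit by the enclosing InputFormat.
          }

          @Override
          public boolean nextKeyValue() throws IOException {
            if (key == null) {
              key = oldReader.createKey();       // old API reuses key/value objects
              value = oldReader.createValue();
            }
            return oldReader.next(key, value);
          }

          @Override public K getCurrentKey()   { return key; }
          @Override public V getCurrentValue() { return value; }

          @Override
          public float getProgress() throws IOException {
            return oldReader.getProgress();
          }

          @Override
          public void close() throws IOException {
            oldReader.close();
          }
        }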

  • S. Venkatesh at Aug 31, 2010 at 10:13 am
    I think this is a huge benefit for all of us. Looking forward to it.
    Any timeline you have in mind?

    Thanks,
    Venkatesh


    --
    Regards,
    Venkatesh

    “Perfection (in design) is achieved not when there is nothing more to
    add, but rather when there is nothing more to take away.”
    - Antoine de Saint-Exupéry
  • John Sichi at Aug 31, 2010 at 8:07 pm

    On Aug 31, 2010, at 3:13 AM, S. Venkatesh wrote:

    I think this is a huge benefit for all of us. Looking forward to it.
    Any time line you have in mind?

    Thanks,
    Venkatesh

    One issue which was raised at the last Hive contributor meeting was that Hive's new indexing support relies on getting the file offset while reading rows, but getPos has gone away. So we're going to need to come up with a resolution for that.

    JVS
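
    For readers not at the meeting: the old org.apache.hadoop.mapred.RecordReader
    exposes getPos(), the current byte offset, which the index build records for
    each row, while org.apache.hadoop.mapreduce.RecordReader exposes only
    getProgress(). One possible direction (purely a sketch, not a decided Hive
    design) is for a record reader to track offsets itself, for example:

        import java.io.IOException;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataInputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.InputSplit;
        import org.apache.hadoop.mapreduce.RecordReader;
        import org.apache.hadoop.mapreduce.TaskAttemptContext;
        import org.apache.hadoop.mapreduce.lib.input.FileSplit;
        import org.apache.hadoop.util.LineReader;

        // New-API line reader that records each row's starting byte offset itself,
        // since org.apache.hadoop.mapreduce.RecordReader has no getPos().
        class OffsetTrackingLineReader extends RecordReader<LongWritable, Text> {
          private FSDataInputStream in;
          private LineReader reader;
          private long start;
          private long pos;   // offset of the next record: the value getPos() used to provide
          private long end;
          private final LongWritable key = new LongWritable();
          private final Text value = new Text();

          @Override
          public void initialize(InputSplit genericSplit, TaskAttemptContext context)
              throws IOException {
            FileSplit split = (FileSplit) genericSplit;
            Configuration conf = context.getConfiguration();
            Path file = split.getPath();
            FileSystem fs = file.getFileSystem(conf);
            start = split.getStart();
            end = start + split.getLength();
            pos = start;
            in = fs.open(file);
            in.seek(start);
            reader = new LineReader(in, conf);
            // Simplification: a real reader must also skip the partial first line
            // of any split that does not begin at offset 0.
          }

          @Override
          public boolean nextKeyValue() throws IOException {
            if (pos >= end) {
              return false;
            }
            key.set(pos);                     // the offset an index entry would record
            int consumed = reader.readLine(value);
            if (consumed == 0) {
              return false;                   // end of file
            }
            pos += consumed;                  // maintained by hand instead of getPos()
            return true;
          }

          @Override public LongWritable getCurrentKey() { return key; }
          @Override public Text getCurrentValue()       { return value; }

          @Override
          public float getProgress() {
            return end == start ? 1.0f : Math.min(1.0f, (pos - start) / (float) (end - start));
          }

          @Override
          public void close() throws IOException {
            if (reader != null) {
              reader.close();                 // also closes the underlying stream
            }
          }
        }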
