Grokbase Groups Hive dev October 2010

---------- Forwarded message ----------
From: Pradeep Kamath <pradeepk@yahoo-inc.com>
Date: Tue, Oct 5, 2010 at 1:19 PM
Subject: [howldev] RE: Howl Authorization proposal
To: Pradeep Kamath <pradeepk@yahoo-inc.com>, "howldev@yahoogroups.com" <howldev@yahoogroups.com>


Also, if this proposal looks reasonable, it would be nice if hive would
also adopt it – so comments from hive developers/committers on the
feasibility would be much appreciated!



Thanks,

Pradeep


------------------------------

*From:* Pradeep Kamath
*Sent:* Tuesday, October 05, 2010 1:14 PM
*To:* 'howldev@yahoogroups.com'
*Subject:* Howl Authorization proposal



Hi,

I have posted a proposal for implementing authorization in howl based on
hdfs file permission at
http://wiki.apache.org/pig/Howl/HowlAuthorizationProposal. Please provide
any comments/feedback on the proposal.



Thanks,

Pradeep


  • John Sichi at Oct 12, 2010 at 12:13 am
    Hi Pradeep,

    Namit and I took a look at the doc; thanks for the clear writeup.

    Coincidentally, we've been starting to think about some Hive authorization use cases within Facebook as well. However, the approach we're thinking about is more along the lines of traditional SQL ACL's (role-based GRANT/REVOKE with persistence in the metastore) rather than HDFS-based. HIVE-78 touches on this (plus a lot of unrelated stuff).

    So, one question is whether you would still need the HDFS-based approach if a metastore-level ACL solution were available?

    And if the answer to that is no, then would you prefer to skip the HDFS-based work and just join forces on the ACL solution?

    If it turns out that you're going to need the HDFS-based approach, then I can see how both can coexist (either as alternatives, or as one overlaid on top of the other). The HDFS-based approach can be useful for controlling how HDFS permissions are managed in the case where users are allowed direct access to HDFS, or when multiple clients are used for access (which is one of the main reasons for Howl to exist).

    Regarding development of the HDFS-based approach, it would make sense to start off with enforcement via hooks. I think now that we have the semantic analyzer hooks, it should be possible to do it either all there or via a combination of that and execution hooks.
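
As a rough illustration of hook-based enforcement, the sketch below runs permission checks after semantic analysis and before execution. It is written in Python for brevity and does not use Hive's actual hook API; `AnalyzedQuery`, `make_permission_hook`, `run_query`, and the two permission predicates are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


class AuthorizationError(Exception):
    """Raised when a hook rejects a query."""


@dataclass
class AnalyzedQuery:
    # Hypothetical result of semantic analysis: tables the query reads and writes.
    user: str
    reads: Set[str]
    writes: Set[str]


# A hook is any callable that inspects the analyzed query and may raise.
Hook = Callable[[AnalyzedQuery], None]


def make_permission_hook(can_read, can_write) -> Hook:
    """Build a post-analysis hook from two permission predicates."""
    def hook(query: AnalyzedQuery) -> None:
        for table in sorted(query.reads):
            if not can_read(query.user, table):
                raise AuthorizationError(f"{query.user} may not read {table}")
        for table in sorted(query.writes):
            if not can_write(query.user, table):
                raise AuthorizationError(f"{query.user} may not write {table}")
    return hook


def run_query(query: AnalyzedQuery, hooks: List[Hook]) -> str:
    """Run every hook first; actual execution is stubbed out here."""
    for hook in hooks:
        hook(query)
    return "executed"
```

The predicates could be backed by file permissions, by metastore ACLs, or by both, which is why the check is passed in rather than hard-coded.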

    The code for the hook implementations can start out in Howl, and then if there's consensus on adopting it within Hive, we can move it at that time.

    JVS

  • Alan Gates at Oct 13, 2010 at 4:22 pm
    John,

    It's not clear to us whether, if a traditional ACL model was
    available, we would still need the HDFS model. I suspect so, but I'm
    not sure.

    We had a few concerns with the full ACL model that caused us to avoid
    it at least initially. In this model Hive/Howl has to own all the
    files and set them to be 700. Otherwise someone else can go
    underneath and read them via HDFS. Maybe this is ok, but I wonder if
    it will make it harder to administer.
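
To make the 700 point concrete: with POSIX-style permission bits, which HDFS follows, mode 700 gives the owner read/write/execute and gives group members and everyone else nothing. The sketch below is a minimal illustration; the function name `allows` is made up here, and it ignores the superuser.

```python
def allows(mode: int, is_owner: bool, in_group: bool, want: str) -> bool:
    """Check a POSIX-style mode (e.g. 0o700) for 'r', 'w', or 'x' access.

    Exactly one permission class applies: owner, else group, else other.
    """
    bit = {"r": 4, "w": 2, "x": 1}[want]
    if is_owner:
        triad = (mode >> 6) & 7
    elif in_group:
        triad = (mode >> 3) & 7
    else:
        triad = mode & 7
    return bool(triad & bit)


# With 700, only the owning account (e.g. the Hive/Howl service user) can read.
print(allows(0o700, is_owner=True, in_group=False, want="r"))   # True
print(allows(0o700, is_owner=False, in_group=True, want="r"))   # False
print(allows(0o755, is_owner=False, in_group=False, want="r"))  # True
```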

    Our biggest concern is that HDFS already has a permissions model, why
    create a whole new one? It is a lot of duplication. And that
    duplication will flow through to things like logging and auditing, all
    of which Hive/Howl will now need in addition to HDFS. To justify this
    we needed to understand what additional benefits a traditional ACL
    model would get us. We were not able to come up with compelling use
    cases where we had to have this traditional model.

    One clear issue with using HDFS is extending it to non-HDFS based
    tables (such as HBase). So we should design this as an interface
    that delegates to the underlying security (be it HDFS or HBase or whatever).
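
One possible shape for such an interface, sketched in Python. Everything here is an assumption for illustration: `StorageAuthorizer`, `InMemoryFsAuthorizer`, and `authorizer_for` are hypothetical names, and the in-memory backend only stands in for a real HDFS or HBase permission check.

```python
from abc import ABC, abstractmethod


class StorageAuthorizer(ABC):
    """Hypothetical interface: ask the underlying store whether access is allowed."""

    @abstractmethod
    def can_read(self, user: str, location: str) -> bool: ...

    @abstractmethod
    def can_write(self, user: str, location: str) -> bool: ...


class InMemoryFsAuthorizer(StorageAuthorizer):
    """Stand-in for a file-permission backend: location -> (readers, writers)."""

    def __init__(self, grants):
        self._grants = grants

    def can_read(self, user, location):
        readers, _ = self._grants.get(location, (set(), set()))
        return user in readers

    def can_write(self, user, location):
        _, writers = self._grants.get(location, (set(), set()))
        return user in writers


def authorizer_for(storage_type: str, registry) -> StorageAuthorizer:
    """Pick the backend by the table's storage type ('hdfs', 'hbase', ...)."""
    return registry[storage_type]
```

Callers would look up the authorizer by the table's storage type and never touch a store-specific permission API directly.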

    All that said, I see no problem with having two models for now, and
    seeing which turns out to better provide what users need and/or be
    easier to maintain.

    Alan.
  • John Sichi at Oct 13, 2010 at 11:36 pm

    On Oct 13, 2010, at 9:22 AM, Alan Gates wrote:

    > Our biggest concern is that HDFS already has a permissions model, why create a whole new one? It is a lot of duplication. And that duplication will flow through to things like logging and auditing, all of which Hive/Howl will now need in addition to HDFS. To justify this we needed to understand what additional benefits a traditional ACL model would get us. We were not able to come up with compelling use cases where we had to have this traditional model.

    Here are some you probably already considered, but I'm listing them for consideration anyway...

    * table A can only be queried by roles X and Y; table B can only be queried by roles Y and Z; managing different groups for all the possible role combinations isn't very practical given large numbers of tables and roles

    * finer-grained access control (e.g. column-level) may not be expressible in terms of HDFS permissions without doing things like creating dummy files (although in SQL, views can be used to avoid column-level permissions)

    * privileges beyond read/write (e.g. delete vs update vs append)

    * (Hive-specific): GRANT/REVOKE is the standard SQL approach and requires ACL's (it can't be implemented in terms of HDFS permissions)

    > All that said, I see no problem with having two models for now, and seeing which turns out to better provide what users need and/or be easier to maintain.

    OK, let us know if the hooks turn out to be insufficient as the implementation mechanism.

    JVS
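
The first use case in the list above can be sketched with a toy role-based ACL store. This is an illustration only, not Hive's metastore schema or GRANT/REVOKE implementation; `RoleAcl` and `groups_needed` are hypothetical names. The second function assumes each table directory carries a single owning group, as with plain permission bits, so every distinct set of roles would need its own group.

```python
from collections import defaultdict


class RoleAcl:
    """Toy role-based ACL store keyed by (table, privilege)."""

    def __init__(self):
        self._grants = defaultdict(set)      # (table, privilege) -> roles
        self._user_roles = defaultdict(set)  # user -> roles

    def add_user_to_role(self, user, role):
        self._user_roles[user].add(role)

    def grant(self, privilege, table, role):
        self._grants[(table, privilege)].add(role)

    def revoke(self, privilege, table, role):
        self._grants[(table, privilege)].discard(role)

    def is_allowed(self, user, privilege, table):
        # Allowed if any of the user's roles holds the privilege on the table.
        return bool(self._user_roles[user] & self._grants[(table, privilege)])


def groups_needed(table_roles):
    """One group per distinct role combination when a directory has one group."""
    return {frozenset(roles) for roles in table_roles.values()}
```

With roles, adding table B for roles Y and Z is two grants; with groups, it means creating and maintaining a new group whose membership is the union of Y and Z.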
  • Pradeep Kamath at Oct 13, 2010 at 11:50 pm
    One related concern with not using hdfs permissions is that there can be conflicts between what the hive authorization realm would permit versus what hdfs would permit.

    For instance, a user X (in the hive authorization realm) has create table privilege for database db1, but the hdfs directory /user/hive/warehouse/db1 is actually not writable by user X - wouldn't this lead to a dfs permission denied error even though user X has the create privilege per hive? We can extend the same issue to other operations like drop table etc.

    Keeping the two worlds in sync, so that what is allowed/disallowed in one is the same in the other, might be difficult - thoughts?
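
The mismatch described above can be stated as a small check that compares the two layers. This is a sketch under assumptions: `find_conflicts` is a hypothetical helper, and both inputs are plain in-memory sets rather than anything read from a real metastore or namenode.

```python
def find_conflicts(hive_create_grants, writable_by):
    """Return (user, database, hive_ok, hdfs_ok) tuples where the layers disagree.

    hive_create_grants: set of (user, database) holding CREATE in the hive realm.
    writable_by: database -> set of users who can write its warehouse directory.
    """
    users = {u for u, _ in hive_create_grants}
    users |= {u for writers in writable_by.values() for u in writers}
    conflicts = []
    for db in sorted(writable_by):
        for user in sorted(users):
            hive_ok = (user, db) in hive_create_grants
            hdfs_ok = user in writable_by[db]
            if hive_ok != hdfs_ok:
                conflicts.append((user, db, hive_ok, hdfs_ok))
    return conflicts
```

A `(True, False)` pair is the case from the message: the hive check passes and the operation then fails at the dfs layer. A `(False, True)` pair is the reverse: hive denies the operation, yet the user can still write the directory directly.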

  • Ashish Thusoo at Oct 14, 2010 at 12:01 am
    I think if the access to the files/directories (and their manipulations etc.) is through a single point (Hive) then this does not become an issue. However, if you have a use case where direct manipulation of the files happens through hdfs then you do have to have 2 levels of authorization and you have to pay the administrative cost of these potentially being out of sync. At the most basic level you could check for appropriate hdfs permissions while creating the more traditional permissions. However, that would not protect you from changes happening to the dfs permissions after you have created the Hive permissions. I agree, a sync utility, though possible, is perhaps overkill.
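
A minimal sketch of the grant-time check described above, and of why it does not cover later changes. `CheckedGrants` and the `storage_allows` probe are hypothetical; a real probe would ask the file system, while here it is any callable returning a boolean.

```python
class CheckedGrants:
    """Record a Hive-level grant only if the storage layer agrees at grant time."""

    def __init__(self, storage_allows):
        # storage_allows(user, path) -> bool is a hypothetical probe of dfs permissions.
        self._storage_allows = storage_allows
        self._grants = set()

    def grant(self, user, path):
        if not self._storage_allows(user, path):
            raise PermissionError(f"{user} cannot write {path} at the dfs level")
        self._grants.add((user, path))

    def stale_grants(self):
        """Grants whose dfs permission has been removed since they were created."""
        return sorted(g for g in self._grants if not self._storage_allows(*g))
```

The grant succeeds only while the two layers agree, but nothing stops the dfs permissions from changing afterwards; `stale_grants` is the kind of scan a sync utility would have to run periodically.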

    Ashish

  • Namit Jain at Oct 14, 2010 at 12:38 am
    Moreover, there are metadata operations which do not require access to the dfs today
    (like alter table add columns...).

    These should also have some privilege model - depending on hdfs for that seems like overkill.

    If the metastore stores all privilege information, and all access (hive/pig/howl) goes through that metastore,
    it might be easier to maintain.




  • Yongqiang he at Oct 14, 2010 at 3:12 am
    Ideally HDFS should allow plugging in external authorization. Then the
    privilege inconsistency problem would be gone.
