Grokbase Groups Hive dev October 2010

It's not clear to us whether, if a traditional ACL model were
available, we would still need the HDFS model. I suspect so, but I'm
not sure.

We had a few concerns with the full ACL model that caused us to avoid
it, at least initially. In this model Hive/Howl has to own all the
files and set their permissions to 700. Otherwise someone else can go
underneath Hive/Howl and read them directly via HDFS. Maybe this is
OK, but I wonder if it will make administration harder.
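
To make the 700 concern concrete, here is a minimal sketch of a POSIX-style owner/group/other read check, which is the general shape of the HDFS permission model. The function name, the "hive" service user, and the other user and group names are assumptions for illustration only; this is not HDFS code, and it ignores superuser bypass.

```python
import stat

def can_read(mode, owner, group, user, user_groups):
    """POSIX-style read check: owner bits, else group bits, else other bits."""
    if user == owner:
        return bool(mode & stat.S_IRUSR)
    if group in user_groups:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)

# Hypothetical table files owned by a "hive" service user.
LOCKED = 0o700   # only the owner can read: all access must go through Hive/Howl
OPEN = 0o755     # group and others can read underneath Hive/Howl via HDFS

print(can_read(LOCKED, "hive", "hive", "alice", {"users"}))
print(can_read(OPEN, "hive", "hive", "alice", {"users"}))
```

With mode 700 every read has to be mediated by the owning service, which is what makes a separate ACL layer enforceable and also what adds the administrative burden mentioned above.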

Our biggest concern is that HDFS already has a permissions model, so
why create a whole new one? It is a lot of duplication. And that
duplication will flow through to things like logging and auditing, all
of which Hive/Howl will now need in addition to HDFS. To justify this
we needed to understand what additional benefits a traditional ACL
model would get us. We were not able to come up with compelling use
cases where we had to have this traditional model.

One clear issue with using HDFS is extending it to non-HDFS-based
tables (such as HBase). So we should work on making this an interface
that uses the underlying security (be it HDFS or HBase or whatever).
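
One way to picture such an interface is sketched below. Every class and function name here is hypothetical, and the dictionaries are stand-ins for real lookups against each storage system's own permissions; nothing in this block is an existing Hive, Howl, HDFS, or HBase API.

```python
from abc import ABC, abstractmethod

class StorageAuthorizer(ABC):
    """Hypothetical interface: defer the decision to the table's storage layer."""

    @abstractmethod
    def can_read(self, user, table):
        ...

class HdfsAuthorizer(StorageAuthorizer):
    def __init__(self, readers_by_dir):
        # Stand-in for checking HDFS permissions on the table directory.
        self.readers_by_dir = readers_by_dir

    def can_read(self, user, table):
        return user in self.readers_by_dir.get(table, set())

class HBaseAuthorizer(StorageAuthorizer):
    def __init__(self, readers_by_table):
        # Stand-in for whatever access control the HBase side provides.
        self.readers_by_table = readers_by_table

    def can_read(self, user, table):
        return user in self.readers_by_table.get(table, set())

def check_read(authorizers, storage_of, user, table):
    """Route the check to the authorizer for the table's storage type."""
    return authorizers[storage_of[table]].can_read(user, table)
```

The caller only sees `check_read`; which security model answers the question depends on where the table lives.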

All that said, I see no problem with having two models for now, and
seeing which turns out to better provide what users need and/or be
easier to maintain.

On Oct 11, 2010, at 5:12 PM, John Sichi wrote:

Hi Pradeep,

Namit and I took a look at the doc; thanks for the clear writeup.

Coincidentally, we've been starting to think about some Hive
authorization use cases within Facebook as well. However, the
approach we're thinking about is more along the lines of traditional
SQL ACLs (role-based GRANT/REVOKE with persistence in the
metastore) rather than HDFS-based. HIVE-78 touches on this (plus a
lot of unrelated stuff).
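
As a rough illustration of what role-based GRANT/REVOKE means, the sketch below keeps privileges per role and roles per user. The class and method names are assumptions, and the in-memory dictionaries merely stand in for metastore persistence; it does not reflect the design in HIVE-78.

```python
class MetastoreAcl:
    """Hypothetical role-based ACL store; dicts stand in for the metastore."""

    def __init__(self):
        self.role_privs = {}   # role -> set of (privilege, table)
        self.user_roles = {}   # user -> set of roles

    def grant(self, privilege, table, role):
        self.role_privs.setdefault(role, set()).add((privilege, table))

    def revoke(self, privilege, table, role):
        self.role_privs.get(role, set()).discard((privilege, table))

    def add_user_to_role(self, user, role):
        self.user_roles.setdefault(user, set()).add(role)

    def is_allowed(self, user, privilege, table):
        # A user holds a privilege if any of their roles has been granted it.
        return any(
            (privilege, table) in self.role_privs.get(role, set())
            for role in self.user_roles.get(user, ())
        )
```

A grant to a role takes effect for every user in that role, and a revoke removes it for all of them at once.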

So, one question is whether you would still need the HDFS-based
approach if a metastore-level ACL solution were available?

And if the answer to that is no, then would you prefer to skip the
HDFS-based work and just join forces on the ACL solution?

If it turns out that you're going to need the HDFS-based approach,
then I can see how both can coexist (either as alternatives, or as
one overlaid on top of the other). The HDFS-based approach can be
useful for controlling how HDFS permissions are managed in the case
where users are allowed direct access to HDFS, or when multiple
clients are used for access (which is one of the main reasons for
Howl to exist).

Regarding development of the HDFS-based approach, it would make
sense to start off with enforcement via hooks. I think now that we
have the semantic analyzer hooks, it should be possible to do it
either all there or via a combination of that and execution hooks.
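
The general idea of hook-based enforcement can be sketched as follows. This is not Hive's hook API: the function names, the hook signature, and the `allowed` mapping are all assumptions made for illustration. The point is only that checks run once the query's inputs are known and before anything executes.

```python
class AuthorizationError(Exception):
    pass

def make_read_check_hook(allowed):
    """Build a hook that rejects reads of tables the user may not read."""
    def hook(user, tables_read):
        for table in tables_read:
            if table not in allowed.get(user, set()):
                raise AuthorizationError(f"{user} may not read {table}")
    return hook

def run_query(user, tables_read, hooks, execute):
    # Hooks run after analysis has determined the inputs, before execution.
    for hook in hooks:
        hook(user, tables_read)
    return execute()
```

Because the check lives in a hook rather than in the core code path, an implementation like this could start out in one project and move later without changing the caller.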

The code for the hook implementations can start out in Howl, and
then if there's consensus on adopting it within Hive, we can move it
at that time.

On Oct 5, 2010, at 1:19 PM, Pradeep Kamath wrote:

Also, if this proposal looks reasonable, it would be nice if Hive
would also adopt it, so comments from Hive developers/committers
on the feasibility would be much appreciated!


From: Pradeep Kamath
Sent: Tuesday, October 05, 2010 1:14 PM
To: ''
Subject: Howl Authorization proposal

I have posted a proposal for implementing authorization in Howl
based on HDFS file permissions at
. Please provide any comments/feedback on the proposal.


Posted Oct 5, '10 at 8:43p
Active Oct 14, '10 at 3:12a