Grokbase Groups HBase dev July 2012
Hi all,

I've been doing some work around HBase security and I've identified a
few enhancements that would make AccessController provide better audit
information for people interested in that. There are three different
items which are not necessarily related to each other.


(i) Lack of column family information in audit logs

I'd actually consider this a bug, more than an enhancement. The
culprit here is AccessController.permissionGranted(). If you look at
that method, it will generate audit events containing column family /
qualifier information only when the required permission for that
specific family / qualifier is denied. If permission is granted, you
get an audit message that contains the table name, but no family /
qualifier.

I think the correct thing here would be to list the affected family /
qualifiers when permission is granted, too. In the deny case, it's a
little bit hazier: do we list all families / qualifiers, the first one
to have a permission denied, or all families / qualifiers for which
permission was denied?

This question is also mildly related to (iii) below.
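
A minimal sketch of the behaviour proposed in (i), not the actual AccessController code: a single formatter that lists the affected families and qualifiers whether the request was allowed or denied. The class name, the method signature, and the message layout are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical helper; the real audit message is built inside AccessController.
final class FamilyAuditSketch {

    // families maps a column family to its qualifiers; an empty or null list
    // means the request touched the family as a whole.
    static String format(boolean allowed, String user, String table,
                         Map<String, List<String>> families) {
        // Copy into a TreeMap so the columns appear in a stable, sorted order.
        Map<String, List<String>> sorted = new TreeMap<>(families);
        StringJoiner columns = new StringJoiner(", ", "[", "]");
        for (Map.Entry<String, List<String>> entry : sorted.entrySet()) {
            List<String> qualifiers = entry.getValue();
            if (qualifiers == null || qualifiers.isEmpty()) {
                columns.add(entry.getKey());
            } else {
                for (String qualifier : qualifiers) {
                    columns.add(entry.getKey() + ":" + qualifier);
                }
            }
        }
        // Same shape for both outcomes, so granted requests keep column detail too.
        return "Access " + (allowed ? "allowed" : "denied")
                + "; user=" + user
                + "; table=" + table
                + "; columns=" + columns;
    }
}
```

This sketch takes the "list all families / qualifiers" option for the deny case; listing only the denied ones would mean passing in a filtered map instead.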


(ii) The access controller does not work if authentication is disabled.

This sounds obvious, right? Why would you need an access controller if
there is no security configured?

I think it would still be useful to collect auditable events in this
case. The user information will be bogus or non-existent, but at least
anyone interested will be able to collect access information for their
service. This could be done by creating another coprocessor, but that
would require replicating some of the logic in the access controller (to
decide which permissions to log in the audit event), or refactoring
that logic into a helper class or something.
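
A rough sketch of the "helper class" idea, assuming the request-to-permission mapping were pulled out of the access controller so that an audit-only coprocessor could reuse it. The class, the enum names, and the mapping itself are illustrative assumptions; the authoritative mapping is whatever AccessController implements.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper class; names and mapping are illustrative only.
final class RequiredPermissions {
    enum Request { GET, SCAN, PUT, DELETE }
    enum Action { READ, WRITE }

    private static final Map<Request, Action> REQUIRED = new EnumMap<>(Request.class);
    static {
        REQUIRED.put(Request.GET, Action.READ);
        REQUIRED.put(Request.SCAN, Action.READ);
        REQUIRED.put(Request.PUT, Action.WRITE);
        REQUIRED.put(Request.DELETE, Action.WRITE);
    }

    // Shared lookup: usable for enforcement and for audit-only logging.
    static Action requiredFor(Request request) {
        return REQUIRED.get(request);
    }

    // Audit-only path: records what would be required, enforces nothing.
    // With authentication disabled there is no reliable user, hence "unknown".
    static String auditLine(Request request, String table) {
        return "request=" + request + "; table=" + table
                + "; required=" + requiredFor(request) + "; user=unknown";
    }
}
```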


(iii) There's no easy way to customize processing of audit events.

Audit events are written to a log appender in a private method in
AccessController.java; this means anyone who wants something
different, like writing this data to a database, has to go through the
logging system to do it. This is sub-optimal since it means having to
parse a log message, and potentially losing information in the
process.

My preferred approach is to separate audit event creation
(AccessController.java) from audit event storage (currently also in
AccessController.java) by means of an "audit logger" interface. A new
config option can tell the AccessController to instantiate one (or
more, although I don't see much use in that) "audit logger", and it
would then call that logger instead of (or in addition to) sending the
log message to the logging subsystem. I actually have a working
prototype for this approach on top of HBase 0.92, I can post the patch
somewhere if anyone is interested.
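
For illustration only, and not the prototype mentioned above: a sketch of what such an "audit logger" interface and its config-driven instantiation might look like. Every name here (AuditEvent, AuditLogger, AuditLoggers, the "example.audit.logger.classes" key) is hypothetical, and java.util.Properties stands in for the Hadoop Configuration object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical event type carrying the fields an audit consumer might need.
final class AuditEvent {
    final String user;
    final String request;
    final String table;
    final boolean allowed;

    AuditEvent(String user, String request, String table, boolean allowed) {
        this.user = user;
        this.request = request;
        this.table = table;
        this.allowed = allowed;
    }
}

// Hypothetical extension point: event storage, separate from event creation.
interface AuditLogger {
    void logAuditEvent(AuditEvent event);
}

// Example implementation that keeps events in memory; a real one might
// write to a database instead.
final class InMemoryAuditLogger implements AuditLogger {
    final List<AuditEvent> events = new ArrayList<>();

    @Override
    public void logAuditEvent(AuditEvent event) {
        events.add(event);
    }
}

final class AuditLoggers {
    // Hypothetical key, not an existing HBase setting.
    static final String LOGGER_CLASSES_KEY = "example.audit.logger.classes";

    // Instantiates every class named in the comma-separated config value.
    static List<AuditLogger> load(Properties conf) {
        List<AuditLogger> loggers = new ArrayList<>();
        for (String raw : conf.getProperty(LOGGER_CLASSES_KEY, "").split(",")) {
            String className = raw.trim();
            if (className.isEmpty()) {
                continue;
            }
            try {
                Class<?> cls = Class.forName(className);
                loggers.add((AuditLogger) cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot create audit logger " + className, e);
            }
        }
        return loggers;
    }

    // Called where the controller writes its log message today.
    static void dispatch(List<AuditLogger> loggers, AuditEvent event) {
        for (AuditLogger logger : loggers) {
            logger.logAuditEvent(event);
        }
    }
}
```

With no loggers configured, load() returns an empty list and dispatch() does nothing, so the existing logging path would be unaffected.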

A different approach would be to make logResult() in AccessController
protected, so that it can be subclassed, achieving similar
functionality. But I don't like how this would create tight coupling
between AccessController and the audit logging code.
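
To make the coupling concern concrete, here is a sketch of the subclassing alternative. SketchAccessController is a heavily simplified, hypothetical stand-in for AccessController; only the logResult() name comes from the real class, and its real signature is different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified stand-in for AccessController.
class SketchAccessController {
    // Private in the real class; protected here so subclasses can override it.
    protected void logResult(String user, String request, boolean allowed) {
        System.out.println("Access " + (allowed ? "allowed" : "denied")
                + "; user=" + user + "; request=" + request);
    }

    // Stand-in for a permission check; the decision is passed in to keep
    // the sketch small.
    boolean check(String user, String request, boolean allowed) {
        logResult(user, request, allowed);
        return allowed;
    }
}

// Custom storage means subclassing the controller itself, which ties the
// audit code to the controller's internals.
class RecordingAccessController extends SketchAccessController {
    final List<String> records = new ArrayList<>();

    @Override
    protected void logResult(String user, String request, boolean allowed) {
        records.add(user + "," + request + "," + allowed);
        super.logResult(user, request, allowed); // keep the normal log line too
    }
}
```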


So I think that covers what I've been looking at; sorry for the
long-ish e-mail. Feedback always welcome.

--
Marcelo


  • Andrew Purtell at Jul 12, 2012 at 9:56 pm
    All of the below are good suggestions.

    I'd argue the entire security side of Hadoop is in need of some
    serious work regards audit. For starters, consistent audit logging
    formats: success is logged at INFO level, failure is logged via
    exception.
    > (i) Lack of column family information in audit logs
    > I'd actually consider this a bug, more than an enhancement. The
    > culprit here is AccessController.permissionGranted().
    Consider filing a JIRA for this as a subtask under
    https://issues.apache.org/jira/browse/HBASE-6096.
    > (ii) The access controller does not work if authentication is disabled.
    IMHO, doing anything with authentication disabled is out of design
    scope. Reasonable people may disagree.
    > (iii) There's no easy way to customize processing of audit events.
    >
    > Audit events are written to a log appender in a private method in
    > AccessController.java; this means anyone who wants something
    > different, like writing this data to a database, has to go through the
    > logging system to do it.
    This is consistent with how all of Hadoop does logging. I don't think
    we should roll our own. That doesn't improve the situation for system
    operators, it means they have to deal with all other parts of Hadoop
    then do something else for HBase specifically. That said,
    > I actually have a working
    > prototype for this approach on top of HBase 0.92, I can post the patch
    > somewhere if anyone is interested.
    Suggest putting it up as another subtask under
    https://issues.apache.org/jira/browse/HBASE-6096 so we can review it.

    On Thu, Jul 12, 2012 at 1:26 PM, Marcelo Vanzin wrote:
    [...]

    --
    Best regards,

    - Andy

    Problems worthy of attack prove their worth by hitting back. - Piet
    Hein (via Tom White)
  • Marcelo Vanzin at Jul 12, 2012 at 10:35 pm
    Hi Andrew, thanks for the feedback.
    On Thu, Jul 12, 2012 at 2:56 PM, Andrew Purtell wrote:
    > I'd argue the entire security side of Hadoop is in need of some
    > serious work regards audit. For starters, consistent audit logging
    > formats: success is logged at INFO level, failure is logged via
    > exception.
    I won't dispute that. :-) Consistent behavior is a good thing. For
    example, HDFS logs audit messages at INFO level today (IIRC), while
    HBase does so at TRACE level. For starters, that means HBase audit
    logs won't be available by default in most installations.
    > > (i) Lack of column family information in audit logs
    > Consider filing a JIRA for this as a subtask under
    > https://issues.apache.org/jira/browse/HBASE-6096.
    Will do.
    > > (ii) The access controller does not work if authentication is disabled.
    > IMHO, doing anything with authentication disabled is out of design
    > scope. Reasonable people may disagree.
    I don't have a strong opinion about this being a feature of the
    AccessController. It can be done easily enough with a custom
    coprocessor. The only thing that is kinda sketchy in the custom
    coprocessor approach is the definition of "what requests map to what
    required permissions", something that is baked into the
    AccessController code today.

    That's not too much information to replicate, but having it available
    in an easier manner would help a lot here.
    > > (iii) There's no easy way to customize processing of audit events.
    > >
    > > Audit events are written to a log appender in a private method in
    > > AccessController.java; this means anyone who wants something
    > > different, like writing this data to a database, has to go through the
    > > logging system to do it.
    > This is consistent with how all of Hadoop does logging. I don't think
    > we should roll our own. That doesn't improve the situation for system
    > operators, it means they have to deal with all other parts of Hadoop
    > then do something else for HBase specifically. That said,
    Well, the logging path wouldn't go away; this would just be an
    extension for people who might have more complicated needs than just
    writing to log files. We're looking at maybe providing a similar thing
    for HDFS. In the end, we don't want the easy way to be any different
    than it is today, but at the same time have a system where doing more
    complicated things is possible.
    > > I actually have a working
    > > prototype for this approach on top of HBase 0.92, I can post the patch
    > > somewhere if anyone is interested.
    > Suggest putting it up as another subtask under
    > https://issues.apache.org/jira/browse/HBASE-6096 so we can review it.
    I'll play with it some more and post something.


    --
    Marcelo
  • Andrew Purtell at Jul 12, 2012 at 10:54 pm
    Hi Marcelo,
    > For example, HDFS logs audit messages at INFO level today (IIRC), while HBase does so at TRACE level.
    This has been fixed.
    > Well, the logging path wouldn't go away; this would just be an
    > extension for people who might have more complicated needs than just
    > writing to log files. We're looking at maybe providing a similar thing
    > for HDFS. In the end, we don't want the easy way to be any different
    > than it is today, but at the same time have a system where doing more
    > complicated things is possible.
    This is the right approach, IMHO, build it into Hadoop core and then
    we can use it in a manner consistent with how core does.

    Otherwise, thanks a lot for your attention to this area.

    Best regards,

    - Andy

    Problems worthy of attack prove their worth by hitting back. - Piet
    Hein (via Tom White)
  • Marcelo Vanzin at Jul 12, 2012 at 11:42 pm
    Hello,
    On Thu, Jul 12, 2012 at 3:54 PM, Andrew Purtell wrote:
    > > For example, HDFS logs audit messages at INFO level today (IIRC), while HBase does so at TRACE level.
    > This has been fixed.
    Ah, good to know. It seems our git mirrors are a little bit out of date.
    > > Well, the logging path wouldn't go away; this would just be an
    > > extension for people who might have more complicated needs than just
    > > writing to log files. We're looking at maybe providing a similar thing
    > > for HDFS. In the end, we don't want the easy way to be any different
    > > than it is today, but at the same time have a system where doing more
    > > complicated things is possible.
    > This is the right approach, IMHO, build it into Hadoop core and then
    > we can use it in a manner consistent with how core does.
    My concern with trying to come up with a common solution for core
    Hadoop and HBase is that the data being logged is fundamentally
    different. Sure, you could have a silly logger that just takes a
    string, but that's no better than hacking through the logging system,
    which can be done today.

    A proper interface would have proper types provided to the logger
    (e.g., the "AuthResult" class currently private in AccessController).
    And those cannot be shared among different services; maybe some base
    type with common audit-related fields, but not much more than that.

    Anyway, I'll clean up my code and post it on Jira instead of
    elongating this thread. :-)

    --
    Marcelo
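
The typed interface described in the last message could be sketched roughly as follows: a small base type holding the audit fields any service might share, a service-specific subtype, and a logger parameterized on the event type. All names are hypothetical; the AuthResult class mentioned in the thread is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical base type: only the fields different services could share.
abstract class BaseAuditEvent {
    final String user;
    final boolean allowed;
    final long timestampMillis;

    BaseAuditEvent(String user, boolean allowed, long timestampMillis) {
        this.user = user;
        this.allowed = allowed;
        this.timestampMillis = timestampMillis;
    }
}

// HBase-specific fields live in a subtype that other services would not share.
final class HBaseAuditEvent extends BaseAuditEvent {
    final String table;
    final String family;
    final String qualifier;

    HBaseAuditEvent(String user, boolean allowed, long timestampMillis,
                    String table, String family, String qualifier) {
        super(user, allowed, timestampMillis);
        this.table = table;
        this.family = family;
        this.qualifier = qualifier;
    }
}

// The logger receives typed events instead of a pre-formatted string.
interface TypedAuditLogger<E extends BaseAuditEvent> {
    void log(E event);
}

// Example consumer: counts denied requests per table and family, with no
// log-message parsing involved.
final class DeniedCountingLogger implements TypedAuditLogger<HBaseAuditEvent> {
    final Map<String, Integer> deniedPerFamily = new HashMap<>();

    @Override
    public void log(HBaseAuditEvent event) {
        if (!event.allowed) {
            deniedPerFamily.merge(event.table + ":" + event.family, 1, Integer::sum);
        }
    }
}
```

A string-only logger could not support a consumer like DeniedCountingLogger without parsing the message back into fields, which is the objection raised above.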

Discussion Overview
group: dev @
categories: hbase, hadoop
posted: Jul 12, '12 at 8:26p
active: Jul 12, '12 at 11:42p
posts: 5
users: 2
website: hbase.apache.org

2 users in discussion

Marcelo Vanzin: 3 posts Andrew Purtell: 2 posts
