Grokbase Groups HBase user June 2016
Hi,

I was wondering whether, and how, it's possible to write Visibility Labels through HFileOutputFormat2. I believe Visibility Labels are implemented as Tags, but when they are written the normal way, with Mutation#setCellVisibility, they are only turned into Tags on the cells inside the VisibilityController coprocessor, because the expression has to be validated against the configured labels.

How can we add visibility labels to cells if we have a job that creates an HFile with HFileOutputFormat2 which is then subsequently loaded using LoadIncrementalHFiles?

Cheers,

Tom Ellis
Consultant Developer - Excelian
Data Lake | Financial Markets IT
LLOYDS BANK COMMERCIAL BANKING
________________________________

E: tom.ellis@lloydsbanking.com
Website: www.lloydsbankcommercial.com
Reduce printing. Lloyds Banking Group is helping to build the low carbon economy.
Corporate Responsibility Report: www.lloydsbankinggroup-cr.com/downloads



Lloyds Banking Group plc. Registered Office: The Mound, Edinburgh EH1 1YZ. Registered in Scotland no. SC95000. Telephone: 0131 225 4555. Lloyds Bank plc. Registered Office: 25 Gresham Street, London EC2V 7HN. Registered in England and Wales no. 2065. Telephone 0207626 1500. Bank of Scotland plc. Registered Office: The Mound, Edinburgh EH1 1YZ. Registered in Scotland no. SC327000. Telephone: 03457 801 801. Cheltenham & Gloucester plc. Registered Office: Barnett Way, Gloucester GL4 3RL. Registered in England and Wales 2299428. Telephone: 0345 603 1637

Lloyds Bank plc, Bank of Scotland plc are authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and Prudential Regulation Authority.

Cheltenham & Gloucester plc is authorised and regulated by the Financial Conduct Authority.

Halifax is a division of Bank of Scotland plc. Cheltenham & Gloucester Savings is a division of Lloyds Bank plc.

HBOS plc. Registered Office: The Mound, Edinburgh EH1 1YZ. Registered in Scotland no. SC218813.

This e-mail (including any attachments) is private and confidential and may contain privileged material. If you have received this e-mail in error, please notify the sender and delete it (including any attachments) immediately. You must not copy, distribute, disclose or use any of the information in it or any attachments. Telephone calls may be monitored or recorded.


  • Ramkrishna vasudevan at Jun 7, 2016 at 10:19 am
    Hi Ellis

    How are the HFileOutputFormat2 files created? Are you using the ImportTsv
    tool? If so, there is a way to specify visibility tags when loading with
    ImportTsv, and those visibility tags are also bulk loaded into the HFiles.

    There is an attribute, CELL_VISIBILITY_COLUMN_SPEC, that indicates the
    data carries visibility tags; the tool then automatically parses the
    specified field as the visibility tag.

    If you have access to the code, see the test case
    TestImportTSVWithVisibilityLabels for an initial idea of how it is done.
    If not, get back to us; happy to help.

    Regards
    Ram
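
As a rough illustration of the ImportTsv route described above, the sketch below assembles the arguments one might pass to org.apache.hadoop.hbase.mapreduce.ImportTsv so that the last TSV field is treated as the visibility expression. HBASE_ROW_KEY and HBASE_CELL_VISIBILITY are the string values of ImportTsv's ROWKEY_COLUMN_SPEC and CELL_VISIBILITY_COLUMN_SPEC. The table name, column names and paths are made up for the example, and whether the labels actually reach the HFiles may depend on the HBase version and mapper in use, so treat this as a starting point rather than a recipe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ImportTsvArgs {
    // String values of ImportTsv's ROWKEY_COLUMN_SPEC and CELL_VISIBILITY_COLUMN_SPEC.
    static final String ROWKEY_COLUMN_SPEC = "HBASE_ROW_KEY";
    static final String CELL_VISIBILITY_COLUMN_SPEC = "HBASE_CELL_VISIBILITY";

    /**
     * Builds ImportTsv arguments for a bulk-output run. The input is assumed
     * to be laid out as: row key, data columns..., visibility expression.
     */
    static List<String> buildArgs(String table, String inputDir, String hfileDir,
                                  List<String> dataColumns) {
        List<String> columns = new ArrayList<>();
        columns.add(ROWKEY_COLUMN_SPEC);
        columns.addAll(dataColumns);
        columns.add(CELL_VISIBILITY_COLUMN_SPEC);

        List<String> args = new ArrayList<>();
        args.add("-Dimporttsv.columns=" + String.join(",", columns));
        // Write HFiles for a later bulk load instead of issuing Puts.
        args.add("-Dimporttsv.bulk.output=" + hfileDir);
        args.add(table);
        args.add(inputDir);
        return args;
    }

    public static void main(String[] unused) {
        // Hypothetical table, columns and paths, for illustration only.
        List<String> args = buildArgs("demo_table", "/tmp/input", "/tmp/hfiles",
                Arrays.asList("cf:amount", "cf:currency"));
        for (String arg : args) {
            System.out.println(arg);
        }
    }
}
```

The resulting list would be handed to the ImportTsv tool, either on the command line or through Hadoop's ToolRunner.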


  • Ellis, Tom (Financial Markets IT) at Jun 7, 2016 at 10:28 am
    Hi Ram,

    We're attempting to do it programmatically:

    The HFile is created by a Spark job using saveAsNewAPIHadoopFile, with ImmutableBytesWritable as the key (the row key), KeyValue as the value, and HFileOutputFormat2 as the output format.
    This HFile is then loaded using the HBase client's LoadIncrementalHFiles.doBulkLoad.

    Is there a way to do this programmatically without using the ImportTsv tool? I was looking at VisibilityUtils.createVisibilityExpTags, thinking I might be able to create the Tags myself that way (although it's obviously @InterfaceAudience.Private), but to use it I would seemingly need to know the label ordinals on the client side.

    Thanks for your help,

    Tom

  • Ramkrishna vasudevan at Jun 8, 2016 at 8:13 am
    Hi

    It can be done. See the class CellCreator, which is a Public-facing
    interface. In the Spark job that produces the HFileOutputFormat2 data, you
    can use CellCreator to create your KeyValues, and use
    CellCreator.getVisibilityExpressionResolver() to map your String
    visibility tags to the system-generated ordinals.

    For example, see how TextSortReducer works. I think this should help
    you solve your problem. Let us know if you need further information.

    Regards
    Ram
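
The CellCreator approach described above might look roughly like the following. This is a minimal sketch, assuming an HBase 1.x-era classpath (it needs the HBase jars and a reachable cluster, so it is not runnable on its own); LabelledKeyValueFactory and its method names are hypothetical, while CellCreator, getVisibilityExpressionResolver(), createVisibilityExpTags() and KeyValueUtil.ensureKeyValue() are the HBase calls being illustrated. Check the signatures against the HBase version you actually run.

```java
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.KeyValueUtil;
import org.apache.hadoop.hbase.Tag;
import org.apache.hadoop.hbase.mapreduce.CellCreator;
import org.apache.hadoop.hbase.mapreduce.VisibilityExpressionResolver;

/** Hypothetical helper; the class and method names are illustrative only. */
class LabelledKeyValueFactory {
    private final CellCreator cellCreator;
    private final VisibilityExpressionResolver resolver;

    LabelledKeyValueFactory(Configuration conf) {
        // CellCreator instantiates the configured resolver; the default one
        // is expected to look up label ordinals from the cluster.
        this.cellCreator = new CellCreator(conf);
        this.resolver = cellCreator.getVisibilityExpressionResolver();
    }

    /** Builds a KeyValue whose tags encode the given visibility expression. */
    KeyValue create(byte[] row, byte[] family, byte[] qualifier, long timestamp,
                    byte[] value, String visibilityExpression) throws IOException {
        // Map the String expression to visibility Tags (label ordinals).
        List<Tag> tags = resolver.createVisibilityExpTags(visibilityExpression);
        Cell cell = cellCreator.create(
                row, 0, row.length,
                family, 0, family.length,
                qualifier, 0, qualifier.length,
                timestamp,
                value, 0, value.length,
                tags);
        // HFileOutputFormat2 jobs of this era typically emit KeyValue values.
        return KeyValueUtil.ensureKeyValue(cell);
    }
}
```

In a Spark job the factory would probably have to be created inside the task (for example once per partition) rather than on the driver, since CellCreator does not implement Serializable. The returned KeyValue can then be paired with an ImmutableBytesWritable row key and written with HFileOutputFormat2 as before.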
  • Anoop John at Jun 8, 2016 at 10:58 am
    Thanks Ram. Yes, that seems the best way, as CellCreator is a publicly
    exposed class. Maybe we should explain this in the HBase book under
    the Visibility Labels area. Good to know you have a use case based on
    Visibility Labels. Let us know in case of any trouble. Thanks.

    -Anoop-

  • Ellis, Tom (Financial Markets IT) at Jun 15, 2016 at 9:39 am
    Yeah, thanks for this Ram. However, in my testing I have found that a client user attempting to use the visibility expression resolver doesn't seem able to scan the hbase:labels table for the full list of labels, and so can't get the ordinals/tags to add to the cell. Does the client user using the VisibilityExpressionResolver need some special permissions?

    Scan of hbase:labels by client user:

    hbase(main):003:0> scan 'hbase:labels'
    ROW COLUMN+CELL
      \x00\x00\x00\x01 column=f:\x00, timestamp=1465216652662, value=system
    1 row(s) in 0.0650 seconds

    Scan of hbase:labels by hbase user:

    hbase(main):001:0> scan 'hbase:labels'
    ROW COLUMN+CELL
      \x00\x00\x00\x01 column=f:\x00, timestamp=1465216652662, value=system
      \x00\x00\x00\x02 column=f:\x00, timestamp=1465216944935, value=protected
      \x00\x00\x00\x02 column=f:hbase, timestamp=1465547138533, value=
      \x00\x00\x00\x02 column=f:tom, timestamp=1465980236882, value=
      \x00\x00\x00\x03 column=f:\x00, timestamp=1465500156667, value=testtesttest
      \x00\x00\x00\x03 column=f:@hadoop, timestamp=1465980236967, value=
      \x00\x00\x00\x03 column=f:hadoop, timestamp=1465547304610, value=
      \x00\x00\x00\x03 column=f:hive, timestamp=1465501322616, value=
      \x00\x00\x00\x04 column=f:\x00, timestamp=1465570719901, value=confidential
      \x00\x00\x00\x05 column=f:\x00, timestamp=1465835047835, value=branch
      \x00\x00\x00\x05 column=f:hdfs, timestamp=1465980237060, value=
      \x00\x00\x00\x06 column=f:\x00, timestamp=1465980447307, value=group
      \x00\x00\x00\x06 column=f:hdfs, timestamp=1465980454130, value=
    6 row(s) in 0.7370 seconds

    Cheers,

    Tom Ellis
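
To make the two scans above easier to compare, here is a small sketch that turns shell output of this shape into an ordinal-to-label map. It assumes, based only on the output shown, that each row key is a 4-byte label ordinal, that the f:\x00 column holds the label name, and that the other qualifiers (f:tom, f:@hadoop and so on) are users or groups associated with that label. LabelsScanDecoder is a hypothetical name, and the parser only handles row keys printed in fully escaped \xNN form.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LabelsScanDecoder {
    // Matches label-name rows such as:
    //   \x00\x00\x00\x02 column=f:\x00, timestamp=1465216944935, value=protected
    private static final Pattern LABEL_ROW = Pattern.compile(
            "^\\s*((?:\\\\x[0-9A-Fa-f]{2}){4})\\s+column=f:\\\\x00,"
            + "\\s*timestamp=\\d+,\\s*value=(\\S+)\\s*$");

    /** Decodes an escaped 4-byte key like \x00\x00\x00\x02 as a big-endian int. */
    static int decodeOrdinal(String escaped) {
        int ordinal = 0;
        // Drop the leading "\x", then split on the remaining "\x" separators.
        for (String hex : escaped.substring(2).split("\\\\x")) {
            ordinal = (ordinal << 8) | Integer.parseInt(hex, 16);
        }
        return ordinal;
    }

    /** Collects ordinal -> label name, skipping rows that are not label-name cells. */
    static Map<Integer, String> parse(List<String> lines) {
        Map<Integer, String> labels = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = LABEL_ROW.matcher(line);
            if (m.matches()) {
                labels.put(decodeOrdinal(m.group(1)), m.group(2));
            }
        }
        return labels;
    }

    public static void main(String[] unused) {
        List<String> lines = Arrays.asList(
                "  \\x00\\x00\\x00\\x01 column=f:\\x00, timestamp=1465216652662, value=system",
                "  \\x00\\x00\\x00\\x02 column=f:\\x00, timestamp=1465216944935, value=protected",
                "  \\x00\\x00\\x00\\x02 column=f:tom, timestamp=1465980236882, value=",
                "  \\x00\\x00\\x00\\x03 column=f:\\x00, timestamp=1465500156667, value=testtesttest");
        // prints {1=system, 2=protected, 3=testtesttest}
        System.out.println(parse(lines));
    }
}
```

Run against the client user's scan, this would yield only the entry for "system"; run against the hbase user's scan, it would yield all six labels.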
    Consultant Developer – Excelian
    Data Lake | Financial Markets IT
    LLOYDS BANK COMMERCIAL BANKING


    E: tom.ellis@lloydsbanking.com
    Website: www.lloydsbankcommercial.com
    , , ,
    Reduce printing. Lloyds Banking Group is helping to build the low carbon economy.
    Corporate Responsibility Report: www.lloydsbankinggroup-cr.com/downloads

    -----Original Message-----
    From: Anoop John
    Sent: 08 June 2016 11:58
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    -- This email has reached the Bank via an external source --


    Thanks Ram.. Ya that seems the best way as CellCreator is public exposed class. May be we should explain abt this in hbase book under the Visibility labels area. Good to know you have Visibility labels based usecase. Let us know in case of any trouble. Thanks.

    -Anoop-
    On Wed, Jun 8, 2016 at 1:43 PM, ramkrishna vasudevan wrote:
    Hi

    It can be done. See the class CellCreator which is Public facing interface.
    When you create your spark job to create the hadoop files that
    produces the
    HFileOutputformat2 data. While creating the KeyValues you can use the
    CellCreator to create your KeyValues and use the
    CellCreator.getVisibilityExpressionResolver() to map your String
    Visibility tags with the system generated ordinals.

    For eg, you can see how TextSortReducer works. I think this should
    help you solve your problem. Let us know if you need further information.

    Regards
    Ram

    On Tue, Jun 7, 2016 at 3:58 PM, Ellis, Tom (Financial Markets IT) <
    tom.ellis@lloydsbanking.com.invalid> wrote:
    Hi Ram,

    We're attempting to do it programmatically so:

    The HFile is created by a Spark job using saveAsNewAPIHadoopFile, and
    using ImmutableBytesWritable as the key (rowkey) with KeyValue as the
    value, and using the HFilOutputFormat2 format.
    This HFile is then loaded using HBase client's
    LoadIncrementalHFiles.doBulkLoad

    Is there a way to do this programmatically without using the
    ImportTsv tool? I was taking a look at
    VisibilityUtils.createVisibilityExpTags and maybe being able to just
    create the Tags myself that way (although it's obviously
    @InterfaceAudience.Private) but it seems to be able to use that I'd need to know Label ordinality client side..

    Thanks for your help,

    Tom

    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 07 June 2016 11:19
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    -- This email has reached the Bank via an external source --


    Hi Ellis

    How is the HFileOutputFormat2 files created? Are you using the
    ImportTsv tool? If you are using the ImportTsv tool then yes there
    is a way to specify visibility tags while loading from the ImportTsv
    tool and those visibility tags are also bulk loaded as HFile.

    There is an attribute CELL_VISIBILITY_COLUMN_SPEC that can be used to
    indicate that the data will have Visibility Tags and the tool will
    automatically parse the specified field as Visibility Tag.

    In case you have access to the code you can see the test case
    TestImportTSVWithVisibilityLabels to get an initial idea of how it is
    being done. If not get back to us, happy to help .

    Regards
    Ram



  • Ramkrishna vasudevan at Jun 15, 2016 at 10:24 am
    The visibility expression resolver scans the labels table, so the user
    running the resolver needs the SYSTEM privileges, since the information
    being accessed is sensitive.

    If, in your case above, the client user is added as an admin, then that
    user should be able to scan the labels table.

    Regards
    Ram
    On Wed, Jun 15, 2016 at 3:09 PM, Ellis, Tom (Financial Markets IT) wrote:

    Yeah, thanks for this Ram. Although in my testing I have found that a
    client user attempting to use the visibility expression resolver doesn't
    seem to have the ability to scan the hbase:labels table for the full list
    of labels and thus can't get the ordinals/tags to add to the cell. Does the
    client user attempting to use the VisibilityExpressionResolver have to have
    some special permissions?

    Scan of hbase:labels by client user:

    hbase(main):003:0> scan 'hbase:labels'
    ROW                 COLUMN+CELL
    \x00\x00\x00\x01    column=f:\x00, timestamp=1465216652662, value=system
    1 row(s) in 0.0650 seconds

    Scan of hbase:labels by hbase user:

    hbase(main):001:0> scan 'hbase:labels'
    ROW                 COLUMN+CELL
    \x00\x00\x00\x01    column=f:\x00, timestamp=1465216652662, value=system
    \x00\x00\x00\x02    column=f:\x00, timestamp=1465216944935, value=protected
    \x00\x00\x00\x02    column=f:hbase, timestamp=1465547138533, value=
    \x00\x00\x00\x02    column=f:tom, timestamp=1465980236882, value=
    \x00\x00\x00\x03    column=f:\x00, timestamp=1465500156667, value=testtesttest
    \x00\x00\x00\x03    column=f:@hadoop, timestamp=1465980236967, value=
    \x00\x00\x00\x03    column=f:hadoop, timestamp=1465547304610, value=
    \x00\x00\x00\x03    column=f:hive, timestamp=1465501322616, value=
    \x00\x00\x00\x04    column=f:\x00, timestamp=1465570719901, value=confidential
    \x00\x00\x00\x05    column=f:\x00, timestamp=1465835047835, value=branch
    \x00\x00\x00\x05    column=f:hdfs, timestamp=1465980237060, value=
    \x00\x00\x00\x06    column=f:\x00, timestamp=1465980447307, value=group
    \x00\x00\x00\x06    column=f:hdfs, timestamp=1465980454130, value=
    6 row(s) in 0.7370 seconds

    Cheers,


    -----Original Message-----
    From: Anoop John
    Sent: 08 June 2016 11:58
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2


    Thanks Ram. Yes, that seems the best way, as CellCreator is a publicly
    exposed class. Maybe we should explain this in the HBase book under the
    Visibility Labels section. Good to know you have a use case based on
    Visibility Labels. Let us know in case of any trouble. Thanks.

    -Anoop-

    On Wed, Jun 8, 2016 at 1:43 PM, ramkrishna vasudevan <
    ramkrishna.s.vasudevan@gmail.com> wrote:
    Hi

    It can be done. See the class CellCreator, which is a Public-facing
    interface. When you write your Spark job that produces the
    HFileOutputFormat2 data, you can use CellCreator to create your
    KeyValues, and use CellCreator.getVisibilityExpressionResolver() to
    map your String visibility tags to the system-generated ordinals.

    For example, you can see how TextSortReducer works. I think this should
    help you solve your problem. Let us know if you need further information.

    Regards
    Ram
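
    To make the "labels to ordinals" step concrete, here is a deliberately simplified model of that lookup. It does not use CellCreator or the HBase resolver, and it is not HBase's actual tag encoding. All class and method names are hypothetical, only AND-only expressions are handled, and the label-to-ordinal pairs are copied from the hbase user's scan of hbase:labels quoted in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the label-to-ordinal lookup a resolver performs.
final class LabelOrdinalLookup {

    /** Splits an AND-only expression such as "confidential&branch" into label names. */
    static List<String> labelsOf(String expression) {
        List<String> labels = new ArrayList<>();
        for (String part : expression.split("&")) {
            String label = part.trim();
            if (!label.isEmpty()) {
                labels.add(label);
            }
        }
        return labels;
    }

    /** Maps every label in the expression to its ordinal; fails if a label is not visible. */
    static List<Integer> ordinalsFor(String expression, Map<String, Integer> visibleLabels) {
        List<Integer> ordinals = new ArrayList<>();
        for (String label : labelsOf(expression)) {
            Integer ordinal = visibleLabels.get(label);
            if (ordinal == null) {
                throw new IllegalArgumentException("Unknown label: " + label);
            }
            ordinals.add(ordinal);
        }
        return ordinals;
    }

    /** Label-to-ordinal pairs as they appear in the hbase user's scan in this thread. */
    static Map<String, Integer> labelsSeenByHbaseUser() {
        Map<String, Integer> labels = new LinkedHashMap<>();
        labels.put("system", 1);
        labels.put("protected", 2);
        labels.put("testtesttest", 3);
        labels.put("confidential", 4);
        labels.put("branch", 5);
        labels.put("group", 6);
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(ordinalsFor("confidential&branch", labelsSeenByHbaseUser()));
    }
}
```

    The same call made with a map holding only the "system" label throws, which mirrors why a resolver that cannot read the whole labels table cannot produce tags for other labels.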

    On Tue, Jun 7, 2016 at 3:58 PM, Ellis, Tom (Financial Markets IT) <
    tom.ellis@lloydsbanking.com.invalid> wrote:
    Hi Ram,

    We're attempting to do it programmatically so:

    The HFile is created by a Spark job using saveAsNewAPIHadoopFile, with
    ImmutableBytesWritable as the key (rowkey), KeyValue as the value, and
    HFileOutputFormat2 as the output format.
    This HFile is then loaded using the HBase client's
    LoadIncrementalHFiles.doBulkLoad.

    Is there a way to do this programmatically without using the
    ImportTsv tool? I was taking a look at
    VisibilityUtils.createVisibilityExpTags and thought I might be able to
    just create the Tags myself that way (although it's obviously
    @InterfaceAudience.Private), but it seems that to use it I'd need to
    know the label ordinals client side.
    Thanks for your help,

    Tom

  • Ellis, Tom (Financial Markets IT) at Jun 15, 2016 at 10:40 am
    Hmm, is there no other way to set labels on individual cells that doesn't require giving the client users system perms? For instance, client users can set the cell visibility on the entire put without having this (i.e. put.setCellVisibility(new CellVisibility("label"))) and the VisibilityController will check it.

    I guess we could create multiple puts for cells in the same row with different labels and use setCellVisibility on each individual put/cell, but would this create additional overhead?

    Cheers,

  • Ramkrishna vasudevan at Jun 15, 2016 at 11:30 am
    > We could I guess create multiple puts for cells in the same row with
    > different labels and use the setCellVisibility on each individual put/cell,
    > but will this create additional overhead?

    This can be done. If you want different cells in the same row to have
    different labels, then it is better to create that many puts and call
    setCellVisibility on each of them. What type of overhead do you see here? In
    terms of the server processing them? If so, there should not be much
    overhead, and adding different labels to every column in turn means
    you need every cell to be treated differently in terms of security, so it
    should be fine IMHO.

    I believe there is no other way than put.setCellVisibility(). One
    question regarding your use case:
    In the mail you mentioned a Spark job that creates a bulk-loaded file.
    If that file is to carry the visibility information of all the cells,
    then the user running this job should be an admin or super user, right?
    Why would a normal client user read through all the visibility labels,
    which may or may not be associated with that user?

    Thank you very much for testing and using this feature. Let us know your
    feedback and if you find any gaps here. Happy to help.

    Regards
    Ram
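
    The "one put per label" idea above can be sketched as a grouping step done before any HBase call. This is a plain-Java model, not client code. CellSpec and groupByVisibility are hypothetical names, and the comment in main marks where the real Put and setCellVisibility calls would go.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: decide how many puts a row needs, one per distinct label expression.
final class PutsPerLabel {

    /** One cell to write: its qualifier, value, and the visibility expression it should carry. */
    static final class CellSpec {
        final String qualifier;
        final String value;
        final String visibility;

        CellSpec(String qualifier, String value, String visibility) {
            this.qualifier = qualifier;
            this.value = value;
            this.visibility = visibility;
        }
    }

    /** Groups the cells of one row by visibility expression, keeping first-seen order. */
    static Map<String, List<CellSpec>> groupByVisibility(List<CellSpec> rowCells) {
        Map<String, List<CellSpec>> groups = new LinkedHashMap<>();
        for (CellSpec cell : rowCells) {
            groups.computeIfAbsent(cell.visibility, k -> new ArrayList<>()).add(cell);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CellSpec> row = new ArrayList<>();
        row.add(new CellSpec("q1", "v1", "confidential"));
        row.add(new CellSpec("q2", "v2", "branch"));
        row.add(new CellSpec("q3", "v3", "confidential"));

        for (Map.Entry<String, List<CellSpec>> group : groupByVisibility(row).entrySet()) {
            // With the HBase client, each group would become one Put for the row:
            // add every cell in the group to it, then call
            // put.setCellVisibility(new CellVisibility(group.getKey())).
            System.out.println(group.getKey() + " -> " + group.getValue().size() + " cell(s)");
        }
    }
}
```

    Grouping first means a row needs only as many puts as it has distinct expressions, rather than one put per cell.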

    On Wed, Jun 15, 2016 at 4:09 PM, Ellis, Tom (Financial Markets IT) wrote:

    Hmm, is there no other way to set labels on individual cells where we
    don't have to give the client users system perms? For instance, client
    users can set the cell visibility on the entire put without having this
    (i.e. put.setCellVisibility("label")) and the VisibilityController will
    check this.

    We could I guess create multiple puts for cells in the same row with
    different labels and use the setCellVisibility on each individual put/cell,
    but will this create additional overhead?

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 15 June 2016 11:24
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    The visibility expression resolver tries to scan the labels table, and the
    user running the resolver should have SYSTEM privileges, since the
    information being accessed is sensitive.

    If the client user in your case is added as an admin, then that user
    should be able to scan the labels table.

    Regards
    Ram

    On Wed, Jun 15, 2016 at 3:09 PM, Ellis, Tom (Financial Markets IT) <
    tom.ellis@lloydsbanking.com.invalid> wrote:
    Yeah, thanks for this Ram. Although in my testing I have found that a
    client user attempting to use the visibility expression resolver
    doesn't seem to have the ability to scan the hbase:labels table for
    the full list of labels and thus can't get the ordinals/tags to add to
    the cell. Does the client user attempting to use the
    VisibilityExpressionResolver have to have some special permissions?

    Scan of hbase:labels by client user:

    hbase(main):003:0> scan 'hbase:labels'
    ROW                 COLUMN+CELL
    \x00\x00\x00\x01    column=f:\x00, timestamp=1465216652662, value=system
    1 row(s) in 0.0650 seconds

    Scan of hbase:labels by hbase user:

    hbase(main):001:0> scan 'hbase:labels'
    ROW                 COLUMN+CELL
    \x00\x00\x00\x01    column=f:\x00, timestamp=1465216652662, value=system
    \x00\x00\x00\x02    column=f:\x00, timestamp=1465216944935, value=protected
    \x00\x00\x00\x02    column=f:hbase, timestamp=1465547138533, value=
    \x00\x00\x00\x02    column=f:tom, timestamp=1465980236882, value=
    \x00\x00\x00\x03    column=f:\x00, timestamp=1465500156667, value=testtesttest
    \x00\x00\x00\x03    column=f:@hadoop, timestamp=1465980236967, value=
    \x00\x00\x00\x03    column=f:hadoop, timestamp=1465547304610, value=
    \x00\x00\x00\x03    column=f:hive, timestamp=1465501322616, value=
    \x00\x00\x00\x04    column=f:\x00, timestamp=1465570719901, value=confidential
    \x00\x00\x00\x05    column=f:\x00, timestamp=1465835047835, value=branch
    \x00\x00\x00\x05    column=f:hdfs, timestamp=1465980237060, value=
    \x00\x00\x00\x06    column=f:\x00, timestamp=1465980447307, value=group
    \x00\x00\x00\x06    column=f:hdfs, timestamp=1465980454130, value=
    6 row(s) in 0.7370 seconds
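
The two scans above suggest why resolution fails for the client user: only the `system` label is visible to that user, so no other label can be mapped to its ordinal. The following is a simplified sketch of that lookup, assuming expressions that combine labels only with `&`. `LabelOrdinals`, `fullTable` and `resolve` are hypothetical names, and this is not HBase's actual tag encoding; the ordinals are copied from the f:\x00 values in the hbase user's scan.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LabelOrdinals {
    // Label -> ordinal, as listed in the hbase user's scan of hbase:labels.
    static Map<String, Integer> fullTable() {
        Map<String, Integer> table = new LinkedHashMap<>();
        table.put("system", 1);
        table.put("protected", 2);
        table.put("testtesttest", 3);
        table.put("confidential", 4);
        table.put("branch", 5);
        table.put("group", 6);
        return table;
    }

    // Resolves an AND-only expression such as "protected&confidential" to ordinals.
    // Throws if any label is missing from the table the caller is able to see.
    static List<Integer> resolve(String expression, Map<String, Integer> visibleLabels) {
        List<Integer> ordinals = new ArrayList<>();
        for (String label : expression.split("&")) {
            Integer ordinal = visibleLabels.get(label.trim());
            if (ordinal == null) {
                throw new IllegalArgumentException("Unknown label: " + label.trim());
            }
            ordinals.add(ordinal);
        }
        return ordinals;
    }
}
```

With the full table, `resolve("protected&confidential", LabelOrdinals.fullTable())` finds both labels; with a table holding only `system`, as in the client user's scan, the same call throws because `protected` cannot be looked up.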

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: Anoop John
    Sent: 08 June 2016 11:58
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    Thanks Ram. Yes, that seems the best way, as CellCreator is a publicly
    exposed class. Maybe we should explain this in the HBase book under
    the visibility labels section. Good to know you have a use case based
    on visibility labels. Let us know in case of any trouble. Thanks.

    -Anoop-

    On Wed, Jun 8, 2016 at 1:43 PM, ramkrishna vasudevan <
    ramkrishna.s.vasudevan@gmail.com> wrote:
    Hi

    It can be done. See the class CellCreator, which is a public-facing
    interface. In the Spark job that produces the HFileOutputFormat2
    data, you can use CellCreator to create your KeyValues and use
    CellCreator.getVisibilityExpressionResolver() to map your String
    visibility tags to the system-generated ordinals.

    For example, you can see how TextSortReducer works. I think this should
    help you solve your problem. Let us know if you need further
    information.
    Regards
    Ram

    On Tue, Jun 7, 2016 at 3:58 PM, Ellis, Tom (Financial Markets IT) <
    tom.ellis@lloydsbanking.com.invalid> wrote:
    Hi Ram,

    We're attempting to do it programmatically:

    The HFile is created by a Spark job using saveAsNewAPIHadoopFile,
    with ImmutableBytesWritable as the key (rowkey), KeyValue as the
    value, and HFileOutputFormat2 as the output format.
    This HFile is then loaded using the HBase client's
    LoadIncrementalHFiles.doBulkLoad.

    Is there a way to do this programmatically without using the
    ImportTsv tool? I was taking a look at
    VisibilityUtils.createVisibilityExpTags and wondering whether I could
    just create the Tags myself that way (although it's obviously
    @InterfaceAudience.Private), but it seems that to use it I'd
    need to know the label ordinals client side.

    Thanks for your help,

    Tom

    -----Original Message-----
    From: ramkrishna vasudevan

    Sent: 07 June 2016 11:19
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    Hi Ellis

    How are the HFileOutputFormat2 files created? Are you using the
    ImportTsv tool? If so, then yes, there is a way to specify visibility
    tags while loading with ImportTsv, and those visibility tags are also
    bulk loaded as part of the HFile.
    There is an attribute, CELL_VISIBILITY_COLUMN_SPEC, that can be used
    to indicate that the data will have visibility tags; the tool
    will automatically parse the specified field as the visibility tag.

    If you have access to the code, you can look at the test case
    TestImportTSVWithVisibilityLabels to get an initial idea of how it
    is done. If not, get back to us, happy to help.

    Regards
    Ram



    On Tue, Jun 7, 2016 at 3:36 PM, Ellis, Tom (Financial Markets IT) <
    tom.ellis@lloydsbanking.com.invalid> wrote:
    Hi,

    I was wondering if it's possible/how to write Visibility Labels
    to an HFileOutputFormat2? I believe Visibility Labels are just
    implemented as Tags, but with the normal way of writing them with
    Mutation#setCellVisibility these are formally written as Tags to
    the cells during the VisibilityController coprocessor as we need
    to assert the expression is valid for the labels configured.

    How can we add visibility labels to cells if we have a job that
    creates an HFile with HFileOutputFormat2 which is then
    subsequently loaded using LoadIncrementalHFiles?

    Cheers,

    Tom Ellis
  • Ellis, Tom (Financial Markets IT) at Jun 15, 2016 at 3:25 pm
    So I have a working prototype using just bulk puts on a table and using setCellVisibility as necessary. Now I'm trying to do it using HFile.

    Sorry Ram, I don't quite follow why the user doing the writing of the HFile has to be an admin/super user? Is that necessary to load HFiles?

    The use case is to have an application user (non-admin) perform the writes to an HBase table via a bulk load of an HFile, setting visibility labels on individual cells as necessary. Business users who have been given the auth to view a label can then see those cells, and others cannot.

    I've seen that it's possible to do this with MapReduce by setting the map output to be a Put (and thus calling setCellVisibility on the puts), but I'm struggling to do this with Spark, as I keep getting the exception that I can't cast a Put to a Cell.

    Cheers,

    Tom Ellis

  • Ted Yu at Jun 15, 2016 at 4:01 pm
    Tom:
    Can you pastebin the stack trace for the exception?

    It would be nice if you could show a snippet of your code too.

    Thanks
    On Jun 15, 2016, at 8:24 AM, Ellis, Tom (Financial Markets IT) wrote:

    So I have a working prototype using just bulk puts on a table and using setCellVisibility as necessary. Now I'm trying to do it using HFile.

    Sorry Ram, I don't quite follow why the user doing the writing of the HFile has to be an admin/super user? Is that necessary to load HFiles?

    The use case is, hopefully, to have an application user (non-admin) perform the writes to an HBase table via a bulk load of an HFile, setting visibility labels on individual cells as necessary. Business users who have been given the auth to view that label can then see those cells, and others cannot.

    I've seen that it's possible to do this with MapReduce by setting the map output to be a Put (and thus calling setCellVisibility on the Puts), but I'm struggling to do this with Spark, as I keep getting an exception saying that a Put can't be cast to a Cell.

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 15 June 2016 12:31
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    "We could I guess create multiple puts for cells in the same row with different labels and use the setCellVisibility on each individual put/cell, but will this create additional overhead?"

    This can be done. If you want different cells in the same row to have different labels, then it is better to create that many Puts and call setCellVisibility on each of them. What type of overhead do you see here? In terms of the server processing them? If so, there should not be much overhead. Also, giving different labels to different cells in turn means you need every cell to be treated differently in terms of security, so this should be fine IMHO.
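
    The "one Put per label" approach described above can be sketched in plain Java. This is only an illustration: LabelledCell and groupByLabel are hypothetical stand-ins rather than HBase classes, and the label names are borrowed from the hbase:labels scan elsewhere in this thread. In a real job, each group would become one Put for the row, with setCellVisibility called once on it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerLabelPuts {

    // Hypothetical stand-in for one column value plus the label it should carry.
    static final class LabelledCell {
        final String qualifier;
        final String value;
        final String label;

        LabelledCell(String qualifier, String value, String label) {
            this.qualifier = qualifier;
            this.value = value;
            this.label = label;
        }
    }

    // Groups the cells of a single row by visibility expression.
    // Each map entry corresponds to one Put: the key is the expression that
    // would be passed to setCellVisibility, the value is the cells added to it.
    static Map<String, List<LabelledCell>> groupByLabel(List<LabelledCell> rowCells) {
        Map<String, List<LabelledCell>> puts = new LinkedHashMap<>();
        for (LabelledCell cell : rowCells) {
            puts.computeIfAbsent(cell.label, k -> new ArrayList<>()).add(cell);
        }
        return puts;
    }

    public static void main(String[] args) {
        List<LabelledCell> row = new ArrayList<>();
        row.add(new LabelledCell("name", "alice", "protected"));
        row.add(new LabelledCell("salary", "1000", "confidential"));
        row.add(new LabelledCell("email", "a@example.com", "protected"));

        // One line per Put that would be created for this row.
        for (Map.Entry<String, List<LabelledCell>> e : groupByLabel(row).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().size() + " cell(s)");
        }
    }
}
```

    The number of Puts per row is then bounded by the number of distinct labels in that row, not by the number of cells.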

    Without doing put.setCellVisibility() there is no other way, I believe. One question regarding your use case: in your mail you mentioned a Spark job that creates a bulk-loaded file. If that file is to carry the visibility information for all the cells, then the user running this job should be an admin or super user, right? Why would a normal client user read through all the visibility cells, which may or may not be associated with that user?

    Thank you very much for testing and using this feature. Let us know your feedback and whether you find any gaps here. Happy to help.

    Regards
    Ram

    On Wed, Jun 15, 2016 at 4:09 PM, Ellis, Tom (Financial Markets IT) wrote:

    Hmm, is there no other way to set labels on individual cells where we
    don't have to give the client users system perms? For instance, client
    users can set the cell visibility on the entire put without having
    this (i.e. put.setCellVisibility("label")) and the
    VisibilityController will check this.

    We could I guess create multiple puts for cells in the same row with
    different labels and use the setCellVisibility on each individual
    put/cell, but will this create additional overhead?

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 15 June 2016 11:24
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2


    The visibility expression resolver tries to scan the labels table, and
    the user running the resolver should have SYSTEM privileges, since
    the information being accessed is sensitive.

    If, in your case above, the client user is added as an
    admin, then that user should be able to scan the labels table.

    Regards
    Ram

    On Wed, Jun 15, 2016 at 3:09 PM, Ellis, Tom (Financial Markets IT) <tom.ellis@lloydsbanking.com.invalid> wrote:
    Yeah, thanks for this Ram. Although in my testing I have found that
    a client user attempting to use the visibility expression resolver
    doesn't seem to have the ability to scan the hbase:labels table for
    the full list of labels and thus can't get the ordinals/tags to add
    to the cell. Does the client user attempting to use the
    VisibilityExpressionResolver have to have some special permissions?

    Scan of hbase:labels by client user:

    hbase(main):003:0> scan 'hbase:labels'
    ROW                 COLUMN+CELL
    \x00\x00\x00\x01    column=f:\x00, timestamp=1465216652662, value=system
    1 row(s) in 0.0650 seconds

    Scan of hbase:labels by hbase user:

    hbase(main):001:0> scan 'hbase:labels'
    ROW                 COLUMN+CELL
    \x00\x00\x00\x01    column=f:\x00, timestamp=1465216652662, value=system
    \x00\x00\x00\x02    column=f:\x00, timestamp=1465216944935, value=protected
    \x00\x00\x00\x02    column=f:hbase, timestamp=1465547138533, value=
    \x00\x00\x00\x02    column=f:tom, timestamp=1465980236882, value=
    \x00\x00\x00\x03    column=f:\x00, timestamp=1465500156667, value=testtesttest
    \x00\x00\x00\x03    column=f:@hadoop, timestamp=1465980236967, value=
    \x00\x00\x00\x03    column=f:hadoop, timestamp=1465547304610, value=
    \x00\x00\x00\x03    column=f:hive, timestamp=1465501322616, value=
    \x00\x00\x00\x04    column=f:\x00, timestamp=1465570719901, value=confidential
    \x00\x00\x00\x05    column=f:\x00, timestamp=1465835047835, value=branch
    \x00\x00\x00\x05    column=f:hdfs, timestamp=1465980237060, value=
    \x00\x00\x00\x06    column=f:\x00, timestamp=1465980447307, value=group
    \x00\x00\x00\x06    column=f:hdfs, timestamp=1465980454130, value=
    6 row(s) in 0.7370 seconds
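
    Judging from the scan output above, each row key looks like a 4-byte big-endian integer holding the label's ordinal, with the label name stored as the value of the f:\x00 column. Under that assumption, a row key printed in shell notation can be decoded with the small sketch below. LabelOrdinal, parseShellBytes and toOrdinal are hypothetical helper names, not part of HBase.

```java
import java.nio.ByteBuffer;

public class LabelOrdinal {

    // Converts HBase shell notation such as "\x00\x00\x00\x02" into raw bytes.
    // Only the \xNN escape form is handled, which is all these row keys use.
    static byte[] parseShellBytes(String escaped) {
        String[] parts = escaped.split("\\\\x");
        // parts[0] is the empty string before the first "\x", so skip it.
        byte[] out = new byte[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            out[i - 1] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    // Reads a 4-byte row key as a big-endian int (ByteBuffer's default order).
    static int toOrdinal(byte[] rowKey) {
        if (rowKey.length != 4) {
            throw new IllegalArgumentException("expected a 4-byte row key");
        }
        return ByteBuffer.wrap(rowKey).getInt();
    }

    public static void main(String[] args) {
        // Row key of the "protected" label in the scan output.
        System.out.println(toOrdinal(parseShellBytes("\\x00\\x00\\x00\\x02"))); // prints 2
    }
}
```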

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: Anoop John
    Sent: 08 June 2016 11:58
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2


    Thanks Ram. Yes, that seems the best way, as CellCreator is a publicly
    exposed class. Maybe we should explain this in the HBase book under
    the Visibility Labels section. Good to know you have a Visibility
    Labels based use case. Let us know in case of any trouble. Thanks.

    -Anoop-

    On Wed, Jun 8, 2016 at 1:43 PM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
    Hi

    It can be done. See the class CellCreator, which is a Public-facing
    interface. When your Spark job creates the Hadoop files that hold the
    HFileOutputFormat2 data, you can use CellCreator to create your
    KeyValues, and use CellCreator.getVisibilityExpressionResolver() to map
    your String visibility tags to the system-generated ordinals.

    For example, you can see how TextSortReducer works. I think this
    should help you solve your problem. Let us know if you need further
    information.

    Regards
    Ram

    On Tue, Jun 7, 2016 at 3:58 PM, Ellis, Tom (Financial Markets IT) wrote:
    Hi Ram,

    We're attempting to do it programmatically, so:

    The HFile is created by a Spark job using saveAsNewAPIHadoopFile,
    with ImmutableBytesWritable as the key (rowkey), KeyValue as the
    value, and HFileOutputFormat2 as the output format.
    This HFile is then loaded using the HBase client's
    LoadIncrementalHFiles.doBulkLoad.

    Is there a way to do this programmatically without using the
    ImportTsv tool? I was taking a look at
    VisibilityUtils.createVisibilityExpTags and maybe being able to
    just create the Tags myself that way (although it's obviously
    @InterfaceAudience.Private), but it seems that to use it I'd
    need to know the label ordinals client side.

    Thanks for your help,

    Tom

    -----Original Message-----
    From: ramkrishna vasudevan

    Sent: 07 June 2016 11:19
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2


    Hi Ellis

    How are the HFileOutputFormat2 files created? Are you using the
    ImportTsv tool? If so, then yes, there is a way to specify
    visibility tags while loading with ImportTsv, and those visibility
    tags are also bulk loaded as part of the HFile.
    There is an attribute CELL_VISIBILITY_COLUMN_SPEC that can be
    used to indicate that the data will have visibility tags, and the
    tool will automatically parse the specified field as the visibility tag.

    If you have access to the code, you can look at the test case
    TestImportTSVWithVisibilityLabels to get an initial idea of how
    it is done. If not, get back to us, happy to help.

    Regards
    Ram



    On Tue, Jun 7, 2016 at 3:36 PM, Ellis, Tom (Financial Markets IT) wrote:
    Hi,

    I was wondering if it's possible/how to write Visibility Labels
    to an HFileOutputFormat2? I believe Visibility Labels are just
    implemented as Tags, but with the normal way of writing them
    with Mutation#setCellVisibility these are formally written as
    Tags to the cells during the VisibilityController coprocessor
    as we need to assert the expression is valid for the labels configured.

    How can we add visibility labels to cells if we have a job that
    creates an HFile with HFileOutputFormat2 which is then
    subsequently loaded using LoadIncrementalHFiles?

    Cheers,

    Tom Ellis
  • Ellis, Tom (Financial Markets IT) at Jun 15, 2016 at 4:29 pm
    Thanks Ted. It was just a class cast exception on line 161 of HFileOutputFormat2.write: I had previously read that you could give it Puts, but it can actually only take Cells. You can only use Puts if you call configureIncrementalLoad, which then sets up the PutSortReducer, as I discussed in my other email.
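
    The distinction above (the writer takes Cells, while Puts have to be flattened by a reducer first) can be sketched in plain Java. This is a rough model only: SimpleCell, SimplePut and flatten are hypothetical stand-ins, not HBase classes, and the ordering compares Strings where HBase compares bytes. The assumption is that the flattening step collects the cells of every Put into a sorted set and emits them one by one.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class FlattenPuts {

    // Hypothetical stand-in for a cell; not the HBase Cell interface.
    static final class SimpleCell {
        final String row, family, qualifier, value;
        final long timestamp;

        SimpleCell(String row, String family, String qualifier, long timestamp, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Hypothetical stand-in for a Put: a row key plus the cells added to it.
    static final class SimplePut {
        final String row;
        final List<SimpleCell> cells = new ArrayList<>();

        SimplePut(String row) {
            this.row = row;
        }

        SimplePut add(String family, String qualifier, long timestamp, String value) {
            cells.add(new SimpleCell(row, family, qualifier, timestamp, value));
            return this;
        }
    }

    // Row, family and qualifier ascending, then newer timestamps first.
    static final Comparator<SimpleCell> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.family.compareTo(b.family);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp);
    };

    // Turns a batch of Puts into the sorted cell sequence a file writer expects.
    // Cells with identical coordinates collapse to one entry in the TreeSet.
    static List<SimpleCell> flatten(List<SimplePut> puts) {
        TreeSet<SimpleCell> sorted = new TreeSet<>(ORDER);
        for (SimplePut put : puts) {
            sorted.addAll(put.cells);
        }
        return new ArrayList<>(sorted);
    }

    public static void main(String[] args) {
        List<SimplePut> puts = new ArrayList<>();
        puts.add(new SimplePut("r1").add("f", "b", 1L, "x").add("f", "a", 1L, "y"));
        puts.add(new SimplePut("r1").add("f", "a", 2L, "z"));

        for (SimpleCell cell : flatten(puts)) {
            System.out.println(cell.row + "/" + cell.family + ":" + cell.qualifier
                    + "@" + cell.timestamp + " = " + cell.value);
        }
    }
}
```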

    Cheers,

    Tom Ellis
    Consultant Developer – Excelian
    Data Lake | Financial Markets IT
    LLOYDS BANK COMMERCIAL BANKING


    E: tom.ellis@lloydsbanking.com
    Website: www.lloydsbankcommercial.com
    , , ,
    Reduce printing. Lloyds Banking Group is helping to build the low carbon economy.
    Corporate Responsibility Report: www.lloydsbankinggroup-cr.com/downloads


    -----Original Message-----
    From: Ted Yu
    Sent: 15 June 2016 17:01
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    -- This email has reached the Bank via an external source --


    Tom:
    Can you pastebin the stack trace for the exception ?

    It would be nice if you can show snippet of your code too.

    Thanks
    On Jun 15, 2016, at 8:24 AM, Ellis, Tom (Financial Markets IT) wrote:

    So I have a working prototype using just bulk puts on a table and using setCellVisibility as necessary. Now I'm trying to do it using HFile.

    Sorry Ram, I don't quite follow why the user doing the writing of the HFile has to be an admin/super user? Is that necessary to load HFiles?

    The use case is to hopefully have an application user (non admin) performing the writes to an hbase table via a bulk load of an hfile, setting visibility labels on individual cells as necessary. Then business users who has been given the auth to view that label can see those cells, and others not.

    I've seen that it's possible to do this with map reduce & setting the map output to be a Put (and thus could setCellVisibility on the puts), but I'm struggling to do this with Spark, as I keep getting the exception that I can't cast a Put to a Cell.

    Cheers,

    Tom Ellis
    Consultant Developer – Excelian
    Data Lake | Financial Markets IT
    LLOYDS BANK COMMERCIAL BANKING


    E: tom.ellis@lloydsbanking.com
    Website: www.lloydsbankcommercial.com
    , , ,
    Reduce printing. Lloyds Banking Group is helping to build the low carbon economy.
    Corporate Responsibility Report:
    www.lloydsbankinggroup-cr.com/downloads


    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 15 June 2016 12:31
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    -- This email has reached the Bank via an external source --

    We could I guess create multiple puts for cells in the same row with
    different labels and use the setCellVisibility on each individual put/cell, but will this create additional overhead?
    This can be done. If you want different cells in the same row to have different labels then it is better to create those many puts and setCellVisibility on each of them. What type of overhead you see here? In terms of the server processing them? If so there should not be much overhead here and also adding different cells to every column inturn means you need every cell to be treated differenly in terms of security. so should be fine IMHO.

    Without doing put.setCellvisibility() there is no other way I believe. One question regarding your use case Now in the mail you had told about the spark job where you will create a bulk loaded file. Now if that is to have all the visibility related information of all the cells then the user doing this job should be an admin or super user right Why is the case that a normal client user will read through all the visibility cells which may or may not be associated with that user?

    Thank you very much for testing and using this feature. LEt us know your feedback and if you find any gaps here. Happy to help.

    Regards
    Ram

    On Wed, Jun 15, 2016 at 4:09 PM, Ellis, Tom (Financial Markets IT) wrote:

    Hmm, is there no other way to set labels on individual cells where we
    don't have to give the client users system perms? For instance,
    client users can set the cell visibility on the entire put without
    having this (i.e. put.setCellVisibility("label")) and the
    VisibilityController will check this.

    We could I guess create multiple puts for cells in the same row with
    different labels and use the setCellVisibility on each individual
    put/cell, but will this create additional overhead?

    Cheers,

    Tom Ellis
    Consultant Developer – Excelian
    Data Lake | Financial Markets IT
    LLOYDS BANK COMMERCIAL BANKING


    E: tom.ellis@lloydsbanking.com
    Website: www.lloydsbankcommercial.com , , , Reduce printing. Lloyds
    Banking Group is helping to build the low carbon economy.
    Corporate Responsibility Report:
    www.lloydsbankinggroup-cr.com/downloads


    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 15 June 2016 11:24
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    -- This email has reached the Bank via an external source --


    The visibility expression resolver tries to scan the labels table and
    the user using the resolver should have the SYSTEM privileges. Since
    the information that is getting accessed is sensitive information.

    Suppose in your above case you have the client user added as a an
    admin then when you scan the label table you should be able to scan it.

    Regards
    Ram

    On Wed, Jun 15, 2016 at 3:09 PM, Ellis, Tom (Financial Markets IT) <
    tom.ellis@lloydsbanking.com.invalid> wrote:
    Yeah, thanks for this Ram. Although in my testing I have found that
    a client user attempting to use the visibility expression resolver
    doesn't seem to have the ability to scan the hbase:labels table for
    the full list of labels and thus can't get the ordinals/tags to add
    to the cell. Does the client user attempting to use the
    VisibilityExpressionResolver have to have some special permissions?

    Scan of hbase:labels by client user:

    hbase(main):003:0> scan 'hbase:labels'
    ROW COLUMN+CELL
    \x00\x00\x00\x01 column=f:\x00,
    timestamp=1465216652662, value=system
    1 row(s) in 0.0650 seconds

    Scan of hbase:labels by hbase user:

    hbase(main):001:0> scan 'hbase:labels'
    ROW COLUMN+CELL
    \x00\x00\x00\x01 column=f:\x00,
    timestamp=1465216652662, value=system
    \x00\x00\x00\x02 column=f:\x00,
    timestamp=1465216944935, value=protected
    \x00\x00\x00\x02 column=f:hbase,
    timestamp=1465547138533, value=
    \x00\x00\x00\x02 column=f:tom,
    timestamp=1465980236882, value=
    \x00\x00\x00\x03 column=f:\x00,
    timestamp=1465500156667, value=testtesttest
    \x00\x00\x00\x03 column=f:@hadoop,
    timestamp=1465980236967, value=
    \x00\x00\x00\x03 column=f:hadoop,
    timestamp=1465547304610, value=
    \x00\x00\x00\x03 column=f:hive,
    timestamp=1465501322616, value=
    \x00\x00\x00\x04 column=f:\x00,
    timestamp=1465570719901, value=confidential
    \x00\x00\x00\x05 column=f:\x00,
    timestamp=1465835047835, value=branch
    \x00\x00\x00\x05 column=f:hdfs,
    timestamp=1465980237060, value=
    \x00\x00\x00\x06 column=f:\x00,
    timestamp=1465980447307, value=group
    \x00\x00\x00\x06 column=f:hdfs,
    timestamp=1465980454130, value=
    6 row(s) in 0.7370 seconds

    Cheers,

    Tom Ellis
    Consultant Developer – Excelian
    Data Lake | Financial Markets IT
    LLOYDS BANK COMMERCIAL BANKING


    E: tom.ellis@lloydsbanking.com
    Website: www.lloydsbankcommercial.com , , , Reduce printing. Lloyds
    Banking Group is helping to build the low carbon economy.
    Corporate Responsibility Report:
    www.lloydsbankinggroup-cr.com/downloads

    -----Original Message-----
    From: Anoop John
    Sent: 08 June 2016 11:58
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    -- This email has reached the Bank via an external source --


    Thanks Ram.. Ya that seems the best way as CellCreator is public
    exposed class. May be we should explain abt this in hbase book under
    the Visibility labels area. Good to know you have Visibility labels
    based usecase. Let us know in case of any trouble. Thanks.

    -Anoop-

    On Wed, Jun 8, 2016 at 1:43 PM, ramkrishna vasudevan <
    ramkrishna.s.vasudevan@gmail.com> wrote:
    Hi

    It can be done. See the class CellCreator which is Public facing
    interface.
    When you create your spark job to create the hadoop files that
    produces the
    HFileOutputformat2 data. While creating the KeyValues you can use
    the CellCreator to create your KeyValues and use the
    CellCreator.getVisibilityExpressionResolver() to map your String
    Visibility tags with the system generated ordinals.

    For eg, you can see how TextSortReducer works. I think this should
    help you solve your problem. Let us know if you need further
    information.
    Regards
    Ram

    On Tue, Jun 7, 2016 at 3:58 PM, Ellis, Tom (Financial Markets IT) <
    tom.ellis@lloydsbanking.com.invalid> wrote:
    Hi Ram,

    We're attempting to do it programmatically so:

    The HFile is created by a Spark job using saveAsNewAPIHadoopFile,
    and using ImmutableBytesWritable as the key (rowkey) with KeyValue
    as the value, and using the HFilOutputFormat2 format.
    This HFile is then loaded using HBase client's
    LoadIncrementalHFiles.doBulkLoad

    Is there a way to do this programmatically without using the
    ImportTsv tool? I was taking a look at
    VisibilityUtils.createVisibilityExpTags and maybe being able to
    just create the Tags myself that way (although it's obviously
    @InterfaceAudience.Private) but it seems to be able to use that
    I'd
    need to know Label ordinality client side..
    Thanks for your help,

    Tom

    -----Original Message-----
    From: ramkrishna vasudevan

    Sent: 07 June 2016 11:19
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    -- This email has reached the Bank via an external source --


    Hi Ellis

    How is the HFileOutputFormat2 files created? Are you using the
    ImportTsv tool? If you are using the ImportTsv tool then yes
    there is a way to specify visibility tags while loading from the
    ImportTsv tool and those visibility tags are also bulk loaded as
    HFile.
    There is an attribute CELL_VISIBILITY_COLUMN_SPEC that can be used
    to indicate that the data will have Visibility Tags and the tool
    will automatically parse the specified field as Visibility Tag.

    In case you have access to the code you can see the test case
    TestImportTSVWithVisibilityLabels to get an initial idea of how it
    is being done. If not get back to us, happy to help .

    Regards
    Ram



    On Tue, Jun 7, 2016 at 3:36 PM, Ellis, Tom (Financial Markets IT)
    wrote:
    Hi,

    I was wondering if it's possible/how to write Visibility Labels
    to an HFileOutputFormat2? I believe Visibility Labels are just
    implemented as Tags, but with the normal way of writing them with
    Mutation#setCellVisibility these are formally written as Tags to
    the cells during the VisibilityController coprocessor as we need
    to assert the expression is valid for the labels configured.

    How can we add visibility labels to cells if we have a job that
    creates an HFile with HFileOutputFormat2 which is then
    subsequently loaded using LoadIncrementalHFiles?

    Cheers,

    Tom Ellis
  • Ellis, Tom (Financial Markets IT) at Jun 15, 2016 at 4:08 pm
    I see now from some other examples I've found that using HFileOutputFormat2 this way to write Puts will use the PutSortReducer if you set the map output value class of the job to Put. Looking at the source for PutSortReducer, it seems that it will actually lose the cell visibility information: it builds the KeyValue objects from getFamilyCellMap alone, whereas the CellVisibility is held on the Put mutation itself.

    So I think that unfortunately, I can only really work around this by giving the application user writing the HFile admin access so it can then use the VisibilityExpressionResolver to create cells with tags with the correct ordinals.

    Am I missing something? Why is it that a client user without admin/super user privileges can set a visibility expression using Put.setCellVisibility, but if we want to write using HFiles, the client user has to have admin/super user privileges so they can use VisibilityExpressionResolver to correctly create the tags on the Cell with correct ordinals?
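
To make the observation about PutSortReducer concrete, here is a minimal toy model in plain Java. It is only a sketch: ToyCell, ToyPut and both copy methods are hypothetical stand-ins invented for illustration, not HBase classes, and the real reducer and tag format are more involved. It shows that when the expression is stored on the mutation, copying only the cells drops it, and that it survives only if it is explicitly attached to each cell.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model only: these classes are hypothetical stand-ins, not HBase types. */
final class VisibilityLossSketch {

    /** A cell with optional tags, loosely modelled on a KeyValue. */
    static final class ToyCell {
        final String qualifier;
        final String value;
        final List<String> tags;

        ToyCell(String qualifier, String value, List<String> tags) {
            this.qualifier = qualifier;
            this.value = value;
            this.tags = tags;
        }
    }

    /** A mutation whose visibility expression lives on the mutation, not on its cells. */
    static final class ToyPut {
        final List<ToyCell> cells = new ArrayList<>();
        final String cellVisibility;

        ToyPut(String cellVisibility) {
            this.cellVisibility = cellVisibility;
        }

        void add(String qualifier, String value) {
            cells.add(new ToyCell(qualifier, value, Collections.<String>emptyList()));
        }
    }

    /** Copies the cells as they are, so the mutation-level expression is dropped. */
    static List<ToyCell> copyCellsOnly(ToyPut put) {
        return new ArrayList<>(put.cells);
    }

    /** Rebuilds each cell with the mutation's expression attached as a tag. */
    static List<ToyCell> copyCellsWithVisibility(ToyPut put) {
        List<ToyCell> out = new ArrayList<>();
        for (ToyCell cell : put.cells) {
            List<String> tags = new ArrayList<>(cell.tags);
            if (put.cellVisibility != null) {
                tags.add(put.cellVisibility);
            }
            out.add(new ToyCell(cell.qualifier, cell.value, tags));
        }
        return out;
    }

    public static void main(String[] args) {
        ToyPut put = new ToyPut("confidential");
        put.add("q1", "v1");
        System.out.println("cells only:      " + copyCellsOnly(put).get(0).tags);
        System.out.println("with visibility: " + copyCellsWithVisibility(put).get(0).tags);
    }
}
```

In the real bulk-load path the second variant corresponds to creating the cells with their visibility tags up front, which is what the CellCreator suggestion elsewhere in this thread is about.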

    Cheers,

    Tom Ellis


    -----Original Message-----
    From: Ellis, Tom (Financial Markets IT)
    Sent: 15 June 2016 16:25
    To: user@hbase.apache.org
    Subject: RE: Writing visibility labels with HFileOutputFormat2


    So I have a working prototype using just bulk puts on a table and using setCellVisibility as necessary. Now I'm trying to do it using HFile.

    Sorry Ram, I don't quite follow why the user doing the writing of the HFile has to be an admin/super user? Is that necessary to load HFiles?

    The use case is to hopefully have an application user (non admin) performing the writes to an HBase table via a bulk load of an HFile, setting visibility labels on individual cells as necessary. Then business users who have been given the auth to view that label can see those cells, and others cannot.

    I've seen that it's possible to do this with MapReduce by setting the map output to be a Put (and thus calling setCellVisibility on the puts), but I'm struggling to do this with Spark, as I keep getting the exception that I can't cast a Put to a Cell.

    Cheers,

    Tom Ellis


    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 15 June 2016 12:31
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    > We could I guess create multiple puts for cells in the same row with
    > different labels and use the setCellVisibility on each individual put/cell, but will this create additional overhead?

    This can be done. If you want different cells in the same row to have different labels, then it is better to create that many puts and call setCellVisibility on each of them. What type of overhead do you see here? In terms of the server processing them? If so, there should not be much overhead, and adding different labels to every column in turn means you need every cell to be treated differently in terms of security, so it should be fine IMHO.

    Without doing put.setCellVisibility() there is no other way, I believe. One question regarding your use case: in the mail you mentioned a Spark job that will create a bulk-loaded file. If that file is to hold the visibility information of all the cells, then the user doing this job should be an admin or super user, right? Why would a normal client user read through all the visibility cells, which may or may not be associated with that user?

    Thank you very much for testing and using this feature. Let us know your feedback and if you find any gaps here. Happy to help.

    Regards
    Ram

    On Wed, Jun 15, 2016 at 4:09 PM, Ellis, Tom (Financial Markets IT) wrote:

    Hmm, is there no other way to set labels on individual cells where we
    don't have to give the client users system perms? For instance, client
    users can set the cell visibility on the entire put without having
    this (i.e. put.setCellVisibility("label")) and the
    VisibilityController will check this.

    We could I guess create multiple puts for cells in the same row with
    different labels and use the setCellVisibility on each individual
    put/cell, but will this create additional overhead?

    Cheers,

    Tom Ellis


    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 15 June 2016 11:24
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2


    The visibility expression resolver tries to scan the labels table, and
    the user using the resolver should have the SYSTEM privileges, since
    the information being accessed is sensitive.

    Suppose in your case above the client user is added as an admin; then
    you should be able to scan the labels table.

    Regards
    Ram

    On Wed, Jun 15, 2016 at 3:09 PM, Ellis, Tom (Financial Markets IT) <
    tom.ellis@lloydsbanking.com.invalid> wrote:
    Yeah, thanks for this Ram. Although in my testing I have found that
    a client user attempting to use the visibility expression resolver
    doesn't seem to have the ability to scan the hbase:labels table for
    the full list of labels and thus can't get the ordinals/tags to add
    to the cell. Does the client user attempting to use the
    VisibilityExpressionResolver have to have some special permissions?

    Scan of hbase:labels by client user:

    hbase(main):003:0> scan 'hbase:labels'
    ROW                 COLUMN+CELL
    \x00\x00\x00\x01    column=f:\x00, timestamp=1465216652662, value=system
    1 row(s) in 0.0650 seconds

    Scan of hbase:labels by hbase user:

    hbase(main):001:0> scan 'hbase:labels'
    ROW                 COLUMN+CELL
    \x00\x00\x00\x01    column=f:\x00, timestamp=1465216652662, value=system
    \x00\x00\x00\x02    column=f:\x00, timestamp=1465216944935, value=protected
    \x00\x00\x00\x02    column=f:hbase, timestamp=1465547138533, value=
    \x00\x00\x00\x02    column=f:tom, timestamp=1465980236882, value=
    \x00\x00\x00\x03    column=f:\x00, timestamp=1465500156667, value=testtesttest
    \x00\x00\x00\x03    column=f:@hadoop, timestamp=1465980236967, value=
    \x00\x00\x00\x03    column=f:hadoop, timestamp=1465547304610, value=
    \x00\x00\x00\x03    column=f:hive, timestamp=1465501322616, value=
    \x00\x00\x00\x04    column=f:\x00, timestamp=1465570719901, value=confidential
    \x00\x00\x00\x05    column=f:\x00, timestamp=1465835047835, value=branch
    \x00\x00\x00\x05    column=f:hdfs, timestamp=1465980237060, value=
    \x00\x00\x00\x06    column=f:\x00, timestamp=1465980447307, value=group
    \x00\x00\x00\x06    column=f:hdfs, timestamp=1465980454130, value=
    6 row(s) in 0.7370 seconds
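
Reading the scan output above, each row key looks like a label ordinal stored as a 4-byte big-endian integer, with the label name in the value of the f:\x00 column. That is an interpretation of the output shown here, not a documented contract. Under that assumption, a plain-Java sketch (class and method names are hypothetical) of turning such rows into a label-to-ordinal map could look like this:

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: assumes row key = 4-byte big-endian ordinal, as the scan output suggests. */
final class LabelsTableSketch {

    /** Decodes a 4-byte row key such as \x00\x00\x00\x02 into its integer ordinal. */
    static int ordinalFromRowKey(byte[] rowKey) {
        if (rowKey == null || rowKey.length != 4) {
            throw new IllegalArgumentException("expected a 4-byte row key");
        }
        // ByteBuffer reads big-endian by default.
        return ByteBuffer.wrap(rowKey).getInt();
    }

    /** Pairs each row key with the label name taken from that row's f:\x00 value. */
    static Map<String, Integer> labelOrdinals(byte[][] rowKeys, String[] labelNames) {
        if (rowKeys.length != labelNames.length) {
            throw new IllegalArgumentException("row keys and label names must line up");
        }
        Map<String, Integer> ordinals = new LinkedHashMap<>();
        for (int i = 0; i < rowKeys.length; i++) {
            ordinals.put(labelNames[i], ordinalFromRowKey(rowKeys[i]));
        }
        return ordinals;
    }

    public static void main(String[] args) {
        byte[][] rowKeys = {
            {0, 0, 0, 1}, {0, 0, 0, 2}, {0, 0, 0, 4}
        };
        String[] labels = {"system", "protected", "confidential"};
        System.out.println(labelOrdinals(rowKeys, labels));
    }
}
```

A client that can only see the first row, as in the client-user scan, would end up with a map containing just the system label, which is why it cannot resolve any other label to an ordinal.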

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: Anoop John
    Sent: 08 June 2016 11:58
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2


    Thanks Ram. Yes, that seems the best way, as CellCreator is a publicly
    exposed class. Maybe we should explain this in the HBase book under
    the Visibility Labels area. Good to know you have a Visibility Labels
    based use case. Let us know in case of any trouble. Thanks.

    -Anoop-

    On Wed, Jun 8, 2016 at 1:43 PM, ramkrishna vasudevan <
    ramkrishna.s.vasudevan@gmail.com> wrote:
    Hi

    It can be done. See the class CellCreator, which is a Public facing
    interface. When you write the Spark job that produces the
    HFileOutputFormat2 data, you can use the CellCreator to create your
    KeyValues, and use CellCreator.getVisibilityExpressionResolver() to
    map your String visibility tags to the system-generated ordinals.

    For example, you can see how TextSortReducer works. I think this
    should help you solve your problem. Let us know if you need further
    information.

    Regards
    Ram

    On Tue, Jun 7, 2016 at 3:58 PM, Ellis, Tom (Financial Markets IT)
    wrote:
    Hi Ram,

    We're attempting to do it programmatically so:

    The HFile is created by a Spark job using saveAsNewAPIHadoopFile,
    with ImmutableBytesWritable as the key (rowkey), KeyValue as the
    value, and HFileOutputFormat2 as the output format. This HFile is
    then loaded using the HBase client's
    LoadIncrementalHFiles.doBulkLoad.

    Is there a way to do this programmatically without using the
    ImportTsv tool? I was taking a look at
    VisibilityUtils.createVisibilityExpTags and thought I might be able
    to create the Tags myself that way (although it's obviously
    @InterfaceAudience.Private), but it seems that to use it I'd need to
    know the label ordinals client side.

    Thanks for your help,

    Tom

    -----Original Message-----
    From: ramkrishna vasudevan

    Sent: 07 June 2016 11:19
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2


    Hi Ellis

    How are the HFileOutputFormat2 files created? Are you using the
    ImportTsv tool? If so, then yes, there is a way to specify
    visibility tags while loading with ImportTsv, and those visibility
    tags are also bulk loaded into the HFile.
    There is an attribute CELL_VISIBILITY_COLUMN_SPEC that can be
    used to indicate that the data will have visibility tags, and the
    tool will automatically parse the specified field as the visibility tag.

    In case you have access to the code, you can see the test case
    TestImportTSVWithVisibilityLabels to get an initial idea of how
    it is being done. If not, get back to us; happy to help.

    Regards
    Ram



    On Tue, Jun 7, 2016 at 3:36 PM, Ellis, Tom (Financial Markets IT)
    wrote:
    Hi,

    I was wondering if it's possible/how to write Visibility Labels
    to an HFileOutputFormat2? I believe Visibility Labels are just
    implemented as Tags, but with the normal way of writing them
    with Mutation#setCellVisibility these are formally written as
    Tags to the cells during the VisibilityController coprocessor
    as we need to assert the expression is valid for the labels configured.

    How can we add visibility labels to cells if we have a job that
    creates an HFile with HFileOutputFormat2 which is then
    subsequently loaded using LoadIncrementalHFiles?

    Cheers,

    Tom Ellis
  • Ellis, Tom (Financial Markets IT) at Jun 15, 2016 at 4:43 pm
    Looking at the source for how DefaultVisibilityLabelServiceImpl checks authorisation, I noticed that the user only needs to hold the 'system' label auth, not admin/super user privileges as I thought you meant, Ram. So technically, I could have a client user that is given the 'system' label auth but only read access to the 'hbase:labels' table?

    Then that user will still be able to scan and read the labels + ordinal, and create the tags correctly :) I'll give it a go..
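
As a rough illustration of why the writer needs to read the labels and their ordinals, the following plain-Java sketch resolves a visibility expression to ordinals from a label-to-ordinal map. It is a simplified model under stated assumptions: the class and method names are hypothetical, only the & operator is handled, and it returns a plain list rather than the binary tags that HBase writes. The real resolver also deals with |, ! and parentheses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: resolves AND-only expressions such as "protected&confidential". */
final class OrdinalResolverSketch {

    /**
     * Looks up every label of an AND-only expression in the supplied map and
     * returns the ordinals in ascending order. Unknown labels are rejected,
     * mirroring the idea that an expression must only use configured labels.
     */
    static List<Integer> resolveAndExpression(String expression,
                                              Map<String, Integer> labelOrdinals) {
        List<Integer> ordinals = new ArrayList<>();
        for (String raw : expression.split("&")) {
            String label = raw.trim();
            Integer ordinal = labelOrdinals.get(label);
            if (ordinal == null) {
                throw new IllegalArgumentException("Unknown label: " + label);
            }
            ordinals.add(ordinal);
        }
        Collections.sort(ordinals);
        return ordinals;
    }

    public static void main(String[] args) {
        // Ordinals taken from the hbase:labels scan shown earlier in this thread.
        Map<String, Integer> labels = new HashMap<>();
        labels.put("protected", 2);
        labels.put("confidential", 4);
        labels.put("branch", 5);
        System.out.println(resolveAndExpression("confidential & protected", labels));
    }
}
```

A user who cannot read the full label list has an incomplete map, so the lookup fails for every label it cannot see; that is the situation the 'system' label auth is meant to resolve.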

    Cheers,

    Tom Ellis


    -----Original Message-----
    From: Ellis, Tom (Financial Markets IT)
    Sent: 15 June 2016 16:56
    To: user@hbase.apache.org
    Subject: RE: Writing visibility labels with HFileOutputFormat2

    -- This email has reached the Bank via an external source --


    I see now from some other examples I've found that actually this form of using HFileOutputFormat2 to write Puts will use the PutSortReducer if you set the map output class of the job you give it to Put. Looking at the source for PutSourceReducer it seems that it will actually lose the Cell Visibility information as it uses the getFamilyCellMap to create KeyValue objects and just uses that, and the CellVisibility is actually on the Put Mutation.

    So I think that unfortunately, I can only really work around this by giving the application user writing the HFile admin access so it can then use the VisibilityExpressionResolver to create cells with tags with the correct ordinals.

    Am I missing something? Why is it that a client user without admin/super user privileges can set a visibility expression using Put.setCellVisibility, but if we want to write using HFiles, the client user has to have admin/super user privileges so they can use VisibilityExpressionResolver to correctly create the tags on the Cell with correct ordinals?
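
The PutSortReducer behaviour described above can be illustrated with a toy model. This is only a sketch in plain Java, not HBase code: ToyCell, ToyPut, cellsOnly and cellsWithVisibility are hypothetical names, and the "VISIBILITY" attribute key is an assumption made for the illustration. It shows that an expression stored on the mutation is lost when only the family cell map is read, unless it is copied onto each cell as a tag. In real HBase the expression would additionally have to be resolved to label ordinals, which this toy skips.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VisibilityLossDemo {

    /** Toy stand-in for a cell: a qualifier, a value, and any tags attached to it. */
    static final class ToyCell {
        final String qualifier;
        final String value;
        final List<String> tags = new ArrayList<>();

        ToyCell(String qualifier, String value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Toy stand-in for a Put: cells grouped by family, plus mutation-level attributes. */
    static final class ToyPut {
        final Map<String, List<ToyCell>> familyCellMap = new HashMap<>();
        final Map<String, String> attributes = new HashMap<>();

        void add(String family, ToyCell cell) {
            familyCellMap.computeIfAbsent(family, f -> new ArrayList<>()).add(cell);
        }

        /** The expression is stored on the mutation, not on any individual cell. */
        void setCellVisibility(String expression) {
            attributes.put("VISIBILITY", expression);
        }
    }

    /** Forwards only the cells from the family map; mutation attributes are never consulted. */
    static List<ToyCell> cellsOnly(ToyPut put) {
        List<ToyCell> out = new ArrayList<>();
        for (List<ToyCell> cells : put.familyCellMap.values()) {
            out.addAll(cells);
        }
        return out;
    }

    /** Copies the mutation-level expression onto a copy of each emitted cell as a tag. */
    static List<ToyCell> cellsWithVisibility(ToyPut put) {
        String expression = put.attributes.get("VISIBILITY");
        List<ToyCell> out = new ArrayList<>();
        for (ToyCell cell : cellsOnly(put)) {
            ToyCell copy = new ToyCell(cell.qualifier, cell.value);
            copy.tags.addAll(cell.tags);
            if (expression != null) {
                copy.tags.add(expression);
            }
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        ToyPut put = new ToyPut();
        put.add("f", new ToyCell("q1", "v1"));
        put.setCellVisibility("confidential");

        System.out.println(cellsOnly(put).get(0).tags);           // prints []
        System.out.println(cellsWithVisibility(put).get(0).tags); // prints [confidential]
    }
}
```

The second method mirrors what the thread proposes doing by hand: attach the visibility information to each cell before it reaches the HFile writer.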

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: Ellis, Tom (Financial Markets IT)
    Sent: 15 June 2016 16:25
    To: user@hbase.apache.org
    Subject: RE: Writing visibility labels with HFileOutputFormat2

    So I have a working prototype using just bulk puts on a table and using setCellVisibility as necessary. Now I'm trying to do it using HFile.

    Sorry Ram, I don't quite follow why the user doing the writing of the HFile has to be an admin/super user? Is that necessary to load HFiles?

    The use case is to have an application user (non-admin) perform the writes to an HBase table via a bulk load of an HFile, setting visibility labels on individual cells as necessary. Business users who have been given the auth to view a label can then see those cells, and others cannot.

    I've seen that it's possible to do this with map reduce & setting the map output to be a Put (and thus could setCellVisibility on the puts), but I'm struggling to do this with Spark, as I keep getting the exception that I can't cast a Put to a Cell.

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 15 June 2016 12:31
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    > We could I guess create multiple puts for cells in the same row with
    > different labels and use the setCellVisibility on each individual put/cell, but will this create additional overhead?

    This can be done. If you want different cells in the same row to have different labels, then it is better to create that many puts and call setCellVisibility on each of them. What type of overhead do you see here? In terms of the server processing them? If so, there should not be much overhead. Also, giving each column a different label in turn means you need every cell to be treated differently in terms of security, so it should be fine IMHO.

    Without put.setCellVisibility() there is no other way, I believe. One question regarding your use case: in the mail you mentioned a Spark job that creates a bulk-loaded file. If that file is to hold the visibility information of all the cells, then the user doing this job should be an admin or super user, right? Why would a normal client user read through all the visibility cells, which may or may not be associated with that user?

    Thank you very much for testing and using this feature. Let us know your feedback and if you find any gaps here. Happy to help.

    Regards
    Ram

    On Wed, Jun 15, 2016 at 4:09 PM, Ellis, Tom (Financial Markets IT) wrote:

    Hmm, is there no other way to set labels on individual cells where we
    don't have to give the client users system perms? For instance, client
    users can set the cell visibility on the entire put without having
    this (i.e. put.setCellVisibility("label")) and the
    VisibilityController will check this.

    We could I guess create multiple puts for cells in the same row with
    different labels and use the setCellVisibility on each individual
    put/cell, but will this create additional overhead?

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 15 June 2016 11:24
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    The visibility expression resolver scans the labels table, so the
    user running the resolver should have the SYSTEM privileges, since
    the information being accessed is sensitive.

    In your case above, if the client user is added as an admin, that
    user should be able to scan the labels table.

    Regards
    Ram

    On Wed, Jun 15, 2016 at 3:09 PM, Ellis, Tom (Financial Markets IT) <
    tom.ellis@lloydsbanking.com.invalid> wrote:
    Yeah, thanks for this Ram. Although in my testing I have found that
    a client user attempting to use the visibility expression resolver
    doesn't seem to have the ability to scan the hbase:labels table for
    the full list of labels and thus can't get the ordinals/tags to add
    to the cell. Does the client user attempting to use the
    VisibilityExpressionResolver have to have some special permissions?

    Scan of hbase:labels by client user:

    hbase(main):003:0> scan 'hbase:labels'
    ROW                 COLUMN+CELL
    \x00\x00\x00\x01    column=f:\x00, timestamp=1465216652662, value=system
    1 row(s) in 0.0650 seconds

    Scan of hbase:labels by hbase user:

    hbase(main):001:0> scan 'hbase:labels'
    ROW                 COLUMN+CELL
    \x00\x00\x00\x01    column=f:\x00, timestamp=1465216652662, value=system
    \x00\x00\x00\x02    column=f:\x00, timestamp=1465216944935, value=protected
    \x00\x00\x00\x02    column=f:hbase, timestamp=1465547138533, value=
    \x00\x00\x00\x02    column=f:tom, timestamp=1465980236882, value=
    \x00\x00\x00\x03    column=f:\x00, timestamp=1465500156667, value=testtesttest
    \x00\x00\x00\x03    column=f:@hadoop, timestamp=1465980236967, value=
    \x00\x00\x00\x03    column=f:hadoop, timestamp=1465547304610, value=
    \x00\x00\x00\x03    column=f:hive, timestamp=1465501322616, value=
    \x00\x00\x00\x04    column=f:\x00, timestamp=1465570719901, value=confidential
    \x00\x00\x00\x05    column=f:\x00, timestamp=1465835047835, value=branch
    \x00\x00\x00\x05    column=f:hdfs, timestamp=1465980237060, value=
    \x00\x00\x00\x06    column=f:\x00, timestamp=1465980447307, value=group
    \x00\x00\x00\x06    column=f:hdfs, timestamp=1465980454130, value=
    6 row(s) in 0.7370 seconds
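
Going by the scan output above, each row key looks like a 4-byte big-endian ordinal, the f:\x00 column appears to hold the label name, and the remaining columns (f:tom, f:hdfs and so on) appear to record which users or groups hold that label. The following is a minimal sketch of turning such cells into a label-to-ordinal map, assuming that reading of the table is right. LabelOrdinals and ScanCell are hypothetical names for this illustration, not HBase classes, and the cells are supplied by hand rather than read from a cluster.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LabelOrdinals {

    /** One cell of the scan output: row key bytes, column qualifier, and value. */
    static final class ScanCell {
        final byte[] row;
        final String qualifier;
        final String value;

        ScanCell(byte[] row, String qualifier, String value) {
            this.row = row;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Decodes a 4-byte row key such as \x00\x00\x00\x02 into the int 2. */
    static int ordinalOf(byte[] row) {
        if (row.length != Integer.BYTES) {
            throw new IllegalArgumentException("expected a 4-byte row key");
        }
        return ByteBuffer.wrap(row).getInt(); // ByteBuffer is big-endian by default
    }

    /** Keeps only the cells whose qualifier is the single byte \x00 (assumed to be label names). */
    static Map<String, Integer> labelOrdinals(Iterable<ScanCell> cells) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (ScanCell cell : cells) {
            if ("\u0000".equals(cell.qualifier)) {
                out.put(cell.value, ordinalOf(cell.row));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<ScanCell> cells = Arrays.asList(
            new ScanCell(new byte[] {0, 0, 0, 1}, "\u0000", "system"),
            new ScanCell(new byte[] {0, 0, 0, 2}, "\u0000", "protected"),
            new ScanCell(new byte[] {0, 0, 0, 2}, "tom", ""),
            new ScanCell(new byte[] {0, 0, 0, 4}, "\u0000", "confidential"));

        System.out.println(labelOrdinals(cells));
    }
}
```

This also shows why the client user's restricted view matters: with only the 'system' row visible, the map would contain a single entry, so no ordinal could be found for any other label.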

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: Anoop John
    Sent: 08 June 2016 11:58
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    Thanks Ram. Yes, that seems the best way, as CellCreator is a publicly
    exposed class. Maybe we should explain this in the HBase book under
    the visibility labels section. Good to know you have a use case based
    on visibility labels. Let us know in case of any trouble. Thanks.

    -Anoop-

    On Wed, Jun 8, 2016 at 1:43 PM, ramkrishna vasudevan <
    ramkrishna.s.vasudevan@gmail.com> wrote:
    Hi

    It can be done. See the class CellCreator, which is a public-facing
    class. When you write the Spark job that produces the
    HFileOutputFormat2 data, you can use CellCreator to create your
    KeyValues and use CellCreator.getVisibilityExpressionResolver() to
    map your String visibility expressions to the system-generated
    ordinals.

    For example, you can see how TextSortReducer works. I think this
    should help you solve your problem. Let us know if you need
    further information.

    Regards
    Ram

    On Tue, Jun 7, 2016 at 3:58 PM, Ellis, Tom (Financial Markets IT)
    wrote:
    Hi Ram,

    We're attempting to do it programmatically so:

    The HFile is created by a Spark job using saveAsNewAPIHadoopFile,
    with ImmutableBytesWritable as the key (rowkey), KeyValue as the
    value, and HFileOutputFormat2 as the output format. This HFile is
    then loaded using the HBase client's LoadIncrementalHFiles.doBulkLoad.

    Is there a way to do this programmatically without using the
    ImportTsv tool? I was taking a look at
    VisibilityUtils.createVisibilityExpTags and maybe being able to
    just create the Tags myself that way (although it's obviously
    @InterfaceAudience.Private), but it seems that to use it I'd
    need to know the label ordinals client side.

    Thanks for your help,

    Tom

    -----Original Message-----
    From: ramkrishna vasudevan

    Sent: 07 June 2016 11:19
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    Hi Ellis

    How are the HFileOutputFormat2 files created? Are you using the
    ImportTsv tool? If so, there is a way to specify visibility tags
    while loading with ImportTsv, and those visibility tags are also
    bulk loaded into the HFile.

    There is an attribute CELL_VISIBILITY_COLUMN_SPEC that can be
    used to indicate that the data will have visibility tags, and the
    tool will automatically parse the specified field as the visibility tag.

    If you have access to the code, you can look at the test case
    TestImportTSVWithVisibilityLabels to get an initial idea of how
    it is done. If not, get back to us; happy to help.

    Regards
    Ram



    On Tue, Jun 7, 2016 at 3:36 PM, Ellis, Tom (Financial Markets IT)
    wrote:
    Hi,

    I was wondering if it's possible/how to write Visibility Labels
    to an HFileOutputFormat2? I believe Visibility Labels are just
    implemented as Tags, but with the normal way of writing them
    with Mutation#setCellVisibility these are formally written as
    Tags to the cells during the VisibilityController coprocessor
    as we need to assert the expression is valid for the labels configured.

    How can we add visibility labels to cells if we have a job that
    creates an HFile with HFileOutputFormat2 which is then
    subsequently loaded using LoadIncrementalHFiles?

    Cheers,

    Tom Ellis
  • Ellis, Tom (Financial Markets IT) at Jun 15, 2016 at 5:59 pm
    So, I can see that I can correctly get the List<Tag> from the VisibilityExpressionResolver, set the tags on the Cell, and write them using HFileOutputFormat2. However, when I scan using an unprivileged user I can still see the cells. If I write the cells with setCellVisibility, the unprivileged user can't see them.

    Then I noticed the fix for HBASE-15707. I am using Hortonworks' HBase 1.1.2. Am I affected by this, i.e. does HFileOutputFormat2 support tags before this fix?

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: Ellis, Tom (Financial Markets IT)
    Sent: 15 June 2016 17:42
    To: user@hbase.apache.org
    Subject: RE: Writing visibility labels with HFileOutputFormat2

    Looking at the source for how DefaultVisibilityLabelServiceImpl checks authorisation, I noticed that the user only needs the 'system' label auth, not admin/super user privileges as I thought you meant, Ram. So technically, I could have a client user that is given the 'system' label auth but only read access to the 'hbase:labels' table?

    Then that user will still be able to scan and read the labels + ordinal, and create the tags correctly :) I'll give it a go..

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: Ellis, Tom (Financial Markets IT)
    Sent: 15 June 2016 16:56
    To: user@hbase.apache.org
    Subject: RE: Writing visibility labels with HFileOutputFormat2

    I see now from some other examples I've found that using HFileOutputFormat2 to write Puts will use the PutSortReducer if you set the map output value class of the job to Put. Looking at the source for PutSortReducer, it seems it will actually lose the cell visibility information: it builds the KeyValue objects from getFamilyCellMap alone, whereas the CellVisibility is held on the Put mutation itself.

    So I think that unfortunately, I can only really work around this by giving the application user writing the HFile admin access so it can then use the VisibilityExpressionResolver to create cells with tags with the correct ordinals.

    Am I missing something? Why is it that a client user without admin/super user privileges can set a visibility expression using Put.setCellVisibility, but if we want to write using HFiles, the client user has to have admin/super user privileges so they can use VisibilityExpressionResolver to correctly create the tags on the Cell with correct ordinals?

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: Ellis, Tom (Financial Markets IT)
    Sent: 15 June 2016 16:25
    To: user@hbase.apache.org
    Subject: RE: Writing visibility labels with HFileOutputFormat2

    So I have a working prototype using just bulk puts on a table and using setCellVisibility as necessary. Now I'm trying to do it using HFile.

    Sorry Ram, I don't quite follow why the user doing the writing of the HFile has to be an admin/super user? Is that necessary to load HFiles?

    The use case is to have an application user (non-admin) perform the writes to an HBase table via a bulk load of an HFile, setting visibility labels on individual cells as necessary. Business users who have been given the auth to view a label can then see those cells, and others cannot.

    I've seen that it's possible to do this with map reduce & setting the map output to be a Put (and thus could setCellVisibility on the puts), but I'm struggling to do this with Spark, as I keep getting the exception that I can't cast a Put to a Cell.

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 15 June 2016 12:31
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    > We could I guess create multiple puts for cells in the same row with
    > different labels and use the setCellVisibility on each individual put/cell, but will this create additional overhead?

    This can be done. If you want different cells in the same row to have different labels, then it is better to create that many puts and call setCellVisibility on each of them. What type of overhead do you see here? In terms of the server processing them? If so, there should not be much overhead. Also, giving each column a different label in turn means you need every cell to be treated differently in terms of security, so it should be fine IMHO.

    Without put.setCellVisibility() there is no other way, I believe. One question regarding your use case: in the mail you mentioned a Spark job that creates a bulk-loaded file. If that file is to hold the visibility information of all the cells, then the user doing this job should be an admin or super user, right? Why would a normal client user read through all the visibility cells, which may or may not be associated with that user?

    Thank you very much for testing and using this feature. Let us know your feedback and if you find any gaps here. Happy to help.

    Regards
    Ram

    On Wed, Jun 15, 2016 at 4:09 PM, Ellis, Tom (Financial Markets IT) wrote:

    Hmm, is there no other way to set labels on individual cells where we
    don't have to give the client users system perms? For instance, client
    users can set the cell visibility on the entire put without having
    this (i.e. put.setCellVisibility("label")) and the
    VisibilityController will check this.

    We could I guess create multiple puts for cells in the same row with
    different labels and use the setCellVisibility on each individual
    put/cell, but will this create additional overhead?

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 15 June 2016 11:24
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    The visibility expression resolver scans the labels table, so the
    user running the resolver should have the SYSTEM privileges, since
    the information being accessed is sensitive.

    In your case above, if the client user is added as an admin, that
    user should be able to scan the labels table.

    Regards
    Ram

    On Wed, Jun 15, 2016 at 3:09 PM, Ellis, Tom (Financial Markets IT) <
    tom.ellis@lloydsbanking.com.invalid> wrote:
    Yeah, thanks for this Ram. Although in my testing I have found that
    a client user attempting to use the visibility expression resolver
    doesn't seem to have the ability to scan the hbase:labels table for
    the full list of labels and thus can't get the ordinals/tags to add
    to the cell. Does the client user attempting to use the
    VisibilityExpressionResolver have to have some special permissions?

    Scan of hbase:labels by client user:

    hbase(main):003:0> scan 'hbase:labels'
    ROW                 COLUMN+CELL
    \x00\x00\x00\x01    column=f:\x00, timestamp=1465216652662, value=system
    1 row(s) in 0.0650 seconds

    Scan of hbase:labels by hbase user:

    hbase(main):001:0> scan 'hbase:labels'
    ROW COLUMN+CELL
    \x00\x00\x00\x01 column=f:\x00,
    timestamp=1465216652662, value=system
    \x00\x00\x00\x02 column=f:\x00,
    timestamp=1465216944935, value=protected
    \x00\x00\x00\x02 column=f:hbase,
    timestamp=1465547138533, value=
    \x00\x00\x00\x02 column=f:tom,
    timestamp=1465980236882, value=
    \x00\x00\x00\x03 column=f:\x00,
    timestamp=1465500156667, value=testtesttest
    \x00\x00\x00\x03 column=f:@hadoop,
    timestamp=1465980236967, value=
    \x00\x00\x00\x03 column=f:hadoop,
    timestamp=1465547304610, value=
    \x00\x00\x00\x03 column=f:hive,
    timestamp=1465501322616, value=
    \x00\x00\x00\x04 column=f:\x00,
    timestamp=1465570719901, value=confidential
    \x00\x00\x00\x05 column=f:\x00,
    timestamp=1465835047835, value=branch
    \x00\x00\x00\x05 column=f:hdfs,
    timestamp=1465980237060, value=
    \x00\x00\x00\x06 column=f:\x00,
    timestamp=1465980447307, value=group
    \x00\x00\x00\x06 column=f:hdfs,
    timestamp=1465980454130, value=
    6 row(s) in 0.7370 seconds
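
    The difference between the two scans above can be illustrated with a
    small toy model. This is only a sketch and not HBase code: the class
    LabelOrdinalSketch and its methods are hypothetical names, and only
    simple "a&b" expressions are handled. It shows why a client that sees
    only the "system" row cannot turn a label such as "confidential" into
    an ordinal, while a user with the full view can.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT HBase code. It mimics the idea that a visibility
// expression can only be turned into tags if every label's ordinal is known.
class LabelOrdinalSketch {

    // Resolve each label of a simple "a&b&c" expression to its ordinal.
    // Real expressions also allow |, ! and parentheses; those are ignored here.
    static List<Integer> resolve(String expression, Map<String, Integer> visibleLabels) {
        List<Integer> ordinals = new ArrayList<>();
        for (String label : expression.split("&")) {
            Integer ordinal = visibleLabels.get(label.trim());
            if (ordinal == null) {
                throw new IllegalArgumentException("Unknown label: " + label.trim());
            }
            ordinals.add(ordinal);
        }
        return ordinals;
    }

    // Labels and ordinals as they appear in the privileged scan in this thread.
    static Map<String, Integer> privilegedView() {
        Map<String, Integer> labels = new LinkedHashMap<>();
        labels.put("system", 1);
        labels.put("protected", 2);
        labels.put("testtesttest", 3);
        labels.put("confidential", 4);
        labels.put("branch", 5);
        labels.put("group", 6);
        return labels;
    }

    // Labels as they appear in the unprivileged client scan: only "system".
    static Map<String, Integer> clientView() {
        Map<String, Integer> labels = new LinkedHashMap<>();
        labels.put("system", 1);
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(resolve("confidential&branch", privilegedView())); // [4, 5]
        try {
            resolve("confidential&branch", clientView());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Unknown label: confidential
        }
    }
}
```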

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: Anoop John
    Sent: 08 June 2016 11:58
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2



    Thanks Ram. Yes, that seems the best way, as CellCreator is a publicly
    exposed class. Maybe we should explain this in the HBase book under
    the visibility labels section. Good to know you have a visibility
    labels based use case. Let us know in case of any trouble. Thanks.

    -Anoop-

    On Wed, Jun 8, 2016 at 1:43 PM, ramkrishna vasudevan <
    ramkrishna.s.vasudevan@gmail.com> wrote:
    Hi

    It can be done. See the class CellCreator, which is a public-facing
    interface. In the Spark job that produces the HFileOutputFormat2
    data, you can use CellCreator to create your KeyValues, and use
    CellCreator.getVisibilityExpressionResolver() to map your String
    visibility tags to the system-generated ordinals.

    For example, see how TextSortReducer works. I think this should
    help you solve your problem. Let us know if you need further
    information.
    Regards
    Ram

    On Tue, Jun 7, 2016 at 3:58 PM, Ellis, Tom (Financial Markets IT)
    wrote:
    Hi Ram,

    We're attempting to do it programmatically:

    The HFile is created by a Spark job using saveAsNewAPIHadoopFile,
    with ImmutableBytesWritable as the key (rowkey), KeyValue as the
    value, and HFileOutputFormat2 as the output format. This HFile is
    then loaded using the HBase client's
    LoadIncrementalHFiles.doBulkLoad.

    Is there a way to do this programmatically without using the
    ImportTsv tool? I was taking a look at
    VisibilityUtils.createVisibilityExpTags and thought I might be able
    to create the Tags myself that way (although it's obviously
    @InterfaceAudience.Private), but it seems that to use it I'd need
    to know the label ordinals client side.
    Thanks for your help,

    Tom

    -----Original Message-----
    From: ramkrishna vasudevan

    Sent: 07 June 2016 11:19
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2



    Hi Ellis

    How are the HFileOutputFormat2 files created? Are you using the
    ImportTsv tool? If so, then yes, there is a way to specify visibility
    tags while loading with ImportTsv, and those visibility tags are also
    bulk loaded into the HFile.
    There is an attribute, CELL_VISIBILITY_COLUMN_SPEC, that indicates
    the data will have visibility tags; the tool then automatically
    parses the specified field as the visibility tag.

    If you have access to the code, see the test case
    TestImportTSVWithVisibilityLabels to get an initial idea of how
    it is done. If not, get back to us; happy to help.

    Regards
    Ram



    On Tue, Jun 7, 2016 at 3:36 PM, Ellis, Tom (Financial Markets IT)
    wrote:
    Hi,

    I was wondering if it's possible/how to write Visibility Labels
    to an HFileOutputFormat2? I believe Visibility Labels are just
    implemented as Tags, but with the normal way of writing them
    with Mutation#setCellVisibility these are formally written as
    Tags to the cells during the VisibilityController coprocessor
    as we need to assert the expression is valid for the labels configured.

    How can we add visibility labels to cells if we have a job that
    creates an HFile with HFileOutputFormat2 which is then
    subsequently loaded using LoadIncrementalHFiles?

    Cheers,

    Tom Ellis
  • Ramkrishna vasudevan at Jun 16, 2016 at 5:01 am
    Thanks for the updates here. Going through the mails:

    "Why is it that a client user without admin/super user privileges can
    set a visibility expression using Put.setCellVisibility, but if we want
    to write using HFiles, ..."

    I get your point now. There is a property,
    "hbase.security.visibility.mutations.checkauths", which, if set, checks
    whether the user is authorized for the visibility labels they are
    trying to write. If the user is not allowed to add a label, the
    mutation fails. Can you see if this solves the other problem of
    allowing any client user to write? If the above is not well documented,
    please feel free to raise a JIRA and we will be happy to address it.
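
    The rule described for this property can be sketched as a toy model.
    This is not HBase code: CheckAuthsSketch, mutationAllowed and the sample
    auth sets are hypothetical and only illustrate the behaviour stated
    above, namely that with the check enabled a write fails when it uses a
    label the writing user is not authorized for.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model only: NOT HBase code. It sketches the rule described above:
// with the check enabled, a write fails if it uses a label the user lacks.
class CheckAuthsSketch {

    // Returns true if the mutation would be accepted under this toy rule.
    static boolean mutationAllowed(boolean checkAuths,
                                   Set<String> userAuths,
                                   Set<String> labelsInExpression) {
        if (!checkAuths) {
            return true; // check disabled: no per-user label check in this model
        }
        return userAuths.containsAll(labelsInExpression);
    }

    public static void main(String[] args) {
        // Hypothetical user who holds only the "protected" auth.
        Set<String> userAuths = new HashSet<>(Arrays.asList("protected"));
        Set<String> wanted = new HashSet<>(Arrays.asList("protected", "confidential"));

        System.out.println(mutationAllowed(false, userAuths, wanted)); // true
        System.out.println(mutationAllowed(true, userAuths, wanted));  // false
    }
}
```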

    As for reading the HFile and creating a bulk load, I think we should be
    more cautious here. Some critical information is stored in the HFile, and
    allowing just any user to read it would be risky.

    As for the PutSortReducer problem, I think what you say is true. I'm not
    sure whether a bug has already been filed; if not, please feel free to
    raise one. We need to fix it.

    HBASE-15707: you may need this, because with Scala's HBaseContext you need
    to ensure tags are included, just in case ImportTsv has to be used.

    Write back if I have missed something or if my information was lacking.
    It has been quite some time since we worked in this area, so I have to
    look at the code every time to recall what was done.

    Regards
    Ram
    On Wed, Jun 15, 2016 at 11:29 PM, Ellis, Tom (Financial Markets IT) wrote:

    So, I can see that I can correctly get the List<Tag> from the
    VisibilityExpressionResolver, set the tags on the Cell, and write them using
    HFileOutputFormat2. However, when I scan using an unprivileged user I can
    still see the cells. If I write the cells with setCellVisibility, the
    unprivileged user can't see them.

    Then I noticed the fix for HBASE-15707. I am using Hortonworks' HBase
    1.1.2. Am I affected by this? Does HFileOutputFormat2 support tags before
    this fix?

    Cheers,

    Tom Ellis


    -----Original Message-----
    From: Ellis, Tom (Financial Markets IT) [mailto:
    tom.ellis@lloydsbanking.com.invalid]
    Sent: 15 June 2016 17:42
    To: user@hbase.apache.org
    Subject: RE: Writing visibility labels with HFileOutputFormat2



    Looking at the source for how DefaultCellLabelServiceImpl checks
    authorisation, I noted that the user only needs to have the 'system'
    label auth privileges, not admin/super user as I thought you meant, Ram.
    So technically, I could have a client user that is given the system
    label privileges, but only read access to the 'hbase:labels' table?

    That user would then still be able to scan and read the labels and
    ordinals, and create the tags correctly :) I'll give it a go.

    Cheers,

    Tom Ellis


    -----Original Message-----
    From: Ellis, Tom (Financial Markets IT) [mailto:
    tom.ellis@lloydsbanking.com.invalid]
    Sent: 15 June 2016 16:56
    To: user@hbase.apache.org
    Subject: RE: Writing visibility labels with HFileOutputFormat2



    I see now from some other examples I've found that this way of using
    HFileOutputFormat2 to write Puts uses the PutSortReducer if you set
    the map output class of the job you give it to Put. Looking at the
    source for PutSortReducer, it seems that it actually loses the cell
    visibility information: it uses getFamilyCellMap to create the KeyValue
    objects and uses only those, whereas the CellVisibility is held on the
    Put mutation itself.
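
    The loss described above can be pictured with a small toy model. This
    is only a sketch and not HBase code: ToyPut, ToyCell and flatten are
    hypothetical stand-ins. The point it illustrates is that when the
    visibility expression lives on the mutation and a reducer copies out
    only the cells, the resulting cells carry no visibility information.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT HBase code. ToyPut and ToyCell are hypothetical
// stand-ins for a mutation and its cells.
class PutFlattenSketch {

    static class ToyCell {
        final String qualifier;
        final String value;
        final List<String> tags = new ArrayList<>(); // per-cell tags, empty by default

        ToyCell(String qualifier, String value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    static class ToyPut {
        final List<ToyCell> cells = new ArrayList<>();
        String cellVisibility; // stored on the mutation, not on its cells
    }

    // Mimics a reducer that copies only the cells out of each put.
    static List<ToyCell> flatten(List<ToyPut> puts) {
        List<ToyCell> out = new ArrayList<>();
        for (ToyPut put : puts) {
            out.addAll(put.cells); // put.cellVisibility is never read
        }
        return out;
    }

    public static void main(String[] args) {
        ToyPut put = new ToyPut();
        put.cellVisibility = "confidential";
        put.cells.add(new ToyCell("q1", "v1"));

        List<ToyPut> puts = new ArrayList<>();
        puts.add(put);

        // The flattened cell has no tags, so the label is gone.
        System.out.println(flatten(puts).get(0).tags); // []
    }
}
```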

    So I think that unfortunately, I can only really work around this by
    giving the application user writing the HFile admin access so it can then
    use the VisibilityExpressionResolver to create cells with tags with the
    correct ordinals.

    Am I missing something? Why is it that a client user without admin/super
    user privileges can set a visibility expression using
    Put.setCellVisibility, but if we want to write using HFiles, the client
    user has to have admin/super user privileges so they can use
    VisibilityExpressionResolver to correctly create the tags on the Cell with
    correct ordinals?

    Cheers,

    Tom Ellis


    -----Original Message-----
    From: Ellis, Tom (Financial Markets IT) [mailto:
    tom.ellis@lloydsbanking.com.invalid]
    Sent: 15 June 2016 16:25
    To: user@hbase.apache.org
    Subject: RE: Writing visibility labels with HFileOutputFormat2



    So I have a working prototype using just bulk puts on a table and using
    setCellVisibility as necessary. Now I'm trying to do it using HFile.

    Sorry Ram, I don't quite follow why the user doing the writing of the
    HFile has to be an admin/super user? Is that necessary to load HFiles?

    The use case is to have an application user (non-admin) perform the
    writes to an HBase table via a bulk load of an HFile, setting
    visibility labels on individual cells as necessary. Business users who
    have been given the auth for a label can then see those cells, and
    others cannot.

    I've seen that it's possible to do this with map reduce & setting the map
    output to be a Put (and thus could setCellVisibility on the puts), but I'm
    struggling to do this with Spark, as I keep getting the exception that I
    can't cast a Put to a Cell.

    Cheers,

    Tom Ellis


    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 15 June 2016 12:31
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2


    "We could I guess create multiple puts for cells in the same row with
    different labels and use the setCellVisibility on each individual
    put/cell, but will this create additional overhead?"

    This can be done. If you want different cells in the same row to have
    different labels, then it is better to create that many puts and call
    setCellVisibility on each of them. What type of overhead do you see here?
    In terms of the server processing them? If so, there should not be much
    overhead. Also, adding different labels to every column in turn means you
    need every cell to be treated differently in terms of security, so this
    should be fine IMHO.
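
    The "one put per label" idea can be sketched as a grouping step. This
    is a toy model and not HBase code: PerLabelPutSketch, groupByLabel and
    the sample qualifiers are hypothetical. Each resulting group stands for
    one put on the same row, on which a single visibility expression would
    then be set.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT HBase code. It groups one row's cells by label so that
// each group could become its own put with a single visibility expression.
class PerLabelPutSketch {

    // Each input entry is {qualifier, label}; the result maps label -> qualifiers.
    static Map<String, List<String>> groupByLabel(String[][] cells) {
        Map<String, List<String>> puts = new LinkedHashMap<>();
        for (String[] cell : cells) {
            puts.computeIfAbsent(cell[1], k -> new ArrayList<>()).add(cell[0]);
        }
        return puts;
    }

    public static void main(String[] args) {
        String[][] cells = {
            {"name", "protected"},
            {"salary", "confidential"},
            {"email", "protected"}
        };
        // Two groups, so two puts would be created for this row.
        System.out.println(groupByLabel(cells));
        // {protected=[name, email], confidential=[salary]}
    }
}
```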

    Without put.setCellVisibility() there is no other way, I believe. One
    question regarding your use case: in your mail you described a Spark job
    that creates a bulk-loaded file. If that file is to hold the visibility
    information of all the cells, then the user doing this job should be an
    admin or super user, right? Why would a normal client user read through
    all the visibility cells, which may or may not be associated with that
    user?

    Thank you very much for testing and using this feature. Let us know your
    feedback and whether you find any gaps here. Happy to help.

    Regards
    Ram


    On Wed, Jun 15, 2016 at 4:09 PM, Ellis, Tom (Financial Markets IT) <
    tom.ellis@lloydsbanking.com.invalid> wrote:
    Hmm, is there no other way to set labels on individual cells where we
    don't have to give the client users system perms? For instance, client
    users can set the cell visibility on the entire put without having
    this (i.e. put.setCellVisibility("label")) and the
    VisibilityController will check this.

    We could I guess create multiple puts for cells in the same row with
    different labels and use the setCellVisibility on each individual
    put/cell, but will this create additional overhead?

    Cheers,

    Tom Ellis


    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 15 June 2016 11:24
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    -- This email has reached the Bank via an external source --


    The visibility expression resolver tries to scan the labels table and
    the user using the resolver should have the SYSTEM privileges. Since
    the information that is getting accessed is sensitive information.

    Suppose in your above case you have the client user added as a an
    admin then when you scan the label table you should be able to scan it.

    Regards
    Ram

    On Wed, Jun 15, 2016 at 3:09 PM, Ellis, Tom (Financial Markets IT) <
    tom.ellis@lloydsbanking.com.invalid> wrote:
    Yeah, thanks for this Ram. Although in my testing I have found that
    a client user attempting to use the visibility expression resolver
    doesn't seem to have the ability to scan the hbase:labels table for
    the full list of labels and thus can't get the ordinals/tags to add
    to the cell. Does the client user attempting to use the
    VisibilityExpressionResolver have to have some special permissions?

    Scan of hbase:labels by client user:

    hbase(main):003:0> scan 'hbase:labels'
    ROW COLUMN+CELL
    \x00\x00\x00\x01 column=f:\x00,
    timestamp=1465216652662, value=system
    1 row(s) in 0.0650 seconds

    Scan of hbase:labels by hbase user:

    hbase(main):001:0> scan 'hbase:labels'
    ROW COLUMN+CELL
    \x00\x00\x00\x01 column=f:\x00,
    timestamp=1465216652662, value=system
    \x00\x00\x00\x02 column=f:\x00,
    timestamp=1465216944935, value=protected
    \x00\x00\x00\x02 column=f:hbase,
    timestamp=1465547138533, value=
    \x00\x00\x00\x02 column=f:tom,
    timestamp=1465980236882, value=
    \x00\x00\x00\x03 column=f:\x00,
    timestamp=1465500156667, value=testtesttest
    \x00\x00\x00\x03 column=f:@hadoop,
    timestamp=1465980236967, value=
    \x00\x00\x00\x03 column=f:hadoop,
    timestamp=1465547304610, value=
    \x00\x00\x00\x03 column=f:hive,
    timestamp=1465501322616, value=
    \x00\x00\x00\x04 column=f:\x00,
    timestamp=1465570719901, value=confidential
    \x00\x00\x00\x05 column=f:\x00,
    timestamp=1465835047835, value=branch
    \x00\x00\x00\x05 column=f:hdfs,
    timestamp=1465980237060, value=
    \x00\x00\x00\x06 column=f:\x00,
    timestamp=1465980447307, value=group
    \x00\x00\x00\x06 column=f:hdfs,
    timestamp=1465980454130, value=
    6 row(s) in 0.7370 seconds

    Cheers,

    Tom Ellis
    Consultant Developer – Excelian
    Data Lake | Financial Markets IT
    LLOYDS BANK COMMERCIAL BANKING


    E: tom.ellis@lloydsbanking.com
    Website: www.lloydsbankcommercial.com , , , Reduce printing. Lloyds
    Banking Group is helping to build the low carbon economy.
    Corporate Responsibility Report:
    www.lloydsbankinggroup-cr.com/downloads

    -----Original Message-----
    From: Anoop John
    Sent: 08 June 2016 11:58
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    -- This email has reached the Bank via an external source --


    Thanks Ram.. Ya that seems the best way as CellCreator is public
    exposed class. May be we should explain abt this in hbase book under
    the Visibility labels area. Good to know you have Visibility labels
    based usecase. Let us know in case of any trouble. Thanks.

    -Anoop-

    On Wed, Jun 8, 2016 at 1:43 PM, ramkrishna vasudevan <
    ramkrishna.s.vasudevan@gmail.com> wrote:
    Hi

    It can be done. See the class CellCreator which is Public facing
    interface.
    When you create your spark job to create the hadoop files that
    produces the
    HFileOutputformat2 data. While creating the KeyValues you can use
    the CellCreator to create your KeyValues and use the
    CellCreator.getVisibilityExpressionResolver() to map your String
    Visibility tags with the system generated ordinals.

    For eg, you can see how TextSortReducer works. I think this
    should help you solve your problem. Let us know if you need
    further
    information.
    Regards
    Ram

    On Tue, Jun 7, 2016 at 3:58 PM, Ellis, Tom (Financial Markets IT)
    wrote:
    Hi Ram,

    We're attempting to do it programmatically:

    The HFile is created by a Spark job using saveAsNewAPIHadoopFile,
    with ImmutableBytesWritable as the key (rowkey), KeyValue as the
    value, and HFileOutputFormat2 as the output format. This HFile is
    then loaded using the HBase client's
    LoadIncrementalHFiles.doBulkLoad.

    Is there a way to do this programmatically without using the
    ImportTsv tool? I was taking a look at
    VisibilityUtils.createVisibilityExpTags and thought I might be able
    to create the Tags myself that way (although it's obviously
    @InterfaceAudience.Private), but it seems that to use it I'd need
    to know the label ordinals client side.

    Thanks for your help,

    Tom

    -----Original Message-----
    From: ramkrishna vasudevan

    Sent: 07 June 2016 11:19
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    Hi Ellis

    How are the HFileOutputFormat2 files created? Are you using the
    ImportTsv tool? If so, there is a way to specify visibility tags
    while loading with ImportTsv, and those visibility tags are also
    bulk loaded into the HFile.
    There is an attribute, CELL_VISIBILITY_COLUMN_SPEC, that indicates
    the data will have visibility tags; the tool will then
    automatically parse the specified field as the visibility tag.

    If you have access to the code, see the test case
    TestImportTSVWithVisibilityLabels to get an initial idea of how
    it is done. If not, get back to us; happy to help.

    Regards
    Ram



    On Tue, Jun 7, 2016 at 3:36 PM, Ellis, Tom (Financial Markets IT)
    wrote:
    Hi,

    I was wondering if it's possible/how to write Visibility Labels
    to an HFileOutputFormat2? I believe Visibility Labels are just
    implemented as Tags, but with the normal way of writing them
    with Mutation#setCellVisibility these are formally written as
    Tags to the cells during the VisibilityController coprocessor
    as we need to assert the expression is valid for the labels
    configured.
    How can we add visibility labels to cells if we have a job that
    creates an HFile with HFileOutputFormat2 which is then
    subsequently loaded using LoadIncrementalHFiles?

    Cheers,

    Tom Ellis
  • Ellis, Tom (Financial Markets IT) at Jun 16, 2016 at 3:42 pm
    Hi Again Ram,

    "hbase.security.visibility.mutations.checkauths" - for now the method of set_auths 'client','system' along with only giving 'client' read on 'hbase:labels' is working for me.

    "Coming to reading the HFile and creating a bulk load, I think we should be more cautious here " - I don't follow again sorry. The spark user writes the HFile, and then initiates the load with LoadIncrementalHFiles.doBulkLoad - so long as only the HBase user and the spark user can read/write to the file, I'm not sure what the risk is?

    HBASE-15707 - am I able to read the HFile manually to determine if Tags have been written properly?

    Cheers,

    Tom


    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 16 June 2016 06:01
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    Thanks for the updates here. Going through the mails:

    "Why is it that a client user without admin/super user privileges can
    set a visibility expression using Put.setCellVisibility, but if we want
    to write using HFiles, ..."

    I get your point now. There is a property, "hbase.security.visibility.mutations.checkauths", which, if set, checks whether the user is authorized for the visibility labels they are trying to write. If the user is not allowed to add that label, the mutation fails.
    Can you see if this solves the other problem of allowing any client user to write? If the above is not well documented, please feel free to raise a JIRA and we will be happy to address it.
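
As a rough illustration of the property mentioned above, an hbase-site.xml fragment might look like the following. The property name comes from the message; the value `true` and the placement in the server-side hbase-site.xml are assumptions to check against the documentation for your HBase version.

```xml
<!-- Sketch only: asks HBase to check that the writing user holds the labels
     used in a mutation's visibility expression. -->
<property>
  <name>hbase.security.visibility.mutations.checkauths</name>
  <value>true</value>
</property>
```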

    Coming to reading the HFile and creating a bulk load, I think we should be more cautious here. There is some critical information stored in the HFile, and allowing just any user to read it is risky.

    Coming to the PutSortReducer problem, I think what you say is true. I'm not sure whether a bug has been filed already; if not, please feel free to raise one.
    We need to fix it.

    HBASE-15707 - you may need this because for Scala's HBaseContext you need to ensure tags are included, just in case ImportTsv has to be used.

    Write back if I have missed something or if my info was lacking. It's been quite some time since we worked in this area, so we have to look at the code every time to recall what was done.

    Regards
    Ram
    On Wed, Jun 15, 2016 at 11:29 PM, Ellis, Tom (Financial Markets IT) wrote:

    So, I can see that I can correctly get the List<Tag> from the
    VisibilityExpressionResolver, set the tags on the Cell, and write them
    using HFileOutputFormat2. However, when I scan using an unprivileged
    user I can still see the cells. If I write the cells with
    setCellVisibility, the unprivileged user can't see them.

    Then I noticed the fix for HBASE-15707. I am using Hortonworks'
    HBase 1.1.2 - am I affected by this, and does HFileOutputFormat2
    support tags before this fix?

    Cheers,

    Tom Ellis


    -----Original Message-----
    From: Ellis, Tom (Financial Markets IT) [mailto:
    tom.ellis@lloydsbanking.com.invalid]
    Sent: 15 June 2016 17:42
    To: user@hbase.apache.org
    Subject: RE: Writing visibility labels with HFileOutputFormat2

    Looking at the source for how DefaultCellLabelServiceImpl checks
    authorisation, I noted that the user just needs to have the
    'system' label auth - not admin/super user privileges, as I thought
    you meant, Ram. So technically, I could have a client user that is
    given the system label auth but only read access to the
    'hbase:labels' table?

    That user will then still be able to scan and read the labels and
    ordinals, and create the tags correctly :) I'll give it a go.

    Cheers,

    Tom Ellis


    -----Original Message-----
    From: Ellis, Tom (Financial Markets IT) [mailto:
    tom.ellis@lloydsbanking.com.invalid]
    Sent: 15 June 2016 16:56
    To: user@hbase.apache.org
    Subject: RE: Writing visibility labels with HFileOutputFormat2

    I see now from some other examples I've found that this form
    of using HFileOutputFormat2 to write Puts will use the PutSortReducer
    if you set the map output class of the job to Put. Looking
    at the source for PutSortReducer, it seems that it will lose
    the cell visibility information: it uses getFamilyCellMap to
    create the KeyValue objects and uses only those, whereas the
    CellVisibility is set on the Put mutation itself.
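
The loss described above can be pictured with a small toy model. This is only a sketch: `ToyCell` and `ToyPut` are hypothetical stand-ins rather than HBase classes, and the tag here is a plain string, whereas real visibility tags encode label ordinals. It shows that a reducer which emits only the cells held by a mutation never sees an expression stored on the mutation itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: ToyCell and ToyPut are hypothetical stand-ins, not HBase classes.
class VisibilityLossSketch {

    // A cell carries its own tags; in this model visibility is read from them.
    static final class ToyCell {
        final String qualifier;
        final String value;
        final List<String> tags;

        ToyCell(String qualifier, String value, List<String> tags) {
            this.qualifier = qualifier;
            this.value = value;
            this.tags = tags;
        }
    }

    // The visibility expression is held on the mutation, not on its cells.
    static final class ToyPut {
        final List<ToyCell> cells = new ArrayList<>();
        String cellVisibility;

        void add(String qualifier, String value) {
            cells.add(new ToyCell(qualifier, value, Collections.<String>emptyList()));
        }
    }

    // Mirrors a reducer that emits only the cells held by the Put.
    static List<ToyCell> emitCellsOnly(ToyPut put) {
        return new ArrayList<>(put.cells);
    }

    // What a tag-preserving path would have to do: copy the expression onto each cell.
    static List<ToyCell> emitWithVisibility(ToyPut put) {
        List<ToyCell> out = new ArrayList<>();
        for (ToyCell c : put.cells) {
            List<String> tags = put.cellVisibility == null
                    ? c.tags
                    : Arrays.asList("visibility:" + put.cellVisibility);
            out.add(new ToyCell(c.qualifier, c.value, tags));
        }
        return out;
    }

    public static void main(String[] args) {
        ToyPut put = new ToyPut();
        put.add("q1", "v1");
        put.cellVisibility = "confidential";

        System.out.println(emitCellsOnly(put).get(0).tags);      // no tags survive
        System.out.println(emitWithVisibility(put).get(0).tags); // expression carried along
    }
}
```

In the real API the second path would need the label ordinals, which is why the VisibilityExpressionResolver comes up in this thread.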

    So I think that unfortunately, I can only really work around this by
    giving the application user writing the HFile admin access so it can
    then use the VisibilityExpressionResolver to create cells with tags
    with the correct ordinals.

    Am I missing something? Why is it that a client user without
    admin/super user privileges can set a visibility expression using
    Put.setCellVisibility, but if we want to write using HFiles, the
    client user has to have admin/super user privileges so they can use
    VisibilityExpressionResolver to correctly create the tags on the Cell
    with correct ordinals?

    Cheers,

    Tom Ellis


    -----Original Message-----
    From: Ellis, Tom (Financial Markets IT) [mailto:
    tom.ellis@lloydsbanking.com.invalid]
    Sent: 15 June 2016 16:25
    To: user@hbase.apache.org
    Subject: RE: Writing visibility labels with HFileOutputFormat2

    So I have a working prototype using just bulk puts on a table and
    using setCellVisibility as necessary. Now I'm trying to do it using HFile.

    Sorry Ram, I don't quite follow why the user doing the writing of the
    HFile has to be an admin/super user? Is that necessary to load HFiles?

    The use case is to have an application user (non-admin)
    perform the writes to an HBase table via a bulk load of an HFile,
    setting visibility labels on individual cells as necessary. Business
    users who have been given the auth to view that label can then see
    those cells, and others cannot.

    I've seen that it's possible to do this with map reduce & setting the
    map output to be a Put (and thus could setCellVisibility on the puts),
    but I'm struggling to do this with Spark, as I keep getting the
    exception that I can't cast a Put to a Cell.

    Cheers,

    Tom Ellis


    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 15 June 2016 12:31
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    "We could I guess create multiple puts for cells in the same row with
    different labels and use the setCellVisibility on each individual
    put/cell, but will this create additional overhead?"

    This can be done. If you want different cells in the same row to have
    different labels, then it is better to create that many puts and call
    setCellVisibility on each of them. What type of overhead do you see
    here? In terms of the server processing them? If so, there should not
    be much overhead. Also, attaching different labels to every column in
    turn means you need every cell to be treated differently in terms of
    security, so it should be fine IMHO.

    Without calling put.setCellVisibility() there is no other way, I
    believe. One question regarding your use case: in the mail you
    mentioned the Spark job in which you will create a bulk-loaded file.
    If that file is to have all the visibility-related information of all
    the cells, then the user running this job should be an admin or super
    user, right? Why would a normal client user read through all the
    visibility cells, which may or may not be associated with that user?

    Thank you very much for testing and using this feature. Let us know
    your feedback and if you find any gaps here. Happy to help.

    Regards
    Ram


    On Wed, Jun 15, 2016 at 4:09 PM, Ellis, Tom (Financial Markets IT) <
    tom.ellis@lloydsbanking.com.invalid> wrote:
    Hmm, is there no other way to set labels on individual cells where
    we don't have to give the client users system perms? For instance,
    client users can set the cell visibility on the entire put without
    having this (i.e. put.setCellVisibility("label")) and the
    VisibilityController will check this.

    We could I guess create multiple puts for cells in the same row with
    different labels and use the setCellVisibility on each individual
    put/cell, but will this create additional overhead?

    Cheers,

    Tom Ellis


    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 15 June 2016 11:24
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    The visibility expression resolver tries to scan the labels table,
    and the user using the resolver should have SYSTEM privileges,
    since the information being accessed is sensitive.

    In your case above, if the client user is added as an admin, then
    that user should be able to scan the labels table.

    Regards
    Ram

    On Wed, Jun 15, 2016 at 3:09 PM, Ellis, Tom (Financial Markets IT) <
    tom.ellis@lloydsbanking.com.invalid> wrote:
    Yeah, thanks for this Ram. Although in my testing I have found
    that a client user attempting to use the visibility expression
    resolver doesn't seem to have the ability to scan the hbase:labels
    table for the full list of labels and thus can't get the
    ordinals/tags to add to the cell. Does the client user attempting
    to use the VisibilityExpressionResolver have to have some special permissions?

    Scan of hbase:labels by client user:

    hbase(main):003:0> scan 'hbase:labels'
    ROW                 COLUMN+CELL
    \x00\x00\x00\x01    column=f:\x00, timestamp=1465216652662, value=system
    1 row(s) in 0.0650 seconds

    Scan of hbase:labels by hbase user:

    hbase(main):001:0> scan 'hbase:labels'
    ROW                 COLUMN+CELL
    \x00\x00\x00\x01    column=f:\x00, timestamp=1465216652662, value=system
    \x00\x00\x00\x02    column=f:\x00, timestamp=1465216944935, value=protected
    \x00\x00\x00\x02    column=f:hbase, timestamp=1465547138533, value=
    \x00\x00\x00\x02    column=f:tom, timestamp=1465980236882, value=
    \x00\x00\x00\x03    column=f:\x00, timestamp=1465500156667, value=testtesttest
    \x00\x00\x00\x03    column=f:@hadoop, timestamp=1465980236967, value=
    \x00\x00\x00\x03    column=f:hadoop, timestamp=1465547304610, value=
    \x00\x00\x00\x03    column=f:hive, timestamp=1465501322616, value=
    \x00\x00\x00\x04    column=f:\x00, timestamp=1465570719901, value=confidential
    \x00\x00\x00\x05    column=f:\x00, timestamp=1465835047835, value=branch
    \x00\x00\x00\x05    column=f:hdfs, timestamp=1465980237060, value=
    \x00\x00\x00\x06    column=f:\x00, timestamp=1465980447307, value=group
    \x00\x00\x00\x06    column=f:hdfs, timestamp=1465980454130, value=
    6 row(s) in 0.7370 seconds
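
To make the scan output above easier to read: each row key looks like a 4-byte ordinal, and the value under column f:\x00 looks like the label name. The sketch below decodes those row keys with plain Java. It does not talk to HBase; the class name `LabelOrdinals` is hypothetical, the data is copied from the scan, and the big-endian reading of the row key is an assumption based on how the keys are printed.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: decodes rows shaped like the hbase:labels scan output.
class LabelOrdinals {

    // A row key such as \x00\x00\x00\x02 is read here as a 4-byte big-endian int.
    static int decodeOrdinal(byte[] rowKey) {
        if (rowKey.length != 4) {
            throw new IllegalArgumentException("expected a 4-byte row key");
        }
        return ByteBuffer.wrap(rowKey).getInt();
    }

    // Builds label -> ordinal from (row key, value of column f:\x00) pairs.
    static Map<String, Integer> labelToOrdinal(byte[][] rowKeys, String[] labels) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < rowKeys.length; i++) {
            result.put(labels[i], decodeOrdinal(rowKeys[i]));
        }
        return result;
    }

    public static void main(String[] args) {
        // Row keys and label names copied from the scan run as the hbase user.
        byte[][] rowKeys = {
            {0, 0, 0, 1}, {0, 0, 0, 2}, {0, 0, 0, 3},
            {0, 0, 0, 4}, {0, 0, 0, 5}, {0, 0, 0, 6}
        };
        String[] labels = {
            "system", "protected", "testtesttest", "confidential", "branch", "group"
        };
        System.out.println(labelToOrdinal(rowKeys, labels));
    }
}
```

The client user's scan returns only the `system` row, so a map built from it would lack the ordinals for `protected`, `confidential` and the rest, which matches the problem reported in the message.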

    Cheers,

    Tom Ellis

  • Ramkrishna vasudevan at Jun 17, 2016 at 6:37 am
    > so long as only the HBase user and the spark user can read/write to the
    > file, I'm not sure what the risk is?

    I was speaking more about the sensitivity of the data that was written.
    Say there are the following users:
    Admin
    Manager
    Worker1
    Worker2

    and the following labels:
    CONFIDENTIAL, SECRET, PUBLIC, WORKER_1_INFO, WORKER_2_INFO

    Suppose the Manager has associated Worker1 with WORKER_1_INFO and Worker2
    with WORKER_2_INFO. When Worker1 tries to read his information, he
    should set WORKER_1_INFO in his scan.

    In a bulk load scenario the entire file is read, so the user doing the
    bulk load in this example should not be Worker1 or Worker2. It should be
    either the Admin or the Manager.

    If in your case the spark user and the hbase user are the equivalent of
    Admin or Manager (as in my example), then it is perfectly fine.
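
    The distinction Ram draws here can be illustrated with a toy model. The
    sketch below is not HBase code: the class LabelVisibilityDemo, the sample
    values, and the one-label-per-cell simplification are all assumptions made
    for illustration. It contrasts a label-filtered scan with reading the file
    contents directly, where no label check applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Toy model only, not HBase code. Each cell carries a single required label;
// real visibility expressions can combine labels with &, | and !.
final class LabelVisibilityDemo {

    static final class LabeledCell {
        final String value;
        final String label;

        LabeledCell(String value, String label) {
            this.value = value;
            this.label = label;
        }
    }

    // Hypothetical sample data for the Worker1/Worker2 example.
    static List<LabeledCell> sampleCells() {
        return Arrays.asList(
            new LabeledCell("worker1-record", "WORKER_1_INFO"),
            new LabeledCell("worker2-record", "WORKER_2_INFO"),
            new LabeledCell("notice", "PUBLIC"));
    }

    // A scan returns only the cells whose label is among the caller's authorizations.
    static List<String> scan(List<LabeledCell> cells, Set<String> auths) {
        List<String> visible = new ArrayList<>();
        for (LabeledCell cell : cells) {
            if (auths.contains(cell.label)) {
                visible.add(cell.value);
            }
        }
        return visible;
    }

    // Reading the file directly applies no label check at all.
    static List<String> readRawFile(List<LabeledCell> cells) {
        List<String> all = new ArrayList<>();
        for (LabeledCell cell : cells) {
            all.add(cell.value);
        }
        return all;
    }

    public static void main(String[] args) {
        List<LabeledCell> cells = sampleCells();
        Set<String> worker1Auths = Collections.singleton("WORKER_1_INFO");
        System.out.println(scan(cells, worker1Auths)); // [worker1-record]
        System.out.println(readRawFile(cells));        // [worker1-record, worker2-record, notice]
    }
}
```

    With the real client, the scan-side half corresponds to calling
    Scan.setAuthorizations(new Authorizations("WORKER_1_INFO")). Whoever can
    read the file itself sees every cell, which is why the user running the
    bulk load should be a trusted one.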
    > am I able to read the HFile manually to determine if Tags have been
    > written properly?

    HBASE-15707 is a case where the tags were not being written while
    creating the file. You may need that fix when you are adding tags
    directly. In your case they are visibility tags, which you are not
    supposed to add directly other than through setCellVisibility(), but
    it is better to have that fix in your branch as well.

    > "hbase.security.visibility.mutations.checkauths" - for now the method of
    > set_auths 'client','system' along with only giving 'client' read on
    > 'hbase:labels' is working for me.

    Fine. I have some doubts here about how SYSTEM tags are
    implemented. Will get back on this.

    Regards
    Ram
    On Thu, Jun 16, 2016 at 9:11 PM, Ellis, Tom (Financial Markets IT) wrote:

    Hi Again Ram,

    "hbase.security.visibility.mutations.checkauths" - for now the method of
    set_auths 'client','system' along with only giving 'client' read on
    'hbase:labels' is working for me.

    "Coming to reading the HFile and creating a bulk load, I think we should
    be more cautious here " - I don't follow again sorry. The spark user writes
    the HFile, and then initiates the load with
    LoadIncrementalHFiles.doBulkLoad - so long as only the HBase user and the
    spark user can read/write to the file, I'm not sure what the risk is?

    HBASE-15707 - am I able to read the HFile manually to determine if Tags
    have been written properly?

    Cheers,

    Tom


    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 16 June 2016 06:01
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2


    Thanks for the updates here. Going through the mails:

    > Why is it that a client user without admin/super user privileges can
    > set a visibility expression using Put.setCellVisibility, but if we want to
    > write using HFiles,

    I get your point now. There is a property,
    "hbase.security.visibility.mutations.checkauths", which, if set, checks whether
    the user is authorized for the visibility labels that he is trying to
    write. If the user is not allowed to add that label, the mutation will fail.
    Can you see if this solves the other problem of allowing any client user
    to write? If the above is not well documented, please feel free to raise a JIRA
    and we are happy to address it.

    Coming to reading the HFile and creating a bulk load, I think we should be
    more cautious here. There is some critical info stored in the HFile, and
    just allowing any user to read it is going to be risky.

    Coming to the PutSortReducer problem, I think what you say is true. Not
    sure if there is a bug filed already; if not, please feel free to raise one.
    We need to fix it.

    HBASE-15707 - you may need this because for Scala's HBaseContext you need
    to ensure tags are included, just in case ImportTsv has to be used.

    Write back if I have missed something or if my info was lacking. It's been
    quite some time since we worked in this area, so I have to look at the code
    every time to recall what was done.

    Regards
    Ram

    On Wed, Jun 15, 2016 at 11:29 PM, Ellis, Tom (Financial Markets IT) <
    tom.ellis@lloydsbanking.com.invalid> wrote:
    So, I can see that I can correctly get the List<Tag>s from the
    VisibilityExpressionResolver, set them on the Cell, and write them
    using HFileOutputFormat2, however when I scan using an unprivileged
    user I can still see the cells. If I write the cells with
    setCellVisibility the unprivileged user can't see them.

    Then I noticed the fix for HBASE-15707. I am using Hortonworks' HBase
    1.1.2 - am I affected by this, and does HFileOutputFormat2 support tags
    before this fix?

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: Ellis, Tom (Financial Markets IT) [mailto:
    tom.ellis@lloydsbanking.com.invalid]
    Sent: 15 June 2016 17:42
    To: user@hbase.apache.org
    Subject: RE: Writing visibility labels with HFileOutputFormat2


    Looking at the source for how DefaultCellLabelServiceImpl checks
    authorisation, I noted that the user just needs to have the
    'system' label auth privileges, not admin/super user as I thought you
    meant, Ram. So technically, I could have a client user that is given
    the system label privileges, but only read access to the 'hbase:labels' table?
    Then that user will still be able to scan and read the labels +
    ordinal, and create the tags correctly :) I'll give it a go..

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: Ellis, Tom (Financial Markets IT) [mailto:
    tom.ellis@lloydsbanking.com.invalid]
    Sent: 15 June 2016 16:56
    To: user@hbase.apache.org
    Subject: RE: Writing visibility labels with HFileOutputFormat2


    I see now from some other examples I've found that this form
    of using HFileOutputFormat2 to write Puts will use the PutSortReducer
    if you set the map output class of the job you give it to Put. Looking
    at the source for PutSortReducer, it seems that it will actually lose
    the Cell Visibility information, as it uses getFamilyCellMap to
    create KeyValue objects and just uses those, while the CellVisibility is
    actually on the Put Mutation.

    So I think that unfortunately, I can only really work around this by
    giving the application user writing the HFile admin access so it can
    then use the VisibilityExpressionResolver to create cells with tags
    with the correct ordinals.

    Am I missing something? Why is it that a client user without
    admin/super user privileges can set a visibility expression using
    Put.setCellVisibility, but if we want to write using HFiles, the
    client user has to have admin/super user privileges so they can use
    VisibilityExpressionResolver to correctly create the tags on the Cell
    with correct ordinals?

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: Ellis, Tom (Financial Markets IT) [mailto:
    tom.ellis@lloydsbanking.com.invalid]
    Sent: 15 June 2016 16:25
    To: user@hbase.apache.org
    Subject: RE: Writing visibility labels with HFileOutputFormat2


    So I have a working prototype using just bulk puts on a table and
    using setCellVisibility as necessary. Now I'm trying to do it using HFile.
    Sorry Ram, I don't quite follow why the user doing the writing of the
    HFile has to be an admin/super user? Is that necessary to load HFiles?

    The use case is to hopefully have an application user (non admin)
    performing the writes to an hbase table via a bulk load of an hfile,
    setting visibility labels on individual cells as necessary. Then
    business users who have been given the auth to view that label can see
    those cells, and others cannot.

    I've seen that it's possible to do this with map reduce & setting the
    map output to be a Put (and thus could setCellVisibility on the puts),
    but I'm struggling to do this with Spark, as I keep getting the
    exception that I can't cast a Put to a Cell.

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 15 June 2016 12:31
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2

    > We could I guess create multiple puts for cells in the same row with
    > different labels and use the setCellVisibility on each individual
    > put/cell, but will this create additional overhead?

    This can be done. If you want different cells in the same row to have
    different labels, then it is better to create that many puts and call
    setCellVisibility on each of them. What type of overhead do you see here?
    In terms of the server processing them? If so, there should not be much
    overhead, and putting a different label on every column in turn
    means you need every cell to be treated differently in terms of
    security, so it should be fine IMHO.

    Without doing put.setCellVisibility() there is no other way, I believe.
    One question regarding your use case: in the mail you mentioned
    a Spark job that creates a bulk-load file. If that file is to
    hold the visibility-related information of all the cells, then
    the user doing this job should be an admin or super user,
    right? Why would a normal client user read through all
    the visibility cells, which may or may not be associated with that user?

    Thank you very much for testing and using this feature. Let us know
    your feedback and if you find any gaps here. Happy to help.

    Regards
    Ram


    On Wed, Jun 15, 2016 at 4:09 PM, Ellis, Tom (Financial Markets IT) <
    tom.ellis@lloydsbanking.com.invalid> wrote:
    Hmm, is there no other way to set labels on individual cells where
    we don't have to give the client users system perms? For instance,
    client users can set the cell visibility on the entire put without
    having this (i.e. put.setCellVisibility("label")) and the
    VisibilityController will check this.

    We could I guess create multiple puts for cells in the same row with
    different labels and use the setCellVisibility on each individual
    put/cell, but will this create additional overhead?

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: ramkrishna vasudevan
    Sent: 15 June 2016 11:24
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2


    The visibility expression resolver tries to scan the labels table,
    and the user using the resolver should have the SYSTEM privileges,
    since the information being accessed is sensitive.
    Suppose in your above case you have the client user added as an
    admin; then when you scan the labels table you should be able to
    scan it.
    Regards
    Ram

    On Wed, Jun 15, 2016 at 3:09 PM, Ellis, Tom (Financial Markets IT) <
    tom.ellis@lloydsbanking.com.invalid> wrote:
    Yeah, thanks for this Ram. Although in my testing I have found
    that a client user attempting to use the visibility expression
    resolver doesn't seem to have the ability to scan the hbase:labels
    table for the full list of labels and thus can't get the
    ordinals/tags to add to the cell. Does the client user attempting
    to use the VisibilityExpressionResolver have to have some special
    permissions?
    Scan of hbase:labels by client user:

    hbase(main):003:0> scan 'hbase:labels'
    ROW                COLUMN+CELL
     \x00\x00\x00\x01  column=f:\x00, timestamp=1465216652662, value=system
    1 row(s) in 0.0650 seconds

    Scan of hbase:labels by hbase user:

    hbase(main):001:0> scan 'hbase:labels'
    ROW                COLUMN+CELL
     \x00\x00\x00\x01  column=f:\x00, timestamp=1465216652662, value=system
     \x00\x00\x00\x02  column=f:\x00, timestamp=1465216944935, value=protected
     \x00\x00\x00\x02  column=f:hbase, timestamp=1465547138533, value=
     \x00\x00\x00\x02  column=f:tom, timestamp=1465980236882, value=
     \x00\x00\x00\x03  column=f:\x00, timestamp=1465500156667, value=testtesttest
     \x00\x00\x00\x03  column=f:@hadoop, timestamp=1465980236967, value=
     \x00\x00\x00\x03  column=f:hadoop, timestamp=1465547304610, value=
     \x00\x00\x00\x03  column=f:hive, timestamp=1465501322616, value=
     \x00\x00\x00\x04  column=f:\x00, timestamp=1465570719901, value=confidential
     \x00\x00\x00\x05  column=f:\x00, timestamp=1465835047835, value=branch
     \x00\x00\x00\x05  column=f:hdfs, timestamp=1465980237060, value=
     \x00\x00\x00\x06  column=f:\x00, timestamp=1465980447307, value=group
     \x00\x00\x00\x06  column=f:hdfs, timestamp=1465980454130, value=
    6 row(s) in 0.7370 seconds
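
    The difference between the two scans matters because resolving a
    visibility expression means looking up each label's ordinal. The sketch
    below is a toy model under stated assumptions, not HBase's resolver or
    tag encoding: the class LabelOrdinalDemo and its methods are hypothetical,
    it handles AND-only expressions, and the ordinals are copied from the scan
    output above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: it mimics a label-to-ordinal lookup, not HBase's tag encoding.
final class LabelOrdinalDemo {

    // Labels table as seen by the hbase user in the scan above.
    static Map<String, Integer> fullLabelsTable() {
        Map<String, Integer> table = new LinkedHashMap<>();
        table.put("system", 1);
        table.put("protected", 2);
        table.put("testtesttest", 3);
        table.put("confidential", 4);
        table.put("branch", 5);
        table.put("group", 6);
        return table;
    }

    // Labels table as seen by the client user: only the 'system' row.
    static Map<String, Integer> clientViewOfLabelsTable() {
        Map<String, Integer> table = new LinkedHashMap<>();
        table.put("system", 1);
        return table;
    }

    // Resolves an AND-only expression such as "confidential&branch" to ordinals.
    static List<Integer> resolve(String expression, Map<String, Integer> labels) {
        List<Integer> ordinals = new ArrayList<>();
        for (String part : expression.split("&")) {
            String label = part.trim();
            Integer ordinal = labels.get(label);
            if (ordinal == null) {
                throw new IllegalArgumentException("Unknown label: " + label);
            }
            ordinals.add(ordinal);
        }
        return ordinals;
    }

    public static void main(String[] args) {
        System.out.println(resolve("confidential&branch", fullLabelsTable())); // [4, 5]
        try {
            resolve("confidential&branch", clientViewOfLabelsTable());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Unknown label: confidential
        }
    }
}
```

    A user who only sees the 'system' row has nothing to map the other
    labels to, which matches the behaviour described for the client user.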

    Cheers,

    Tom Ellis

    -----Original Message-----
    From: Anoop John
    Sent: 08 June 2016 11:58
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2


    Thanks Ram. Yes, that seems the best way, as CellCreator is a publicly
    exposed class. Maybe we should explain this in the HBase book
    under the visibility labels area. Good to know you have a
    use case based on visibility labels. Let us know in case of any
    trouble. Thanks.
    -Anoop-

    On Wed, Jun 8, 2016 at 1:43 PM, ramkrishna vasudevan <
    ramkrishna.s.vasudevan@gmail.com> wrote:
    Hi

    It can be done. See the class CellCreator, which is a public-facing
    interface.
    In your Spark job that creates the Hadoop files through
    HFileOutputFormat2, you can use the CellCreator to create your
    KeyValues and use CellCreator.getVisibilityExpressionResolver()
    to map your String visibility tags to the system-generated
    ordinals.

    For example, you can see how TextSortReducer works. I think this
    should help you solve your problem. Let us know if you need
    further information.
    Regards
    Ram

    On Tue, Jun 7, 2016 at 3:58 PM, Ellis, Tom (Financial Markets
    IT) wrote:
    Hi Ram,

    We're attempting to do it programmatically so:

    The HFile is created by a Spark job using
    saveAsNewAPIHadoopFile, with ImmutableBytesWritable as the
    key (rowkey), KeyValue as the value, and
    HFileOutputFormat2 as the output format.
    This HFile is then loaded using the HBase client's
    LoadIncrementalHFiles.doBulkLoad.

    Is there a way to do this programmatically without using the
    ImportTsv tool? I was taking a look at
    VisibilityUtils.createVisibilityExpTags and maybe being able to
    just create the Tags myself that way (although it's obviously
    @InterfaceAudience.Private), but it seems that to use it
    I'd need to know label ordinality client side.
    Thanks for your help,

    Tom

    -----Original Message-----
    From: ramkrishna vasudevan

    Sent: 07 June 2016 11:19
    To: user@hbase.apache.org
    Subject: Re: Writing visibility labels with HFileOutputFormat2


    Hi Ellis

    How are the HFileOutputFormat2 files created? Are you using the
    ImportTsv tool? If so, then yes, there is a way to specify
    visibility tags while loading with ImportTsv, and those
    visibility tags are also bulk loaded into the HFile.
    There is an attribute, CELL_VISIBILITY_COLUMN_SPEC, that can be
    used to indicate that the data will have visibility tags, and
    the tool will automatically parse the specified field as the
    visibility tag.

    In case you have access to the code, you can see the test case
    TestImportTSVWithVisibilityLabels to get an initial idea of how
    it is being done. If not, get back to us, happy to help.

    Regards
    Ram
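
As a rough illustration of the column-spec idea described above, the following self-contained Java sketch picks the visibility expression out of a tab-separated line. It is a toy model, not the ImportTsv parser: the class `TsvVisibilitySketch` and its method are hypothetical, and the marker string `HBASE_CELL_VISIBILITY` is an assumption about the value behind CELL_VISIBILITY_COLUMN_SPEC that should be checked against your HBase version.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of locating the visibility field in a TSV line; hypothetical, not ImportTsv. */
public class TsvVisibilitySketch {

    // Assumed marker name for the visibility column in the column spec.
    public static final String CELL_VISIBILITY_COLUMN_SPEC = "HBASE_CELL_VISIBILITY";

    /** Returns the visibility expression of a line, or null if the spec has no visibility column. */
    public static String visibilityOf(String columnSpec, String line) {
        List<String> columns = Arrays.asList(columnSpec.split(","));
        int index = columns.indexOf(CELL_VISIBILITY_COLUMN_SPEC);
        if (index < 0) {
            return null; // this input carries no visibility column
        }
        // The -1 limit keeps trailing empty fields so positions stay aligned with the spec.
        String[] fields = line.split("\t", -1);
        if (index >= fields.length) {
            throw new IllegalArgumentException("Line has no field for the visibility column");
        }
        return fields[index];
    }

    public static void main(String[] args) {
        String spec = "HBASE_ROW_KEY,cf:amount,HBASE_CELL_VISIBILITY";
        System.out.println(visibilityOf(spec, "row1\t42\tsecret&finance")); // prints secret&finance
    }
}
```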



    On Tue, Jun 7, 2016 at 3:36 PM, Ellis, Tom (Financial Markets
    IT) wrote:
    Hi,

    I was wondering if it's possible/how to write Visibility
    Labels to an HFileOutputFormat2? I believe Visibility Labels
    are just implemented as Tags, but with the normal way of
    writing them with Mutation#setCellVisibility these are
    formally written as Tags to the cells during the
    VisibilityController coprocessor as we need to assert the
    expression is valid for the labels configured.

    How can we add visibility labels to cells if we have a job
    that creates an HFile with HFileOutputFormat2 which is then
    subsequently loaded using LoadIncrementalHFiles?

    Cheers,

    Tom Ellis
    Consultant Developer - Excelian
    Data Lake | Financial Markets IT
    LLOYDS BANK COMMERCIAL BANKING

    HBOS plc. Registered Office: The Mound, Edinburgh EH1 1YZ.
    Registered in Scotland no. SC218813.

    This e-mail (including any attachments) is private and
    confidential and may contain privileged material. If you have
    received this e-mail in error, please notify the sender and
    delete it (including any
    attachments) immediately. You must not copy, distribute,
    disclose or use any of the information in it or any
    attachments. Telephone calls may be monitored or recorded.


Discussion Overview
group: user @
categories: hbase, hadoop
posted: Jun 7, '16 at 10:06a
active: Jun 17, '16 at 6:37a
posts: 18
users: 4
website: hbase.apache.org
