Grokbase Groups HBase user March 2013
Hi Nick,

As an HBase user I would welcome this addition. In addition to the proposed
list of datatypes, a UUID/GUID type would also be nice to have.

Regards,

/David

On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk wrote:

Hi all,

I'd like to draw your attention to HBASE-8089. The desire is to add type
support to HBase. There are two primary objectives: make the lives of
developers building on HBase easier, and facilitate better tools on top of
HBase. Please chime in with any feature suggestions you think we've missed
in initial conversations.

Thanks,
-n

[0]: https://issues.apache.org/jira/browse/HBASE-8089


  • Michel Segel at Mar 15, 2013 at 3:23 pm
    You do realize that having to worry about one type is easier...

    A bit more freedom...


    Sent from a remote device. Please excuse any typos...

    Mike Segel
  • Nick Dimiduk at Mar 15, 2013 at 5:07 pm

    On Fri, Mar 15, 2013 at 5:25 AM, Michel Segel wrote:

    > You do realize that having to worry about one type is easier...

    For HBase developers, that's true. The side-effect is that those worries
    are pushed out into users' applications. Think of the application developer
    who's accustomed to all the accoutrements provided by the Management System
    part of an RDBMS. They pick up HBase and have none of that. I think the
    motivations outlined in the attached document make a good case for bringing
    some of that burden out of users' applications.

    > A bit more freedom...

    Support for raw byte[] doesn't go away. In this proposal, bytes remain the
    core plumbing of the system.

    -n
  • Michel Segel at Mar 15, 2013 at 10:36 pm
    Ok..

    So how do you check types in a column when the column isn't defined in the Schema?



    Mike Segel
  • Nick Dimiduk at Mar 19, 2013 at 8:35 pm

    On Fri, Mar 15, 2013 at 3:35 PM, Michel Segel wrote:

    > So how do you check types in a column when the column isn't defined in the
    > Schema?

    In this proposal, it's not up to HBase to enforce types or schema, just as
    it does not do these things today. What we're proposing is a set of
    utilities that take the burden of correct serialization off of user code. I
    request that you please read the proposal in its entirety before commenting
    further.

    Thanks,
    Nick
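
To make the idea of client-side serialization utilities concrete, here is a minimal Java sketch. It is not the API proposed in HBASE-8089: the `Codec` interface and the `Utf8StringCodec` class are hypothetical names invented for illustration. The only point it tries to show is that a typed value is converted by a helper in user code, while HBase itself continues to store and compare plain byte[].

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical codec contract; the name and shape are assumptions, not HBase API. */
interface Codec<T> {
    /** Turns a typed value into the bytes that would be handed to HBase. */
    byte[] encode(T value);

    /** Rebuilds the typed value from bytes read back from HBase. */
    T decode(byte[] bytes);
}

/** Hypothetical String codec that always uses UTF-8, so every client agrees on the bytes. */
final class Utf8StringCodec implements Codec<String> {
    @Override
    public byte[] encode(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Codec<String> codec = new Utf8StringCodec();
        byte[] stored = codec.encode("row-value");
        // Only byte[] crosses the storage boundary; the type lives in client code.
        System.out.println(stored.length + " bytes -> " + codec.decode(stored));
    }
}
```

Under this assumption nothing is enforced by the store: two applications only interoperate if they agree to use the same codec for the same column.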
  • Nick Dimiduk at Mar 15, 2013 at 5:12 pm
    Hi David,

    Native support for a handful of hashing algorithms has also been discussed.
    Do you think these should be supported directly, as opposed to using a
    fixed-length String or fixed-length byte[]?

    Thanks,
    Nick
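
For reference, the fixed-length byte[] alternative mentioned above can be sketched in a few lines of Java for the UUID case. `UuidBytes` is a hypothetical helper name, not something from HBase or from the proposal; the sketch assumes a 16-byte big-endian layout with the most significant 64 bits first.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Hypothetical helper; not part of HBase or of the HBASE-8089 proposal. */
final class UuidBytes {
    private UuidBytes() {}

    /** Encodes a UUID as exactly 16 bytes: most significant long first, big-endian. */
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    /** Decodes the 16-byte form produced by toBytes. */
    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        byte[] encoded = toBytes(id);
        System.out.println(encoded.length + " bytes, round-trips: " + id.equals(fromBytes(encoded)));
    }
}
```

One design point to keep in mind: the byte-wise order of this encoding is unsigned, so it need not match `UUID.compareTo`, which compares the two halves as signed longs.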
  • James Taylor at Mar 15, 2013 at 5:56 pm
    Hi Nick,
    What do you mean by "hashing algorithms"?
    Thanks,
    James
  • Nick Dimiduk at Mar 15, 2013 at 5:58 pm
    I'm talking about MD5, SHA1, etc. It's something explicitly mentioned
    in HBASE-7221.
  • Lars Hofhansl at Mar 16, 2013 at 5:45 am
    I think generally we should keep HBase a byte[]-based key-value store.
    What we should add to HBase are tools that would allow client-side apps (or libraries) to build functionality on top of plain HBase.

    Serialization that maintains a correct semantic sort order is important as a building block, as is code that can build up correctly serialized and sortable compound keys, and so are hashing algorithms.

    Where I would draw the line is adding types to HBase itself. As long as one can write a client, or Filters, or Coprocessors with the tools provided by HBase we're good. Higher-level functionality can then be built on top of HBase.


    For example, maybe we need to add a better access API to the HBase WAL in order to have an external library implement idempotent transactions (which can be used to implement secondary indexes).
    Maybe some other primitives have to be exposed in order to allow an external library to implement full transactions.
    Or we might need a statistics framework (such as the one that Jesse is working on).

    These are all building blocks that do not presume specific access patterns or clients, but can be used to implement them.


    As usual, just my $0.02.

    -- Lars
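
As an illustration of what "serialization that maintains a correct semantic sort order" means in practice, here is a minimal Java sketch. `SortableLong` and its methods are hypothetical names, not HBase API. It relies on the fact that HBase orders row keys as unsigned bytes, lexicographically: a plain big-endian two's-complement long would sort negative values after positive ones, so the sketch flips the sign bit before writing. Concatenating fixed-width fields encoded this way gives a compound key that sorts by the first field, then the second.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper; an illustration only, not HBase API. */
final class SortableLong {
    private SortableLong() {}

    /** Big-endian encoding with the sign bit flipped, so unsigned byte order equals numeric order. */
    static byte[] encode(long value) {
        return ByteBuffer.allocate(8).putLong(value ^ Long.MIN_VALUE).array();
    }

    /** Inverse of encode; expects at least 8 bytes. */
    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    /** Fixed-width compound key: sorts by first, then by second. */
    static byte[] compoundKey(long first, long second) {
        return ByteBuffer.allocate(16).put(encode(first)).put(encode(second)).array();
    }

    /** Unsigned lexicographic comparison, mirroring how byte[] row keys are ordered. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Naive big-endian bytes put -5 after 3; the sign-flipped form restores numeric order.
        byte[] naiveNeg = ByteBuffer.allocate(8).putLong(-5L).array();
        byte[] naivePos = ByteBuffer.allocate(8).putLong(3L).array();
        System.out.println("naive order correct:    " + (compareUnsigned(naiveNeg, naivePos) < 0));
        System.out.println("sortable order correct: " + (compareUnsigned(encode(-5L), encode(3L)) < 0));
    }
}
```

Variable-length fields such as strings need more care (terminators or length handling) and are deliberately left out of this sketch.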



  • Michel Segel at Mar 16, 2013 at 12:19 pm
    Isn't that what you get through add-on frameworks like TSDB and Kiji? Maybe not on the client side, but frameworks that extend HBase...


    Mike Segel
  • Michel Segel at Mar 16, 2013 at 12:27 pm
    I also want to add that you could add MD5 and SHA-1, but I'd check on U.S. laws... I think these are OK; however, other encryption/decryption code is not.

    They are part of the standard Sun Java libraries...


    Mike Segel
  • Andrew Purtell at Mar 16, 2013 at 12:59 pm
    The ASF avails itself of an exception to crypto export which only requires
    a bit of PMC housekeeping at release time. So "is not [ok]" is FUD. I
    humbly request we refrain from FUD here. See
    http://www.apache.org/dev/crypto.html. To the best of our knowledge we
    expect this to continue, though the ASF has not updated this policy yet for
    recent regulation updates.

    --
    Best regards,

    - Andy

    Problems worthy of attack prove their worth by hitting back. - Piet Hein
    (via Tom White)
  • Michael Segel at Mar 17, 2013 at 4:14 pm
    It's not a question of FUD, but that certain types of encryption/decryption code fall under the munitions act.
    See: http://www.fas.org/irp/offdocs/eo_crypt_9611_memo.htm

    Having said that, there is this:
    http://www.bis.doc.gov/encryption/encfaqs6_17_02.html

    In short, I don't as a habit export/import encryption technology so I am not up to speed on the current state of the laws.
    Which is why I have to question the current state of the US encryption laws.

    This then leads to another question... suppose Apache does add encryption to Hadoop. While the Apache organization does have the proper paperwork in place, what then happens to Cloudera, Hortonworks, EMC, IBM, Intel, etc.?

    But let's put that question aside.

    The point I was trying to make was that the core Sun JVM does support MD5 and SHA-1 out of the box, so that anyone running Hadoop and using the 1.6_xx or the 1.7_xx versions of the JVM will have these packages.

    Adding hooks that use these classes is a no-brainer. However, beyond this... you tell me.

    -Mike
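
The point about MD5 and SHA-1 shipping with the JVM can be shown with a short Java sketch that uses only `java.security.MessageDigest` from the standard library. `DigestSketch` and its two methods are hypothetical names for illustration; nothing here is HBase code. Note that `StandardCharsets` requires Java 7 or later, so on a 1.6 JVM the charset would have to be passed by name instead.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper showing the JDK's built-in digests; not HBase API. */
final class DigestSketch {
    private DigestSketch() {}

    /** Computes a digest with a standard JDK algorithm name such as "MD5" or "SHA-1". */
    static byte[] digest(String algorithm, byte[] input) {
        try {
            return MessageDigest.getInstance(algorithm).digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " is not available in this JVM", e);
        }
    }

    /** Lowercase hex rendering of a byte array. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] key = "user-12345".getBytes(StandardCharsets.UTF_8);
        // MD5 yields a fixed 16-byte value and SHA-1 a fixed 20-byte value,
        // which is why a fixed-length byte[] is a natural way to store them.
        System.out.println("MD5   " + toHex(digest("MD5", key)));
        System.out.println("SHA-1 " + toHex(digest("SHA-1", key)));
    }
}
```

Because both digests have a fixed output length, they fit the fixed-length byte[] representation discussed earlier in the thread without any additional type machinery.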
  • Andrew Purtell at Mar 17, 2013 at 5:12 pm
    > This then leads to another question... suppose Apache does add encryption
    > to Hadoop. While the Apache organization does have the proper paperwork in
    > place, what then happens to Cloudera, Hortonworks, EMC, IBM, Intel, etc ?

    Well I can't put that question aside since you've brought it up now
    twice and encryption feature candidates for Apache Hadoop and Apache HBase
    are something I have been working on. It's a valid question but since as you
    admit you don't know what you are talking about, perhaps stating uninformed
    opinions can be avoided. Only the latter is what I object to. I think the
    short answer is as an Apache contributor I'm concerned about the Apache
    product. Downstream repackagers can take whatever action needed including
    changes, since it is open source, or feedback about it representing a
    hardship. At this point I have heard nothing like that. I work for Intel
    and can say we are good with it.
    On Sunday, March 17, 2013, Michael Segel wrote:

    Its not a question of FUD, but that certain types of encryption/decryption
    code falls under the munitions act.
    See: http://www.fas.org/irp/offdocs/eo_crypt_9611_memo.htm

    Having said that, there is this:
    http://www.bis.doc.gov/encryption/encfaqs6_17_02.html

    In short, I don't as a habit export/import encryption technology so I am
    not up to speed on the current state of the laws.
    Which is why I have to question the current state of the US encryption
    laws.

    This then leads to another question... suppose Apache does add encryption
    to Hadoop. While the Apache organization does have the proper paperwork in
    place, what then happens to Cloudera, Hortonworks, EMC, IBM, Intel, etc ?

    But lets put that question aside.

    The point I was trying to make was that the core Sun JVM does support MD5
    and SHA-1 out of the box, so that anyone running Hadoop and using the
    1.6_xx or the 1.7_xx versions of the JVM will have these packages.

    Adding hooks that use these classes are a no brainer. However, beyond
    this... you tell me.

    -Mike
    On Mar 16, 2013, at 7:59 AM, Andrew Purtell wrote:

    The ASF avails itself of an exception to crypto export which only requires
    a bit of PMC housekeeping at release time. So "is not [ok]" is FUD. I
    humbly request we refrain from FUD here. See
    http://www.apache.org/dev/crypto.html. To the best of our knowledge we
    expect this to continue, though the ASF has not updated this policy yet for
    recent regulation updates.
    On Saturday, March 16, 2013, Michel Segel wrote:

    I also want to add that you could add MD5 and SHA-1, but I'd check on us
    laws... I think these are ok, however other encryption/decryption code
    is
    not.

    They are part of the std sun java libraries ...

    Sent from a remote device. Please excuse any typos...

    Mike Segel

    On Mar 16, 2013, at 7:18 AM, Michel Segel <michael_segel@hotmail.com>
    wrote:
    Isn't that what you get through add on frameworks like TSDB and Kiji ?
    Maybe not on the client side, but frameworks that extend HBase...
    Sent from a remote device. Please excuse any typos...

    Mike Segel
    On Mar 16, 2013, at 12:45 AM, lars hofhansl wrote:

    I think generally we should keep HBase a byte[] based key value store.
    What we should add to HBase are tools that would allow client side apps
    (or libraries) to build functionality on top of plain HBase.
    Serialization that maintains a correct semantic sort order is important
    as a building block, so is code that can build up correctly serialized and
    sortable compound keys, as well as hashing algorithms.
    Where I would draw the line is adding types to HBase itself. As long as
    one can write a client, or Filters, or Coprocessors with the tools provided
    by HBase we're good. Higher level functionality can then be built on top
    of HBase.

    For example, maybe we need to add better access API to the HBase WAL in
    order to have an external library implement idempotent transactions (which
    can be used to implement 2ndary indexes).
    Maybe some other primitives have to be exposed in order to allow an
    external library to implement full transactions.
    Or we might need a statistics framework (such as the one that Jesse is
    working on).
    These are all building blocks that do not presume specific access
    patterns or clients, but can be used to implement them.
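
To make the "sortable compound keys" building block concrete, here is a rough sketch under stated assumptions. HBase orders row keys as unsigned bytes, lexicographically, so a compound key has to be laid out so that byte order matches the intended (name, timestamp) order. The `CompoundKeyDemo` class, the two-field layout, and the NUL terminator are assumptions chosen for illustration; this is not an HBase API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical encoder for a two-part key: a variable-length name plus a signed long.
class CompoundKeyDemo {
    // Encode so that unsigned byte-wise order equals (name by UTF-8 bytes, then timestamp).
    static byte[] encode(String name, long timestamp) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        for (byte b : nameBytes) {
            if (b == 0) {
                throw new IllegalArgumentException("name must not contain NUL");
            }
        }
        ByteBuffer buf = ByteBuffer.allocate(nameBytes.length + 1 + 8);
        buf.put(nameBytes);
        buf.put((byte) 0);                        // terminator: "a" sorts before "ab"
        buf.putLong(timestamp ^ Long.MIN_VALUE);  // flip sign bit; ByteBuffer is big-endian
        return buf.array();
    }

    // Lexicographic comparison treating each byte as unsigned, as HBase does for row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

The terminator keeps a short name from interleaving with longer names that share its prefix, and the sign-bit flip makes negative timestamps sort before positive ones.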

    As usual, just my $0.02.

    -- Lars



    ________________________________
    From: Nick Dimiduk <ndimiduk@gmail.com>
    To: user@hbase.apache.org
    Sent: Friday, March 15, 2013 10:57 AM
    Subject: Re: HBase type support

    I'm talking about MD5, SHA1, etc. It's something explicitly mentioned
    in HBASE-7221.

    On Fri, Mar 15, 2013 at 10:55 AM, James Taylor <jtaylor@salesforce.com>
    wrote:
    Hi Nick,
    What do you mean by "hashing algorithms"?
    Thanks,
    James

    On 03/15/2013 10:11 AM, Nick Dimiduk wrote:

    Hi David,

    Native support for a handful of hashing algorithms has also been


    --
    Best regards,

    - Andy

    Problems worthy of attack prove their worth by hitting back. - Piet Hein
    (via Tom White)
  • Amit Sela at Mar 17, 2013 at 5:32 pm
    Regarding HBASE-7941 - client API support...

    In the past year I wrote a lot of client code for HBase, which led to
    writing a helper class for my specific needs, and since it was brought up
    here I guess I'm not the only one who did something similar... I personally
    like the idea brought up by Nick in the JIRA of using some kind of
    <SerializationType> interface and having HBase ship with primitive
    support - and anyone who wants can implement it for their own needs. Same
    idea as Hadoop's Writable interface - shipping with IntWritable,
    LongWritable, etc.

    Anyway I'd be happy to help.
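
A rough sketch of that idea, purely as an illustration. The `SerializationType` name is borrowed from the placeholder above, but the method signatures and both implementations are assumptions; this is not the API proposed in the JIRA. `IntType` stands in for a primitive the library might ship, and `UuidType` for something a user would add.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical sketch of a pluggable serialization interface; not an HBase API.
class TypeSketch {
    interface SerializationType<T> {
        byte[] encode(T value);
        T decode(byte[] bytes);
    }

    // A primitive implementation a library could ship, in the spirit of IntWritable.
    static final class IntType implements SerializationType<Integer> {
        @Override
        public byte[] encode(Integer value) {
            return ByteBuffer.allocate(4).putInt(value).array();
        }

        @Override
        public Integer decode(byte[] bytes) {
            return ByteBuffer.wrap(bytes).getInt();
        }
    }

    // A user-supplied implementation: a UUID stored as 16 bytes, high half first.
    static final class UuidType implements SerializationType<UUID> {
        @Override
        public byte[] encode(UUID value) {
            return ByteBuffer.allocate(16)
                    .putLong(value.getMostSignificantBits())
                    .putLong(value.getLeastSignificantBits())
                    .array();
        }

        @Override
        public UUID decode(byte[] bytes) {
            ByteBuffer buf = ByteBuffer.wrap(bytes);
            long high = buf.getLong();
            long low = buf.getLong();
            return new UUID(high, low);
        }
    }
}
```

Client code would then depend only on the interface, so a helper class like the one described above could be replaced by one small implementation per type.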
  • Mohamed Ibrahim at Mar 17, 2013 at 11:20 pm
    I'm not a lawyer, but I think we're ok as long as it's in source code, as
    that is protected under freedom of speech in the US. See here (
    http://en.wikipedia.org/wiki/Cryptography ) under Export Control, the part
    related to Bernstein v. United States. I don't know about binaries like
    deb, but I can tell that we download binaries for browsers every day and
    they use encryption in lots of places. I believe if there were any real
    issues, they would have surfaced by now.

    As far as types in HBase, I think it is an excellent idea. I would suggest
    letting us add custom types, just like we can add our own custom filters.
    Some types that I had to code myself include CSV. There may be other custom
    types that I need in the future, maybe JSON, so the ability to add a
    custom type might be a good feature.

    Thanks,
    Mohamed

  • Ramkrishna vasudevan at Mar 18, 2013 at 3:01 am
    HBase shipping a generic framework for different interfaces is needed for
    ease of use for the users. +1 on the idea.
    Until now, getting correctly ordered results for float values and for
    positive and negative integers has had to be taken care of by users
    themselves or by using some wrappers.

    This will help to solve that problem to a great extent.
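
To illustrate the problem with a sketch: HBase compares bytes as unsigned values, so a plain big-endian two's-complement int puts -1 (0xFFFFFFFF) after 1 (0x00000001). One common workaround, assumed here and not necessarily what the JIRA will adopt, is to flip the sign bit of integers, and for IEEE 754 doubles to flip all bits of negatives and only the sign bit of non-negatives. The `SortableNumbers` class and its method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical wrappers showing raw versus order-preserving numeric encodings.
class SortableNumbers {
    // Plain big-endian two's complement: negatives sort AFTER positives as unsigned bytes.
    static byte[] rawInt(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Flipping the sign bit maps Integer.MIN_VALUE..MAX_VALUE onto 0x00000000..0xFFFFFFFF.
    static byte[] sortableInt(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    // Negative doubles: invert every bit. Non-negative doubles: set the sign bit.
    static byte[] sortableDouble(double v) {
        long bits = Double.doubleToLongBits(v);
        bits ^= (bits >> 63) | Long.MIN_VALUE;
        return ByteBuffer.allocate(8).putLong(bits).array();
    }

    // Lexicographic comparison treating each byte as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

With this layout a scan over encoded values returns them in numeric order, at the cost of having to undo the bit flip when decoding.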

    Regards
    Ram
  • Michel Segel at Mar 18, 2013 at 11:52 am
    If we look at TSDB, Kiji, Asynch HBase, it looks like extensions to HBase already exist.

    I haven't looked at Salesforce.com's SQL interface, but I suspect that they too have some sort of framework where they have to enforce typing.


    Sent from a remote device. Please excuse any typos...

    Mike Segel
  • Michel Segel at Mar 18, 2013 at 12:03 pm
    Andrew,

    I was aware of your employer, and I am pretty sure that they have already dealt with the issue of exporting encryption software and probably hardware too.

    Neither of us is a lawyer, and from what I do know of dealing with government bureaucracies, it's not always as simple as just filing the correct paperwork. (Sometimes it is, sometimes not so much, YMMV...)

    Putting in the hooks for encryption is probably a good idea. Shipping the encryption with the release or making it part of the official release, not so much. Sorry, I'm being a bit conservative here.

    IMHO I think fixing other issues would be of a higher priority, but that's just me;-)

    Sent from a remote device. Please excuse any typos...

    Mike Segel
    On Mar 17, 2013, at 12:12 PM, Andrew Purtell wrote:

    This then leads to another question... suppose Apache does add encryption
    to Hadoop. While the Apache organization does have the proper paperwork in
    place, what then happens to Cloudera, Hortonworks, EMC, IBM, Intel, etc ?

    Well I can't put that question aside since you've brought it up now
    twice and encryption feature candidates for Apache Hadoop and Apache HBase
    are something I have been working on. Its a valid question but since as you
    admit you don't know what you are talking about, perhaps stating uninformed
    opinions can be avoided. Only the latter is what I object to. I think the
    short answer is as an Apache contributor I'm concerned about the Apache
    product. Downstream repackagers can take whatever action needed including
    changes, since it is open source, or feedback about it representing a
    hardship. At this point I have heard nothing like that. I work for Intel
    and can say we are good with it.
    On Sunday, March 17, 2013, Michael Segel wrote:

    Its not a question of FUD, but that certain types of encryption/decryption
    code falls under the munitions act.
    See: http://www.fas.org/irp/offdocs/eo_crypt_9611_memo.htm

    Having said that, there is this:
    http://www.bis.doc.gov/encryption/encfaqs6_17_02.html

    In short, I don't as a habit export/import encryption technology so I am
    not up to speed on the current state of the laws.
    Which is why I have to question the current state of the US encryption
    laws.

    This then leads to another question... suppose Apache does add encryption
    to Hadoop. While the Apache organization does have the proper paperwork in
    place, what then happens to Cloudera, Hortonworks, EMC, IBM, Intel, etc ?

    But lets put that question aside.

    The point I was trying to make was that the core Sun JVM does support MD5
    and SHA-1 out of the box, so that anyone running Hadoop and using the
    1.6_xx or the 1.7_xx versions of the JVM will have these packages.

    Adding hooks that use these classes are a no brainer. However, beyond
    this... you tell me.

    -Mike
    On Mar 16, 2013, at 7:59 AM, Andrew Purtell wrote:

    The ASF avails itself of an exception to crypto export which only requires
    a bit of PMC housekeeping at release time. So "is not [ok]" is FUD. I
    humbly request we refrain from FUD here. See
    http://www.apache.org/dev/crypto.html. To the best of our knowledge we
    expect this to continue, though the ASF has not updated this policy yet for
    recent regulation updates.
    On Saturday, March 16, 2013, Michel Segel wrote:

    I also want to add that you could add MD5 and SHA-1, but I'd check on US
    laws... I think these are OK; however, other encryption/decryption code is
    not.

    They are part of the std sun java libraries ...

    Sent from a remote device. Please excuse any typos...

    Mike Segel

    On Mar 16, 2013, at 7:18 AM, Michel Segel <michael_segel@hotmail.com>
    wrote:
    Isn't that what you get through add on frameworks like TSDB and Kiji ?
    Maybe not on the client side, but frameworks that extend HBase...
    Sent from a remote device. Please excuse any typos...

    Mike Segel
    On Mar 16, 2013, at 12:45 AM, lars hofhansl wrote:

    I think generally we should keep HBase a byte[] based key value store.
    What we should add to HBase are tools that would allow client side apps
    (or libraries) to build functionality on top of plain HBase.
    Serialization that maintains a correct semantic sort order is important
    as a building block, so is code that can build up correctly serialized and
    sortable compound keys, as well as hashing algorithms.
    Where I would draw the line is adding types to HBase itself. As long as
    one can write a client, or Filters, or Coprocessors with the tools provided
    by HBase we're good. Higher level functionality can then be built on top
    of HBase.

    For example, maybe we need to add a better access API to the HBase WAL in
    order to have an external library implement idempotent transactions (which
    can be used to implement secondary indexes).
    Maybe some other primitives have to be exposed in order to allow an
    external library to implement full transactions.
    Or we might need a statistics framework (such as the one that Jesse is
    working on).
    These are all building blocks that do not presume specific access
    patterns or clients, but can be used to implement them.

    As usual, just my $0.02.

    -- Lars



    ________________________________
    From: Nick Dimiduk <ndimiduk@gmail.com>
    To: user@hbase.apache.org
    Sent: Friday, March 15, 2013 10:57 AM
    Subject: Re: HBase type support

    I'm talking about MD5, SHA1, etc. It's something explicitly mentioned
    in HBASE-7221.

    On Fri, Mar 15, 2013 at 10:55 AM, James Taylor <jtaylor@salesforce.com> wrote:
    Hi Nick,
    What do you mean by "hashing algorithms"?
    Thanks,
    James

    On 03/15/2013 10:11 AM, Nick Dimiduk wrote:

    Hi David,

    Native support for a handful of hashing algorithms has also been


    --
    Best regards,

    - Andy

    Problems worthy of attack prove their worth by hitting back. - Piet Hein
    (via Tom White)
  • Doug Meil at Mar 18, 2013 at 7:17 pm
    Sorry I'm late to this thread, but I was the guy behind HBASE-7221 and the
    algorithms specifically mentioned were MD5 and Murmur (not SHA-1). An
    implementation of Murmur already exists in HBase, and the MD5
    implementation was the one that ships with Java.

    The intent was to include hashing appropriate for use with key
    distribution of rowkeys in tables as is often suggested on the dist-lists.
    SHA-1 is probably overkill for the rowkey case, but I wouldn't want to
    stop anybody from using SHA-1 if it was appropriate for their needs.
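
    As an illustration of hashing for rowkey distribution, here is a minimal sketch that uses the MD5 implementation shipped with Java. The class name HashedRowKey, the method prefixWithMd5, and the 4-byte prefix length are assumptions made for this example; they are not part of HBase or of HBASE-7221.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not an HBase API): prefixes a rowkey with part of its MD5 digest
// so that keys which sort close together are spread across the key space.
final class HashedRowKey {
    // Number of digest bytes kept as the prefix; an illustrative choice.
    static final int PREFIX_LEN = 4;

    static byte[] prefixWithMd5(String naturalKey) {
        byte[] key = naturalKey.getBytes(StandardCharsets.UTF_8);
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("MD5").digest(key);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to provide MD5.
            throw new IllegalStateException(e);
        }
        // Row key layout: [first PREFIX_LEN digest bytes][natural key bytes].
        byte[] rowKey = new byte[PREFIX_LEN + key.length];
        System.arraycopy(digest, 0, rowKey, 0, PREFIX_LEN);
        System.arraycopy(key, 0, rowKey, PREFIX_LEN, key.length);
        return rowKey;
    }
}
```

    Prefixing, rather than replacing, the key keeps the natural key recoverable from the row. The cost is that rows no longer sort by natural key, so range scans over it are lost.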




    On 3/18/13 8:02 AM, "Michel Segel" wrote:

    Andrew,

    I was aware of your employer, which I am pretty sure has already dealt
    with the issue of exporting encryption software and probably hardware too.

    Neither of us is a lawyer, and from what I do know of dealing with
    government bureaucracies, it's not always as simple as just filing the
    correct paperwork. (Sometimes it is, sometimes not so much, YMMV...)

    Putting in the hooks for encryption is probably a good idea. Shipping the
    encryption with the release, or making it part of the official release, not
    so much. Sorry, I'm being a bit conservative here.

    IMHO I think fixing other issues would be of a higher priority, but
    that's just me;-)

    Sent from a remote device. Please excuse any typos...

    Mike Segel
  • Michael Segel at Mar 19, 2013 at 12:42 am
    Thanks for the clarification Doug.

    Back to my point, I was saying that MD5 and SHA-1 are already part of the
    standard Java libraries, so if you're running Java 1.6_xx or Java 1.7_xx,
    you will have MD5 available. So it could be a good thing.


    Murmur is released under MIT... Is there going to be a licensing issue?
    (Thinking back to the delay in getting Snappy.) Note: I don't know, which is
    why I am asking, so I don't want to be accused of FUD.
    :-P
  • Nick Dimiduk at Mar 19, 2013 at 8:36 pm

    On Sat, Mar 16, 2013 at 5:18 AM, Michel Segel wrote:

    Isn't that what you get through add on frameworks like TSDB and Kiji ?
    Maybe not on the client side, but frameworks that extend HBase...

    Sure. But how can these tools interoperate? Right now, they would all
    have to agree on serialization and schema representation methods. They
    don't; each designs its own schema and type management system. Now the
    user is experiencing vendor lock-in.

    This proposal puts in place an HBase-sanctioned solution for storing typed
    data. The question of schema remains up to the tools, but with this
    proposal, at least data can be read interoperably. Those other frameworks
    can choose to use it or not, but the ones that do will all agree on how to
    read and write, say, an Integer value.

    Thanks,
    Nick
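
    To make "agree on how to read and write an Integer value" concrete, here is a minimal sketch of one shared, order-preserving encoding. The class OrderedInt and its methods are hypothetical names for this example, and the sign-bit flip is one common technique, not necessarily the encoding HBASE-8089 would adopt.

```java
import java.nio.ByteBuffer;

// Hypothetical codec (not an HBase API): encodes an int so that unsigned,
// byte-by-byte comparison of the encodings matches numeric order.
final class OrderedInt {
    static byte[] encode(int value) {
        // Big-endian bytes with the sign bit flipped: negatives sort before positives.
        return ByteBuffer.allocate(4).putInt(value ^ Integer.MIN_VALUE).array();
    }

    static int decode(byte[] bytes) {
        // Flipping the sign bit again restores the original value.
        return ByteBuffer.wrap(bytes).getInt() ^ Integer.MIN_VALUE;
    }

    // Unsigned lexicographic comparison, the order in which HBase sorts byte[] keys.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

    Any two tools that use the same encode and decode pair can read each other's Integer cells, whatever schema layer each builds on top.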

    On 03/15/2013 10:11 AM, Nick Dimiduk wrote:

    Hi David,

    Native support for a handful of hashing algorithms has also been
    discussed.
    Do you think these should be supported directly, as opposed to using a
    fixed-length String or fixed-length byte[]?

    Thanks,
    Nick

  • Jonathan Hsieh at Mar 18, 2013 at 11:55 pm
    +1. I really don't want to add typing-specific information into HBase core.
    However, having building blocks, plugins, and extra metadata manage it
    seems quite reasonable to me.

    There are many, many games that can be played to encode data, which argues
    for enforcing typing in a library as opposed to at the HBase level (e.g.,
    putting in structs that have fields with ints as opposed to having tons of
    columns with ints in them, or how OpenTSDB encodes timestamps).

    Jon.


    --
    // Jonathan Hsieh (shay)
    // Software Engineer, Cloudera
    // jon@cloudera.com
  • Michael Segel at Mar 19, 2013 at 12:32 am
    yup. Why break a good thing? ;-)
  • Nick Dimiduk at Mar 19, 2013 at 8:39 pm

    On Mon, Mar 18, 2013 at 4:54 PM, Jonathan Hsieh wrote:

    +1. I really don't want to add typing-specific information into HBase core.
    However, having building blocks, plugins, and extra metadata manage it
    seems quite reasonable to me.

    There are many, many games that can be played to encode data, which argues
    for enforcing typing in a library as opposed to at the HBase level (e.g.,
    putting in structs that have fields with ints as opposed to having tons of
    columns with ints in them, or how OpenTSDB encodes timestamps).

    I'm not proposing deep integration with core. This stuff would exist in the
    client module only. The byte[] interfaces don't go away either; OpenTSDB
    can continue to perform its data encoding as is. This proposal seeks to
    enable less sophisticated data storage approaches and to solve the common
    case in a reasonable way.

    -n
    On Fri, Mar 15, 2013 at 10:45 PM, lars hofhansl wrote:

    I think generally we should keep HBase a byte[] based key value store.
    What we should add to HBase are tools that would allow client side apps
    (or libraries) to built functionality on top of plain HBase.

    Serialization that maintains a correct semantic sort order is important as
    a building block, so is code that can build up correctly serialized and
    sortable compound keys, as well as hashing algorithms.

    Where I would draw the line is adding types to HBase itself. As long as
    one can write a client, or Filters, or Coprocessors with the tools provided
    by HBase we're good. Higher level functionality can then be built of on top
    of HBase.


    For example, maybe we need to add better access API to the HBase WAL in
    order to have an external library implement idempotent transactions (which
    can be used to implement 2ndary indexes).
    Maybe some other primitives have to be exposed in order to allow an
    external library to implement full transactions.
    Or we might need a statistics framework (such as the one that Jesse is
    working on).

    These are all building blocks that do not presume specific access patterns
    or clients, but can be used to implement them.


    As usual, just my $0.02.

    -- Lars



    ________________________________
    From: Nick Dimiduk <ndimiduk@gmail.com>
    To: user@hbase.apache.org
    Sent: Friday, March 15, 2013 10:57 AM
    Subject: Re: HBase type support

    I'm talking about MD5, SHA1, etc. It's something explicitly mentioned
    in HBASE-7221.

    On Fri, Mar 15, 2013 at 10:55 AM, James Taylor <jtaylor@salesforce.com
    wrote:
    Hi Nick,
    What do you mean by "hashing algorithms"?
    Thanks,
    James

    On 03/15/2013 10:11 AM, Nick Dimiduk wrote:

    Hi David,

    Native support for a handful of hashing algorithms has also been
    discussed.
    Do you think these should be supported directly, as opposed to using a
    fixed-length String or fixed-length byte[]?

    Thanks,
    Nick

    On Thu, Mar 14, 2013 at 9:51 AM, David Koch <ogdude@googlemail.com>
    wrote:

    Hi Nick,
    As an HBase user I would welcome this addition. In addition to the
    proposed
    list of datatypes A UUID/GUID type would also be nice to have.

    Regards,

    /David


    On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk <ndimiduk@gmail.com>
    wrote:

    Hi all,
    I'd like to draw your attention to HBASE-8089. The desire is to add type
    support to HBase. There are two primary objectives: make the lives of
    developers building on HBase easier, and facilitate better tools on top of
    HBase. Please chime in with any feature suggestions you think we've missed
    in initial conversations.

    Thanks,
    -n

    [0]: https://issues.apache.org/jira/browse/HBASE-8089


    --
    // Jonathan Hsieh (shay)
    // Software Engineer, Cloudera
    // jon@cloudera.com
  • Nick Dimiduk at Mar 19, 2013 at 8:31 pm

    On Fri, Mar 15, 2013 at 10:45 PM, lars hofhansl wrote:

    I think generally we should keep HBase a byte[] based key value store.
    What we should add to HBase are tools that would allow client side apps
    (or libraries) to build functionality on top of plain HBase.

    That's precisely it. HBase is not changed in any fundamental way to
    acknowledge or enforce types. Instead, the hbase-client module makes type
    management easier for user code.

    Serialization that maintains a correct semantic sort order is important as
    a building block, so is code that can build up correctly serialized and
    sortable compound keys, as well as hashing algorithms.

    Agreed on serialization. Hashing I can do without. Yes it's a common
    practice, but IMHO, if you're hashing, you're not taking advantage of the
    natural distribution of your data. I think it's a lazy schema designer's
    approach. I see no problem with shipping with support for some hashing
    strategies if users demand, but I don't think it's a design approach we
    should encourage.

    Thanks,
    Nick
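
To make "serialization that maintains a correct semantic sort order" concrete, here is a minimal sketch. It is not code from this thread or from HBASE-8089. The function names (encode_int32, decode_int32, encode_compound_key) are hypothetical, and the key layout (a NUL-terminated UTF-8 string followed by a 4-byte integer) is an assumption chosen only for illustration. The idea is that a store comparing keys as unsigned bytes will sort the encoded values in the same order as the original values.

```python
import struct


def encode_int32(value: int) -> bytes:
    """Encode a signed 32-bit int so unsigned byte order matches numeric order."""
    if not -2**31 <= value < 2**31:
        raise ValueError("value out of int32 range")
    # Flipping the sign bit maps [-2^31, 2^31) onto [0, 2^32) monotonically;
    # big-endian layout then makes byte-wise comparison agree with numeric order.
    return struct.pack(">I", (value & 0xFFFFFFFF) ^ 0x80000000)


def decode_int32(data: bytes) -> int:
    """Invert encode_int32."""
    (raw,) = struct.unpack(">I", data)
    raw ^= 0x80000000
    return raw - 2**32 if raw >= 2**31 else raw


def encode_compound_key(name: str, number: int) -> bytes:
    """Hypothetical compound key: UTF-8 string, NUL terminator, sortable int32.

    The terminator keeps "a" sorting before "ab"; this sketch therefore
    rejects strings that themselves contain a NUL character.
    """
    raw = name.encode("utf-8")
    if b"\x00" in raw:
        raise ValueError("name must not contain NUL")
    return raw + b"\x00" + encode_int32(number)
```

For comparison, a plain two's-complement encoding such as struct.pack(">i", -1) produces bytes that sort after struct.pack(">i", 1), which is why the sign bit is flipped here.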

Discussion Overview
group: user @
categories: hbase, hadoop
posted: Mar 14, '13 at 4:52p
active: Mar 19, '13 at 8:39p
posts: 26
users: 11
website: hbase.apache.org
