Thanks for the thoughtful response (and code!).
support nulls. The idea is to provide an extensible set of interfaces,
think this will not box us into a corner later. That is, a mirroring
the relevant trade-offs.
a public GitHub repo at http://github.com/hotpads/data-tools
Contributions are welcome - I'm sure we all have this stuff lying
You can see I've bumped into the NULL problem in a few places:
Looking back, I think my latest opinion on the topic is to reject
nullability as the rule since it can cause unexpected behavior and
confusion. It's cleaner to provide a wrapper class (so both
plus NullableLongArrayList) that explicitly defines the behavior, and
a little more in performance. If the user can't find a pre-made wrapper
class, it's not very difficult for each user to provide their own
interpretation of null and check for it themselves.
If you reject nullability, the question becomes what to do in situations
where you're implementing existing interfaces that accept nullable
The LongArrayList above implements List<Long> which requires an
method. In the above implementation I chose to swap nulls with
Long.MIN_VALUE, however I'm now thinking it best to force the user to
that swap and then throw IllegalArgumentException if they pass null.
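To make the "reject null" option concrete, here is a minimal sketch of a primitive-backed list that implements List<Long> but refuses null outright. The class name StrictLongArrayList is hypothetical and this is not the LongArrayList from the data-tools repo; it only illustrates the behavior described above, assuming the caller does any sentinel swap before calling add.

```java
import java.util.AbstractList;
import java.util.Arrays;

// Hypothetical sketch, not the data-tools LongArrayList: a long[]-backed
// List<Long> that rejects null instead of silently swapping in a sentinel.
class StrictLongArrayList extends AbstractList<Long> {
    private long[] values = new long[8];
    private int size = 0;

    @Override
    public boolean add(Long value) {
        if (value == null) {
            // The caller must pick their own interpretation of null first.
            throw new IllegalArgumentException("null is not supported");
        }
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2); // grow the backing array
        }
        values[size++] = value; // auto-unboxed into the primitive array
        modCount++;
        return true;
    }

    @Override
    public Long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return values[index];
    }

    @Override
    public int size() {
        return size;
    }
}
```

With this shape, Long.MIN_VALUE stays an ordinary storable value, and a user who wants nulls has to wrap the list or map null to a value of their choosing.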
On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil <email@example.com> wrote:
Hmmm... good question.
I think that fixed width support is important for a great many rowkey
construct cases, so I'd rather see something like losing MIN_VALUE and
keeping fixed width.
On 4/1/13 2:00 PM, "Nick Dimiduk" wrote:
Thinking about data types and serialization. I think null support is
an important characteristic for the serialized representations,
when considering the compound type. However, doing so is directly
incompatible with fixed-width representations for numerics. For
if we want to have a fixed-width signed long stored in 8 bytes, where
you put null? float and double types can cheat a little by folding
and positive NaNs into a single representation (this isn't strictly
correct!), leaving a place to represent null. In the long example
obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by one.
will allocate an additional encoding which can be used for null. My
experience working with scientific data, however, makes me wince at
The variable-width encodings have it a little easier. There's already
enough going on that it's simpler to make room.
Remember, the final goal is to support order-preserving serialization.
imposes some limitations on our encoding strategies. For instance,
enough to simply encode null; it really needs to be encoded as 0x00 so
as to sort lexicographically earlier than any other value.
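As a rough illustration of how the fixed-width and sort-first requirements could fit together, here is a sketch that gives up Long.MIN_VALUE and uses its slot for null. It assumes a big-endian 8-byte layout with the sign bit flipped, so that unsigned byte-wise comparison matches signed numeric order. The class and method names (OrderedNullableLong, encode, decode, compareBytes) are hypothetical and not taken from any existing library.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: fixed-width, order-preserving encoding of a nullable long.
// Assumption: Long.MIN_VALUE is sacrificed so its all-zero encoding can mean null.
class OrderedNullableLong {

    static byte[] encode(Long value) {
        long bits;
        if (value == null) {
            bits = 0L; // eight 0x00 bytes, which sort before every other encoding
        } else if (value == Long.MIN_VALUE) {
            throw new IllegalArgumentException("Long.MIN_VALUE is reserved for null");
        } else {
            // Flipping the sign bit makes negative values sort before positive
            // ones under unsigned comparison.
            bits = value ^ Long.MIN_VALUE;
        }
        return ByteBuffer.allocate(8).putLong(bits).array(); // big-endian by default
    }

    static Long decode(byte[] bytes) {
        long bits = ByteBuffer.wrap(bytes).getLong();
        return bits == 0L ? null : Long.valueOf(bits ^ Long.MIN_VALUE);
    }

    // Unsigned lexicographic comparison, i.e. how a byte-ordered store
    // would compare two row keys.
    static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }
}
```

The cost is exactly the one raised above: one legitimate value becomes unrepresentable, and the encoder has to reject it loudly rather than let it be read back as null.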
What do you think? Any ideas, experiences, etc?