Paul Sutter, Jun 26, 2006 at 11:35 pm
I agree, there's no easy way around this one without separate interfaces
(one where the caller keeps the counts, and one where the writable keeps the
counts), and that would be silly.
However, it still seems to me that the key length in the sequence file is
redundant. Since each key must either write its own length, know its own
length, or be able to figure it out (even via the high-speed interface),
there's no reason to also store that key length in the file.
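To make the redundancy concrete, here is a minimal sketch in Java. It assumes a record layout of record length, key length, key bytes, value bytes, and a key that already writes its own 4-byte length prefix. The class and method names (KeyLengthSketch, serializeKey, recordWithKeyLength, recordWithoutKeyLength) are hypothetical and are not taken from the actual SequenceFile code.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the actual SequenceFile implementation.
public class KeyLengthSketch {

    // A self-delimiting key: a 4-byte length prefix followed by the key bytes.
    static byte[] serializeKey(String key) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        byte[] raw = key.getBytes(StandardCharsets.UTF_8);
        out.writeInt(raw.length);
        out.write(raw);
        out.flush();
        return buf.toByteArray();
    }

    // Assumed layout: recordLength, keyLength, key, value.
    // The key length appears twice: once here, once inside the key itself.
    static byte[] recordWithKeyLength(byte[] key, byte[] value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(key.length + value.length);
        out.writeInt(key.length);
        out.write(key);
        out.write(value);
        out.flush();
        return buf.toByteArray();
    }

    // Proposed layout: recordLength, key, value.
    // The reader relies on the key to find its own end.
    static byte[] recordWithoutKeyLength(byte[] key, byte[] value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(key.length + value.length);
        out.write(key);
        out.write(value);
        out.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] key = serializeKey("url");
        byte[] value = {1, 2, 3};
        int with = recordWithKeyLength(key, value).length;
        int without = recordWithoutKeyLength(key, value).length;
        System.out.println("bytes saved per record: " + (with - without));
    }
}
```

Under these assumptions the second layout is 4 bytes shorter per record, and a reader such as an external sort sees only one key length.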
Why do I care about 4 bytes per record? Because we're integrating an
external sort, and right now it has to handle records that carry two key
lengths. I assume that others (such as Yahoo) will want to incorporate an
external sort too. And if we're going to be reading the sequence file from
another language, we might as well be sure about the format to use.
Thanks!
Paul
On 6/26/06, Doug Cutting wrote:
> Eric Baldeschwieler wrote:
>> Can we turn this around and assume that writables will be given a stream
>> and a length when they read? That would also let us remove redundant
>> info...
>
> Unless I misunderstand, that would make it harder to nest writables,
> since all containers would need to store the length. Currently only
> top-level containers (SequenceFile and the RPC protocol) need to write
> lengths. Even these are optional, used only to optimize things.
>
> Doug
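
Doug's nesting point can be sketched as follows, assuming a simplified writable interface. SimpleWritable, SketchText, Pair, and roundTrip are hypothetical names used only for illustration. Because each child marks its own end (here through the 2-byte length prefix that writeUTF emits), the container stores no lengths of its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration only; names do not come from the Hadoop code base.
public class NestingSketch {

    // Simplified stand-in for a writable: reads from a stream with no length argument.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Leaf value that is self-delimiting via writeUTF's own length prefix.
    static class SketchText implements SimpleWritable {
        String value = "";
        SketchText() {}
        SketchText(String value) { this.value = value; }
        public void write(DataOutput out) throws IOException { out.writeUTF(value); }
        public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
    }

    // Container that nests two children and writes no length of its own.
    static class Pair implements SimpleWritable {
        SketchText first = new SketchText();
        SketchText second = new SketchText();
        public void write(DataOutput out) throws IOException {
            first.write(out);
            second.write(out);
        }
        public void readFields(DataInput in) throws IOException {
            // Each child knows where it ends, so the container needs no stored lengths.
            first.readFields(in);
            second.readFields(in);
        }
    }

    // Serialize a Pair and read it back from the resulting bytes.
    static Pair roundTrip(String a, String b) throws IOException {
        Pair original = new Pair();
        original.first = new SketchText(a);
        original.second = new SketchText(b);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buf));
        Pair copy = new Pair();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        Pair p = roundTrip("key", "value");
        System.out.println(p.first.value + " / " + p.second.value);
    }
}
```

If readFields instead required a length from the caller, Pair would have to write and read a length for each child, which is the extra burden on containers described above.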