On Sep 14, 2008, at 7:15 PM, John Howland wrote:
If I want to read values out of input files as binary data, is this
what BytesWritable is for?
Yes.
I've successfully run my first task that uses a SequenceFile for
output. Are there any examples of SequenceFile usage out there? I'd
like to see the full range of what SequenceFile can do.
uses sequence files as its input.
You should also probably look at the TFile package that Hong is writing.
https://issues.apache.org/jira/browse/HADOOP-3315
Once it is ready, it will likely be exactly what you are looking for.
What are the
trade-offs between record compression and block compression?
You pretty much always want block compression. The only place where
record compression is OK is if your values are web pages or some other
huge chunks of text.
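To illustrate why block compression usually wins for small records, here is a rough sketch that compresses the same records one at a time and then all together. It uses the JDK's DEFLATE (java.util.zip) as a stand-in, not Hadoop's codecs or SequenceFile itself; the class name, method names, and sample data are all made up for the illustration. In SequenceFile terms the two cases correspond roughly to CompressionType.RECORD and CompressionType.BLOCK.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

public class CompressionGranularity {

    // Compress one buffer with DEFLATE and return the compressed size in bytes.
    public static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    // "Record" style: every value is compressed on its own.
    public static int perRecordSize(List<byte[]> records) {
        int total = 0;
        for (byte[] r : records) {
            total += deflatedSize(r);
        }
        return total;
    }

    // "Block" style: many values are concatenated and compressed together,
    // so the compressor can exploit redundancy across records.
    public static int blockSize(List<byte[]> records) {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (byte[] r : records) {
            all.write(r, 0, r.length);
        }
        return deflatedSize(all.toByteArray());
    }

    // Hypothetical sample data: many short, similar records.
    public static List<byte[]> sampleRecords(int n) {
        List<byte[]> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String s = "user=" + i + ";status=ok;region=us-west";
            out.add(s.getBytes(StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> records = sampleRecords(1000);
        System.out.println("per-record: " + perRecordSize(records) + " bytes");
        System.out.println("block:      " + blockSize(records) + " bytes");
    }
}
```

With short records like these, the per-record total should come out much larger than the block total, since each tiny record pays the compressor's fixed overhead and shares nothing with its neighbours. When each value is already a large chunk of text, that gap narrows, which is presumably why record compression is acceptable in that case.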
What are
the limits on the key and value sizes?
Large. I think I've seen keys and/or values of around 50-100 MB. It
certainly can't be bigger than 1 GB. I believe the TFile limit on keys
may be 64 KB.
How do you use the per-file
metadata?
It is just an application-specific string-to-string map in the header
of the file.
My intended use is to read files on a local filesystem into a
SequenceFile, with the value of each record being the contents of each
file. I hacked MultiFileWordCount to get the basic concept working...
You should also look at the Hadoop archives.
http://hadoop.apache.org/core/docs/r0.18.0/hadoop_archives.html
but I'd appreciate any advice from the experts. In particular, what's
the most efficient way to read data from an
InputStreamReader/BufferedReader into a BytesWritable object?
The easiest way is the way you've done it. You probably want to use
LZO compression too.
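One note on the reading side: InputStreamReader and BufferedReader decode bytes into characters, so for binary values it is usually simpler to read from the underlying InputStream directly. A minimal sketch, using only the JDK; the class and method names are made up for illustration, and this is not the code from MultiFileWordCount:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadAllBytes {

    // Drain an InputStream into a byte array, copying through a fixed buffer.
    // No charset decoding happens, so arbitrary binary data survives intact.
    public static byte[] readFully(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[64 * 1024];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Bytes that are not valid text, to show nothing is altered on the way.
        byte[] original = {0, (byte) 0xFF, 0x10, (byte) 0x80};
        byte[] copy = readFully(new ByteArrayInputStream(original));
        System.out.println(copy.length + " bytes read");
        // In a Hadoop job the array could then be wrapped in a value,
        // for example with new BytesWritable(copy).
    }
}
```

The whole file ends up in memory this way, which matches the one-record-per-file layout described above but is worth keeping in mind given the value size limits mentioned earlier.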
-- Owen