I'm pretty new to Hadoop.
I need to sort a metafile (TBs in size) and thought of using the Hadoop Sort
(in examples) for it.
My input metafile is a raw binary stream. It basically contains fixed-size
records of 40 bytes each.
Every record is laid out like this:
long a; <key> --> 8 bytes. The rest of the structure will be the <value> -->
I have created a *FpMetaId.java (extends BytesWritable)* corresponding to
the <value> and *FpMetadata.java (extends BytesWritable)* corresponding to
My sole aim is to get these 40-byte records sorted with the fp (double) as
the key, and then to write the sorted records back into a metafile (exactly
my old metafile, but with the records sorted; still raw binary).
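
To make the intended result concrete, here is a small local sketch (plain
Java, no Hadoop) that sorts a buffer of 40-byte records by their leading
8-byte key. Several things here are assumptions and not taken from my real
code: the class and method names are made up, the key is treated as a signed
long (the record description says long, the goal above says double), and the
byte order is passed in because a file written by a C program on x86 would
normally be little-endian. It only shows the record layout and the expected
ordering; it is not how the TB-sized job would actually run.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, for illustration only.
class RecordSortSketch {
    static final int RECORD_SIZE = 40; // from the post: each record is 40 bytes
    static final int KEY_SIZE = 8;     // from the post: the key is the first 8 bytes

    // Reads the 8-byte key of the record that starts at `offset`.
    static long keyOf(byte[] data, int offset, ByteOrder order) {
        return ByteBuffer.wrap(data, offset, KEY_SIZE).order(order).getLong();
    }

    // Builds one test record: 8-byte key followed by 32 filler bytes.
    static byte[] record(long key, byte fill, ByteOrder order) {
        byte[] rec = new byte[RECORD_SIZE];
        Arrays.fill(rec, fill);
        ByteBuffer.wrap(rec).order(order).putLong(key); // overwrites bytes 0..7
        return rec;
    }

    // Returns a new buffer holding the same records, ordered by ascending key.
    static byte[] sortRecords(byte[] data, ByteOrder order) {
        if (data.length % RECORD_SIZE != 0) {
            throw new IllegalArgumentException("length is not a multiple of 40");
        }
        List<Integer> offsets = new ArrayList<>();
        for (int off = 0; off < data.length; off += RECORD_SIZE) {
            offsets.add(off);
        }
        // Sort record offsets by key; the records themselves are copied once below.
        offsets.sort(Comparator.comparingLong((Integer off) -> keyOf(data, off, order)));
        byte[] out = new byte[data.length];
        int pos = 0;
        for (int off : offsets) {
            System.arraycopy(data, off, out, pos, RECORD_SIZE);
            pos += RECORD_SIZE;
        }
        return out;
    }

    public static void main(String[] args) {
        ByteOrder order = ByteOrder.LITTLE_ENDIAN; // assumption about the C writer
        long[] keys = {30L, 10L, 20L};
        byte[] input = new byte[keys.length * RECORD_SIZE];
        for (int i = 0; i < keys.length; i++) {
            System.arraycopy(record(keys[i], (byte) i, order), 0,
                             input, i * RECORD_SIZE, RECORD_SIZE);
        }
        byte[] sorted = sortRecords(input, order);
        for (int off = 0; off < sorted.length; off += RECORD_SIZE) {
            System.out.println(keyOf(sorted, off, order)); // prints 10, 20, 30
        }
    }
}
```

If the key really is a double, keyOf would need getDouble() and the
comparator would need Double.compare, since comparing the raw long bits does
not give numeric order for negative doubles.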
I also implemented:
*MetafileInputFormat.java (extends SequenceFileAsBinaryInputFormat)* --->
an input format compatible with my record.
*MetafileOutputFormat<K, V> extends SequenceFileOutputFormat* ---> an
output format compatible with my record.
SequenceFileAsBinaryInputFormat.SequenceFileAsBinaryRecordReader )* --->
a record reader compatible with my record.
The MetafileRecordWriter class has been implemented within my
Let me walk you through the sequence of events that followed:
1) I resolved all the errors in the Writable classes (FpMetaId, FpMetadata),
the in/out formats (MetafileInputFormat, MetafileOutputFormat) and the
RecordReaders I implemented.
2) I copied the Writables to the /io folder and the other new files to the
/mapred folder. The build succeeded.
3) I modified the Sort file (the function I want to run) to use FpMetaId as
the key and FpMetadata as the value, and imported these new classes in the
file. I changed the default conf settings to the required Writables and
RecordReaders. After this I built Hadoop using the ant command. It successfully
*Q) Does this ensure that all the new changes are reflected in the jar? (Am
I ready to run the sort function?)*
4) As I mentioned before, I am working with a sequential binary file format
in which a (key, value) data structure repeats. So I wrote a C program that
generates random values for my data structure and populates a file, writing
the (key, value) structures sequentially in binary. I gave this file as the
input to the sort, which should sort my (key, value) records by key. I got
the error: fp_input not a SequenceFile (fp_input is my input file). I thought
SequenceFiles were just a stream of binary data. Do they have a specific
format?
*Command used : bin/hadoop jar hadoop-0.20.2-examples.jar sort fp_input
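
For context on that error: a Hadoop SequenceFile is a container format, not
a bare stream of records. It starts with a header (the ASCII bytes "SEQ", a
version byte, then the key and value class names and other metadata) and
contains periodic sync markers, so a raw dump of C structs will not pass the
header check. The sketch below is plain Java with no Hadoop dependency and
only tests for the leading "SEQ" bytes; the class and method names are
hypothetical, and passing this check does not prove that a file is a valid
SequenceFile.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
class SequenceFileMagicCheck {
    // True if the given leading bytes start with the 'S','E','Q' magic
    // that opens a SequenceFile header (a version byte follows it).
    static boolean looksLikeSequenceFile(byte[] firstBytes) {
        return firstBytes.length >= 4
                && firstBytes[0] == 'S'
                && firstBytes[1] == 'E'
                && firstBytes[2] == 'Q';
    }

    // Reads up to 4 bytes from the start of a local file.
    static byte[] readPrefix(String path) throws IOException {
        try (InputStream in = new FileInputStream(path)) {
            byte[] buf = new byte[4];
            int read = 0;
            while (read < buf.length) {
                int n = in.read(buf, read, buf.length - read);
                if (n < 0) {
                    break; // file is shorter than 4 bytes
                }
                read += n;
            }
            return Arrays.copyOf(buf, read);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            // Check a local copy of the input, e.g. the fp_input file.
            System.out.println(looksLikeSequenceFile(readPrefix(args[0])));
        } else {
            // A raw 40-byte record made of zeros has no SEQ header.
            System.out.println(looksLikeSequenceFile(new byte[40]));
        }
    }
}
```

Two directions that seem to follow from this, both stated as assumptions
rather than tested advice: either convert the raw file into a real
SequenceFile first (for example with a writer obtained from
SequenceFile.createWriter), or base the input format on FileInputFormat with
a record reader that reads fixed 40-byte records, instead of on the
SequenceFile-based classes.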
*Q) What does this imply? I have no clue how to proceed further. Again, is
it because the jar file I ran doesn't have the latest libraries? I could not
find any good tutorials on this.
It would be great if someone could offer a helping hand to this noob.