To get around the small-files problem (I have thousands of 2 MB log files), I wrote
a class to convert all my log files into a single SequenceFile in
(Text key, BytesWritable value) format. That works fine. I can run this:
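For context, the converter does essentially the following (a simplified sketch, not the exact class; the class name, directory, and output path are just placeholders):

import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class LogsToSequenceFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // One SequenceFile holding all the small log files:
    // key = log file name (Text), value = raw file bytes (BytesWritable).
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path(args[1]), Text.class, BytesWritable.class);
    try {
      for (File log : new File(args[0]).listFiles()) {
        byte[] bytes = Files.readAllBytes(log.toPath());
        writer.append(new Text(log.getName()), new BytesWritable(bytes));
      }
    } finally {
      writer.close();
    }
  }
}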

hadoop fs -text /my.seq | grep peemt114.log | head -1
10/07/08 15:02:10 INFO util.NativeCodeLoader: Loaded the native-hadoop library
10/07/08 15:02:10 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
10/07/08 15:02:10 INFO compress.CodecPool: Got brand-new decompressor
peemt114.log 70 65 65 6d 74 31 31 34 09 .........[snip].......

which shows my filename key (peemt114.log) and the file contents as the value,
which appear to have been converted to hex.
The hex values up to the first tab (09) translate to my hostname.

I'm trying to adapt my mapper to use the SequenceFile as input.

I changed the job's inputFormatClass to:
MyJob.setInputFormatClass(SequenceFileInputFormat.class);
and modified my mapper signature to:
public class MyMapper extends Mapper<Object, BytesWritable, Text, Text> {

but how do I convert the value back to Text? When I print the keys and values with:
System.out.printf("MAPPER INKEY: [%s]\n", key);
System.out.printf("MAPPER INVAL: [%s]\n", value.toString());
I get:
MAPPER INKEY: [peemt114.log]
MAPPER INVAL: [70 65 65 6d 74 31 31 34 09 .....[snip]......]
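Would something like this be the right way to turn the BytesWritable back into text? (Untested sketch, using BytesWritable.getBytes()/getLength() and a default-charset String, which may or may not be the intended approach.)

import java.io.IOException;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<Text, BytesWritable, Text, Text> {
  @Override
  protected void map(Text key, BytesWritable value, Context context)
      throws IOException, InterruptedException {
    // getBytes() returns the backing array, which may be longer than the
    // actual data, so use only the first getLength() bytes.
    String contents = new String(value.getBytes(), 0, value.getLength());
    context.write(key, new Text(contents));
  }
}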

Alan
