I was hoping to use -inputformat SequenceFileAsTextInputFormat to process compressed SequenceFiles in streaming jobs.

However, with a Python mapper that simply echoes each line it receives, and numreducetasks=0, this is what the streaming job output looks like:

SEQ^F org.apache.hadoop.io.IntWritable^Yorg.apache.hadoop.io.Text^A^A'org.apache.hadoop.io.compress.GzipCodec^@^@^@^@Z+r������^F�
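
For reference, the echo mapper described above could be as small as the following Python sketch. The function name `echo_lines` is hypothetical and not from the original post. It copies each line it receives on stdin to stdout unchanged, so with zero reduce tasks the job output shows exactly the records the input format handed to the mapper:

```python
import sys
from typing import Callable, Iterable


def echo_lines(lines: Iterable[str], write: Callable[[str], object]) -> int:
    """Pass every input line through unchanged and return how many were seen."""
    count = 0
    for line in lines:
        # Streaming feeds one record per line on stdin; echoing it verbatim
        # shows exactly what the configured input format produced.
        write(line)
        count += 1
    return count


if __name__ == "__main__":
    echo_lines(sys.stdin, sys.stdout.write)
```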

So it seems the input file was not treated as a SequenceFile.
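
Reading that output byte by byte suggests why: it is the SequenceFile header itself, passed to the mapper as if it were text. As I read the version-6 header layout, it consists of the magic bytes `SEQ`, a version byte (^F = 6), the key and value class names each prefixed by its length as a Hadoop variable-length integer (a single byte for short names: a space, 0x20 = 32, for `org.apache.hadoop.io.IntWritable`, and ^Y = 25 for `org.apache.hadoop.io.Text`), two flag bytes for record and block compression (^A^A), the codec class name (`'` = 0x27 = 39 characters), a four-byte metadata entry count (^@^@^@^@ = 0), and a 16-byte sync marker. The Python sketch below parses a header under those assumptions. All function names are hypothetical, only version 6 is handled, and the compressed records after the header are not decoded:

```python
import io
import struct
from typing import BinaryIO, Dict, Optional


def read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes or fail loudly on a truncated header."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated SequenceFile header")
    return data


def read_vlong(stream: BinaryIO) -> int:
    """Decode a Hadoop variable-length integer (WritableUtils vint/vlong)."""
    first = struct.unpack(">b", read_exact(stream, 1))[0]  # signed byte
    if first >= -112:
        return first  # values in [-112, 127] fit in the first byte
    negative = first < -120
    size = (-119 - first) if negative else (-111 - first)  # bytes incl. first
    value = 0
    for byte in read_exact(stream, size - 1):
        value = (value << 8) | byte
    return ~value if negative else value


def read_text(stream: BinaryIO) -> str:
    """Read a Hadoop Text value: vint byte length, then UTF-8 bytes."""
    length = read_vlong(stream)
    return read_exact(stream, length).decode("utf-8")


def read_seqfile_header(stream: BinaryIO) -> Dict[str, object]:
    """Parse a version-6 SequenceFile header (layout as assumed above)."""
    magic = read_exact(stream, 3)
    if magic != b"SEQ":
        raise ValueError("not a SequenceFile, magic is %r" % magic)
    version = read_exact(stream, 1)[0]
    if version != 6:
        raise ValueError("sketch only handles version 6, got %d" % version)
    key_class = read_text(stream)
    value_class = read_text(stream)
    compressed = read_exact(stream, 1) != b"\x00"
    block_compressed = read_exact(stream, 1) != b"\x00"
    codec: Optional[str] = read_text(stream) if compressed else None
    # Metadata: 4-byte big-endian entry count, then Text key/value pairs.
    (count,) = struct.unpack(">i", read_exact(stream, 4))
    metadata: Dict[str, str] = {}
    for _ in range(count):
        key = read_text(stream)
        metadata[key] = read_text(stream)
    # Sync marker; it is also written periodically in the file body.
    sync = read_exact(stream, 16)
    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "codec": codec,
        "metadata": metadata,
        "sync": sync,
    }


def parse_header_bytes(data: bytes) -> Dict[str, object]:
    """Convenience wrapper for a header already loaded into memory."""
    return read_seqfile_header(io.BytesIO(data))
```

Fed bytes laid out like the start of the file in the question, this reports IntWritable keys, Text values, block compression with GzipCodec, and empty metadata. A line-oriented input format knows nothing of this structure and simply passes the bytes through.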

I must be missing some argument, but I can't work out which one. Any help is appreciated.

Thx,
Joydeep

  • Runping Qi at Oct 26, 2007 at 5:00 pm
    Try adding the package name too:

    o.a.h.m.SequenceFileAsTextInputFormat (i.e. org.apache.hadoop.mapred.SequenceFileAsTextInputFormat)

    Runping

    -----Original Message-----
    From: Joydeep Sen Sarma
    Sent: Friday, October 26, 2007 12:30 AM
    To: hadoop-user@lucene.apache.org
    Subject: problems reading compressed sequencefiles in streaming (0.13.1)

  • Joydeep Sen Sarma at Oct 26, 2007 at 5:20 pm
    Thanks!

    A gigantic Duh moment.

    -----Original Message-----
    From: Runping Qi
    Sent: Friday, October 26, 2007 9:59 AM
    To: hadoop-user@lucene.apache.org
    Subject: RE: problems reading compressed sequencefiles in streaming (0.13.1)
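
Putting Runping's fix together, a map-only invocation might be assembled as below. This is a sketch, not a tested command line: the streaming jar path, input and output paths, and mapper script name are placeholders, and the option spellings (`-input`, `-output`, `-mapper`, `-inputformat`, `-jobconf mapred.reduce.tasks=0`) should be checked against the help output of the streaming jar you actually run. What it illustrates is that `-inputformat` receives the fully qualified class name rather than the bare `SequenceFileAsTextInputFormat`:

```python
import shlex
from typing import List

# Fully qualified class name; the bare "SequenceFileAsTextInputFormat"
# is what the original job passed, and it did not take effect.
SEQ_AS_TEXT_FORMAT = "org.apache.hadoop.mapred.SequenceFileAsTextInputFormat"


def streaming_args(streaming_jar: str, input_path: str, output_path: str,
                   mapper: str) -> List[str]:
    """Build argv for a map-only streaming job reading SequenceFiles as text."""
    return [
        "hadoop", "jar", streaming_jar,
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-inputformat", SEQ_AS_TEXT_FORMAT,
        # No reduce phase, matching numreducetasks=0 in the question
        # (option spelling assumed; check your streaming version's help).
        "-jobconf", "mapred.reduce.tasks=0",
    ]


if __name__ == "__main__":
    # Placeholder jar, paths and mapper name, for illustration only.
    argv = streaming_args("hadoop-streaming.jar", "/path/to/input",
                          "/path/to/output", "echo_mapper.py")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Passing the list to `subprocess.run(argv, check=True)` would launch the job; printing it first makes the quoting easy to eyeball.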

Discussion Overview
group: common-user @
categories: hadoop
posted: Oct 26, '07 at 7:30a
active: Oct 26, '07 at 5:20p
posts: 3
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion

Joydeep Sen Sarma: 2 posts, Runping Qi: 1 post