Grokbase Groups: Pig user, May 2011
Context: I have a bunch of files living in HDFS, and I think my jobs are
failing on one of them... I want to output the files that the job is failing
on.

I thought that I could just make my own LoadFunc that followed the same
methodology as PigStorage, but caught exceptions and logged the file that
was given... this isn't working, however. I tried returning loadLocation, but
that is the globbed input, not the input to the mapper. I also tried reading
mapreduce.map.file.input and map.file.input from the Job given to
setLocation, but both were null... I think this is where some of my
ignorance of Pig's internal workings comes into play, as I'm not sure
when files are deglobbed and the splits are actually read. I tried using
getLocations() from the PigSplit passed to prepareToRead, but that was just
the glob as well...
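
For what it's worth, one route that should surface the concrete file is to
unwrap the split rather than ask it for its locations: PigSplit wraps the
underlying Hadoop InputSplit, and for a file-based loader that wrapped split
is a FileSplit whose getPath() is the actual file, not the glob. A minimal
sketch, assuming a Pig version whose PigSplit exposes getWrappedSplit()
(the class name FileTrackingStorage is made up for illustration):

import java.io.IOException;

import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.builtin.PigStorage;
import org.apache.pig.data.Tuple;

// PigStorage subclass that remembers the concrete file behind the
// current split so read failures can be reported with a file name.
public class FileTrackingStorage extends PigStorage {
    private String currentFile = "unknown";

    @SuppressWarnings("rawtypes")
    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
        super.prepareToRead(reader, split);
        // Assumption: getWrappedSplit() returns the underlying Hadoop
        // split, which is a FileSplit for file-based input formats.
        InputSplit wrapped = split.getWrappedSplit();
        if (wrapped instanceof FileSplit) {
            currentFile = ((FileSplit) wrapped).getPath().toString();
        }
    }

    @Override
    public Tuple getNext() throws IOException {
        try {
            return super.getNext();
        } catch (Exception e) {
            // Rethrow with the offending file attached.
            throw new IOException("Failed while reading " + currentFile, e);
        }
    }
}

Used in place of PigStorage (e.g. A = LOAD 'input*' USING
FileTrackingStorage();), this should name the failing file in the task logs.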

My next thought would be to make a RecordReader that outputs the file
associated with its splits (since I assume it has to know the specific
files it is processing), but I thought I'd ask if there was a cleaner way
before doing that...

Thanks!
Jon
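
On the RecordReader idea above: the InputSplit passed to
RecordReader.initialize() is indeed the concrete, deglobbed split (a
FileSplit for file-based formats), so capturing the path there works. A
hypothetical sketch that delegates to Hadoop's LineRecordReader; the class
name FileReportingRecordReader is invented, and wiring it in would also
require a custom InputFormat returned from the LoadFunc's getInputFormat():

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

// Hypothetical reader that delegates to LineRecordReader and rethrows
// read errors tagged with the concrete file being processed.
public class FileReportingRecordReader extends RecordReader<LongWritable, Text> {
    private final LineRecordReader delegate = new LineRecordReader();
    private String file = "unknown";

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        // For file-based input formats the split is a FileSplit, whose
        // path is the specific file rather than the original glob.
        if (split instanceof FileSplit) {
            file = ((FileSplit) split).getPath().toString();
        }
        delegate.initialize(split, context);
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        try {
            return delegate.nextKeyValue();
        } catch (IOException e) {
            throw new IOException("Error reading a split of " + file, e);
        }
    }

    @Override
    public LongWritable getCurrentKey() { return delegate.getCurrentKey(); }

    @Override
    public Text getCurrentValue() { return delegate.getCurrentValue(); }

    @Override
    public float getProgress() throws IOException { return delegate.getProgress(); }

    @Override
    public void close() throws IOException { delegate.close(); }
}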


  • Xiaomeng Wan at May 31, 2011 at 5:48 pm
    I asked a similar question before; please see this thread:

    http://mail-archives.apache.org/mod_mbox/pig-user/201103.mbox/%3CAANLkTimqkjAZfSTyW8u6S5Mi29a+=5u=ayVMuvoykacx@mail.gmail.com%3E

    Shawn
  • Jonathan Coveney at May 31, 2011 at 5:52 pm
    Thanks Xiaomeng!

