FAQ
In my Hadoop 0.19.0 program each map function is assigned a directory
(representing a data location in my S3 datastore). The first thing each map
function does is copy that particular S3 data to the local machine that the
map task is running on and then begin processing the data; e.g.,

command = "hadoop dfs -copyToLocal #{s3dir} #{localdir}"
system "#{command}"

In the above, "s3dir" is a directory that creates "localdir" - my
expectation is that "localdir" is created in the work directory for the
particular task attempt. Following this copy command I then run a function
that processes the data; e.g.,

processData(localdir)

In some instances my map/reduce program crashes, and when I examine the logs
I get a message saying that "localdir" cannot be found. This confuses me,
since the hadoop shell command above blocks, so localdir should exist by the
time processData() is called. I've found that if I add some diagnostic lines
prior to processData(), such as puts statements to print out variables, I
never run into the problem of localdir not being found. It is almost as if
localdir needs time to be created before the call to processData().

Has anyone encountered anything like this? Any suggestions on what could be
wrong are appreciated.

Thanks,
John
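
Below is a minimal sketch of the pattern described above, with an explicit
exit-status and existence check added before processData() is called; s3dir,
localdir and processData are the poster's names, and the values and the stub
are purely illustrative.

# Illustrative sketch: verify the copy succeeded before processing.
def processData(dir)                     # stand-in for the real processing function
  puts "processing #{dir}"
end

s3dir    = "s3n://mybucket/some/dir"     # hypothetical S3 source for this map task
localdir = "localdir"                    # local target; the copy is expected to create it

command = "hadoop dfs -copyToLocal #{s3dir} #{localdir}"
unless system(command)                   # system returns true only on exit status 0
  raise "copyToLocal failed (#{$?.inspect}): #{command}"
end
unless File.directory?(localdir)         # sanity check before handing it to processData
  raise "#{localdir} does not exist after the copy"
end
processData(localdir)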

  • Matei Zaharia at Feb 14, 2009 at 11:46 pm
    Have you logged the output of the dfs command to see whether the copy
    always succeeded?
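
    One way that logging could be done from a Ruby streaming script is sketched
    below; Open3.capture3 needs Ruby 1.9.2 or newer (with older Rubies,
    Open3.popen3 or a 2>&1 redirect works instead), and the s3dir/localdir
    values are illustrative.

    require 'open3'

    s3dir, localdir = "s3n://mybucket/some/dir", "localdir"   # illustrative
    cmd = "hadoop dfs -copyToLocal #{s3dir} #{localdir}"
    out, err, status = Open3.capture3(cmd)
    # Anything a streaming task writes to stderr ends up in its task log.
    warn "copyToLocal exit=#{status.exitstatus}"
    warn "copyToLocal stdout: #{out}" unless out.empty?
    warn "copyToLocal stderr: #{err}" unless err.empty?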
  • S D at Feb 15, 2009 at 10:52 pm
    I was not able to determine the command shell return value for

    hadoop dfs -copyToLocal #{s3dir} #{localdir}

    but I did print out several variables after the call and determined that the
    call apparently did not go through successfully. In particular, prior to my
    processData(localdir) command I use Ruby's puts to print out the contents of
    several directories including 'localdir' and '../localdir' - here is the
    weird thing: if I execute the following
    list = `ls -l "#{localdir}"`
    puts "List: #{list}"
    (where 'localdir' is the directory I need as an arg for processData) the
    processData command will execute properly. At first I thought that running
    the puts command was allowing enough time to elapse for a race condition to
    be avoided, so that 'localdir' was ready when the processData command was
    called (I know that in certain ways that doesn't make sense, given that
    hadoop dfs -copyToLocal blocks until it completes...), but then I tried
    other time-consuming commands such as
    list = `ls -l "../#{localdir}"`
    puts "List: #{list}"
    and running processData(localdir) led to an error:
    'localdir' not found

    Any clues on what could be going on?

    Thanks,
    John
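
    For reference, the shell return value should be visible in Ruby through $?
    right after either backticks or system; a small sketch (directory names
    illustrative):

    s3dir, localdir = "s3n://mybucket/some/dir", "localdir"   # illustrative

    # Both `...` backticks and system() set $? to the child's Process::Status.
    list = `ls -l "#{localdir}" 2>&1`
    puts "List: #{list}"
    puts "ls exit status: #{$?.exitstatus}"

    ok = system("hadoop dfs -copyToLocal #{s3dir} #{localdir}")
    puts "copyToLocal returned #{ok.inspect}, exit status #{$?.exitstatus}"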


  • Matei Zaharia at Feb 16, 2009 at 4:29 am
    I would capture the output of the dfs -copyToLocal command, because I still
    think a failed copy is the most likely reason the data isn't making it. I
    don't know how to capture this output in Ruby, but I'm sure it's possible;
    you want to capture both standard out and standard error.
    One other slim possibility is that if your localdir is a fixed absolute
    path, multiple map tasks on the machine may be trying to access it
    concurrently, and maybe one of them deletes it when it's done and one
    doesn't. Normally each task should run in its own temp directory though.
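
    If the fixed-path collision described above is what is happening, one
    possible workaround is to give each task attempt its own local directory.
    The sketch below assumes Hadoop Streaming's usual behaviour of exporting
    job conf keys as environment variables with dots replaced by underscores
    (so the attempt id shows up as mapred_task_id) and falls back to the
    process id otherwise; the S3 path is illustrative.

    require 'fileutils'

    s3dir    = "s3n://mybucket/some/dir"              # illustrative S3 source
    attempt  = ENV['mapred_task_id'] || Process.pid.to_s
    localdir = "input_#{attempt}"                     # unique per task attempt

    unless system("hadoop dfs -copyToLocal #{s3dir} #{localdir}")
      raise "copyToLocal failed: #{$?.inspect}"
    end
    puts "copied into #{localdir}: #{Dir.entries(localdir).sort.inspect}"
    # ... process the data here, then remove the local copy:
    FileUtils.rm_rf(localdir)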
  • S D at Feb 16, 2009 at 5:23 am
    I'm having difficulty capturing the output of any of the dfs commands
    (either in Ruby or on the command line). Supposedly the output is sent to
    stdout, yet just running any of the commands on the command line does not
    display the output, nor does redirecting to a file (e.g., hadoop dfs
    -copyToLocal src dest > out.txt). I'm not sure what I'm missing here...

    John
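
    A likely explanation is that the dfs commands write their diagnostics to
    stderr rather than stdout, so a plain > out.txt redirect can capture
    nothing; redirecting stderr as well should show the messages. A sketch
    (paths illustrative):

    s3dir, localdir = "s3n://mybucket/some/dir", "localdir"   # illustrative

    # 2>&1 folds stderr into stdout so the backticks capture both streams.
    output = `hadoop dfs -copyToLocal #{s3dir} #{localdir} 2>&1`
    puts "copyToLocal exit=#{$?.exitstatus}"
    puts output unless output.empty?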
