Hi All,

How can I determine whether a file in HDFS is still being written to (by any
thread)? I have a continuous process on the master node that watches a
particular folder in HDFS for files to process. On the slave nodes, I create
files in that same folder using the following code:

At the slave node:

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.fs.FileSystem;
import java.io.OutputStream;

OutputStream oStream = fileSystem.create(path);
IOUtils.write(<Some String>, oStream);
IOUtils.closeQuietly(oStream);


At the master node,
I pick the earliest-modified file in the folder. At times, when I try to read
the file, it is empty, most likely because the slave is still writing to it.
Is there any way to tell the master that the slave is still writing to the
file, so that it can check the file again later for the actual content?

Thanks,
--


Nitin Khandelwal


  • Joey Echeverria at Jul 28, 2011 at 12:23 pm
    How about having the slave write to a temp file first, then move it to the location the master is monitoring after closing it?

    -Joey
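A minimal sketch of the write-then-move idea above. To keep it runnable without a cluster it uses java.nio.file on a local filesystem. The class name, method name, and the separate staging directory are assumptions made for illustration, not part of the original suggestion. On HDFS the corresponding calls would be FileSystem.create(...) for the write and FileSystem.rename(src, dst) for the move, as noted in the comments.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper illustrating "write to a temp file, then move it". */
final class TempThenRename {

    /**
     * Writes content to stagingDir/fileName, then moves the finished file
     * into watchedDir. Both directories are assumed to live on the same
     * filesystem so that the move is a rename rather than a copy.
     */
    static Path publish(Path stagingDir, Path watchedDir,
                        String fileName, String content) throws IOException {
        Path tmp = stagingDir.resolve(fileName);
        Path target = watchedDir.resolve(fileName);

        // Step 1: write and close the file where the master is not looking.
        // HDFS analogue: OutputStream out = fileSystem.create(tmpPath);
        //                write the data, then out.close();
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));

        // Step 2: move the completed file into the watched folder.
        // HDFS analogue: fileSystem.rename(tmpPath, targetPath), which
        // returns a boolean that should be checked.
        return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

With this layout the master only ever lists the watched folder, so any file it sees there has already been closed by the writer. A call might look like TempThenRename.publish(staging, watched, "part-0001", data), where all four arguments are placeholders.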


  • George Datskos at Jul 29, 2011 at 12:25 am
    Nitin,
    On 2011/07/28 14:51, Nitin Khandelwal wrote:
    > How can I determine if a file is being written to (by any thread) in HDFS.

    That information is exposed by the NameNode HTTP servlet. You can obtain it
    with the fsck tool (hadoop fsck /path/to/dir -openforwrite), or you can issue
    an HTTP GET:

    http://namenode:port/fsck?path=/your/path&openforwrite=1


    George
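A rough sketch of how the master might use the HTTP form of the fsck query above. The class and method names are hypothetical. The parser assumes that the fsck report prints one line per file starting with the file path and that open files carry the marker OPENFORWRITE. That format is an assumption and may differ between Hadoop versions, so check it against real output from your cluster first.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper around the NameNode fsck servlet. */
final class OpenForWriteCheck {

    /** Builds the URL in the form shown above; host and port are placeholders. */
    static String fsckUrl(String host, int port, String path) {
        // Paths containing spaces or other special characters would need
        // URL encoding, which this sketch leaves out.
        return "http://" + host + ":" + port + "/fsck?path=" + path + "&openforwrite=1";
    }

    /** Fetches the plain-text fsck report with an HTTP GET. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder report = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                report.append(line).append('\n');
            }
            return report.toString();
        } finally {
            conn.disconnect();
        }
    }

    /**
     * Returns true if the report has a line for filePath that carries the
     * OPENFORWRITE marker (assumed report format, see the note above).
     */
    static boolean isOpenForWrite(String fsckReport, String filePath) {
        for (String line : fsckReport.split("\\r?\\n")) {
            if (line.startsWith(filePath + " ") && line.contains("OPENFORWRITE")) {
                return true;
            }
        }
        return false;
    }
}
```

The master could call fetch(fsckUrl(...)) for the watched folder before each pass and skip any file for which isOpenForWrite returns true, retrying it on a later pass.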

Discussion Overview
group: common-user @
category: hadoop
posted: Jul 28, '11 at 5:52a
active: Jul 29, '11 at 12:25a
posts: 3
users: 3
website: hadoop.apache.org...
irc: #hadoop