Hi.

I'm trying to continuously write data to HDFS via an OutputStream, and I want
to be able to read it at the same time from another client.

The problem is that after the file is created on HDFS with a size of 0, it
stays that way, and only fills up when I close the OutputStream.

Here is a simple code sample illustrating this issue:

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;

try {
    // The file appears on HDFS immediately, but with a size of 0
    FSDataOutputStream out = fileSystem.create(new Path("/test/test.bin"));
    for (int i = 0; i < 1000; i++) {
        out.write(1);  // Size still stays 0
        out.flush();   // Even when I flush it?
    }
    Thread.sleep(10000); // sleep() is static; no need for currentThread()
    out.close();         // Only here does the file length get updated
} catch (Exception e) {
    e.printStackTrace();
}
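
For completeness, the reading client is doing roughly this at the same time (a
minimal sketch of my setup; fileSystem is assumed to be configured the same way
as above):

import org.apache.hadoop.fs.Path;

try {
    // Poll the file's length from a second client while the writer runs
    Path path = new Path("/test/test.bin");
    while (true) {
        long len = fileSystem.getFileStatus(path).getLen();
        System.out.println("Observed length: " + len); // prints 0 until close()
        Thread.sleep(1000);
    }
} catch (Exception e) {
    e.printStackTrace();
}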

So, two questions here:

1) How is it possible to write files directly to HDFS and have them update
there immediately?
2) Just for information: in this case, where does the file content stay the
whole time - on the server's local disk, in memory, etc.?

Thanks in advance.


  • Tom White at May 26, 2009 at 12:42 pm
    This feature is not available yet, and is still under active
    discussion. (The current version of HDFS will make the previous block
    available to readers.) Michael Stack gave a good summary on the HBase
    dev list:

    http://mail-archives.apache.org/mod_mbox/hadoop-hbase-dev/200905.mbox/%3C7c962aed0905231601g533088ebj4a7a068505ba3f50@mail.gmail.com%3E
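
    Concretely, "the previous block available to readers" means a reader can
    see data only up to the last completed block boundary. A minimal sketch of
    how to observe this (my own illustration, not from the thread; it shrinks
    the block size to 1 MB so a boundary is crossed quickly):

    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.Path;

    try {
        // create(path, overwrite, bufferSize, replication, blockSize)
        Path path = new Path("/test/blocks.bin");
        FSDataOutputStream out =
            fileSystem.create(path, true, 4096, (short) 3, 1024 * 1024);
        byte[] chunk = new byte[64 * 1024];
        for (int i = 0; i < 24; i++) { // ~1.5 MB, so the first 1 MB block completes
            out.write(chunk);
        }
        // While the stream is still open, another client's
        // getFileStatus(path).getLen() can now report ~1 MB (the completed
        // first block); the partial second block stays invisible until close()
        out.close();
    } catch (Exception e) {
        e.printStackTrace();
    }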

    Tom
  • Stas Oskin at May 26, 2009 at 12:52 pm
    Hi.

    Does this mean there is no way to access the data being written to HDFS
    while it's being written?

    Where is it stored then during the writing - on the cluster or on local disks?

    Thanks.

  • Stas Oskin at May 26, 2009 at 6:38 pm
    Hi.

    You're probably referring to the following paragraph:

    After some back and forth over a set of slides presented by Sanjay on
    work being done by Hairong as part of HADOOP-5744, "Revising append",
    the room settled on API3 from the list of options below as the
    priority feature needed by HADOOP 0.21.0. Readers must be able to
    read up to the last writer 'successful' flush. It's not important that
    the file length is 'inexact'.

    If I understand correctly, this means the data actually gets written to
    the cluster, but it's not visible until the block is closed.
    Work is ongoing for version 0.21 to make the file visible on flush().
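
    If that lands as described, the writer loop from my first message would
    presumably become something like this (a hypothetical sketch; hflush() is
    the method name floated in the HADOOP-5744 discussion, and the exact API
    may change):

    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.Path;

    try {
        FSDataOutputStream out = fileSystem.create(new Path("/test/test.bin"));
        for (int i = 0; i < 1000; i++) {
            out.write(1);
            // Pushes the written bytes to the datanodes and makes them
            // visible to new readers, per the API3 semantics quoted above;
            // the reported file length may still be inexact.
            out.hflush();
        }
        out.close();
    } catch (Exception e) {
        e.printStackTrace();
    }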

    Am I correct up to here?

    Regards.


