• Paul Birnie at Jul 18, 2013 at 4:13 pm
Hi Aaron,

I get this error (if I set the replication factor to 1, the error goes away):

13/07/18 09:27:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/07/18 09:27:35 WARN hdfs.DFSClient: DataStreamer Exception
java.io.IOException: Failed to add a datanode. User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT. (Nodes: current=[127.0.0.1:50010], original=[127.0.0.1:50010])
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:816)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:876)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:982)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:493)
13/07/18 09:27:35 WARN hdfs.DFSClient: Error while syncing
java.io.IOException: Failed to add a datanode. User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT. (Nodes: current=[127.0.0.1:50010], original=[127.0.0.1:50010])
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:816)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:876)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:982)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:493)
Exception in thread "main" java.io.IOException: Failed to add a datanode. User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT. (Nodes: current=[127.0.0.1:50010], original=[127.0.0.1:50010])
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:816)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:876)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:982)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:493)
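
(For reference, the policy named in the message above can be changed on the client side in hdfs-site.xml. A minimal sketch, assuming a single-node or dev setup where datanode replacement on pipeline failure is not wanted; NEVER is generally not recommended for real clusters:)

<property>
  <!-- client-side setting; NEVER disables datanode replacement when a pipeline node fails -->
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>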


On Friday, 12 July 2013 13:36:13 UTC+1, Paul Birnie wrote:

Hi,

I wanted to see if I could append to a file on HDFS and use Impala to query
the external table.

Result: it works, in that I can append to a text file and, in parallel, run
Impala queries whose results include the appended lines in the text file.

In order to get it to work I had to:
- enable dfs.support.append on HDFS
- keep to a single writer (only one writer per file is supported)
- set replication of the appended file to 1 (not 3)
I really don't know how Impala queries will perform once multiple impalad
nodes are accessing the single, continually appended-to data1.txt file.

To investigate further:
Q) Is it correct that I had to set replication to 1 (in order to get this to work)?
Q) What are the performance implications of using Impala in this way?

Implementation details:

## Use the Cloudera Manager safety valve to enable append on HDFS
##http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Free/4.5.4/Cloudera-Manager-Free-Edition-User-Guide/cmfeug_topic_5_3.html
##
<property>
<name>dfs.support.append</name>
<value>true</value>
</property>

## Wrote some code capable of appending to an HDFS file
package com.test.hdfs;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileAppend1 {

    public static void main(String[] args) throws IOException {

        if (args.length < 3) {
            System.err.println("usage: FileAppend1 <hdfs-uri> <line-to-append> <iteration-count>");
            System.exit(1);
        }

        // file to append to, e.g. hdfs://localhost/user/cloudera/mydb/day1/data1.txt
        String uri = args[0];

        // the content the user wants to append, e.g. "tradeappend0,EQUITY,book1,-6449"
        String content = args[1];

        int iterationCount = Integer.parseInt(args[2]);

        // instantiate a configuration class
        Configuration conf = new Configuration();

        // get an HDFS filesystem instance
        FileSystem fs = FileSystem.get(URI.create(uri), conf);

        // open the existing file for append (requires dfs.support.append)
        FSDataOutputStream fsout = fs.append(new Path(uri));

        for (int i = 0; i < iterationCount; i++) {

            fsout.writeBytes(content);
            fsout.writeBytes("\n");

            // push the buffered bytes to the datanodes so readers (e.g. Impala) can see them
            fsout.sync();
            fsout.flush();

            System.out.println("wrote:" + i + "\n");
            try {
                Thread.sleep(100);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        fsout.close();
        fs.close();
    }
}
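
(A side note, not from the original post: on Hadoop 2.x clients FSDataOutputStream.sync() is deprecated; hflush() is the equivalent call for making appended bytes visible to new readers, and hsync() additionally forces them to disk. Under that assumption, the flush step inside the loop above would look roughly like this:)

fsout.writeBytes(content);
fsout.writeBytes("\n");
fsout.hflush();   // flush to the datanodes so new readers (e.g. Impala) can see the bytes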


## Deployed the writer into the Cloudera VM as hdfsexperiement-1.0.1.jar
##
## note: use hdfs://localhost, as the HDFS service does not listen on a public IP address by default
## (I assume this is to prevent reads/writes into the Cloudera VM from outside)
##
## note: it's important to reference the exact same jar dependencies that are running on the cluster
##
java -cp "/usr/lib/hadoop/client-0.20/*:hdfsexperiement-1.0.1.jar" com.test.hdfs.FileAppend1 "hdfs://localhost/user/cloudera/mydb/day1/data1.txt" "tradeappend1,EQUITY,book1,-6449" 1000000

## In parallel ran the following in the impala-shell
##
[localhost.localdomain:21000] > select count(*) from mydb.day1;
Query: select count(*) from mydb.day1
Query finished, fetching results ...
+----------+
| count(*) |
+----------+
| 43940    |
+----------+
Returned 1 row(s) in 7.36s
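
(For context: the original post never shows the table definition. A hypothetical DDL for the external table being queried, pointing at the directory that data1.txt is appended into, might look like this; the column names are invented:)

-- hypothetical schema matching lines like "tradeappend1,EQUITY,book1,-6449"
CREATE EXTERNAL TABLE mydb.day1 (
  trade_id    STRING,
  asset_class STRING,
  book        STRING,
  amount      INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/cloudera/mydb/day1';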



## Encountered ERROR: java.io.IOException: Failed to add a datanode.
## Solution:
http://stackoverflow.com/questions/15347799/java-io-ioexception-failed-to-add-a-datanode-hdfs-hadoop


[cloudera@localhost ~]$ hadoop dfs -setrep -R -w 1 /user/cloudera

DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Replication 1 set: /user/cloudera/mydb/day1/data1.txt

Waiting for /user/cloudera/mydb/day1/data1.txt ... done
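
(As the deprecation warning suggests, the newer equivalent uses the hdfs command directly; a sketch of the same operation:)

hdfs dfs -setrep -R -w 1 /user/cloudera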


## Question: Is it possible to write to a single HDFS file from multiple locations?
## Answer: no; use a JMS queue or an in-memory queue to prepare and write the data (see the sketch below)
##http://stackoverflow.com/questions/6389594/is-it-possible-to-append-to-hdfs-file-from-multiple-clients-in-parallel
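
(A rough sketch of that single-writer pattern, my illustration rather than something from the thread: producers hand lines to an in-memory queue and a single thread owns the HDFS append stream. The class and method names are hypothetical.)

package com.test.hdfs;

import java.io.IOException;
import java.net.URI;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SingleWriterAppender implements Runnable {

    private final BlockingQueue<String> lines = new LinkedBlockingQueue<String>();
    private final String uri;

    public SingleWriterAppender(String uri) {
        this.uri = uri;
    }

    // producers (other threads, JMS listeners, ...) call this instead of writing to HDFS themselves
    public void submit(String line) throws InterruptedException {
        lines.put(line);
    }

    @Override
    public void run() {
        try {
            FileSystem fs = FileSystem.get(URI.create(uri), new Configuration());
            FSDataOutputStream out = fs.append(new Path(uri));
            try {
                while (true) {
                    String line = lines.take();   // blocks until a producer submits a line
                    out.writeBytes(line);
                    out.writeBytes("\n");
                    out.hflush();                 // sync() on older clients; makes the line visible to readers
                }
            } catch (InterruptedException ie) {
                // shutting down: fall through and close the stream
            } finally {
                out.close();
                fs.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}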



  • Aaron T. Myers at Jul 18, 2013 at 9:53 pm
    Ah, all of those "127.0.0.1" addresses indicate that your HDFS cluster's DNS is
    likely misconfigured in some way. The fact that the NN is handing back the
    IP address of the loopback interface to the DFSClient is resulting in the
    DFSClient only being able to write to a single DN, in particular the local
    DN. This explains why dropping the replication level down to 1 gets things
    to work.

    I suggest you email the cdh-user@ mailing list to get some help working out
    this issue; it does not appear to me to be an issue with Impala at all.


    --
    Aaron T. Myers
    Software Engineer, Cloudera
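
    (A quick way to check which addresses the NameNode has registered for its datanodes, not from Aaron's reply but a suggested follow-up: the dfsadmin report should list the machine's real address rather than 127.0.0.1.)

    hdfs dfsadmin -report
    # if the datanode entries come back as 127.0.0.1, fix /etc/hosts and the
    # VM's hostname resolution before retrying the append with replication 3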

