Any other way to copy to HDFS?
Guys,

As far as I know Hadoop, files first need to be copied to the NameNode's local filesystem before they can be copied into HDFS. Is that right?
So does that mean that even if I have a Hadoop cluster of 10 nodes with an overall capacity of 6 TB, but my NameNode's hard disk capacity is only 500 GB, I cannot copy any file larger than 500 GB into HDFS?

Is there any way to copy directly to HDFS without first copying the file to the NameNode's local filesystem?
What other ways are there to copy files larger than the NameNode's disk capacity?

Thanks,
Praveenesh.

  • Uma Maheswara Rao G 72686 at Sep 21, 2011 at 8:53 am
    Hi,

    You do not need to copy the files to the NameNode.

    Hadoop also provides client code to copy the files.
    To copy files from any other node (non-DFS), put the hadoop*.jar files on the classpath and use the code snippet below.

    FileSystem fs = new DistributedFileSystem();
    fs.initialize(new URI("NAMENODE_URI"), configuration);

    fs.copyFromLocalFile(srcPath, dstPath);

    Using this API, you can copy files from any machine.

    Regards,
    Uma
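
    [Editor's note: for illustration, here is a self-contained sketch of the client-side copy described above. The NameNode host/port, class name, and file paths are placeholders assumed for this example, not values taken from the thread.]

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCopyExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode address; use your cluster's fs.default.name value.
            URI nameNode = new URI("hdfs://namenode.example.com:9000/");
            // For an hdfs:// URI this returns the DistributedFileSystem client.
            FileSystem fs = FileSystem.get(nameNode, conf);
            // The file is streamed straight to the DataNodes; nothing is staged on the NameNode's disk.
            fs.copyFromLocalFile(new Path("/tmp/sample.txt"), new Path("/user/hadoop/sample.txt"));
            fs.close();
        }
    }

    The same copy can also be done from the shell on any machine that has the Hadoop client jars and configuration, e.g. hadoop fs -put /tmp/sample.txt /user/hadoop/sample.txt (or hadoop fs -copyFromLocal ...).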





  • Uma Maheswara Rao G 72686 at Sep 21, 2011 at 9:08 am
    For more understanding of the flows, I would recommend you go through the docs below once:
    http://hadoop.apache.org/common/docs/r0.16.4/hdfs_design.html#The+File+System+Namespace

    Regards,
    Uma

  • Praveenesh kumar at Sep 21, 2011 at 9:41 am
    So I want to copy the file from a Windows machine to the Linux NameNode.
    How can I define NAMENODE_URI in the code you mention, if I want to
    copy data from the Windows machine to the NameNode machine?

    Thanks,
    Praveenesh
  • Uma Maheswara Rao G 72686 at Sep 21, 2011 at 9:58 am
    When you start the NameNode on the Linux machine, it listens on one address. You configure that address on the NameNode with fs.default.name.
    From the clients, you give this same address to connect to your NameNode.
    The initialize API takes a URI and a Configuration.

    Assume your NameNode is running on hdfs://10.18.52.63:9000

    Then you can connect to your NameNode like below:

    FileSystem fs = new DistributedFileSystem();
    fs.initialize(new URI("hdfs://10.18.52.63:9000/"), new Configuration());

    Please go through the docs mentioned earlier for more understanding.

    > if I want to copy data from windows machine to namenode machine ?
    In DFS, the NameNode is responsible only for the namespace.

    In simple words, to understand the flow quickly:
    Clients ask the NameNode for some DataNodes to copy the data to. The NN creates the file entry in the namespace and returns block entries based on the client's request. The clients then connect directly to the DataNodes and copy the data. Reading data back works the same way.

    I hope you will understand better now :-)


    Regards,
    Uma
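
    [Editor's note: an equivalent sketch that sets fs.default.name on the client Configuration instead of passing the URI explicitly. It assumes the same imports as the full example earlier in the thread; the address and paths are taken from this discussion purely for illustration.]

    Configuration conf = new Configuration();
    // Normally this value comes from core-site.xml on the client's classpath.
    conf.set("fs.default.name", "hdfs://10.18.52.63:9000");
    FileSystem fs = FileSystem.get(conf);   // resolves to DistributedFileSystem for an hdfs:// URI
    fs.copyFromLocalFile(new Path("C:\\Positive.txt"), new Path("/user/hadoop/Positive.txt"));
    fs.close();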

  • Praveenesh kumar at Sep 21, 2011 at 10:11 am
    Thanks a lot. I am trying to run the following code on my Windows machine, which is not part of the cluster.

    public static void main(String args[]) throws IOException, URISyntaxException
    {
        FileSystem fs = new DistributedFileSystem();
        fs.initialize(new URI("hdfs://162.192.100.53:54310/"), new Configuration());
        fs.copyFromLocalFile(new Path("C:\\Positive.txt"), new Path("/user/hadoop/Positive.txt"));
        System.out.println("Done");
    }

    But I am getting the following exception :

    Exception in thread "main"
    org.apache.hadoop.security.AccessControlException:
    org.apache.hadoop.security.AccessControlException: Permission denied:
    user=DrWho, access=WRITE, inode="hadoop":hadoop:supergroup:rwxr-xr-x
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at
    sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at
    sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at
    org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96)
    at
    org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:58)
    at
    org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2836)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:500)
    at
    org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:206)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:465)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:372)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:208)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1189)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1165)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1137)
    at com.musigma.hdfs.HdfsBackup.main(HdfsBackup.java:20)
    Caused by: org.apache.hadoop.ipc.RemoteException:
    org.apache.hadoop.security.AccessControlException: Permission denied:
    user=DrWho, access=WRITE, inode="hadoop":hadoop:supergroup:rwxr-xr-x
    at
    org.apache.hadoop.hdfs.server.namenode.PermissionChecker.check(PermissionChecker.java:176)
    at
    org.apache.hadoop.hdfs.server.namenode.PermissionChecker.check(PermissionChecker.java:157)
    at
    org.apache.hadoop.hdfs.server.namenode.PermissionChecker.checkPermission(PermissionChecker.java:105)
    at
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4702)
    at
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4672)
    at
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1048)
    at
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1002)
    at
    org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:381)
    at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
    at
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
    at org.apache.hadoop.ipc.Client.call(Client.java:740)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy0.create(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at
    org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at
    org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy0.create(Unknown Source)
    at
    org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2833)
    ... 10 more
    As far as I know, the exception is occurring because a user other than my hadoop user is trying to access HDFS.
    Does it mean I have to change the permissions, or is there any other way to do it from Java code?

    Thanks,
    Praveenesh
  • Uma Maheswara Rao G 72686 at Sep 21, 2011 at 10:30 am
    Hello Praveenesh,

    If you really do not care about permissions, you can disable them on the NN side with the property dfs.permissions.

    You can also set the permission on the path before creating it.

    From the docs:
    Changes to the File System API
    All methods that use a path parameter will throw AccessControlException if permission checking fails.

    New methods:

    public FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException;
    public boolean mkdirs(Path f, FsPermission permission) throws IOException;
    public void setPermission(Path p, FsPermission permission) throws IOException;
    public void setOwner(Path p, String username, String groupname) throws IOException;
    public FileStatus getFileStatus(Path f) throws IOException; will additionally return the user, group and mode associated with the path.


    http://hadoop.apache.org/common/docs/r0.20.2/hdfs_permissions_guide.html
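
    [Editor's note: for illustration, a minimal sketch of the second option (granting access on the target path before the copy) using the APIs listed above. It assumes the same imports as the full example earlier in the thread plus org.apache.hadoop.fs.permission.FsPermission; the paths, mode, and user/group names are assumptions for this example, not recommendations.]

    // Run this as a user that already has write access (e.g. the 'hadoop' superuser).
    FileSystem fs = FileSystem.get(new URI("hdfs://162.192.100.53:54310/"), new Configuration());
    Path target = new Path("/user/hadoop");
    fs.mkdirs(target);                                          // no-op if the directory already exists
    fs.setPermission(target, new FsPermission((short) 0777));   // open the directory to all users
    // Alternatively, hand the directory over to the user the Windows client runs as:
    // fs.setOwner(target, "DrWho", "supergroup");
    fs.close();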


    Regards,
    Uma
  • Praveenesh kumar at Sep 21, 2011 at 11:00 am
    Thanks a lot..!!
    I guess I can play around with the permissions of dfs for a while.
  • Harsh J at Sep 21, 2011 at 12:38 pm
    Praveenesh,

    It should be understood, as a takeaway from this, that HDFS is a set
    of servers, much like web servers are. You can send it a request, and you
    can expect a response. It is also an FS in the sense that it is
    designed to do FS-like operations (hold inodes, read/write data), but
    primarily it behaves like any other server would when you want to
    communicate with it.

    When you load files into it, the mechanisms underneath are merely
    opening a TCP socket connection to the server(s), writing packets
    through, and closing it down when done. The same goes for reading
    files back out. Of course the details are much more complex than a
    simple, single TCP connection, but that's how it works.

    Hope this helps you understand your Hadoop better ;-)


    --
    Harsh J

Discussion Overview
group: common-user
categories: hadoop
posted: Sep 21, '11 at 8:44a
active: Sep 21, '11 at 12:38p
posts: 9
users: 3
website: hadoop.apache.org...
irc: #hadoop
