I have a 7-node cluster. There is also one remote node (an 8th machine) on
the same LAN which holds some data. Now I need to place this data
into HDFS. This 8th machine is not part of the Hadoop
cluster's (master/slave) config file.

So, what I have thought of is:
-> I will get the FileSystem instance by using the FileSystem API.
-> I will get the local file system's (the remote machine's) instance by
using the same API, passing a different config file which simply states a tag of fs,

-> Then I will simply use the API's methods to copy the data in and get it
back from HDFS.
-> Throughout the whole process, I will have to take care of any proxy
issues so that the remote node can connect to the Namenode.
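
The steps above can be sketched roughly as follows. This is only a minimal
sketch, assuming the program runs on the 8th machine itself, that the Hadoop
client jars are on its classpath, and that it can reach the Namenode and the
datanodes over the LAN. The Namenode address, the class name and all paths are
hypothetical placeholders, not values from this cluster.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Hypothetical Namenode address; replace with the cluster's own.
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode.example:9000"), conf);

        // Local file system of the machine this program runs on (the 8th machine).
        FileSystem local = FileSystem.getLocal(conf);

        Path localFile = new Path("/tmp/input.dat");        // hypothetical local path
        Path hdfsFile = new Path("/user/demo/input.dat");   // hypothetical HDFS path
        Path localCopy = new Path("/tmp/input.copy.dat");   // hypothetical local path

        if (!local.exists(localFile)) {
            throw new IllegalStateException("Missing local file: " + localFile);
        }

        // Place the local data into HDFS.
        hdfs.copyFromLocalFile(localFile, hdfsFile);

        // Get the data back from HDFS onto the local disk.
        hdfs.copyToLocalFile(hdfsFile, localCopy);

        hdfs.close();
    }
}
```

With this approach the machine holding the data acts as an ordinary HDFS
client, so it does not need to be listed in the cluster's master/slave files.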

Is this procedure correct?

Also, I am an undergraduate as of now. I want to be a part of the Hadoop
project and get involved in the development of its various sub-projects. Is
that feasible?

Thanking You,

On Fri, Jun 5, 2009 at 11:19 PM, Alex Loddengaard wrote:


The throughput of HDFS is good, because each read is basically a stream from
several hard drives (each hard drive holds a different block of the file,
and these blocks are distributed across many machines). That said, HDFS
does not have very good latency, at least compared to local file systems.

When you write a file using the HDFS client (whether it be Java or
bin/hadoop fs), the client and the name node coordinate to put your file on
various nodes in the cluster. When you use that same client to read data,
your client coordinates with the name node to get block locations for a
given file and does an HTTP GET request to fetch those blocks from the nodes
which store them.
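
To make the division of labour concrete, here is a toy in-memory model of
that write and read flow. It is not Hadoop code: `ToyHdfs`, `NameNode` and
`DataNode` are hypothetical stand-ins, there is no replication or networking,
and the round-robin block placement is an assumption made only to keep the
sketch short.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ToyHdfs {
    /** Stand-in for a data node: holds the actual block bytes. */
    static class DataNode {
        final Map<String, byte[]> blocks = new HashMap<>();
    }

    /** Stand-in for the name node: holds only metadata, never file contents. */
    static class NameNode {
        final Map<String, List<String>> blockIds = new HashMap<>();  // path -> ordered block ids
        final Map<String, DataNode> blockToNode = new HashMap<>();   // block id -> holder
    }

    final NameNode nameNode = new NameNode();
    final List<DataNode> dataNodes = new ArrayList<>();
    final int blockSize;

    ToyHdfs(int nodeCount, int blockSize) {
        this.blockSize = blockSize;
        for (int i = 0; i < nodeCount; i++) {
            dataNodes.add(new DataNode());
        }
    }

    /** Client write: split into blocks, spread them over nodes, record locations. */
    void write(String path, byte[] data) {
        List<String> ids = new ArrayList<>();
        for (int off = 0, i = 0; off < data.length; off += blockSize, i++) {
            String id = path + "#" + i;
            DataNode target = dataNodes.get(i % dataNodes.size()); // assumed round-robin
            int end = Math.min(off + blockSize, data.length);
            target.blocks.put(id, Arrays.copyOfRange(data, off, end));
            nameNode.blockToNode.put(id, target);
            ids.add(id);
        }
        nameNode.blockIds.put(path, ids);
    }

    /** Client read: ask the name node for locations, then fetch each block from its node. */
    byte[] read(String path) {
        List<String> ids = nameNode.blockIds.get(path);
        if (ids == null) {
            throw new IllegalArgumentException("No such file: " + path);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String id : ids) {
            byte[] block = nameNode.blockToNode.get(id).blocks.get(id);
            out.write(block, 0, block.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        ToyHdfs fs = new ToyHdfs(3, 4);
        fs.write("/demo.txt", "hello hdfs world".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(fs.read("/demo.txt"), StandardCharsets.UTF_8));
    }
}
```

The point of the model is that the caller only ever uses `write` and `read`;
finding and fetching the blocks is the client's job, not the caller's.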

You could in theory get data off of the local file system on your data
nodes, but this wouldn't make any sense, because the client does everything
for you already.

Hope this clears things up.


On Fri, Jun 5, 2009 at 12:53 AM, Sugandha Naolekar wrote:

Placing any kind of data into HDFS and then getting it back: can this
activity be fast? Also, the node whose data I have to place in HDFS
is a remote node. So will I have to use an RPC mechanism, or can I simply
get the local filesystem of that node and do things that way?



Discussion Overview
group: common-user @
posted: Jun 5, '09 at 7:31a
active: Jun 10, '09 at 5:27p


