FAQ
  • Sugandha Naolekar at Jun 5, 2009 at 7:33 am
    Hello!

    I have the following queries related to Hadoop:

    -> Once I place my data in HDFS, it gets replicated and chunked
    automatically across the datanodes, right? Hadoop takes care of all of
    that.

    -> Now, suppose there is a third party that is not participating in the
    Hadoop cluster, i.e. its machine is not one of the cluster's nodes, and it
    has some data on its local filesystem. Can I place this data into HDFS?
    How?

    -> Then, when that third party asks for a file, a directory, or any other
    data that was previously dumped into HDFS without that third party's
    knowledge and wants to retrieve it, the data should be placed back on its
    local filesystem, in some specific directory. How can I do this?

    -> Will I have to use MapReduce or something else to make it work?

    -> Also, if I write MapReduce code for the complete activity, how will I
    fetch the files that are chunked into blocks in HDFS, reassemble them
    into a complete file, and place it on the local filesystem of a node that
    is not part of the Hadoop cluster setup?

    Eagerly waiting for a reply!

    Thanking you,
    Sugandha



    --
    Regards!
    Sugandha
  • Tim robertson at Jun 5, 2009 at 7:43 am
    Answers inline
    -> Once I place my data in HDFS, it gets replicated and chunked
    automatically across the datanodes, right? Hadoop takes care of all of
    that.
    Yes, it does.
    -> Now, suppose there is a third party that is not participating in the
    Hadoop cluster, i.e. its machine is not one of the cluster's nodes, and it
    has some data on its local filesystem. Can I place this data into HDFS?
    How?
    Using the HDFS interface you would put the data in (not dissimilar to
    FTP'ing it to a server).
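    For example, a rough sketch of a put using the Java FileSystem API (the
    namenode address and paths below are placeholders, not values from this
    thread):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class HdfsPut {
          public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder namenode address; use your cluster's fs.default.name.
            conf.set("fs.default.name", "hdfs://namenode:9000");
            FileSystem fs = FileSystem.get(conf);
            // Copy a file from the local filesystem into HDFS.
            fs.copyFromLocalFile(new Path("/tmp/mydata.txt"),
                                 new Path("/user/sugandha/mydata.txt"));
            fs.close();
          }
        }

    The command-line equivalent is bin/hadoop fs -put <localsrc> <dst>.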
    -> Then, when that third party asks for a file, a directory, or any other
    data that was previously dumped into HDFS without that third party's
    knowledge and wants to retrieve it, the data should be placed back on its
    local filesystem, in some specific directory. How can I do this?
    Copy it out of HDFS onto the local file system - it is pretty much
    like copying from a mounted drive to a different mounted drive, just
    that you go through a Hadoop command (hadoop fs -get, or distcp for
    large copies) rather than a native command (cp).
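    The reverse direction looks much the same in code; a rough sketch (again
    with placeholder paths):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class HdfsGet {
          public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://namenode:9000");  // placeholder
            FileSystem fs = FileSystem.get(conf);
            // Copy an HDFS file back onto the local filesystem; the client
            // sees one complete file, block reassembly happens underneath.
            fs.copyToLocalFile(new Path("/user/sugandha/mydata.txt"),
                               new Path("/tmp/restored/mydata.txt"));
            fs.close();
          }
        }

    On the command line this is bin/hadoop fs -get <src> <localdst> (or
    -copyToLocal).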
    -> Will I have to use MapReduce or something else to make it work?
    No. You could write a Java program or use the command-line utilities.
    Maybe you can do it in other languages, but I only do Java...
    -> Also, if I write MapReduce code for the complete activity, how will I
    fetch the files that are chunked into blocks in HDFS, reassemble them
    into a complete file, and place it on the local filesystem of a node that
    is not part of the Hadoop cluster setup?
    You don't write MR code for putting and retrieving files - this is all
    done by Hadoop for you. Just copy files in and copy files out.

    It's probably worth reading the Hadoop command-line guide:
    http://hadoop.apache.org/core/docs/r0.19.1/commands_manual.html to get
    an understanding of what you can do from the command line. All of those
    command-line utilities can also be used programmatically (i.e. from
    code).
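    As a rough illustration of the "programmatically" part, the shell
    commands can be driven from Java through FsShell (this assumes the usual
    Tool/ToolRunner pattern; the "-ls /" arguments are just an example):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FsShell;
        import org.apache.hadoop.util.ToolRunner;

        public class ListRoot {
          public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Runs the equivalent of "bin/hadoop fs -ls /" from code.
            int rc = ToolRunner.run(conf, new FsShell(), new String[] {"-ls", "/"});
            System.exit(rc);
          }
        }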

    Cheers,

    Tim



  • Roldano Cattoni at Jun 10, 2009 at 4:34 pm
    A very basic question: which Java version is required for hadoop-0.19.1?

    With jre1.5.0_06 I get the error:
    java.lang.UnsupportedClassVersionError: Bad version number in .class file
    at java.lang.ClassLoader.defineClass1(Native Method)
    (..)

    By the way hadoop-0.17.2.1 was running successfully with jre1.5.0_06


    Thanks in advance for your kind help

    Roldano
  • Stuart White at Jun 10, 2009 at 4:37 pm
    Java 1.6.
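    (If there is any doubt about which runtime is actually on the path, a
    quick check like the following prints the version of whichever java
    binary you run it with; this is a generic JVM check, not a Hadoop
    utility:)

        public class JavaVersionCheck {
          public static void main(String[] args) {
            // Java 6 runtimes report java.class.version 50.0; Java 5 reports 49.0.
            System.out.println("java.version       = " + System.getProperty("java.version"));
            System.out.println("java.class.version = " + System.getProperty("java.class.version"));
          }
        }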
  • Roldano Cattoni at Jun 10, 2009 at 5:03 pm
    It works, many thanks.

    Last question: is this information documented somewhere in the package? I
    was not able to find it.


    Roldano


  • Stuart White at Jun 10, 2009 at 5:07 pm
    http://hadoop.apache.org/core/docs/r0.19.1/quickstart.html#Required+Software

  • Roldano Cattoni at Jun 10, 2009 at 5:27 pm
    Thanks again, Stuart.

    I definitely need to search better ...

    Best

    Roldano

  • Sugandha Naolekar at Jun 5, 2009 at 7:53 am
    Hello!

    Placing any kind of data into HDFS and then getting it back - can this
    activity be fast? Also, the node whose data I have to place into HDFS
    is a remote node. So will I have to use an RPC mechanism, or can I simply
    get the local filesystem of that node and do these things?

    --
    Regards!
    Sugandha
  • Alex Loddengaard at Jun 5, 2009 at 5:50 pm
    Hi,

    The throughput of HDFS is good, because each read is basically a stream from
    several hard drives (each hard drive holds a different block of the file,
    and these blocks are distributed across many machines). That said, HDFS
    does not have very good latency, at least compared to local file systems.

    When you write a file using the HDFS client (whether it be Java or
    bin/hadoop fs), the client and the namenode coordinate to put your file on
    various nodes in the cluster. When you use that same client to read data,
    your client coordinates with the namenode to get the block locations for a
    given file and then fetches those blocks from the datanodes which store
    them.

    You could in theory get data off of the local file system on your data
    nodes, but this wouldn't make any sense, because the client does everything
    for you already.
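    As a rough sketch of what the read path looks like from the client's
    side (the namenode address and path are placeholders):

        import java.io.BufferedReader;
        import java.io.InputStreamReader;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataInputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class HdfsRead {
          public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://namenode:9000");  // placeholder
            FileSystem fs = FileSystem.get(conf);
            // fs.open() returns a single stream; fetching the individual
            // blocks from the datanodes happens underneath.
            FSDataInputStream in = fs.open(new Path("/user/sugandha/mydata.txt"));
            BufferedReader reader = new BufferedReader(new InputStreamReader(in));
            String line;
            while ((line = reader.readLine()) != null) {
              System.out.println(line);
            }
            reader.close();
            fs.close();
          }
        }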

    Hope this clears things up.

    Alex

  • Sugandha Naolekar at Jun 8, 2009 at 7:07 am
    Hello!

    I have a 7-node cluster. But there is one remote node (an 8th machine)
    within the same LAN which holds some kind of data. Now, I need to place
    this data into HDFS. This 8th machine is not part of the Hadoop cluster's
    (master/slave) config files.

    So, what I have thought is:
    -> I will get the HDFS FileSystem instance by using the FileSystem API.
    -> I will get the local filesystem (the remote machine's) instance by
    using the same API, passing a different config file which simply sets the
    fs.default.name property.
    -> And then I will simply use all the methods to copy the data in and get
    it back from HDFS...
    -> During the complete episode, I will have to take care of the proxy
    issues for the remote node to get connected to the Namenode.

    Is this procedure correct?

    Also, I am an undergraduate as of now. I want to be a part of the Hadoop
    project and get into the development of its various sub-projects. Would
    that be feasible?

    Thanking You,



    --
    Regards!
    Sugandha
  • Alex Loddengaard at Jun 8, 2009 at 3:28 pm
    If you're going to be doing ad-hoc HDFS puts and gets, then you should just
    use the Hadoop command line tool, bin/hadoop. Otherwise, you can use the
    Java API to read and write files, etc.
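    For the scenario in the previous message (an 8th machine on the LAN that
    is not part of the cluster), a rough sketch with the Java API might look
    like the following; the namenode address, paths, and class name are
    placeholders, and it assumes the machine can reach the namenode and
    datanodes over the network:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class RemoteCopy {
          public static void main(String[] args) throws Exception {
            // Point the client at the cluster's namenode; the address is a
            // placeholder for the cluster's fs.default.name value.
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://namenode:9000");

            // One handle for HDFS and one for this machine's local
            // filesystem, much as the earlier message sketches.
            FileSystem hdfs = FileSystem.get(conf);
            FileSystem local = FileSystem.getLocal(conf);

            Path src = new Path("/data/input.txt");           // local, placeholder
            Path dst = new Path("/user/sugandha/input.txt");  // HDFS, placeholder

            if (local.exists(src)) {
              hdfs.copyFromLocalFile(src, dst);                           // put into HDFS
              hdfs.copyToLocalFile(dst, new Path("/data/restored.txt"));  // and back out
            }
            hdfs.close();
          }
        }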

    As for contributing to Hadoop and its ecosystem, everything is open source
    and open for contributions. You should find a JIRA that looks like a good
    starting point (something simple, yet something that will get you familiar
    with the code base), make a patch that fixes the issue, submit the patch,
    and apply the feedback you get from the community to create further patches.

    Alex


Discussion Overview
group: common-user @
categories: hadoop
posted: Jun 5, '09 at 7:31a
active: Jun 10, '09 at 5:27p
posts: 12
users: 5
website: hadoop.apache.org...
irc: #hadoop
