FAQ
How do I go about uploading content from a remote machine to the Hadoop
cluster? Do I have to first move the data to one of the nodes and then do an
'fs -put', or is there some client I can use to just access an existing
cluster?

Thanks

  • Harsh J at Sep 6, 2010 at 9:03 pm
    Java: you can use a DFSClient instance with a proper config object
    (Configuration) from just about anywhere. Basically, all that matters is
    the right fs.default.name value, which is your namenode's communication
    point.

    You can even use a Hadoop installation's 'bin/hadoop dfs' on a remote
    machine (without it acting as a proper cluster node, i.e. not in the
    slaves or masters list) if you want to use the scripts.

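    A minimal sketch of that Java route, using the FileSystem API (which a
    DFSClient sits behind); the namenode host/port and the paths below are
    placeholders, not values from this thread:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class RemotePut {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Point the client at the cluster's namenode; the machine running
                // this does not need to appear in the slaves or masters list.
                conf.set("fs.default.name", "hdfs://namenode.example.com:8020");

                FileSystem fs = FileSystem.get(conf);
                // Upload a local file into HDFS, same effect as 'hadoop fs -put'.
                fs.copyFromLocalFile(new Path("/local/data/file.txt"),
                                     new Path("/user/mark/file.txt"));
                fs.close();
            }
        }

    The same Configuration can also be loaded from the cluster's core-site.xml
    on the client's classpath instead of setting fs.default.name by hand.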
  • Mark at Sep 6, 2010 at 9:15 pm
    Thanks, I'll give that a try.
  • Alex Baranau at Sep 7, 2010 at 7:06 am
    You might find this useful as well:

    "What are the ways of importing data to HDFS from remote locations? I need
    this process to be well-managed and automated.

    Here are just some of the options. First you should look at available HDFS
    shell commands. For large inter/intra-cluster copying distcp might work best
    for you. For moving data from RDBMS system you should check Sqoop. To
    automate moving (constantly produced) data from many different locations
    refer to Flume. You might also want to look at Chukwa (data collection
    system for monitoring large distributed systems) and Scribe (server for
    aggregating log data streamed in real time from a large number of servers)."

    (see http://blog.sematext.com/2010/08/02/hadoop-digest-july-2010/ with
    better formatting and links ;))
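
    For a small, one-off copy between clusters the plain FileSystem API is
    already enough; here is a rough sketch (the cluster URIs and paths are
    made-up placeholders, and for large volumes distcp, which runs the copy
    as a MapReduce job, is the better fit):

        import java.net.URI;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.FileUtil;
        import org.apache.hadoop.fs.Path;

        public class SmallClusterCopy {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Open both clusters by URI; neither has to be the local default FS.
                FileSystem src = FileSystem.get(URI.create("hdfs://nn1.example.com:8020"), conf);
                FileSystem dst = FileSystem.get(URI.create("hdfs://nn2.example.com:8020"), conf);
                // Single-process copy: fine for small data, but use distcp for bulk.
                FileUtil.copy(src, new Path("/data/in"), dst, new Path("/data/in"),
                              false /* do not delete the source */, conf);
            }
        }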

    Alex Baranau
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
    Hadoop ecosystem search :: http://search-hadoop.com/

Discussion Overview
group: common-user
category: hadoop
posted: Sep 6, 2010 at 8:13 PM
active: Sep 7, 2010 at 7:06 AM
posts: 4
users: 3 (Mark: 2, Harsh J: 1, Alex Baranau: 1)
website: hadoop.apache.org...
irc: #hadoop
