FAQ
Scenario:
Hadoop version: 0.20.2
MapReduce coding will be done in Java.


Just starting out with my first Hadoop setup. I would like to know whether
there are any best-practice ways to load data into HDFS. I have (obviously)
put data files into HDFS manually using the shell commands while playing
with it at setup, but going forward I will be retrieving large numbers of
data feeds from remote, third-party locations and loading them into Hadoop
for later analysis. What is the best way to automate this? Should I gather
the retrieved files into known locations and then script the put into HDFS,
or is there some other practice? I haven't been able to find a specific use
case yet... the docs all cover the basic fs commands without giving much
detail on more advanced setups.

thanks for any info

regards
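
The staging-and-script approach the question describes can be sketched as a
scheduled (e.g. cron-driven) script. This is only an illustration: the
landing/archive paths, the dated HDFS layout, and the archive-on-success step
are assumptions, not a standard practice.

```shell
#!/bin/bash
# Sketch: push retrieved feed files from a local landing directory into HDFS,
# then archive them locally only after a successful put.
# LANDING, ARCHIVE and the date-partitioned layout are assumptions; adjust to taste.
LANDING=/data/landing
ARCHIVE=/data/archive
DEST=/feeds/$(date +%Y/%m/%d)    # partition incoming data by arrival date

ingest() {
    # in 0.20.x, "fs -mkdir" creates missing parent directories
    hadoop fs -mkdir "$DEST"
    for f in "$LANDING"/*; do
        [ -f "$f" ] || continue
        if hadoop fs -put "$f" "$DEST"/; then
            mv "$f" "$ARCHIVE"/   # keep the local copy only once the put succeeded
        fi
    done
}

# run only where the hadoop CLI is on the PATH (i.e. on a cluster/client node)
if command -v hadoop >/dev/null; then ingest; fi
```

Driving this from cron covers simple periodic feeds; for continuous log
collection the Flume project mentioned in the reply below is built for
exactly that.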


  • Jeff Hammerbacher at Jul 20, 2010 at 7:29 am
    Hey Urckle,

    I'm biased, but I'd recommend checking out Sqoop (
    http://github.com/cloudera/sqoop) for moving data from RDBMS systems into
    HDFS/Hive/HBase and Flume (http://github.com/cloudera/flume) for moving log
    files into HDFS/Hive/HBase.
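
    As an illustration, a Sqoop table import of that era looked roughly like
    this (the JDBC connection string, username, table, and target directory
    are all placeholders):

    ```shell
    # Pull one RDBMS table into HDFS as delimited files (placeholders throughout).
    # Sqoop generates and runs a MapReduce job; each map task imports a slice
    # of the table in parallel.
    sqoop import \
      --connect jdbc:mysql://db.example.com/feeds \
      --username loader -P \
      --table raw_events \
      --target-dir /feeds/raw_events
    ```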

    For moving large sets of files into HDFS, I think distcp (
    http://archive.cloudera.com/cdh/3/hadoop-0.20.2+320/distcp.html) is your
    best bet.
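
    For example (hostnames and paths below are placeholders):

    ```shell
    # distcp runs as a MapReduce job, so the copy is parallelized across the
    # cluster's task slots. Hostnames and paths are placeholders.
    if command -v hadoop >/dev/null; then
        # same Hadoop version on both sides: hdfs:// source and destination
        hadoop distcp hdfs://nn1.example.com:8020/feeds hdfs://nn2.example.com:8020/feeds

        # copying between different Hadoop versions: read the source over the
        # read-only hftp protocol (served on the namenode's HTTP port, 50070 by default)
        hadoop distcp hftp://nn1.example.com:50070/feeds hdfs://nn2.example.com:8020/feeds
    fi
    ```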

    Thanks,
    Jeff

Discussion Overview
group: common-user @ hadoop
posted: Jul 20, '10 at 6:20a
active: Jul 20, '10 at 7:29a
posts: 2
users: 2 (Jeff Hammerbacher: 1 post, Urckle: 1 post)
website: hadoop.apache.org...
irc: #hadoop
