Grokbase Groups Pig user June 2011


  • Dylan Scott at Jun 19, 2011 at 5:19 pm
    I was wondering what a good approach would be to the following: on each node
    in a Hadoop cluster I have the same directory, holding different log files on
    each node (in the local filesystem, not HDFS). I'd like to load these files so
    that each node in the cluster maps over the files in its own copy of the
    directory. Are there existing LoadFuncs that would support this?
  • Dmitriy Ryaboy at Jun 19, 2011 at 9:46 pm
    That's something sometimes referred to as "in-situ MapReduce", and it is
    not something Hadoop and its associated tools generally do.
    We'd have to solve problems like handling the failure of a node that
    crashes mid-run (it's the only one that had the data! Now what?), etc.
    The usual way to solve this kind of issue in the wild is to set up a
    process that moves your local log files into Hadoop (HDFS), perhaps with
    metadata about where they came from (directories named after hosts?
    metadata files? lots of options here), and then run your jobs over them
    there.
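    A minimal sketch of the staging step described above, assuming one of the
    suggested layouts: each node copies its local logs to a hypothetical
    `<hdfs_root>/<hostname>/<file>` path so the directory name records where
    the data came from. The names `plan_uploads`, `upload_commands`, `host_of`
    and `/logs/raw` are illustrative, not part of Hadoop or Pig. The script only
    renders `hadoop fs -mkdir -p` / `hadoop fs -put` commands rather than running
    them, and `-mkdir -p` assumes a Hadoop 2.x or later shell.

```python
import os
import shlex
import socket
import tempfile
from pathlib import PurePosixPath


def plan_uploads(local_dir, hdfs_root, host=None):
    """Return (local_path, hdfs_path) pairs for each regular file in local_dir.

    Files are placed under <hdfs_root>/<host>/ (hypothetical layout) so the
    originating node is recoverable from the HDFS path later.
    """
    host = host or socket.gethostname()
    dest_dir = PurePosixPath(hdfs_root) / host
    pairs = []
    for name in sorted(os.listdir(local_dir)):
        src = os.path.join(local_dir, name)
        if os.path.isfile(src):  # skip subdirectories, sockets, etc.
            pairs.append((src, str(dest_dir / name)))
    return pairs


def upload_commands(pairs):
    """Render the plan as `hadoop fs` shell commands (not executed here)."""
    dirs = sorted({str(PurePosixPath(dst).parent) for _, dst in pairs})
    cmds = [f"hadoop fs -mkdir -p {shlex.quote(d)}" for d in dirs]
    cmds += [
        f"hadoop fs -put {shlex.quote(src)} {shlex.quote(dst)}"
        for src, dst in pairs
    ]
    return cmds


def host_of(hdfs_path, hdfs_root):
    """Recover the originating host from a path laid out by plan_uploads."""
    return PurePosixPath(hdfs_path).relative_to(hdfs_root).parts[0]


if __name__ == "__main__":
    # Demo with a throwaway directory standing in for the node's log dir.
    with tempfile.TemporaryDirectory() as d:
        for n in ("app.log", "error.log"):
            open(os.path.join(d, n), "w").close()
        for cmd in upload_commands(plan_uploads(d, "/logs/raw", host="node01")):
            print(cmd)
```

    A job can then load everything under `/logs/raw/*/` and use the host
    directory component (as `host_of` does) to tell the nodes' logs apart.
    Keeping the host in the path rather than in a side metadata file is just
    one of the options mentioned; it has the advantage that no extra lookup
    is needed at load time.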

    I am not sure how active it is right now, but you might want to look
    into Chukwa, a subproject of Hadoop, for handling this type of
    problem.

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jun 18, '11 at 10:25p
active: Jun 19, '11 at 9:46p
posts: 3
users: 2
website: pig.apache.org

2 users in discussion

Dylan Scott: 2 posts, Dmitriy Ryaboy: 1 post
