Hi all,



I'd like to create a new URI for a distributed, POSIX-compliant filesystem
shared between all nodes. A number of such filesystems already exist (think
HDFS without the POSIX non-compliance). We can, of course, run HDFS on top of
such a filesystem, but that adds an unnecessary and inefficient extra layer:
why have a master retrieve a set of data from an FS cluster, only to
distribute it back out to the same cluster on a different distributed FS
(HDFS)?



In the new URI I seek to create, each MapReduce slave would look for input
data on a seemingly local file:///, and write output to it as well. Assume
that the distributed FS handles concurrent reads and writes. Given POSIX
compliance, LocalFileSystem seems to be the best foundation.
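
Concretely, I imagine something along these lines (an untested sketch; the
sharedfs:/// scheme, class, and package names are placeholders of mine),
registered by pointing fs.sharedfs.impl at the class in core-site.xml:

    // Untested sketch: a hypothetical sharedfs:/// scheme backed by a
    // distributed FS that is already mounted locally on every node.
    package org.example.fs; // placeholder package

    import java.net.URI;

    import org.apache.hadoop.fs.RawLocalFileSystem;

    public class SharedMountFileSystem extends RawLocalFileSystem {
        private static final URI NAME = URI.create("sharedfs:///");

        // Report our own scheme; the actual I/O (open, create,
        // listStatus, ...) is inherited from RawLocalFileSystem, since
        // every path resolves through the local mount of the shared FS.
        @Override
        public URI getUri() {
            return NAME;
        }
    }

Subclassing RawLocalFileSystem rather than the checksummed LocalFileSystem
would also avoid scattering .crc sidecar files across the shared mount,
though either could serve as the base class.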



Please let me know of any warnings or errors you see in this plan. Any advice
is greatly appreciated as well, as Hadoop's source tree is new to me and
intimidating.



Best,

--Chris


  • Allen Wittenauer at Jul 1, 2010 at 6:51 pm

    If this is a mountable file system, does LocalFileSystem work out of the box with no modifications required?
  • Chris D at Jul 1, 2010 at 7:02 pm
    Yes, it is mountable on all machines simultaneously and, for example, works
    properly through file:///mnt/to/dfs in a single-node cluster.

  • Allen Wittenauer at Jul 1, 2010 at 7:17 pm

    Then file:// will likely work on a multi-node cluster as well. So I doubt you'll need to write anything at all. :)
  • Chris D at Jul 1, 2010 at 9:09 pm
    With the following configuration (core-site.xml sketch below), tasks
    complete successfully:
    1. /mnt1 is the mount point for a shared, distributed FS on all machines
    running Hadoop.
    2. hadoop.tmp.dir = /mnt1/tmp/hadoop
    3. fs.default.name = file:///
    4. `bin/hadoop jar hadoop-*-examples.jar wordcount /mnt1/input /mnt1/output`
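
    For reference, items 2 and 3 above as core-site.xml properties
    (everything else left at the defaults):

        <!-- core-site.xml: tmp dir on the shared mount, local FS as default -->
        <configuration>
          <property>
            <name>hadoop.tmp.dir</name>
            <value>/mnt1/tmp/hadoop</value>
          </property>
          <property>
            <name>fs.default.name</name>
            <value>file:///</value>
          </property>
        </configuration>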

    top and intuition suggest that each machine is reading from its own /mnt1
    mount point and that computation is distributed. Thank you very much for
    the advice, Allen.

    Best,
    --Chris

