FAQ
I have a question about how this works. The API documentation says that the
class LocalDirAllocator is: "An implementation of a round-robin scheme for
disk allocation for creating files."
I am wondering: is the disk allocation done in the constructor?
Let's say I have a cluster of just one node with 4 disks, and inside a
reducer I do:
LocalDirAllocator localDirAlloc = new LocalDirAllocator("mapred.local.dir");
Path pathA = localDirAlloc.getLocalPathForWrite("a");
Path pathB = localDirAlloc.getLocalPathForWrite("b");

Will the local paths pathA and pathB be on the same local disk for sure,
since the disk was allocated by new LocalDirAllocator("mapred.local.dir")?
Or is it getLocalPathForWrite that picks the disk, so the two paths might
end up on different disks (since I have 4)?

Thanks in advance
--
View this message in context: http://lucene.472066.n3.nabble.com/LocalDirAllocator-and-getLocalPathForWrite-tp2199517p2199517.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.

  • Todd Lipcon at Jan 5, 2011 at 8:19 pm
    Hi Marc,

    LocalDirAllocator is an internal-facing API and you shouldn't be using it
    from user code. If you write into mapred.local.dir like this, you will end
    up with conflicts between different tasks running from the same node.

    The working directory of your MR task is already within one of the drives,
    and there isn't usually a good reason to write to multiple drives from
    within a task - you should get parallelism by running multiple tasks at the
    same time, not by having each task write to multiple places.
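The point above can be sketched with plain java.io.File: inside a running task, a relative path already lands in the attempt's private work directory on one of the mapred.local.dir disks, so no allocator is needed. This is a minimal stand-alone illustration (the file name is arbitrary), not task code:

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class ScratchFileDemo {
    // Create a scratch file relative to the current working directory.
    // Inside a running MR task, "." is already the attempt-local work dir
    // placed on one of the mapred.local.dir disks by the framework.
    static File writeScratch(String name, String contents) throws IOException {
        File f = new File(name); // relative path => lands in the task cwd
        try (FileWriter w = new FileWriter(f)) {
            w.write(contents);
        }
        return f;
    }

    public static void main(String[] args) throws IOException {
        File f = writeScratch("demo.tmp", "hello");
        System.out.println(f.getAbsolutePath());
        f.delete(); // clean up the demo file
    }
}
```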

    Thanks
    -Todd
    On Wed, Jan 5, 2011 at 8:35 AM, Marc Sturlese wrote:


    I have a doubt about how this works. The API documentation says that the
    class LocalDirAllocator is: "An implementation of a round-robin scheme for
    disk allocation for creating files"
    I am wondering, the disk allocation is done in the constructor?
    Let's say I have a cluster of just 1 node and 4 disks and I do inside a
    reducer:
    LocalDirAllocator localDirAlloc = new
    LocalDirAllocator("mapred.local.dir");
    Path pathA = localDirAlloc.getLocalPathForWrite("a") ;
    Path pathB = localDirAlloc.getLocalPathForWrite("b") ;

    The local paths pathA and pathB will for sure be in the same local disk as
    it was allocated by new LocalDirAllocator("mapred.local.dir") or is
    getLocalPathForWrite who gets the disk and so the two paths might not be in
    the same disk (as I have 4 disks)?

    Thanks in advance
    --
    View this message in context:
    http://lucene.472066.n3.nabble.com/LocalDirAllocator-and-getLocalPathForWrite-tp2199517p2199517.html
    Sent from the Hadoop lucene-users mailing list archive at Nabble.com.


    --
    Todd Lipcon
    Software Engineer, Cloudera
  • Marc Sturlese at Jan 5, 2011 at 11:27 pm
    Hey Todd,
> LocalDirAllocator is an internal-facing API and you shouldn't be using it
> from user code. If you write into mapred.local.dir like this, you will end
> up with conflicts between different tasks running from the same node.

I know it's a bit odd usage, but the thing is that I need to create files in
the local file system, work with them there, and after that upload them to
HDFS (I use the OutputCommitter). To avoid the conflicts you mention, I
create a folder that looks like "mapred.local.dir"/taskId/attemptId, work
there, and apparently I am having no problems.
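The per-attempt layout described here can be sketched as follows. In a real task the task and attempt ids would come from the job configuration (e.g. conf.get("mapred.task.id") in the old API); the ids below are placeholders:

```java
import java.io.File;

public class AttemptScratchDir {
    // Build <base>/<taskId>/<attemptId> so concurrent attempts on the same
    // node never share a directory. The ids are illustrative placeholders;
    // a real task would read them from its configuration.
    static File scratchDir(File base, String taskId, String attemptId) {
        File dir = new File(new File(base, taskId), attemptId);
        dir.mkdirs(); // create the whole chain if it does not yet exist
        return dir;
    }

    public static void main(String[] args) {
        File dir = scratchDir(new File(System.getProperty("java.io.tmpdir")),
                              "task_201101050001_r_000003",
                              "attempt_201101050001_r_000003_0");
        System.out.println(dir.isDirectory());
    }
}
```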
> and there isn't usually a good reason to write to multiple drives from
> within a task

When I said I had a cluster of one node, it was just to clarify my question
and explain the example. My cluster is actually bigger than that, and each
node has more than one physical disk. Running multiple tasks at the same
time is what I do. I would like each task to write to just a single local
disk, but I don't know how to do that.
> The working directory of your MR task is already within one of the drives.

Is there a way to get a working directory on the local disk from the
reducer? Could I do something like:

LocalFileSystem localFs = FileSystem.getLocal(conf);
Path path = localFs.getWorkingDirectory();

I would appreciate it if you could tell me a bit more about this. I need to
work with these files locally and want them copied to HDFS only when I
finish working with them.

    Thanks in advance.
  • Todd Lipcon at Jan 5, 2011 at 11:47 pm
    Hi Marc,

    Yes, using LocalFileSystem would work fine, or you can just use the normal
    java.io.File APIs.
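A minimal sketch of that pattern, using only java.io.File for the local work (file and directory names are illustrative; the commented line shows the real Hadoop call, FileSystem.copyFromLocalFile, that would do the final upload):

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class LocalThenUpload {
    // Produce a file on local disk; HDFS is touched only when it is done.
    static File buildLocally(File workDir, String name) throws IOException {
        File out = new File(workDir, name);
        try (FileWriter w = new FileWriter(out)) {
            w.write("intermediate results\n"); // all local-only work goes here
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        File workDir = new File(System.getProperty("java.io.tmpdir"));
        File done = buildLocally(workDir, "part-local.tmp");
        // Once finished, a Hadoop FileSystem handle can push it to HDFS:
        //   fs.copyFromLocalFile(new Path(done.getPath()), hdfsDest);
        System.out.println(done.length() > 0);
        done.delete(); // clean up the demo file
    }
}
```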

    -Todd
    On Wed, Jan 5, 2011 at 3:26 PM, Marc Sturlese wrote:




    --
    Todd Lipcon
    Software Engineer, Cloudera

Discussion Overview
group: common-user
category: hadoop
posted: Jan 5, '11 at 4:36p
active: Jan 5, '11 at 11:47p
posts: 4
users: 2
website: hadoop.apache.org...
irc: #hadoop
