hadoop commands seem extremely slow in 0.20 branch
--------------------------------------------------

Key: HADOOP-5588
URL: https://issues.apache.org/jira/browse/HADOOP-5588
Project: Hadoop Core
Issue Type: Bug
Components: dfs, fs
Environment: 0.20-branch and trunk
Reporter: Koji Noguchi
Priority: Blocker


hadoop dfs -get/rm/mkdir/etc mydir/fileA mydir/fileB mydir/fileC ...

seem to be very slow in 0.20 branch.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Hairong Kuang (JIRA) at Mar 27, 2009 at 12:32 am
    [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-5588:
    ----------------------------------

    Affects Version/s: 0.20.0
    Fix Version/s: 0.20.0
    Assignee: Hairong Kuang

    A suspect is HADOOP-3497, which introduced a listing call on the parent directory in globStatus whether or not the path contains globs. One of our users calls "dfs -get" on many small files under one dir. This has the same effect as calling dfs -ls many times on a large directory, which causes the NN to do a lot of GC and makes it less responsive.
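To make the cost described in this comment concrete, here is a minimal sketch in Python. It is a toy model, not Hadoop code: `ToyNameNode`, `resolve_always_list`, `resolve_skip_listing`, and the set of glob characters are all hypothetical names and assumptions chosen for illustration. It only counts how many directory entries a namenode-like object would return when every non-glob path triggers a full listing of its parent, compared with a single lookup per path.

```python
# Hypothetical model, not Hadoop code: a toy "namenode" that counts how many
# directory entries it has to return to clients.
GLOB_CHARS = set("*?[]{}\\")  # assumed glob metacharacters for this sketch


class ToyNameNode:
    def __init__(self, directory, names):
        self.children = {directory: set(names)}
        self.entries_returned = 0

    def list_status(self, directory):
        """Return every entry in the directory (cost grows with its size)."""
        names = sorted(self.children.get(directory, ()))
        self.entries_returned += len(names)
        return [f"{directory}/{n}" for n in names]

    def get_file_status(self, path):
        """Look up a single path (cost is one entry)."""
        directory, _, name = path.rpartition("/")
        self.entries_returned += 1
        return path if name in self.children.get(directory, ()) else None


def has_glob(path):
    """True if the path contains any of the assumed glob characters."""
    return any(ch in GLOB_CHARS for ch in path)


def resolve_always_list(nn, path):
    """Behavior described in the comment: list the parent for every path."""
    directory, _, _ = path.rpartition("/")
    return [p for p in nn.list_status(directory) if p == path]


def resolve_skip_listing(nn, path):
    """Alternative: list the parent only when the path contains a glob."""
    if has_glob(path):
        raise NotImplementedError("glob expansion is out of scope here")
    status = nn.get_file_status(path)
    return [status] if status else []


def entries_for(resolver, dir_size, lookups):
    """Resolve `lookups` plain paths in a directory of `dir_size` files."""
    names = [f"file{i}" for i in range(dir_size)]
    nn = ToyNameNode("mydir", names)
    for name in names[:lookups]:
        assert resolver(nn, f"mydir/{name}") == [f"mydir/{name}"]
    return nn.entries_returned


if __name__ == "__main__":
    print(entries_for(resolve_always_list, 1000, 100))   # 100 listings of 1000 entries
    print(entries_for(resolve_skip_listing, 1000, 100))  # 100 single lookups
```

In this model the work for the always-list variant grows as (number of paths) x (directory size), while the other variant grows only with the number of paths. The real namenode cost also involves object allocation and GC, which the sketch does not attempt to model.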
  • Koji Noguchi (JIRA) at Mar 27, 2009 at 5:06 am
    [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Koji Noguchi updated HADOOP-5588:
    ---------------------------------

    Description:
    hadoop dfs get, rm, -mkdir- ,cp, mv, ls, etc mydir/fileA mydir/fileB mydir/fileC ...

    seem to be very slow in 0.20 branch.


    was:
    hadoop dfs -get/rm/mkdir/etc mydir/fileA mydir/fileB mydir/fileC ...

    seem to be very slow in 0.20 branch.


  • Hairong Kuang (JIRA) at Mar 27, 2009 at 5:05 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689759#action_12689759 ]

    Hairong Kuang edited comment on HADOOP-5588 at 3/27/09 10:02 AM:
    -----------------------------------------------------------------

    A suspect is HADOOP-3497, which introduced a listing call on the parent directory in globStatus whether or not the path contains globs. One of our users calls "dfs -get" on many small files under one large directory. This has the same effect as calling dfs -ls many times on the large directory, which causes the NN to do a lot of GC and makes it less responsive.

  • Hairong Kuang (JIRA) at Mar 27, 2009 at 5:55 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-5588:
    ----------------------------------

    Attachment: globStatus.patch

    This patch restores pre-0.20.0 behavior.
  • Hairong Kuang (JIRA) at Mar 27, 2009 at 6:57 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-5588:
    ----------------------------------

    Attachment: globStatus1.patch

    This patch fixes a bug in the previous patch.
  • Hairong Kuang (JIRA) at Mar 27, 2009 at 7:30 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12690014#action_12690014 ]

    Hairong Kuang commented on HADOOP-5588:
    ---------------------------------------

    Manual tests on dfs -ls/get etc. showed that the patch removed the additional listing call to the parent directory if the input path did not contain a glob.
  • Hairong Kuang (JIRA) at Mar 27, 2009 at 7:39 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12690016#action_12690016 ]

    Hairong Kuang commented on HADOOP-5588:
    ---------------------------------------

    I was not able to run all the unit tests, but all fs/dfs-related unit tests passed.
  • Tsz Wo (Nicholas), SZE (JIRA) at Mar 27, 2009 at 7:45 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12690018#action_12690018 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-5588:
    ------------------------------------------------

    +1 patch looks good.

    I tested manually and ran some related tests. Everything has worked fine.
  • Tsz Wo (Nicholas), SZE (JIRA) at Mar 27, 2009 at 7:49 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-5588:
    -------------------------------------------

    Hadoop Flags: [Reviewed]

    {noformat}
    [exec] -1 overall.
    [exec]
    [exec] +1 @author. The patch does not contain any @author tags.
    [exec]
    [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
    [exec] Please justify why no tests are needed for this patch.
    [exec]
    [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
    [exec]
    [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    [exec]
    [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
    [exec]
    [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
    [exec]
    [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
    {noformat}
  • Tsz Wo (Nicholas), SZE (JIRA) at Mar 27, 2009 at 7:53 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE resolved HADOOP-5588.
    --------------------------------------------

    Resolution: Fixed
    Fix Version/s: 0.21.0

    I have committed this to 0.20 and above. Thanks, Hairong!
  • Hairong Kuang (JIRA) at Mar 30, 2009 at 5:53 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693864#action_12693864 ]

    Hairong Kuang commented on HADOOP-5588:
    ---------------------------------------

    Koji ran some experiments with the patch. He is too busy to post the results, so I am posting them for him.

    Setup: one directory containing 10,000 files.
    About 450 mappers, each calling dfs -get 10,000 times.

    Without the fix, the namenode showed 20-30 getblocklocations calls per second and 30-40 blocked threads.
    With the fix, it showed 600 getblocklocations calls per second and almost no blocked threads.
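The figures in this comment can be put into rough perspective with some back-of-the-envelope arithmetic. The sketch below uses only the numbers reported above. It assumes that, without the fix, every dfs -get triggers one full listing of the 10,000-file directory, and that each -get corresponds to one getblocklocations call. Neither assumption is stated in the thread, so the totals are illustrative only.

```python
# Figures taken from the comment above.
FILES_IN_DIR = 10_000
MAPPERS = 450                 # "about 450" in the report
GETS_PER_MAPPER = 10_000

total_gets = MAPPERS * GETS_PER_MAPPER

# Assumption: without the fix, each -get lists the whole parent directory once.
entries_listed_without_fix = total_gets * FILES_IN_DIR

rate_without_fix = (20, 30)   # getblocklocations per second (reported range)
rate_with_fix = 600           # getblocklocations per second (reported)

# Throughput improvement implied by the reported rates.
speedup_low = rate_with_fix / rate_without_fix[1]
speedup_high = rate_with_fix / rate_without_fix[0]


def hours(calls, per_second):
    """Time to serve `calls` requests at a steady rate, in hours."""
    return calls / per_second / 3600


if __name__ == "__main__":
    print(f"total -get calls: {total_gets:,}")
    print(f"directory entries listed without the fix: {entries_listed_without_fix:,}")
    print(f"throughput improvement: {speedup_low:.0f}x to {speedup_high:.0f}x")
    # Assumption: one getblocklocations call per -get.
    print(f"hours at 600/s: {hours(total_gets, rate_with_fix):.1f}")
```

Under these assumptions the job issues 4.5 million -get calls, and the reported rates imply a 20x to 30x improvement in namenode throughput with the fix.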
  • Hudson (JIRA) at Apr 3, 2009 at 3:24 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695422#action_12695422 ]

    Hudson commented on HADOOP-5588:
    --------------------------------

    Integrated in Hadoop-trunk #796 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/796/])

