FAQ
Hi,
What is the difference between DFS Used and Non-DFS Used?

Thanks,
Sagar

DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.

  • Harsh J at Jul 7, 2011 at 12:34 pm
    DFS Used is a count of all the space used by the directories listed in
    dfs.data.dir. The Non-DFS Used space is whatever else is occupied on
    those disks beyond that (which the DN does not account for).

    --
    Harsh J
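
The distinction above can be checked by hand on a single DataNode disk. The following is a minimal sketch, not how the DataNode itself measures usage: it sums apparent file sizes under one data directory and compares that with the used space of the volume holding it. The names `dir_size` and `split_usage` are hypothetical, and the sketch assumes one data directory per volume.

```python
import os
import shutil


def dir_size(path):
    """Sum the apparent sizes of all regular files under path, in bytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def split_usage(data_dir):
    """Split the used space on data_dir's volume into DFS data and the rest."""
    volume = shutil.disk_usage(data_dir)  # (total, used, free) for the volume
    dfs_used = dir_size(data_dir)
    # Apparent sizes can differ from allocated blocks, so clamp at zero.
    other_used = max(volume.used - dfs_used, 0)
    return {"dfs_used": dfs_used, "other_used": other_used}
```

Calling `split_usage` with the path of a data directory returns both figures in bytes; a large `other_used` next to a tiny `dfs_used` would match the situation described later in this thread.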
  • Sagar Shukla at Jul 8, 2011 at 4:49 am
    Hi Harsh,
    Thanks for your reply.

    But why does it require non-DFS storage? And why is that space accounted for differently from regular DFS storage?

    Ideally, it should have been part of the same storage.

    Thanks,
    Sagar

  • Harsh J at Jul 8, 2011 at 11:12 am
    It is just for information's sake (because it can be computed from the
    data collected). The space is reported just to let you know that
    something is being stored on the DataNodes apart from the
    HDFS data, in case you are running out of space.

    --
    Harsh J
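
As a rough illustration of "it can be computed from the data collected": Hadoop releases of that era are commonly described as deriving the figure as configured capacity minus DFS used minus remaining space. The thread does not confirm the exact formula, so treat the sketch below as an assumption; the function name and the sample numbers are hypothetical.

```python
def non_dfs_used(capacity, dfs_used, remaining):
    """Bytes on the DataNode volumes that are neither HDFS data nor free.

    Assumed formula: capacity - dfs_used - remaining, clamped at zero.
    """
    return max(capacity - dfs_used - remaining, 0)


GB = 1024 ** 3
MB = 1024 ** 2

# Hypothetical node: 500 GB configured, 10 MB of HDFS data, 250 GB still free.
leftover = non_dfs_used(500 * GB, 10 * MB, 250 * GB)
print(f"Non-DFS Used: {leftover / GB:.2f} GB")
```

Nothing is "reserved" for non-DFS use in this arithmetic: the number is simply what is left over once HDFS data and free space are subtracted from the capacity.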
  • Sagar Shukla at Jul 8, 2011 at 11:25 am
    Thanks Harsh. My first question still remains unanswered: "Why does it require non-DFS storage?" If it is cache data, then it should get flushed from the system after a certain interval of time. And if it is useful data, then it should have been part of the used DFS data.

    I have a setup in which DFS Used is approx. 10 MB whereas Non-DFS Used is around 250 GB, which is quite ridiculous.

    Thanks,
    Sagar

  • Harsh J at Jul 8, 2011 at 12:03 pm
    I did not get that question: "require"? It's not a count of something
    HDFS uses, just of what lies outside it (logs, other apps, the OS,
    whatever else uses space would show up in that metric). I am not sure I
    understand you. Looking at your disks, isn't 250 GB already utilized?

    --
    Harsh J
  • Suresh Srinivas at Jul 8, 2011 at 12:13 pm
    Non-DFS storage is not required; it is provided as information only, to show
    how the storage is being used.

    The available storage on the disks is used for both DFS and non-DFS data
    (MapReduce shuffle output and any other files that could be on the disks).

    See if you have unnecessary files or shuffle output lingering on
    these disks and contributing to the 250 GB. Delete the unneeded files and
    you should be able to reclaim some of the 250 GB.

    --
    Regards,
    Suresh
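
One way to hunt for the lingering files mentioned above is to rank the top-level entries on a DataNode disk by size while skipping the HDFS data directory. This is only a sketch under assumptions: the function names and the paths in the usage note are hypothetical, and unreadable files are silently ignored.

```python
import os


def tree_size(path):
    """Sum the apparent sizes of regular files under path, skipping unreadable ones."""
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is not readable
    return total


def largest_entries(root, exclude=(), top=5):
    """Return up to `top` (size, path) pairs for entries directly under root."""
    excluded = {os.path.abspath(p) for p in exclude}
    sizes = []
    for entry in os.scandir(root):
        path = os.path.abspath(entry.path)
        if path in excluded:
            continue  # e.g. the HDFS data directory itself
        if entry.is_dir(follow_symlinks=False):
            size = tree_size(path)
        elif entry.is_file(follow_symlinks=False):
            size = entry.stat(follow_symlinks=False).st_size
        else:
            continue
        sizes.append((size, path))
    return sorted(sizes, reverse=True)[:top]
```

For example, `largest_entries("/data1", exclude=["/data1/dfs"])` (both paths hypothetical) would list the biggest non-HDFS directories on that disk, such as old shuffle output or logs, as candidates for cleanup.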
  • Sagar Shukla at Jul 8, 2011 at 12:34 pm
    Hi Suresh / Harsh,
    Thanks for the details. Let me go over the setup again and get some understanding of what you are saying.

    Thanks,
    Sagar

Discussion Overview
group: common-user @
categories: hadoop
posted: Jul 7, '11 at 10:01a
active: Jul 8, '11 at 12:34p
posts: 8
users: 3
website: hadoop.apache.org...
irc: #hadoop
