Hi,
Every day, the map/reduce jobs I schedule on my cluster leave files
behind on all the DataNodes, in a directory named blocksBeingWritten. After
one week, the files left behind reach 70 GB in the blocksBeingWritten
directory on each DataNode.
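For reference, the accumulation can be checked on a node with something like
the following (the /data/hdfs path is only an example; substitute the actual
dfs.data.dir of your installation):

  # total size of the leftover files on this DataNode
  du -sh /data/hdfs/blocksBeingWritten
  # count files older than one day, i.e. clearly not in-flight writes
  find /data/hdfs/blocksBeingWritten -type f -mtime +1 | wc -l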
I have noticed that once I restart a DataNode, this directory is cleaned up.
Can someone please help me understand what exactly these files are, and why
the DataNode seems to delete them only when it is restarted?
Below is an example of the files I see in the
blocksBeingWritten directory:
-rw-r--r-- 1 hdfs hadoop 2.0K Jun 14 14:24 blk_2226351414820476901_4655671.meta
-rw-r--r-- 1 hdfs hadoop 254K Jun 14 14:24 blk_2226351414820476901
-rw-r--r-- 1 hdfs hadoop 26K Jun 14 14:25 blk_651476714389509127_4655706.meta
-rw-r--r-- 1 hdfs hadoop 3.2M Jun 14 14:25 blk_651476714389509127
-rw-r--r-- 1 hdfs hadoop 182K Jun 14 14:58 blk_1727419676952982071_4659418.meta
-rw-r--r-- 1 hdfs hadoop 23M Jun 14 14:58 blk_1727419676952982071
-rw-r--r-- 1 hdfs hadoop 447K Jun 14 14:59 blk_687415755671726127_4659433.meta
-rw-r--r-- 1 hdfs hadoop 56M Jun 14 14:59 blk_687415755671726127
-rw-r--r-- 1 hdfs hadoop 476K Jun 14 15:02 blk_-1767796325092574815_4659494.meta
-rw-r--r-- 1 hdfs hadoop 60M Jun 14 15:02 blk_-1767796325092574815
Thank you,
JP.