Your understanding looks correct. In the worst case, a given region will use twice its disk space during a major compaction. Once the compaction is complete, the original files are deleted from HDFS. Your entire dataset will not require double the space, though, because compactions are not all run concurrently.
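
To make the arithmetic concrete, here is a rough Python sketch of the estimate. The function names and the `concurrent_compactions` parameter are hypothetical and not part of HBase; the sketch assumes the worst case from the question (insertions only, so the rewritten file is as large as its inputs) and that only the regions compacting at a given moment need extra space.

```python
def peak_region_bytes(small_files_total, big_file, replication):
    """Worst-case HDFS bytes for one region while a major compaction is in flight.

    Hypothetical helper: the old files (U + M) and the rewritten file
    (up to U + M) coexist until the compaction finishes, each replicated r times.
    """
    existing = small_files_total + big_file
    rewritten = small_files_total + big_file  # worst case: nothing is dropped
    return replication * (existing + rewritten)


def peak_cluster_bytes(region_sizes, replication, concurrent_compactions):
    """Upper bound for the whole dataset, assuming the largest regions compact together.

    concurrent_compactions is an assumed input, not a value read from HBase.
    """
    steady_state = replication * sum(region_sizes)
    largest = sorted(region_sizes, reverse=True)[:concurrent_compactions]
    return steady_state + replication * sum(largest)


if __name__ == "__main__":
    # U = 2 GB of small HFiles, M = 8 GB large HFile, replication factor 3
    print(peak_region_bytes(2, 8, 3))
    # four 10 GB regions, replication 3, one compaction at a time
    print(peak_cluster_bytes([10, 10, 10, 10], 3, 1))
```

With one compaction running at a time, the extra space is bounded by the replicated size of the largest region rather than by the size of the whole dataset.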

From: Vidhyashankar Venkataraman
Sent: Monday, May 17, 2010 11:56 AM
To: hbase-user@hadoop.apache.org
Cc: Joel Koshy
Subject: Additional disk space required for Hbase compactions..

Hi guys,

I am quite new to HBase. I am trying to figure out the maximum additional disk space required for compactions.

Suppose the set of small HFiles amounts to a total size of U before a major compaction happens, and the 'behemoth' HFile has size M. Assuming the resulting HFile after compaction has size U+M (the worst case, with only insertions) and a replication factor of r, the disk space taken by the HFiles is 2r(U+M). Is this estimate reasonable? (This is also based on my understanding that compactions happen on HDFS and not on the local file system: am I correct?)

Thank you

Vidhya

