It looks like your understanding is correct. At the worst point, a given Region will use twice its disk space during a major compaction. Once the compaction is complete, the original files are deleted from HDFS. So your entire dataset will not require double the space for compactions, because compactions are not all run concurrently.
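To put numbers on the difference between "one Region doubles" and "the whole dataset doubles", here is a rough back-of-the-envelope sketch. It is not HBase code. The helper names and the `concurrent` parameter (how many Regions are assumed to major-compact at the same moment) are hypothetical, and it assumes the worst case from the question: insert-only data, so the compacted file is as large as all its inputs, and the old files are kept until the new one is fully written.

```python
from __future__ import annotations


def region_peak_bytes(small_files_total: int, big_file: int, replication: int) -> int:
    """Peak HDFS bytes for one Region while it major-compacts (old + new files)."""
    inputs = small_files_total + big_file    # U + M, the files being merged
    output = inputs                          # worst case: nothing is dropped
    return replication * (inputs + output)   # 2r(U+M)


def cluster_peak_bytes(region_sizes: list[int], replication: int, concurrent: int) -> int:
    """Peak HDFS bytes for the table if at most `concurrent` Regions compact at once.

    Steady state is r * sum(sizes). On top of that, the worst case is that the
    `concurrent` largest Regions are all writing their new files simultaneously.
    """
    steady = replication * sum(region_sizes)
    largest = sorted(region_sizes, reverse=True)[:concurrent]
    return steady + replication * sum(largest)
```

For example, with four 10 GB Regions and r = 3, one Region compacting at a time gives `cluster_peak_bytes([10, 10, 10, 10], 3, 1)` = 150 GB at peak, versus 240 GB if every Region had to be doubled at once.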
JG
-----Original Message-----
From: Vidhyashankar Venkataraman
Sent: Monday, May 17, 2010 11:56 AM
To: hbase-user@hadoop.apache.org
Cc: Joel Koshy
Subject: Additional disk space required for HBase compactions
Hi guys,
I am quite new to HBase. I am trying to figure out the maximum
additional disk space required for compactions.
Suppose the small HFiles amount to a total size of U before a major
compaction happens, and the 'behemoth' HFile has size M. Assuming the
resulting HFile after compaction has size U+M (the worst case, with
only insertions) and a replication factor of r, the peak disk space
taken by the HFiles is 2r(U+M). Is this estimate reasonable? (This is
also based on my understanding that compactions happen on HDFS and not
on the local file system: am I correct?)
Thank you
Vidhya
