FAQ
I'm trying to estimate our disk space requirements and I'm wondering about
disk space required for compaction.

My application mostly inserts new data and performs updates to existing data
very infrequently, so there will be very few bytes removed by compaction. It
seems that if a major compaction occurs, performing the compaction will
require as much disk space as is currently consumed by the table.

So here's my question. If Cassandra only compacts one table at a time, then
I should be safe if I keep as much free space as there is data in the
largest table. If Cassandra can compact multiple tables simultaneously, then
it seems that I need as much free space as all the tables put together,
which means no more than 50% utilization. So, how much free space do I need?
Any rules of thumb anyone can offer?

Also, what happens if a node gets low on disk space and there isn't enough
available for compaction? If I add new nodes to reduce the amount of data on
each node, I assume the space won't be reclaimed until a compaction event
occurs. Is there a way to salvage a node that gets into a state where it
cannot compact its tables?

Thanks

Robert

  • Sankalp Kohli at Nov 29, 2013 at 11:48 am
    Apart from compaction, you might also want to look at the free space required for repairs.
    This could be a problem if you have large rows, since repair does not operate at the column level.



  • Anthony Grasso at Nov 30, 2013 at 1:25 am
    Hi Robert,

    We found that keeping about 50% of the disk free is a good rule of thumb.
    Cassandra will typically use less than that when running compactions;
    however, it is good to have free space available just in case it compacts
    some of the larger SSTables in the keyspace. More information can be found
    on the DataStax website [1].

    If you have a situation where only one node in the cluster is running low
    on disk space while all the other nodes are fine, there are two things you
    can do:
    1) Run 'nodetool repair -pr' on each node to ensure that the token ranges
    for each node are balanced (this should be run periodically anyway).
    2) Run targeted compactions on the problem node using 'nodetool compact
    [keyspace] [table]', where [table] is the table whose SSTables need to be
    reduced in size.

    Note that having a single node that uses all its disk space while the other
    nodes are fine implies that there could be underlying issues with the node.
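    The 50% rule of thumb above can be sanity-checked with a quick back-of-the-envelope calculation. A minimal sketch (the function names and the simple worst-case model are illustrative, not part of Cassandra):

    ```python
    def has_compaction_headroom(table_sizes_bytes, free_bytes):
        """Worst case for a major compaction under size-tiered compaction:
        rewriting the largest table can temporarily need roughly as much
        free space as that table currently occupies."""
        return free_bytes >= max(table_sizes_bytes)

    def within_fifty_percent_rule(used_bytes, total_bytes):
        """The rule of thumb from this thread: keep utilization at or below 50%."""
        return used_bytes <= total_bytes / 2
    ```

    For example, a node with tables of 100, 200, and 300 GB has headroom with 350 GB free, but not with 250 GB free.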

    Regards,
    Anthony

    [1]
    http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/architecture/architecturePlanningDiskCapacity_t.html

  • Takenori Sato at Nov 30, 2013 at 2:22 am
    Hi,
    If Cassandra only compacts one table at a time, then I should be safe if
    I keep as much free space as there is data in the largest table. If
    Cassandra can compact multiple tables simultaneously, then it seems that I
    need as much free space as all the tables put together, which means no more
    than 50% utilization.

    That depends on your configuration: by default, one compaction can run per
    CPU core. See the concurrent_compactors setting for details.
    Also, what happens if a node gets low on disk space and there isn’t
    enough available for compaction?

    A compaction first checks whether there is enough disk space, based on its
    estimate of the output size; if there is not, it won't be executed.
    Is there a way to salvage a node that gets into a state where it cannot
    compact its tables?

    If you carefully run some cleanups, you'll free up some room, since each
    node only keeps the data that falls within its new token range.
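    The point above about concurrent_compactors suggests a tighter worst-case estimate than "all tables put together": if at most k compactions can run at once, the temporary space required is bounded by the combined size of the k largest tables. A hedged sketch (the function name is illustrative):

    ```python
    def worst_case_compaction_space(table_sizes_bytes, concurrent_compactors):
        """Illustrative worst case: if up to `concurrent_compactors` compactions
        can run simultaneously, the combined size of the largest tables that
        could compact at once bounds the temporary free space required."""
        largest_first = sorted(table_sizes_bytes, reverse=True)
        return sum(largest_first[:concurrent_compactors])
    ```

    So with tables of 10, 50, 100, and 200 GB and two concurrent compactors, the worst case is 300 GB of temporary space, not 360.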

  • Robert Coli at Dec 2, 2013 at 7:57 pm

    On Thu, Nov 28, 2013 at 7:21 PM, Robert Wille wrote:

    So here’s my question. If Cassandra only compacts one table at a time,
    then I should be safe if I keep as much free space as there is data in the
    largest table. If Cassandra can compact multiple tables simultaneously,
    then it seems that I need as much free space as all the tables put
    together, which means no more than 50% utilization.
    If the number of SSTables is [x], then:

    - "minor" compaction runs on 1-[x-1] SSTables (in practice, usually a
    smaller number depending on config)
    - "major" compaction runs on [x] SSTables

    Unless you run a major compaction manually while using Size Tiered
    Compaction, you are very unlikely to run a major compaction in normal
    operation.

    Leveled compaction strategy has a different set of assumptions.

    =Rob
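    The minor/major distinction under Size Tiered Compaction can be illustrated with a simplified model of STCS bucketing (bucket_low, bucket_high, and min_threshold mirror real STCS options, but the grouping logic here is a sketch, not Cassandra's implementation):

    ```python
    def bucket_sstables(sizes, bucket_low=0.5, bucket_high=1.5, min_threshold=4):
        """Group SSTables of similar size into buckets; a minor compaction
        candidate is any bucket with at least min_threshold members."""
        buckets = []
        for size in sorted(sizes):
            for bucket in buckets:
                avg = sum(bucket) / len(bucket)
                # A table joins a bucket if its size is close to the bucket's average.
                if bucket_low * avg <= size <= bucket_high * avg:
                    bucket.append(size)
                    break
            else:
                buckets.append([size])
        return [b for b in buckets if len(b) >= min_threshold]
    ```

    Four similarly sized SSTables form one compaction candidate, while a lone much larger SSTable sits in its own bucket and is left alone, which is why major compactions rarely happen on their own.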

Discussion Overview
group: user
categories: cassandra
posted: Nov 29, '13 at 3:21a
active: Dec 2, '13 at 7:57p
posts: 5
users: 5
website: cassandra.apache.org
irc: #cassandra
