repair broke TTL based expiration
I suspect that running a cluster-wide repair interferes with TTL-based
expiration. I am running repair every 7 days and using a TTL of 7 days as
well, and data are never deleted.
The data stored in Cassandra keep growing (I have been watching them for 3
months), but they should not. If I run a manual cleanup, some data are
deleted, but only about 5%. Currently there are about 3-5 times more rows
than I estimate there should be.

I suspect that running repair on data with a TTL can cause one of the following:

1. The time check for expired records is ignored, so expired data are streamed
to the other node and become live again;
or
2. streamed data are propagated with the full TTL. Say I have a 7-day TTL and
the data have been stored for 5 days when they are repaired; they should be
sent to the other node with a TTL of 2 days, not 7.

Can someone test this case? I cannot experiment with the production
cluster too much.
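
To make hypothesis 2 concrete, here is a minimal sketch (in Java, with purely illustrative names; this is not Cassandra code) of the two ways a streamed column's lifetime could be interpreted:

// Illustrative only -- not Cassandra source. Shows the two models behind hypothesis 2:
// (a) a "countdown" TTL that would have to be shrunk before streaming, versus
// (b) an absolute expiration instant that can be streamed as-is.
import java.util.concurrent.TimeUnit;

public class TtlStreamingSketch {
    public static void main(String[] args) {
        long ttlSeconds = TimeUnit.DAYS.toSeconds(7);    // TTL used at insert time
        long writeTime  = 0;                             // pretend the write happened at t = 0 (seconds)
        long repairTime = TimeUnit.DAYS.toSeconds(5);    // repair streams the column 5 days later

        // Model (a): only the original TTL is shipped. The receiver would have to be told
        // the remaining TTL (7 - 5 = 2 days), otherwise the column lives for 7 more days.
        long remainingTtl = ttlSeconds - (repairTime - writeTime);
        System.out.println("remaining TTL if a countdown were streamed: "
                + TimeUnit.SECONDS.toDays(remainingTtl) + " days");

        // Model (b): an absolute expiration instant (write time + TTL) is shipped.
        // It is the same on every replica, so nothing has to be adjusted while streaming.
        long expiresAt = writeTime + ttlSeconds;
        System.out.println("absolute expiration instant: day " + TimeUnit.SECONDS.toDays(expiresAt));
    }
}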


  • Igor at Mar 19, 2012 at 7:27 pm
    Hello

    Data size should decrease during minor compactions. Check the logs for compaction results.





  • Caleb Rackliffe at Mar 19, 2012 at 8:46 pm
    I've been wondering about this too, but every column has both a timestamp and a TTL. Unless the timestamp is not preserved, there should be no need to adjust the TTL, assuming the expiration time is determined from these two variables.

    Does that make sense?

    My question is how often Cassandra checks for TTL expirations. Does it happen at compaction time? Some other time?


    Caleb Rackliffe | Software Developer
    M 949.981.0159 | caleb@steelhouse.com
  • Radim Kolar at Mar 19, 2012 at 10:10 pm

    On 19.3.2012 21:46, Caleb Rackliffe wrote:
    I've been wondering about this too, but every column has both a
    timestamp /and/ a TTL. Unless the timestamp is not preserved, there
    should be no need to adjust the TTL, assuming the expiration time is
    determined from these two variables.
    The timestamp is application-defined; it can be anything. The expiration
    time is recorded in the SSTable in node-local time.
    Another question is why the original TTL is stored at all. I don't think
    it is that useful to read it back; the expiration time alone would be enough.
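
    To make that concrete, here is a minimal sketch of the model described above. The class and field names are illustrative, not Cassandra's internal API; the only assumption carried over from this thread is that an absolute, node-local expiration time is written next to the TTL:

    // Illustrative sketch of an expired-column check driven by a stored absolute
    // expiration time, as described in this thread. Names are made up; this is
    // not Cassandra's implementation.
    public class ExpirationCheckSketch {

        static final class StoredColumn {
            final long writeTimestampMicros;   // application-supplied timestamp (can be anything)
            final int  ttlSeconds;             // original TTL, kept so it can be read back
            final int  localExpirationSeconds; // node-local wall clock at write time + TTL

            StoredColumn(long writeTimestampMicros, int ttlSeconds, int localExpirationSeconds) {
                this.writeTimestampMicros = writeTimestampMicros;
                this.ttlSeconds = ttlSeconds;
                this.localExpirationSeconds = localExpirationSeconds;
            }

            // The liveness check only needs the absolute expiration instant. If that
            // instant is copied verbatim when an SSTable is streamed during repair,
            // the column does not get a fresh 7-day lease on the receiving node.
            boolean isLive(long nowSeconds) {
                return nowSeconds < localExpirationSeconds;
            }
        }

        public static void main(String[] args) {
            long nowSeconds = System.currentTimeMillis() / 1000;
            StoredColumn col = new StoredColumn(
                    1328220892597000L,            // application timestamp, as in the SSTable dump later in this thread
                    7 * 24 * 3600,                // 7-day TTL
                    (int) (nowSeconds - 3600));   // pretend it expired an hour ago
            System.out.println("live? " + col.isLive(nowSeconds)); // prints: live? false
        }
    }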
  • Radim Kolar at Mar 19, 2012 at 9:48 pm
    On 19.3.2012 20:28, igor@4friends.od.ua wrote:
    Hello

    Data size should decrease during minor compactions. Check the logs for
    compaction results.
    They do, but not as much as I expect. Look at the sizes and file dates:

    -rw-r--r-- 1 root wheel 5.4G Feb 23 17:03 resultcache-hc-27045-Data.db
    -rw-r--r-- 1 root wheel 6.4G Feb 23 17:11 resultcache-hc-27047-Data.db
    -rw-r--r-- 1 root wheel 5.5G Feb 25 06:40 resultcache-hc-27167-Data.db
    -rw-r--r-- 1 root wheel 2.2G Mar 2 05:03 resultcache-hc-27323-Data.db
    -rw-r--r-- 1 root wheel 2.0G Mar 5 09:15 resultcache-hc-27542-Data.db
    -rw-r--r-- 1 root wheel 2.2G Mar 12 23:24 resultcache-hc-27791-Data.db
    -rw-r--r-- 1 root wheel 468M Mar 15 03:27 resultcache-hc-27822-Data.db
    -rw-r--r-- 1 root wheel 483M Mar 16 05:23 resultcache-hc-27853-Data.db
    -rw-r--r-- 1 root wheel 53M Mar 17 05:33 resultcache-hc-27901-Data.db
    -rw-r--r-- 1 root wheel 485M Mar 17 09:37 resultcache-hc-27930-Data.db
    -rw-r--r-- 1 root wheel 480M Mar 19 00:45 resultcache-hc-27961-Data.db
    -rw-r--r-- 1 root wheel 95M Mar 19 09:35 resultcache-hc-27967-Data.db
    -rw-r--r-- 1 root wheel 98M Mar 19 17:04 resultcache-hc-27973-Data.db
    -rw-r--r-- 1 root wheel 19M Mar 19 18:23 resultcache-hc-27974-Data.db
    -rw-r--r-- 1 root wheel 19M Mar 19 19:50 resultcache-hc-27975-Data.db
    -rw-r--r-- 1 root wheel 19M Mar 19 21:17 resultcache-hc-27976-Data.db
    -rw-r--r-- 1 root wheel 19M Mar 19 22:05 resultcache-hc-27977-Data.db

    I insert everything with a 7-day TTL plus a 10-day tombstone grace period.
    In the ideal case this means there should be nothing older than Mar 2.

    These three 5-6 GB files are waiting to be compacted. Because they contain
    only tombstones, Cassandra should apply an optimization here: mark the
    SSTable as tombstone-only, remember the time of its latest tombstone, and
    delete the entire SSTable without needing to merge it first.

    1. One question is why a tombstone is created after row expiration at all,
    since the row will expire on every cluster node at the same time without
    needing to be deleted.
    2. It is a super column family. When I dump the oldest SSTable, I wonder
    why it looks like this:

    {
    "7777772c61727469636c65736f61702e636f6d": {},
    "7175616b652d34": {"1": {"deletedAt": -9223372036854775808,
    "subColumns": [["crc32","4f34455c",1328220892597002,"d"],
    ["id","4f34455c",1328220892597000,"d"],
    ["name","4f34455c",1328220892597001,"d"],
    ["size","4f34455c",1328220892597003,"d"]]}, "2": {"deletedAt":
    -9223372036854775808, "subColumns":
    [["crc32","4f34455c",1328220892597007,"d"],
    ["id","4f34455c",1328220892597005,"d"],
    ["name","4f34455c",1328220892597006,"d"],
    ["size","4f34455c",1328220892597008,"d"]]}, "3": {"deletedAt":
    -9223372036854775808, "subColumns":

    * All subcolumns are deleted. Why keep their names in the table at all?
    Isn't marking the column as deleted, i.e. "1": {"deletedAt":
    -9223372036854775808}, enough?
    * Another question is why the entire row was not tombstoned, given that
    all of its members had expired.
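
    As a sketch of the optimization suggested above (dropping a tombstone-only SSTable outright once its newest tombstone has passed the grace period): this is only an illustration of the proposal, not existing Cassandra behaviour, and the 7-day TTL and 10-day grace values are the ones quoted in this message.

    // Sketch of the proposal above: if an SSTable is known to contain only
    // tombstones (or fully expired data), it could be deleted outright once its
    // newest tombstone is older than the grace period, without merging it first.
    // Illustration only -- not actual Cassandra behaviour.
    import java.util.concurrent.TimeUnit;

    public class TombstoneOnlySSTableSketch {

        static final class SSTableSummary {
            final String fileName;
            final boolean tombstonesOnly;      // would have to be tracked in SSTable metadata
            final long newestTombstoneSeconds; // node-local time of the latest tombstone

            SSTableSummary(String fileName, boolean tombstonesOnly, long newestTombstoneSeconds) {
                this.fileName = fileName;
                this.tombstonesOnly = tombstonesOnly;
                this.newestTombstoneSeconds = newestTombstoneSeconds;
            }
        }

        static boolean droppableWithoutCompaction(SSTableSummary t, long nowSeconds, long gcGraceSeconds) {
            return t.tombstonesOnly && t.newestTombstoneSeconds + gcGraceSeconds < nowSeconds;
        }

        public static void main(String[] args) {
            long now = System.currentTimeMillis() / 1000;
            long gcGrace = TimeUnit.DAYS.toSeconds(10);   // 10-day tombstone grace from this message
            long ttl = TimeUnit.DAYS.toSeconds(7);        // 7-day TTL from this message

            // Everything written more than ttl + gcGrace ago should be fully purgeable,
            // i.e. in the ideal case nothing older than ~17 days should remain on disk.
            System.out.println("oldest data expected on disk: about "
                    + TimeUnit.SECONDS.toDays(ttl + gcGrace) + " days");

            SSTableSummary old = new SSTableSummary("resultcache-hc-27045-Data.db", true,
                    now - TimeUnit.DAYS.toSeconds(25));
            System.out.println(old.fileName + " droppable without compaction: "
                    + droppableWithoutCompaction(old, now, gcGrace));
        }
    }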
  • Igor at Mar 19, 2012 at 11:22 pm
    You can try playing with the compaction thresholds; it looks like your data
    waits too long before size-tiered compaction starts merging the old, large
    SSTables. I have the same scenario as you (no deletes, all data with TTL),
    and I use a script which calls a user-defined compaction on these old SSTables.
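
    For reference, a rough sketch of what such a script can look like as a small JMX client. The MBean name, the forceUserDefinedCompaction operation, and its two-argument signature here are assumptions based on the 1.x CompactionManagerMBean and may differ in your version (later versions take only the file list, and the expected file-name format also varies), so verify against the node you actually run; the keyspace and file names below are just placeholders taken from this thread.

    // Rough sketch of a JMX client that asks a node to compact specific old
    // SSTables ("user defined compaction"), as described above. The MBean name,
    // operation name and two-argument signature are assumptions for the 1.x line
    // -- check the CompactionManagerMBean of the Cassandra version you run.
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class UserDefinedCompactionSketch {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "127.0.0.1"; // node to talk to
            String keyspace = "resultcache_ks";                    // hypothetical keyspace name
            String dataFiles = "resultcache-hc-27045-Data.db,"     // comma-separated -Data.db files
                             + "resultcache-hc-27047-Data.db,"
                             + "resultcache-hc-27167-Data.db";

            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi"); // 7199 = default JMX port
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                ObjectName compactionManager =
                        new ObjectName("org.apache.cassandra.db:type=CompactionManager");

                // Assumed signature for the 1.x era: (keyspace, comma-separated data files).
                // Later versions take only the file list; adjust to match your MBean.
                mbs.invoke(compactionManager,
                           "forceUserDefinedCompaction",
                           new Object[] { keyspace, dataFiles },
                           new String[] { String.class.getName(), String.class.getName() });
            } finally {
                connector.close();
            }
        }
    }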

  • Jeremiah Jordan at Mar 20, 2012 at 2:47 pm
    You need to create the tombstone in case the data was inserted without a timestamp at some point.

    -Jeremiah

  • Radim Kolar at Mar 20, 2012 at 4:26 pm

    On 20.3.2012 15:46, Jeremiah Jordan wrote:
    You need to create the tombstone in case the data was inserted without a timestamp at some point.
    Yes, I figured that out too. It would help if a TTL could be assigned to a
    whole SSTable. The most common use of TTL is for caching, and there is most
    likely a dedicated CF for the cache anyway.
  • Ruslan usifov at Mar 19, 2012 at 10:33 pm
    Do you run major compactions?

  • Radim Kolar at Mar 19, 2012 at 10:51 pm

    On 19.3.2012 23:33, Ruslan Usifov wrote:
    Do you run major compactions?
    No, I only run cleanups. A major compaction kills my node with an OOM.
  • Ruslan usifov at Mar 20, 2012 at 12:10 am
    Cleanup doesn't make any sense in your case. You write that repair works
    for you, so you can stop the Cassandra daemon, delete all data from the
    folder that contains the problem data, start the daemon again, and run
    nodetool repair. In that case, though, you must have a replication factor
    of 3 for the keyspace and use consistency level QUORUM for data manipulation.

Discussion Overview
group: user@cassandra.apache.org
categories: cassandra
posted: Mar 19, '12 at 10:17a
active: Mar 20, '12 at 4:26p
posts: 11
users: 5
website: cassandra.apache.org
irc: #cassandra
