Removing Cassandra snapshots from all the nodes at once

Sometimes you might want to clean up your nodes by removing snapshots that you don’t need. Even if you did not create them yourself, they might be there: Cassandra takes a snapshot before a scrub and before a truncate. However, removing them one by one from the whole cluster can be a pain, so I wrote a short script that does it.

This script was written to fit my own needs. It’s very simple and might not be user-proof, so use it carefully: take a look at the source before you use it and make sure it does what you want it to.

You can find the script on my GitHub account – it’s called (surprise!) remove_snapshots.sh.

So, how to use it?

./remove_snapshots.sh <keyspace> <pattern>

First you have to modify it by adding the hostnames of all your Cassandra nodes to the variable ‘node’. It’s hardcoded, because it’s not going to change very often. You may also want to change the path to Cassandra’s data directory. Additionally, this script assumes that you have access to the root account on the Cassandra nodes; if not, you will have to modify the script a little. Now you can use it.
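To make the setup above more concrete, here is a rough sketch of what such a configuration and per-node loop could look like. It is not the actual remove_snapshots.sh. The hostnames, the data directory path, the use of ssh, and the helper names run_on_all_nodes and DRY_RUN are all assumptions made for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only, not the real remove_snapshots.sh.

# Hypothetical hostnames: replace with your own Cassandra nodes.
node="cass1.example.com cass2.example.com cass3.example.com"

# Assumed data directory: adjust to match your installation.
data_dir="/var/lib/cassandra/data"

# Hypothetical safety switch: 1 = only print commands, 0 = really run them.
DRY_RUN=${DRY_RUN:-1}

# run_on_all_nodes REMOTE_COMMAND
# Runs (or prints) the given command as root on every host listed in $node.
run_on_all_nodes() {
    remote_cmd=$1
    # $node is deliberately unquoted so it splits into one word per host.
    for host in $node; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "ssh root@$host $remote_cmd"
        else
            ssh "root@$host" "$remote_cmd"
        fi
    done
}
```

Calling run_on_all_nodes "ls $data_dir" with the default DRY_RUN=1 prints one ssh command per host, which is a cheap way to check the host list before anything touches the cluster.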

Let’s say you have truncated the column family FooBar in the keyspace test. Cassandra will create a snapshot of it on each node in your cluster. Assuming that you know what you’re doing, you’ll probably want to remove those snapshots: you truncated the data because you don’t want it wasting disk space. Unfortunately, you cannot remove them from the whole cluster at once with nodetool, which is a bit annoying. To do it with the script, use:

./remove_snapshots.sh test FooBar

However, if you truncated more column families – like FooBaz and FooBam – you can do this:

./remove_snapshots.sh test Foo

The second parameter, pattern, is used to match file names with find and will match anything like *pattern* in a file name.
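If you want to check what a pattern would match before removing anything, a sketch like the one below may help. It only lists files and deletes nothing. The function name list_snapshot_matches and the assumption that snapshot files live somewhere under a directory called snapshots inside the keyspace directory are mine, so check them against your own data directory layout.

```shell
#!/bin/sh
# Sketch only: list snapshot files whose name contains a pattern.
# Assumes snapshots live under a "snapshots" directory somewhere
# below DATA_DIR/KEYSPACE; verify this on your own nodes.

# list_snapshot_matches DATA_DIR KEYSPACE PATTERN
list_snapshot_matches() {
    data_dir=$1
    keyspace=$2
    pattern=$3
    # -path keeps only files located under a "snapshots" directory;
    # -name "*pattern*" matches the pattern anywhere in the file name.
    find "$data_dir/$keyspace" -path '*/snapshots/*' -type f \
        -name "*${pattern}*" | sort
}
```

For example, list_snapshot_matches /var/lib/cassandra/data test Foo would list snapshot files for FooBar, FooBaz and FooBam alike, because the pattern is matched anywhere in the file name. The same goes for any other file whose name happens to contain Foo, so it is worth reading the list before deleting.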

That’s all! I hope it helps.
