Difference between Cassandra’s Consistency Level ANY and ONE

Just a short note this time. I’ve noticed that some people miss one thing when trying to understand Cassandra’s data model and the two “weakest” Consistency Levels for writes: ANY and ONE. What’s the difference? CL.ONE requires that at least one replica responsible for the given key, as determined by the Partitioner, receives the request and writes the data. CL.ANY only requires that the request is received by any node, including a node that is not responsible for the key. In that case Hinted Handoff is used, and the data will be sent to the proper replicas once they are available. Keep in mind, however, that the node that accepted such a write may crash before any of the replica nodes receives the hint. If you then fail to recover that node (e.g. because of a hardware failure), the data will be lost forever. You have been warned.
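To make the difference concrete, here is a toy model in Python. It does not talk to Cassandra at all; the `Node` class and the `write_succeeds` function are hypothetical names made up for this sketch, and the model ignores everything except which nodes are up.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node; only tracks liveness."""
    name: str
    up: bool = True


def write_succeeds(level, coordinator, replicas):
    """Toy model: can a write at the given consistency level be acknowledged?

    coordinator -- the node that received the client request
    replicas    -- the nodes responsible for the key
    """
    if not coordinator.up:
        # Nobody received the request at all.
        return False
    live_replicas = [r for r in replicas if r.up]
    if level == "ONE":
        # At least one responsible replica must take the write.
        return len(live_replicas) >= 1
    if level == "ANY":
        # A hint kept on the coordinator is enough, even with no live replica.
        return True
    raise ValueError(f"unsupported level in this sketch: {level}")


coordinator = Node("node-a")
replicas = [Node("node-b", up=False), Node("node-c", up=False)]

one_ok = write_succeeds("ONE", coordinator, replicas)  # False
any_ok = write_succeeds("ANY", coordinator, replicas)  # True
```

In the second case the only copy of the data is the hint on `node-a`, which is exactly the situation the warning above is about.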

Removing Cassandra snapshots from all the nodes at once

Sometimes you might want to clean up your nodes by removing snapshots that you don’t need. Even if you did not create them, they might be there – Cassandra makes them before a scrub or a truncate. However, removing them one by one from the whole cluster might be a pain, so I wrote a short script that does it.

Continue reading

Safe Cassandra shutdown and restart

If you are a Cassandra user, you’re probably experienced enough to know how to stop or restart Linux services – that’s an obvious thing. However, in some cases simply taking a service down can cause problems, especially if other services are using it. While Cassandra is very robust and crash-safe (pkill -9 cassandra works fine ;-) ), it’s never a bad idea to do things in a way that minimizes the risk of something going wrong. Another advantage of a clean Cassandra restart procedure is a shorter startup time. Here is how to do it.

Continue reading

Problem with very slow sendmail startup

I was playing a bit with some virtual machines I need for testing when, after a reboot, I noticed that sendmail was starting very slowly – it took about 3-4 minutes to get it working. I checked the log to see what was wrong:

[root@hdps01 ~]# tail /var/log/maillog
Jul 11 21:26:43 hdps01 sm-msp-queue[1266]: My unqualified host name (hdps01) unknown; sleeping for retry
Jul 11 21:27:43 hdps01 sm-msp-queue[1266]: unable to qualify my own domain name (hdps01) -- using short name
Jul 11 21:27:43 hdps01 sm-msp-queue[1289]: starting daemon (8.14.4): queueing@01:00:00

So, what was the problem?

Continue reading

Be careful with Hadoop and pbzip2!

Or you may be surprised one day when your output looks like it’s missing a lot of data. According to the Jira issue below (filed against Apache Commons Compress), the problem affects versions older than 1.4. It is caused by the end-of-stream (EOS) marker in a compressed file being misinterpreted as end-of-file (EOF), so the reader simply stops reading the file at that point:

https://issues.apache.org/jira/browse/COMPRESS-146

So, if your Hadoop is misbehaving and your output data looks odd for no apparent reason – ask your admins whether they switched from bzip2 to pbzip2.
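The EOS-versus-EOF confusion can be reproduced in a few lines of Python. This is only an illustration of the class of bug, not Hadoop’s code: it assumes a file made of two concatenated bzip2 streams (pbzip2 writes such multi-stream files), and the sample data is made up.

```python
import bz2

# Two independent bzip2 streams glued together, as in a multi-stream file.
first = bz2.compress(b"line 1\nline 2\n")
second = bz2.compress(b"line 3\nline 4\n")
multi_stream = first + second

# A single-stream decoder stops at the first end-of-stream marker
# and treats the rest of the input as leftover bytes.
decoder = bz2.BZ2Decompressor()
truncated = decoder.decompress(multi_stream)

# bz2.decompress() keeps going until the real end of the input.
complete = bz2.decompress(multi_stream)

print(truncated)                  # only the first two lines
print(len(decoder.unused_data))   # bytes of the second stream left unread
print(complete)                   # all four lines
```

A reader that behaves like the single-stream decoder reports no error; it just returns less data, which is why the problem is easy to miss.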

A few words about Python’s extended slices

One thing that most Python users learn at the very beginning is list slicing – selecting a part of a list using the samplelist[begin:end] syntax. It’s a great thing, but – surprisingly – many people don’t even know that there’s another form of this syntax with one additional parameter, “step”: samplelist[begin:end:step]. How does it work?
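As a quick taste, here is a minimal example of the step parameter; the contents of samplelist are just an assumption for the illustration.

```python
samplelist = list(range(10))          # [0, 1, 2, ..., 9]

every_second = samplelist[::2]        # [0, 2, 4, 6, 8]
every_third_in_range = samplelist[1:8:3]  # indices 1, 4, 7 -> [1, 4, 7]
backwards = samplelist[::-1]          # a reversed copy: [9, 8, ..., 0]

print(every_second)
print(every_third_in_range)
print(backwards)
```

A negative step walks the list from the end, which is why samplelist[::-1] is a common idiom for a reversed copy.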

Continue reading

Very good interactive Git workflow cheat sheet

A few minutes ago I was looking for a Git workflow cheat sheet to verify some rarely-used parts of my knowledge before doing something I might regret. Actually, I was hoping to find something very simple (preferably one PDF page or so), but instead I found this one, which is a very good, interactive webpage. So I decided to share this find with you, because it’s definitely worth it:

http://ndpsoftware.com/git-cheatsheet.html

Git workflow cheat sheet

Best of all, it presents everything in a very intuitive, visual way which is easy to understand. If you are looking for a command which will completely revert your committed changes, you can just click on “Local Repository” and see which arrow points to “Workspace” – it’s the git reset --hard one. How about keeping the changes you’ve made, but reverting the commit? That’s the arrow with git reset --soft. Brilliant!

It’s useful not only for people who are looking for a typical cheat sheet, but also for those who have trouble understanding the Git workflow.

I really like it – nice work guys!

More on Cassandra’s SimpleAuthority permissions

A few days ago I had some doubts about how Cassandra’s SimpleAuthenticator and SimpleAuthority really work. I mean – I was not sure how I should configure them to get the expected results. It may seem obvious now, but I had to look at the source code to find out what is possible and what is not. So, to save you some time, here’s a brief description.

Continue reading