Do you sometimes feel like you missed something completely obvious, something you really should have known about, that could have helped you many times, but you just had no idea it existed? Something you simply assumed wasn't there, so you moved along? I felt this way today. How did it happen? Well…
Just a short note this time. I’ve noticed that some people miss one thing when trying to understand Cassandra’s data model and the two “weakest” Consistency Levels for writes: ANY and ONE. What’s the difference?
For CL.ONE, at least one replica responsible for the given key (as determined by the Partitioner) must receive the request and write the data.
For CL.ANY, it is enough that any node receives the request, including a node that is not responsible for the key. In this case Hinted Handoff is used, and the data is sent to the proper replicas once they become available. However, keep in mind that the node that received such a write may crash before any of the replicas is able to receive the hinted write. If you then fail to recover that node (e.g. because of a hardware failure), the data is lost forever. You have been warned.
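To make the difference concrete, here is a toy Python model of when a single write is acknowledged under each level. It is a simplification for illustration, not Cassandra's code: the function write_outcome and its string results are made up, each replica is reduced to an "up" flag, and the node that received the request (the coordinator) is assumed to be alive.

```python
from typing import Optional, Sequence


def write_outcome(level: str, replicas_up: Sequence[bool]) -> Optional[str]:
    """Toy model: where does an acknowledged write live, or None if it fails?

    replicas_up holds one flag per replica that owns the key.
    The coordinator that received the request is assumed to be alive.
    """
    if level not in ("ONE", "ANY"):
        raise ValueError(f"this sketch only models ONE and ANY, got {level!r}")
    if any(replicas_up):
        # At least one owning replica wrote the data: both levels succeed.
        return "replica"
    if level == "ANY":
        # No replica is up, but ANY is satisfied by the coordinator's hint.
        # Until the hint is delivered, this node holds the only copy.
        return "hint-only"
    # CL.ONE with every replica down: real Cassandra answers with UnavailableException.
    return None


for level in ("ONE", "ANY"):
    print(level, write_outcome(level, [False, False, False]))
```

The "hint-only" result is the risky case described above: the write was acknowledged, yet it exists only on a node that does not own the key.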
Sometimes you might want to clean up your nodes by removing snapshots you don’t need. Even if you did not create them yourself, they might be there, because Cassandra takes snapshots before running scrub or truncate. However, removing them one by one across the whole cluster can be a pain, so I wrote a short script that does it.
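The original script is not reproduced here. As a rough illustration of the same idea only, a Python sketch could loop over the nodes and run nodetool clearsnapshot against each one using nodetool's -h (host) option. The host names are hypothetical, and the sketch defaults to a dry run that just prints the commands. On older Cassandra versions clearsnapshot without a snapshot name removes all snapshots; newer releases may require an extra flag for that, so check nodetool help clearsnapshot for your version.

```python
import subprocess
from typing import List, Sequence

# Hypothetical host list: replace with the nodes of your own cluster.
HOSTS = ["cass01", "cass02", "cass03"]


def clearsnapshot_commands(hosts: Sequence[str]) -> List[List[str]]:
    """Build one `nodetool -h <host> clearsnapshot` command per node."""
    return [["nodetool", "-h", host, "clearsnapshot"] for host in hosts]


def clear_all_snapshots(hosts: Sequence[str], dry_run: bool = True) -> List[List[str]]:
    """Remove snapshots on every node; with dry_run=True only print the commands."""
    commands = clearsnapshot_commands(hosts)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first node where nodetool fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    clear_all_snapshots(HOSTS)  # pass dry_run=False once the output looks right
```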
If you are a Cassandra user, you’re probably experienced enough to know how to stop or restart Linux services; that’s an obvious thing. However, in some cases stopping a service abruptly can cause problems, especially if other services depend on it. While Cassandra is very robust and crash-safe (pkill -9 cassandra works fine ;-) ), it’s never a bad idea to do things in a way that minimizes the risk of something going wrong. Another advantage of a clean Cassandra restart procedure is a shorter startup time. Here is how to do it.
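The exact procedure is not shown here, so treat the following as one commonly used order of steps rather than the one from the original post: stop gossip so the rest of the cluster marks the node as down, stop accepting client requests, then run nodetool drain, which flushes memtables to SSTables so there is little or nothing left in the commit log to replay on the next start (that is where startup time is saved). The service name cassandra and the use of the service command are assumptions about your installation; disablethrift applies to Thrift-era versions, and newer versions also offer disablebinary for the native protocol.

```python
import subprocess
from typing import List

# One common clean-shutdown order (an assumption, not necessarily the original post's steps).
SHUTDOWN_STEPS: List[List[str]] = [
    ["nodetool", "disablegossip"],     # other nodes will soon mark this one as down
    ["nodetool", "disablethrift"],     # stop serving Thrift clients (Thrift-era versions)
    ["nodetool", "drain"],             # flush memtables and stop accepting writes
    ["service", "cassandra", "stop"],  # service name is an assumption
]
START_STEP: List[str] = ["service", "cassandra", "start"]


def clean_restart(dry_run: bool = True) -> List[List[str]]:
    """Run the shutdown steps, then start Cassandra again; dry_run only prints them."""
    steps = SHUTDOWN_STEPS + [START_STEP]
    for cmd in steps:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # abort if any step fails
    return steps


if __name__ == "__main__":
    clean_restart()
```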
Just for people who care: I’m alive and I’m fine. I’ve had a lot of work recently, including wedding preparations, but this blog is not dead! I’m still coding, I’m still hacking and I will be posting again soon! Stay tuned! :-)
I was playing a bit with some virtual machines I need for testing when, after a reboot, I noticed that sendmail was starting very slowly: it took about 3-4 minutes to get it working. I checked the log to see what was wrong:
[root@hdps01 ~]# tail /var/log/maillog
Jul 11 21:26:43 hdps01 sm-msp-queue: My unqualified host name (hdps01) unknown; sleeping for retry
Jul 11 21:27:43 hdps01 sm-msp-queue: unable to qualify my own domain name (hdps01) -- using short name
Jul 11 21:27:43 hdps01 sm-msp-queue: starting daemon (8.14.4): queueing@01:00:00
So, what was the problem?
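The log already points in the right direction: sendmail wants a fully qualified host name, and when it cannot find one it sleeps and retries before falling back to the short name (note the one-minute gap between the first two entries). A quick way to see what the machine reports about itself is Python's socket.gethostname() and socket.getfqdn(). The dot-based check below is a rough heuristic for illustration, not sendmail's actual logic.

```python
import socket


def is_fully_qualified(name: str) -> bool:
    """Rough heuristic: treat a name as qualified if it contains a dot (e.g. host.example.com)."""
    return "." in name.strip(".")


if __name__ == "__main__":
    short = socket.gethostname()
    fqdn = socket.getfqdn()  # may consult /etc/hosts and DNS
    print(f"hostname:  {short}")
    print(f"getfqdn(): {fqdn}")
    if not is_fully_qualified(fqdn):
        print("No fully qualified name found; sendmail is likely to complain as in the log above.")
```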
Or one day you may be surprised to see that your output seems to be missing a lot of data. The problem affects Hadoop versions older than 1.4 (according to Jira) and is caused by how files compressed with pbzip2 are read: pbzip2 writes its output as several concatenated bzip2 streams, and the end-of-stream (EOS) marker of the first stream is interpreted as end of file (EOF), so Hadoop, obviously, stops reading the file there.
So, if your Hadoop is misbehaving and your output data looks odd for no apparent reason, ask your admins whether they have switched from bzip2 to pbzip2.
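The EOS-versus-EOF mix-up is easy to reproduce with Python's standard bz2 module. This has nothing to do with Hadoop's own code, but it shows the same effect: a decompressor that stops at the first end-of-stream marker silently drops everything after it, while a reader that keeps going until the input is really exhausted gets all the data. The sample input imitates pbzip2's output by concatenating two independently compressed streams.

```python
import bz2


def first_stream_only(data: bytes) -> bytes:
    """Mimic the buggy reader: stop at the first end-of-stream (EOS) marker."""
    dec = bz2.BZ2Decompressor()
    # Everything after the first stream is left in dec.unused_data and ignored here.
    return dec.decompress(data)


def all_streams(data: bytes) -> bytes:
    """Keep decompressing stream after stream until the input runs out (real EOF)."""
    chunks = []
    while data:
        dec = bz2.BZ2Decompressor()
        chunks.append(dec.decompress(data))
        data = dec.unused_data  # bytes belonging to the next stream, if any
    return b"".join(chunks)


# pbzip2-style input: independent bzip2 streams written back to back.
sample = bz2.compress(b"line 1\n") + bz2.compress(b"line 2\n")

print(first_stream_only(sample))  # b'line 1\n'
print(all_streams(sample))        # b'line 1\nline 2\n'
```

In Python 3.3 and later, bz2.decompress() itself also handles multi-stream input, which is the behaviour a fixed reader should have.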
One thing most Python users learn at the very beginning is list slicing: selecting part of a list with the samplelist[begin:end] syntax. It’s a great feature, but, surprisingly, many people don’t know that there’s an extended form with one additional parameter, “step”: samplelist[begin:end:step]. How does it work?
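A few quick examples on a list of ten numbers show the idea: begin is inclusive, end is exclusive, and step says how many positions to jump each time (the default is 1). A negative step walks backwards, and then the omitted begin and end default to the end and the start of the list respectively.

```python
samplelist = list(range(10))  # [0, 1, 2, ..., 9]

print(samplelist[::2])     # every second element: [0, 2, 4, 6, 8]
print(samplelist[1:8:3])   # from index 1, stop before 8, step 3: [1, 4, 7]
print(samplelist[::-1])    # negative step gives a reversed copy: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(samplelist[8:2:-2])  # backwards from index 8, stop before index 2: [8, 6, 4]
```

The same syntax works for strings and tuples; a step of 0 raises ValueError.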
A few minutes ago I was looking for a Git workflow cheat sheet to verify some rarely used parts of my knowledge before doing something I might regret. I was hoping to find something very simple (preferably one PDF page or so), but instead I found this one, which is a very good interactive webpage. I decided to share it with you, because it’s definitely worth it:
The best thing about it is that it presents everything in a very intuitive, visual way that is easy to understand. If you are looking for a command that will completely revert your committed changes, just click on “Local Repository” and see which arrow points to “Workspace”: it’s git reset --hard. What about undoing a commit but keeping the changes you’ve made? That’s the arrow labelled git reset --soft. Brilliant!
It’s useful not only for people looking for a typical cheat sheet, but also for those who have trouble understanding the Git workflow.
I really like it – nice work guys!
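If you would rather see the two git reset arrows in action than take the diagram's word for it, this Python sketch builds a throwaway repository with two commits and then runs git reset --soft or git reset --hard to go back one commit. It assumes git is on your PATH; the file name, commit messages and the demo identity passed with -c are arbitrary placeholders.

```python
import os
import subprocess
import tempfile
from typing import List, Tuple


def git(repo: str, *args: str) -> str:
    """Run a git command inside `repo` with a throwaway identity; return its stdout."""
    cmd = ["git", "-c", "user.name=demo", "-c", "user.email=demo@example.com",
           "-c", "commit.gpgsign=false", *args]
    return subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True).stdout


def demo(mode: str) -> Tuple[str, List[str]]:
    """Make two commits, run `git reset --<mode> HEAD~1`, report file content and staged files."""
    with tempfile.TemporaryDirectory() as repo:
        git(repo, "init", "-q")
        path = os.path.join(repo, "notes.txt")
        with open(path, "w") as f:
            f.write("first\n")
        git(repo, "add", "notes.txt")
        git(repo, "commit", "-q", "-m", "first")
        with open(path, "a") as f:
            f.write("second\n")
        git(repo, "commit", "-q", "-am", "second")

        git(repo, "reset", f"--{mode}", "HEAD~1")  # move HEAD back one commit

        with open(path) as f:
            content = f.read()
        staged = git(repo, "diff", "--cached", "--name-only").split()
        return content, staged


if __name__ == "__main__":
    print("soft:", demo("soft"))  # change kept in the file and still staged
    print("hard:", demo("hard"))  # change gone from the working tree
```

With --soft only the commit is undone, so the second line is still in the file and staged; with --hard the working tree is reset as well and the change is gone.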
A few days ago I had some doubts about how Cassandra’s SimpleAuthenticator and SimpleAuthority really work. I mean, I was not sure how I should configure them to get the expected results. It may seem obvious now, but I had to look at the source code to find out what is possible and what is not. So, to save you some time, here’s a brief description.