About two weeks ago I felt a strong need to learn something new: a language, a framework… Anything! Just to make my brain work a bit harder. After looking around for a few days (sorry Scala and Erlang, you have to wait a bit longer!) I decided to become more familiar with the cloud application platforms that are becoming popular these days (or the PaaS model in general). Because I think that real projects are much better than rewriting tutorials and reading docs, I decided to write a small Flask web app and deploy it on Heroku. After a few days of learning Flask, coding, and running my app on the built-in Flask server, I decided to move to a more production-ready stage by running it with Gunicorn. And here the story begins…
Sometimes you might want to clean up your nodes by removing snapshots that you don’t need. Even if you did not create them, they might be there: Cassandra makes them before a scrub or before a truncate. However, removing them one by one from the whole cluster can be a pain, so I wrote a short script that does it.
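A minimal sketch of what such a cleanup could look like (this is not the script from the post itself). It assumes that `nodetool` is on the PATH and can reach every node; the host names and the optional snapshot tag are hypothetical, and newer Cassandra versions may require extra arguments to `clearsnapshot`.

```python
import subprocess


def clearsnapshot_commands(hosts, tag=None):
    """Build one `nodetool clearsnapshot` command per host (hosts are hypothetical)."""
    commands = []
    for host in hosts:
        cmd = ["nodetool", "-h", host, "clearsnapshot"]
        if tag:
            # Restrict the cleanup to a single named snapshot.
            cmd += ["-t", tag]
        commands.append(cmd)
    return commands


def clear_snapshots(hosts, tag=None, dry_run=True):
    """Print the commands by default; only execute them when dry_run is False."""
    for cmd in clearsnapshot_commands(hosts, tag):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Example host names only; replace them with the nodes of your own cluster.
    clear_snapshots(["cass-node1", "cass-node2"], dry_run=True)
```

Keeping `dry_run=True` as the default lets you review the exact commands before anything is deleted.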
If you are a Cassandra user, you’re probably experienced enough to know how to stop or restart Linux services; that’s an obvious thing. However, in some cases it can be a problem when a service goes down abruptly, especially if other services have been using it. While Cassandra is very robust and crash-safe (pkill -9 cassandra works fine ;-) ), it’s never a bad idea to do things in a way that minimizes the risk of something going wrong. The other advantage of a clean Cassandra restart procedure is that it saves some startup time. Here is how to do it.
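One possible shape of a clean restart, as a rough sketch rather than the exact procedure from the post: drain the node first, so that memtables are flushed and there is less commit log to replay at startup, and only then stop and start the service. The service name `cassandra` and the use of the `service` command are assumptions about the installation.

```python
import subprocess


def clean_restart_steps(host="localhost", service="cassandra"):
    """Return the commands for a drain-then-restart sequence (service name is assumed)."""
    return [
        # Flush memtables and stop accepting writes on this node.
        ["nodetool", "-h", host, "drain"],
        ["service", service, "stop"],
        ["service", service, "start"],
    ]


def run_steps(steps, dry_run=True):
    """Print each step by default; execute in order and stop on the first failure otherwise."""
    for cmd in steps:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_steps(clean_restart_steps(), dry_run=True)
```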
A few days ago I was about to upgrade a Cassandra cluster from 1.1.0 (plus an authentication patch I wrote) to 1.1.6 when, a bit surprisingly, I realized that something was wrong with the new version. I had no problems creating keyspaces before, as I had set the proper modify-keyspaces property in access.properties, but after the upgrade it stopped working. After a short investigation I found out that there were some significant changes in Cassandra’s permission system which broke SimpleAuthenticator. This article is about how to make it work again.
It’s not always enough to have only one virtual machine for testing purposes. In fact, in most cases it’s not enough. One such case is when you want to try a multi-node configuration for software like Hadoop or Cassandra, or run a failover test of your system. For me, the most comfortable and useful setup is one that allows the guest system to use the Internet (usually the default NAT mode), makes it easy to connect to the guest system from the host system (without port forwarding), and lets the guest systems “talk” to each other. That is good enough to mimic most of the production configurations I use. To do this, we need to set up a two-network-card configuration for all the guest systems we have. This article is about how to make it work.
I was playing a bit with some virtual machines I need for testing when, after a reboot, I noticed that sendmail was starting very slowly: it took about 3-4 minutes to get it working. I checked the log to see what was wrong:
[root@hdps01 ~]# tail /var/log/maillog
Jul 11 21:26:43 hdps01 sm-msp-queue: My unqualified host name (hdps01) unknown; sleeping for retry
Jul 11 21:27:43 hdps01 sm-msp-queue: unable to qualify my own domain name (hdps01) -- using short name
Jul 11 21:27:43 hdps01 sm-msp-queue: starting daemon (8.14.4): queueing@01:00:00
So, what was the problem?
Or you may be surprised one day when you see that your output looks like it’s missing a lot of data. The problem affects Hadoop versions older than 1.4 (according to Jira) and is caused by the end-of-stream (EOS) marker in compressed files being misinterpreted as end-of-file (EOF), so Hadoop simply stops reading the file at that point.
So, if your Hadoop is misbehaving and your output data looks odd for no apparent reason, ask your admins whether they have switched from bzip2 to pbzip2.
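The effect can be illustrated outside Hadoop with a small sketch. It assumes, as far as I understand pbzip2's output, that the file consists of several bzip2 streams written one after another; here two such streams are simply concatenated by hand. A reader that stops at the first end-of-stream sees only part of the data, while one that keeps going sees everything. The function names are made up for the illustration and are not Hadoop APIs.

```python
import bz2

# Two independently compressed streams, concatenated into one "file".
PART_A = b"first block of records\n"
PART_B = b"second block of records\n"
MULTI_STREAM = bz2.compress(PART_A) + bz2.compress(PART_B)


def read_first_stream_only(data):
    """Mimic a reader that treats end-of-stream as end-of-file."""
    decompressor = bz2.BZ2Decompressor()
    return decompressor.decompress(data)


def read_all_streams(data):
    """Keep starting a new decompressor until no bytes are left over."""
    chunks = []
    while data:
        decompressor = bz2.BZ2Decompressor()
        chunks.append(decompressor.decompress(data))
        # Whatever follows the end of this stream belongs to the next one.
        data = decompressor.unused_data
    return b"".join(chunks)


if __name__ == "__main__":
    print(len(read_first_stream_only(MULTI_STREAM)), "bytes from a naive reader")
    print(len(read_all_streams(MULTI_STREAM)), "bytes from a multi-stream reader")
```

The naive reader fails silently: it returns valid data, just not all of it, which is why the missing output is so easy to overlook.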
Today I was asked to set up user authentication in Cassandra, so that we could stop relying solely on the “default” user with unrestricted access. I have to say that I was really surprised when I noticed that there’s NO out-of-the-box authentication and authorization framework in it. Luckily, it can be easily enabled in a few steps which I’m going to show you.
One important thing: the SimpleAuthenticator we’re going to use lives in the “examples” directory of the Cassandra package. That’s because it is considered very simple and not very safe (it was even called a “toy” in one of Cassandra’s Jira tasks), so DO NOT rely on it as a serious protection tool for your system. However, it still fits many requirements (e.g. you don’t want a user to make a mess in a Column Family they don’t need to work on), so you may find it useful. You have been warned.
Using Hadoop may quickly become very annoying if you have to navigate through the HDFS filesystem with the standard hadoop command. As a Linux user I got used to the TAB autocompletion feature, which lets me move around my filesystem quickly and easily, so I was really disappointed by this limitation. Luckily, there’s a solution which eased my pain!
A few weeks ago I bought a new laptop and installed a fresh Xubuntu 11.10 on it. Surprisingly (and sadly), when I tried to connect an external display to it, I found out that this is impossible in the “out of the box” version of Xubuntu. My only choices were to use the external display instead of the internal one, or to use a “mirror” configuration with the same image on both displays… Satisfying? No, thanks. I had to find out how to make it work the way I wanted.