Mounting HDFS cluster as a local filesystem with hadoop-fuse

Using Hadoop can quickly become very annoying if you have to navigate the HDFS filesystem with the standard hadoop command. As a Linux user I got used to TAB-autocompletion, which lets me move around my filesystem quickly and easily, so I was really disappointed by this limitation. Luckily, there’s a solution that eased my pain!


Installing Hadoop on Ubuntu 11.10 Oneiric Ocelot

Just a quick tip: if you’re setting up Hadoop on Ubuntu 11.10 and wonder whether the Maverick packages will work, yes, they will. Just follow the installation guide, putting these lines:

deb http://archive.cloudera.com/debian maverick-cdh3 contrib
deb-src http://archive.cloudera.com/debian maverick-cdh3 contrib

into /etc/apt/sources.list.d/cloudera.list. That’s it!

OK, not exactly. Some people (myself included) report a NullPointerException, something similar to this:

Error: java.lang.NullPointerException
        at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2683)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2605)

If you run into it, check your Hadoop host settings (preferably use an IP address instead of a hostname) and/or /etc/hosts, which may contain a strange entry with something like .(null) in it. Leave just one proper hostname on that line. For me, it started working after this fix.
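If you want to look for such entries before editing /etc/hosts by hand, a small script can help. This is a minimal Python sketch that assumes the broken entries contain the literal text (null) inside a hostname; the helper name find_null_host_entries and the sample hostname myhost.(null) are hypothetical, not taken from the original setup. It only reports lines and changes nothing.

```python
def find_null_host_entries(hosts_text):
    """Return (line_number, line) pairs whose hostnames contain '(null)'.

    Hypothetical helper: it only inspects text in /etc/hosts format.
    """
    suspicious = []
    for number, raw in enumerate(hosts_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue  # skip blank and comment-only lines
        fields = line.split()
        hostnames = fields[1:]  # the first field is the IP address
        if any("(null)" in name for name in hostnames):
            suspicious.append((number, raw))
    return suspicious


# Hypothetical sample mimicking the kind of entry described above.
sample = "127.0.0.1 localhost\n127.0.1.1 myhost.(null) myhost\n"
for number, line in find_null_host_entries(sample):
    print("line %d looks suspicious: %s" % (number, line))
```

On a real machine you would pass it the contents of /etc/hosts (for example open("/etc/hosts").read()); the fix itself, leaving one proper hostname on the line, is still a manual edit.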

I won’t go as far as saying it’s OK for a production environment (a production server on Ubuntu? OK…), but for testing it works perfectly.

Strange behavior of default function parameters in Python

Sometimes things don’t work the way we want them to. Today I was asked why this piece of code doesn’t work properly (OK, the actual problem was a bit different and much more “real-life-applicable”, but this is just an example):

def mypush(val, mylist=[]):
    mylist.append(val)
    print(mylist, ': ', id(mylist))

lst = []
mypush(1, lst)   # passing an explicit list
mypush(2, lst)
mypush(1)        # relying on the default list
mypush(2)

The output is:

[1]    :  139750946213184
[1, 2] :  139750946213184
[1]    :  139750946218496
[1, 2] :  139750946218496

What’s wrong with it? NOTHING. That’s exactly how Python is supposed to behave in this case.

Do you agree? If so, stop reading, because you won’t learn anything new. Go read XKCD instead. If not, here’s a brief explanation:
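In short, the standard explanation (summarized here, not quoted from the full post) is that default values are evaluated once, when the def statement runs, and stored on the function object, so every call that omits mylist shares the same list. This minimal Python 3 sketch shows the shared default via __defaults__ and the common None-sentinel workaround; the names push_shared and push_fresh are hypothetical.

```python
def push_shared(val, mylist=[]):
    # The [] above is evaluated once, when "def" executes, so every call
    # that omits mylist appends to the very same list object.
    mylist.append(val)
    return mylist


def push_fresh(val, mylist=None):
    # None is a sentinel: a new list is created on each call that omits mylist.
    if mylist is None:
        mylist = []
    mylist.append(val)
    return mylist


# The shared default lives on the function object itself.
print(push_shared.__defaults__)  # ([],)
push_shared(1)
push_shared(2)
print(push_shared.__defaults__)  # ([1, 2],)

print(push_fresh(1))  # [1]
print(push_fresh(2))  # [2]
```

The sentinel version still accepts an explicit list, so callers that pass one (like lst above) behave exactly as before.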


More on swapping two variables without using the third one

No matter what you do, sometimes you just need to swap two values; I guess you’ve done it a thousand times. And yes, I know, this problem is trivial. In Python it’s completely trivial: you just swap two variables in the most “natural” way:

a, b = b, a

instead of doing it this way:

tmp = a
a = b
b = tmp

Beautiful! This is why we love Python, isn’t it?

That’s where I would end this post if it were only about the “pythonic” way. But it isn’t: it’s about the ideas, not solutions in any specific language. So…
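For reference, here are the two textbook ways to swap integers without a temporary variable, the XOR swap and the add/subtract swap, as a minimal Python sketch. They are not necessarily the variants the full post discusses, and the function names are hypothetical.

```python
def xor_swap(a, b):
    # Each step is reversible because x ^ x == 0 and x ^ 0 == x.
    a = a ^ b  # a now holds a0 ^ b0
    b = a ^ b  # (a0 ^ b0) ^ b0 == a0
    a = a ^ b  # (a0 ^ b0) ^ a0 == b0
    return a, b


def add_sub_swap(a, b):
    # Python ints are arbitrary precision, so a + b cannot overflow here;
    # in languages with fixed-width integers it can.
    a = a + b  # a now holds a0 + b0
    b = a - b  # (a0 + b0) - b0 == a0
    a = a - b  # (a0 + b0) - a0 == b0
    return a, b


print(xor_swap(3, 5))      # (5, 3)
print(add_sub_swap(3, 5))  # (5, 3)
```

Both only make sense for integers; for anything else in Python, a, b = b, a remains the way to go.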
