Contrary to what the title may seem to imply, I’m not making any grand proclamations here. Rather, inspired by a discussion with a friend and co-author on Facebook this morning, I’m going to note one fairly common data analysis case in which Python (NumPy) behaves in a totally straightforward manner, R in a similar but slightly less straightforward manner, and Matlab in an annoying and not particularly straightforward manner.
The case is the calculation of means (or other functions) along specified axes of multidimensional arrays.
In IPython (with pylab invoked), you specify the axis along which you want to apply the function of interest, and, at least in my opinion, you get output arrays that are exactly the shape you would expect. If you have a 3×4×5 array and you calculate the mean along the first axis, you get a 4×5 array, and analogously for the second and third axes:
In : X = random.normal(size=(3,4,5))
In : X.mean(axis=0).shape
Out: (4, 5)
In : X.mean(axis=1).shape
Out: (3, 5)
In : X.mean(axis=2).shape
Out: (3, 4)
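NumPy's `axis` argument also takes a tuple of axes, and `keepdims=True` leaves each reduced axis in place with length 1 instead of removing it. The sketch below is a standalone script rather than a pylab session, so it imports NumPy explicitly and uses `np.random.default_rng` (available in NumPy 1.17 and later) to generate an array shaped like `X` above.

```python
import numpy as np

# A 3x4x5 array of standard normal draws, shaped like X in the session above.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4, 5))

# Reducing over a tuple of axes removes all of them at once.
print(X.mean(axis=(0, 2)).shape)            # (4,)

# keepdims=True keeps each reduced axis as a length-1 axis,
# so the result still broadcasts against X.
print(X.mean(axis=0, keepdims=True).shape)  # (1, 4, 5)
print(X.mean(axis=2, keepdims=True).shape)  # (3, 4, 1)

# Typical use: subtract the mean taken along the last axis.
centered = X - X.mean(axis=2, keepdims=True)
print(centered.shape)                       # (3, 4, 5)
```

The default (`keepdims=False`) gives the shapes shown in the session; `keepdims=True` is there for the cases where you want to broadcast the result back against the original array.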
In R, you get similarly sensible results, but you have to specify the axes along which you don’t want to apply the function (which I find much more confusing than the Python approach shown above):
> X = array(rnorm(60),dim=c(3,4,5))
> dim(apply(X,c(2,3),mean))
[1] 4 5
> dim(apply(X,c(1,3),mean))
[1] 3 5
> dim(apply(X,c(1,2),mean))
[1] 3 4
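To make the difference between the two conventions concrete, here is a rough Python sketch of R's "name the axes you keep" style built on top of NumPy's "name the axes you reduce" style. The helper `apply_mean` is hypothetical, written only for this comparison. It takes 1-based axis numbers like R's `MARGIN` and averages over all the others. Unlike R's `apply`, it always returns the kept axes in their original order, even if `margin` lists them out of order.

```python
import numpy as np

def apply_mean(X, margin):
    """Hypothetical helper: mean over every axis NOT listed in `margin`,
    loosely mimicking R's apply(X, MARGIN, mean). `margin` is 1-based,
    as in R; the kept axes stay in their original order."""
    keep = {m - 1 for m in margin}                      # convert to 0-based
    drop = tuple(ax for ax in range(X.ndim) if ax not in keep)
    return X.mean(axis=drop)

X = np.random.default_rng(0).normal(size=(3, 4, 5))
print(apply_mean(X, (2, 3)).shape)  # (4, 5), same values as X.mean(axis=0)
print(apply_mean(X, (1, 3)).shape)  # (3, 5), same values as X.mean(axis=1)
print(apply_mean(X, (1, 2)).shape)  # (3, 4), same values as X.mean(axis=2)
```

The translation is just a set complement over the axis numbers, which is why the two approaches yield the same shapes even though they are phrased in opposite ways.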
And here’s what Matlab does:
>> X = random(makedist('Normal'),3,4,5);
>> size(mean(X,1))
ans =
     1     4     5
>> size(mean(X,2))
ans =
     3     1     5
>> size(mean(X,3))
ans =
     3     4
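Ugh. Matlab keeps the reduced dimension (as a length-1 axis) when you apply the function along the first or second axis, but it disappears when you apply it along the third. The cause is that Matlab automatically drops trailing singleton dimensions beyond the second, so the 3×4×1 result is reported as 3×4. This is annoying.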
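One way to see where Matlab's shapes come from is to reproduce them in NumPy: reduce with `keepdims=True` (so the averaged axis stays as length 1), then strip trailing length-1 dimensions while keeping at least two. The function `matlab_size` below is a hypothetical illustration of that assumed rule, not a Matlab API, and it only handles shapes with at least two dimensions.

```python
import numpy as np

def matlab_size(shape):
    """Hypothetical: report a shape the way Matlab's size() is assumed to,
    dropping trailing singleton dimensions but keeping at least two."""
    dims = list(shape)
    while len(dims) > 2 and dims[-1] == 1:
        dims.pop()
    return tuple(dims)

X = np.random.default_rng(0).normal(size=(3, 4, 5))
for axis in range(3):
    # keepdims=True mimics Matlab leaving the reduced axis at length 1.
    reduced = X.mean(axis=axis, keepdims=True)
    print(axis + 1, matlab_size(reduced.shape))
# 1 (1, 4, 5)
# 2 (3, 1, 5)
# 3 (3, 4)
```

Under this rule only a trailing singleton is dropped: `matlab_size((3, 4, 1, 6))` returns `(3, 4, 1, 6)`, so averaging along the third axis of a four-dimensional array would keep that axis.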
So, in summary, I continue to be happy using Python for almost everything, R for most of what's left over, and Matlab only very rarely these days.