Clean code == Good logic

November 28, 2010

A short post here, after some discussions I’ve had on Twitter over the past week or two. I graduated nearly five years ago, and since then have seen lots of production source code, mainly high-performance/games related – real-time, it-must-go-fast stuff. Since I was a graduate only five years ago, it’s still quite fresh in my memory what sort of things were taught.


Design patterns are something which have given me, and certainly many of my coworkers, a mountain of grief when it comes to getting performance from code. I can remember them being taught as some sort of rule book at university: something you refer to when you have a certain logic issue, and then implement that pattern. Unfortunately this ideal now exists in many graduates, and is therefore quite ingrained into current programming circles. In my opinion we have a bad habit: a design pattern copy-paste culture.

I’m not going to go into why some design patterns can cause havoc when you come to optimize; hell, some may make it easier. It’s basically the lack of thought that now goes into implementations, meaning, for example, that a generic pattern may be applied to a problem which really doesn’t need it. But this code usually sits in the foundation of systems – the hardest part to modify given tight deadlines late in a project.

Some may say that this runtime performance drop is a worthwhile price to pay for clean code; of course, that is certainly not true in my line of work. For me, solving a problem in a really efficient way for the hardware makes things nice; for others it’s hiding everything under a generic base and having the source ‘look’ nice (whatever the bloody hell that means). Don’t get me wrong, bad implementations have always existed, but I think the way design patterns are being taught is not helping.

Really, this is lie #1 from Mike Acton’s GDC 2010 “Three Big Lies in Game Development”: software is a platform. It isn’t – the hardware is.

One person’s clean source code is someone else’s nightmare, and another’s ‘bad’ source an abomination to others. The fact here, though, is that regardless of how the source ‘looks and feels’, the compiler can generate pretty good machine code for both implementations, and can try to optimize things as much as possible – obviously only to the extent which the logic of the solution allows. The real problem here is that a solution is defined by its logic, something the compiler cannot modify.


Whilst the compiler can generate ‘good’ machine code from ‘bad’ source code, it can not replace bad logic with good logic.

I think that about says it all, really. Before you go pasting that Design Pattern into your project, really think: Does this solve the problem in the best possible way for my hardware?


Migrating scalar Vec4 classes to SIMD

November 26, 2010

Something I have come up against at various times is a Vector class (that is, a Vec3/Vec4 etc., not a large storage container) implemented for a platform that supports SIMD, like AltiVec/VMX, but which simply uses floats aligned in a structure:

class Vector4 {
public:
  float x ALIGN(16), y, z, w;
  /* insert operators, constructors, etc here */
};

Obviously this works just fine, however slowly (even if you use vector intrinsics in your operator methods, the compiler will probably force the data out to memory, amongst many other things). When the time comes that you actually want to move over to using real hardware-supported vectors, passing values in vector registers, the biggest issue seems to be that pretty much your whole codebase now accesses the data via .x, .y, .z and .w.

One way to get it working is to union your vector float data member with the four scalar components, which at first glance seems like it should work fine, but in my experience most compilers just don’t seem to produce even semi-optimal code for it, so that option is completely out.

Most implementations using VMX types will offer .SetX(float) and .GetX() methods which use the correct intrinsics so as to be optimal, but when migrating code over, having to separate the accesses into distinct read and write operations and handle them with different methods can be very time consuming. Sadly, there will always be source changes, but I would like to keep them to a minimum; changing .x to .x() would be OK, but it needs to work for both reading and writing.

One method is to have .x() return a float reference to the data at that element of the vector, like &vec[1], blah blah. When I read this, though, it always worries me, as the statement involves address-of, which then implies memory. The compiler may be able to optimize so as not to force anything into memory, but most times it doesn’t seem to manage it. This can always be our backup plan.

What I attempted was to make use of a temporary class returned by the x() method of the vector to indirectly modify a reference to our real VMX vector. Here is the basic code (just typed out, so sorry if it doesn’t compile first time on a copy and paste – and sorry for the bad source formatting throughout; I need to set up some sort of WordPress plugin for that. Thanks to spoofmeat in the comments for directing me to the correct sourcecode tag!)

class VMXVectorClass {
public:
  vec_float4 m128;

  template <int element>
  class dScalar {
  public:
    dScalar(vec_float4& _v) : v(_v) {}

    // read: extract this element straight out of the vector register
    inline operator float() const {
      return vec_extract(v, element);
    }

    // write: vec_insert returns the updated vector, so assign it back
    inline float operator=(float f) {
      v = vec_insert(f, v, element);
      return f;
    }

  private:
    vec_float4& v;
  };

  inline dScalar<0> x() { return dScalar<0>(m128); }
  inline dScalar<1> y() { return dScalar<1>(m128); }
  inline dScalar<2> z() { return dScalar<2>(m128); }

  /* the rest of the vector goes here */

} __attribute__((aligned(16))); /* plus the vecreturn/pass-in-vector attributes as appropriate */

To my surprise, this worked very well in my simple test cases. The compiler optimised away the vec_float4 reference, everything was passed in registers, good times. Obviously I need to look into this more to see if it is actually better than simply returning a float reference to the vector in ‘memory’. One issue that I noticed immediately is that you can’t pass .x() into variadic (…) calls, like printf. You must explicitly cast to float, which is obvious when you see what the code does – you don’t want the actual dScalar class itself passed into printf, you want the float.

I’d like feedback on what other folk have tried, and whether they know of a good benchmark I could try this method on; I’m especially interested to see if it is indeed any better than the simple float& x() version. I had a look at some of the generated code and it didn’t seem too different from explicit .SetX and .GetX calls, but a run-time example to test would be best – one that mixed significant non-scalar usage of the vectors with the accessors, to make sure things were not being forced out of registers.

An interesting side-effect of using a temporary class: if it turned out you needed to migrate further over to the .SetX and .GetX methods at a later date, you could use this class to identify the cases that needed .SetX by adding the deprecated attribute to the operator= method, and then just work through the warnings.