Migrating scalar Vec4 classes to SIMD

Something I have come up against at various times is a Vector class (that is, a Vec3/Vec4 etc., not a large storage container) written for a platform that supports SIMD, like AltiVec/VMX, but which simply uses floats, aligned in a structure.

class Vector4 {
public:
  float x ALIGN(16), y, z, w;
  /* insert operators, constructors, etc here */
};

Obviously this works just fine, but it is slow (even if you use vector intrinsics in your operator methods, the compiler will probably force the data to memory, amongst many other things). When the time comes that you actually want to move over to real hardware-supported vectors, passed in vector registers, the biggest issue seems to be that pretty much your whole codebase now accesses data via .x, .y, .z and .w.

One way to get it working is to union your vector float data member with the four scalar components, which at first glance seems like it should work fine, but in my experience most compilers just don’t seem to produce even semi-optimal code, so that option is completely out.

Most implementations using VMX types will offer .SetX(float) and .GetX() methods, which use the correct intrinsics so as to be optimal. When migrating code over, though, having to separate the accesses into distinct read and write operations and handle them with different methods can be very time consuming. Sadly, there will always be source changes, but I would like to keep them to a minimum, so changing .x to .x() would be OK, but it needs to work for both reading and writing.
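A rough sketch of what that read/write split looks like in practice is below. It is only an illustration under stated assumptions: vec_float4, vec_extract and vec_insert are plain C++ stand-ins defined here so the snippet compiles anywhere (real code would use the platform's vector type and intrinsics), and Vector4 and scaleX are hypothetical names.

```cpp
#include <cassert>

// Stand-ins (assumptions): real code would use the platform's vec_float4
// type and its vec_extract / vec_insert intrinsics instead of these.
struct vec_float4 { float f[4]; };
inline float vec_extract(const vec_float4& v, int i) {
  assert(i >= 0 && i < 4);
  return v.f[i];
}
inline vec_float4 vec_insert(float s, vec_float4 v, int i) {
  assert(i >= 0 && i < 4);
  v.f[i] = s;
  return v;
}

// Hypothetical vector class exposing separate getters and setters.
class Vector4 {
  vec_float4 m128;
public:
  Vector4(float x, float y, float z, float w) {
    m128.f[0] = x; m128.f[1] = y; m128.f[2] = z; m128.f[3] = w;
  }
  float GetX() const { return vec_extract(m128, 0); }
  float GetY() const { return vec_extract(m128, 1); }
  void SetX(float f) { m128 = vec_insert(f, m128, 0); }
  void SetY(float f) { m128 = vec_insert(f, m128, 1); }
};

// Scalar-era code:   v.x = v.x * 2.0f + v.y;
// Migrated code has to split every access into a read or a write by hand.
inline void scaleX(Vector4& v) {
  v.SetX(v.GetX() * 2.0f + v.GetY());
}
```

Every statement that both reads and writes a component needs this kind of manual rewrite, which is where the migration time goes.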

One method is to have .x() return a float reference to the data at that element of the vector, along the lines of &vec[0] for x. Whenever I read this it worries me, though, as the statement involves address-of, which implies memory. The compiler may be able to optimise so as not to force anything into memory, but most of the time it doesn’t seem to be able to. This can always be our backup plan.
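A minimal sketch of the reference-returning version might look like the following. The vec_float4 here is a plain struct standing in for the real vector type (an assumption so the snippet is portable), and Vector4, at and scaleX are hypothetical names. With a real hardware vector type, getting a float& would typically mean taking the address of the vector's contents, which is exactly the part that may force it into memory.

```cpp
#include <cassert>

// Stand-in (assumption): a plain struct instead of the platform's vec_float4.
struct vec_float4 { float f[4]; };

class Vector4 {
  vec_float4 m128;

  // Hands out a reference to one element. Taking the element's address is
  // the step that may push the whole vector out to memory.
  float& at(int i) {
    assert(i >= 0 && i < 4);
    return m128.f[i];
  }
public:
  Vector4(float x, float y, float z, float w) {
    m128.f[0] = x; m128.f[1] = y; m128.f[2] = z; m128.f[3] = w;
  }
  float& x() { return at(0); }
  float& y() { return at(1); }
  float& z() { return at(2); }
  float& w() { return at(3); }
};

// Old scalar-style code only needs "()" added after each component.
inline void scaleX(Vector4& v) {
  v.x() = v.x() * 2.0f + v.y();
}
```

The source change is as small as it gets, and reads, writes and compound assignments such as v.z() += 2.0f all work unchanged.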

What I attempted was to make use of a temporary class, returned by the x() method of the vector, to indirectly modify a reference to our real VMX vector. Here is the basic code (just typed out, so sorry if it doesn’t compile first time on a copy and paste, and sorry for the bad source formatting throughout; I need to set up some sort of WordPress plugin for that). Thanks to spoofmeat in the comments for directing me to the correct sourcecode tag!

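class VMXVectorClass {
public:
  vec_float4 m128;

  // Temporary proxy that reads/writes one element of the referenced vector.
  template <int element>
  class dScalar {
    vec_float4& v;
  public:
    dScalar(vec_float4& _v) : v(_v) {}

    // Read: extract the element as a float.
    inline operator float() const {
      return vec_extract(v, element);
    }

    // Write: vec_insert returns the modified vector, so assign it back to v
    // (this assignment was missing originally; see Jonathan's comment below).
    inline float operator=(float f) {
      v = vec_insert(f, v, element);
      return f;
    }
  };

  inline dScalar<0> x() { return dScalar<0>(m128); }
  inline dScalar<1> y() { return dScalar<1>(m128); }
  inline dScalar<2> z() { return dScalar<2>(m128); }

  /*the rest of the vector goes here*/

} __attribute__(aligned16, vecreturn, passinvector, etc); /* pseudo-attributes */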

To my surprise, this worked very well in my simple test cases. The compiler optimised away the vec_float4 reference, everything passed in registers, good times. Obviously I need to look into this more to see if it is actually better than simply returning a float reference to the vector in ‘memory’. One issue I noticed immediately is that you can’t pass .x() directly to variadic (...) functions like printf. You must explicitly cast to float, which is obvious when you see what the code does: you don’t want the dScalar object itself passed into printf, you want the float.
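The variadic issue can be sketched with a portable version of the proxy. This is an illustration, not the original VMX code: vec_float4, vec_extract and vec_insert are plain C++ stand-ins defined here, and Vector4 and describeX are hypothetical names. The extra operator= taking another dScalar is also an addition of this sketch: because the proxy holds a reference, its implicit copy assignment is deleted, so a.x() = b.x() would not compile without it.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Stand-ins (assumptions): real code would use the platform's vec_float4
// type and its vec_extract / vec_insert intrinsics instead of these.
struct vec_float4 { float f[4]; };
inline float vec_extract(const vec_float4& v, int i) {
  assert(i >= 0 && i < 4);
  return v.f[i];
}
inline vec_float4 vec_insert(float s, vec_float4 v, int i) {
  assert(i >= 0 && i < 4);
  v.f[i] = s;
  return v;
}

class Vector4 {
  vec_float4 m128;
public:
  // Temporary proxy bound to one element of the owning vector.
  template <int element>
  class dScalar {
    vec_float4& v;
  public:
    explicit dScalar(vec_float4& _v) : v(_v) {}
    operator float() const { return vec_extract(v, element); }
    float operator=(float f) { v = vec_insert(f, v, element); return f; }
    // Same-element assignment (a.x() = b.x()) goes through the float path.
    float operator=(const dScalar& other) {
      return *this = static_cast<float>(other);
    }
  };

  Vector4(float x, float y, float z, float w) {
    m128.f[0] = x; m128.f[1] = y; m128.f[2] = z; m128.f[3] = w;
  }
  dScalar<0> x() { return dScalar<0>(m128); }
  dScalar<1> y() { return dScalar<1>(m128); }
};

// Formats the x component with a printf-style call.
inline std::string describeX(Vector4& v) {
  char buf[32];
  // std::snprintf(buf, sizeof buf, "%.1f", v.x());  // wrong: passes the proxy object
  std::snprintf(buf, sizeof buf, "%.1f", static_cast<float>(v.x()));
  return std::string(buf);
}
```

Ordinary expressions such as float sum = v.x() + v.y(); need no cast, since the conversion operator kicks in wherever a float is expected. Only the "..." parameters of variadic functions bypass it.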

I’d like feedback on what other folk have tried, and on whether they know of a good benchmark I could try this method on. I am especially interested to see if it is indeed any better than the simple float& x() version. I had a look at some of the generated code and it didn’t seem too different from using .SetX and .GetX explicitly, but a run-time example would be the best test: one that mixes significant non-scalar usage of the vectors with the accessors, to make sure things are not being forced out of registers.

An interesting side-effect from using a temporary class: if it did turn out you needed to then migrate further over to the .SetX and .GetX methods at a later date, you could use this class to identify the cases that needed .SetX by adding the deprecated attribute to the operator= method, and then just work through the warnings.
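As a sketch of that idea, the proxy's operator= can be tagged with the standard C++14 [[deprecated]] attribute (older compilers have their own spelling, such as GCC's __attribute__((deprecated))). As before, vec_float4, vec_extract and vec_insert are portable stand-ins rather than the real intrinsics, and Vector4 and bumpX are hypothetical names.

```cpp
#include <cassert>

// Stand-ins (assumptions): real code would use the platform's vec_float4
// type and its vec_extract / vec_insert intrinsics instead of these.
struct vec_float4 { float f[4]; };
inline float vec_extract(const vec_float4& v, int i) {
  assert(i >= 0 && i < 4);
  return v.f[i];
}
inline vec_float4 vec_insert(float s, vec_float4 v, int i) {
  assert(i >= 0 && i < 4);
  v.f[i] = s;
  return v;
}

class Vector4 {
  vec_float4 m128;
public:
  template <int element>
  class dScalar {
    vec_float4& v;
  public:
    explicit dScalar(vec_float4& _v) : v(_v) {}
    operator float() const { return vec_extract(v, element); }

    // Every write through the proxy now produces a compiler warning,
    // which gives a to-do list of call sites that need SetX and friends.
    [[deprecated("write through SetX/SetY/SetZ instead")]]
    float operator=(float f) { v = vec_insert(f, v, element); return f; }
  };

  Vector4(float x, float y, float z, float w) {
    m128.f[0] = x; m128.f[1] = y; m128.f[2] = z; m128.f[3] = w;
  }
  dScalar<0> x() { return dScalar<0>(m128); }
  float GetX() const { return vec_extract(m128, 0); }
  void SetX(float f) { m128 = vec_insert(f, m128, 0); }
};

inline void bumpX(Vector4& v) {
  // v.x() = v.x() + 1.0f;  // still compiles, but emits a deprecation warning
  v.SetX(v.x() + 1.0f);     // reads via the proxy stay warning-free
}
```

Reads through x() are untouched, so only the write sites show up in the warning list.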

12 Responses to Migrating scalar Vec4 classes to SIMD

  1. Jonathan says:

    Interesting little problem – I’ve played with several similar proxy classes in the past.

    Here’s an attempt that avoids the need for x, y and z to be functions:

    struct vec {
      template <int e>
      struct scalar {
        vec_float4 v;
        operator float() { return vec_extract(v,e); }
        scalar& operator=(float f) {
          v = vec_insert(f,v,e);
          return *this;
        }
      };
      union {
        scalar<0> x;
        scalar<1> y;
        scalar<2> z;
        vec_float4 v;
      };
    };

    Note that struct scalar can’t have an explicit constructor (but shouldn’t need one in this case), and that I think scalar& is a “more correct” return value from operator=. (Also, you missed the assignment to v in operator=)

    Initial tests show that this works, and seems to result in identical generated code.

    (imho, inserting scalars into vectors like this shouldn’t be encouraged – make expensive ops harder to do, encourage programmers to come up with something smarter 🙂

    I think my code also thoroughly gives the finger to the Rules of Aliasing, but gcc interprets it the right way…

    • Colin Riley says:

      That is interesting; in my experience the union pretty much kills the class as a whole – it won’t be vecreturned/passed in a vector register etc. Going to need to test this one, would be very very nice if the compiler decided to do something correctly for once 🙂

      You’re obviously right with the assign to v in the insert, oops 🙂

    • While I haven’t worked with PS3 yet, there was a talk at GDC 2007 called “Under the Compiler’s Hood: Supercharge Your PLAYSTATION®3 (PS3™) Code” by Andy Thomason from SN Systems (pdf here: http://cmpmedia.vo.llnwd.net/o1/vault/gdc07/slides/S4428i1.pdf ) about coding for the PS3.

      Andy explicitly states (page 10) not to put a union of scalar and SIMD-vec values into the class/object as it complicates the compiler’s job when accessing an object via pointers or by value-copying it – the compiler doesn’t know how to move the object’s memory and might decide against keeping it in the SIMD registers.

      He suggests to put the SIMD type into the class and use unions in accessor functions for explicit casting (aka: move from SIMD registers to scalar registers).

      However, the paper is from 2007 – so compilers might be smarter/more powerful by now.

      • Jonathan says:

        This isn’t a union of vector and scalar – it’s a union of vector and vector (well, vector and simple aggregate…). Still quite capable of confusing a compiler 🙂

        I’m quite interested to see how it compares in real-world (style) usage.

      • Colin Riley says:

        Yeah Jonathan, I’ll have to definitely try that method! Would be much nicer.

    • Colin Riley says:

      Sadly this fails due to one stupid C++ standards rule, the fact you can’t have assignment operators in classes used in a union. So someVector.x = otherVec.x actually copies the whole union. Quite a ridiculous C++ rule, as the vec_float4 has a non-complex constructor/assignment to use when copying the vec3.

  2. spoofmeat says:


    That describes how to post highlighted source code on wordpress.com blogs

    You will probably need to use the HTML editor not the wysiwyg one though.

  3. jalf says:

    This is almost creepy, I went through the exact same line of thought just a couple of weeks ago. I considered returning a “proxy class” too, but figured it was a bit overkill (and I worried that the compiler would be unable to optimize it, given how crippled it generally is when it comes to SIMD code)

    I settled for just defining a traditional setter (which takes its argument by value, and returns void), and a getter (which returns a float by value), so there are no references in the interface.

    Anyway, like you, I was surprised to see people use the union trick (smells like aliasing), or even just storing a raw array of floats in the vector, instead of just using the SIMD datatypes throughout.
