Something I have come up against at various times is a Vector class (that is, a Vec3/Vec4 etc., not a large storage container) that is implemented for a platform supporting SIMD, such as AltiVec/VMX, but simply stores floats aligned in a structure.
class Vector4 { public: float x ALIGN(16), y, z, w; /* insert operators, constructors, etc here */ };
This obviously works, but it is slow: even if you use vector intrinsics in your operator methods, the compiler will probably force the data to memory, amongst many other things. When the time comes to move over to real hardware-supported vectors passed in vector registers, the biggest issue is that pretty much your whole codebase now accesses data via .x, .y, .z and .w.
One way to get this working is to put your vector float data member in a union with the four scalar components. At first glance that seems like it should work fine, but in my experience most compilers don't produce even semi-optimal code for it, so that option is completely out.
Most implementations using VMX types offer .SetX(float) and .GetX() methods, which use the correct intrinsics so as to be optimal. When migrating code, though, splitting every access into a distinct read or write and converting each to a different method can be very time consuming. Sadly, there will always be some source changes, but I would like to keep them to a minimum: changing .x to .x() would be acceptable, as long as it works for both reading and writing.
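To make the migration cost concrete, here is a minimal sketch of a getter/setter-style vector. It is not any particular library's API: `vec4_emul`, `emul_extract` and `emul_insert` are hypothetical stand-ins for `vec_float4`, `vec_extract` and `vec_insert` so that it builds on any platform, and `VectorGS`/`AddYToX` are illustrative names. The point is that a single statement like `v.x += v.y;` has to be split into an explicit read and an explicit write.

```cpp
#include <cassert>

// Hypothetical stand-in for an AltiVec vec_float4 so the sketch builds anywhere.
struct vec4_emul { alignas(16) float e[4]; };

// Hypothetical stand-ins mirroring vec_extract/vec_insert semantics:
// extract reads one lane; insert returns a NEW vector with one lane replaced.
inline float emul_extract(vec4_emul v, int i) { return v.e[i]; }
inline vec4_emul emul_insert(float f, vec4_emul v, int i) { v.e[i] = f; return v; }

// Getter/setter style vector: reads and writes use different methods.
class VectorGS {
public:
    explicit VectorGS(vec4_emul v) : m128(v) {}
    float GetX() const { return emul_extract(m128, 0); }
    void SetX(float f) { m128 = emul_insert(f, m128, 0); }
    float GetY() const { return emul_extract(m128, 1); }
    void SetY(float f) { m128 = emul_insert(f, m128, 1); }
private:
    vec4_emul m128;
};

// What used to be the one-liner  v.x += v.y;  now needs a read and a write.
inline void AddYToX(VectorGS& v) {
    v.SetX(v.GetX() + v.GetY());
}
```

Every compound assignment in the old code needs this kind of manual rewrite, which is where the time goes.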
One method is to have .x() return a float reference to that element of the vector, e.g. one obtained via &vec[1]. Whenever I read this, though, it worries me: the statement takes an address, which implies memory. The compiler may be able to optimise so as not to force anything into memory, but most of the time it doesn't seem to manage it. This can always be our backup plan.
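A minimal sketch of that float& backup plan follows, again using a hypothetical `vec4_emul` stand-in for `vec_float4` so it builds anywhere; `VectorRef` and `AddYToX` are illustrative names. One accessor covers reading, writing and compound assignment, at the cost of handing out an address into the vector's storage.

```cpp
#include <cassert>

// Hypothetical stand-in for vec_float4 so the sketch builds anywhere.
struct vec4_emul { alignas(16) float e[4]; };

class VectorRef {
public:
    explicit VectorRef(vec4_emul v) : m128(v) {}
    // Each accessor returns a reference into the vector's storage. This is
    // the address-taking that implies the vector lives in addressable memory
    // unless the optimiser can see through it.
    float& x() { return m128.e[0]; }
    float& y() { return m128.e[1]; }
    float& z() { return m128.e[2]; }
    float& w() { return m128.e[3]; }
private:
    vec4_emul m128;
};

// The same .x() syntax works for reads, writes and compound assignment.
inline void AddYToX(VectorRef& v) { v.x() += v.y(); }
```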
What I attempted was to have the x() method of the vector return a temporary class that holds a reference to our real VMX vector and modifies it indirectly. Here is the basic code (just typed out, so apologies if it doesn't compile first time on a copy and paste; thanks to spoofmeat in the comments for pointing me to the correct sourcecode tag for formatting it!):
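class VMXVectorClass {
public:
    vec_float4 m128;

    // Proxy returned by x()/y()/z()/w(): converts to float on read and
    // writes the lane back into m128 on assignment.
    template <int element>
    class dScalar {
    public:
        vec_float4& v;
        dScalar(vec_float4& _v) : v(_v) {}
        inline operator float() const {
            return vec_extract(v, element);
        }
        inline float operator=(float f) {
            // vec_insert returns a new vector rather than modifying v in place
            v = vec_insert(f, v, element);
            return f;
        }
    };

    inline dScalar<0> x() { return dScalar<0>(m128); }
    inline dScalar<1> y() { return dScalar<1>(m128); }
    inline dScalar<2> z() { return dScalar<2>(m128); }
    inline dScalar<3> w() { return dScalar<3>(m128); }
    /* the rest of the vector goes here */
} __attribute__((aligned(16))); /* plus vecreturn, passinvector, etc. where your compiler supports them */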
To my surprise, this worked very well in my simple test cases. The compiler optimised away the vec_float4 reference and everything was passed in registers. Good times. I still need to look into this more to see whether it is actually better than simply returning a float reference to the vector in 'memory'. One issue I noticed immediately is that you can't pass .x() directly as a variadic (...) argument, for example to printf. You must explicitly cast to float, which is obvious once you see what the code does: you don't want the dScalar object itself passed to printf, you want the float.
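For experimenting off-target, here is a portable sketch of the same proxy idea. It assumes hypothetical stand-ins (`vec4_emul`, `emul_extract`, `emul_insert`) in place of `vec_float4`, `vec_extract` and `vec_insert`, so it says nothing about register allocation on real VMX hardware; it only exercises the read/write semantics. Because insert returns a new vector, the proxy stores the result back. `PrintX` is an illustrative helper showing the explicit cast needed for printf.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-ins for vec_float4 / vec_extract / vec_insert.
struct vec4_emul { alignas(16) float e[4]; };
inline float emul_extract(vec4_emul v, int i) { return v.e[i]; }
inline vec4_emul emul_insert(float f, vec4_emul v, int i) { v.e[i] = f; return v; }

class VMXVectorClass {
    vec4_emul m128;
public:
    explicit VMXVectorClass(vec4_emul v) : m128(v) {}

    // Proxy for one lane: reads convert to float, writes go through insert.
    template <int element>
    class dScalar {
        vec4_emul& v;
    public:
        explicit dScalar(vec4_emul& _v) : v(_v) {}
        operator float() const { return emul_extract(v, element); }
        float operator=(float f) { v = emul_insert(f, v, element); return f; }
    };

    dScalar<0> x() { return dScalar<0>(m128); }
    dScalar<1> y() { return dScalar<1>(m128); }
    dScalar<2> z() { return dScalar<2>(m128); }
    dScalar<3> w() { return dScalar<3>(m128); }
};

// Variadic functions such as printf need the explicit cast; passing the
// proxy object itself through ... would hand printf the wrong thing.
inline void PrintX(VMXVectorClass& v) {
    std::printf("x = %f\n", static_cast<float>(v.x()));
}
```

Assigning one lane from another, as in `v.w() = v.y();`, works because the right-hand proxy converts to float and the left-hand proxy's `operator=(float)` is selected.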
I'd like feedback on what other folk have tried, and whether anyone knows of a good benchmark I could try this method on; I'm especially interested to see whether it is indeed any better than the simple float& x() version. I had a look at some of the generated code and it didn't seem too different from calling .SetX and .GetX explicitly, but a run-time test would be best: one that mixes significant non-scalar use of the vectors with the accessors, to make sure values are not being forced out of registers.
An interesting side effect of using a temporary class: if it turned out you needed to migrate further to the .SetX and .GetX methods at a later date, you could use this class to identify the cases that need .SetX by adding the deprecated attribute to the operator= method, then just working through the warnings.
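A minimal sketch of that trick, assuming a C++14 compiler for the standard `[[deprecated]]` attribute (GCC's `__attribute__((deprecated))` should serve the same purpose on older toolchains). As before, `vec4_emul`, `emul_extract` and `emul_insert` are hypothetical stand-ins for the VMX types and intrinsics, and `ReadX` is an illustrative helper. Reads still compile cleanly; every write through the proxy is flagged at its call site.

```cpp
#include <cassert>

// Hypothetical stand-ins for vec_float4 / vec_extract / vec_insert.
struct vec4_emul { alignas(16) float e[4]; };
inline float emul_extract(vec4_emul v, int i) { return v.e[i]; }
inline vec4_emul emul_insert(float f, vec4_emul v, int i) { v.e[i] = f; return v; }

template <int element>
class dScalar {
    vec4_emul& v;
public:
    explicit dScalar(vec4_emul& _v) : v(_v) {}
    operator float() const { return emul_extract(v, element); }
    // Each use of this operator now produces a deprecation warning, which
    // gives a to-do list of the writes that should become SetX/SetY/... calls.
    [[deprecated("writes should use SetX/SetY/SetZ/SetW")]]
    float operator=(float f) { v = emul_insert(f, v, element); return f; }
};

// Reads are unaffected and compile without warnings.
inline float ReadX(vec4_emul& m128) { return dScalar<0>(m128); }

// A write such as the following would now be reported by the compiler:
//   dScalar<0> px(m128); px = 1.0f;
```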