Now posting at domipheus.com

July 17, 2014

Where I post now is over at http://labs.domipheus.com


Refactoring for Task Concurrency

February 3, 2011

This is really more of a follow on from my previous article, Preparing for Parallelism. Some of the comments wanted more explanation, and an example of what I actually mean about the likes of incorrect data and controlling execution context.

What I explained was a tips for a component/entity system, a task based one: concurrent systems, running in parallel, and communicating effectively between each other.  Fine-grained parallelism does not apply. I also mentioned the fact that being able to accept, and properly deal with out of date ‘incorrect’ data can lead to some novel opportunities for parallelism.

Not to say the methods I employ lead to perfect parallelism and scalability, I do believe they allow for implementing a level of concurrent operation in a cost-effective manner on current hardware, without the hidden quirks that locks can have creep up on your frame profile. I include ‘lock-free algorithms’ in that, as those can cause nasty contention issues. If you disagree, please comment, as I’m really interested in hearing other opinions and experiences on the matter.

First of all, what changes are those that really make concurrent execution hard, even if you allow for reading data which is possibly a frame out of date?

You can read the full article over at AltDevBlogADay.com.

 

 


Preparing for Parallelism

January 19, 2011

Don’t worry, just parallelize everything and have it run over all the cores.

Ouch. This is going to hurt. Invariably it does; when you end up having that really hard to find heap corruption issue, or when those odd pauses creep into the profiler when calling into that middleware. Parallel execution is tough at the best of times, and I think it is harder in gamecode where the notion of a game-frame is key. Something I have learned over the years looking at codebases is that many developers do very little groundwork in preparation for parallel operation. Laying some clear code foundations and keeping to a small set of rules can result in less pain encountered in the long term.  In this post, I’m going to share the main tips/rules I try to abide by when doing that job; something I find myself doing often to increase the prospect of allowing more parallelism.

These tips allow designing systems for parallel execution simpler, but it is still not easy street. You need deep knowledge of the systems you are working with, the data it touches, and above all understand what makes things inherently serial. Also you need to understand how you are going to achieve the parallelism; are you using multiple threads running over multiple jobs, running a large subsystem for the most part of a frame, or are you going to use the SPUs?

You can read the whole post over at altdevblogaday.com, where I am a contributor.


Clean code == Good logic

November 28, 2010

A short post here, after some discussions I’ve had on twitter over the past week or two. I graduated nearly 5 years ago, and since then have seen lots of production source code, mainly high-performance/games related – real-time it must to go fast stuff. Since only five years ago I was a graduate, It’s quite fresh-ish in my memory as to what sort of things were taught.

DESIGN PATTERNS!

Design patterns are something which have given me, and certainly many of my coworkers, a mountain of grief when it comes to getting performance from code. I can remember it being taught as some sort of rule book in University, something you can refer to when you have a certain logic issue, and then implement that pattern. Unfortunately this ideal exists now in many graduates, and therefore quite ingrained into current programming circles. In my opinion we have a bad habit: a design pattern copy-paste culture.

I’m not going to go into why some design patterns can cause havoc when you come to optimize; hell some may make it easier. It’s basically the lack of thought that now goes into implementations, meaning for example a generic pattern may be applied to a problem which really doesn’t need it. But this code is in the foundation of systems usually, the hardest part to modify given tight deadlines late on in a project.

Some may say that this runtime performance drop is a worthwhile price to pay for clean code; of course that is certainly not true in my line of work. For me the solving of a problem in a really efficient way for the hardware makes things nice, for others it’s hiding everything under a generic base, and having the source ‘look’ nice (whatever the bloody hell that means). Don’t get me wrong, bad implementations have always existed; but I think the way design patterns are being taught is not helping.

Really, this is lie #1 from Mike Actons GDC2010 “Three Big Lies in Game Development” : Software is not a platform.

One persons clean source code is someones elses nightmare, another’s ‘bad’ source being an abomination to others. The fact here though is that regardless of how the source ‘looks and feels’, the compiler can generate pretty good source code for both implementations, and can try to optimize things as much as possible – obviously to the extent which the logic of the solution allows.  The real problem here is that a solution is defined by the logic, something the compiler cannot modify.

So,

Whilst the compiler can generate ‘good’ machine code from ‘bad’ source code, it can not replace bad logic with good logic.

I think that about says it all, really. Before you go pasting that Design Pattern into your project, really think: Does this solve the problem in the best possible way for my hardware?


Migrating scalar Vec4 classes to SIMD

November 26, 2010

Something I have come up against at various times is when the Vector class (that is, a Vec3/4 etc, not a large storage container) is one implemented for a platform that supports SIMD, like altivec/vmx, but simply uses floats, aligned in a structure.

class Vector4 { public:  float x ALIGN(16), y, z, w;       /* insert operators, constructors, etc here */ };

Obviously this works just fine, however slow (even if you use vector intrinsics in your operator methods, the compiler will probably force the data to memory, amongst many other things). When the time comes where you actually want to move over to using real hardware-supported vectors, passing in vector registers, the biggest issue seems to be that pretty much your whole codebase now accesses data via .x, .y, .z and .w .

One way to get it working is to union your vector float data member with the four scalar components, which at first glance seems like it should work fine, but in my experience most compilers just don’t seem to produce even semi-optimal code, so that option is completely out.

Most implementations using vmx types will offer  .SetX(float) and .GetX() methods which use correct intrinsics as to be optimal, but when migrating code over, having to separate the accesses into distinct read and write operations and handle them using different methods can be very time consuming. Sadly, there will always be source changes, but I would like to keep it to a minimum, so changing .x to be .x() would be ok, but it needs to work for both reading and writing.

One method is to have .x() return a float reference to the data at that element of the vector, like &vec[1], blah blah. When I read this though it always worries me, as the statement involves addressof, which then implies memory. The compiler may be able to optimise as to not force anything into memory, but most times it doesn’t seem to be able to. This can always be our backup plan.

What I attempted was to make use of a temporary class returned by the x() method of the vector to indirectly modify a reference to our real vmx vector. Here is the basic code (just typed out, so sorry if it doesn’t compile-first-time on a copy and paste – and sorry for the bad source formatting throughout, I need to set up some sort of WordPress plugin for that – Thanks to spoofmeat in the comments for directing me to the correct sourcecode tag!

VMXVectorClass {
  vec_float4 m128;

  template <int element>
  class dScalar {
  public:
    vec_float4& v;
    dScalar(vec_float4& _v) : v(_v) {}

    inline operator float() const { 
      return vec_extract(v,element);
    }

    inline float operator=(float f) { 
      vec_insert(f,v,element);return f;
    }
  }

  inline dScalar<0> x() { return dScalar<0>(m128); }

  inline dScalar<1> y() { return dScalar<1>(m128); }

  inline dScalar<2> z() { return dScalar<2>(m128); }

  /*the rest of the vector goes here*/

} __attribute__(aligned16, vecreturn, passinvector, etc);

To my surprise, this worked very well in my simple test cases. The compiler optimised away the vec_float4 reference, everything passed in registers, good times. Obviously I need to look into this more to see if this is actually better than the simple returning of float-reference to vector in ‘memory’. One issue that I noticed immediately is you can’t use .x() into … argument calls, like printf. You must explicitly cast to float, which is obvious when you see what the code does – you don’t want the actual dScalar class itself passed into printf, you want the float.

I’d like feedback on what other folk have tried, and if they know of a good benchmark I could try this method on, especially interested to see if it is indeed any better than the simple float& x() version. I had a look at some of the code generated and it didn’t seem too different from .SetX and .GetX explicitly, but a run-time example to test would be best, one that mixed significant non-scalar usage of the vectors along with the accessors, to make sure things were not being forced out of registers.

An interesting side-effect from using a temporary class: if it did turn out you needed to then migrate further over to the .SetX and .GetX methods at a later date, you could use this class to identify the cases that needed .SetX by adding the deprecated attribute to the operator= method, and then just work through the warnings.


Edinburgh Game Dev Meet

November 24, 2010

It has finally happened. Edinburgh is soon to join the likes of Glasgow and Dundee who have BertWednesdays for their GameDev meetup banter: introducing @GameDevEd!

@moggy has set up the twitter account, and it also has a facebook page. The first meetup is on the 30th November at The Cumberland Bar on Cumberland Street, it is sure to be an awesome evening of fun filled drinks and gamedevvery.

See you there!


F5

November 23, 2010

After a little persuasion from @sheredom I have decided to relaunch this blog to post things which are a bit too long for a tweet, or deserve a longer discussion with some structured comments. Hopefully stuff will actually get posted to it, sadly every time I try to start a blog, within a few weeks, the content fizzles out to nothing.