Multithreaded Renderloop - 2

Download Alpha and Omega v2.
Binaries as Win32 Installer (3 MB)
Download Alpha and Omega v2.
Source as Visual Studio 2010 Solution (28 MB)

Hi there! Nice of you to keep reading all those words coming out of my fingers. Lets continue our previous post on a deterministic multithreaded renderloop, yay! The last post ended with the explanation and implementation of an actual deterministic multithreaded render loop, but the render state was still too simplistic for a decent demonstration. If we keep track of an actual state of the world and create usable render states we quickly discover that it is more difficult than we'd like it to be. Before we dive in the abyss that is called "game engine development", lets start with a quick recap:

  • The world state is the whole state we need. It contains all objects and properties we'd like to have in our update-thread and render-thread.
  • The render state is a subset of the world state. It contains all objects and properties that the render-thread needs to render a frame (assuming all resources are loaded to the memory of the video card). We always have 3 of these so we can apply safe and efficient multithreading.
  • The render-thread has at most 1 of the 3 render-states in use and it has read-only access to it.
  • The update-thread has at most 2 of the 3 render states in use. The 1 that is updating now (where he has read-and-write access) and the render state he updated previously (read-only access).
  • We only need a mutex when we are switching states which is a very quick operation. During rendering no mutex-checking needs to be done. The update-thread also doesn't need any mutexes, unless we decide to divide the update-thread into multiple threads.
There are two additional advantages to having an deterministic rendering loop I didn't thought of when writing Part 1.
  1. It becomes a lot simpler to create replay-files because we only have to record stuff like "at update-frame #X, User Y performed action Z". We don't have to be afraid that small perturbations in algorithms will cause the replay to deviate from the actual played game.
  2. It is easier to to add online multiplayer support. As long as we ensure that update-message get received correctly, we are able to ensure that the world state for each client runs synchronized (although a bit delayed) with the server.

Word state = Render state ∪ Rest state ?

The biggest issue is that we have 1 world state and 3 render states, so we need to split our world state in two parts:

  • The render state.
  • The rest. Ugh, we don't have a nice word for this. rest state?
With their powers combined they form world state. If you are a fellow software engineer you probably hate this situation as much as I do. There is just no nice functioning object-oriented model to represent our complete world-state and still keep a separation between the two. Before we start thinking about render state and rest state lets just create an example world state so we don't have to worry about splitting it just yet. Let me first state that there isn't a one-solution-fits-all achievement that we may get if we take our time and think things through thoroughly t... (sorry can't think of other words starting with a 't'). I'll just design it with my game in mind and hopefully it becomes usable to me in other future projects.

So what should the world state look like? I have a good idea on how to come up with a world state that allows me to do most the things I want, but explaining 'why' I've chosen this particular representation will be very difficult. Most of the design-decisions are based on experience in previous projects and it may seem silly if you don't understand my reasoning.

World state

Okay here goes my first attempt of an example, followed by a quick Q&A afterwards.

Note all the strange things in this model, to name a few: No hierarchical model of the world, separation of 'static' (which might not be completely static) and 'dynamic' world. We have even separated the 2D overlay in opaque and transparent, what is that all about? We also only show the objects, what about each object's properties? Why aren't you talking about how to manage and load 3D models and textures on the video-card? Why do you have such a weird shaped head? How do you split this model into a render state and a rest state? Which thread is taking input from my game-pad that moves the player?

So many questions. And what was that about a weird shaped head, I mean, like, seriously? That hurts you know. Ah well, you have to be somewhat of a masochist to try and create a decent 3D game without the use of an engine. Okay let me try to answer a couple of questions you might have before we zoom in on the objects to see what properties we may expect.

  • Why don't  you use an hierarchical model of the world?
    An hierarchical model of the world isn't all that great. Discussing this topic alone is worth its own post (or two) and I am not willing to do that. Instead I'll just redirect you to a post from 0 FPS that explains it far better than I ever could. If we were to use a ray-tracer to render a voxel world (which I won't) and like to have a sparse oct tree, even then I would be doubtful on using oct trees.
  • What's up with the separation of static world and dynamic world?
    We actually had two reasons for doing this.  The static world will contain blocks with static landscape of opaque polygons. When rendering a scene you'd usually want to render the opaque polygons front-to-back so more pixels you try to draw are actually hidden by pixels you have drawn previously. Its not as simple as I might make it seem. A lot of things are interconnected again. For example there exists something as 'Z-Culling' (= Don't draw the pixel if it is hidden by another pixel, implemented using the z-buffer) and 'Early-Z' (= Prevent the use of fragment shader if pixels is hidden by another pixel, often combined by z-checking on the GPU-chip before using the actual z-buffer). Both Z-culling and early-Z  are things you could 'simply' enable assuming you don't use a fancy shader that modifies z-values (and off course only if the video card supports it). Even if we don't use early-Z we still want to render our opaque polygons front-to-back so the video card does not have to overwrite back-buffer and z-buffer fragments as much. Confused? You should be. This stuff isn't easy. The second reason I have is in the use of level-of-detail algorithms. It would be too naive to think that a single level-of-detail algorithm suits all our needs. Reducing polygons of landscape segments or ocean segments are two completely different algorithms and we didn't even mention other stuff like grass, trees or creatures roaming the wild. (There are other reasons for the separation, like algorithms for creating shadows, but I don't want to talk about that right now.)
    • So wait, you don't allow the use of transparent parts in your 'static' world?
      Really good question! That's right, no transparent parts in the static world. If we like to add transparent parts to the world (glass or ice maybe?) then we need to add those things to the dynamic world. Transparent parts need a different method of being drawn because can't depend on the z-buffer for those polygons.
    • So does static mean that we can't change the world? What if we want to create an explosion that makes a crater?
      The static world exists under the assumption that it does not change that much over time. A small change in the static world would mean that at least one of the polygon-chunks needs to be recreated. Video cards are better at drawing big chunks of polygons at a time (instead of lots of small chunks) so we like to encourage big chunks for increased rendering speed. On the other hand we'd like to use stuff like frustum-culling on chunks and like our world to be dynamic wherever possible. It's like finding the golden ratio without a calculator!
    • So the reason for separating the opaque and transparent 2D overlay is also done to increase rendering speed?
      Yes indeed. We want to draw the opaque 2D overlay as one of the first things (that will also update the z-buffer) so it hides a lot of pixels from the 3D scene but we need to draw the transparent 2D overlay as one of the last things. If we want to draw a fully opaque rectangle we could even create a simple 3D area on which we can apply culling techniques (similar to frustum-culling), but this is probably overkill and not worth my time.
  • Why don't you put the free roaming objects and the dynamic world together?
    The dynamic world is still bound by absolute positions. It may be the case that leaves are dancing to the wind or that the ocean creates waves, but there is no actual physics involved when these objects interact with each other. All the complex collision detection, physics, AI, ... needs to take place between free roaming objects and the rest of the world.
  • Why do you have such a weird head and what about all the other question I asked?
    Seriously, the head-thing again? Don't worry about those other questions, they will be answered in due time... probably.
  • Are you ever serious?

Lets us continue moving forward, the only sensible direction for us right now.

Separation of Render State(s) and Rest State

When we split the world state in a render state (three of them actually) and rest state then it'll look something like this:

Not all world-state-nodes needs to have a render-state-node (e.g. node #4), because not all nodes need to be rendered. Maybe node #4 keeps track of some AI-related stuff and is only used by the update thread. When the update-thread is finished and jumps to Render State #2 it'll probably look like this:

The render state will now be busy to update Render State #2 and the Rest State, using some of the information stored in Render State #1. However, it is possible that a render-state-state-node has been added or removed, so the update-thread may not blindly assume that all nodes in render-state #2 are available. The update-thread in the example above has been lucky enough that the render-state did not change but what if it did? What if the update-thread decided to remove node #5 and adds node #7 as child of node #6? Then the situation looks like this:

During the update process the update-thread needs to make Render State #2 the same as Render State #1 (and possibly make additional changes)! There are several ways to go about solving this issue, but I think it is important to keep all this complexity from the actual update-function. We don't want to worry about the possible non-existence of render-state-nodes while calculating physics! Have I already mentioned that we keep track of parent nodes and that order is important? Hah, just when you thought things were still reasonable.

So how will we fix old render states to have the structure we expected? There are several ways of approaching this but I have chosen the way where each operation that modifies the render-state will become an update-message. Before the update-procedure has its way with the render-state we will parse all update-messages so the render-state structure becomes up-to-date. So basically when node #5 gets deleted at render state #1 it will immediately gets deleted. For render state #2 and #3 an update message gets added to the todo-queues: "Delete render state at position X". The same happens when adding node #7, it will be created immediately for render state #1 and for render state #2 and #3 the update message "Add render-node of type X at position Y" will be added to the todo-queue.

There is only one last simple issue we need to address. We need a good way to encode the position of where to add/remove a node from the render-tree. We may not blindly use references to nodes in the rest state, because the node may no longer exist.

If the update thread decides to remove node #7 directly after he added it then the message "Add render-node for <ref. to rest-node #7>" for render state #3 becomes invalid. I'm implementing this on a managed language, C#, so it is not 'that' big of a deal, but it is still bad practice to ignore this because we may want to dispose the object before it gets deleted by the garbage collector. Again, there are multiple ways of solving this, but in my opinion the best way is to dispose of a rest-state-node only after all associated render-state-nodes are disposed. This implementation still allows the code to be ported to unmanaged programming languages, so yay :). For technical details I'll redirect you to the source distribution, because it would probably be boring to show here.

The first thing he needs to do is to parse all items in the todo-queue so the structure of the render-tree is up to date. Only after this is done will we perform the normal update-procedure. Basically this should be working, but the data structure is not as nice as I'd hoped it to be. We now have 3 render-state-trees and 1 rest-state-tree completely separated from each other. For the update-thread this is not a great situation, because this means that we need to parse 3 trees simultaneously (the current rest-state, the current render-state and the previous render-state). To somewhat remidate this annoying situation we'll give each renderable rest-state-node references to the 3 associated render-state-nodes. This connects the rest state to the render-states. We have to be really careful with these reference because all those states have become a web of references and maintaining it during all tree operations may get complicated.

That took a lot of effort, don't you agree? All this hard work is totally worth it! We can now render and update the world in two separate threads without having to worry about thread-safe access to objects. The two threads can run separately from each other and never really have to wait, we only have a little bit of mutex-checking when one of the threads wants to change the state. Better yet, almost all this multithreading is hidden from you when you're making the game WHOHOOW. So are we done? Sadly no :(

Stuttering, The Problem

In the sce, sce, scenario where we set the update-FPS to 120, while the render speed is stuck at 60FPS you might expect something like this to happen in chronological order:

  • Render frame #0
  • Update to frame #1
  • Update to frame #2
  • Render frame #2
  • Update to frame #3
  • Update to frame #4
  • Render frame #4
  • ...

Because the render thread is twice as slow as the update thread we expect that the render thread to skip a frame. We do not synchronize the threads, so he might not always skip exactly 1 frame. Sometimes he may skip 2 frames, or sometimes he doesn't need to skip at all. If you expect a behavior where the update thread almost always skips 1 frame then you are sadly mistaken :(. Yeah, I'm sad too. Before we'll explain let's see the behavior that you are more likely to get:

  • Render frame #0
  • Update to frame #1
  • Update to frame #2
  • Update to frame #3
  • Render frame #3
  • Render frame #3
  • Update to frame #4
  • ...

Wow, that is quiet messed up right? No wondering that the game is stuttering! This has to do with thread-switching. The operating system decides which thread currently gets some execution-time on what core of your CPU. You may have hundreds of threads running on your quad-core processor, so each thread will be given some time to do his thing. So even if we visualize threads as processes that run completely seperatly from each other, this is probably not the case. Your OS is deciding when your thread may perform a couple of operations. Even if it may not look like it, threads do 'wait' for each other. It is not possible to tell the operating system when and when not to switch, but we are able to give a couple of 'hints. One of the possible hints we may give is "Processor affinity".

Proccessor affinity:
Processor affinity is a property you can set on a thread that tells the operating system on what processor(-core) of your CPU's the thread may run. So it is possible to create a thread that will only get executed on processor-core #3 of your CPU. For C# check out this MSDN page to see how this is done in code. Usually you don't mess around with these properties, but if you are a tweaker then you probably already know that you can also set process-affinity using the task manager. You may be shocked to hear this, but if you are programming for the XBOX360 then you must set process affinity on each thread you create! For XNA you'll have access to Thread.SetProcessorAffinity to configure on what CPU your thread will run. It is actually not 'that' shocking, they probably have specialized cores for certain tasks instead of your run-of-the-mill cores in your desktop computer. I'm currently not that focused on development for the XBOX and messing around with processor affinity just so solve the stuttering problem is probably not a good idea. So lets continue to the 'sleep' function.

It is possible to let a thread sleep for a couple of milliseconds, for C# this is done using the Thread.Sleep(...) function. This will actually execute a windows API call to let windows know that this thread is inactive for a while. This way windows knows not to give this thread a couple of cycles on a CPU when it is thread-switching, so the CPU can be busy computing stuff for other threads. When calling this function windows also knows that it is better to immediately switch to another thread, because this thread will not be busy for a while. When calling Thread.Sleep(0), you are telling Windows that you want to sleep 0 milliseconds. You may expect that this function returns immediately  but in reality windows will force a thread-switch. It may be the case that the same thread get a couple of cycles, but it may as well be another thread. This is not some obscure undocumented feature that exists as a side effect of 'sleep', it is actually well documented and a really nice feature to have.

A forced thread switch using Sleep(0) is probably a good idea. This is the first 'fix' that we'll make. After the update thread is done performing a 'step' and when the rendering thread is done rendering we'll always make sure that we'll sleep, even if it is 0 millisecond! This may help a little bit, but it won't solve the underlying problem. The problem is caused by a lack of thread synchronization. We don't want the traditional form of thread synchronization (because we don't want threads waiting for each other) so what are we able to do?

Stuttering, Solution #1: Partial Thread Synchronization

It is possible to do some thread synchronization without creating nasty situations where threads need to wait for each other. If the update thread runs at 120 FPS and the render thread at 60 FPS, then certainly under no circumstance should the render thread draw the same frame twice! If it was reversed (update-thread at 60 FPS and render thread at 120 FPS), then under no circumstance should we allow the update-thread start updating before the render-thread is starting to draw the last updated state. This is implemented using two ManualResetEventSlim's and are used as follows:

  • if desired update fps <= current render fps, prevent frame skipping
  • if desired render fps <= current update fps, prevent frame duplication
If we prevent both frame skipping and frame duplication then the two threads are completely synchronized. Note that even in this scenario it is still fully multithreaded because the update thread is still one frame ahead of the render thread. Fully synchronized threads are desirable if (and only if) the render-thread and update-thread are almost equal in speed. The condition above is a little bit strict, because this will only happen when the all FPS-values are exactly the same and because we are talking about estimated averages here, that's very unlikely. To improve the chances on full synchronization we'll state that it is okay to slow down a thread up to ~15%, if this helps to improve synchronization. The two conditions have now become:
  • if desired update fps <= current render fps * 1.15, prevent frame skipping
  • if desired render fps <= current update fps * 1.15, prevent frame duplication

So even if the CPU is able to keep up, the game may actually run a little bit slower than desired to keep 'smoothness'. This may be undesirable for online multiplayer or games that depend on a very stable game speed, so hopefully I'll remember this setting when I get to the point where I'm actually creating a game :). Lets continue to a better solution.

Stuttering, Solution #2: Motion Extrapolation by Render Thread

If the render thread has access to the variables that are used to create motion (momentum-vector, rotation matrix, ...), then we'll be able to extrapolation the probable position of an object. We do not apply things such as physics or AI, we'll just extrapolate the expected position of an object in the render thread. This is actually not as difficult as it might seem. This solution is actually really great! Even if we render at a whopping 1000 FPS, every frame is able to render a new frame, not just a frame that we have seen before. We'll just use simple linear extrapolation. Even if the extrapolation turned out to be wrong (because of a missed collision response for example), the shown intersection will only last for a fraction of a second. We could even use input from the game-pad mouse or keyboard for extrapolation! This will increase the observed response-time of the game, so what's not to like?

Binaries and Source

The binaries and source distribution of this version contains both of these methods (download links at top of this page). I've added a couple of bouncing 'balls' so it is possible to see whether the game is stuttering. It is very likely that you'll see stuttering when you disable both thread synchronization and extrapolation. Ahhhh and everything is still deterministic... so awesome...

*Update: The next post is about deterministic behavior, but we've also written part 3 and part 4 if you like to continue reading about the multithreaded renderloop.

Multithreaded Renderloop - 1Deterministic Behavior
  1. Jan says:

    Hi Alex.

    I'd like to thank you again for putting all this information on the web giving everyone the opportunity to learn the techniques you applied to implement this tripple-state rendering mechanism.

    I am currently (or still) looking into different ways of achieving the often contradicting goals of creating game engine(s) that are as simple as possible, as flexible as possible and runs as smootly as possible.

    I like your approach very much, and from reading all the information you put here I was actually expecting this to be fulfilling all these goals. Of course you can always argue that this is not *really* a simple solution, but I find the concepts to be fairly simple. However, I tried downloading and running the demo on this site ("Alpha and Omega v2.") and I was a bit surprised to see that it does not in fact run as smoothly as I had hoped. Maybe my expectations were too high? Or I'm doing something wrong?

    I set the simulation to run with the following parameters: Render FPS at 60, no fake load, update FPS at 60, thread sync allowed, and extrapolation allowed. There are three apparent issues that annoy me when watching this simulation:
    1) I see a tailing shadow on the faster moving objects
    2) With irregular intervals (1-5 seconds in between) I see a slight, almost invisible, stutter which probably wouldn't disturb gameplay but its enough to annoy my critical eye
    3) Whenever (especially faster moving) objects change direction I see a slight deviation in the animation of the object caused by the extrapolation giving the appearance of the object moving beyond the edge, then changing its mind, moving slightly back and performing the bounce correctly.

    The appearance in point 3 is obviously also what is in fact happening, but I was hoping that this event would be invisible to the eye. It is, however, not :-(

    Could you please enlighten me here? Am I having too high expectations, or can these issues be handled as well?

    I hope you find the time to reply sometime.

    Thank you again for your efforts on this site.

    Best regards,

    Jan (which is a boys name in Denmark :-) )

  2. Jan says:

    Just a minor correction to my observations. The issue I mentioned in 3) does NOT exist when the rendering and update FPS are both at 60 as I wrote. Is is, however, an issue when the render FPS is significantly higher than the update FPS (eg. Update:30, Render:60).

    This is obviously because the extrapolation comes more into play, but my question is still: can we avoid this visual stutter or is that just the price we pay for using extrapolation?

    Best regards,


    • eierkoek eierkoek says:

      Hi Jan,

      1) Trailing shadow
      This effect is known as Troxler's fading (see:'s_fading ), and happens regardless of the method used to render the objects. I could have prevented it a little if I had chosen more sensible colors :)
      2) Stuttering
      The stuttering effect you perceived is most likely due to a bug in the code that still existed in v2. I expect that most of your issues are solved in the latest version, v2., which can be downloaded at . I found this bug when I created the page about extrapolation (see: ). This is unfortunately the result of my way of writing these pages, I write them simultaneously with implementing it resulting some bugs to still be present.
      The same demo is still available in this new version (press 'a' for archive, click om "Simple Bouncing Smilies"). The FPS values and fake load can still be modified with the Debug-Form (Ctrl+D -> Threads tab)
      3) Errors in extrapolation
      The lower the update rate, the more visible the extrapolation errors will be. Personally I'm not able to see the extrapolation errors at 30 frames per second, but when I go as low as 10 they are clearly visible. At a low update-rate it becomes a trade-off; Do we want smooth animation with occasional motion-artifacts, or do we want continuous stuttering when rendering frames.

      - Alex (also a boys name in the Netherlands, not in every country though)

      • Jan says:

        Hi Alex,

        Thank you for your reply.

        1) I had heard of Troxler's fading when used to create optical illusions, but I never made the connection between that phenomenon and the trailing shadows seen when creating fast animations. Thanks for pointing that out. I guess there is no way to get rid of that.

        2) I am not sure that the occasional, hardly perceivable stuttering is related to any bug. I still see it in your later version, and I also see it in any demos I made myself, regardless if I'm using a basic java canvas or opengl surface for rendering.

        3) Well, you're right of course that the extrapolation errors are a fine tradeoff for the continuous stuttering, and I also agree that at 30 updates per second it's only rarely visible (it requires a sufficiently high speed of the object, and an angle close to 45 degrees when rebounding).

        I tried making a simulation using your extrapolation (in a single threaded setup to better isolate what I'm investigating) and then I made the same simulation using interpolation instead (between the current and the next state). This had the advantage that rebounds were always smooth as opposed to the occasional more abrupt rebound with extrapolation. However, the resulting sort of curved rebound was annoying as well.

        Since neither extrapolation nor interpolation gives you a perfect result, and the negative effects of both are invisible above 25 updates-per-second or so, I guess I would prefer to use extrapolation since that is easier to implement.

      • eierkoek eierkoek says:

        Hi Jan,

        I've done some tests to see if there was anything that my engine does that adds stuttering to the final result. I Googled "Micro Stuttering" and found some interesting pages. Most of the pages are about micro-stuttering when using SLI, but the way they created test results was nice to see.
        They used FRAPS (the video-recording software) and used the benchmark option to generated csv-files. These csv files can be read by a spreadsheet program, or better yet, FRAFS (see: ). I have done the same on the latest version (during the bouncing smilies) and came up with these results:

        Those peaks above 16ms are very concerning, so after some research I found explorer.exe was the culprit (or one of my installed programs that modifies its behavior). After killing explorer.exe I got these results:

        That's a lot better! But still, with an FPS of 1000+ it is kind of weird that there are some minor peaks around 20ms. Wondering whether it was something my engine does or that it is 'normal', I created an empty XNA project and removed the default frame-limiter that would restrict it to 60FPS. The results are:

        The results are the same. I still wonder why there are minor peaks in these results. I need to do some more tests with a C++ program and possibly also with OpenGL, but I predict that it is either considered normal, or that my laptop's video-card is not fancy enough for these kind of tests.

  3. Jan says:

    Hi Alex,

    I'm pleased to hear that you experience this as well. I was starting to think that I was cursed with some kind of weird visual version of tinnitus.

    Of course you are right. When doing any tests to explore unwanted side effects I should always remove all foreseeable sources. I'm just so used to having 20+ windows open that I didn't consider the fact that one of them could be causing the micro-stuttering I was observing. I don't use Internet Explorer (except for downloading other browsers :-) ), but any other running application could be responsible for disturbances in an otherwise smooth flow.

    So this time I closed all other applications and processes I could and ran your bouncing head application on my system. I can't tell if there was any less stuttering, but I do know that there is still some. So I followed your advice and used Fraps to document it. The results confirm my observations, but I don't feel any closer to figuring out the reason for it.

    I ran Fraps for 30 seconds each test. It seems that whether I ran at full or limited render speed the stuttering came in fairly regular intervals with approximately 2.5 seconds in between.

    Update 30 FPS, Render unlimited (~4500 FPS):

    Update 30 FPS, Render 120 FPS:

    I have to try this with other applications were I don't notice this stuttering (which seems to be most other applications) and see if a similar pattern can be found using Fraps.

    Maybe it is normal but only noticeable in such test applications that are featuring simple predictable movement patterns. I'm starting to hope this, so I can get on with more interesting things than unexplainable microstuttering issues :-)

    Thanks again for your time and efforts.

  4. Jan says:

    So... now I'm officially confused...

    I just tried running the Indexed Geometry tests you provided links for and the Fraps results showed none of the regular deviations I saw using the bouncing heads simulation.




    The OpenGL version seems to be vsynced on my computer. But your examples were showing max framerates. I don't know why this is the case. However, they still showed no relevant deviations in framerates.

    Then I tried running the test I previously created in Java using basic Swing (though hardware accelerated) canvas, since I had the same visible stuttering there. And I did see the fairly regular stuttering as expected. However, the Fraps report showed no relevant spikes at all (except the very first frame, which for some reason always is 400-500 ms).


    So now I don't know anything any more, except that I can now dismiss the suspicion that the stuttering is caused by the spikes in the Fraps report.

    Maybe it's just an unavoidable visual effect that is only noticeable when objects are moving at a certain pace in a predictable path? Or maybe it's another trick of the mind? Or maybe it's all in my head like all the voices are? At least that's what the voices are telling me...