|Download Alpha and Omega v220.127.116.11.0|
Binaries as Win32 Installer (3 MB)
|Download Alpha and Omega v18.104.22.168.0|
Source as Visual Studio 2010 Solution (28 MB)
Hi there! Nice of you to keep reading all those words coming out of my fingers. Lets continue our previous post on a deterministic multithreaded renderloop, yay! The last post ended with the explanation and implementation of an actual deterministic multithreaded render loop, but the render state was still too simplistic for a decent demonstration. If we keep track of an actual state of the world and create usable render states we quickly discover that it is more difficult than we'd like it to be. Before we dive in the abyss that is called "game engine development", lets start with a quick recap:
The biggest issue is that we have 1 world state and 3 render states, so we need to split our world state in two parts:
So what should the world state look like? I have a good idea on how to come up with a world state that allows me to do most the things I want, but explaining 'why' I've chosen this particular representation will be very difficult. Most of the design-decisions are based on experience in previous projects and it may seem silly if you don't understand my reasoning.
Okay here goes my first attempt of an example, followed by a quick Q&A afterwards.
Note all the strange things in this model, to name a few: No hierarchical model of the world, separation of 'static' (which might not be completely static) and 'dynamic' world. We have even separated the 2D overlay in opaque and transparent, what is that all about? We also only show the objects, what about each object's properties? Why aren't you talking about how to manage and load 3D models and textures on the video-card? Why do you have such a weird shaped head? How do you split this model into a render state and a rest state? Which thread is taking input from my game-pad that moves the player?
So many questions. And what was that about a weird shaped head, I mean, like, seriously? That hurts you know. Ah well, you have to be somewhat of a masochist to try and create a decent 3D game without the use of an engine. Okay let me try to answer a couple of questions you might have before we zoom in on the objects to see what properties we may expect.
Lets us continue moving forward, the only sensible direction for us right now.
When we split the world state in a render state (three of them actually) and rest state then it'll look something like this:
Not all world-state-nodes needs to have a render-state-node (e.g. node #4), because not all nodes need to be rendered. Maybe node #4 keeps track of some AI-related stuff and is only used by the update thread. When the update-thread is finished and jumps to Render State #2 it'll probably look like this:
The render state will now be busy to update Render State #2 and the Rest State, using some of the information stored in Render State #1. However, it is possible that a render-state-state-node has been added or removed, so the update-thread may not blindly assume that all nodes in render-state #2 are available. The update-thread in the example above has been lucky enough that the render-state did not change but what if it did? What if the update-thread decided to remove node #5 and adds node #7 as child of node #6? Then the situation looks like this:
During the update process the update-thread needs to make Render State #2 the same as Render State #1 (and possibly make additional changes)! There are several ways to go about solving this issue, but I think it is important to keep all this complexity from the actual update-function. We don't want to worry about the possible non-existence of render-state-nodes while calculating physics! Have I already mentioned that we keep track of parent nodes and that order is important? Hah, just when you thought things were still reasonable.
So how will we fix old render states to have the structure we expected? There are several ways of approaching this but I have chosen the way where each operation that modifies the render-state will become an update-message. Before the update-procedure has its way with the render-state we will parse all update-messages so the render-state structure becomes up-to-date. So basically when node #5 gets deleted at render state #1 it will immediately gets deleted. For render state #2 and #3 an update message gets added to the todo-queues: "Delete render state at position X". The same happens when adding node #7, it will be created immediately for render state #1 and for render state #2 and #3 the update message "Add render-node of type X at position Y" will be added to the todo-queue.
There is only one last simple issue we need to address. We need a good way to encode the position of where to add/remove a node from the render-tree. We may not blindly use references to nodes in the rest state, because the node may no longer exist.
If the update thread decides to remove node #7 directly after he added it then the message "Add render-node for <ref. to rest-node #7>" for render state #3 becomes invalid. I'm implementing this on a managed language, C#, so it is not 'that' big of a deal, but it is still bad practice to ignore this because we may want to dispose the object before it gets deleted by the garbage collector. Again, there are multiple ways of solving this, but in my opinion the best way is to dispose of a rest-state-node only after all associated render-state-nodes are disposed. This implementation still allows the code to be ported to unmanaged programming languages, so yay :). For technical details I'll redirect you to the source distribution, because it would probably be boring to show here.
The first thing he needs to do is to parse all items in the todo-queue so the structure of the render-tree is up to date. Only after this is done will we perform the normal update-procedure. Basically this should be working, but the data structure is not as nice as I'd hoped it to be. We now have 3 render-state-trees and 1 rest-state-tree completely separated from each other. For the update-thread this is not a great situation, because this means that we need to parse 3 trees simultaneously (the current rest-state, the current render-state and the previous render-state). To somewhat remidate this annoying situation we'll give each renderable rest-state-node references to the 3 associated render-state-nodes. This connects the rest state to the render-states. We have to be really careful with these reference because all those states have become a web of references and maintaining it during all tree operations may get complicated.
That took a lot of effort, don't you agree? All this hard work is totally worth it! We can now render and update the world in two separate threads without having to worry about thread-safe access to objects. The two threads can run separately from each other and never really have to wait, we only have a little bit of mutex-checking when one of the threads wants to change the state. Better yet, almost all this multithreading is hidden from you when you're making the game WHOHOOW. So are we done? Sadly no
In the sce, sce, scenario where we set the update-FPS to 120, while the render speed is stuck at 60FPS you might expect something like this to happen in chronological order:
Because the render thread is twice as slow as the update thread we expect that the render thread to skip a frame. We do not synchronize the threads, so he might not always skip exactly 1 frame. Sometimes he may skip 2 frames, or sometimes he doesn't need to skip at all. If you expect a behavior where the update thread almost always skips 1 frame then you are sadly mistaken :(. Yeah, I'm sad too. Before we'll explain let's see the behavior that you are more likely to get:
Wow, that is quiet messed up right? No wondering that the game is stuttering! This has to do with thread-switching. The operating system decides which thread currently gets some execution-time on what core of your CPU. You may have hundreds of threads running on your quad-core processor, so each thread will be given some time to do his thing. So even if we visualize threads as processes that run completely seperatly from each other, this is probably not the case. Your OS is deciding when your thread may perform a couple of operations. Even if it may not look like it, threads do 'wait' for each other. It is not possible to tell the operating system when and when not to switch, but we are able to give a couple of 'hints. One of the possible hints we may give is "Processor affinity".
Processor affinity is a property you can set on a thread that tells the operating system on what processor(-core) of your CPU's the thread may run. So it is possible to create a thread that will only get executed on processor-core #3 of your CPU. For C# check out this MSDN page to see how this is done in code. Usually you don't mess around with these properties, but if you are a tweaker then you probably already know that you can also set process-affinity using the task manager. You may be shocked to hear this, but if you are programming for the XBOX360 then you must set process affinity on each thread you create! For XNA you'll have access to Thread.SetProcessorAffinity to configure on what CPU your thread will run. It is actually not 'that' shocking, they probably have specialized cores for certain tasks instead of your run-of-the-mill cores in your desktop computer. I'm currently not that focused on development for the XBOX and messing around with processor affinity just so solve the stuttering problem is probably not a good idea. So lets continue to the 'sleep' function.
It is possible to let a thread sleep for a couple of milliseconds, for C# this is done using the Thread.Sleep(...) function. This will actually execute a windows API call to let windows know that this thread is inactive for a while. This way windows knows not to give this thread a couple of cycles on a CPU when it is thread-switching, so the CPU can be busy computing stuff for other threads. When calling this function windows also knows that it is better to immediately switch to another thread, because this thread will not be busy for a while. When calling Thread.Sleep(0), you are telling Windows that you want to sleep 0 milliseconds. You may expect that this function returns immediately but in reality windows will force a thread-switch. It may be the case that the same thread get a couple of cycles, but it may as well be another thread. This is not some obscure undocumented feature that exists as a side effect of 'sleep', it is actually well documented and a really nice feature to have.
A forced thread switch using Sleep(0) is probably a good idea. This is the first 'fix' that we'll make. After the update thread is done performing a 'step' and when the rendering thread is done rendering we'll always make sure that we'll sleep, even if it is 0 millisecond! This may help a little bit, but it won't solve the underlying problem. The problem is caused by a lack of thread synchronization. We don't want the traditional form of thread synchronization (because we don't want threads waiting for each other) so what are we able to do?
It is possible to do some thread synchronization without creating nasty situations where threads need to wait for each other. If the update thread runs at 120 FPS and the render thread at 60 FPS, then certainly under no circumstance should the render thread draw the same frame twice! If it was reversed (update-thread at 60 FPS and render thread at 120 FPS), then under no circumstance should we allow the update-thread start updating before the render-thread is starting to draw the last updated state. This is implemented using two ManualResetEventSlim's and are used as follows:
So even if the CPU is able to keep up, the game may actually run a little bit slower than desired to keep 'smoothness'. This may be undesirable for online multiplayer or games that depend on a very stable game speed, so hopefully I'll remember this setting when I get to the point where I'm actually creating a game :). Lets continue to a better solution.
If the render thread has access to the variables that are used to create motion (momentum-vector, rotation matrix, ...), then we'll be able to extrapolation the probable position of an object. We do not apply things such as physics or AI, we'll just extrapolate the expected position of an object in the render thread. This is actually not as difficult as it might seem. This solution is actually really great! Even if we render at a whopping 1000 FPS, every frame is able to render a new frame, not just a frame that we have seen before. We'll just use simple linear extrapolation. Even if the extrapolation turned out to be wrong (because of a missed collision response for example), the shown intersection will only last for a fraction of a second. We could even use input from the game-pad mouse or keyboard for extrapolation! This will increase the observed response-time of the game, so what's not to like?
The binaries and source distribution of this version contains both of these methods (download links at top of this page). I've added a couple of bouncing 'balls' so it is possible to see whether the game is stuttering. It is very likely that you'll see stuttering when you disable both thread synchronization and extrapolation. Ahhhh and everything is still deterministic... so awesome...
|Multithreaded Renderloop - 1||Deterministic Behavior|