GPU Instancer:Terminology

From GurBu Wiki
Revision as of 21:01, 4 December 2018 by GurBu Admin (talk | contribs) (Batching)
Jump to: navigation, search

About | Features | Getting Started | Terminology | Best Practices | F.A.Q.


In this page, you can find information about the concepts and terminology that is used by GPU Instancer. GPUI works out of the box with almost no required setup, but if you are not familiar with the concepts that are utilized by GPUI, reading these sections will help you get the most performance out of using GPUI by finding the right settings for your scenes.


GPU Instancing

GPU instancing is a graphics programming technique that you can use to improve the performance of your games. It basically works by rendering the objects that share the same mesh and material multiple times - while sending them to the GPU in only a single draw call. GPU instancing, therefore, is very helpful in cases where you want to render the same object many times in your scene. Such objects can be anything from buildings to asteroids; they can be grass or trees - or even billboards and impostors. GPU instancing is a relatively new technique that can be used as an alternative to the traditional batching techniques; and it is usually supported by all the modern graphics devices that come with a dedicated GPU.


On this point, please note that the power of the GPU becomes very important when determining how much performance gain one can get from GPU instancing. Basically, it scales for the better with a more powerful GPU; and on some devices (such as the integrated graphics units offered by e.g. Intel) the performance gain might not be very impressive.


Batching

Batching in general is the process of grouping together various mesh/material data to send to the GPU for rendering in every frame.


Without any kind of batching, to render a mesh with a given material on the screen, all the meshes are sent one by one to the GPU and rendered with their corresponding materials/shaders. Thus, each mesh/material combination issues a draw call without batching. This quickly becomes inefficient given the GPU is much faster than the CPU: without any batching, the GPU would be waiting for the CPU most of the time, and those waiting times would result in a lower FPS.


Unity, being a good game engine, offers two major methods of draw call batching: Dynamic Batching and Static Batching. Both of these techniques work on mainly on the CPU.


In Dynamic Batching, vertices of the meshes that share the same material are grouped together in the CPU and sent to the GPU in a single draw call. The main limitation in Dynamic Batching (as of this writing) is 900 vertex attributes. So if you have a mesh with vertex, normal and UV information (for textures), Unity can batch together every 300 vertex in a single draw call. There are even further limitations on this: the shader on the materials must use a single pass, the objects must be in the same transform scale, they should not receive real-time shadows, etc.


In Static Batching, the amount of vertices that can be grouped together is a lot more; you are limited to 64k vertices per batch (but less on mobile platforms). However, the catch with static batching is that any kind of movement is not allowed, and thus objects must be marked static beforehand in order to be statically batched. One further problem is that Static Batching uses a lot of memory when the object counts are high, although it doesn't use as much CPU power as Dynamic Batching. For example, marking trees as static in a scene with a dense forest can have serious impact on the CPU memory.


Furthermore, since both of these batching techniques are based on the CPU, they share the power of the CPU with other running game scripts.

Instancing

Limited instancing, maximum batch counts.

Instancing vs Indirect Instancing

DrawMeshInstanced vs DrawMeshInstancedIndirect

Compute Shaders

Frustum Culling

GPU Based Frustum Culling

Without any culling, the graphics card would render everything in the scene; whether the rendered objects are actually visible or not. Since the more geometry the graphics card renders, the slower it will be rendering them - various techniques are used not to render the unnecessary geometry. "Culling" is an umbrella term that is used when some objects are decided not to be rendered based on a rule.


Frustum culling is a culling technique that checks the camera's frustum planes and culls the objects that do not fall into these planes. These objects are culled because they are not visible by the camera.


GPU Instancer does all the camera frustum testing and the required culling operations with compute shaders in the GPU before rendering the instances. Because of this, the culling operations are both faster and they do not take any time in the CPU allowing for more room to game scripts in the CPU.

Please note that GPUI currently uses only the camera that is defined on a manager to do frustum culling. If the manager is auto-detecting the camera, this will be the first camera in the scene with the MainCamera tag. What this means is that frustum culling will currently work only for this camera. If your scene requires multiple cameras rendering at the same time, you can turn off the frustum culling feature from the manager. If you are using multiple cameras but not rendering with them at the same time, you can simply call SetCamera(camera) method from the GPUInstancerAPI after you switch your camera and GPUI will start handling all its operations on this new camera.

Occlusion Culling

GPU Based Occlusion Culling

In most games, the camera is dynamic - it captures the game world from different angles. So for example in one camera position, the grass behind a house can be visible - and yet in another one it may not. However, with only frustum culling, the grass will be rendered as long as it stays inside the camera frustum - even though it may actually be hidden behind the house. This happens because objects are drawn from farthest first to closest last in Unity (and 3D graphics in general). The closer ones are thus drawn on top of the farther ones. This is referred to as overdraw.


Occlusion Culling is a technique that targets this concern. It lightens the load on the GPU by not rendering the objects that are hidden behind other geometry (and therefore are not actually visible).


As opposed to frustum culling, occlusion culling does not happen automatically in Unity. For an overview of the Unity's default occlusion culling solution, you can take a look at this link:

https://docs.unity3d.com/Manual/OcclusionCulling.html


This solution has two major drawbacks: (a) that you need to prepare (bake) your scene's occlusion data before using it and (b) that you need your occluding geometry to be static to be able to do so. That effectively means that objects that move during playtime cannot be used for occlusion culling.


Occlusion Culling without Baking

GPUI comes with a GPU based occlusion culling solution that aims to remove these limitations while increasing performance. The algorithm used for this is known as the Hierarchical Z-Buffer (Hi-Z) Occlusion Culling. In effect, GPUI allows for its defined prototypes' instances to be culled by any occluding geometry, and without baking any occlusion maps or preparing your scene.


Just like its frustum culling solution, GPUI does the required testing and culling operations with compute shaders in the GPU before rendering the instances. And again, because of this, the culling operations are both faster and they do not take any time in the CPU allowing for more room to game scripts in the CPU.


However, please note that the Hi-Z occlusion culling solution introduces additional operations in the compute shaders. Although GPUI is optimized to handle these operations efficiently and fast, it would still create unnecessary overhang in scenes where the game world is setup such that there is no gain from occlusion culling. A good example of this would be top-down cameras where almost everything is always visible and there are no obvious occluding objects. For more information on this point, please check the occlusion culling section in the Best Practices Page.


Please also note that, just as with frustum culling, GPUI currently uses only the camera that is defined on a manager to do occlusion culling. If the manager is auto-detecting the camera, this will be the first camera in the scene with the MainCamera tag. What this means is that occlusion culling will currently work only for this camera. If your scene requires multiple cameras rendering at the same time, you can turn off the occlusion culling feature from the manager. If you are using multiple cameras but not rendering with them at the same time, you can simply call SetCamera(camera) method from the GPUInstancerAPI after you switch your camera and GPUI will start handling all its operations on this new camera.


Here is a video showcasing GPUI's occlusion culling feature in action:


How does Hi-Z Occlusion Culling work?

In occlusion culling (in general), the most important idea is to never cull visible objects. After this, the second-most important idea is to cull fast. GPUI's occlusion culling algorithm makes the camera generate a depth texture and uses this to make culling decisions in the compute shaders. Culling is decided based on the bounding boxes of the instances: GPUI compares the instance bounds with the depth texture with respect to both their sizes and their depths. If the object instance is completely occluded by a part of the texture that has a closer value, then it is culled.


GPUI implements this technique by adding a command buffer to the main camera, and dispatching its depth information to a compute shader. The compute shader then calculates visibility as mentioned above and relays the visible instances to be rendered. Notice that this method does not involve a readback in the CPU, so that all the culling operations can be executed in the GPU before rendering.


There are various advantages of this; the main ones being that it works out of the box without the need to bake occlusion maps - and you can use culling with dynamic geometry that you don't even have to define as occluders. Furthermore, it is extremely fast given all the operations are executed in the GPU. However, the culling accuracy is ultimately limited by the precision of the depth buffer. On this point, GPUI analyzes the depth texture (by its mip levels), and decides how accurate it can be without compromising performance and culling actually visible instances.


As you can see in the images below, the depth texture is a grayscale representation of the camera view where white is close to the camera and black is far away.

Camera View
Depth Texture View

As the distance between the occluder and the instances become shorter with respect to the distance from the camera, their depth representations come closer to the same color:


Hi-Z Occlusion Culling-3.png


In short, given the precision of the depth buffer, GPUI makes the best choice to cull instances for better performance - but also without any chance to cull any visible objects.