GPU Instancer provides an out of the box solution for improving performance and it comes with a very easy to use interface to get started. Although GPUI internally makes use of quite complicated optimization techniques, it is designed to make it very easy to use these techniques in your project without going through their extensive learning curves (or development times). However, by following certain rules of thumb, you can use these techniques as intended and get the most out of GPUI.
In this page, you will find best practices for using GPUI that will help you get better performance.
- 1 Instance Counts
- 2 Using Occlusion Culling
- 3 Using GPUI for Unity Terrain Details
- 4 Using a No-GameObject Workflow where Possible
The More Instance Counts, the Better
The rule of thumb to follow here is to have only the prefabs that have high instance counts rendering with GPUI, while minimizing the amount of the defined GPUI prototypes on the managers as much as possible.
When using the Prefab Manager in your scene, the best practice is to add the distinctively repeating prefabs in the scene to the manager as prototypes. For example, in the included PrefabInstancingDemo scene, you can see that there are three asteroid prototypes. When the scene generates these asteroids around the planet, the resulting instance counts are around 16.5k for each. GPUI will draw each of these prototypes in a single draw call - however, since the asteroid prefabs are using LOD Groups on them with three LOD levels on each, the result is 9 draw calls (3 x 3) for all of these asteroids. Please also note that the AsteroidHazeQuad prefab (which is basically a quad with a custom shader on it; it makes the scene look dynamic and foggy) is also added to the manager as a prototype even though it has 23 instances only. Thus, the idea is not like a lower limit on the instance counts, but that the asteroids (with a lot more instance counts than the haze quads) will gain a lot more from instancing than the haze quads. Notice that the planet and the sun are not defined as prototypes since there is only one of each in the scene and therefore they would not gain anything from instancing.
Thus, using GPUI for prototypes with very low instance counts is not recommended. GPUI uses a single draw call for every mesh/material combination, and does all culling operations in the GPU. While these operations are very fast and cost efficient, it is unnecessary to use GPU resources if the instance counts are too low and the performance gain from instancing them will not be noticeable.
Furthermore, since prefabs with low instance counts will not gain a noticeable performance boost from GPU Instancing, it is usually better to let Unity handle their rendering. Unity uses draw call batching techniques on the background (such as dynamic batching). These techniques depend on the CPU to run and tax their operations on the CPU memory. When there are many instances of the same prefabs, these operations turn out to be too costly and the reduction in batch counts dwarf in comparison to GPU Instancing. This is where GPUI shines the most. But where the instance counts are noticeably low, the cost on the CPU when using these techniques becomes trivial - yet they will still reduce batch counts and thus draw calls. While using GPU instancing, on the other hand, since meshes are not combined, every mesh/material combination will always be one draw call. Please also note that there is no magic number here to use as a minimum instance count since it depends so much on the poly-counts of the meshes, how they are distributed in the scene etc.
In short, GPU Instancing will help you the most when there are many instances of the same prototype, and it is not recommended to add prototypes with low instance counts. The only exception to this is where the instance counts get to extreme numbers such as millions. See the Detail Instancing section below for more on this.
Scene Prefab Importer
GPUI comes with a Scene Prefab Importer tool that can help you with easily managing this. You can access this from:
Tools -> GPU Instancer -> Show Scene Prefab Importer
- The Scene Prefab Importer tool will show you all the GameObjects in the open scene that are instances of a Prefab - and their instance counts in the scene.
- You can use the Select Min. Instance Count slider and button to select prefabs with a given minimum amount.
- You can then Import Selected Prefabs and GPUI will create a Prefab Manager in your scene with selected prefabs as prototypes.
Registered Prefabs in the Prefab Manager
When your Prefabs are defined in the Prefab Manager, the manager will show you the instance counts that are registered on it.
- If you add any new prefab instances in your scene (or remove any), the instance counts must be registered again in the manager.
- You can simply click the Register Prefabs in Scene button to update the instance counts in the manager.
- If you add or remove instances at runtime, you can either manually register them from the GPU Instancer API, or if you setup your prototypes to do this, GPUI will automatically handle this for you.
A Single Container Prefab vs Many prefabs
One important thing to notice about instance counts is related to the structure of the prefabs. If you have a prefab that hosts many GameObjects that you created for organizational reasons, Unity does not identify the GameObjects that are actually the same under it as instances of the same object. What this means is that when you add such a prefab to the Prefab Manager, GPUI has no way of knowing that the some of the GameObjects under this prefab can actually be instanced in the same draw call. This results in creating a draw call for each mesh/material combination under this prefab and therefore beats the purpose of GPU Instancing.
To exemplify this, think of a prefab which represents a building. Maybe you have windows under this building prefab as children which use the same mesh and materials. Maybe you also have smaller building blocks, doors, tables, etc. that share the same mesh and materials. If you have this huge prefab, however, from the perspective of Unity (and therefore GPUI), you effectively have a prefab with as many meshes and materials in it that as the total sum of meshes and materials it hosts. So if you add this prefab to GPUI as a prototype, you will also see as many draw calls.
Instead, the better way of having this building in your scene (at least as far as GPUI is concerned) is to have a host building GameObject - where all its child windows, doors, tables, etc. are instances of their own prefabs. When you add these prefabs as prototypes to GPUI, you will see that all the windows, doors, etc. will share a single draw call instead.
Please note, however, that this issue is not the case starting with the Nested Prefabs support in Unity 2018.3 and later. Using Nested Prefabs, you can have all your windows, doors, etc. as prefabs and still save the building as a container prefab. GPUI supports Nested Prefabs, so if you create a prototype from the building prefab, the Prefab Manager will recognize the children as prototypes as well.
Nevertheless, this is one of the most important issues concerning prototypes and instance counts; and it is usually one of the biggest pitfalls while using GPU Instancer.
Using Occlusion Culling
The rule of thumb to follow here is that you should turn occlusion culling off if your scenes do not have a sensible amount of occluders or if the mesh geometry is too little in tri-counts.
The occlusion culling solution that GPUI implements is extremely easy to use: you literally don't have to do anything to use this feature. You do not need to bake any maps, to add additional scripts nor use Layers. Furthermore since it works in the GPU, it also is extremely fast. As such, you might be tempted to use this feature even where you probably won't need it.
However, please note that the Hi-Z occlusion culling solution introduces additional operations in the compute shaders. Although GPUI is optimized to handle these operations efficiently and fast, it would still create unnecessary overhang in scenes where the game world is setup such that there is no gain from occlusion culling. A good example of this would be strategy games with top-down cameras where almost everything is always visible and there are no obvious occluders.
GPUI makes it possible to use extreme numbers of objects in your scenes. And in higher numbers, the cost of testing for occlusion can be higher than desired in the GPU if the scene is not designed in such a way that this cost of testing is compensated by the average amount of geometry that is culled. In these scenarios, you will get a better average in performance boost out of GPUI without occlusion culling than having it on.
Also, there may be cases where the instanced geometry is so low in tri-counts that you could be getting more out of instancing them anyway rather than testing for occlusion culling. Typical case scenarios for this would be low-poly style or mobile games where instance counts are not extreme. If the graphics card can render the excess geometry faster than it would calculate whether it should cull them, then it would mean that GPU based occlusion culling is doing more harm there than good. The best way to test for this is experimenting by running your scene with and without occlusion culling on and comparing the results.
In short, it is recommended to design your scenes in such a way that there will be obvious culling that GPUI's occlusion culling will take advantage of. Examples of this are elevations that slightly block your view in a terrain, walls/buildings that a player walks in front, etc. Or, if your game is so that there will never be enough occluders (e.g. a top-down strategy) - or if your prototypes' mesh geometry is too low so that culling will not be worth the testing - than it is recommended to turn occlusion culling off.
Using GPUI for Unity Terrain Details
The rule of thumb here is to aim to balance the visual quality you introduce to the terrain details with the performance you can get out of the GPUI Detail Manager. The Detail Manager will not always be faster than the default Unity terrain.
When using the Detail Manager, one thing to keep in mind is that most of the options in the Detail Manager interface serve the purpose of increasing visual quality over that of the default Unity detail shader. You also don't have to do any extra work for this visual upgrade - GPUI simply takes your Unity terrain and makes it better looking just by adding the Detail Manager. The default settings on the manager introduce shadows for your details and turns the texture prototypes into cross-quads. The foliage shader also introduces further quality upgrades by adding ambient occlusion, color gradienting, wind wave tints, and much more.
GPUI uses instancing techniques for backing this up performance-wise, and the result is always a stable FPS with no spikes. However, in some cases, the FPS would result in a number that is lower than the Unity default especially if you are using features that don't exist in the Unity terrain details. The expected result when increasing visual quality in the Detail Manager, therefore, is not that it would always be faster than the Unity terrain, but that it is fast while still using these features.
The most impacting of the visual upgrades that the Detail Manager introduces are shadows, and it gets heavier with the amount of Shadow Cascades that are defined on your quality settings. One thing that can be done to improve performance while using shadows is that, if you have multiple detail prototypes on the manager, you can selectively use shadows on them. Also, if you are using a third party Screen Space Ambient Occlusion effect, this usually has the same effect with having shadows on your grass so you can turn the shadows off for all grass prototypes to get a similar looking terrain with better performance.
Also, depending on your settings, the cross-quadding option effectively doubles, triples or quadruples the amount of geometry you have for texture details in comparison with the Unity details. This doesn't always result in the expected visual improvement, however. If your grass is distributed on the terrain densely, you may not get too much difference quality-wise while using two quads or four for them.
Prefab Details Instead of Textures
On this point, it is also worth noting that the Detail Manager supports using prefabs as detail prototypes as well. If you use tufts of grass as prefabs (as shown in the picture on the left) you will be saving a lot on the instance counts. Although it is true that GPUI works better with higher instance counts, when using grass details on the terrain, the instance counts can get way to extreme - such as 1 or 2 million instances. At such extreme numbers, the visibility operations on the compute shaders can start to become taxing on the GPU. By using a tuft prefab as such, you can cut the instance counts down a lot. For example if you have a tuft consisting of 10 separate cross-quads, you will achieve the same effect with 100k (1.000.000 / 10) instances as opposed to where you did it with automated cross-quads from default texture terrain details with 1.000.000 instances. The only difference here is the way you distribute your grass, yet there will be better performance gain. It is only because GPUI aims to work on your terrain by staying loyal to how it already looks, it does not do this automatically (i.e. convert your texture details to combined cross quads (tufts) automatically).
Please have it in mind that you can use the GPUI foliage shader on your prefabs as well.
Occlusion Culling with Terrain Elevations
GPUI also offers culling solutions that help increase the FPS, but they require you to design your scenes with having them in mind. For example the occlusion culling feature will help you get better results in terrains where you have hills that block your view. Not necessarily giant hills, but slight elevations to hide some instances as in the included DetailInstancingDemo scene.
In scenes with wide plains where nothing is culled, the occlusion culling system effectively becomes obsolete - if you wish to have scenes like this, it is recommended to cut back on mainly the shadows and the cross-quadding options and the maximum detail distance to see better performance.
However, as you can see in the image to the left, slight terrain elevations help immensely with detail culling in combination with the occlusion culling feature.
Also, please have in mind that GPUI lightens the CPU load by moving all rendering related things to the GPU. If you test only the terrain with a GPUI Detail Manager, you are always testing the GPU since there are no scripts running at all. In a real game scenario, you will be using your CPU for other things, and GPUI's instancing solution would be more apparent in the FPS.
In short, using the Detail Manager effectively requires a bit of thought in the design process. You need to optimize the quality settings on the manager according to the terrain you have. The final FPS would always scale for the better directly with a better graphics card since it's all done in the GPU, but since you're adding features that don't exist in the Unity Terrain, it would not always be faster then the default terrain. On this point, the recommended action to take would be to create different quality levels in your game and use different settings for these. For an example of this usage, you can take a look at the included DetailInstancingDemo scene.
Using a No-GameObject Workflow where Possible
This is an advanced topic, and currently a no-game object workflow in GPUI is only available through the GPU Instancer API.
The idea behind a no-game object workflow is that even the bare-bones existence of a typical GameObject in your scene is effecting the performance. GameObjects are usually necessary for various reasons - be it you need to use colliders, or some scripts on your objects, or simply instantiate a prefab in your scenes. However, as much as GameObjects are optimized in Unity, not having them at all while still being able to render their meshes/materials would give you the best performance if all you need for them is to be seen in the camera.
GPUI makes this possible by allowing access to its core rendering system from its API. There are 2 main API methods that can be used for no-GameObject workflow.
First one is Initializewithmatrix4x4array, which mainly allocates the required memory in the GPU and sets the data from the given Matrix4x4 array. This is enough to start the rendering process for the given objects.
Second one is Updatevisibilitybufferwithmatrix4x4array, which updates the GPU memory with the given Matrix4x4 array. This is used to update the matrix data of the objects, when you want to move, rotate or scale the objects. It can also be used for add or remove operations if there was enough allocated memory during initialization.
The first method where the initialization happens works slower because it sets up GPU memory for the indirect instancing. The second one works much faster, because it only updates a portion of GPU memory. For example if you want to Add/Remove Objects, the best way to do this is to start with a big enough array that can hold your maximum number of instances, and only reinitialize when required (e.g. when the allocated memory is not enough anymore or it is too big and you want to free up some GPU memory). The extra indexes you have on the array (which are not used) can be set to Matrix4x4.zero. These will be discarded by GPUI's compute shaders automatically and will not be processed for rendering.
Here is an example usage of this: