GPU Instancer Pro: Best Practices

From GurBu Wiki



GPU Instancer provides an out-of-the-box solution for improving performance, with an easy-to-use interface to get started. Although GPUI internally relies on fairly complex optimization techniques, it is designed to let you apply them in your project without going through their steep learning curves (or development time). By following a few rules of thumb, however, you can make sure these techniques are used as intended and get the most out of GPUI.


On this page, you will find best practices for using GPUI that will help you get better performance.



Instance Counts

The rule of thumb here is to render only the prefabs with high instance counts through GPUI, and to keep the number of GPUI prototypes defined on the managers as small as possible.

50k instances in the AsteroidDemo Scene


When using the Prefab Manager in your scene, the best practice is to add the prefabs that repeat the most in the scene to the manager as prototypes. For example, the included AsteroidDemo scene has three asteroid prototypes. When the scene generates these asteroids around the planet, the resulting instance count is around 16.5k for each. GPUI draws each of these prototypes in a single draw call; however, since the asteroid prefabs use LOD Groups with three LOD levels each, the result is 9 draw calls (3 x 3) for all of these asteroids. Note that the AsteroidHazeQuad prefab (basically a quad with a custom shader that makes the scene look dynamic and foggy) is not added to the manager as a prototype. The reasoning is that the asteroids, which have far higher instance counts than the haze quads, gain much more from instancing. Notice also that the planet and the sun are not defined as prototypes: there is only one of each in the scene, so they would gain nothing from instancing.


Using GPUI for prototypes with very low instance counts is therefore not recommended. GPUI uses one draw call for every mesh/material combination and performs all culling operations on the GPU. While these operations are very fast and cost-efficient, there is no point in spending GPU resources on them when instance counts are so low that the performance gain from instancing would not be noticeable.
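The draw-call arithmetic above can be sketched as follows, assuming (as described in this section) one draw call per mesh/material combination on each LOD level and ignoring any additional rendering passes. DrawCallEstimate is a hypothetical helper for illustration, not part of the GPUI API:

```csharp
using System.Linq;

// Hypothetical helper for illustration only; it is not part of the GPUI API.
public static class DrawCallEstimate
{
    // combosPerLod[i] = number of mesh/material combinations on LOD level i of one prototype.
    // The instance count is deliberately not a parameter: it does not change the result.
    public static int ForPrototype(int[] combosPerLod) => combosPerLod.Sum();

    // Total for several prototypes, each described by its combosPerLod array.
    public static int ForScene(params int[][] prototypes) => prototypes.Sum(p => ForPrototype(p));
}
```

For the AsteroidDemo scene, assuming one mesh/material combination per LOD level, `DrawCallEstimate.ForScene(new[] { 1, 1, 1 }, new[] { 1, 1, 1 }, new[] { 1, 1, 1 })` returns 9, whether each prototype has 16.5k instances or only 16.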


Furthermore, since prefabs with low instance counts will not gain a noticeable performance boost from GPU instancing, it is usually better to let Unity handle their rendering. Unity uses draw call batching techniques in the background (such as dynamic batching). These techniques run on the CPU and consume CPU time and memory. When there are many instances of the same prefabs, these operations become too costly, and the reduction in batch counts they achieve is dwarfed by that of GPU instancing. This is where GPUI shines the most. When instance counts are low, however, the CPU cost of these techniques becomes trivial, and they still reduce batch counts and therefore draw calls. With GPU instancing, on the other hand, meshes are not combined, so every mesh/material combination always costs one draw call. Note also that there is no magic number to use as a minimum instance count, since it depends heavily on the poly counts of the meshes, how they are distributed in the scene, and so on.


In short, GPU Instancing will help you the most when there are many instances of the same prototype, and it is not recommended to add prototypes with low instance counts.


A Single Container Prefab vs. Many Prefabs

One important aspect of instance counts relates to the structure of your prefabs. If you have a prefab that hosts many GameObjects grouped together for organizational reasons, Unity does not recognize identical GameObjects under it as instances of the same object. This means that when you add such a prefab to the Prefab Manager, GPUI has no way of knowing that some of the GameObjects under it could actually be instanced in the same draw call. The result is a draw call for each mesh/material combination under this prefab, which defeats the purpose of GPU instancing.


To illustrate this, think of a prefab that represents a building. You may have windows as children of this building prefab that all use the same mesh and materials. You may also have smaller building blocks, doors, tables, etc. that share meshes and materials. With one huge prefab like this, however, from the perspective of Unity (and therefore GPUI) you effectively have a single prefab containing every mesh and material it hosts. If you add this prefab to GPUI as a prototype, you will get a separate draw call for each of these mesh/material combinations.


Instead, the better way to structure this building (at least as far as GPUI is concerned) is to have a host building GameObject whose child windows, doors, tables, etc. are instances of their own prefabs. When you add these prefabs to GPUI as prototypes, all the windows will share a single draw call, all the doors another, and so on.


Using Nested Prefabs, you can have all your windows, doors, etc. as prefabs and still save the building as a container prefab. GPUI supports Nested Prefabs, so if you create a prototype from the building prefab, the Prefab Manager will recognize the children as prototypes as well.


This is one of the most important considerations concerning prototypes and instance counts, and it is usually one of the biggest pitfalls when using GPU Instancer.


Nested Prefabs

The Nested Prefabs example Scene view

Nested Prefabs (introduced in Unity 2018.3) are a great way to organize your scenes. With GPUI, Nested Prefab support can also be used to define prototypes that are shared between different prefabs. This is an ideal strategy for scenes built from modular prefabs: if many different modules make up your main game objects, defining these modules as GPUI prototypes lets GPUI render each module with a single draw call. With Nested Prefabs, you can keep your main (container) game objects as prefabs that contain these prefab modules and use them to design your scenes.


Let's consider an example: We have a table, and a chair. Both the table and chair have legs, and we use the same mesh/material combination for these legs:

Without Nested Prefabs

Let's first consider what it would be like without using Nested Prefabs. We would have only two prefabs for the Table and the Chair (the legs won't be nested as prefabs under these, so we don't have a leg prefab):

Table and Chair as normal (not nested) prefabs


Notice in this screenshot that Table and Chair are shown in blue (as prefabs) while the legs are grey (they are not prefabs). In this scenario, we would define the Table prefab as a GPUI prototype on the manager, and the Chair prefab as another. GPUI would then analyse their mesh renderers and find that the Table contains 5 renderers (1 table top and 4 legs) and the Chair contains 6 renderers (2 seat objects and 4 legs). Since GPUI issues a draw call for each renderer of a prototype, we would have 5 draw calls for all the tables and 6 draw calls for all the chairs in the scene: a total of 11 draw calls.

Defined Prototypes in the Manager

These would be 11 draw calls whether it is 1 table and 1 chair, or 10k tables and chairs. This is good, but it could be better if we could tell GPUI that all the legs in the scene use the same mesh/material combination so it could render all of them in a single draw call. This is where Nested Prefabs come in handy.

With Nested Prefabs

Now let's consider the same setup with Nested Prefabs, which let us use prefabs inside prefabs. We can create prefabs of our repeating modules (such as the legs and seat pieces) and nest them under the main prefabs (such as the table and chair). This gives us one Leg prefab, one Seat Piece prefab, a Table prefab (containing Leg prefabs) and a Chair prefab (containing Leg and Seat Piece prefabs):

Leg Prefab, and Table and Chair as Nested Prefabs


Notice that the Table and Chair are shown in blue as before (they are prefabs), but this time the legs are also blue (they are prefabs too). In this scenario, we can define the Leg prefab as a separate GPUI prototype on the manager, which tells GPUI that it can draw all the legs in a single draw call. We also add the Table prefab as a prototype, and the Seat Piece as another. When GPUI analyses these prototypes, it treats all the legs in the scene as instances of the Leg prototype and ignores the legs nested in the Table prototype.

Defined Prototypes in the Manager

Notice that we did not define the Chair as a prototype. This is because the Legs and the Seat Pieces are already defined as prototypes, so there are no renderers left in the Chair prefab for GPUI to consider as yet another prototype. If we tried to add the Chair after adding Leg and Seat Piece as prototypes, GPUI would report an error saying it could not find any Mesh Renderers inside this prefab (since it ignores all children that are defined as other prototypes).


With our prototypes defined like this, we now have 1 draw call for the Table instances (for the table top), 1 draw call for all the Legs and 1 draw call for all the Seat Pieces in the scene: a total of 3 draw calls. This is much better than the 11 draw calls we had without Nested Prefabs.
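The counting in this example can be expressed as a small sketch. PrefabNode and NestedPrototypeCounter are hypothetical types, not part of GPUI. The model assumes one draw call per renderer of each prototype (as stated above), that renderers inside nested prefabs which are themselves registered as prototypes are skipped, and that both seat objects of the Chair are instances of the Seat Piece prefab:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified model of a prefab hierarchy; for illustration only.
public sealed class PrefabNode
{
    public string Name { get; }
    public int OwnRenderers { get; } // renderers directly on this prefab, not inside nested prefabs
    public List<PrefabNode> NestedPrefabs { get; } = new List<PrefabNode>();

    public PrefabNode(string name, int ownRenderers, params PrefabNode[] nested)
    {
        Name = name;
        OwnRenderers = ownRenderers;
        NestedPrefabs.AddRange(nested);
    }
}

public static class NestedPrototypeCounter
{
    // Renderers a prototype contributes: its own renderers plus those of nested
    // prefabs that are NOT registered as prototypes themselves.
    public static int RenderersFor(PrefabNode prefab, ISet<PrefabNode> prototypes) =>
        prefab.OwnRenderers + prefab.NestedPrefabs
            .Where(child => !prototypes.Contains(child))
            .Sum(child => RenderersFor(child, prototypes));

    // One draw call per renderer of each registered prototype, independent of instance counts.
    public static int DrawCalls(ISet<PrefabNode> prototypes) =>
        prototypes.Sum(p => RenderersFor(p, prototypes));
}
```

With flat prefabs, registering Table (5 own renderers) and Chair (6) gives 11. With nested prefabs, registering Leg, Seat Piece and Table gives 3, and `RenderersFor` returns 0 for the Chair, which corresponds to the error described above.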


In the screenshots below, you can see a comparison between the draw calls when using nested prefabs and when not using them. Each color is a draw call in this picture.

Draw calls without Nested Prefabs
Draw calls with Nested Prefabs


Using Occlusion Culling

The rule of thumb here is to turn occlusion culling off if your scenes do not have a reasonable number of occluders, or if the instanced meshes have very low triangle counts.


The occlusion culling solution that GPUI implements is extremely easy to use: you literally don't have to do anything to use this feature. You do not need to bake any maps, add additional scripts, or use Layers. Furthermore, since it runs on the GPU, it is also extremely fast. As such, you might be tempted to use this feature even where you don't need it.


However, note that the Hi-Z occlusion culling solution introduces additional operations in the compute shaders. Although GPUI is optimized to handle these operations efficiently, they still create unnecessary overhead in scenes where the game world is set up such that there is nothing to gain from occlusion culling. A good example is a strategy game with a top-down camera, where almost everything is always visible and there are no obvious occluders.


GPUI makes it possible to use extreme numbers of objects in your scenes. At such numbers, the GPU cost of testing every instance for occlusion can become significant, and it only pays off if the scene is designed so that enough geometry is culled, on average, to compensate for it. Otherwise, you will get a better average performance boost out of GPUI with occlusion culling turned off.


There may also be cases where the instanced geometry is so low in triangle count that simply rendering it is cheaper than testing it for occlusion. Typical cases are low-poly style or mobile games where instance counts are not extreme. If the graphics card can render the excess geometry faster than it can determine whether to cull it, GPU-based occlusion culling does more harm than good. The best way to find out is to run your scene with and without occlusion culling and compare the results.
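When comparing the two runs, it helps to average frame times over many frames instead of judging by a single frame. A minimal plain-C# sketch, assuming you have already recorded frame times in milliseconds from one run with occlusion culling and one without (for example by logging Time.unscaledDeltaTime in each frame); the class name and the 0.1 ms noise tolerance are arbitrary choices for illustration:

```csharp
using System;
using System.Linq;

// Hypothetical helper: compares frame times (ms) recorded from two runs of the same scene,
// one with GPUI occlusion culling enabled and one with it disabled.
public static class OcclusionCullingComparison
{
    // Mean frame time in milliseconds; throws if nothing was recorded.
    public static double Average(double[] frameTimesMs) =>
        frameTimesMs.Length == 0
            ? throw new ArgumentException("No frame times recorded.", nameof(frameTimesMs))
            : frameTimesMs.Average();

    // True only if occlusion culling lowered the average frame time by more than the
    // tolerance, so that measurement noise is not mistaken for a gain.
    public static bool OcclusionCullingPaysOff(double[] withCullingMs, double[] withoutCullingMs, double toleranceMs = 0.1) =>
        Average(withoutCullingMs) - Average(withCullingMs) > toleranceMs;
}
```

Since the outcome depends on the GPU, it is worth repeating the comparison on the hardware you are targeting.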


In short, it is recommended to enable occlusion culling in scenes that will clearly benefit from GPUI's occlusion culling feature, for example terrain elevations that partly block your view, or walls and buildings that the player walks in front of. If your game will never have enough occluders (e.g. a top-down strategy game), or if your prototypes' mesh geometry is so light that culling is not worth the testing, it is recommended to turn occlusion culling off on the GPUICamera component.


No-GameObject Workflow

The idea behind a no-GameObject workflow is that even the bare-bones existence of a typical GameObject in your scene affects performance. GameObjects are usually necessary for various reasons: you may need colliders or scripts on your objects, or simply want to instantiate a prefab in your scene. However, no matter how well GameObjects are optimized in Unity, if all you need is for your objects to be visible to the camera, rendering their meshes and materials without any GameObjects at all gives the best performance.

GPUI makes this possible by exposing its core rendering system through its API. There are 3 main API methods for the no-GameObject workflow.

The first is GPUICoreAPI.RegisterRenderer, which defines the GPUI renderers from the given prefab and outputs a rendererKey that is needed to control rendering for this renderer.

The second is GPUICoreAPI.SetRenderParams, which creates or updates the GPU buffer with the given Matrix4x4 array. Use it to update the Matrix4x4 data of the instances when you want to move, rotate or scale them. It can also be used to add or remove instances, as long as enough memory was allocated during initialization.

The third is GPUICoreAPI.DisposeRenderer, which stops the rendering process and releases the memory allocated for the renderer.

Note that SetRenderParams is slower the first time it is called and whenever you increase the size of the buffer, because of the new memory allocations. For example, if you want to add or remove objects, the best approach is to start with an array large enough to hold your maximum number of instances, and only change its size when required (e.g. when the allocated memory is no longer enough, or when it is too large and you want to free up some GPU memory). The unused indices in the array can be set to Matrix4x4.zero; GPUI's compute shaders discard these automatically and do not process them for rendering.
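One way to follow this advice is to keep a pre-allocated array whose unused slots stay zeroed and which only grows when a slot does not fit. The sketch below is written in plain generic C# so it runs outside Unity; PaddedInstanceArray is a hypothetical helper, not part of GPUI, and the doubling growth policy is an assumption made for illustration:

```csharp
using System;

// Hypothetical helper, not part of the GPUI API: keeps a pre-allocated array that can be
// passed to SetRenderParams. Unused slots hold default(T); for Unity's Matrix4x4 that is
// the all-zero matrix (Matrix4x4.zero), which GPUI discards.
public sealed class PaddedInstanceArray<T> where T : struct
{
    public T[] Data { get; private set; }

    public PaddedInstanceArray(int initialCapacity)
    {
        Data = new T[Math.Max(1, initialCapacity)]; // new slots start as default(T)
    }

    // Writes an instance into a slot. Grows the array (by doubling) only when the slot
    // does not fit, and returns true in that case, i.e. when the next upload will need
    // a larger GPU buffer (the slower path described above).
    public bool Set(int index, T value)
    {
        if (index < 0) throw new ArgumentOutOfRangeException(nameof(index));
        bool grew = false;
        if (index >= Data.Length)
        {
            int newSize = Data.Length;
            while (newSize <= index) newSize *= 2;
            T[] bigger = new T[newSize];
            Array.Copy(Data, bigger, Data.Length);
            Data = bigger;
            grew = true;
        }
        Data[index] = value;
        return grew;
    }

    // "Removes" an instance by resetting its slot to default(T), so the indices of the
    // remaining instances do not change.
    public void Clear(int index) => Data[index] = default;
}
```

In a Unity script this would be a `PaddedInstanceArray<Matrix4x4>` sized for the expected maximum, updated with `Set`/`Clear` and uploaded with `GPUICoreAPI.SetRenderParams(_rendererKey, instances.Data)`. Doubling keeps the number of reallocations, and therefore slow uploads, logarithmic in the final instance count.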


Here is an example usage of this:

    using UnityEngine;

    namespace GPUInstancerPro.Example
    {
        [ExecuteInEditMode]
        public class SimpleNoGODrawer : MonoBehaviour
        {
            public GameObject prefab;
            public int instanceCount = 1000;
            private int _rendererKey;

            public void OnEnable()
            {
                if (prefab == null) return;
                GPUICoreAPI.RegisterRenderer(this, prefab, out _rendererKey); // Register the prefab as renderer
                GPUICoreAPI.SetRenderParams(_rendererKey, GenerateMatrixArray()); // Set matrices for the renderer
            }

            public void OnDisable()
            {
                if (_rendererKey != 0)
                    GPUICoreAPI.DisposeRenderer(_rendererKey); // Clear renderer data
            }

            public Matrix4x4[] GenerateMatrixArray()
            {
                Matrix4x4[] matrix4X4s = new Matrix4x4[instanceCount];
                for (int i = 0; i < instanceCount; i++)
                    matrix4X4s[i] = Matrix4x4.TRS(Random.insideUnitSphere * 10, Random.rotation, Vector3.one);
                return matrix4X4s;
            }
        }
    }


Material Variations

GPU instancing works by rendering a single mesh and material combination multiple times. This is why GPU Instancer creates its prototypes from prefab definitions, using the mesh and material information of each renderer in the prefab and issuing separate draw calls for them. Consequently, by default, there are no material variations between the prefab instances. However, GPU Instancer Pro supports material variations either through the GPU Instancer API or through the Prefab Manager's Material Variations feature.

Prefab Manager Material Variations

The Prefab Manager offers tools to assist in creating material property variations for prefab instances, including color or texture UV variations. It provides user-friendly tools for creating shaders with variation support, along with components on prefab instances for easy property adjustments directly from the Inspector window.

Material Variation Definition

Prefab Manager Material Variations

When you press the +Create button under the Material Variations section, the Prefab Manager will generate a Material Variation Definition asset. This asset is utilized to determine which material on the prefab will have variations and which properties on this material will be changed. After selecting the material and properties, it can also assist in creating a new shader with support for material variations.

Alternatively, you can create a Material Variation Definition asset by right-clicking in the Project window and selecting Create->Rendering->GPU Instancer Pro->Material Variation Definition.

Follow the steps below to set up the Material Variation Definition:

Material Variation Definition
  1. Select the Material that will have its properties changed per instance.
  2. Set a unique Buffer Name for the StructuredBuffer that will be used in the shader.
  3. Click on the Add Property button.
  4. From the drop-down menu on the left, select the material property you wish to change. If you wish to change a custom variable defined inside the shader, select <Custom> instead and enter the name of the variable.
  5. From the drop-down menu on the right, select the property type such as Vector4, Color, Integer, or Float.
  6. Click on the Generate Shader button to create a new shader with material variations support based on the settings. For Shader Graph, this button generates a Sub Graph instead, which can be used as a replacement for the GPUI Pro Setup node. After adding this Sub Graph to the shader, you can manually assign this shader to the "Variation Shader" field. (Please note that the Generate Shader function is designed around commonly used shaders. Some shaders may follow different practices, in which case it might not work out of the box. If you encounter an issue, please report it by following the Support Guide.)
  7. After setting up the shader, if you make changes to the settings on the Material Variation Definition such as Buffer Name or Variation Properties, click on the Generate Include File button for the changes to take effect.

Material Variation Instance

Material Variation Instance

When you create a Material Variation for a prefab in the Prefab Manager, the Material Variation Instance component will be added to the prefab. Once you select the Variation Properties on the Material Variation Definition, these properties can be individually edited for each prefab instance in your scenes, allowing you to give them different looks.


No-GameObjects Material Variations

To set up instance-based variations, you need to create a GraphicsBuffer, register this buffer to the GPUI renderer as a material property override, and modify your shader to read from this buffer. The contents of the buffer (the variations) can then be updated at runtime using the SetData method. The setup therefore has a C# scripting part and a shader part.


1. On the MonoBehaviour Script:

Here you create a buffer and set it on the GPUI renderer as a material property override using the AddMaterialPropertyOverride API method. See the No-GameObject Workflow section above if you are not familiar with it.


    using UnityEngine;

    namespace GPUInstancerPro.Example
    {
        [ExecuteInEditMode]
        public class SimpleNoGODrawerWithColorVariation : MonoBehaviour
        {
            public GameObject prefab;
            public int instanceCount = 1000;
            private int _rendererKey;
            private GraphicsBuffer _colorBuffer;

            public void OnEnable()
            {
                if (prefab == null) return;
                GPUICoreAPI.RegisterRenderer(this, prefab, out _rendererKey); // Register the prefab as renderer
                GPUICoreAPI.SetRenderParams(_rendererKey, GenerateMatrixArray()); // Set matrices for the renderer
                GPUICoreAPI.AddMaterialPropertyOverride(_rendererKey, "colorBuffer", GenerateColorBuffer()); // Set color buffer to the renderers' materials
            }

            public void OnDisable()
            {
                if (_rendererKey != 0)
                    GPUICoreAPI.DisposeRenderer(_rendererKey); // Clear renderer data
                if (_colorBuffer != null)
                {
                    _colorBuffer.Dispose();
                    _colorBuffer = null;
                }
            }

            public Matrix4x4[] GenerateMatrixArray()
            {
                Matrix4x4[] matrix4X4s = new Matrix4x4[instanceCount];
                for (int i = 0; i < instanceCount; i++)
                    matrix4X4s[i] = Matrix4x4.TRS(Random.insideUnitSphere * 10, Random.rotation, Vector3.one);
                return matrix4X4s;
            }

            private GraphicsBuffer GenerateColorBuffer()
            {
                if (_colorBuffer != null)
                    _colorBuffer.Dispose();
                Color[] colors = new Color[instanceCount];
                for (int i = 0; i < instanceCount; i++)
                    colors[i] = Random.ColorHSV();
                _colorBuffer = new GraphicsBuffer(GraphicsBuffer.Target.Structured, instanceCount, System.Runtime.InteropServices.Marshal.SizeOf(typeof(Color)));
                _colorBuffer.SetData(colors);
                return _colorBuffer;
            }
        }
    }


2. On the Material's Shader:


For this setup to work, the shader used by the material also needs to recognize the buffer defined above. You can achieve this by declaring a StructuredBuffer in the shader with exactly the same name as the one passed to AddMaterialPropertyOverride in the script (colorBuffer in the example above).

You can utilize the Material Variation Definition asset to aid in modifying the shader. For manual shader editing, please refer to the following instructions.


You can declare the structured buffer alongside the shader's other variable declarations (inside the shader program, not in the ShaderLab Properties block) as follows:

    #ifdef UNITY_PROCEDURAL_INSTANCING_ENABLED
        StructuredBuffer<float4> colorBuffer;
    #endif
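On the C# side, the stride passed to the GraphicsBuffer constructor has to match the size of the element type declared in the shader. In the script above, Marshal.SizeOf(typeof(Color)) yields 16 bytes (four floats), which matches float4. A minimal plain-C# sketch of that check, using a hypothetical ColorVariation struct as a stand-in for Unity's Color so it runs outside Unity:

```csharp
using System.Runtime.InteropServices;

// Hypothetical stand-in for UnityEngine.Color (four 32-bit floats).
// Its layout must match the HLSL element type of the buffer: float4 = 16 bytes.
[StructLayout(LayoutKind.Sequential)]
public struct ColorVariation
{
    public float R, G, B, A;
}

public static class VariationBufferStride
{
    // Stride in bytes to pass to the GraphicsBuffer constructor for elements of type T.
    public static int Of<T>() where T : struct => Marshal.SizeOf<T>();
}
```

If you change the element type on one side (for example to a float or a custom struct), change it on the other side as well; a mismatched stride can lead to errors or to the shader reading the wrong data for each instance.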


Then use it by indexing the buffer with gpui_InstanceID, which stores the instance's index in the variation buffer. The following example, in line with the color variations example above, modifies the Albedo of a surface shader using the variation buffer. The color is set in the vertex function (colorVariationVert) and passed to the surface function (surf) to modify the final color:


The input struct:

    struct Input {
        float2 uv_MainTex;
        float4 colorVariation;
    };


We use the variation buffer to modify the colorVariation property of the input struct in the vertex function:

    void colorVariationVert (inout appdata_full v, out Input o) {
        UNITY_INITIALIZE_OUTPUT(Input, o);
        // Default to the material's color
        o.colorVariation = _Color;
        #ifdef UNITY_PROCEDURAL_INSTANCING_ENABLED
            // Per-instance color from the variation buffer
            o.colorVariation = colorBuffer[gpui_InstanceID];
        #endif
    }


And use that colorVariation property to modify the final Albedo in the surface function:

    void surf (Input IN, inout SurfaceOutputStandard o) {
        // Albedo comes from a texture tinted by the per-instance color
        fixed4 c = tex2D (_MainTex, IN.uv_MainTex) * saturate(IN.colorVariation);
        o.Albedo = c.rgb;
        // Metallic and smoothness come from slider variables
        o.Metallic = _Metallic;
        o.Smoothness = _Glossiness;
        o.Alpha = c.a;
    }


The key part is obtaining the instance's index in the variation buffer from gpui_InstanceID and using that index to look up the variation in the buffer:

    o.colorVariation = colorBuffer[gpui_InstanceID];


An example shader and the corresponding demo scene are included in the package under GPUInstancerPro/Demos/Core/TutorialScenes/ColorVariations.