Trees Of Daggerfall [PERFORMANCE PROBLEM/SOLUTIONS]

Discuss modding questions and implementation details.
MasonFace
Posts: 543
Joined: Tue Nov 27, 2018 7:28 pm
Location: Tennessee, USA

Re: Trees Of Daggerfall [PERFORMANCE PROBLEM/SOLUTIONS]

Post by MasonFace »

Well, I got a few hours to mess around with it yesterday. When I replicated your process to replace the billboard with the Daggerfall Billboard Batched script/shader, I got the same result: you can set it up with the proper archive/index, but once the prefab gets instantiated into the streaming world, the billboard just disappears, as if it loses that archive/index information on initialization, perhaps?

So, I'm kind of speaking in ignorance here, but from looking more closely at how the terrain flats are rendered in DFU, it appears that all the billboards in a location are combined into one mesh. I assume that since they all share the same material and texture (atlas), this allows them to be rendered in one call without the CPU overhead of dynamic batching. Super efficient. Since we are dealing with LODs and dynamically enabling/disabling renderers in the process, I don't think we can take advantage of the same technique. But even with dynamic batching (as opposed to static batching), we should have been getting better performance than what we're currently getting with the SpeedTree billboards... so I don't think batching is the problem here. I still believe it is either culling calculations or the crappy SpeedTree billboards like you said.

I think your idea about creating a new atlased billboard shader would work just fine. All the billboards would share the same material, so they should dynamically batch into one draw call. I'm pretty bad with shaders, but I think I may be able to copy most of "DaggerfallBillboardBatch.shader" into a new shader that supports texture atlases so we can just reference an index in that atlas for each billboard.
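
To illustrate "reference an index in that atlas for each billboard", here is a minimal sketch of the index-to-UV lookup, written in Python for brevity rather than as shader code. It assumes a uniform grid atlas with tiles numbered left to right, top to bottom; the function name and the grid layout are hypothetical and not taken from DaggerfallBillboardBatch.shader.

```python
def atlas_uv_rect(index, columns, rows):
    """Return (u_min, v_min, u_max, v_max) for one tile of a uniform grid atlas.

    Assumption: tiles are numbered left to right, top to bottom, and v is
    measured from the bottom edge of the texture.
    """
    if not 0 <= index < columns * rows:
        raise ValueError("index outside atlas")
    col = index % columns          # tile column, counted from the left
    row = index // columns         # tile row, counted from the top
    tile_w = 1.0 / columns
    tile_h = 1.0 / rows
    u_min = col * tile_w
    v_max = 1.0 - row * tile_h     # flip, because row 0 is the top row
    return (u_min, v_max - tile_h, u_min + tile_w, v_max)
```

With a lookup like this, every billboard can share one material and differ only in the UV rectangle it samples.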

At this point, I want to try one last thing to get Daggerfall Billboard Batched script to work, but I think ultimately we will have to do as you suggest and make our own simple shader and texture atlas. If it comes to that, I can repurpose my Spriter tool to bake albedo and normal textures of the 3D models, then pack them into an atlas. But first, I'll just start with the vanilla textures.
Ok, let me know if you'll need anything.
Can you upload a package with the entire set of temperate forest models? That will give me a good start on benchmarking once I get something working.

MasonFace
Posts: 543
Joined: Tue Nov 27, 2018 7:28 pm
Location: Tennessee, USA

Re: Trees Of Daggerfall [PERFORMANCE PROBLEM/SOLUTIONS]

Post by MasonFace »

I was able to prod into this a little more over the weekend. I did a series of tests to try to find the source of the problem.

I'm hoping TheLacus or someone else with more knowledge on the topic than me will chime in and correct any misinformation or faulty assumptions I make below.

At this point, all I know for sure is that dynamic batching is killing performance. We need a way to reliably batch the objects into as few draw calls as possible. Ideally, we would just set the objects' prefabs to "static" and call it a day, but since the objects all have different materials, there's nothing Unity can do to statically batch them at runtime.
Edit: I am walking back my statement about dynamic batching. I suspect that's the root cause, but "know" is too strong a word at this point.

It looks like Interkarma solved this for the vanilla DFU assets by combining all the terrain objects' meshes (probably using StaticBatchingUtility.Combine?) and using the same material with a texture atlas. That way Unity sees all the objects collectively as one mesh and one material, so it doesn't need to spend any CPU time batching a massive number of separate meshes into fewer draw calls. The downside is that you can't use frustum culling or LODs, since all the trees are one object: no matter which way you look, you will see it, and it will always be close (not that LODs would help with the vanilla terrain flats anyway). I think this trade-off is great for vanilla DFU since it saves a lot of CPU time but only adds a tiny amount to the GPU render time.
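
As a rough picture of what "combined into one mesh" means, the sketch below merges many billboard quads into a single vertex list and a single triangle index list. This is a plain Python illustration, not DFU's actual code; the quad layout and the function name are assumptions made for the example.

```python
def combine_quads(quads):
    """Merge many quads into one vertex list and one triangle index list.

    Each quad is (x, y, z, width, height), with (x, y, z) at its bottom centre.
    """
    vertices, triangles = [], []
    for x, y, z, w, h in quads:
        base = len(vertices)        # index of this quad's first vertex
        half = w / 2.0
        vertices += [
            (x - half, y, z),       # bottom left
            (x + half, y, z),       # bottom right
            (x + half, y + h, z),   # top right
            (x - half, y + h, z),   # top left
        ]
        # Two triangles per quad, offset into the shared vertex list.
        triangles += [base, base + 1, base + 2, base, base + 2, base + 3]
    return vertices, triangles
```

Because the result is one mesh, it can be submitted in one draw call, but it can also only be culled or swapped as a whole, which is the trade-off described above.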

Now, working with 3D trees with LODs is a completely different case altogether. Whereas the vanilla DFU terrain flats get combined into one mesh, the mod injection 3D Tree prefabs look like they are getting embedded into the terrain data, so I can't directly see what's going on with the individual game objects in the editor in play mode. I would have thought that there would be some optimizations happening behind the scenes that Unity's terrain stuff would do to help out, but it doesn't seem like it.

At this point, I'm pretty sure your assessment has been spot on, VMBlast. A simple test we could have done (and I will do next) is just create a simple billboard material and assign all the billboards to share that one material. If that improves performance, then we will know that creating a custom atlased batched billboard shader is the way to go. It would be great if we could somehow combine all the billboard meshes in the LOD groups into one mesh like the vanilla DFU method to drastically reduce the dynamic batching load, but I'm pretty sure that'd break the whole LOD system. Alternatively, I'm hoping that setting the billboard LOD objects to "static" will have a similar impact.

As an aside, I did notice something about your SpeedTree prefabs. Is there a reason that some of them have a Rigidbody component? Is that needed for wind simulations or something? I removed the Rigidbodies from all of them, but surprisingly didn't notice any increase in performance.

Anyhow, to summarize: I'm pretty sure dynamic batching is what's killing the performance. I think VMBlast is correct that the billboards are not getting batched and that's what's causing the issue. I will try a quick test in the next few days to try to confirm this.
Last edited by MasonFace on Tue Aug 20, 2019 4:49 pm, edited 1 time in total.

Nystul
Posts: 1501
Joined: Mon Mar 23, 2015 8:31 am

Re: Trees Of Daggerfall [PERFORMANCE PROBLEM/SOLUTIONS]

Post by Nystul »

just wanted to say that I am impressed by all the work you guys put into this. This is the kind of work that is so tedious but so important at the same time ;)
always an interesting read whenever a new post pops up ;)

noodle
Posts: 1
Joined: Fri Aug 02, 2019 8:45 am

Re: Trees Of Daggerfall [PERFORMANCE PROBLEM/SOLUTIONS]

Post by noodle »

Hello,

Just dropping in my two cents. I think the trees look great, however in my experience the more leaves you have on a tree, the higher the FPS drop. The main thing I don't like about vanilla DF trees is that they're 2D and move with the point of view.
Is there any possible way of maybe putting in some lower-poly trees, with fewer leaves of course? Maybe they won't be as beautiful as the trees you've currently made, but it will probably help the performance hit. You could have mod settings to choose between which trees will load (High Quality, Medium Quality, Low Quality). I can't remember at the moment if the LODs are also 3D, but maybe High Quality = current state of Trees of Daggerfall, Med Quality = all trees are LOD 1, and Low Quality = all trees are LOD 2. Just a suggestion. My main point is: we will probably like any 3D tree model over the vanilla DF trees.

King of Worms
Posts: 4752
Joined: Mon Oct 17, 2016 11:18 pm
Location: Scourg Barrow (CZ)

Re: Trees Of Daggerfall [PERFORMANCE PROBLEM/SOLUTIONS]

Post by King of Worms »

Fps drop is not caused by too complex 3d mesh here...
Also "everything is better than 2d trees" is highly personal opinion.

l3lessed
Posts: 1399
Joined: Mon Aug 12, 2019 4:32 pm

Re: Trees Of Daggerfall [PERFORMANCE PROBLEM/SOLUTIONS]

Post by l3lessed »

I do like 3d models myself. The constant sprite facing you, especially for a fps immersive rpg like this, is visual and mode breaking for me.

MasonFace
Posts: 543
Joined: Tue Nov 27, 2018 7:28 pm
Location: Tennessee, USA

Re: Trees Of Daggerfall [PERFORMANCE PROBLEM/SOLUTIONS]

Post by MasonFace »

You could have mod settings to choose between which trees will load. (High Quality, Medium Quality, Low Quality) I can't remember at the moment if the LODs are also 3d, but maybe High Quality = Current state of Trees of Daggerfall, Med Quality = All trees are LOD 1 and Low Quality = All trees are LOD 2. Just a suggestion.
I've had that same idea, and I think it's possible to allow graphical mod creators to extend customizations to the player using Unity's built in LOD groups class.

Some people like the highest quality possible, some people like the lo-fi retro look, and some people like it somewhere in between. Others just want something better looking than vanilla, but their PC can't handle the highest setting. LOD groups that are built at startup and configured by mod settings should be able to handle that.
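
A minimal sketch of that idea, in Python for brevity: a mod setting selects which slice of the LOD chain gets built at startup. The LOD names and setting values are hypothetical placeholders, not the mod's actual assets or settings.

```python
# Hypothetical LOD names, ordered from most to least detailed.
ALL_LODS = ["mesh_high", "mesh_medium", "mesh_low", "billboard"]

# Hypothetical mod setting values mapped to the first LOD that gets built.
FIRST_LOD = {"high": 0, "medium": 1, "low": 2, "billboard_only": 3}

def lods_for_quality(quality):
    """Return the LOD chain to build at startup for a given mod setting."""
    return ALL_LODS[FIRST_LOD[quality]:]
```

For example, `lods_for_quality("medium")` drops the most detailed mesh, and `lods_for_quality("billboard_only")` leaves the billboard as the only level.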

But of course that's future work! We gotta get it running right, first! :lol:

As of now, the trees are using just two LODs: a 3D model and a billboard sprite. As KoW mentioned, the complexity of the 3D tree doesn't appear to be the root cause of the performance issue. VMBlast has done an excellent job optimizing the meshes, texture resolution, etc. My recent tests indicate that the problem is deeper than just the complexity of the mesh, but my analysis so far has admittedly been largely superficial. I was hoping some quick tests would reveal something obvious, but I'm apparently going to have to deep dive into the profiler.

@KoW:
Also "everything is better than 2d trees" is highly personal opinion.
Very true, but I think that's why noodle mentioned customization. People who prefer higher-than-vanilla quality billboards could configure their LOD group to have the billboard as LOD0 and no other LODs structured beneath it.

DeltaOfPie

Re: Trees Of Daggerfall [PERFORMANCE PROBLEM/SOLUTIONS]

Post by DeltaOfPie »

Some people like the highest quality possible, some people like the Low-Fi retro look, some people like it somewhere between.
Definitely think that a separate LOD setting would make sense...
I would maybe suggest imposters with a little bit more "angles" (maybe octahedral imposter)...

An interesting effect with imposters is that you could apply a "retrofying" post-processing pass afterwards, so you would satisfy retro players as well as high-def fans with the same solution.

Depending on whether you cache or save the "imposter models", you could scale them down, render them at low res with nearest-neighbour filtering, and put a "retro/toon shader" on the models with a "weighting" towards the standard Daggerfall palette...

EDIT: caching the models upfront (or saving them statically) could also be the general solution to the imposter problem. I think the textures for imposters could blow up the memory use... And if you are calculating the imposters for each object and creating textures at render time, this could also blow up the calculation time...


Regarding the performance issues... Either it's the high fidelity of the trees that is causing the FPS drop, or the imposters that can't be batched easily and create a lot of "branching" due to angle calculations for the imposters... Nevertheless if the detail level is high enough you have tons of trees that will be rendered.... You could prune them away with some "occlusion culling" or maybe save them in some octree and check if they are seen from the distance (raycaster style)...

I suspect the high resolution of the trees is the real problem, because the 2D sprites get rendered as well and don't create that big a problem... Creating imposters shouldn't be that CPU intensive (if saved statically [or at loading time]), even with checking which sprite region in the sprite batch should be rendered.....

The imposters are pretty repetitive, I don't know if Unity has a system that recognizes this automatically and tries to reduce the rendering to one tree sprite batch for multiple trees...

The trees do have a quite high resolution.... I've seen few games that render trees with such a high poly count.... And all of them use some kind of LOD system to avoid rendering them like this in the distance...

MasonFace
Posts: 543
Joined: Tue Nov 27, 2018 7:28 pm
Location: Tennessee, USA

Re: Trees Of Daggerfall [PERFORMANCE PROBLEM/SOLUTIONS]

Post by MasonFace »

Definitely think that a separate LOD setting would make sense...
I would maybe suggest imposters with a little bit more "angles" (maybe octahedral imposter)...
Are you sure you meant octahedral? I think that would be four view angles arrayed around the object beneath, and four arrayed around the top of the object. If you meant octagonal, then that's the way the SpeedTree impostors currently operate. Source
I suspect the high resolution of the trees is the real problem, because the 2D sprites get rendered as well and don't create that big a problem...


The tree resolution doesn't seem to be the problem. In my tests, the render thread was at 24ms, which is not terrible, but nowhere near vanilla DFU's incredible 4.5ms using the batch billboard script/shader. Compare the CPU thread which is at 500ms with this mod, and 13ms for Vanilla DFU. That indicates that the graphics card has no problem rendering the many full 3D trees in the foreground, or even the many billboards in the background, but instead, the GPU is sitting idle waiting to receive the instructions from the CPU on how to render them.
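
The arithmetic behind that reading can be sketched as follows, under the simplifying assumption that the CPU thread and the render thread overlap, so the frame time is set by whichever is slower. The function name is made up for the example.

```python
def frame_stats(cpu_ms, render_ms):
    """Return (frame time in ms, frames per second, limiting thread).

    Assumption: both threads run in parallel, so the slower one sets the pace.
    """
    frame_ms = max(cpu_ms, render_ms)
    bottleneck = "cpu" if cpu_ms >= render_ms else "render"
    return frame_ms, 1000.0 / frame_ms, bottleneck

# Numbers quoted in the post above.
print(frame_stats(500, 24))   # this mod: CPU-bound at roughly 2 FPS
print(frame_stats(13, 4.5))   # vanilla DFU: CPU-bound at roughly 77 FPS
```

In both cases the render thread finishes long before the CPU thread, which is why reducing mesh or texture detail would not be expected to change the frame rate much.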
The imposters are pretty repetitive, I don't know if Unity has a system that recognizes this automatically and tries to reduce the rendering to one tree sprite batch for multiple trees...
Sounds like you're describing GPU instancing, and SpeedTree supposedly already does this, but I'm not seeing the performance gains from it. :cry:
The trees do have a quite high resolution.... I've seen few games that render the trees with such a high poly count.... And all of them use some kind of LOD system to avoid rendering them like this in distance...
I mean, this mod also uses billboard sprite impostors for LODs. Also, the meshes and textures have been well optimized, and the billboards start quite close to the camera, so I'm not sure where you're going with this observation.


My interim conclusion at this point is that VMBlast is right, and that the draw distance is the culprit. I do think that the most foolproof way of fixing this is to keep the draw distance of the farthest tree LODs just beyond one terrain size and have all distant terrain use the vanilla DFU batched billboard trees, which are super fast. I just don't know that there's a simple, elegant way to stitch all that together...
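
The selection rule itself is simple to state; the hard part is the stitching. A minimal Python sketch of the rule, where the parameter names and any distances passed in are hypothetical placeholders rather than values from DFU or the mod:

```python
def pick_representation(distance, mesh_range, lod_limit):
    """Choose how to draw a tree at a given distance from the camera.

    mesh_range: hypothetical distance up to which the 3D mesh is shown.
    lod_limit:  hypothetical cutoff just beyond one terrain size; past it,
                the tree is left to the vanilla batched billboards.
    """
    if distance <= mesh_range:
        return "3d_mesh"
    if distance <= lod_limit:
        return "mod_billboard"
    return "vanilla_batched_billboard"
```

The idea is that only trees inside `lod_limit` cost per-object CPU work, while everything farther away rides along in the combined vanilla mesh.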

DeltaOfPie

Re: Trees Of Daggerfall [PERFORMANCE PROBLEM/SOLUTIONS]

Post by DeltaOfPie »

Are you sure you meant octahedral? I think that would be four view angles arrayed around the object beneath, and four arrayed around the top of the object. If you meant octagonal, then that's the way the SpeedTree impostors currently operate.
With octahedral imposter I mean the technique as described here:
https://www.shaderbits.com/blog/octahedral-impostors
(it's basically where you take the views from and how you convert the coordinates to a texture)
Unity uses 8 directions and renders them to a texture.
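
For reference, a minimal Python sketch of the commonly used octahedral mapping from a view direction to texture coordinates. It follows the general formulation of the technique and is not copied from the linked article or the IMP repository; treating y as "up" and the `frames_per_side` grid are assumptions made for the example.

```python
import math

def octahedral_uv(x, y, z):
    """Map a direction to (u, v) in [0, 1] with the octahedral mapping.

    Assumption: y is 'up'. The upper hemisphere lands in the inner diamond
    of the square; the lower hemisphere is folded out into the corners.
    """
    norm = abs(x) + abs(y) + abs(z)
    if norm == 0:
        raise ValueError("zero-length direction")
    px, py, pz = x / norm, y / norm, z / norm   # project onto the octahedron
    if py >= 0:
        ox, oz = px, pz
    else:
        ox = (1.0 - abs(pz)) * math.copysign(1.0, px)
        oz = (1.0 - abs(px)) * math.copysign(1.0, pz)
    return (ox * 0.5 + 0.5, oz * 0.5 + 0.5)

def impostor_frame(x, y, z, frames_per_side):
    """Snap a direction to the (column, row) of the nearest baked frame cell."""
    u, v = octahedral_uv(x, y, z)
    col = min(int(u * frames_per_side), frames_per_side - 1)
    row = min(int(v * frames_per_side), frames_per_side - 1)
    return col, row
```

Unlike a fixed ring of eight horizontal views, this layout also covers views from above and below, at the cost of a larger baked texture.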

There is also an open source implementation for it here: https://github.com/xraxra/IMP

Maybe a custom shader could help... Not necessarily as complex as the above, but you could tailor it more closely to the streaming process...

If the CPU-Time is really exploding like that, it could be that the SpeedTree-Implementation is doing more than just calculating the angle for the imposter and tries to cull on CPU... While the batching of sprites can technically be culled on the GPU, reducing the CPU time....

The frustum is quite extreme to be honest... and modern GPUs just seem to be fast enough to handle the amount of trees thrown at it... "draw distance" would be a problem for everything that involves the CPU in that case...
I do think that the most foolproof way of fixing this is to keep the draw distance of the farthest tree LODs just beyond one terrain size and have all distant terrain use the vanilla DFU batched billboard trees, which are super fast.
I would say that the "Trees Of Daggerfall" should be rendered and be batched as normal sprites...
