Approval to make some optimisations?

Discuss coding questions, pull requests, and implementation details.
Post Reply
User avatar
XJDHDR
Posts: 258
Joined: Thu Jan 10, 2019 5:15 pm
Location: New Zealand
Contact:

Approval to make some optimisations?

Post by XJDHDR »

I've noticed a few small optimisations that can be made to the engine, and I wanted to see if I can get approval to make these changes before I do so.

The first change is an optimisation to null checks against Unity Objects. The details of this issue can be found here. The gist of it is that when a Unity Object is compared against Null and it is not Null in C# land, the interpreter then makes a call to native code to check if the Object has been disposed in C++ land. This native code call is expensive, but I've worked out an optimisation that can be made which avoids this call. A benchmark I've tested gave me a 33% performance improvement.

Part of the optimisation I want to make involves adding additional arguments to certain public methods, which would obviously break any mods that call them. If I'm not allowed to edit these methods, I can still do the optimisations that don't involve touching these methods. The improvement just won't be as much.
The list of methods I currently want to add additional arguments to are the following:

Code: Select all

GameObjectHelper.AlignControllerToGround()
GameObjectHelper.CreateDaggerfallMeshGameObject()
GameObjectHelper.CreateRMBBlockGameObject()
GameObjectHelper.InstantiatePrefab()
MaterialReader.GetMaterial() (both)
MeshReader.GetMesh()
TextureReader.GetTerrainTextureArray()
TextureReader.GetTexture2D()
TextureReader.GetTexture2DAtlas()
The second change I want to make is to reduce the number of heap allocations that are currently happening in the codebase, especially when it comes to strings. To do this, I want to add a common StringBuilder to the DaggerfallUnity class. This StringBuilder will then be used by other classes whenever it needs to create a concatenated or formatted string. The result will be that heap allocations will only occur when the final string is created, and if the StringBuilder needs to enlarge it's internal array. Instead of the larger amount of allocations that happen behind the scenes when strings are concatenated or formatted, or when value types are boxed before being turned into a string.

If I can add this common StringBuilder, the next question is whether it should be marked as Internal or Public. Internal means only DFU classes can use it. Public means mods can also use it, but the mod author should be careful to call stringBuilder.Clear() before the method it is used in returns.

User avatar
pango
Posts: 3347
Joined: Wed Jul 18, 2018 6:14 pm
Location: France
Contact:

Re: Approval to make some optimisations?

Post by pango »

Hi XJDHDR,

What can be made 33% faster, the execution of go == null?
Are there such cases inside (nested) loops? Because I suspect the optimization of other occurrences will be below noise level... Personally I'd like stronger evidence of benefit before making interfaces uglier, and breaking existing mods.

(Editors hints may pinpoint some opportunities, but optimisation process is better driven by a profiler.)
Mastodon: @pango@fosstodon.org
When a measure becomes a target, it ceases to be a good measure.
-- Charles Goodhart

User avatar
XJDHDR
Posts: 258
Joined: Thu Jan 10, 2019 5:15 pm
Location: New Zealand
Contact:

Re: Approval to make some optimisations?

Post by XJDHDR »

pango wrote: Wed Jul 06, 2022 7:06 am What can be made 33% faster, the execution of go == null?
Yes. More specifically, I've found 2 things so far.
First, there are a few properties that check if it's backing field is Null, and create/find the associated object if it is. I've found that if you use a boolean field that gets set to True once the backing field is filled, and check the state of the boolean in the property, it improves performance by 33%. The test case I tried is DaggerfallUnity.Instance, which is called somewhere around 1000 times a second at the main menu. Instead of doing if (instance == null), I replaced it with if (!instanceAllocated) and set instanceAllocated = true once instance has been found/created. When I benchmarked getting this property 100000 times, this change reduced the time taken from 21ms to 13ms.
Second, I've spotted a number of places where the engine does a GetComponent call, then checks if the returned object is Null. I'm changing these calls to use TryGetComponent instead, which returns a boolean, then check the boolean. I haven't benchmarked this change, but it's likely a similar improvement.
pango wrote: Wed Jul 06, 2022 7:06 am Are there such cases inside (nested) loops? Because I suspect the optimization of other occurrences will be below noise level... Personally I'd like stronger evidence of benefit before making interfaces uglier, and breaking existing mods.

(Editors hints may pinpoint some opportunities, but optimisation process is better driven by a profiler.)
Hints given by Rider are what prompted my posting this thread. The problem with profiling is that you can't measure the real-world impact of an optimisation until you've finished writing the optimised code, to have something to compare against the original. And at that point, even if the new code's improvement is negligible, it's already been written so where's the logic in throwing it away? Also, one could say that the changes I'm making are probably how the code should have been written in the first place.

As I said, changing the methods I listed is only part of this optimisation. There is still work that can be done without touching those methods' parameters. It just means that those methods won't be as optimal as they could be.

User avatar
pango
Posts: 3347
Joined: Wed Jul 18, 2018 6:14 pm
Location: France
Contact:

Re: Approval to make some optimisations?

Post by pango »

XJDHDR wrote: Wed Jul 06, 2022 10:18 am The test case I tried is DaggerfallUnity.Instance, which is called somewhere around 1000 times a second at the main menu. Instead of doing if (instance == null), I replaced it with if (!instanceAllocated) and set instanceAllocated = true once instance has been found/created. When I benchmarked getting this property 100000 times, this change reduced the time taken from 21ms to 13ms.
So expected gain is (21-13)ms * 1000/100000 = 80µs every second? Sure it's an example, but you'll need many similar improvements before it becomes significant. Performance varies more randomly from run to run just because of external factors like memory layout.
XJDHDR wrote: Wed Jul 06, 2022 10:18 am Hints given by Rider are what prompted my posting this thread. The problem with profiling is that you can't measure the real-world impact of an optimisation until you've finished writing the optimised code
Well, with a profiler you can know beforehand what's the theorical maximum you can gain by working on a piece of code (Amdahl's Law), so it helps not working on changes that cannot significantly improve performance.
Mastodon: @pango@fosstodon.org
When a measure becomes a target, it ceases to be a good measure.
-- Charles Goodhart

User avatar
Interkarma
Posts: 7236
Joined: Sun Mar 22, 2015 1:51 am

Re: Approval to make some optimisations?

Post by Interkarma »

I appreciate the offer, truly. I also understand the appeal of these micro-optimisations and code purity changes. There's a satisfaction to the engineering mind even when the overall benefit is small.

But for this project at this critical time, I'm not willing to take on anything other than high value or low impact changes. I'm lining up a lot of different things right now, and don't want to get bogged down in code review and discussions about which null check is slightly faster. I'm sorry.

User avatar
XJDHDR
Posts: 258
Joined: Thu Jan 10, 2019 5:15 pm
Location: New Zealand
Contact:

Re: Approval to make some optimisations?

Post by XJDHDR »

Interkarma wrote: Wed Jul 06, 2022 5:49 pm But for this project at this critical time, I'm not willing to take on anything other than high value or low impact changes. I'm lining up a lot of different things right now, and don't want to get bogged down in code review and discussions about which null check is slightly faster. I'm sorry.
Fair enough, but would this be up for discussion after 1.0? Also, what about my second proposal?

I think it's a good idea to decide on whether changing the above public method is okay, though. This is something that should be sorted out before 1.0. At that point, you would probably want the API to be as stable as possible. Not to release 1.0 and then the next release immediately breaks compatibility with any mods that use them.
pango wrote: Wed Jul 06, 2022 4:17 pm Well, with a profiler you can know beforehand what's the theorical maximum you can gain by working on a piece of code (Amdahl's Law),
The problem I see is that we already know that my proposed optimisations definitely improve performance. That calls into question: How much time will one spend on profiling an optimisation vs. just making the change? Once I've spotted one of these Null checks, it takes me less than a minute to correct it. I would argue that in the time it takes someone to measure the hypothetical impact of one instance of this change, I will have already changed that instance, and probably done the same to the next dozen or so as well.
pango wrote: Wed Jul 06, 2022 4:17 pm so it helps not working on changes that cannot significantly improve performance.
Sure, if the choice is between a small performance improvement and something more meaningful, then the latter is definitely the better choice. But what if the choice is between making a small optimisation like this, and not doing anything at all?

User avatar
Interkarma
Posts: 7236
Joined: Sun Mar 22, 2015 1:51 am

Re: Approval to make some optimisations?

Post by Interkarma »

StringBuilder is the fastest way to concatenate or format strings, no arguments there. :) But the benefit is mostly when there's heavy string concatenation done frequently in a tight loop. Otherwise, I think the best method of string manipulation is whatever's most readable.

Where do you propose string concatenation is a problem in DFU specifically, and what do the savings look like using StringBuilder instead? Why use a common StringBuilder across the whole project and change methods that potentially break mods, when targeting worst cases behind the scenes in affected subsystems will probably do just as well? And is there a simpler problem that has equal or greater savings elsewhere?

I know it feels intuitive to optimise something, anything, rather than nothing. But IMO the correct answer really is to do nothing if the value of result is less than the value of time spent working on it. I mean this with full respect and the experience of someone who has traded far too many hours of their life on small stuff.

Post Reply