• Editor
  • Optimization tips

I would like to summon @Nate to clear things up. No rush.

Hi Nate

I am making a YouTube video with tips on how to optimize Spine projects.

I was a programmer myself 10 years ago, so I will understand technical language.

I have a list of tips that I will include in the video. However...

I am looking for more things to add to the list that can be optimized, and of course I would be happy to hear how those features affect performance. I really don't want to mess up the terminology and mislead people.

So here is the list I gathered so far:

Use linked meshes when creating a sequence of meshes, and use them with skins when parts share the same silhouette.

I have heard constraints are heavy and that using them sparingly is a great idea. Is there any difference between constraints? For example, maybe a path constraint with 10 bones and many vertices is much heavier than a path with 10 bones and 2 vertices? And what if we enable stretching for two-bone IK? Does any of this make a big impact?

Deforming meshes in animations is super bad for the CPU, but is it bad if we deform them in the setup pose?

Blending modes of the slots: is any of them heavier?

What about tint black? Which would be heavier: enabling Tint Black for the slot or changing the blend mode to additive?

How about the keys in the timelines? Is there any difference between the timelines, or do keys just hold numbers?

What if we shear an image? Will that use a bigger rectangle to render?

I will also talk about vertex transforms. The question is which is heavier: one vertex with 3 weights or 3 vertices with one weight each?

I know about clipping as well: the number of clipped slots and the vertices of the mask affect performance. Anything else?

What about inherit transform, rotation, and scale? Is it bad? It looks simple to calculate.

If we set the alpha of a slot to 0, will it continue to draw? Or what if we scale a bone to 0, will the attached art continue to draw? What if we have 2 slots, one with 100% and the other with 99% transparency, are they similar performance-wise?

Skel and JSON differ in size. Is there a difference when code parses them? I know that JSON is much easier.

Anything I missed?


I am not Nate, but I would suggest you do some benchmarks to support what you are going to suggest in the video, e.g. running hundreds of test case projects on screen to make things obvious. This will make the content more convincing and will give viewers a general idea of how each optimization affects the overall performance. In fact, most of the questions you ask can be answered once you have the benchmark set up, and the results are more realistic and reliable than what a developer suggests without numbers. Sometimes a certain optimization is insignificant and a waste of time when not dealing with a large number of characters. This is very important so you don't lead viewers into over-optimization.
Btw, I never used it, but I remember a discussion about a config for disabling rendering when a character is off screen. I don't know if it is handled properly by default now.
Looking forward to your video.

Thank you very much, I will definitely consider your suggestion.

You've picked a hard topic for a video!

::rubs hands together:: :nerd:

The first thing is that it's very hard to say what is important when it comes to performance. If you have something (Spine or otherwise) that takes a lot of resources, that in itself is not a problem. It is only important IF the total of all the resources your app uses exceeds a reasonable amount. Everything the app does needs to be considered, not just the Spine portion. Your super inefficient app is perfectly fine and requires no special attention right up until you exceed that reasonable amount. Only then does performance matter.

Spending some effort on constructing things in an efficient way can be helpful to make it more likely that you'll never hit that unreasonable threshold. Also if you do exceed that threshold, it may be easier to make adjustments. It's a very fine line to walk, trying to worry only the correct amount.

The problem most people fall into is that they worry too much about performance long before they are anywhere near the threshold where they should begin to worry. That worry can add up to a HUGE amount of wasted effort. As you probably know, it is called "premature optimization". Great effort can be put into doing things extremely efficiently, making everything about building the app harder, and very often NONE of that was actually necessary.

Discussing performance optimizations is good, but it's important to focus on the efforts that make the most difference. Since most people will be doing preemptive and premature optimization, it's most helpful to discuss potential problems that are most likely to cause you to exceed the unreasonable performance threshold. Give people ways to help avoid the worst performance problems with the least effort. That least effort part is important, because they don't actually have any performance problems yet!

Once you do have performance problems, there are still plenty of ways to waste huge amounts of time and effort. There is no point putting effort into making areas that are already fast even faster, even if you could make those areas much more efficient. For example, say you can reduce the time one action takes by 99% and another by only 25%. If the first took 10ms, it now takes 0.1ms and you probably can't tell the difference. If the other took 4s, it goes down to 3s, which is noticeable. Prioritizing the areas that are causing your problems (identifying your worst bottlenecks) is the first step of performance optimization, and to do that you almost always need to take measurements of your actual app. I know many of you are going to ignore that bold part; please read it a few extra times! Because of that, watching a video about various optimizations is unlikely to be helpful unless you happen to have the exact problem covered by the video.

Enough blah blah, on to your questions!

warmanw wrote

Use linked meshes when creating a sequence of meshes, and use them with skins when parts share the same silhouette.

This isn't really related to performance. Linked meshes are better than duplicating a mesh many times, because some of the same mesh information is shared (bones, vertices, triangles, UVs, hull length, edges, width, height). However, you'll still have an attachment per frame in your sequence. It would be a bit better to use a single mesh attachment with Sequence checked. Then you have only one mesh attachment and you don't need a timeline to change attachments. That means less data in the skeleton file, less memory needed to load attachments and keys. CPU and GPU performance aren't affected though.

warmanw wrote

I have heard constraints are heavy and that using them sparingly is a great idea. Is there any difference between constraints? For example, maybe a path constraint with 10 bones and many vertices is much heavier than a path with 10 bones and 2 vertices? And what if we enable stretching for two-bone IK? Does any of this make a big impact?

There is no reason to avoid constraints. They can cause a few more bone transformations, but those are pretty cheap. IK and transform constraints don't take much processing at all, no matter their settings. Path constraints require more CPU than those, especially if you have many bones following the path, but even if you use many path constraints it's unlikely to be your worst bottleneck.

warmanw wrote

Deforming meshes in animations is super bad for the CPU, but is it bad if we deform them in the setup pose?

Applying deform keys does not use a lot of CPU, it's just a simple float array copy, then an addition per vertex. That's not free, nothing is, but it's not a big deal. What's bad about deform keys is they use a lot of memory (and size in the data file) to store values for every bone weighted to each mesh vertex. A few keys isn't a big deal, but consider if you key all the meshes on your character 5 times in 10 animations: you've increased the mesh vertices that need to be stored by 50x! One of the largest parts of the skeleton data is the mesh vertices, so you have likely increased the entire size of your skeleton data by nearly 50x. This is how people get 25MB+ skeleton data files. It's easily avoided by using weights. Use deform keys sparingly or not at all.

warmanw wrote

Blending modes of the slots: is any of them heavier?

If you render using PMA then normal and additive can be used without any performance difference. Otherwise, generally changing blend modes causes a batch flush. For example to render a single attachment with a different blend mode, you cause 2 batch flushes: normal rendering, flush, other blend mode rendering, flush, more normal rendering. Like everything else, a few extra batch flushes are fine. It doesn't matter until you are flushing way too many times per frame, which depends on the performance of the devices you target.
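For example, in spine-libgdx it's a one-line setting on the renderer. A minimal sketch, assuming your atlas images were packed with premultiplied alpha (the file and variable names are just placeholders):

```java
// Inside create(): load the atlas and configure the renderer. The PMA setting
// must match how the atlas images were exported/packed.
TextureAtlas atlas = new TextureAtlas(Gdx.files.internal("skeleton.atlas"));
SkeletonRenderer renderer = new SkeletonRenderer();
renderer.setPremultipliedAlpha(true);

// Inside render(): the renderer sets each slot's blend mode while drawing.
batch.begin();
renderer.draw(batch, skeleton); // batch is a PolygonSpriteBatch
batch.end();
```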

Maybe you could explain the different runtime parts in your video. The GPU has a few: geometry submission, draw calls (batching), fill rate. You could break CPU down into the (generally) most expensive operations: clipping, bone and vertex transforms, etc.

warmanw wrote

What about tint black? Which would be heavier: enabling Tint Black for the slot or changing the blend mode to additive?

Tint black causes more data to be sent per vertex. Disabling tint black entirely for the renderer is more efficient. This only matters if you are sending way too much geometry to the GPU each frame. That is unlikely because 2D doesn't need much compared to 3D.
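To make that concrete with spine-libgdx: two-color tint is drawn with a batch that sends the extra dark color for every vertex. This is only a sketch, and you can skip it entirely if nothing in your project uses tint black:

```java
// Only needed if some slot uses tint black (two-color tint): the dark color is
// uploaded as extra data for every vertex.
TwoColorPolygonBatch batch = new TwoColorPolygonBatch();
SkeletonRenderer renderer = new SkeletonRenderer();

batch.begin();
renderer.draw(batch, skeleton); // light + dark color per vertex
batch.end();

// If no attachment uses tint black, a plain PolygonSpriteBatch sends less data per vertex:
// PolygonSpriteBatch batch = new PolygonSpriteBatch();
```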

warmanw wrote

How about the keys in the timelines? Is there any difference between the timelines, or do keys just hold numbers?

The keys just hold numbers, so more keys means a bigger data file and more memory to hold the data at runtime. Applying timelines takes some CPU though (it's a binary search to find the next key for the current animation time), so fewer timelines is better. However, you're unlikely to notice the difference in most cases. Maybe if you are applying many timelines for many skeletons then you'd see a lot of CPU usage for thousands of timelines, but in that case you probably can't easily reduce the number of timelines. Removing a few won't make much difference, because each doesn't take much processing. You probably can't remove say 50% of your timelines, because then you won't get the animation you wanted.

One thing you can do is apply half the animations every other frame (or a similar scheme). That reduces your timeline applying time by 50%.
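A rough sketch of that scheme in spine-libgdx (the field and collection names are made up, and newer runtime versions pass a Physics argument to updateWorldTransform):

```java
// Array is com.badlogic.gdx.utils.Array.
int frame; // incremented once per rendered frame

void updateSkeletons (float delta, Array<AnimationState> states, Array<Skeleton> skeletons) {
	frame++;
	for (int i = 0; i < skeletons.size; i++) {
		// Skipped skeletons keep last frame's pose, so their world transforms
		// don't need to be recomputed either.
		if ((i & 1) != (frame & 1)) continue;
		AnimationState state = states.get(i);
		state.update(delta * 2); // roughly two frames worth of time
		state.apply(skeletons.get(i));
		skeletons.get(i).updateWorldTransform();
	}
}
```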

If you have many skeletons on screen, you may be able to get away with animating say 10 skeletons, then drawing those each 10 times to make an army of 100 skeletons on screen. You've reduced your timeline applying time by 90%! With so many skeletons visible at once, it may not be noticeable that many have the same pose.
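Sketched roughly, again with spine-libgdx and made-up names. Note that each drawn copy still pays for bone transforms and rendering; it's only the timeline application that is shared:

```java
// Array is com.badlogic.gdx.utils.Array; positions holds one x,y pair per visible copy.
void render (float delta, Array<AnimationState> states, Array<Skeleton> skeletons,
	SkeletonRenderer renderer, PolygonSpriteBatch batch, float[][] positions) {

	// Animate only a handful of skeletons...
	for (int i = 0; i < skeletons.size; i++) {
		states.get(i).update(delta);
		states.get(i).apply(skeletons.get(i));
	}

	// ...then draw each of them many times at different positions.
	batch.begin();
	for (int i = 0; i < positions.length; i++) {
		Skeleton skeleton = skeletons.get(i % skeletons.size);
		skeleton.setPosition(positions[i][0], positions[i][1]);
		skeleton.updateWorldTransform(); // still computed per copy
		renderer.draw(batch, skeleton);
	}
	batch.end();
}
```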

warmanw wrote

What if we shear an image? Will that use a bigger rectangle to render?

Shear has no special cost. The size of your art affects the fill rate of the GPU.

warmanw wrote

I will also talk about vertex transforms. The question is which is heavier: one vertex with 3 weights or 3 vertices with one weight each?

They are the same cost. 1×3 or 3×1 results in 3 vertex transforms, which is where the cost is.

warmanw wrote

I know about clipping as well: the number of clipped slots and the vertices of the mask affect performance. Anything else?

Just what's described on the clipping attachment doc page.

warmanw wrote

What about inherit transform, rotation, and scale? Is it bad? It looks simple to calculate.

There is not a big difference between any of the combinations.

warmanw wrote

If we set the alpha of a slot to 0, will it continue to draw? Or what if we scale a bone to 0, will the attached art continue to draw? What if we have 2 slots, one with 100% and the other with 99% transparency, are they similar performance-wise?

Setting the alpha to 0 (or to 99% transparency) will still draw the image and use your fill rate exactly the same as an alpha of 1, unless the runtime has a special case that notices the 0 alpha and skips rendering. It's better to hide attachments.

Scaling an attachment to 0 will still send geometry to the GPU, but nothing will be rendered. Just hide the attachments, it's more clear what the intent is.
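In code (spine-libgdx, and the slot and attachment names here are hypothetical) hiding and showing an attachment looks like this:

```java
// Hide: nothing is submitted for this slot until an attachment is set again.
skeleton.findSlot("glow").setAttachment(null);

// Show it again by name, or go back to whatever the setup pose defines:
skeleton.setAttachment("glow", "glow-attachment"); // slot name, attachment name
skeleton.setSlotsToSetupPose();
```

Keying the slot's attachment to nothing in an animation does the same thing at runtime.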

warmanw wrote

Skel and JSON differ in size. Is there a difference when code parses them? I know that JSON is much easier.

Parsing JSON is not easier! It is more complex, parsing is slow, and it uses a lot more memory during parsing.

The largest part of skeleton data is usually mesh vertices, which are mostly lists of numbers. Storing all that as text is not great. In binary, each number is 4 bytes and parsing it into a number is very easy. In JSON, it can take 8-9 bytes and parsing the characters into a number is slow when done thousands of times.

Binary (skel) is better than JSON in every way: it's smaller on disk and it's faster to parse. There is no good reason to use JSON in a production application. Only use JSON if you need a human to be able to read it or you need to process the data with other tools. Importing JSON data into a different version of Spine is a little more forgiving than binary, but that is not officially supported.
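For example, in spine-libgdx the two loaders are interchangeable (file names are placeholders):

```java
TextureAtlas atlas = new TextureAtlas(Gdx.files.internal("hero.atlas"));

// Binary: smaller file, faster parsing.
SkeletonBinary binary = new SkeletonBinary(atlas);
SkeletonData data = binary.readSkeletonData(Gdx.files.internal("hero.skel"));

// JSON produces the same SkeletonData, just slower to parse and bigger on disk:
// SkeletonJson json = new SkeletonJson(atlas);
// SkeletonData data = json.readSkeletonData(Gdx.files.internal("hero.json"));

Skeleton skeleton = new Skeleton(data);
```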

OMG Nate Thanks!!!

You wrote the script for the video 😃 I just need to record myself reading it out loud 😃

Can't wait to digest everything. Thanks again!

Haha, making all that dry stuff understandable, interesting, and with visuals is your challenge! What are all the important pieces and how does setup in Spine affect them? Good luck!

If you want some tips for explaining the whole process:

It starts by applying an animation. Read through the code, starting with Animation.apply(). See all the implementations of Timeline.apply() for how much work is done. Mostly it consists of finding the key for the current time, then adjusting the skeleton for that key and the key before it.

The next big part starts in Skeleton.updateWorldTransform(). Again read through the code, it's easy to follow. See all the implementations of Updateable.update(); those are in Bone, IkConstraint, TransformConstraint, and PathConstraint.

The last big part is rendering. For that, look for example at spine-libgdx's SkeletonRenderer.draw() (any of the overloads). For regions and meshes, they are asked to compute world vertices, then the geometry (vertices, triangles, colors) is assembled and sent to the GPU. It'll be harder to follow the GPU part, since it happens in libgdx. Also that part differs by game toolkit.
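Put together, the per-frame flow in spine-libgdx looks roughly like this (the objects are created once at load time, and newer runtime versions pass a Physics argument to updateWorldTransform):

```java
// Inside render():
state.update(Gdx.graphics.getDeltaTime()); // advance the animation time
state.apply(skeleton);                     // 1) timelines pose the bones and slots
skeleton.updateWorldTransform();           // 2) bones and constraints: local -> world

batch.begin();
renderer.draw(batch, skeleton);            // 3) world vertices computed, geometry sent to GPU
batch.end();
```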

The most important approaches to not wasting performance excessively are those in the Spine user guide. Vertex and bone transforms, using prune, how to use clipping reasonably, etc. Most people make skeletons that are more detailed than necessary, likely because they can zoom in using Spine and make the details perfect. People should keep sight of how their assets will be used. If it will be low size/resolution on a phone, probably you don't need to model each finger and strand of hair individually. Don't use more bones and vertices than necessary for the movements needed. It can be surprising how much you can do with just weights, without needing very many bones. Don't use clipping if you can just mask things by putting another image on top. Basically just keep things simple until you actually need them more detailed.

You could talk about how you can use different size images in the editor and at runtime. A lot of people want to "work in HD" and have massive 8000x8000 skeletons in Spine with the bone scale cranked up, but there's no real advantage over working with a reasonable size. You can even pack multiple atlases in your game and load an atlas based on the user's screen size, so the images you use are close to the user's native screen resolution. Only users with high res screens get a high res atlas. I guess this isn't really about runtime performance, unless they are using images that are too large at runtime, which would use up more memory than necessary. Images that are way too large, like more than 2x the size they are drawn, usually look really bad.

YES! The tips keep coming.

I will do my best to produce a video we can all be proud of.

I do remember digging through that libGDX code 10 years ago (omg, is Spine that old?).

We had over 200 soldiers showing at the same time, mostly running. To optimize, we exported a sequence of the body without the arms, imported it back into Spine, and just added 2 bones for the arms for smooth interpolation, so the characters looked smooth and were very lightweight.

We were trying to reuse the skeleton data, but that would force them all to synchronize their animations.


warmanw wrote

Skel and JSON differ in size. Is there a difference when code parses them? I know that JSON is much easier.

I actually wanted to write "easy to read and import into different versions, or just modify" 😃 I remember you explained this here on the forum once.

3 years later

@Nate, could you expand a bit on this topic?

I had a character that needed a bright glowing outline. Initially, I achieved this by spreading a single blob texture along its hull, using additive blending to create a smooth glow that moved with the body parts. However, the client requested a reduction in attachment count, so I adapted by creating separate glows for each body part—cutting the attachments from 30 down to 7.

The trade-off, though, was that this approach required 7 unique glow textures instead of just one reusable blob texture.

So, my question is: Which is better for performance?

40 attachments with semi-transparent/alpha pixels and additive blending, all sharing a single small texture

7 attachments with additive blending, each using larger, unique textures

And do the transparent pixels affect the answer? Or, say, what if the attachments are never meant to overlap?
Would love to hear your thoughts!

Fill rate is how many pixels you draw. There is a limit, but it's usually quite high on most hardware. All pixels you draw affect fill rate -- even fully transparent pixels, even attachments made fully transparent by setting the slot color alpha to 0 (unless the runtime special cases skipping those). If you are not drawing pixels, e.g. because they are outside the mesh hull, then they do not affect fill rate. The metric for fill rate is Area (sq px).

Fewer attachments means a little less CPU processing and less geometry sent to the GPU. This isn't usually an issue for 2D. GPUs can handle 3D, which sends a LOT more geometry than 2D. The metric is Vertex transforms. Make sure you prune!

Large, unique textures eat up atlas space. Often that is at a premium and worth some minor trade off to have more attachments. The metric is mostly "does everything fit in 1 atlas page". It matters less if you are packing at runtime or using other schemes.

Why did they want fewer attachments? What are they trying to improve? Any time you are making optimizations, they should be for a good reason and you should be able to measure the difference. If you can't, you are 100% wasting your time. Optimizations take time to perform, but worse, they can make projects more complex, which means more time is wasted in the future working on the more complex project. Too much of that can really bog down productivity.

That said, there's a certain amount of "voodoo" that is OK to apply as a general rule of thumb, even without having specific performance issues to solve. Those are things like: avoid clipping and deform keys, use the fewest vertices necessary, always prune, etc.

This answers my question. So it's more important to take good care of the atlas than to worry about a few dozen additional attachments that use a single texture.

Their argument was that when many transparent pixels overlap each other, it creates a GPU bottleneck, and additive blending was not making it easier for them.

Rendering pixels without blending just replaces whatever color is there. That is typically done to clear the screen and draw a background, then all other rendering is done with blending.

Rendering a pixel with blending is done by reading the current pixel value, then writing the new pixel value mixed with the current value. There is no performance difference from many transparent pixels that overlap versus drawing the same number of pixels elsewhere. Drawing a pixel with blending always has the same cost, so what matters is how many pixels you draw (ie the fill rate or area sq px).

When mixing the new pixel value with the current value, for typical rendering (where you haven't customized with shaders) the mixing function can be "normal" or "additive" or others. There is no performance difference between the mixing functions when rendering with PMA. If using "additive" and not using PMA, then every switch between normal/additive breaks batching (causes a draw call and a flushing of the render pipeline). You should always use PMA because the blending is more accurate, but especially if you use additive.
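To illustrate why PMA makes the mixing functions equally cheap, here is the blend math only (not any particular GPU or runtime implementation). With premultiplied alpha, a source alpha of 0 turns the normal equation into pure addition, which is the trick some runtimes use to render additive slots without switching blend functions:

```java
// Per color channel; src is already multiplied by its alpha (premultiplied alpha).
static float blendPMA (float src, float dst, float srcAlpha) {
	return src + dst * (1 - srcAlpha);
}

// blendPMA(src, dst, a) -> normal blending
// blendPMA(src, dst, 0) -> src + dst, i.e. additive, with the same blend function
```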