The secrets of rendering performance in Flash

Flash is an incredibly powerful tool for creating amazing experiences. However, those experiences are not free. Flash actually has to draw things as they change, and this can lead to performance issues if you accidentally overload the renderer.

In this text I aim to explain which tricks Flash uses when rendering and how to work together with Flash to keep rendering fast.

Subpixels and the rendering quality

The single most obvious thing to do about rendering performance problems is to turn down the rendering quality. But what does it do? It affects how subpixel rendering is performed. Subpixel rendering is used when things do not align perfectly to a pixel. For example, curves have plenty of places where they don’t perfectly align with the pixels.

Subpixel rendering is not a very difficult algorithm to understand. For each pixel that needs subpixel rendering, the player renders a number of subpixels just like in normal rendering (except that the subpixels themselves are not subdivided further) and then uses the average as the color for the pixel.

The quality setting controls how many subpixels are rendered. High is a 4x4 grid, while medium is a 2x2 grid. Low is effectively a 1x1 grid, turning off subpixel rendering.

If you do the math, you will see that high quality renders 16 subpixels where low quality renders one, making it up to 16 times more expensive. Do keep in mind that this extra cost is only paid for pixels where subpixel rendering is actually needed, so the effective cost will be far less in practice.
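To make the averaging concrete, here is a minimal sketch in Python. It is not Flash's actual rasterizer. It assumes a hypothetical shape edge that covers the left 30% of a single pixel, and it assumes that each subpixel is sampled at its center. The names pixel_coverage and left_of_edge are made up for this illustration.

```python
def pixel_coverage(inside, grid):
    """Average a grid x grid set of subpixel samples over one pixel.

    The pixel spans [0, 1) x [0, 1); `inside(x, y)` says whether a
    sample point is covered by the shape.
    """
    hits = 0
    for row in range(grid):
        for col in range(grid):
            # Sample at the center of each subpixel (an assumption).
            x = (col + 0.5) / grid
            y = (row + 0.5) / grid
            if inside(x, y):
                hits += 1
    return hits / (grid * grid)


def left_of_edge(x, y):
    # Hypothetical shape: covers the left 30% of the pixel.
    return x < 0.3


low = pixel_coverage(left_of_edge, 1)     # 1 sample, like low quality
medium = pixel_coverage(left_of_edge, 2)  # 4 samples, like medium quality
high = pixel_coverage(left_of_edge, 4)    # 16 samples, like high quality
```

With the true coverage at 0.3, the 1x1 grid yields 0.0, the 2x2 grid 0.5 and the 4x4 grid 0.25. More samples cost more work but land closer to the real value.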

Filters, cacheAsBitmap and other buffered operations

Flash uses an off-screen buffer for some operations. Flash renders the contents to the buffer normally, performs the operation on the buffer's pixels, and finally draws the result to the main rendering buffer.

Worth noting is that Flash will never split a buffer into multiple ones. This means that even empty space is included in the buffer. This empty space costs just as much as used space when it comes to performing the operation on the buffer afterwards.
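The cost of empty space is easy to estimate with a bit of arithmetic. The sketch below assumes that the buffer is simply the bounding box of everything inside the buffered container. The sprite sizes and positions are hypothetical, and bounding_box is a helper made up for this example.

```python
def bounding_box(rects):
    """Smallest axis-aligned rectangle containing every (x, y, w, h) rectangle."""
    left = min(x for x, y, w, h in rects)
    top = min(y for x, y, w, h in rects)
    right = max(x + w for x, y, w, h in rects)
    bottom = max(y + h for x, y, w, h in rects)
    return (left, top, right - left, bottom - top)


# Two hypothetical 50x50 sprites in opposite corners of one filtered container.
sprites = [(0, 0, 50, 50), (450, 350, 50, 50)]

bx, by, bw, bh = bounding_box(sprites)
buffer_pixels = bw * bh                                 # pixels the operation must process
used_pixels = sum(w * h for x, y, w, h in sprites)      # pixels that actually hold content
```

Here the buffer is 500x400, or 200000 pixels, while only 5000 pixels hold any content. Putting each sprite in its own buffered container would avoid processing the empty middle.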

Now, Flash isn't stupid: it will cache this buffer and only redraw it as needed. However, depending on the content, that might be every single frame! Flash would have had to redraw the contents even without the buffer, but with the buffer you are also paying for the increased complexity and, most importantly, for the buffer itself.

At the same time, if your buffered content doesn't change, Flash can skip drawing to the buffer and just draw the buffer to the main rendering area. This can be much faster if your buffered content is expensive to draw. Flash can actually reuse the buffer even if the buffered content has moved. It just has to draw the buffer at a different spot on the screen and can skip redrawing the buffer itself.
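The caching behavior described above can be modeled in a few lines of Python. This is only a sketch of the idea, not how the player is implemented: CachedSprite, invalidate and the redraws counter are all hypothetical names.

```python
class CachedSprite:
    """Toy model of a buffered display object."""

    def __init__(self, draw):
        self._draw = draw      # expensive function that renders the content
        self._buffer = None    # cached pixels; None means "needs a redraw"
        self.x = 0
        self.y = 0
        self.redraws = 0       # how many times the expensive draw ran

    def invalidate(self):
        """Call when the content itself changes."""
        self._buffer = None

    def render(self):
        """Return (x, y, pixels), redrawing the buffer only when it is stale."""
        if self._buffer is None:
            self._buffer = self._draw()
            self.redraws += 1
        return (self.x, self.y, self._buffer)


sprite = CachedSprite(lambda: "pixels")
sprite.render()           # first frame: the buffer is drawn
sprite.x = 25             # moving the sprite does not touch the buffer
moved = sprite.render()   # the cached buffer is reused at the new position
redraws_after_move = sprite.redraws
sprite.invalidate()       # the content changed
sprite.render()           # the buffer has to be redrawn
```

Moving the sprite costs no redraw, while changing its content does. Content that changes every frame invalidates the buffer every frame, which is exactly the case where buffering only adds cost.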

Now, you can control when Flash will create buffers. You can force Flash to use one simply by setting the cacheAsBitmap property. But Flash also needs buffering in order to apply filters, so when you add a filter to something, Flash will create a buffer for you.

With AIR, Flash gains a new ability for these buffers: it no longer has to redraw them just because the container had its transform changed. Instead, Flash will only redraw the buffer if the actual content changes. You simply specify the transform that Flash should use when rendering the content (through the cacheAsBitmapMatrix property), and Flash will always use that transform when it draws to the buffer. Flash will then draw the buffered pixels with the object's own transform applied. This is hopefully faster.

Per pixel rendering

Flash supports a lot of different fills for the content. But Flash has an optimization for the most common fill of them all: the solid color fill. Instead of drawing each pixel separately, Flash will instead use bulk memory writing instructions to fill the space quickly.

However, this has an obvious problem. The drawn pixels are all identical. This means that the optimization can’t be used when they are not identical. The most obvious case of this is gradient fills. But it is far from the only fill type to use per pixel rendering. Bitmap fills also change per pixel. But there is an optimization for bitmaps…
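The difference between the two paths can be sketched in Python. This is an illustration of the idea only, assuming 4 bytes per pixel and a simple linear gradient. The function names fill_solid and fill_gradient are made up for this example.

```python
def fill_solid(width, color):
    """Bulk path: repeat one 4-byte pixel across the whole scanline at once."""
    return bytes(color) * width


def fill_gradient(width, start, end):
    """Per pixel path: every pixel needs its own interpolated color."""
    steps = max(width - 1, 1)  # avoid dividing by zero for a 1 pixel scanline
    row = bytearray()
    for i in range(width):
        # Integer linear interpolation for each of the 4 channels.
        row.extend(s + (e - s) * i // steps for s, e in zip(start, end))
    return bytes(row)


red_line = fill_solid(3, (255, 0, 0, 255))
gray_ramp = fill_gradient(3, (0, 0, 0, 255), (255, 255, 255, 255))
```

The solid fill is a single bulk operation regardless of the color, while the gradient has to run its loop body once per pixel.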

The fast bitmap blitting

Flash can draw bitmaps as well as vector graphics, and it is actually pretty smart when it comes to bitmap rendering. It has an optimized bitmap renderer that simply copies the pixels to the screen in bulk. As this is a bulk transfer, it can use highly efficient bulk memory copy instructions to perform the copy.

However, this design has a notable issue. It cannot transform the bitmap. It is unable to rotate it or even resize it. Flash has an alternative rendering algorithm that can do these things. However, that algorithm is much more expensive since Flash has to calculate each pixel with some math. Turning smoothing on is even more expensive.

The key here is to avoid the expensive algorithm. In order to do so, you need to keep the bitmap aligned to a whole pixel position. In addition, you can neither scale the bitmap nor rotate it.
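As a rough checklist, the conditions above can be written as a small Python function. The parameter names a, b, c, d, tx and ty mirror the fields of a 2D affine matrix. The exact test that the player performs is not documented here, so treat can_fast_blit and snap_to_pixel as hypothetical helpers under that assumption.

```python
def can_fast_blit(a, b, c, d, tx, ty):
    """True if the transform is a pure translation by whole pixels."""
    # An identity 2x2 part means no scaling, rotation or skewing.
    no_scale_or_rotation = (a, b, c, d) == (1.0, 0.0, 0.0, 1.0)
    # The translation has to land exactly on the pixel grid.
    whole_pixel = float(tx).is_integer() and float(ty).is_integer()
    return no_scale_or_rotation and whole_pixel


def snap_to_pixel(tx, ty):
    """Round a position to the nearest whole pixel."""
    return (round(tx), round(ty))


aligned = can_fast_blit(1, 0, 0, 1, 10, 20)       # True
half_pixel = can_fast_blit(1, 0, 0, 1, 10.5, 20)  # False: not on a whole pixel
scaled = can_fast_blit(2, 0, 0, 2, 10, 20)        # False: scaled to 200%
snapped = snap_to_pixel(10.4, 19.6)               # (10, 20)
```

In practice this means rounding the position of a bitmap before assigning it whenever it is computed from fractional values, such as in a tween or a physics step.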

Optimizing paths

Flash uses curves to define its graphics, and these curves are somewhat expensive to render. Flash normally has no issue with this. However, art that is imported or even hand drawn can contain a lot of curves. Flash will deal with them just fine, but it will take a lot longer.

The trick here is to simplify the shape so that it looks nearly the same while using far fewer curves. As luck would have it, the Flash developers have thought of this and provided a handy tool for it in Flash.

By selecting the shape and using the Modify->Shape->Optimize menu option, you can easily remove extraneous curves. Do use caution, since it is removing data.

The lazy rendering

Flash is lazy: it will avoid redrawing the full screen as much as possible. To do that, Flash keeps track of which parts of the screen need to change. Flash keeps it simple by only tracking rectangles where it needs to redraw the screen.

You can easily see what these rectangles are by turning on the “Show Redraw Regions” option in the context menu of a content debugger player.

As it is much cheaper to only redraw a few changed areas than the full screen, try to work with Flash here. Simply don’t change stuff for no good reason. Of course, this can be a fairly hard rule to follow, since it can greatly restrict your artistic freedom. I am not saying that you can’t change stuff. I am saying that you should take care not to accidentally move stuff that you didn't need to move.

One very costly thing is the common tool of a virtual camera. It has the power to move everything on screen to fit in the virtual camera area. Naturally, doing this means that the full screen has to be redrawn. There isn't much that you can do about this except to avoid moving the camera. I realize that it is pointless to not have a movable camera; the thing is not to avoid all camera movements, but to avoid the expensive ones.
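The difference in cost is easy to put into numbers. The Python sketch below assumes that a moved object dirties one rectangle covering both its old and its new bounds. How the player actually merges its rectangles is not covered here, and the stage and sprite sizes are hypothetical.

```python
def redraw_rect(old, new):
    """Rectangle covering an object's old and new bounds, as (x, y, w, h)."""
    left = min(old[0], new[0])
    top = min(old[1], new[1])
    right = max(old[0] + old[2], new[0] + new[2])
    bottom = max(old[1] + old[3], new[1] + new[3])
    return (left, top, right - left, bottom - top)


def area(rect):
    return rect[2] * rect[3]


STAGE = (0, 0, 800, 600)  # hypothetical stage size

# A 40x40 sprite moving 10 pixels to the right dirties a 50x40 rectangle.
sprite_move = redraw_rect((100, 100, 40, 40), (110, 100, 40, 40))

# A camera pan moves everything, so the whole stage is dirty.
camera_pan = STAGE

cost_ratio = area(camera_pan) // area(sprite_move)
```

In this example the camera pan has to redraw 240 times as many pixels as the single moving sprite.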

The presentation modes

The final thing that Flash does after drawing the screen is to actually put it on the visible screen. Flash has a variety of ways of doing this.

  • A software device handle of its own. The classical no fuss way for an application to put graphics on the screen. It will simply draw independently of the embedding application and let the system take care of putting the drawn content from each part together.
  • A shared software device handle. Flash shares the drawing system with the embedding application, allowing both of them to put content in the area dedicated to the Flash player. It has a high performance cost because of the additional overhead of two programs sharing the same device handle and working together. Notably, it does not support Flash redrawing only some areas, since the embedding application may have drawn on any part of the shared drawing.
  • A hardware device handle of its own. The more advanced and faster way of drawing to the screen, allowing the hardware to use optimized communication protocols and so on.
  • GPU compositing. The player will not just use hardware acceleration, but will outsource the final rendering step to the hardware. This takes some of the load off the CPU, but it has costs of its own, since more than just the final rendered picture needs to be sent to the GPU.

Letting the GPU scale the picture

Flash has a trick for speeding up rendering in fullscreen mode: it can outsource the scaling to the GPU. Flash does this by rendering to a smaller buffer and then letting the GPU scale up the picture to fill the screen. It is mainly designed for video, but can be used for all kinds of fullscreen applications. However, there is one notable downside. The rendered picture will not have as much detail as if Flash had rendered at the real screen resolution, so the content will appear pixelated and less sharp. Video is normally fine with this, as the result would have been the same if Flash had done the scaling itself.
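How much software rendering this saves is simple arithmetic, sketched here in Python. The buffer and screen sizes are hypothetical, and the sketch assumes that rendering cost scales with the number of pixels, which is only roughly true. The function name pixel_savings is made up for this example.

```python
def pixel_savings(render_size, screen_size):
    """How many times fewer pixels the software renderer has to draw."""
    render_w, render_h = render_size
    screen_w, screen_h = screen_size
    return (screen_w * screen_h) / (render_w * render_h)


# Render at 640x360 and let the GPU scale up to a 1920x1080 screen.
savings = pixel_savings((640, 360), (1920, 1080))
```

In this example the software renderer draws 9 times fewer pixels, and each rendered pixel ends up covering a 3x3 block on screen, which is where the loss of sharpness comes from.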

Flash Player 10.2 introduces StageVideo, which adds the ability to scale only the video on the GPU.

Rendering vs. the others

Rendering is not the only thing that the Flash player does. It also executes ActionScript, decodes audio, decodes video, handles webcams, mixes audio and runs the incremental garbage collector. All of these parts have to compete with each other for CPU time. With some luck, everything gets done with CPU time to spare.

While the most obvious time thieves are the rendering and ActionScript execution, one should not underestimate the rest. Decoding audio and video is not a cheap task. It is a lot of data and it involves fancy math. Web cameras are also reportedly heavy offenders.

Summary

The Flash player uses as many performance tricks as the developers can think of, and they can think of quite a lot of them, but not all tricks work for all cases. You should adapt your content so that more tricks can be used in your case.