Performance considerations

Performance is a key area to pay attention to when porting the application to different devices. Performance is often a trade-off, mostly of memory for CPU or vice versa. The easiest way to improve performance is to simply reduce functionality. Since LWUIT has pluggable theming, you can substitute a simple theme without changing code. This makes it easier to see whether the problem is in the UI itself.

UI responsiveness is often a user perception rather than a real performance issue. Slow performance can occur, but a developer's opinion of performance may not match an end-user's perception.

The following sections discuss the specifics of memory and responsiveness. Keep in mind that performance and memory use on an emulator are no indication of performance and memory overhead on a device.

Memory fundamentals

Memory is problematic, especially when programming for small devices. When using LWUIT you must understand how memory directly relates to resolution and bit depth.

Assume you have two devices: a 16-bit color device (65,536 colors) with 128x128 resolution and 2 megabytes of memory, and a 24-bit color device (16.7 million colors) with 320x240 resolution and 3 megabytes of memory. Which device provides more memory for a LWUIT application? The answer is not so simple.

Assume both devices have a background image set and scaled, so they need enough RAM to hold the uncompressed image in memory.

The smaller device needs 32,768 bytes just for a background buffer of the screen. The larger device requires 307,200 bytes for the same buffer!

Because screen buffers are needed for the current form, for the current transition (twice), and for the MIDP implementation, the amount of memory the larger device consumes is surprisingly high. How did we reach these numbers?

The simple formula is:

screen width * screen height * bytes per pixel = memory

Therefore:

16 bit: 128 * 128 * 2 = 32,768

24 bit: 320 * 240 * 4 = 307,200

Notice that on the 24-bit device each pixel is counted as a 4-byte integer, because there is no 24-bit primitive and implementations treat 24-bit color as 32-bit color.

Now return to the two devices. In the worst-case scenario four buffers are consumed immediately, and the remaining RAM compares as follows:

16 bit: 2,097,152 – 32,768 * 4 = 1,966,080

24 bit: 3,145,728 – 307,200 * 4 = 1,916,928

It turns out the 24-bit device has more RAM to begin with, but does not have as much RAM to work with.

Note that these calculations do not take into account the additional memory overhead required for LWUIT and your application.
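The arithmetic above can be reproduced with a few lines of Java. This is only a sketch: the class and method names are hypothetical helpers, not part of LWUIT, and the count of four buffers is the worst case assumed in the text.

```java
// Hypothetical helper, not part of LWUIT: reproduces the screen buffer arithmetic.
final class ScreenBufferMath {

    /** Bytes used to store one pixel; 24-bit color is held in a 32-bit integer. */
    static int bytesPerPixel(int bitDepth) {
        if (bitDepth <= 8) {
            return 1;
        }
        return bitDepth <= 16 ? 2 : 4;
    }

    /** screen width * screen height * bytes per pixel */
    static long bufferBytes(int width, int height, int bitDepth) {
        return (long) width * height * bytesPerPixel(bitDepth);
    }

    /** RAM left after the given number of full-screen buffers is allocated. */
    static long remainingBytes(long totalRam, int width, int height,
                               int bitDepth, int buffers) {
        return totalRam - buffers * bufferBytes(width, height, bitDepth);
    }

    public static void main(String[] args) {
        // Worst case assumed in the text: four buffers on each device.
        long small = remainingBytes(2L * 1024 * 1024, 128, 128, 16, 4);
        long large = remainingBytes(3L * 1024 * 1024, 320, 240, 24, 4);
        System.out.println("16 bit: " + small + " bytes left");
        System.out.println("24 bit: " + large + " bytes left");
    }
}
```

Running the same calculation with the resolution, bit depth, and heap size of each target device gives a quick first estimate before testing on hardware.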

When to use EncodedImage

On Series 40 devices, you can use the EncodedImage class for large images. It keeps only a weak reference to the actual platform Image object, so the garbage collector can remove the decoded image from memory when it is no longer needed. The only thing EncodedImage always keeps in memory is the compressed byte data of the image.

Note that you should not use EncodedImage by default for all images. Holding many images through weak references makes the garbage collector do more work, and on weaker devices this can cause the application to freeze. The default image class returned by the resource system is the Image class. Garbage collection is a heavy operation, so avoid triggering it more often than necessary.
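The weak reference idea can be illustrated with plain Java. The sketch below is a hypothetical stand-in and not the LWUIT EncodedImage class: the class name, the decoder function, and the decode counter are assumptions made for illustration, and it uses Java SE generics and lambdas that are not available on Java ME.

```java
import java.lang.ref.WeakReference;
import java.util.function.Function;

// Hypothetical stand-in for the idea behind EncodedImage; not the LWUIT class.
final class WeaklyDecoded<T> {
    private final byte[] encoded;               // compressed data, always kept
    private final Function<byte[], T> decoder;  // assumed decoder supplied by the caller
    private WeakReference<T> decoded;           // decoded form, may be collected
    private int decodeCount;

    WeaklyDecoded(byte[] encoded, Function<byte[], T> decoder) {
        this.encoded = encoded;
        this.decoder = decoder;
    }

    /** Returns the decoded form, decoding again only if it was collected. */
    synchronized T get() {
        T value = (decoded == null) ? null : decoded.get();
        if (value == null) {
            value = decoder.apply(encoded);
            decoded = new WeakReference<T>(value);
            decodeCount++;
        }
        return value;
    }

    /** Number of times the compressed data had to be decoded. */
    synchronized int decodeCount() {
        return decodeCount;
    }
}
```

Each time the garbage collector clears the reference, the next call to get() pays the decoding cost again, which is why this pattern suits a few large images better than many small ones.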

Avoid excessive use of indexed images

Indexed images carry a performance overhead. It should not be excessive, but when using many animations or indexed images you can expect a slower repaint cycle, especially on devices without a JIT or fast CPU.

Use image buffering

Image buffering is an effective way to increase speed. It comes at the cost of increased memory usage, so use this technique with care. The RLinks example application uses buffering to speed up list scrolling. The basic idea is to draw a complex component to an image and repaint that image only when the content of the list item changes. This way LWUIT only has to draw the same image, instead of looping through each list item and the whole component hierarchy and calling the paint method of each component.

Image buffering is usually very application specific. Use it only when enough memory is available and all other optimisations have been applied. When buffering images, use partial weak reference caching to prevent an OutOfMemoryError. The RLinks example is a good source of information on this as well.
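The basic idea of buffering can be sketched in plain Java as follows. This is a simplified illustration and not the RLinks implementation: the class, the ItemRenderer interface, and the use of an int array as the pixel buffer are assumptions. For brevity the sketch holds every buffer through a strong reference and omits the weak reference caching that a real implementation would need.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names are illustrative and not taken from LWUIT or RLinks.
final class ItemBufferCache {

    /** Stands in for the expensive full paint of one list item. */
    interface ItemRenderer {
        int[] render(String content);
    }

    private static final class Entry {
        final String content;
        final int[] pixels;

        Entry(String content, int[] pixels) {
            this.content = content;
            this.pixels = pixels;
        }
    }

    private final Map<Integer, Entry> buffers = new HashMap<Integer, Entry>();
    private final ItemRenderer renderer;
    private int renderCount;

    ItemBufferCache(ItemRenderer renderer) {
        this.renderer = renderer;
    }

    /** Returns the buffered pixels, re-rendering only when the content changed. */
    int[] bufferFor(int index, String content) {
        Entry entry = buffers.get(index);
        if (entry == null || !entry.content.equals(content)) {
            entry = new Entry(content, renderer.render(content));
            buffers.put(index, entry);
            renderCount++;
        }
        return entry.pixels;
    }

    /** Number of times an item was actually rendered. */
    int renderCount() {
        return renderCount;
    }
}
```

While a list scrolls, the same index and content are requested repeatedly, so the expensive render runs once per change of content rather than once per repaint.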

Do not block Event Dispatch Thread (EDT)

Performance often suffers because of slow paints. This typically happens when code runs on the EDT for a long time without releasing it. It is important not to “hold” the EDT, but to release it immediately when performing long-running tasks. For further details on releasing the EDT, see the Display methods callSerially, callSeriallyAndWait, and invokeAndBlock.
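The pattern of releasing the EDT can be illustrated without LWUIT. In the sketch below a single-threaded executor plays the role of the EDT; this is an assumption made so that the example runs on a desktop JVM, and all class and method names are hypothetical. In a LWUIT application the hand-back step would presumably go through Display.getInstance().callSerially(...) rather than through an executor.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Stand-in sketch: a single-threaded executor plays the role of the EDT.
final class EdtOffloadSketch {

    static String run() {
        final ExecutorService edt =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "edt-stand-in"));
        final String[] shown = new String[1];
        final CountDownLatch done = new CountDownLatch(1);

        // The "event handler" runs on the EDT stand-in. It only starts a worker
        // thread and returns, so the EDT is free to keep painting.
        edt.execute(() -> {
            Thread worker = new Thread(() -> {
                String result = slowTask();          // long-running work, off the EDT
                edt.execute(() -> {                  // hand the UI update back to the EDT
                    shown[0] = result + " shown on " + Thread.currentThread().getName();
                    done.countDown();
                });
            }, "worker");
            worker.start();
        });

        try {
            boolean finished = done.await(5, TimeUnit.SECONDS);
            return finished ? shown[0] : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            edt.shutdown();
        }
    }

    /** Simulates slow work such as networking or parsing. */
    static String slowTask() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "result";
    }
}
```

The event handler returns almost immediately, and only the short final update runs on the thread that paints the UI.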

The EDT might be blocked due to unrelated work on a different thread. Bad thread scheduling on devices causes this problem, in part because many hardware devices ignore thread priorities.

On some devices networking can cause a visible stall in the UI, a problem for which there is no “real” solution. The workaround for such cases is logical rather than technical. In this case a standard progress indicator stalls during a networking operation. It might work better to use a progress indicator heuristic that moves slower or does not move at all, so the user is less likely to notice the interruption in the display.

Be careful with transitions

Different transition types have different performance overheads on devices. It is recommended not to use transitions on cost-optimised phones. Developers should carefully test whether transitions work smoothly on all target devices and disable transitions if necessary.

Use Light mode on low-memory devices

Light mode often trades speed for lower memory overhead. If a device has plenty of memory but performs poorly, explicitly turning off light mode (after Display.init()) might improve speed.

Avoid deep component hierarchies in UI layouts

When you implement the UI layout, try to avoid deep component hierarchies. For example, a list in which each item is a container with three components is considerably slower to draw than a custom component with the same look and feel and behaviour. Creating custom components always requires more work, but the performance improvement over a deep component hierarchy can be very significant.