Why is SurfaceFlinger still using 5ms of CPU time per frame with Hardware Composer?


I'm trying to run a challenging, latency-sensitive application at 60 fps on as many Android devices as possible. It involves processing live frames from the camera (ideally also at 60 fps) and rendering additional graphics on top with OpenGL ES 2/3.

As a first step, I'm trying to identify and minimise any system-level overhead of the activity. I'm using systrace with a minimal test app that obtains camera frames in a SurfaceTexture and renders them with OpenGL ES 2 to a GLSurfaceView.

I've been investigating with a Samsung Galaxy S8 (the Exynos version, which has a Mali GPU and a 4-big, 4-little CPU setup) running Android 8.0.

When receiving camera frames but not rendering them (e.g. by switching the GLSurfaceView to RENDERMODE_WHEN_DIRTY rather than RENDERMODE_CONTINUOUSLY), CPU usage appears pretty low across the board. There is a small amount of CPU usage per frame, which looks related to queuing and dequeuing buffers for the SurfaceTexture. SurfaceFlinger appears to do nothing when none of the surfaces are updated, as expected.

As soon as I start rendering new frames, things get more interesting. The GLThread in my app only takes ~1.5ms of CPU time, roughly what I'd expect. What is unexpected is the CPU time required in SurfaceFlinger.

Here's a bit of the systrace output, typical for most frames:

[Image: SurfaceFlinger systrace]

Every frame presented goes through 2 SurfaceFlinger operations: a handleMessageInvalidate that invokes an updateTexImage, and then a handleMessageRefresh whose time is primarily spent in doComposition, with the majority of that spent in postFramebuffer.

Note that for the majority of this time, the thread is active on the CPU rather than sleeping. SurfaceFlinger takes roughly a third of the frame time on one core, which is pretty significant if the scheduler decides to use the same core for one of my important threads.
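As a rough sanity check on that "third of the frame" figure, here is a minimal sketch of the arithmetic. The class and method names are just illustrative, and the 5 ms and 1.5 ms inputs are the approximate per-frame CPU times I read off the trace, not precise measurements.

```java
// Back-of-the-envelope frame budget check.
// FrameBudget and its methods are hypothetical names used for illustration only.
class FrameBudget {
    /** Time available per frame, in milliseconds, at the given frame rate. */
    static double frameBudgetMs(double fps) {
        return 1000.0 / fps;
    }

    /** Fraction of one core's per-frame budget consumed by a task. */
    static double shareOfBudget(double taskMs, double fps) {
        return taskMs / frameBudgetMs(fps);
    }

    public static void main(String[] args) {
        double fps = 60.0;
        // Approximate values taken from the systrace described in the text.
        double surfaceFlingerMs = 5.0;
        double glThreadMs = 1.5;

        System.out.println("Frame budget (ms): " + frameBudgetMs(fps));
        System.out.println("SurfaceFlinger share: " + shareOfBudget(surfaceFlingerMs, fps));
        System.out.println("GLThread share: " + shareOfBudget(glThreadMs, fps));
    }
}
```

At 60 fps the budget is about 16.7 ms, so 5 ms works out to about 30% of one core per frame, compared with under 10% for my own GLThread.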

I've read quite a lot of the documentation on SurfaceFlinger internals, including this page discussing the Hardware Composer: https://source.android.com/devices/graphics/arch-sf-hwc.

My understanding of the HWC was that the composition is all done in the display hardware. I was expecting the CPU-side work for that to be minimal: just latching the latest buffers and passing them on to the HWC.

dumpsys SurfaceFlinger indeed shows the HWC is being used for all the layers:

|    type   |  handle    | hint | flag | tr | blnd |   format    |     source crop (l,t,r,b)      |          frame         | name 
|-----------+------------+------+------+----+------+-------------+--------------------------------+------------------------+------
|       HWC | 75cee57f40 | 0000 | 0020 | 00 | 0100 | RGBx_8888   |    0.0,    0.0, 1152.0, 2960.0 |    0,    0, 1152, 2960 | SurfaceView - com.example.tangobravo.camera1test/com.example.tangobravo.camera1test.MainActivity@e16091d@3#0
|       HWC | 75cee59b40 | 0000 | 0000 | 00 | 0105 | RGBA_8888   | 1104.0,    0.0, 1440.0, 2960.0 | 1104,    0, 1440, 2960 | com.example.tangobravo.camera1test/com.example.tangobravo.camera1test.MainActivity#0
|       HWC | 75cee58100 | 0000 | 0000 | 00 | 0105 | RGBA_8888   |    0.0,    0.0,   96.0, 2960.0 | 1344,    0, 1440, 2960 | StatusBar#0
| FB TARGET | 75cee55b60 | 0000 | 0000 | 00 | 0105 | RGBA_8888   |    0.0,    0.0, 1440.0, 2960.0 |    0,    0, 1440, 2960 | HWC_FRAMEBUFFER_TARGET
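To repeat this check across devices without eyeballing the table, something like the following sketch could tally the rows by their "type" column. It assumes the pipe-delimited layout shown above, which I believe varies between Android versions and vendors, so treat the parsing as an assumption rather than a stable format. LayerTypeCounter is a hypothetical helper, not part of any Android API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: counts layers by composition type in a
// `dumpsys SurfaceFlinger` layer table of the shape shown above.
class LayerTypeCounter {
    /** Counts table rows by the first column ("type"), skipping header and separator rows. */
    static Map<String, Integer> countByType(String dump) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : dump.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("|")) continue;      // not a table row
            String[] cells = trimmed.split("\\|");
            if (cells.length < 3) continue;              // dashed separator row has no inner pipes
            String type = cells[1].trim();
            if (type.isEmpty() || type.equals("type") || type.startsWith("-")) continue;
            counts.merge(type, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Abbreviated rows copied from the table above (trailing columns dropped).
        String dump = String.join("\n",
            "|    type   |  handle    | hint | flag |",
            "|-----------+------------+------+------",
            "|       HWC | 75cee57f40 | 0000 | 0020 |",
            "|       HWC | 75cee59b40 | 0000 | 0000 |",
            "|       HWC | 75cee58100 | 0000 | 0000 |",
            "| FB TARGET | 75cee55b60 | 0000 | 0000 |");
        System.out.println(countByType(dump));
    }
}
```

If every app layer shows up under HWC, as it does here, none of them should be falling back to GPU composition; as far as I understand it, any other type on a layer row would be the thing to investigate first.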

So what's going on? Why is the HWC so expensive here? Is there a better (lower overhead) pattern I should be using for the app?

I'd expect a NEON CPU compositor to be able to churn through those layers in 5ms or so, so it doesn't feel like HWC is providing much of a win in terms of CPU usage.
