I wrote a video recording demo similar to Grafika's ContinuousCaptureActivity (source code: ContinuousCaptureActivity.java).
The difference is that Grafika uses hardware encoding while I use software encoding. For software encoding, I read each video frame back from the GPU through a PBO, which is very fast, copy the image data to FFmpeg, and then do the H.264 encoding.
Performance is acceptable on most devices: glMapBufferRange() takes less than 5 ms and memcpy() takes less than 10 ms.
On the Huawei Mate 7, however, performance is poor: glMapBufferRange() takes 15~30 ms and memcpy() takes 25~35 ms.
I also tested a plain memcpy() on the Mate 7, and it is much faster when copying ordinary memory.
This is really strange. Can anyone help?
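For anyone who wants to reproduce the ordinary-memory baseline mentioned above, here is a rough Java-side sketch. It is not the native memcpy() test I ran; the class name CopyBaseline and the method timeCopyNanos are made up for illustration, and it only times a bulk copy between two direct ByteBuffers of one RGBA frame.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: times a bulk copy between two ordinary (non-GL) buffers. */
final class CopyBaseline {
    /**
     * Copies all of src into dst and returns the elapsed time in nanoseconds.
     * dst must be at least as large as src, otherwise put() throws BufferOverflowException.
     */
    static long timeCopyNanos(ByteBuffer src, ByteBuffer dst) {
        src.clear();   // position = 0, limit = capacity (contents are untouched)
        dst.clear();
        long t0 = System.nanoTime();
        dst.put(src);  // bulk transfer of src.remaining() bytes
        return System.nanoTime() - t0;
    }
}
```

Calling it with two ByteBuffer.allocateDirect(width * height * 4) buffers gives a number to compare against the time spent copying out of the mapped PBO. A single run is noisy, so it is better to average several runs.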
Device info:
Chipset: HiSilicon Kirin 925
CPU: quad-core 1.8 GHz Cortex-A15 and quad-core 1.3 GHz Cortex-A7
See details here: Huawei Mate 7
The PBO code is as follows:
    final int buffer_num = 1;
    final int pbo_id[] = new int[buffer_num];

    private void getPixelFromPBO(int width, int height, boolean isDefaultFb) {
        try {
            long start = System.currentTimeMillis();
            final int pbo_size = width * height * 4;
            if (mFrameNum == 0) {
                GLES30.glGenBuffers(buffer_num, pbo_id, 0);
                Log.d(TAG, "glGenBuffers pbo_id[0]:" + pbo_id[0]);
                GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, pbo_id[0]);
                // glBufferData creates a new data store for the buffer object currently bound to target
                GLES30.glBufferData(GLES30.GL_PIXEL_PACK_BUFFER, pbo_size, null, GLES30.GL_DYNAMIC_READ);
                GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);
            }
            GLES30.glPixelStorei(GLES30.GL_PACK_ALIGNMENT, 1);
            checkGlError("glPixelStorei");
            // we need to read GL_BACK when the default framebuffer is bound
            // glReadBuffer specifies a color buffer as the source for subsequent glReadPixels,
            // glCopyTexImage2D, glCopyTexSubImage2D, and glCopyTexSubImage3D commands
            if (isDefaultFb) {
                GLES30.glReadBuffer(GLES30.GL_BACK);
            } else {
                GLES30.glReadBuffer(GLES30.GL_COLOR_ATTACHMENT0);
            }
            checkGlError("glReadBuffer");
            GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, pbo_id[0]);
            checkGlError("glBindBuffer 1 ");
            long ts = System.currentTimeMillis();
            glReadPixelsPBOJNI(0, 0, width, height, GLES30.GL_RGBA, GLES30.GL_UNSIGNED_BYTE, 0);
            Log.d(TAG, "glReadPixelsPBOJNI took " + (System.currentTimeMillis() - ts) + "ms\n\n\n");
            //GLES30.glReadPixels(0, 0, width, height, GLES30.GL_RGBA, GLES30.GL_UNSIGNED_BYTE, null);
            //glReadPixelsPBOJNI(0, 0, height, width, GLES30.GL_RGBA, GLES30.GL_UNSIGNED_BYTE, 0);
            checkGlError("glReadPixels");
            ts = System.currentTimeMillis();
            // map the same PBO that glReadPixels has just been asked to fill
            ByteBuffer buf = (ByteBuffer) GLES30.glMapBufferRange(
                    GLES30.GL_PIXEL_PACK_BUFFER, 0, pbo_size, GLES30.GL_MAP_READ_BIT);
            checkGlError("glMapBufferRange");
            Log.d(TAG, "*****glMapBufferRange took " + (System.currentTimeMillis() - ts) + "ms");
            ts = System.currentTimeMillis();
            cpoyDataToFFmpeg(buf, 1, 1);
            Log.d(TAG, "####cpoyDataToFFmpeg took " + (System.currentTimeMillis() - ts) + "ms\n\n\n");
            GLES30.glUnmapBuffer(GLES30.GL_PIXEL_PACK_BUFFER);
            checkGlError("glUnmapBuffer");
            GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);
            checkGlError("glBindBuffer 0 ");
        } catch (Exception e) {
            Log.e(TAG, "DO PBO exp", e);
        }
    }
In the end, I realized that I should use two PBOs (double buffering) to improve the data transfer, and that the data alignment also needs care.
glMapBufferRange() blocks until the DMA transfer has finished, so a single PBO cannot improve the transfer significantly.
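A minimal sketch of the double-PBO bookkeeping, assuming two buffers created with glGenBuffers and glBufferData as in the code above. The class PboPingPong and its method names are hypothetical, not part of GLES or Grafika. The GL calls appear only as comments because they need a live context. The idea is that glReadPixels fills one PBO while the PBO filled during the previous frame is mapped, so glMapBufferRange() should not have to wait on a transfer that has only just been queued.

```java
/** Hypothetical helper: tracks which of two PBOs to read into and which to map. */
final class PboPingPong {
    /** Returned while no PBO has been filled yet; glGenBuffers never hands out 0. */
    static final int NONE = 0;

    private final int[] pboIds;
    private long frame = 0;

    PboPingPong(int[] pboIds) {
        if (pboIds.length != 2) {
            throw new IllegalArgumentException("expected exactly two PBO ids");
        }
        this.pboIds = pboIds.clone();
    }

    /** PBO that this frame's glReadPixels should write into. */
    int readTarget() {
        return pboIds[(int) (frame % 2)];
    }

    /** PBO filled during the previous frame, or NONE on the very first frame. */
    int mapTarget() {
        return frame == 0 ? NONE : pboIds[(int) ((frame + 1) % 2)];
    }

    /** Call once per rendered frame, after the read and the optional map. */
    void endFrame() {
        frame++;
    }
}

// Per frame, on the GL thread (pp is a PboPingPong built from the two ids):
//   glBindBuffer(GL_PIXEL_PACK_BUFFER, pp.readTarget());
//   glReadPixelsPBOJNI(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0); // queues the transfer
//   int ready = pp.mapTarget();
//   if (ready != PboPingPong.NONE) {
//       glBindBuffer(GL_PIXEL_PACK_BUFFER, ready);
//       buf = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, pbo_size, GL_MAP_READ_BIT);
//       cpoyDataToFFmpeg(buf, 1, 1);
//       glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
//   }
//   glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
//   pp.endFrame();
```

The trade-off is one frame of latency: the encoder receives frame N-1 while frame N is still being transferred, so the last PBO has to be mapped once more when recording stops.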