Time-consuming Problem of Memory Copy Between REE and QSEE


First, here is the test code:

#include <stdint.h>  /* uint32_t, uint64_t */

#define DATA_TYPE float
#define _1KB (1024)

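/* Note: despite its name, this copies elemCnt elements from pSrc to pDst; it does not swap them. */
static inline __attribute__((__always_inline__)) void swap_data_value(DATA_TYPE* pSrc, DATA_TYPE* pDst, uint32_t elemCnt)
{
    /* Use an unsigned index so the comparison with elemCnt is not signed/unsigned. */
    for (uint32_t i = 0; i < elemCnt; ++i) {
        pDst[i] = pSrc[i];
    }
}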

void test_func()
{
    const int DATA_NUM = _1KB * _1KB;
    uint32_t calc_len = 64;
    int loop_cnt     = DATA_NUM / calc_len;
    if ((DATA_NUM % calc_len) != 0) {
        LOGE("DATA_NUM is not a multiple of calc_len");
    }
    for(int k = 0; k < 666; ++k) {
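        DATA_TYPE* pData = (DATA_TYPE*)ftk_ta_malloc(2 * DATA_NUM * sizeof(DATA_TYPE));
        if (pData == NULL) {  /* an 8 MiB allocation can fail inside a TA */
            LOGE("ftk_ta_malloc failed");
            return;
        }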
        for(int i = 0; i < DATA_NUM * 2; ++i) {
             pData[i] = (DATA_TYPE)i;
        }
        DATA_TYPE* pSeg1 = pData;
        DATA_TYPE* pSeg2 = pData + k * 1024;
        ftk_millisecond_t t0 = ftk_ta_get_uptime();
        for(int j = 0; j < 400; ++j) {
            DATA_TYPE* p1 = pSeg1;
            DATA_TYPE* p2 = pSeg2;
            for (int i = 0; i < loop_cnt; i++) {
                swap_data_value(p1, p2, calc_len);
                p1 += calc_len;
                p2 += calc_len;
            }
        }
        t0 = ftk_ta_get_uptime() - t0;
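        /* calc_len is uint32_t, so it is printed with %u; the value logged is the average time per pass. */
        LOGD("swap_data_value[%d: %ux%d]: %0.4f ms", k, calc_len, loop_cnt, t0/400.0f);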
        ftk_ta_free(pData);
    }
}
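
To make the access pattern of the test explicit, the sketch below works out how many bytes one pass copies and how far apart the source and destination are for a given k. The helper names are hypothetical and not part of the test code; the constants are mirrored from test_func(), and float is assumed to be 4 bytes.

```c
#include <stddef.h>

/* Constants mirrored from test_func(); float is assumed to be 4 bytes. */
#define ELEM_SIZE ((size_t)sizeof(float))
#define DATA_NUM  ((size_t)1024 * 1024)

/* Bytes copied by one pass (loop_cnt chunks of calc_len elements). */
static size_t bytes_per_pass(void)
{
    return DATA_NUM * ELEM_SIZE;
}

/* Byte distance between pSeg1 and pSeg2 for outer iteration k. */
static size_t seg_offset_bytes(unsigned k)
{
    return (size_t)k * 1024 * ELEM_SIZE;
}

/* Nonzero when the source and destination ranges of one pass overlap. */
static int segments_overlap(unsigned k)
{
    return seg_offset_bytes(k) < bytes_per_pass();
}
```

With these numbers, each pass moves 4 MiB, the two pointers are always a multiple of 4 KiB apart, and the source and destination ranges overlap for every k from 0 to 665 (for k = 0 they are the same range).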

I ran this code on an SDM865 platform and see a huge performance difference between the REE and QSEE (Qualcomm's TrustZone environment).

In the REE, each pass consistently takes 0.1325 ~ 0.1375 ms. In QSEE, it takes 0.7275 ~ 10.37 ms, with much larger run-to-run variation.
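
To put these timings in perspective, here is a rough sketch that converts a per-pass time into copy throughput. The function names are hypothetical. It assumes one pass copies 1024 * 1024 floats of 4 bytes each, and that ftk_ta_get_uptime() has 1 ms granularity (suggested by the type name ftk_millisecond_t, but not confirmed).

```c
/* One pass copies 1024*1024 floats; float is assumed to be 4 bytes. */
#define BYTES_PER_PASS (1024.0 * 1024.0 * 4.0)

/* Copy throughput in GiB/s for a per-pass time given in milliseconds. */
static double copy_gib_per_s(double ms_per_pass)
{
    double seconds = ms_per_pass / 1000.0;
    return BYTES_PER_PASS / seconds / (1024.0 * 1024.0 * 1024.0);
}

/* Smallest step of the averaged result when a timer with the given
 * granularity (assumed, in ms) is divided by the repeat count. */
static double averaged_resolution_ms(double timer_step_ms, int repeats)
{
    return timer_step_ms / (double)repeats;
}
```

Under these assumptions, 0.1325 ms per pass is roughly 29 GiB/s, while the QSEE range runs from roughly 5.4 GiB/s down to about 0.38 GiB/s. Averaging a 1 ms timer over 400 repeats gives steps of 0.0025 ms, which is consistent with all the reported values being multiples of 0.0025.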

I suspect this is caused by some cache limitation. However, I can't read the cache information in QSEE: the code below crashes the TA (it exits immediately).

uint64_t ctr_el0 = 0;
asm volatile("mrs %0, CTR_EL0" : "=r"(ctr_el0) : );

In the REE, I get a cache line size of 64 B.
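
For reference, this is a minimal sketch of how a raw CTR_EL0 value decodes into line sizes, assuming the 64 B figure was obtained from that register. The function names are hypothetical. The field layout follows the Arm architecture definition: DminLine in bits [19:16] and IminLine in bits [3:0], each holding log2 of the number of 4-byte words in the smallest cache line.

```c
#include <stdint.h>

/* Smallest data cache line in bytes: 4 bytes << DminLine (bits [19:16]). */
static uint32_t ctr_dcache_line_bytes(uint64_t ctr_el0)
{
    uint32_t dminline = (uint32_t)((ctr_el0 >> 16) & 0xF);
    return 4u << dminline;
}

/* Smallest instruction cache line in bytes: 4 bytes << IminLine (bits [3:0]). */
static uint32_t ctr_icache_line_bytes(uint64_t ctr_el0)
{
    uint32_t iminline = (uint32_t)(ctr_el0 & 0xF);
    return 4u << iminline;
}
```

A DminLine value of 4 corresponds to a 64 B data cache line. Note that CTR_EL0 only describes line sizes; it does not report the total cache size.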

So, is this problem caused by QSEE (TrustZone) limiting the cache size or the cache access performance?
