I want to unroll the loop "find_col" in the inlined function "find_match". Therefore, I set the array partition pragma on the arrays "mp_buffer" and "mc_buffer" right below their declarations (which are outside of "find_match"), and I pass them into find_match as arguments. However, there is an II violation because the arrays are not partitioned. How can I solve this? Thank you!
The function find_match (the parallel factor is set to 8 now):
inline void find_match (
    Feature_Point origin,
    int origin_u_bin,
    int origin_v_bin,
    int origin_col_idx,
    Feature_Point m_buffer[7][4][COL_BIN_FEATURE_MAX],
    int32_t m_buffer_num[7][4][V_BIN_NUM+1],
    Matching_cand m_matching[U_BIN_NUM][4][COL_BIN_FEATURE_MAX]
) {
    Matching_cand min_cand;
    int16_t min_cost = 32766;
    int16_t psum;
    int16_t psum_03, psum_47, psum_811, psum_1215, psum_1619, psum_2023, psum_2427, psum_2831;
    int16_t psum_015, psum_1631;
    int32_t u_min, u_max, v_min, v_max;
    u_min = origin.u-SEARCH_RAD_U;
    u_max = origin.u+SEARCH_RAD_U;
    v_min = origin.v-SEARCH_RAD_V;
    v_max = origin.v+SEARCH_RAD_V;
    // bins of interest
    int32_t u_bin_min = max(origin_u_bin-3, 0);
    int32_t u_bin_max = min(origin_u_bin+3, U_BIN_NUM-1);
    int32_t v_bin_min = max(origin_v_bin-3, 0);
    int32_t v_bin_max = min(origin_v_bin+3, V_BIN_NUM-1);
    int32_t bin_class = origin.type;
    int16_t tmp[32];
#pragma HLS ARRAY_PARTITION variable=tmp dim=1 complete
    // for all bins of interest do
    find_u_bin: for (int u_bin = u_bin_min; u_bin < u_bin_max; u_bin++) {
        int u_bin_buffer = u_bin % 7;
        find_col: for (int col_idx = m_buffer_num[u_bin_buffer][bin_class][v_bin_min]; col_idx < m_buffer_num[u_bin_buffer][bin_class][v_bin_max]; col_idx++) {
#pragma HLS UNROLL factor=parallel
            Feature_Point target = m_buffer[u_bin_buffer][bin_class][col_idx];
            if (target.u>=u_min && target.u<=u_max && target.v>=v_min && target.v<=v_max) {
                psum = 0;
                calc: for (int i = 0; i < 32; i++) {
#pragma HLS UNROLL factor=32
                    ap_uint<8> a = origin.d.range((i+1)*8-1, 8*i);
                    ap_uint<8> b = target.d.range((i+1)*8-1, 8*i);
                    tmp[i] = ABS(a, b);
                }
                // adder tree
                ...
I set the pragma here:
Matching_cand mc_matching[U_BIN_NUM][4][COL_BIN_FEATURE_MAX];
Matching_cand mp_matching[U_BIN_NUM][4][COL_BIN_FEATURE_MAX];
static int _p_matched_num;
#pragma HLS ARRAY_PARTITION variable=mc_buffer dim=3 type=cyclic factor=parallel
#pragma HLS ARRAY_PARTITION variable=mp_buffer dim=3 type=cyclic factor=parallel
find_match(origin, i, v_buffer_idx, col_idx, mp_buffer, mp_buffer_num, mc_matching);
I have tried setting the array partition pragma inside the "find_match" function, but then C synthesis takes a very long time.
Set the array partitioning inside find_match(). The 'wiring' is different depending on how the memory is implemented. That is exactly the point of array partitioning for unrolling: to increase the number of ports and thus the number of parallel accesses. If you don't set it inside the function, then Vivado HLS probably assumes no array partitioning, since the function might be called with arbitrary variables. Consult UG902 for more information.

[...] u_bin_buffer, bin_class might help Vivado HLS detect that the partitioning is sufficient. I wouldn't expect this to be necessary though.

By unrolling with a factor of x, you duplicate the hardware to be generated x times. Array partitioning creates more ports on the memory, which need to be wired. If Vivado HLS can't guarantee that memory accesses are exclusive to the partitions that you set, then each part of the memory needs to be wired to each of your unrolled functions. So, for demonstration purposes and very much oversimplified: for unroll=8 and partition=8 this would increase wiring by a factor of 8x8=64.

How do you inline find_match()? I can't see a #pragma HLS inline in your code snippet.
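To make the advice above concrete, here is a minimal sketch of a function that declares the partitioning of its own array argument next to the unrolled loop that reads it. Everything in it is hypothetical and chosen only for illustration: the names PAR, BUF_LEN, cyclic_bank and min_in_range, the sizes, and the simple minimum search that stands in for the matching logic. The pragma spelling follows the question; an ordinary C++ compiler ignores the HLS pragmas, so the block also runs as plain software.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sizes, chosen only for this sketch.
const int PAR = 8;       // unroll factor and number of cyclic banks
const int BUF_LEN = 64;  // stand-in for COL_BIN_FEATURE_MAX

// Bank that holds element idx when the array is split cyclically into PAR banks.
inline int cyclic_bank(int idx) { return idx % PAR; }

// Returns the smallest value in buf[begin, end), or INT16_MAX for an empty range.
// The range bounds are runtime values, as in the find_col loop of the question.
int16_t min_in_range(const int16_t buf[BUF_LEN], int begin, int end) {
#pragma HLS INLINE
// Assumption of this sketch: the partitioning is declared inside the function
// that performs the parallel accesses, not only at the caller's declaration.
#pragma HLS ARRAY_PARTITION variable=buf dim=1 type=cyclic factor=8
    assert(0 <= begin && begin <= end && end <= BUF_LEN);
    int16_t best = INT16_MAX;
    for (int i = begin; i < end; i++) {
#pragma HLS UNROLL factor=8
        if (buf[i] < best) best = buf[i];
    }
    return best;
}
```

Any 8 consecutive indices land in 8 different banks, so the unrolled copies do not collide. Which bank a given copy reads is (begin + k) % 8, though, and begin is only known at run time, so the tool may still have to connect every copy to every bank. That is the 8x8 wiring growth described above, and it may be one reason synthesis takes so long.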