Vitis HLS: pragma Array Partition doesn't work in the sub-function when I set this pragma outside


I want to unroll the loop "find_col" in the inlined function "find_match". To do that, I applied the ARRAY_PARTITION pragma to the arrays "mp_buffer" and "mc_buffer" directly below their declarations (which are outside of "find_match"), and I pass them to find_match as arguments. However, there is an II violation because the arrays are not partitioned. How can I solve this? Thank you!

The function find_match (the parallel factor is set to 8 now):

inline void find_match (
  Feature_Point origin,
  int origin_u_bin,
  int origin_v_bin,
  int origin_col_idx,
  Feature_Point m_buffer[7][4][COL_BIN_FEATURE_MAX],
  int32_t       m_buffer_num[7][4][V_BIN_NUM+1],
  Matching_cand m_matching[U_BIN_NUM][4][COL_BIN_FEATURE_MAX]
) {
    

  Matching_cand min_cand;
  int16_t  min_cost = 32766;
  int16_t  psum;
  int16_t  psum_03, psum_47, psum_811, psum_1215, psum_1619, psum_2023, psum_2427, psum_2831;
  int16_t  psum_015, psum_1631;
  int32_t u_min,u_max,v_min,v_max;



  u_min = origin.u-SEARCH_RAD_U;
  u_max = origin.u+SEARCH_RAD_U;
  v_min = origin.v-SEARCH_RAD_V;
  v_max = origin.v+SEARCH_RAD_V;

  // bins of interest
  int32_t u_bin_min = max(origin_u_bin-3, 0);
  int32_t u_bin_max = min(origin_u_bin+3, U_BIN_NUM-1);
  int32_t v_bin_min = max(origin_v_bin-3, 0);
  int32_t v_bin_max = min(origin_v_bin+3, V_BIN_NUM-1);
  int32_t bin_class = origin.type;
  int16_t tmp[32];
  #pragma HLS ARRAY_PARTITION variable=tmp dim=1 complete

  // for all bins of interest do
  find_u_bin: for (int u_bin = u_bin_min; u_bin < u_bin_max; u_bin++) {
    int u_bin_buffer = u_bin % 7;
    find_col: for (int col_idx = m_buffer_num[u_bin_buffer][bin_class][v_bin_min]; col_idx < m_buffer_num[u_bin_buffer][bin_class][v_bin_max]; col_idx++) {
      #pragma HLS UNROLL factor=parallel
      Feature_Point target = m_buffer[u_bin_buffer][bin_class][col_idx];
      if (target.u>=u_min && target.u<=u_max && target.v>=v_min && target.v<=v_max) {
        psum = 0;
        calc: for (int i = 0; i < 32; i++) {
          #pragma HLS UNROLL  factor=32
          ap_uint<8> a = origin.d.range((i+1)*8-1, 8*i);
          ap_uint<8> b = target.d.range((i+1)*8-1, 8*i);
          tmp[i] = ABS(a, b);
        }
        // adder tree
...

I set the pragma here:

   Matching_cand mc_matching[U_BIN_NUM][4][COL_BIN_FEATURE_MAX];
   Matching_cand mp_matching[U_BIN_NUM][4][COL_BIN_FEATURE_MAX];
   static int _p_matched_num;


   #pragma HLS ARRAY_PARTITION variable=mc_buffer dim=3 type=cyclic factor=parallel
   #pragma HLS ARRAY_PARTITION variable=mp_buffer dim=3 type=cyclic factor=parallel
   
   find_match(origin, i, v_buffer_idx, col_idx, mp_buffer, mp_buffer_num, mc_matching); 

I have tried setting the array partition pragma inside the "find_match" function, but then C synthesis takes a very long time.


There is 1 best solution below

  • The pragma needs to be set on the variable inside find_match(). The 'wiring' differs depending on how the memory is implemented. That is exactly the point of array partitioning for unrolling: it increases the number of ports and thus the number of parallel accesses. If you don't set it inside the function, Vivado HLS probably assumes no array partitioning, since the function might be called with arbitrary variables. Consult UG902 (or UG1399 for Vitis HLS) for more information.
  • Your array is multidimensional. Adding const variables for u_bin_buffer and bin_class might help Vivado HLS detect that the partitioning is sufficient. I wouldn't expect this to be necessary, though.
  • Increased synthesis/compilation times are to be expected. By unrolling with factor x you duplicate the hardware to be generated x times. Array partitioning creates more ports on the memory, which need to be wired. If Vivado HLS can't guarantee that memory accesses are exclusive to the partitions you set, then each part of the memory needs to be wired to each of your unrolled copies. So, for demonstration purposes and very much oversimplified: for unroll=8 and partition=8 this would increase wiring by 8x8=64.
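
To illustrate the first and third points, here is a minimal sketch of putting the partition pragma on the array argument inside the callee. All names (sum_window, bank_of, BUF_LEN, PAR) are hypothetical and not taken from your code. The pragmas are ignored by an ordinary C++ compiler, so this can be checked in plain C simulation first. It assumes the unrolled loop starts at a PAR-aligned index, so that lane k always reads bank k. In your find_col loop the start index comes from m_buffer_num at run time, so that assumption does not hold there as written, which may be why the tool has to wire every bank to every lane.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sizes for illustration; not taken from the question.
constexpr int BUF_LEN = 64;  // number of elements in the buffer
constexpr int PAR     = 8;   // unroll factor == cyclic partition factor

static_assert(BUF_LEN % PAR == 0, "buffer must hold whole blocks");

// With cyclic partitioning by PAR, element idx lives in bank idx % PAR.
int bank_of(int idx) {
  return idx % PAR;
}

// Sums the PAR consecutive elements of block number `block`.
// The partition pragma sits on the argument inside the callee.
int32_t sum_window(const int32_t buf[BUF_LEN], int block) {
#pragma HLS ARRAY_PARTITION variable=buf dim=1 type=cyclic factor=8
  // Range check; valid blocks are 0 .. BUF_LEN / PAR - 1.
  assert(block >= 0 && block < BUF_LEN / PAR);

  int32_t acc = 0;
  const int base = block * PAR;  // aligned start: lane k reads bank k
  for (int k = 0; k < PAR; k++) {
#pragma HLS UNROLL
    acc += buf[base + k];
  }
  return acc;
}
```

If the start index cannot be aligned, one option is to round the loop bounds down and up to multiples of the factor and skip the out-of-range elements with a condition inside the loop. Whether that fixes the II violation in your design is something only the synthesis report can confirm.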

How do you inline find_match()? I can't see a #pragma HLS inline in your code snippet.
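
For reference, a minimal sketch of where the HLS inline directive usually goes: as a pragma inside the body of the function to be inlined. The C++ inline keyword in your snippet is a language-level specifier, and as far as I know it is not the same thing as the HLS directive. The names abs_diff and sad4 are hypothetical and only loosely modelled on your calc loop.

```cpp
#include <cstdint>

// Hypothetical helper: absolute difference of two bytes.
static uint8_t abs_diff(uint8_t a, uint8_t b) {
#pragma HLS INLINE
  return static_cast<uint8_t>((a > b) ? (a - b) : (b - a));
}

// Hypothetical caller: sum of absolute differences over 4 bytes.
uint16_t sad4(const uint8_t x[4], const uint8_t y[4]) {
  uint16_t acc = 0;
  for (int i = 0; i < 4; i++) {
#pragma HLS UNROLL
    acc = static_cast<uint16_t>(acc + abs_diff(x[i], y[i]));
  }
  return acc;
}
```

After synthesis, the report should no longer list the helper as a separate module if the inlining took effect.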