Floating point numbers and the effect on 8-bit microcontrollers memory

1.8k Views Asked by At

I am currently working on a project that includes bare-metal programming on an stm-8 micro-controller using the SDCC compiler in linux. The memory in the chip is quite low so I'm trying to keep things really lean. I have gotten by with using 8-bit and 16-bit variables and things have gone well. But recently I ran into a problem were I really needed a float variable. So i wrote a function that takes in a 16-bit value converts to a float does the math I need and returns an 8-bit number. This cause my final compiled code on the MCU to go from 1198 Bytes to 3462 Bytes. Now I understand that using floating points is memory intensive and that many functions may need to be called to handle the use of the floating point number but it seems crazy to increase the size of the program by that much. I would like some help understanding why this is and what happened exactly.

Specs: MCU stm8151f2 Compiler: SDCC with --opt_code_size option

int roundNo(uint16_t bit_input) 
{ 
    float num = (((float)bit_input) - ADC_MIN)/124.0;
    return num < 0 ? num - 0.5 : num + 0.5; 
}
3

There are 3 best solutions below

5
On BEST ANSWER

To determine why the code is so large on your particular tool chain, you would need to look at the generated assembly code, and see what FP support calls it makes, then look at the map file to determine the size of each of those functions.

As an example on Godbolt for AVR using GCC 5.4.0 with -Os (Godbolt does not support STM8 or SDCC so this is for comparison as a 8-bit architecture) your code generates 6364 bytes compared 4081 bytes for an empty function. So the additional code required for the code body is 2283 bytes. Now accounting for the fact that you are using both a different compiler and architecture, these are not that different from your results. See in the generated code (below) the rcalls to subroutines such as __divsf3 - these are where the bulk of the code will be, and I suspect FP division is by far the larger contributor.

roundNo(unsigned int):
        push r12
        push r13
        push r14
        push r15
        mov r22,r24
        mov r23,r25
        ldi r24,0
        ldi r25,0
        rcall __floatunsisf
        ldi r18,0
        ldi r19,0
        ldi r20,0
        ldi r21,lo8(69)
        rcall __subsf3
        ldi r18,0
        ldi r19,0
        ldi r20,lo8(-8)
        ldi r21,lo8(66)
        rcall __divsf3
        mov r12,r22
        mov r13,r23
        mov r14,r24
        mov r15,r25
        ldi r18,0
        ldi r19,0
        ldi r20,0
        ldi r21,0
        rcall __ltsf2
        ldi r18,0
        ldi r19,0
        ldi r20,0
        ldi r21,lo8(63)
        sbrs r24,7
        rjmp .L6
        mov r25,r15
        mov r24,r14
        mov r23,r13
        mov r22,r12
        rcall __subsf3
        rjmp .L7
.L6:
        mov r25,r15
        mov r24,r14
        mov r23,r13
        mov r22,r12
        rcall __addsf3
.L7:
        rcall __fixsfsi
        mov r24,r22
        mov r25,r23
        pop r15
        pop r14
        pop r13
        pop r12
        ret

You need to perform the same analysis on the code generated by your tool chain to answer your question. No doubt SDCC is capable of generating an assembly listing and a map file which will allow you to determine exactly what code and FP support is being generated and linked.

Ultimately though your use of FP in this case is entirely unnecessary:

int roundNo(uint16_t bit_input) 
{ 
  int s = (bit_input - ADC_MIN) ;
  s += s < 0 ? -62 : 62 ;
  return s / 124 ;
}

At Godbolt 2283 bytes compared to an empty function. Still somewhat large, but the issue there most likely is that the AVR lacks a DIV instruction so calls __divmodhi4. STM8 has a DIV for 16 bit dividend and 8 bit divisor, so it will likely be significantly smaller (and faster) on your target.

1
On

OK, a version of fixed point that actually works:

// Assume a 28.4 format for math.  12.4 can be used, but roundoff may occur.

// Input should be a literal float (Note that the multiply here will be handled by the  
// compiler and not generate FP asm code. 
#define TO_FIXED(x) (int)((x * 16))

// Takes a fixed and converts to an int - should turn into a right shift 4.
#define TO_INT(x)   (int)((x / 16))

typedef int FIXED;
const uint16_t ADC_MIN = 32768;

int roundNo(uint16_t bit_input) 
{ 
  FIXED num = (TO_FIXED(bit_input - ADC_MIN)) / 124;
  num += num < 0 ? TO_FIXED(-0.5) : TO_FIXED(0.5);
  return TO_INT(num);
}

int main()
{
  printf("%d", roundNo(0));

  return 0;
}

Note we are using some 32-bit values here so it will be bigger than your current values. With care though, it could possibly convert back to a 12.4 (16-bit int) instead if round off and overflow can be managed carefully.

Or go grab a better full feature Fixed Point library from the web :)

0
On

(Update) After writing this, I noticed that @Clifford mentioned that your microcontroller supports this DIV instruction natively, in which case doing this is redundant. Anyway, I will leave it as a concept which can be applied in cases where DIV is implemented as an extern call, or for cases where DIV takes too many cycles and the goal is to make the calculation faster.

Anyway, shifting and adding is likely to be faster than division, if you ever need to squeeze some extra cycles. So if you start from the fact that 124 is almost equal to 4096/33 (the error factor is 0.00098, i.e. 0.098%, so less than 1 in 1000), you can implement the division with a single multiplication with 33 and a shift by 12 bits (division by 4096). Furthermore, 33 is 32+1, meaning multiplying by 33 is equal to shifting left by 5 and adding the input again.

Example: you want to divide 5000 by 124, and 5000/124 is approx. 40.323. What we will be doing is:

  1. 5,000 << 5 = 160,000
  2. 160,000 + 5,000 = 165,000
  3. 165,000 >> 12 = 40

Note that this only works for positive numbers. Also note that, if you're really doing lots of multiplications all over the code, then having a single extern mul or div function might result in smaller overall code in the long run, especially if the compiler is not particularly good at optimizing. And if the compiler can just emit a DIV instruction here, then the only thing you can get is a tiny bit of speed improvement, so don't bother with this.

#include <stdint.h>
#define ADC_MIN 2048

uint16_t roundNo(uint16_t bit_input) 
{ 
    // input too low, return zero
    if (bit_input < ADC_MIN)
        return 0;

    bit_input -= (ADC_MIN - 62);
    uint32_t x = bit_input;

    // this gets us x = x * 33        
    x <<= 5;
    x += bit_input;

    // this gets us x = x / 4096
    x >>= 12;

    return (uint16_t)x;
}

GCC AVR with size optimizations produces this, i.e. all calls to extern mul or div functions are gone, but it seems like AVR doesn't support shifting multiple bits in a single instruction (it emits loops which shift 5 times and 12 times respectively). I don't have a clue what your compiler will do.

If you also need to handle the bit_input < ADC_MIN case, I would handle this part separately, i.e.:

#include <stdint.h>
#include <stdbool.h>
#define ADC_MIN 2048

int16_t roundNo(uint16_t bit_input) 
{ 
    // if subtraction would result in a negative value,
    // handle it properly
    bool negative = (bit_input < ADC_MIN);
    bit_input = negative ? (ADC_MIN - bit_input) : (bit_input - ADC_MIN);

    // we are always positive from this point on
    bit_input -= (ADC_MIN - 62);

    uint32_t x = bit_input;
    x <<= 5;
    x += bit_input;
    x >>= 12;

    return negative ? -(int16_t)x : (int16_t)x;
}