I have an ESP32 microcontroller on which I need to write and read some GPIOs directly. Basically, I need to move the bits of a 4-bit value to 4 arbitrary positions in a uint32 output register. I also need the reverse: read 4 other positions from a uint32 input register and pack them into a 4-bit value.
I would prefer general optimizations over anything that relies on the particular pins or their order.
This is what I have now:
#define BIT(x) (1U << (x)) // from some builtin header
const gpio_num_t dig_in[4] = {GPIO_NUM_34, GPIO_NUM_35, GPIO_NUM_36, GPIO_NUM_39}; // same as {34, 35, 36, 39}
const gpio_num_t dig_out[4] = {GPIO_NUM_4, GPIO_NUM_25, GPIO_NUM_26, GPIO_NUM_27}; // same as { 4, 25, 26, 27}
Writing to pins:
void write_pins(uint8_t out)
{
    static const uint32_t masks[4] = {BIT(dig_out[0]), BIT(dig_out[1]), BIT(dig_out[2]), BIT(dig_out[3])};
    uint32_t set = ((out & 0b0001) ? masks[0] : 0) |
                   ((out & 0b0010) ? masks[1] : 0) |
                   ((out & 0b0100) ? masks[2] : 0) |
                   ((out & 0b1000) ? masks[3] : 0);
    uint32_t reset = (!(out & 0b0001) ? masks[0] : 0) |
                     (!(out & 0b0010) ? masks[1] : 0) |
                     (!(out & 0b0100) ? masks[2] : 0) |
                     (!(out & 0b1000) ? masks[3] : 0);
    REG_WRITE(GPIO_OUT_W1TS_REG, set);
    REG_WRITE(GPIO_OUT_W1TC_REG, reset);
}
I found that I could do this with a tiny lookup table (or rather two, one for set and one for reset) indexed by out. Would that be any faster? Or is there some other way, perhaps reusing some of the intermediate values?
Reading from pins: the registers are 32 bits wide, and fortunately all my input pins are GPIO32 or higher, so I can read only the upper register instead of both. For this reason the bit positions in all masks are offset by 32. It's not pretty, but I don't know how to do it better without the unnecessary performance hit of using a 64-bit value or an array of 32-bit values.
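To make the lookup-table idea concrete, here is a minimal sketch. It assumes all output pins are below GPIO32 (as with the pins above), and the names `OUT_PINS`, `OUT_ALL`, `spread`, `SET_LUT`, `PinMasks` and `masks_for` are made up for illustration. It needs only one table, because the reset mask is just the set mask XORed with the union of all four output bits. Whether it is actually faster than the bit tests would have to be measured.

```cpp
#include <cstdint>

// Hypothetical copy of the dig_out pin numbers; all assumed to be below 32.
constexpr unsigned OUT_PINS[4] = {4, 25, 26, 27};

// Union of the four output bits.
constexpr uint32_t OUT_ALL = (1u << OUT_PINS[0]) | (1u << OUT_PINS[1]) |
                             (1u << OUT_PINS[2]) | (1u << OUT_PINS[3]);

// Register bits to set for a 4-bit value (same logic as write_pins).
constexpr uint32_t spread(unsigned v)
{
    return ((v & 1u) ? 1u << OUT_PINS[0] : 0u) |
           ((v & 2u) ? 1u << OUT_PINS[1] : 0u) |
           ((v & 4u) ? 1u << OUT_PINS[2] : 0u) |
           ((v & 8u) ? 1u << OUT_PINS[3] : 0u);
}

// 16-entry table computed at compile time, indexed by the 4-bit value.
constexpr uint32_t SET_LUT[16] = {
    spread(0),  spread(1),  spread(2),  spread(3),
    spread(4),  spread(5),  spread(6),  spread(7),
    spread(8),  spread(9),  spread(10), spread(11),
    spread(12), spread(13), spread(14), spread(15)};

struct PinMasks {
    uint32_t set;   // value for GPIO_OUT_W1TS_REG
    uint32_t clear; // value for GPIO_OUT_W1TC_REG
};

// One table lookup; the clear mask reuses the set mask.
constexpr PinMasks masks_for(uint8_t out)
{
    return PinMasks{SET_LUT[out & 0x0F], OUT_ALL ^ SET_LUT[out & 0x0F]};
}
```

On the device, `write_pins` would then write `masks_for(out).set` to `GPIO_OUT_W1TS_REG` and `masks_for(out).clear` to `GPIO_OUT_W1TC_REG`, exactly as in the code above.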
uint8_t read_pins()
{
    static const uint32_t masks[4] = {BIT(dig_in[0] - 32), BIT(dig_in[1] - 32), BIT(dig_in[2] - 32), BIT(dig_in[3] - 32)};
    uint32_t pins = REG_READ(GPIO_IN1_REG);
    return (!!(pins & masks[0]) << 0) |
           (!!(pins & masks[1]) << 1) |
           (!!(pins & masks[2]) << 2) |
           (!!(pins & masks[3]) << 3);
}
A reverse lookup probably won't be any faster than direct bit operations. Is it possible to generate some magic constant (e.g. for a multiplication, some other math operation, or even assembly) that would move the bits to the required positions? I guess not in the general case, but maybe when only 4 valid bits are required out of 32? Any other suggestions to reduce the number of instructions?
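For comparison, one pin-order-independent variant of the read shifts each bit down to position 0 and masks it, instead of masking first and normalizing with `!!`. This is only a sketch: `IN_POS` and `pack_inputs` are hypothetical names, the positions are the `dig_in` pin numbers minus 32, and whether the compiler emits fewer instructions for this form than for the `!!` form has not been measured here.

```cpp
#include <cstdint>

// Bit positions inside GPIO_IN1_REG: GPIO number minus 32 (pins from dig_in).
constexpr unsigned IN_POS[4] = {34 - 32, 35 - 32, 36 - 32, 39 - 32};

// Gather the four input bits into a 4-bit value; dig_in[0] becomes bit 0.
constexpr uint8_t pack_inputs(uint32_t pins)
{
    return static_cast<uint8_t>(
        (((pins >> IN_POS[0]) & 1u) << 0) |
        (((pins >> IN_POS[1]) & 1u) << 1) |
        (((pins >> IN_POS[2]) & 1u) << 2) |
        (((pins >> IN_POS[3]) & 1u) << 3));
}
```

On the device, `read_pins` would reduce to `return pack_inputs(REG_READ(GPIO_IN1_REG));`.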