Is it possible to auto-increment the base address of a register on a STR with a [Rn]!
? I've peered through the documentation but haven't been able to find a definitive answer, mainly because the command syntax is presented for both LDR and STR - in theory it should work for both, but I couldn't find any examples of auto-incrementing on a store (the loading works ok).
I've made a small program which stores two numbers in a vector. When it's done the contents of out
should be {1, 2}
but the store overwrites the first byte, as if the auto-increment isn't working.
#include <stdio.h>
int main()
{
int out[]={0, 0};
asm volatile (
"mov r0, #1 \n\t"
"str r0, [%0]! \n\t"
"add r0, r0, #1 \n\t"
"str r0, [%0] \n\t"
:: "r"(out)
: "r0" );
printf("%d %d\n", out[0], out[1]);
return 0;
}
EDIT: While the answer was right for regular loads and stores, I found that the optimizer messes up auto-increment on vector instructions such as vldm/vstm. For instance, the following program
#include <stdio.h>
int main()
{
volatile int *in = new int[16];
volatile int *out = new int[16];
for (int i=0;i<16;i++) in[i] = i;
asm volatile (
"vldm %0!, {d0-d3} \n\t"
"vldm %0, {d4-d7} \n\t"
"vstm %1!, {d0-d3} \n\t"
"vstm %1, {d4-d7} \n\t"
:: "r"(in), "r"(out)
: "memory" );
for (int i=0;i<16;i++) printf("%d\n", out[i]);
return 0;
}
compiled with
g++ -O2 -march=armv7-a -mfpu=neon main.cpp -o main
will produce gibberish on the output of the last 8 variables, because the optimizer is keeping the incremented variable and using it for the printf. In other words, out[i]
is actually out[i+8]
, so the first 8 printed values are the last 8 from the vector and the rest are memory locations out of bounds.
I've tried with different combinations of the volatile
keyword throughout the code, but the behavior changes only if I compile with the -O0
flag or if I use a volatile vector instead of a pointer and new, like
volatile int out[16];
For store and load you do this:
whatever you put at the end, 4 in this case, is added to the base register (r1 in the ldr example and r2 in the str example) after the register is used for the address but before the instruction has completed it is very much like
EDIT, you need to look at the disassembly to see what is going on, if anything. I am using the latest code sourcery or now just sourcery lite from mentor graphics toolchain.
arm-none-linux-gnueabi-gcc (Sourcery CodeBench Lite 2011.09-70) 4.6.1
so the
is to allocate the two local ints out[0] and out[1]
is because they are initialized to zero, then comes the inline assembly
and then the printf:
and now it is clear why it didnt work. you are didnt declare out as volatile. You gave the code no reason to go back to ram to get the values of out[0] and out[1] for the printf, the compiler knows that r4 contains the value for both out[0] and out[1], there is so little code in this function that it didnt have to evict r4 and reuse it so it used r4 for the printf.
If you change it to be volatile
Then you should get the desired result:
the preparation for printf reads from ram.