Modifying gcc compilation to improve embedded Flash size


Introduction

I am running out of Flash on my Cortex-M4 device. I analysed the code, and the biggest opportunity to reduce code size is simply in predefined constants.

- Example

typedef struct {
    const char *name;   /* 4-byte pointer */
    uint16_t    value;  /* 2 bytes, then 2 bytes of padding */
    const char *desc;   /* 4-byte pointer */
} Option;               /* 12 bytes per record */
const Option option364[] = {
   { "String1",  0x4523, "String2" },
   { "Str3",     0x1123, "S4" },
   { "String 5", 0xAAFC, "S6" }
};

Problem

The problem is that I have a large number of short strings to store, but most of them are used in tables - arrays of const structs that mix pointers to the const strings with the numerical data. The strings vary in length, but I still tried changing the struct member from a pointer to a simple (max-length) char array - and there wasn't much difference. It didn't help that the compiler wanted to start each new string on a 4-byte boundary; which got me thinking...

Idea

If I could replace the 4-byte char pointer with a 2-byte index into a string table - a predefined linker section into which the index was an offset - I would save 2 bytes per record right there, at the expense of a minor code bump. I'd also avoid the interior padding, since each string could start immediately after the previous string's NUL byte. And if I were clever, I could re-use strings - or even part-strings, since a string that is the tail of a longer one needs no storage of its own - for the indexes.

Moreover, I'd change the 4 + 2 + 4 (+ 2 padding) record layout to 2 + 2 + 2 - saving even more space!
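A minimal sketch of what the packed layout could look like. All names, offsets, and the blob contents here are illustrative stand-ins; in the real build the blob would live in a dedicated linker section and both it and the offsets would be generated by the tooling:

```c
#include <stdint.h>

/* Illustrative stand-in for the linker-section blob. */
static const char string_table[] =
    "String1\0String2\0Str3\0S4\0String 5\0S6";

typedef struct {
    uint16_t name_ofs;   /* 2-byte offset replaces a 4-byte pointer */
    uint16_t value;
    uint16_t desc_ofs;
} PackedOption;          /* 2 + 2 + 2 = 6 bytes, no padding */

static const PackedOption option364[] = {
    {  0, 0x4523,  8 },  /* "String1", "String2" */
    { 16, 0x1123, 21 },  /* "Str3",    "S4"      */
    { 24, 0xAAFC, 33 },  /* "String 5", "S6"     */
};

/* Turn a stored offset back into a usable string pointer. */
static inline const char *tbl_str(uint16_t ofs)
{
    return &string_table[ofs];
}
```

Each 12-byte record (with padding) shrinks to 6 bytes, and the strings pack back-to-back with no alignment gaps.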

- Consideration

Of course, inside the source code the housekeeping on all those strings, and the string table itself, would be a nightmare... unless I could get the compiler to help? I thought of changing the syntax of the actual source code: if I wanted a string to be in the string table, I would write it as #"String", where the # prefix would flag it as a string table candidate. A normal string wouldn't have that prefix, and the compiler would treat it as normal.
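For illustration, the transformation might look something like this (the #"" prefix is the proposed syntax, not valid C, and the type names and offsets are made up):

```c
/* Input, as written by the programmer: */
const Option option364[] = {
    { #"String1", 0x4523, #"String2" },
};

/* Output, as handed to the real compiler - each #"" string has been
   replaced by its 16-bit offset into the generated string table: */
const PackedOption option364[] = {
    { 0, 0x4523, 8 },
};
```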

Implementation

So to implement this I'd have to write a pre-pre-compiler: something that would process just the #"" strings, replacing them with "magic" 16-bit offsets, and then pass everything else through to the real (pre)compiler for the actual compilation. The pre-pre-compiler would also have to write a new C file containing the complete string table (although with a trick - see below), for the compiler to parse and hand to the linker for its dedicated section. Invoking this would be easy with the -no-integrated-cpp switch, which lets me invoke my own pre-pre-processor that would in turn invoke the real one.

- Issues

Don't get me wrong; I know there are issues. For example, it would have to handle partial builds. My solution there is that for every modified C file, it would write (if necessary) a parallel string table file. The "master" C string table file would be nothing more than a series of #includes, which the build would recognise as needing recompilation if one of its #includes had changed - or indeed, if a new #include was added.
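The generated master file could then be roughly as follows (the section name, fragment file names, and the use of __attribute__((section)) are illustrative; each per-file fragment would contain a run of adjacent string literals that the compiler concatenates into one blob):

```c
/* string_table.c - generated master file */
__attribute__((section(".string_table")))
const char string_table[] =
#include "module_a.strs"   /* e.g. "String1\0" "String2\0" */
#include "module_b.strs"
;
```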

Result

The upshot would be an executable with all the (constant) strings packed into a memory blob no larger than 64K (not a problem!). The code would know that each index is an offset into that blob, so it would add the index to the string table's base address before using the result as a normal string pointer.

Question

My question is: is it worth it?

- Pros:

  • It would save a tonne of space. I didn't quantify it above, but assume a saving of 5%(!) of total Flash.

- Cons:

  • It would require the build process to be modified to include a bespoke preprocessor;
  • That preprocessor would have to be built as part of the toolchain rather than the project;
  • The preprocessor could have bugs or limitations;
  • The real source code wouldn't compile "out of the box".

Now...

I have donned my asbestos suit, so... GO!

Answer

This kind of "project custom preprocessor" used to be fairly common back in the days when memory was pretty constrained. It's pretty easy to do if you use make as your build system -- just a custom pattern or suffix rule to run your preprocessor.

The main question is whether you want to run it on all source files or just some. If only a couple need it, you define a new file extension for source files that need preprocessing (e.g., .cx, with a .cx.c: suffix rule to run the preprocessor). If all need it, you redefine the implicit .c.o: rule.
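In GNU make terms, the two options sketch out roughly like this (the tool name strpp and the .cx extension are placeholders):

```make
# Option 1: only some files need it - give them a .cx extension
# and generate the real .c via the custom preprocessor.
%.c: %.cx
	strpp -o $@ $<

# Option 2: every file needs it - override the implicit rule so
# each .c is run through the preprocessor before compiling.
%.o: %.c
	strpp -o $*.pp.c $<
	$(CC) $(CFLAGS) -c -o $@ $*.pp.c
```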

The main drawback, as you noted, is that if there's any sort of global coordination (such as pooling all the strings like you are trying to do), changing any source file needing the preprocessor will likely require rebuilding all of them, which is potentially quite slow.