Introduction
I am running out of Flash on my Cortex-M4 device. I analysed the code, and the biggest opportunity to reduce code size is simply in predefined constants.
Example

    #include <stdint.h>

    /* Illustrative record layout; field names are placeholders. */
    typedef struct { const char *s1; uint16_t num; const char *s2; } Option;

    const Option option364[] = {
        { "String1",  0x4523, "String2" },
        { "Str3",     0x1123, "S4"      },
        { "String 5", 0xAAFC, "S6"      }
    };
Problem
The problem is that I have a (large) number of (short) strings to store, and most of them are used in tables - arrays of const structs that hold pointers to the const strings mixed in with numerical data. Each string varies in length, but I still tried changing each struct pointer into a simple fixed-size (maximum-length) char array instead - and it made little difference. It didn't help that the compiler wanted to start each new string on a 4-byte boundary; which got me thinking...
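To put numbers on it, here is a minimal sketch of the pointer-based record (field names are mine, and it assumes 32-bit pointers with natural alignment, as on ARM EABI):

    #include <stdint.h>

    typedef struct {
        const char *s1;  /* 4 bytes                        */
        uint16_t    num; /* 2 bytes (+ 2 bytes of padding) */
        const char *s2;  /* 4 bytes                        */
    } Option;            /* 12 bytes per record            */

    _Static_assert(sizeof(Option) == 12, "assumes 32-bit pointers, 4-byte alignment");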
Idea
If I could replace the 4-byte char pointer with a 2-byte index into a string table - a predefined linker section into which the index was an offset - I would save 2 bytes per record right there, at the expense of a minor code bump. I'd also avoid the interior padding, since each string could start immediately after the previous string's NUL byte. And if I could be clever, I could re-use strings - or even part-strings - for the indexes.
Moreover, I'd change the 4 + 2 + 4 (+ 2) layout to 2 + 2 + 2 - saving even more space!
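As a sketch of the packed layout (str_idx_t and the field names are my own, invented for illustration):

    #include <stdint.h>

    /* A 16-bit offset into a <=64K string table in a dedicated linker section. */
    typedef uint16_t str_idx_t;

    typedef struct {
        str_idx_t s1;  /* 2 bytes */
        uint16_t  num; /* 2 bytes */
        str_idx_t s2;  /* 2 bytes */
    } Option;          /* 6 bytes per record - half the pointer version */

    _Static_assert(sizeof(Option) == 6, "expected 2 + 2 + 2, no padding");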
Consideration
Of course, inside the source code the housekeeping on all those strings, and on the string table itself, would be a nightmare... unless I could get the compiler to help? I thought of changing the syntax of the actual source code: if I wanted a string to be in the string table, I would write it as #"String", where the # prefix would flag it as a string-table candidate. A normal string wouldn't have that prefix, and the compiler would treat it as normal.
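The transformation might look like this (offsets invented for illustration; note that #"..." is not valid C, which is exactly why a pre-pass would be needed):

    /* What I'd write: */
    const Option opt[] = { { #"String1", 0x4523, #"String2" } };

    /* What the pre-pre-compiler would emit, given "String1" at offset 0
       and "String2" at offset 8 of the generated string table: */
    const Option opt[] = { { 0x0000, 0x4523, 0x0008 } };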
Implementation
So to implement this I'd have to write a pre-pre-compiler: something that would process just the #"" strings, replacing them with "magic" 16-bit offsets, and then pass everything else through to the real (pre)compiler to do the actual compilation. The pre-pre-compiler would also have to write a new C file with the complete string table inside (although with a trick - see below), for the compiler to parse and provide to the linker for its dedicated section. Invoking this would be easy with the -no-integrated-cpp switch, which would let me invoke my own pre-pre-processor that would in turn invoke the real one.
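The generated file might look something like this - a sketch only; the section name and the use of GCC's section attribute are my assumptions:

    /* string_table.c - generated by the pre-pre-compiler; do not edit. */
    __attribute__((section(".strings")))
    const char string_table[] =
        "String1\0"   /* offset 0x0000 */
        "String2\0"   /* offset 0x0008 */
        "Str3";       /* offset 0x0010; trailing NUL added by C */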
Issues
Don't get me wrong; I know there are issues. For example, it would have to be able to handle partial builds. My solution there is that for every modified C file, it would write (if necessary) a parallel string-table file. The "master" C string-table file would be nothing more than a series of #includes, which the build would realise needed recompiling if one of its #includes had changed - or indeed, if a new #include had been added.
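One way to keep the blob contiguous while still being nothing more than a series of #includes (file and section names invented for illustration) would be for each .inc file to hold only string-literal fragments, letting C's adjacent-literal concatenation build a single table:

    /* string_table.c - generated master file; rebuilt when any part changes.
       Each .inc contains only fragments like "Foo\0" "Bar\0". */
    __attribute__((section(".strings")))
    const char string_table[] =
    #include "main.strtab.inc"
    #include "comms.strtab.inc"
    #include "ui.strtab.inc"
    ;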
Result
The upshot would be an executable with all the (constant) strings packed into a memory blob no larger than 64K (not a problem!). The code would know that index is an offset into that blob, so it would add the index to the string table's base address before using the result as a normal pointer.
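The lookup itself would be trivial - something like this sketch (helper name is mine):

    #include <stdint.h>

    extern const char string_table[];

    /* Turn a 16-bit table offset back into an ordinary C string. */
    static inline const char *str_at(uint16_t idx)
    {
        return &string_table[idx];
    }

    /* e.g. printf("%s\n", str_at(opt[0].s1)); */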
Question
My question is: is it worth it?
Pros:

- It would save a tonne of space. I didn't quantify it above, but assume a saving of 5%(!) of total Flash.

Cons:

- It would require the build process to be modified to include a bespoke preprocessor;
- That preprocessor would have to be built as part of the toolchain rather than the project;
- The preprocessor could have bugs or limitations;
- The real source code wouldn't compile "out of the box".
Now...
I have donned my asbestos suit, so... GO!
Answer

This kind of "project custom preprocessor" used to be fairly common back in the days when memory was pretty constrained. It's pretty easy to do if you use make as your build system -- just a custom pattern or suffix rule to run your preprocessor.

The main question is whether you want to run it on all source files or just some. If only a couple need it, you define a new file extension for source files that need preprocessing (e.g., .cx and a .cx.c: rule to run the preprocessor). If all need it, you redefine the implicit .c.o: rule.

The main drawback, as you noted, is that if there's any sort of global coordination (such as pooling all the strings like you are trying to do), changing any source file that needs the preprocessor will likely require rebuilding all of them, which is potentially quite slow.
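A minimal sketch of both kinds of rule, assuming a hypothetical pre-pre-processor called strpp (the tool name and recipes are invented for illustration; recipe lines must begin with a tab):

    # Suffix rule: only files with the new .cx extension get the extra pass.
    .SUFFIXES: .cx .c
    .cx.c:
    	strpp $< > $@

    # Pattern rule: every .c file goes through the extra pass before compiling.
    %.o: %.c
    	strpp $< > $*.pp.c
    	$(CC) $(CFLAGS) -c $*.pp.c -o $@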