I want to generate all the Motzkin Number and store in an array. The formula is given as follows:

My current implementation is just way too slow:
void generate_slow() {
mm[0] = 1;
mm[1] = 1;
mm[2] = 2;
mm[3] = 4;
mm[4] = 9;
ull result;
for (int i = 5; i <= MAX_NUMBERS; ++i) {
result = mm[i - 1];
for (int k = 0; k <= (i - 2); ++k) {
result = (result + ((mm[k] * mm[i - 2 - k]) % MODULO)) % MODULO;
}
mm[i] = result;
}
}
void generate_slightly_faster() {
mm[0] = 1;
mm[1] = 1;
mm[2] = 2;
mm[3] = 4;
mm[4] = 9;
ull result;
for (int i = 5; i <= MAX_NUMBERS; ++i) {
result = mm[i - 1];
for (int l = 0, r = i - 2; l <= r; ++l, --r) {
if (l != r) {
result = (result + (2 * (mm[l] * mm[r]) % MODULO)) % MODULO;
}
else {
result = (result + ((mm[l] * mm[r]) % MODULO)) % MODULO;
}
}
mm[i] = result;
}
}
Besides, I'm stuck with finding a closed form for the recurrence matrix so that I can apply exponentiation squaring. Can anyone suggest a better algorithm? Thanks.
Edit I can't apply the second formula because division doesn't apply when modulo a number. The maximum of n is 10,000 which is beyond the range of 64-bit integer, so the answer is modulo with a larger number m where m = 10^14 + 7. Larger integer library is not allowed.
WARNING: THE FOLLOWING CODE IS WRONG BECAUSE IT USES INTEGER DIVISION (e.g. 5/2 = 2 not 2.5). Feel free to fix it!
This is a good chance to use dynamic programming. It is very similar for working out Fibonacci numbers.
The problem is because these numbers get so big, storing them all in the cache will run out of memeory if you want to go really high. Then it is better to use a for loop remembering the previous 2 terms. If you want to find the motzkin number for many numbers, I suggest you sort your numbers first, and then as you approach each of your numbers in the for loop, output the result.
EDIT: I created a looped version but got different results to my previous recursive function. At least one of them must be wrong!! Hopefully you still see how it works and can fix it!