The Adams and Mullapudi autoschedulers often generate schedules that use the `vectorize` and `split` primitives with constant factors. As the example below shows, such schedules do not work for every array size passed to the compiled Halide library.
To reproduce, set the array size (the `fl` constant in bugRepro.cpp) to 3, which fails with an out-of-bounds error, and then to 4, which passes. Next, in the Generator class, comment out the vectorization schedule, uncomment the split one, rebuild, and try array sizes of 7 (fails with an out-of-bounds error) and 8 (passes).
In other words, when the array size is not compatible with the split/vectorization factor (in these cases, when it is smaller than the factor), the generated code goes out of bounds.
If the Mullapudi and Adams autoschedulers added specializations to the generated schedule to filter out incompatible sizes, this problem would not occur. Another option might be to make the split/vectorization factors parameters, although that may not be a good approach.
Do the Mullapudi or Adams autoschedulers support specialization for cases like this, or is there a plan to support it?
SchBugGen.cpp file:
```cpp
#include "Halide.h"
#include <stdio.h>

using namespace Halide;

class SchBugGen : public Halide::Generator<SchBugGen> {
public:
    Input<Buffer<double>> aIn1{"aIn1", 1};
    Output<Buffer<double>> aOut1{"aOut1", 1};

    void generate() {
        aOut1(d1) = aIn1(d1) * 2;
    }

    void schedule() {
        Var d2("d2");  // only used by the split schedule below
        // Schedule A (default): vectorize by 4. Try fl = 3 and 4.
        aOut1.vectorize(d1, 4);
        // Schedule B: split by 8. Comment out schedule A, uncomment this, and try fl = 7 and 8.
        // aOut1.split(d1, d1, d2, 8);
    }

private:
    Var d1{"d1"};
};

HALIDE_REGISTER_GENERATOR(SchBugGen, SchBugGenerator)
```
bugRepro.cpp file:
```cpp
#include <stdio.h>
#include <stdlib.h>

#include "schBugFun.h"
#include "HalideBuffer.h"

void printOut(double aOut1[], int aLen) {
    printf("Out = {");
    for (int i = 0; i < aLen; i++) {
        printf("%0.0lf ", aOut1[i]);
    }
    printf("}\n");
}

void initArrs(double aIn1[], int aIn1Size) {
    for (int i = 0; i < aIn1Size; i++) {
        aIn1[i] = 10;
    }
}

int main() {
    // For vectorization by 4, try fl = 3 and 4: the former fails, the latter does not.
    // For a split by 8, try fl = 7 and 8: the former fails, the latter does not.
    const int fl = 3;
    double in1[fl];
    double out1[fl] = {};
    initArrs(in1, fl);
    Halide::Runtime::Buffer<const double> inHBuff(in1, fl);
    Halide::Runtime::Buffer<double> outHBuff(out1, fl);
    // A nonzero return code means the pipeline reported an error.
    int err = schBugFun(inHBuff, outHBuff);
    if (err != 0) {
        fprintf(stderr, "schBugFun returned error %d\n", err);
        return 1;
    }
    printOut(out1, fl);
    return 0;
}
```
Use these commands to build and run the code above. First, set up the environment (once per shell session):
```sh
export PATH=<HALIDE_BIN_PATH>:$PATH
export LD_LIBRARY_PATH=<HALIDE_BIN_PATH>
```
Compile the Halide generator:
```sh
g++ -std=c++17 -g -fno-rtti -rdynamic -I <HALIDE_INCLUDE_PATH> -L <HALIDE_BIN_PATH> -Wl,-rpath,<HALIDE_BIN_PATH> SchBugGen.cpp <HALIDE_INCLUDE_PATH>/GenGen.cpp -o schBugLibGen -lHalide -lpthread -ldl
```
Create the Halide library by running the compiled generator with the autoscheduler disabled (`auto_schedule=false`), so the hand-written `schedule()` is used:
```sh
./schBugLibGen -f schBugFun -g SchBugGenerator -e static_library,h,assembly,bitcode,cpp,html,cpp_stub,stmt,o,schedule target=host auto_schedule=false -o .
```
Compile the test harness:
```sh
g++ -std=c++17 schBugFun.o -I <HALIDE_INCLUDE_PATH> -L <HALIDE_BIN_PATH> -lHalide -lpthread -ldl -rdynamic -fno-rtti -Wl,-rpath,<HALIDE_BIN_PATH> -O3 -g bugRepro.cpp -o out
```
Run the program:
```sh
./out
```
Thanks, Ivan
This issue was also captured here: https://github.com/halide/Halide/issues/3104
And is expected to be addressed here: https://github.com/halide/Halide/issues/6847
Note these two points from issue 6847:
• There must be a way to ensure that schedules are resilient to varying bounds; it's currently common to get a schedule that will work for the "estimated" size, but will OOB on smaller/etc sizes. This is unacceptable for production work. (Adams2019 autoscheduler can produce schedules that aren't bounds-resilient #5070, Autoscheduled code doesn't work on buffers smaller than estimates #3953, Adams2019 autoscheduler generates incorrect code #4512)
• Consider whether/how to add support for specialize() to the autoscheduler. (Specializing the auto-schedule #3104)