We have a code generator that takes a random seed as input. If no seed is specified, it picks one at random, which means the output is not deterministic:
# generated_code1.h and generated_code2.h are almost always different
my-code-gen -o generated_code1.h
my-code-gen -o generated_code2.h
On the other hand,
# generated_code3.h and generated_code4.h are always the same
my-code-gen --seed 1234 -o generated_code3.h
my-code-gen --seed 1234 -o generated_code4.h
Our first attempt to create a target for the generated code was:
genrule(
    name = "generated_code",
    srcs = [],
    outs = ["generated_code.h"],
    cmd = "my-code-gen -o $@",  # Note: no --seed is specified
)
However, we think this breaks the hermeticity of targets that depend on :generated_code.
So we ended up implementing a custom rule and using a build_setting (i.e. configuration) to configure the seed passed to my-code-gen.
This makes it possible to specify the seed from the CLI for any target that depends on the generated code, e.g.
bazel build :generated_code --//:code-gen-seed=1234
bazel build :binary --//:code-gen-seed=1234
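A minimal sketch of what such a rule could look like is shown below. It is not our exact implementation: it assumes bazel_skylib is available, that the seed is exposed as a string_flag, and that the generator is a target at the hypothetical label //tools:my-code-gen. The file name defs.bzl is also just an illustration.

```starlark
# defs.bzl (hypothetical file name)
load("@bazel_skylib//rules:common_settings.bzl", "BuildSettingInfo")

def _generated_code_impl(ctx):
    # Same output name as the genrule above: <name>.h
    out = ctx.actions.declare_file(ctx.label.name + ".h")

    # The seed comes from the build setting, so it is part of the
    # configuration and therefore of the action's cache key.
    seed = ctx.attr._seed[BuildSettingInfo].value

    ctx.actions.run(
        executable = ctx.executable._generator,
        arguments = ["--seed", seed, "-o", out.path],
        outputs = [out],
        mnemonic = "MyCodeGen",
    )
    return [DefaultInfo(files = depset([out]))]

generated_code = rule(
    implementation = _generated_code_impl,
    attrs = {
        "_seed": attr.label(default = "//:code-gen-seed"),
        "_generator": attr.label(
            default = "//tools:my-code-gen",  # assumption: hypothetical label
            executable = True,
            cfg = "exec",
        ),
    },
)
```

The matching BUILD file would then declare the flag and the target, again as an assumption about the layout:

```starlark
load("@bazel_skylib//rules:common_settings.bzl", "string_flag")
load(":defs.bzl", "generated_code")

# Default seed used when --//:code-gen-seed is not passed on the CLI.
string_flag(
    name = "code-gen-seed",
    build_setting_default = "1234",
)

generated_code(name = "generated_code")
```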
My questions are:
- Consider the genrule definition above: it calls my-code-gen without --seed, which results in non-deterministic output. Does that mean it is non-hermetic? What is the cost of breaking hermeticity? (e.g. what trouble would it cause in the future?)
- I've found --action_env as an alternative to build_setting, which also allows us to pass a seed value from the CLI to my-code-gen. Compared to build_setting, which is the preferred approach in our case?
Yes, it's non-hermetic. More precisely, this is non-determinism, which is a symptom of a non-hermetic build: the PRNG isn't seeded with a value that is statically known to the build system. Another common cause of non-determinism is embedding timestamps in build outputs.
Bazel defines hermeticity as:
The biggest problem is breaking cacheability of everything that depends on the genrule, because you can no longer trust/guarantee that given a cache key (i.e. hashes of the genrule's inputs, command, environment), the output will be identical and reproducible across build invocations.
This has costs ranging from
The //:code-gen-seed build setting only affects targets that depend on it, but --action_env affects every action. Changing the build setting invalidates only the minimal set of targets, causing minimal re-analysis, cache lookups, and rebuilds, and is thus preferred. You can experiment with this by comparing incremental build speeds in a workspace with more targets that don't depend on //:code-gen-seed.
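For comparison, the --action_env variant could look roughly like the sketch below. The variable name CODE_GEN_SEED is a hypothetical choice, and the sketch assumes my-code-gen is reachable from the genrule's shell, as in the genrule from the question.

```starlark
genrule(
    name = "generated_code",
    srcs = [],
    outs = ["generated_code.h"],
    # $$ escapes the dollar sign so the shell, not Bazel, expands the variable.
    # CODE_GEN_SEED is a hypothetical name supplied via --action_env.
    cmd = "my-code-gen --seed \"$$CODE_GEN_SEED\" -o $@",
)
```

It would be invoked as bazel build :generated_code --action_env=CODE_GEN_SEED=1234. Note that the variable is then visible to every action that uses the default shell environment, not only to this genrule, which is the broader invalidation described above.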