Why is generic instantiation syntax disallowed in Hack?


From the docs:

Note: HHVM allows syntax such as $x = Vector<int>{5,10};, but Hack disallows the syntax in this situation, instead opting to infer it.

Is there a specific reason for this? Isn't this a violation of the fail-fast rule?

There are situations in which this causes an error to be deferred, which in turn makes it harder to trace the error back to its source.

For example:

<?hh // strict
function main() : void {
    $myVector = new Vector([]); // no generic syntax
    $myVector->addAll(require 'some_external_source.php');
}

The above code causes no error until the vector is used in a context where a statically typed collection is actually declared:

class Foo
{
    public ?Vector<int> $v;
}

$f = new Foo();
$f->v = $myVector;

Now there is an error if the vector contains anything other than int. But one must trace the error back to the point where the flawed data was actually imported. This would not be necessary if one could instantiate the vector using generic syntax in the first place:

$myVector = new Vector<int>([]);
$myVector->addAll(require 'some_external_source.php'); // fail immediately

I work on the Hack type system and typechecker at Facebook. This question has been asked a few times internally at FB, and it's good to have an answer written down in a nice, externally visible place.

So first of all, your question is premised on the following code:

<?hh // strict
function main() : void {
    $myVector = new Vector([]); // no generic syntax
    $myVector->addAll(require 'some_external_source.php');
}

However, that code does not pass the typechecker because it uses require outside the top level, so the result of actually executing it on HHVM is undefined behavior, which renders this whole discussion moot for that particular code.

But it's still a legitimate question for other potential pieces of code that do actually typecheck, so let me go ahead and actually answer it. :)

It's unsupported because the typechecker is actually able to infer the generic correctly, unlike in many other languages, so we made the judgment call that the syntax would get in the way and decided to disallow it. It turns out that if you just don't worry about it, we'll infer it correctly and still give useful type errors. You can certainly come up with contrived code that doesn't "fail fast" in the way you want, but it's, well, contrived. Take, for example, this fixup of your example:

<?hh // strict
function main(): void {
  $myVector = Vector {}; // I intend this to be a Vector<int>
  $myVector[] = 0;
  $myVector[] = 'oops'; // Oops! Now it's inferred to be a Vector<mixed>
}

You might argue that this is bad, because you intended to have a Vector<int> but actually have a Vector<mixed> with no type error; you would have liked to express your intent when creating it, so that adding 'oops' would cause an error. But there is no type error only because you never actually tried to use $myVector! If you tried to pull out any of its values, or return it from the function, you'd get some sort of type compatibility error. For example:

<?hh // strict
function main(): Vector<int> {
  $myVector = Vector {}; // I intend this to be a Vector<int>
  $myVector[] = 0;
  $myVector[] = 'oops'; // Oops! Now it's inferred to be a Vector<mixed>
  return $myVector; // Type error!
}

The return statement will cause a type error, saying that the 'oops' is a string, incompatible with the int return type annotation -- exactly what you wanted. So the inference is good, it works, and you don't ever actually need to explicitly annotate the type of locals.

But why shouldn't you be able to if you really want? Because annotating only generics when instantiating new objects isn't really the right feature here. The core of what you're getting at with "but occasionally I really want to annotate Vector<int> {}" is actually "but occasionally I really want to annotate locals". So the right language feature is not to let you write $x = Vector<int> {}; but to let you explicitly declare variables and write Vector<int> $x = Vector {};, which would also allow things like int $x = 42;. Adding explicit variable declarations to the language is a much more general and reasonable addition than annotating generics only at object instantiation. (However, it's not a feature being actively worked on, nor do I see it becoming one in the near-to-medium term, so don't get your hopes up. But leaving that option open is why we made this decision.)

Furthermore, allowing either of these syntaxes would be actively misleading at this point in time. Generics are enforced only by the static typechecker and are erased by the runtime. This means that if you get untyped values from PHP or from Hack partial-mode code, the runtime cannot possibly check them against the generic's real type. Keeping in mind that untyped values are treated as "trust the programmer", so the static typechecker lets you do anything with them too, consider the following code, which uses the hypothetical syntax you propose:

<?hh // partial
function get_foo() /* unannotated */ {
  return 'not an int';
}

<?hh // strict
function f(): void {
  $v = Vector<int> {};
  $v[] = 1; // OK
  // $v[] = 'whoops'; // Error since explicitly annotated as Vector<int>

  // No error from static typechecker since get_foo is unannotated
  // No error from runtime since generics are erased
  $v[] = get_foo();
}

Of course, you can't have unannotated values in 100% strict-mode code, but we have to think about how such a feature would interact with all potential usages, including untyped code in partial mode or even PHP.