Default (nested) dataclass initialization in hydra when no arguments are provided


I have the following code, which uses the Hydra framework:

# dummy_hydra.py

from dataclasses import dataclass

import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import DictConfig, OmegaConf


@dataclass
class Foo:
    x: int = 0
    y: int = 1


@dataclass
class Bar:
    a: int = 0
    b: int = 1


@dataclass
class FooBar:
    foo: Foo
    bar: Bar


cs = ConfigStore.instance()
cs.store(name="config_schema", node=FooBar)


@hydra.main(config_name="dummy_config", config_path=".", version_base=None)
def main(config: DictConfig):
    config_obj: FooBar = OmegaConf.to_object(config)
    print(config_obj)


if __name__ == '__main__':
    main()

(This is a simplified code of my actual use case, of course)

As you can see, I have a nested dataclass: the FooBar class contains instances of Foo and Bar. Both Foo and Bar have default attribute values. Hence, I thought I could define a YAML file that does not necessarily initialize Foo and/or Bar. Here's the file I use:

# dummy_config.yaml
defaults:
  - config_schema
  - _self_

foo:
  x: 123
  y: 456

When I run this code, it surprisingly (?) does not initialize Bar (which is not mentioned in the YAML config file) but throws an error instead:

omegaconf.errors.MissingMandatoryValue: Structured config of type `FooBar` has missing mandatory value: bar
    full_key: bar
    object_type=FooBar

What's the proper way to use this class structure such that I don't need to explicitly initialize classes with non-mandatory fields (such as Bar)?


Answer by Omry Yadan (accepted answer):

Dataclass fields without a default value are considered missing. This semantic is unique to OmegaConf (the underlying config library powering Hydra): accessing such a field raises a MissingMandatoryValue exception. You can use OmegaConf.is_missing(cfg, "bar") to check whether the field is missing without triggering the exception.

In a pure YAML config, you can achieve the same behavior by using the value ??? in your config file. In Structured Configs (dataclasses), you can achieve it explicitly by assigning MISSING (imported with from omegaconf import MISSING) to a field.

It is not clear from your question what you want in the bar field. If you want it to be None, you can change the signature of your dataclass to something like:

from dataclasses import dataclass
from typing import Optional


@dataclass
class FooBar:
    foo: Optional[Foo] = None
    bar: Optional[Bar] = None

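If you want foo and bar initialized to their default values, just assign Foo() and Bar() respectively. I saw in another comment that you are concerned that the instance will be shared. This is not the case: the config is converted to an OmegaConf DictConfig in any case before you convert it to an object, so each field ends up with its own instance. Try and see. (On Python 3.11 and later, dataclasses reject unhashable defaults such as Foo() with a ValueError; there, use field(default_factory=Foo) instead.)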


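from dataclasses import dataclass, field

import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import DictConfig, OmegaConf


@dataclass
class Foo:
    x: int = 0
    y: int = 1


@dataclass
class Bar:
    a: int = 0
    b: int = 1
    # A plain Foo() default is rejected on Python 3.11+; default_factory works everywhere
    f: Foo = field(default_factory=Foo)


@dataclass
class FooBar:
    foo: Foo = field(default_factory=Foo)
    bar1: Bar = field(default_factory=Bar)
    bar2: Bar = field(default_factory=Bar)


cs = ConfigStore.instance()
cs.store(name="config_schema", node=FooBar)


@hydra.main(config_name="dummy_config", config_path=".", version_base=None)
def main(config: DictConfig):
    config_obj: FooBar = OmegaConf.to_object(config)
    # Each nested object is a separate instance, so these writes do not interfere
    config_obj.foo.x = 100
    config_obj.bar1.f.x = 200
    config_obj.bar2.f.x = 300
    print(config_obj)
    # FooBar(foo=Foo(x=100, y=456), bar1=Bar(a=0, b=1, f=Foo(x=200, y=1)), bar2=Bar(a=0, b=1, f=Foo(x=300, y=1)))


if __name__ == '__main__':
    main()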
Answer by Matteo Zanoni:

The FooBar class has no default for either the foo or the bar attribute; my guess is that this is why you are seeing that error.

You could provide defaults using default_factory:

from dataclasses import field

...

@dataclass
class FooBar:
    foo: Foo = field(default_factory=Foo)
    bar: Bar = field(default_factory=Bar)

...