How do I dump yaml without evaluating anchor values?


I have a YAML file like this:

anchor1: &anchor1
  resource_class: small

anchor2: &anchor2
  hello: world

anchor3: &anchor3
  hello: world

root1:
  nested1:
    <<: *anchor1
    some_list:
      - item1:
          hello: world
      - *anchor2
      - *anchor3

  nested2:
    <<: *anchor1
    some_list:
      - item1:
          hello: world
      - *anchor2
      - *anchor3

  nested3:
    <<: *anchor1
    some_list:
      - item1:
          hello: world
      - *anchor2
      - *anchor3


root2:
  nested1:
    <<: *anchor1
    some_list:
      - item1:
          hello: world2
      - *anchor2
      - *anchor3

...

I want to pull out the value of nested1 into a separate file without evaluating all the anchors.

from pathlib import Path

from ruamel.yaml import YAML

yaml = YAML()
with open(Path('in_file')) as f:
    data = yaml.load(f)

with open(Path('out_file'), 'w') as f:
    yaml.dump(data['root1']['nested1'], f)

The output I want when dumping is

<<: *anchor1
some_list:
  - item1:
      hello: world
  - *anchor2
  - *anchor3

I understand that this is invalid YAML on its own, as the anchor definitions are not present.

The main problem I run into is that the moment I grab a value from the root config, the anchors have already been processed.

For example, if I load and dump my in_file, it works as expected, but if I take the data and get a value out, data['root1'], it has already processed the anchors.

I suspect that's because the anchor definitions are not part of data['root1'] but I'm not sure how to work around that.

Best answer, by Anthon

If you are working with files containing YAML documents, use the officially recommended extension for such files, which has been .yaml since at least September 2006.

Then you should consider passing pathlib.Path() instances when loading, instead of providing a stream:

data = yaml.load(Path('in_file.yaml'))

and likewise when dumping:

yaml.dump(data, Path('out_file.yaml'))

(although that output might not be considered a file containing a YAML document). Your original use of yaml.dump() was not going to work, as you opened the file for reading only. Your updated version opens the output with 'w', but yaml.dump() writes a (UTF-8) binary stream, so use 'wb'.


Although it is possible to hook into the representer to skip output until a certain point, it is much easier to do the selection as post-processing, using the transform parameter of the dump method:

import ruamel.yaml
from pathlib import Path

in_file = Path('in_file.yaml')
out_file = Path('out_file.yaml')

class SelectKey:
    """this assumes mappings for all levels of keys, mappings indented by indent spaces"""
    def __init__(self, *keys, indent=2):
        self.keys = keys
        self.indent = indent

    def __call__(self, s):
        """
        s will contain the full YAML output, process it line by line to find key
        """
        processing = [False] * len(self.keys)
        result = ""
        level = 0
        for line in s.splitlines(True):
            dedented_line = line[level*self.indent:]
            if processing[level]:
                if not line.startswith(' ' * (self.indent * level)):
                    break
                dedented_line = dedented_line[self.indent:]  # the values
                if line and line[0] not in ' \n':
                    break
                result += dedented_line
            else:
                key = self.keys[level]
                if dedented_line.startswith(key) and dedented_line[len(key)] in ' :':
                    processing[level] = True
                    if level + 1 == len(self.keys):
                        pass  # we don't want the key itself, only its value
                    else:
                        level += 1
        return result.rstrip() + '\n'  # remove potential empty lines
    
yaml = ruamel.yaml.YAML()
yaml.indent(sequence=4, offset=2)
data = yaml.load(in_file)
yaml.dump(data, out_file, transform=SelectKey('root1', 'nested1'))
print(out_file.read_text(), end='')

which gives:

<<: *anchor1
some_list:
  - item1:
      hello: world
  - *anchor2
  - *anchor3

You need to call yaml.indent() to get your non-standard sequence indentation. As the selection is based on the "path" of keys to a value, you won't get just any value for a key nested1.
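
The class above ends the selection at the first line that is indented less than the selected key, which in the example is the blank line after nested1. As a rough sketch of the same line-based idea, the function below instead stops at the first non-blank line that is indented less than the selected value, and gives up once it leaves the parent mapping. It uses only the standard library and works on plain text; the name select_key_path and its parameters are made up for this illustration and are not part of ruamel.yaml. It assumes block-style mappings at every level, plain unquoted keys written as "key:", and a fixed indent per level.

```python
def select_key_path(text, keys, indent=2):
    """Return the block nested under the key path `keys`, dedented to column 0.

    Hypothetical helper (not a ruamel.yaml API). Assumes block-style mappings,
    plain keys written as "key:", and `indent` spaces per nesting level.
    Returns an empty string if the key path is not found.
    """
    level = 0            # index of the key we are currently looking for
    value_prefix = None  # indentation of the selected value, once found
    collected = []
    for line in text.splitlines(keepends=True):
        if value_prefix is not None:
            if line.strip() == "":
                collected.append("\n")  # keep blank lines inside the value
            elif line.startswith(value_prefix):
                collected.append(line[len(value_prefix):])
            else:
                break  # back at the key's own level or shallower: done
            continue
        prefix = " " * (indent * level)
        if line.strip() and not line.startswith(prefix):
            break  # left the parent mapping without finding the key
        if line[len(prefix):].startswith(keys[level] + ":"):
            level += 1
            if level == len(keys):
                value_prefix = " " * (indent * level)
    body = "".join(collected).rstrip()  # drop trailing blank lines
    return body + "\n" if body else ""


sample = """\
root1:
  nested1:
    <<: *anchor1
    some_list:
      - item1:
          hello: world
      - *anchor2
  nested2:
    <<: *anchor1
root2:
  nested1:
    hello: world2
"""

print(select_key_path(sample, ["root1", "nested1"]), end="")
```

Since it takes and returns a string, a function like this could presumably be wrapped (for instance with functools.partial) and passed as the transform argument in place of SelectKey, but that combination is not shown or tested here.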