Deleting content from a YAML file using Python while retaining the original structure

I have a YAML file. I want my script to remove all "repository" entries whose names are not in a list of strings I have defined. My script:

import yaml

core_repos = ["REPO1",
              "REPO2"]

if __name__ == "__main__":
    yml_file_name = "azure-pipelines.yml"
    with open(yml_file_name, 'r') as yml_file:
        yml_content = yaml.safe_load(yml_file)
    repositories = yml_content.get("resources", {}).get("repositories", [])
    filtered_repositories = [repo for repo in repositories if repo.get("repository") in core_repos]
    yml_content["resources"]["repositories"] = filtered_repositories

    with open(yml_file_name, 'w') as f:
        yaml.safe_dump(yml_content, f, default_flow_style=False)

The original azure-pipelines.yml:

trigger:
  - release/test

pool:
  name: <REDACTED>-Linux
  demands:
    - agent.name -equals  <REDACTED>

# Overrides the value for Build.BuildNumber, which is used to name the artifact (ZIP file) that is produced
name: '$(Date:yyyyMMdd)T$(Hours)$(Minutes)$(Seconds)'

resources:
  repositories:
    - repository: REPO1
      type: git
      ref: release/test
      name: <REDACTED>/REPO1
      trigger:
        branches:
          include:
            - release/test

    - repository: REPO2
      type: git
      ref: release/test
      name: <REDACTED>/REPO2
      trigger:
        branches:
          include:
            - release/test

    - repository: REPO3
      type: git
      ref: release/test
      name: <REDACTED>/REPO3
      trigger:
        branches:
          include:
            - release/test

    - repository: REPO4
      type: git
      ref: release/test
      name: <REDACTED>/REPO4
      trigger:
        branches:
          include:
            - release/test

stages:
  - stage: 'BuildAndUploadArtifact'
    jobs:
      - job:
        workspace:
          clean: all
        steps:
          - checkout: self
          # Core repos
          - checkout: REPO1
          - checkout: REPO2
          - checkout: REPO3
          - checkout: REPO4

After running the script, my main goal seems to have been accomplished, but otherwise the output looks very wrong in several ways: to name a few, the trigger ended up at the bottom and my comment is missing completely. What is causing this? Here is the output:

name: $(Date:yyyyMMdd)T$(Hours)$(Minutes)$(Seconds)
pool:
  demands:
  - agent.name -equals  <REDACTED>
  name: <REDACTED>-Linux
resources:
  repositories:
  - name: <REDACTED>/REPO1
    ref: release/test
    repository: REPO1
    trigger:
      branches:
        include:
        - release/test
    type: git
  - name: <REDACTED>/REPO2
    ref: release/test
    repository: REPO2
    trigger:
      branches:
        include:
        - release/test
    type: git
stages:
- jobs:
  - job: null
    steps:
    - checkout: self
    - checkout: REPO1
    - checkout: REPO2
    - checkout: REPO3
    - checkout: REPO4
    workspace:
      clean: all
  stage: BuildAndUploadArtifact
trigger:
- release/test

Rattletrap (accepted answer)

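I fixed it myself. This works using ruamel.yaml, whose default round-trip mode keeps comments and key order (and, with preserve_quotes, the original quoting):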

from ruamel.yaml import YAML

core_repos = ["REPO1", "REPO2"]

if __name__ == "__main__":
    yml_file_name = "azure-pipelines.yml"

    yaml = YAML()  # default round-trip mode keeps comments and key order
    yaml.preserve_quotes = True  # keep quotes such as '$(Date:yyyyMMdd)...'
    with open(yml_file_name, 'rb') as yml_file:
        yml_content = yaml.load(yml_file)

    # Keep only the repository entries whose name is listed in core_repos
    repositories = yml_content.get("resources", {}).get("repositories", [])
    filtered_repositories = [repo for repo in repositories if repo.get("repository") in core_repos]
    yml_content["resources"]["repositories"] = filtered_repositories

    with open(yml_file_name, 'wb') as f:
        yaml.dump(yml_content, f)
Lourenço Monteiro Rodrigues

Because you load the original file, process the resulting Python object, and then dump it to a file with the same name, you should not expect an in-place edit of the file. What you get is a new YAML file that represents the data you want, but not necessarily with the same formatting as the original.

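In your case, all keys end up sorted alphabetically because PyYAML's safe_dump sorts mapping keys by default, and the comment is discarded by the loader, so it never exists in the loaded object and cannot appear in the resulting file. The same applies to other purely presentational details: the quotes around the name value are dropped, and the empty job: value is written back as job: null.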
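A small PyYAML illustration of both effects, using a made-up, trimmed-down input: the default safe_dump reorders the keys, while passing sort_keys=False (available since PyYAML 5.1) keeps the loaded order. In both cases the comment and the quotes are gone, because they were never part of the loaded Python dict.

```python
import yaml  # PyYAML

# Hypothetical, trimmed-down version of the pipeline file
source = """\
trigger:
  - release/test

# Overrides the value for Build.BuildNumber
name: '$(Date:yyyyMMdd)T$(Hours)$(Minutes)$(Seconds)'
"""

# Comments are discarded while parsing; quoting style is not recorded either
data = yaml.safe_load(source)

# Default: keys are sorted, so "name" is written before "trigger"
sorted_out = yaml.safe_dump(data, default_flow_style=False)

# sort_keys=False keeps insertion order, but the comment cannot come back
ordered_out = yaml.safe_dump(data, default_flow_style=False, sort_keys=False)

print(sorted_out)
print(ordered_out)
```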

If you really want YAML edits to behave as if you were editing the file by hand, you would need to treat the file as plain text and filter out the lines you want to remove, but that takes a lot more regex and logic.
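A rough sketch of that text-based approach, assuming every repository entry starts with a line of the form "- repository: NAME" and that all of the entry's other lines, plus any blank lines directly after it, are indented deeper than its dash. The function name drop_repository_blocks and the regex are hypothetical; comments, quoted names, or unusually indented lines inside an entry would need extra handling.

```python
import re

# Matches a list item "- repository: NAME", capturing the dash's indentation
# and the repository name (assumed to be unquoted and without a trailing comment).
ITEM_RE = re.compile(r"^(\s*)- repository:\s*(\S+)\s*$")


def drop_repository_blocks(lines, keep):
    """Return lines without the '- repository: NAME' items whose NAME is not in keep."""
    out = []
    skip_indent = None  # indentation of the dash of the item being dropped, if any
    for line in lines:
        if skip_indent is not None:
            indent = len(line) - len(line.lstrip())
            # Blank lines and deeper-indented lines still belong to the dropped item
            if not line.strip() or indent > skip_indent:
                continue
            skip_indent = None  # back at the item's level or shallower: stop skipping
        match = ITEM_RE.match(line)
        if match and match.group(2) not in keep:
            skip_indent = len(match.group(1))
            continue
        out.append(line)
    return out


sample = """\
resources:
  repositories:
    - repository: REPO1
      type: git

    - repository: REPO3
      type: git
      trigger:
        branches:
          include:
            - release/test

stages:
  - stage: 'Build'
"""

print("".join(drop_repository_blocks(sample.splitlines(keepends=True), {"REPO1", "REPO2"})))
```

To apply it to the real file, read the lines with readlines(), pass them together with the set of names to keep, and write the result back with writelines(). Note that, like the original script, this leaves the "- checkout: REPO3" steps untouched.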