Azure DevOps: multiple build pipelines under a single repo drain the build server memory when using Git


I have more than 20 solutions under a single repository. Even though I added a path filter under the trigger, each build pipeline's checkout saves the entire repository under the build agent's _work/x/s folder, and it is draining the server's memory. Could anybody help me apply a path filter to the checkout, or make each build pipeline refer to the same source? I'm using a Git pipeline.


3 Answers

Answer 1

In Azure DevOps, there is no built-in option to get only part of the repository, but there is a workaround: disable the "Get sources" step and fetch only the sources you want by running the corresponding git commands in a script.

a. Disable "Get sources"

- checkout: none

b. In the pipeline, add a CmdLine or PowerShell task to get the sources manually with git sparse-checkout. For example, get only the directories src_1 and src_2 within the test folder:

parameters:
  access: '{personal access token}'
  repository: '{dev.azure.com/organisation/project/_git/repository}'
  sourcePath: '{path/to/files/}'

- task: CmdLine@2
  inputs:
    script: |
      ECHO ##[command] git init
      git init
      ECHO ##[command] git sparse-checkout: ${{ parameters.sourcePath }}
      git config core.sparsecheckout true
      echo ${{ parameters.sourcePath }} >> .git/info/sparse-checkout
      ECHO ##[command] git remote add origin https://${{ parameters.repository }}
      git remote add origin https://${{ parameters.access }}@${{ parameters.repository }}
      ECHO ##[command] git fetch --progress --verbose --depth=1 origin master
      git fetch --progress --verbose --depth=1 origin master
      ECHO ##[command] git pull --progress --verbose origin master
      git pull --progress --verbose origin master
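If you keep that script in a step template (the file name checkout-sparse.yml and the secret variable sparseCheckoutPat below are just assumptions for illustration), each of the 20 pipelines can reuse it and pull only its own folder:

steps:
- checkout: none
- template: checkout-sparse.yml
  parameters:
    access: '$(sparseCheckoutPat)'    # assumption: a secret pipeline variable holding the PAT
    repository: 'dev.azure.com/organisation/project/_git/repository'
    sourcePath: 'test/src_1'          # the folder this particular pipeline builds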

You can refer to this ticket for more details.

Answer 2

Note: you have not provided your YAML file for reference, so I am making the standard assumption that you are using a simple YAML schema.

You have multiple solutions in a single repo, so a classic mono-repo setup. You are looking at something like this:

- script: npm install
  workingDirectory: service-b/

Change your working directory at the beginning of the CI/CD pipeline code in your YAML. Once the working directory is set, the rest goes as usual with your existing YAML.

Note a few things.

  • It is possible to set up triggers based on changes to specific folders. So, if you have

Repository Root
  --ProjectOne
  --ProjectTwo
  ...
  --ProjectThree

It is possible to set up a trigger for each folder and then, in your YAML, change the working directory accordingly and do your business (see the sketch after this list).

  • Further, for deployment, if you are new to this, I would simply build a new pipeline for each project to keep things easy.

  • Or, if you really want to have it all in one file, you can add conditions and route the entire mono repo through a single YAML. Depending on a specific variable (for example, a project name), you can route the deployment to your desired target.
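As a sketch of the per-folder triggers mentioned above (the folder name comes from the tree; the branch name and the npm build command are assumptions), a pipeline dedicated to ProjectOne could look like this:

# Pipeline for ProjectOne: runs only when files under ProjectOne/ change
trigger:
  branches:
    include:
    - master          # assumption: your default branch
  paths:
    include:
    - ProjectOne

steps:
- script: npm install && npm run build    # assumption: an npm-style build, as in the snippet above
  workingDirectory: ProjectOne/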

More details are available at the following locations.

Answer 3

Assuming that by memory you mean disk space, I can see how 20 solutions, each with their own pipeline and their own working folder, can consume a lot of it. Each pipeline gets its own workspace folder; this allows them to change independently while giving them the greatest speed. In your case each pipeline accesses the same repo, but each may have different repo settings, and this way the agent doesn't have to care about that.

There are a few options, some already mentioned by others.

  1. Use shallow clones. You can configure the pipeline to fetch only up to X commits of history. This may greatly reduce the working folder size for each pipeline, but may slow down your builds a bit if you build many different branches. You can enable shallow clones by setting fetchDepth to a small number. If you depend on GitVersion to calculate a build number, or on any other task that depends on the repo history, it may break when fetchDepth is set too low.

    steps:
    - checkout: self | none | repository name # self represents the repo where the initial Pipelines YAML file was found
      clean: boolean  # if true, run `git clean -ffdx && git reset --hard HEAD` before fetching
      fetchDepth: number  # the depth of commits to ask Git to fetch; defaults to no limit
      lfs: boolean  # whether to download Git-LFS files; defaults to false
      submodules: true | recursive  # set to 'true' for a single level of submodules or 'recursive' to get submodules of submodules; defaults to not checking out submodules
      path: string  # path to check out source code, relative to the agent's build directory (e.g. _work\1); defaults to a directory called `s`
      persistCredentials: boolean  # if 'true', leave the OAuth token in the Git config after the initial fetch; defaults to false
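    For example, a minimal shallow-clone checkout (the fetchDepth value of 1 is an assumption; raise it if tools such as GitVersion need more history):

    steps:
    - checkout: self
      fetchDepth: 1   # fetch only the most recent commit
      clean: true     # keep stale outputs from accumulating in the working folder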
    
  2. Clean up at the end of each build. You could add a script step at the end of the pipeline to clean up your local working folders. A simple script step with condition: always() should do the trick; that way you can remove large build outputs from the drive when the build is done. You could use a YAML template to auto-inject this task into every job/pipeline.
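    A minimal sketch of such a step, assuming a Linux agent and that the large outputs land in the default binaries and staging directories:

    - script: |
        # remove build outputs so they do not pile up on the agent's disk
        rm -rf "$(Build.BinariesDirectory)" "$(Build.ArtifactStagingDirectory)"
      displayName: 'Clean up build outputs'
      condition: always()   # run even if earlier steps failed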

  3. Take control of the checkout process. As others mentioned, you can tell Pipelines not to check out the repo, so that you can take control of that process yourself. You have a few options here:

        - checkout: none
    

    a. git worktree: this command allows your build pipelines to share their .git folder. Instead of cloning the repo for each pipeline, you clone it once, then each pipeline creates a new worktree in $(Build.SourcesDirectory). This can be a massive space saver while keeping all the history on the build agent, for fast branch switching and support for tools like GitVersion. Follow the general steps from Vitu Liu.
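    A rough sketch of that approach, assuming a self-hosted Linux agent, placeholder organisation/project/repository names, the same PAT-in-URL authentication used in Vitu Liu's script, and a shared clone location under $(Agent.HomeDirectory):

        - checkout: none
        - script: |
            SHARED="$(Agent.HomeDirectory)/shared-repo"
            # clone once per agent; later runs only fetch
            if [ ! -d "$SHARED" ]; then
              git clone https://{personal access token}@dev.azure.com/organisation/project/_git/repository "$SHARED"
            fi
            cd "$SHARED"
            git fetch origin
            # give this pipeline its own lightweight worktree inside its usual sources directory
            rm -rf "$(Build.SourcesDirectory)"
            git worktree prune
            git worktree add --detach "$(Build.SourcesDirectory)" "$(Build.SourceVersion)"
          displayName: 'Checkout via shared git worktree'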

    b. git sparse-checkout: this command allows you to configure which folders to check out in your local working directory. The git command-line client is clever enough to also fetch only the data needed for those folders. Follow the steps from Vitu Liu.

    c. Use a custom checkout task. This custom checkout task uses a single folder on the agent and a symlink to make sure each pipeline uses the same repo under the hood. It doesn't seem to be released to the marketplace though, so you'll have to install it yourself using tfx-cli:

    cd drive:\path\to\extracted\zip
    tfx login
    tfx build tasks upload --task-path .
    
  4. Create a single pipeline and include parameterized YAML files. By creating a single pipeline, the agent will create a single working folder. Then, based on the files that were changed, execute the pipeline files of the solutions you want to build. See the "Parameters to select at runtime" docs. Other answers have already explained how to use the git command line to iterate over the files that have changed and to set a variable based on the outcome.

    steps:
    - ${{ if eq(parameters.experimentalTemplate, true) }}:
      - template: experimental.yml
    - ${{ if not(eq(parameters.experimentalTemplate, true)) }}:
      - template: stable.yml
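    For that snippet to work end to end, the runtime parameter also needs to be declared at the top of the pipeline; a minimal sketch (the display name and default value are assumptions):

    parameters:
    - name: experimentalTemplate
      displayName: 'Use the experimental template?'
      type: boolean
      default: false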
    
  5. Use the new scale-set agents. Azure Pipelines now offers a way to set up your own agents in the same way the hosted pipelines run. One of the advantages is that you control when an image is reset, basically allowing you to have a clean image for every build, with all your dependencies pre-installed or your npm/NuGet caches pre-populated. The low-cost parallelization features are probably also very useful in your scenario.

  6. Configure maintenance. You can enable a cleanup job on the agent pool. It will queue a special job that cleans up the working folders, old tasks, temporary files, etc. It may help in case not all 20 solutions build on the same day. You can find this option at the account level.

    Enable maintenance job on agent pool at the account level.

There is an open issue on the Azure Pipelines Agent repo.