Issue:
When the TFENV_AUTO_INSTALL environment variable is enabled in a Terragrunt repository, concurrent installations of the many different Terraform versions trigger a race condition.
Parallel pipeline jobs make tfenv install several Terraform versions at the same time, which leads to "Permission denied" errors.
My code repo:
dev-account01
├── eu-west-1
│ ├── iam_roles
│ │ ├── .terraform-version
│ │ ├── main.tf
│ ├── networking
│ │ ├── .terraform-version
│ │ ├── main.tf
Each module pins a different Terraform version (1.6.2 and 1.5.5).
PS: my actual setup has many more regions, modules, and accounts.
Error Message:
/home/user/.tfenv/lib/tfenv-exec.sh: line 43: /home/user/.tfenv/versions/1.6.2/terraform: Permission denied
/home/user/.tfenv/lib/tfenv-exec.sh: line 43: exec: /home/user/.tfenv/versions/1.6.2/terraform: cannot execute: Permission denied
Reproducible Scenario:
- Enable TFENV_AUTO_INSTALL in a Terragrunt repo.
- Trigger a pipeline with multiple jobs/plans that attempt to install several Terraform versions that have not been used before.
Expected Behavior:
TFENV_AUTO_INSTALL should handle concurrent installations gracefully, or perform them sequentially, avoiding race conditions and permission denied errors.
Alternatively, is there a way to serialize the installation of the different Terraform versions used by the modules in each account?
EDIT: example of a possible solution:
#!/bin/bash
LOCK_FILE="/tmp/tfenv-wrapper.lock"
MAX_CONCURRENT_PROCESSES=1
# Function to acquire a lock, waiting while another instance holds it
function acquire_lock() {
  exec 202>"$LOCK_FILE"
  until flock -n 202; do
    echo "Another instance of the script is already running. Waiting for it to complete."
    sleep 5
  done
}
# Function to release the lock.
# The lock file is deliberately NOT deleted: removing it while another
# instance has it open would let two instances hold separate locks at once.
function release_lock() {
  flock -u 202
  exec 202>&-
}
# Function to count running 'tfenv install' processes, excluding this script's PID
function check_tfenv_processes() {
  pgrep -f "tfenv install" | grep -vx "$$" | wc -l
}
# Infinite loop to keep the script running
while true; do
  # Acquire the lock
  acquire_lock
  # Check the number of running processes
  num_processes=$(check_tfenv_processes)
  # If the number of running processes reaches the limit, wait
  while [ "$num_processes" -ge "$MAX_CONCURRENT_PROCESSES" ]; do
    echo "Maximum number of concurrent 'tfenv install' processes reached. Waiting for processes to complete."
    sleep 5
    num_processes=$(check_tfenv_processes)
  done
  # Your script logic goes here
  # Simulate some work
  echo "Script is running..."
  # Release the lock
  release_lock
done
Current workspaces:
atlantis-git-test-0:/$ ls -l /atlantis-data/repos/orga/infra-test/4
total 24
drwx--S--- 5 atlantis atlantis 4096 Jan 8 10:00 default
drwx--S--- 5 atlantis atlantis 4096 Jan 8 10:00 environments_eks-dev-1_09_eks
drwx--S--- 5 atlantis atlantis 4096 Jan 8 10:00 environments_eks-dev-1_11_r53_zones
drwx--S--- 5 atlantis atlantis 4096 Jan 8 10:00 environments_eks-dev-1_13_irsa
drwx--S--- 5 atlantis atlantis 4096 Jan 8 10:00 environments_eks-dev-1_15_vault
drwx--S--- 5 atlantis atlantis 4096 Jan 8 10:00 environments_eks-staging-1_11_r53_zones
You might consider a wrapper script that serializes the Terraform version installations, ensuring that only one version is installed (tfenv install) at a time and thereby avoiding race conditions and permission issues. Run this script (tfenv_serial_install.sh) before executing Terragrunt commands in your pipeline.
In that case, you could include a check for existing Terraform versions before attempting an installation (tfenv list | grep -q "$version"). That should prevent redundant installations and reduce the likelihood of concurrent installation attempts.
Yes, you can modify the approach to lock the tfenv process or system call, enabling on-demand installation of Terraform versions while preventing race conditions. Instead of pre-installing all versions, you can lock and install a specific Terraform version only when a job actually requires it. That way, the installations are truly on-demand, and versions that are not needed are never installed.
Adjust the locking script to be used directly within each job that requires a specific Terraform version. The script will check if the required version is already installed and, if not, install it with a lock to prevent race conditions.
Modify your Atlantis configuration or pipeline scripts to call this script at the beginning of each job. The script should receive the required Terraform version as a parameter. That makes sure the version is installed only if it is not already available, right before it is needed.
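A minimal sketch of what tfenv_install_with_lock.sh could look like follows. It assumes tfenv and flock (from util-linux) are on the PATH of every job and that all jobs share one filesystem; the lock file name tfenv-install.lock comes from this answer, but its /tmp location and the function names is_installed and install_with_lock are hypothetical. The version is checked once before taking the lock and again after acquiring it, because another job may have installed the same version while this one was waiting.

```shell
#!/usr/bin/env bash
# tfenv_install_with_lock.sh (sketch): install one Terraform version with tfenv,
# serialized across concurrent jobs by an exclusive flock on a shared lock file.
# Assumptions: tfenv and flock (util-linux) are on PATH; the default lock path
# is hypothetical and must be on a filesystem shared by all jobs.

TFENV_LOCK_FILE="${TFENV_LOCK_FILE:-/tmp/tfenv-install.lock}"

# Succeed if tfenv lists exactly version $1. Assumes `tfenv list` prints one
# version per line, with the active one prefixed by "*".
is_installed() {
  tfenv list 2>/dev/null |
    awk -v v="$1" '{ sub(/^\*/, "") } ($1 "") == (v "") { found = 1 } END { exit !found }'
}

# Install version $1 unless it is already present. The check is repeated
# after the lock is acquired, because another job may have installed the
# same version while this one was waiting.
install_with_lock() {
  local version="$1"
  if is_installed "$version"; then
    echo "Terraform $version is already installed"
    return 0
  fi
  (
    flock -x 200 || exit 1   # blocks until no other job holds the lock
    if is_installed "$version"; then
      echo "Terraform $version was installed by another job"
    else
      tfenv install "$version"
    fi
  ) 200>"$TFENV_LOCK_FILE"
}

# Entry point when run as a script, e.g. ./tfenv_install_with_lock.sh 1.6.2
if [ "$#" -gt 0 ]; then
  install_with_lock "$1"
fi
```

The awk comparison is an exact match on the version field, which is stricter than tfenv list | grep -q "$version": the grep form would also report 1.5.5 as installed when only 1.5.50 is present.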
For instance, an Atlantis job would call the script at the start of its plan stage, passing the Terraform version pinned by the module it is planning.
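As an illustration, the per-job call could be wrapped in a small helper such as the one below. It assumes each module's .terraform-version contains a plain version such as 1.6.2 (not a tfenv keyword like latest or min-required); read_pinned_version, pre_plan and the TFENV_LOCK_SCRIPT variable are hypothetical names, and the default script path is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical pre-plan helper for one module/workspace: read the version
# pinned in the module's .terraform-version and hand it to the locking script.
# TFENV_LOCK_SCRIPT is a placeholder for wherever tfenv_install_with_lock.sh lives.

# Print the pinned version for module directory $1 (default: current dir).
read_pinned_version() {
  local dir="${1:-.}"
  tr -d '[:space:]' < "$dir/.terraform-version"
}

# Install the version pinned by module directory $1 under the shared lock.
pre_plan() {
  local version
  version="$(read_pinned_version "${1:-.}")" || return 1
  "${TFENV_LOCK_SCRIPT:-./tfenv_install_with_lock.sh}" "$version"
}

# Entry point when run as a script, e.g. ./pre_plan.sh .
if [ "$#" -gt 0 ]; then
  pre_plan "$1"
fi
```

In an Atlantis custom workflow this would be a run step placed before the plan step, assuming the step executes in the module's directory so that . resolves to the module being planned.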
That would avoid the need to scan for all versions beforehand.
The tfenv_install_with_lock.sh script uses a file lock (tfenv-install.lock) to serialize the installation of Terraform versions. When multiple Atlantis workspaces are generated, each attempting to execute Terraform commands, you would get:
- Lock acquisition: Each workspace/job that needs to install a Terraform version executes the tfenv_install_with_lock.sh script, which attempts to acquire a lock on the tfenv-install.lock file.
- Serialized installation: If another job already holds the lock, the script waits (flock -x) until the lock becomes available.
- Parallel execution management: Even when multiple workspaces run in parallel, the lock ensures that Terraform versions are installed one at a time. That prevents the race conditions that occur when several installations run simultaneously.
Plus, each workspace checks for the required Terraform version and only attempts an installation if that version is not already present, which avoids redundant installations.
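To see what the lock buys you, the standalone demo below starts several background jobs that contend for one lock file and logs when each critical section starts and ends. It does not call tfenv (sleep stands in for a slow installation), and demo_serialized_jobs and no_overlap are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Demo (independent of tfenv): several background jobs contend for one flock.
# Each job logs "start <id>" and "end <id>"; with the lock held around the
# critical section, every "start" is immediately followed by its own "end".

# Run $3 (default 3) jobs that each hold lock file $1 while appending to log $2.
demo_serialized_jobs() {
  local lock_file="$1" log_file="$2" count="${3:-3}" i
  : > "$log_file"
  for ((i = 1; i <= count; i++)); do
    (
      flock -x 200 || exit 1
      echo "start $i" >> "$log_file"
      sleep 0.2            # stands in for a slow 'tfenv install'
      echo "end $i" >> "$log_file"
    ) 200>"$lock_file" &   # each job opens the lock file itself
  done
  wait
}

# Succeed only if the log never shows two overlapping critical sections.
no_overlap() {
  awk '$1 == "start" { if (open != "") bad = 1; open = $2 }
       $1 == "end"   { if (open != $2) bad = 1; open = "" }
       END { exit (bad || open != "") }' "$1"
}
```

Removing the flock line makes the start entries interleave, which is the same kind of overlap that presumably lets two concurrent installs of one version clash over the same versions/<version>/terraform binary.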
The script is designed to be integrated into the Atlantis workflow (for example, as a run step at the start of a custom workflow's plan stage). The setup assumes that each workspace/job can independently execute the script as part of its initialization or planning phase.
The tfenv_install_with_lock.sh script is designed to manage Terraform version installations so that multiple workspaces operating concurrently do not conflict. But it is important to understand the distinction between installations per workspace and installations per plan:
- One installation per plan: The ideal scenario is to install each Terraform version only once per plan execution, regardless of the number of workspaces. That avoids redundant installations.
- One installation per workspace: In this scenario, each workspace that initiates a plan may attempt to install the Terraform version it requires. The locking mechanism prevents simultaneous installations, but it does not by itself reduce the total number of installations if each workspace separately decides whether an installation is needed.
Given your setup, the key is to install Terraform versions only as needed for each plan, not redundantly across workspaces.
You might consider:
- Centralized version management: Install the Terraform versions centrally before the workspaces start their plans, with a script or process that runs once at the start of your pipeline and installs every required version. That approach gives one installation per plan.
- Refined workspace-level installation: Modify the tfenv_install_with_lock.sh script to track which versions have been installed during the current pipeline run, for example by keeping a record of installed versions and checking it before attempting an installation. That approach reduces redundant installations at the workspace level.
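For the centralized option, a sketch of a one-time pipeline step (in the spirit of the tfenv_serial_install.sh wrapper suggested earlier) could collect every version pinned in a .terraform-version file under the repository root, de-duplicate the list, and install each version once, sequentially, before any workspace plans. It assumes tfenv and flock are available and that each .terraform-version holds a plain version on its first line; collect_versions, install_all_versions and the /tmp location of the lock file are hypothetical. Reusing the tfenv-install.lock file keeps this step from overlapping with any on-demand install a job may still trigger.

```shell
#!/usr/bin/env bash
# Sketch: install every Terraform version pinned under a repo, once each.
# Assumptions: tfenv and flock are on PATH; each .terraform-version holds a
# plain version on its first line; the default lock path is hypothetical.

# Print each distinct pinned version under directory $1 (default: cwd), sorted.
# awk reads files separately, so a missing trailing newline is harmless.
collect_versions() {
  find "${1:-.}" -type f -name .terraform-version \
    -exec awk 'FNR == 1 { sub(/\r$/, ""); if (NF) print $1 }' {} + |
    sort -u
}

# Install the collected versions one after another, each under the shared
# lock so it cannot overlap an on-demand install started by a job.
install_all_versions() {
  local version
  while IFS= read -r version <&3; do
    echo "Installing Terraform $version"
    (
      flock -x 200 || exit 1
      tfenv install "$version"
    ) 200>"${TFENV_LOCK_FILE:-/tmp/tfenv-install.lock}" || return 1
  done 3< <(collect_versions "${1:-.}")
}

# Entry point when run as a script, e.g. ./tfenv_serial_install.sh .
if [ "$#" -gt 0 ]; then
  install_all_versions "$1"
fi
```

Reading the version list on file descriptor 3 keeps tfenv install from consuming the loop's input, and returning on the first failure makes a broken install stop the pipeline instead of surfacing later as a plan error.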