I hope someone can help me solve this issue.
I am trying to deploy an Azure Kubernetes cluster with a virtual node. The terraform apply runs and provisions all the node pools correctly, but the ACI connector pod goes into CrashLoopBackOff with the following error:
level=fatal msg="error creating provider: error setting up network: error while looking up subnet: GET 403: 403 Forbidden\nERROR CODE: AuthorizationFailed\n--------------------------------------------------------------------------------\n{\n \"error\": {\n \"code\": \"AuthorizationFailed\",\n \"message\": \"The client does not have authorization to perform action 'Microsoft.Network/virtualNetworks/subnets/read' over scope /aks-dev-vnet/subnets/aks-dev-aci' or the scope is invalid. If access was recently granted, please refresh your credentials.\"\n }\n}\n--------------------------------------------------------------------------------\n"
I have the following configuration in my Terraform:
Virtual network and subnets:
resource "azurerm_virtual_network" "aks_virtual_network" {
address_space = [""]
location = var.location
name = "aks-${var.environment}-${var.region}-vnet"
resource_group_name = azurerm_resource_group.kubernetes_resource_group.name
}
# System Pool Subnet
resource "azurerm_subnet" "system_pool_subnet" {
  address_prefixes     = [""]
  name                 = "aks-${var.environment}-${var.region}-system-pool"
  resource_group_name  = azurerm_resource_group.kubernetes_resource_group.name
  virtual_network_name = azurerm_virtual_network.aks_virtual_network.name
}
# Linux Pool Subnet
resource "azurerm_subnet" "linux_pool_subnet" {
  address_prefixes     = [""]
  name                 = "aks-${var.environment}-${var.region}-linux-pool"
  resource_group_name  = azurerm_resource_group.kubernetes_resource_group.name
  virtual_network_name = azurerm_virtual_network.aks_virtual_network.name
}
resource "azurerm_subnet" "aci_subnet" {
address_prefixes = [""]
name = "aks-${var.environment}-${var.region}-aci"
resource_group_name = azurerm_resource_group.kubernetes_resource_group.name
virtual_network_name = azurerm_virtual_network.aks_virtual_network.name
delegation {
name = "aci-subnet"
service_delegation {
name = "Microsoft.ContainerInstance/containerGroups"
actions = ["Microsoft.Network/virtualNetworks/subnets/action"]
}
}
}
Service principal configurations:
resource "azuread_application" "aks_application" {
display_name = "azure-app-${var.environment}-${var.region}-principle"
}
resource "azuread_service_principal" "aks_service_principle" {
application_id = azuread_application.aks_application.application_id
}
resource "azuread_service_principal_password" "main" {
service_principal_id = azuread_service_principal.aks_service_principle.id
}
# Grant the AKS cluster access to join the Linux pool subnet
resource "azurerm_role_assignment" "linux_subnet_aci" {
  scope                = azurerm_subnet.linux_pool_subnet.id
  role_definition_name = "Network Contributor"
  principal_id         = azuread_service_principal.aks_service_principle.id
}
# Grant the AKS cluster access to join the system pool subnet
resource "azurerm_role_assignment" "system_subnet_aci" {
  scope                = azurerm_subnet.system_pool_subnet.id
  role_definition_name = "Network Contributor"
  principal_id         = azuread_service_principal.aks_service_principle.id
}
# Grant the AKS cluster access to join the ACI subnet
resource "azurerm_role_assignment" "aci_subnet" {
  scope                = azurerm_subnet.aci_subnet.id
  role_definition_name = "Network Contributor"
  principal_id         = azuread_service_principal.aks_service_principle.id
}
And my cluster configuration:
default_node_pool {
  name                 = "systempool"
  vm_size              = "Standard_D2_V2"
  orchestrator_version = data.azurerm_kubernetes_service_versions.current.latest_version
  zones                = [1]
  enable_auto_scaling  = true
  min_count            = 1
  max_count            = 2
  os_disk_size_gb      = 64
  type                 = "VirtualMachineScaleSets"
  vnet_subnet_id       = azurerm_subnet.system_pool_subnet.id
  node_labels = {
    "nodepool-type" = "System"
    "environment"   = "${var.environment}"
    "nodepoolOS"    = "linux"
    "app"           = "System"
    "nodepool"      = "systempool00"
  }
  upgrade_settings {
    max_surge = "10%"
  }
}
identity {
  type         = "UserAssigned"
  identity_ids = [data.azurerm_user_assigned_identity.user_assigned.id]
}
kubelet_identity {
  client_id                 = data.azurerm_user_assigned_identity.user_assigned.client_id
  object_id                 = data.azurerm_user_assigned_identity.user_assigned.principal_id
  user_assigned_identity_id = data.azurerm_user_assigned_identity.user_assigned.id
}
oidc_issuer_enabled = true
# Add On Profile
aci_connector_linux {
  subnet_name = azurerm_subnet.aci_subnet.name
}
At this point I know that I should use the service principal in the cluster configuration:
# service_principal {
#   client_id     = azuread_application.aks_application.application_id
#   client_secret = azuread_service_principal_password.main.value
# }
but when I try to enable it, I get an error saying that only one of them can be configured (either identity or service_principal). In my specific case I need the identity block, because another part of the platform relies on it for DNS permissions.
Looking at the error from the ACI connector, the object ID it references belongs to the managed identity that gets created automatically, and the problem is that this managed identity doesn't have permission to perform actions on the subnet.
I am really confused at this point and I cannot think of a workaround for this configuration that grants permission to the ACI managed identity while keeping my identity enabled.
Can anyone shed some light here, please? If my query is not 100% clear, just let me know and I can provide more information.
Thank you so much for any help you can provide.
The issue with the ACI connector in your Azure Kubernetes Service (AKS) deployment stems from the permissions of the managed identity that has to access the subnet. You are using a user-assigned managed identity for the AKS cluster and applying role assignments to the service principal to cover the network operations, but the ACI connector runs under its own automatically created managed identity and needs its own permissions on the delegated ACI subnet; the error shows that this permission has not been granted.
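The fix is therefore to grant that connector identity access to the ACI subnet, rather than (or in addition to) the service principal. Below is a minimal sketch of such a role assignment; the cluster resource name azurerm_kubernetes_cluster.aks is an assumption (yours may be named differently), and the connector_identity attribute is only exported by recent versions of the azurerm provider:

# Sketch: grant the auto-created ACI connector managed identity
# "Network Contributor" on the ACI subnet so it can read/join that subnet.
# "azurerm_kubernetes_cluster.aks" is an assumed resource name.
resource "azurerm_role_assignment" "aci_connector_subnet" {
  scope                = azurerm_subnet.aci_subnet.id
  role_definition_name = "Network Contributor"
  principal_id         = azurerm_kubernetes_cluster.aks.aci_connector_linux[0].connector_identity[0].object_id
}

This leaves the identity block on the cluster untouched, so there is no need to switch to service_principal; the role assignment simply targets the connector's own identity instead of the service principal.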
I tried an updated Terraform configuration with the required changes and was able to provision everything successfully; the deployment completed without errors.
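If your azurerm provider version does not expose connector_identity, a hedged alternative is to look up the identity that AKS creates in the managed node resource group. The identity name aciconnectorlinux-&lt;cluster name&gt; follows the AKS default for the ACI connector add-on, and azurerm_kubernetes_cluster.aks is again an assumed resource name:

# Alternative sketch: find the ACI connector identity in the node resource
# group. The identity only exists once the cluster and the ACI add-on have
# been created, so this data source is read after the cluster is provisioned.
data "azurerm_user_assigned_identity" "aci_connector" {
  name                = "aciconnectorlinux-${azurerm_kubernetes_cluster.aks.name}"
  resource_group_name = azurerm_kubernetes_cluster.aks.node_resource_group
}

resource "azurerm_role_assignment" "aci_connector_subnet_lookup" {
  scope                = azurerm_subnet.aci_subnet.id
  role_definition_name = "Network Contributor"
  principal_id         = data.azurerm_user_assigned_identity.aci_connector.principal_id
}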