Failover-Clustering feature install suddenly requires a reboot before Get-Cluster will work

I have a previously working Terraform setup that creates a Windows VM in Azure and runs a bootstrap script via a custom script extension to install and configure SQL Server. It worked fine for months, but suddenly (I assume after a specific Windows image update) installing the Failover-Clustering feature requires a reboot. That breaks the script, because some steps must run after the feature is installed. For example, running the PowerShell Get-Cluster command returns the following error:

ERROR: Unhandled exception caught: The term 'Get-Cluster' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.

If I manually restart the VM and then run the command again, it is now recognised.

So, what I want to know is, is there any way to handle this other than to create an intermediate image with Failover-Clustering installed, and then use that as the base image? I know I can't add a second custom script extension (why Microsoft, just, why?) and I'm struggling to think of an easier way to do this.

Expected Install-WindowsFeature Failover-Clustering -IncludeAllSubFeature -IncludeManagementTools not to require a reboot, like it used to.

There is 1 best solution below

Expected Install-WindowsFeature Failover-Clustering -IncludeAllSubFeature -IncludeManagementTools not to require a reboot.

Installing the Failover-Clustering feature requires a reboot before the feature can be used. Unfortunately, this requirement cannot be avoided.
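
Install-WindowsFeature also reports the restart requirement itself: the object it returns has Success and RestartNeeded properties. A minimal sketch of checking them, assuming an elevated PowerShell session on the Windows Server VM; the variable name $result and the messages are illustrative only:

```powershell
# Install the feature and keep the result object instead of discarding it.
$result = Install-WindowsFeature -Name Failover-Clustering -IncludeAllSubFeature -IncludeManagementTools

if (-not $result.Success) {
    throw "Failover-Clustering installation failed (exit code: $($result.ExitCode))."
}

# RestartNeeded tells the caller whether a reboot is pending for this install.
if ("$($result.RestartNeeded)" -ne 'No') {
    Write-Host 'A restart is needed before the FailoverClusters cmdlets can be used.'
} else {
    # No restart pending, so the cluster cmdlets should already be available.
    Get-Cluster
}
```

Branching on the result lets the bootstrap script stop cleanly when a restart is pending, instead of failing later with "Get-Cluster is not recognized".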

One possible workaround is to have Terraform reboot the VM after the installation completes and then run the rest of the script, so that no manual reboot is needed. The custom script extension, declared with the azurerm_virtual_machine_extension resource, installs the Failover-Clustering feature. A second script, sent to the VM with Invoke-AzVMRunCommand, then reboots the VM and runs the remaining steps.

Terraform Code

    provider "azurerm" {
      features {}
    }
    
    resource "azurerm_resource_group" "example" {
      name     = "VM-resources1"
      location = "West Europe"
    }
    
    resource "azurerm_virtual_network" "example" {
      name                = "vm-network1"
      address_space       = ["10.0.0.0/16"]
      location            = azurerm_resource_group.example.location
      resource_group_name = azurerm_resource_group.example.name
    }
    
    resource "azurerm_subnet" "example" {
      name                 = "internal1"
      resource_group_name  = azurerm_resource_group.example.name
      virtual_network_name = azurerm_virtual_network.example.name
      address_prefixes     = ["10.0.2.0/24"]
    }
    
    resource "azurerm_network_interface" "example" {
      name                = "vm-nic1"
      location            = azurerm_resource_group.example.location
      resource_group_name = azurerm_resource_group.example.name
    
      ip_configuration {
        name                          = "internal1"
        subnet_id                     = azurerm_subnet.example.id
        private_ip_address_allocation = "Dynamic"
        public_ip_address_id          = azurerm_public_ip.example.id
      }
    }
    
    resource "azurerm_public_ip" "example" {
      name                = "vm-public-ip1"
      location            = azurerm_resource_group.example.location
      resource_group_name = azurerm_resource_group.example.name
      allocation_method   = "Dynamic"
    }
    
    # Create Network Security Group and rules.
    # Allow_All opens every inbound port to any source; suitable only for a throwaway test VM.
    resource "azurerm_network_security_group" "my_terraform_nsg" {
      name                = "vm-nsg1"
      location            = azurerm_resource_group.example.location
      resource_group_name = azurerm_resource_group.example.name
    
      security_rule {
        name                       = "Allow_All"
        priority                   = 101
        direction                  = "Inbound"
        access                     = "Allow"
        protocol                   = "*"
        source_port_range          = "*"
        destination_port_range     = "*"
        source_address_prefix      = "*"
        destination_address_prefix = "*"
      }
    }
    
    resource "azurerm_network_interface_security_group_association" "example" {
      network_interface_id      = azurerm_network_interface.example.id
      network_security_group_id = azurerm_network_security_group.my_terraform_nsg.id
    }
    resource "azurerm_windows_virtual_machine" "example" {
      name                = "cluster-vm"
      resource_group_name = azurerm_resource_group.example.name
      location            = azurerm_resource_group.example.location
      size                = "Standard_F2"
      admin_username      = "adminuser"
      admin_password      = "P@$$w0rd1234!" # Example only; do not hard-code credentials in real deployments.
      network_interface_ids = [
        azurerm_network_interface.example.id
      ]
    
      os_disk {
        caching              = "ReadWrite"
        storage_account_type = "Standard_LRS"
      }
    
      source_image_reference {
        publisher = "MicrosoftWindowsServer"
        offer     = "WindowsServer"
        sku       = "2016-Datacenter"
        version   = "latest"
      }
    }
    
    
    resource "azurerm_virtual_machine_extension" "test" {
      name                 = "CustomScripts"
      virtual_machine_id   = azurerm_windows_virtual_machine.example.id
      publisher            = "Microsoft.Compute"
      type                 = "CustomScriptExtension"
      type_handler_version = "1.9"
      settings = <<SETTINGS
        {
       "commandToExecute": "powershell Add-WindowsFeature Failover-Clustering -IncludeAllSubFeature -IncludeManagementTools"
        }
    SETTINGS
    }
    
    
    
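    # Sends the follow-up script to the VM through Run Command.
    resource "null_resource" "powershell" {
      # Wait for the feature install to finish; without this, Terraform may
      # start the script while the extension is still running.
      depends_on = [azurerm_virtual_machine_extension.test]

      triggers = {
        # Add triggers if necessary
      }

      provisioner "local-exec" {
        command = <<-EOT
          pwsh -Command "Invoke-AzVMRunCommand -ResourceGroupName 'VM-resources1' -VMName 'cluster-vm' -CommandId 'RunPowerShellScript' -ScriptPath '/home/venkat/VM/additional_script.ps1'"
        EOT
      }
    }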

additional_script.ps1

    shutdown /r
    
    Write-Host 'Waiting for VM to restart...'
    
    Start-Sleep -Seconds 150
    
    Write-Host 'VM has restarted. Resuming script execution...'
    
    Get-Cluster
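
One caveat with a script like the one above: it runs inside the VM, so its process ends when the VM restarts, and the lines after shutdown /r cannot be relied on to run after the reboot. A minimal sketch of an alternative, assuming the Az PowerShell module is installed and signed in on the machine that runs Terraform: restart the VM from outside with Restart-AzVM, then send the remaining steps as a separate Run Command. The file name post_reboot.ps1 is hypothetical; the resource group and VM names are the ones used in the Terraform code above.

```powershell
# Run this from the machine that runs Terraform, not inside the VM.
$resourceGroup = 'VM-resources1'
$vmName        = 'cluster-vm'

# Restart through the Azure control plane. Without -NoWait or -AsJob,
# the cmdlet returns only after the restart operation has completed.
Restart-AzVM -ResourceGroupName $resourceGroup -Name $vmName

# post_reboot.ps1 is a hypothetical local file holding the remaining steps,
# for example a single line containing Get-Cluster.
Invoke-AzVMRunCommand -ResourceGroupName $resourceGroup -VMName $vmName `
    -CommandId 'RunPowerShellScript' -ScriptPath './post_reboot.ps1'
```

These two commands could take the place of the single Invoke-AzVMRunCommand call in the local-exec provisioner, so that the post-reboot steps always start in a fresh session.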

Terraform apply

Running terraform apply installed the Failover-Clustering feature through the extension and then ran the additional commands from the script.
