Chef Client Not Resuming After Restart


I ran the following recipe:

O:\chef\cookbooks\wincfg>chef-client -L C:\chef\rds_deployment.log -l info -z -o wincfg::rds_deployment

The server reboots as expected after installing a Windows feature.

I see the last lines of my log file say:

[2016-04-17T01:43:51+00:00] INFO: powershell_script[Desktop-Experience] ran successfully
[2016-04-17T01:43:51+00:00] INFO: powershell_script[Desktop-Experience] sending reboot_now action to reboot[reboot] (immediate)
[2016-04-17T01:43:51+00:00] INFO: Processing reboot[reboot] action reboot_now (wincfg::rds_deployment line 6)
[2016-04-17T01:43:51+00:00] WARN: Rebooting system immediately, requested by 'reboot'
[2016-04-17T01:43:51+00:00] INFO: Changing reboot status from {} to {:delay_mins=>0, :reason=>"There is a pending reboot.", :timestamp=>2016-04-17 01:43:51 +0000, :requested_by=>"reboot"}
[2016-04-17T01:43:51+00:00] WARN: Skipping final node save because override_runlist was given
[2016-04-17T01:43:51+00:00] INFO: Chef Run complete in 90.479509 seconds
[2016-04-17T01:43:51+00:00] INFO: Skipping removal of unused files from the cache
[2016-04-17T01:43:51+00:00] INFO: Running report handlers
[2016-04-17T01:43:51+00:00] INFO: Report handlers complete
[2016-04-17T01:43:51+00:00] WARN: Rebooting server at a recipe's request. Details: {:delay_mins=>0, :reason=>"There is a pending reboot.", :timestamp=>2016-04-17 01:43:51 +0000, :requested_by=>"reboot"}

The part of the recipe in question is:

reboot "reboot" do
  action :nothing
  reason 'There is a pending reboot.'
  only_if { reboot_pending? }
end

%w{ Desktop-Experience 
  Remote-Desktop-Services 
  RDS-RD-Server 
  RDS-Connection-Broker 
  RDS-Web-Access 
  RDS-Licensing 
  RDS-Gateway }.each do |feature|
  powershell_script "#{feature}" do
    code <<-EOH
    Import-Module ServerManager
    Add-WindowsFeature #{feature}
    EOH
    not_if "Import-Module ServerManager; (Get-WindowsFeature -Name #{feature}).Installed -eq $true"
    notifies :reboot_now, 'reboot[reboot]', :immediately
  end
end

I would expect that, for each feature in the recipe, Chef installs it with Add-WindowsFeature if it is not already installed, and then reboots immediately if reboot_pending? is true.

The reboot does happen, but afterwards the recipe doesn't pick up with the next feature (the one after Desktop-Experience).

UPDATE: Here is how I'm installing Chef (on a brand new out of the box EC2 image running Server 2012 R2 Base), the Chef Windows service, and the Chef DK:

powershell -NoProfile -ExecutionPolicy Bypass ". { iwr -useb https://omnitruck.chef.io/install.ps1 } | iex; install; cd C:\opscode\chef\bin\; cmd /c chef-service-manager -a install; cmd /c chef-service-manager -a start"

powershell -NoProfile -ExecutionPolicy Bypass ". { iwr -useb https://omnitruck.chef.io/install.ps1 } | iex; install -project chefdk"

Immediately after install, I run

net use O: \\fileserver\share
O:
cd chef\cookbooks\wincfg
berks vendor ..\..\cookbooks
chef-client -L C:\chef\rds_deployment.log -l info -z -o wincfg::rds_deployment

UPDATE 2:

I saw [2016-04-17T01:43:51+00:00] WARN: Skipping final node save because override_runlist was given

in the logs, so instead of specifying the run list with -o, I am now specifying it with -r. The warning no longer appears in the logs (and I see a TON more info in nodes\thehost.json), but the run still doesn't resume correctly after reboots :(

I do see the following in the Application Event Viewer following restart:

Failed Chef Client run UNKNOWN in UNKNOWN seconds.
 Exception type: Chef::Exceptions::PrivateKeyMissing
 Exception message: I cannot read C:\chef\validation.pem, which you told me to use to sign requests!
 Exception backtrace: C:/opscode/chef/embedded/lib/ruby/gems/2.1.0/gems/chef-12.9.38-universal-mingw32/lib/chef/http/authenticator.rb:86:in `rescue in load_signing_key'
C:/opscode/chef/embedded/lib/ruby/gems/2.1.0/gems/chef-12.9.38-universal-mingw32/lib/chef/http/authenticator.rb:76:in `load_signing_key'

I love a good adventure through (lack of) documentation.

I ALMOST got it working by:

  • making sure the chef_repo path is available at all times (not a network drive)
  • creating a client.rb file in C:\chef\ that tells chef-client to always run in local (chef-zero) mode, not just when I invoke it manually from the command line

So, my new artifacts look like this:

C:\chef\client.rb

log_level :info
log_location 'C:\chef\client.log'
chef_server_url 'https://localhost:4000'
validation_client_name 'chef-validator'
chef_zero.enabled true
chef_zero.port 4000
local_mode true
cookbook_path ['C:\chef_repo\cookbooks']

\\ops01\ops\chef\bootstrap.bat:

mklink C:\chef_repo %~dp0 /d
powershell -NoProfile -ExecutionPolicy Bypass ". { iwr -useb https://omnitruck.chef.io/install.ps1 } | iex; install"
C:
cd \opscode\chef\bin\
copy %~dp0client.rb C:\chef\ /y
call chef-service-manager -a install
call chef-service-manager -a start

The key parts are bootstrapping client.rb and making sure the link is available at all times, since client.rb doesn't support UNC/SMB paths.

The chef-client Windows service now seems to pick up runs automatically after reboots... BUT when it does, it doesn't trigger the reboot itself. Instead it logs:

[2016-04-18T02:38:24+00:00] INFO: Changing reboot status from {} to {:delay_mins=>0, :reason=>"There is a pending reboot for \#{pack}.", :timestamp=>2016-04-18 02:38:24 +0000, :requested_by=>"googlechrome_reboot"}
[2016-04-18T02:38:24+00:00] INFO: HTTP Request Returned 500 Internal Server Error: error
[2016-04-18T02:38:24+00:00] ERROR: Running exception handlers
[2016-04-18T02:38:24+00:00] ERROR: Exception handlers complete
[2016-04-18T02:38:24+00:00] FATAL: Stacktrace dumped to c:/chef/local-mode-cache/cache/chef-stacktrace.out
[2016-04-18T02:38:24+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2016-04-18T02:38:24+00:00] FATAL: Net::HTTPFatalError: 500 "Internal Server Error"
[2016-04-18T02:38:37+00:00] INFO: Child process exited (pid: 692)
[2016-04-18T02:38:38+00:00] INFO: Next chef-client run will happen in 1800.8035677517687 seconds

So it looks like the chef-zero server is returning an HTTP 500 error. The Event Viewer application log shows:

Failed Chef Client run af972109-32ca-4089-97ef-789b7b5d8d07 in 133.762612 seconds.
 Exception type: Net::HTTPFatalError
 Exception message: 500 "Internal Server Error"
 Exception backtrace: C:/opscode/chef/embedded/lib/ruby/2.1.0/net/http/response.rb:119:in `error!'
C:/opscode/chef/embedded/lib/ruby/gems/2.1.0/gems/chef-12.9.38-universal-mingw32/lib/chef/http.rb:146:in `request'
C:/opscode/chef/embedded/lib/ruby/gems/2.1.0/gems/chef-12.9.38-universal-mingw32/lib/chef/http.rb:119:in `put'
C:/opscode/chef/embedded/lib/ruby/gems/2.1.0/gems/chef-12.9.38-universal-mingw32/lib/chef/node.rb:620:in `save'
C:/opscode/chef/embedded/lib/ruby/gems/2.1.0/gems/chef-12.9.38-universal-mingw32/lib/chef/client.rb:542:in `save_updated_node'
C:/opscode/chef/embedded/lib/ruby/gems/2.1.0/gems/chef-12.9.38-universal-mingw32/lib/chef/client.rb:704:in `converge_and_save'
C:/opscode/chef/embedded/lib/ruby/gems/2.1.0/gems/chef-12.9.38-universal-mingw32/lib/chef/client.rb:281:in `run'
C:/opscode/chef/embedded/lib/ruby/gems/2.1.0/gems/chef-12.9.38-universal-mingw32/lib/chef/application.rb:267:in `run_with_graceful_exit_option'
C:/opscode/chef/embedded/lib/ruby/gems/2.1.0/gems/chef-12.9.38-universal-mingw32/lib/chef/application.rb:243:in `block in run_chef_client'
C:/opscode/chef/embedded/lib/ruby/gems/2.1.0/gems/chef-12.9.38-universal-mingw32/lib/chef/local_mode.rb:44:in `with_server_connectivity'
C:/opscode/chef/embedded/lib/ruby/gems/2.1.0/gems/chef-12.9.38-universal-mingw32/lib/chef/application.rb:226:in `run_chef_client'
C:/opscode/chef/embedded/lib/ruby/gems/2.1.0/gems/chef-12.9.38-universal-mingw32/lib/chef/application/client.rb:419:in `run_application'
C:/opscode/chef/embedded/lib/ruby/gems/2.1.0/gems/chef-12.9.38-universal-mingw32/lib/chef/application.rb:58:in `run'
C:/opscode/chef/embedded/lib/ruby/gems/2.1.0/gems/chef-12.9.38-universal-mingw32/bin/chef-client:26:in `<top (required)>'
C:/opscode/chef/bin/chef-client:61:in `load'
C:/opscode/chef/bin/chef-client:61:in `<main>'

which doesn't really indicate anything to me...

But if I go to the command line and just run chef-client (from any directory, with no parameters), it immediately recognizes the need to reboot and does so.

Any ideas to finish out this problem? Would REALLY appreciate it.

BEST ANSWER

Unless you set something up so that Chef runs as a service or via a scheduled task, it can't just end up running again on its own after a restart. Also, Chef doesn't "pick up where it left off" per se, but it is normally idempotent and only changes things that need to be changed. The not_if guard on your resource is the idempotence check for each thing. Is there a reason you aren't using the windows_feature resource?
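To make the idempotence point concrete, here is a rough plain-Ruby simulation (not Chef code) of what repeated runs would do. Everything in it is hypothetical: `simulate_run` and `converge` are made-up helpers, the `installed` set stands in for machine state that survives a reboot, and it assumes every feature install triggers a reboot, which the real `reboot_pending?` guard may not do.

```ruby
require 'set'

# Shortened, illustrative feature list (the recipe in the question has more).
FEATURES = %w[Desktop-Experience Remote-Desktop-Services RDS-RD-Server].freeze

# One simulated chef-client run. Every run starts from the top of the list;
# nothing remembers where the previous run stopped.
# Returns :rebooted if the run was cut short, :converged if nothing changed.
def simulate_run(installed, features = FEATURES)
  features.each do |feature|
    next if installed.include?(feature) # plays the role of the not_if guard
    installed << feature                # plays the role of Add-WindowsFeature
    return :rebooted                    # plays the role of notifies :reboot_now
  end
  :converged
end

# Stand-in for a service or scheduled task that starts a new run after
# each reboot. Returns the number of runs needed to converge.
def converge(installed, features = FEATURES)
  runs = 0
  loop do
    runs += 1
    break if simulate_run(installed, features) == :converged
  end
  runs
end

installed = Set.new
runs = converge(installed)
puts "converged after #{runs} runs: #{installed.to_a.join(', ')}"
```

Each run skips whatever is already installed, so the only thing that "resumes" the work is something starting another run after the reboot. With three features that each force a reboot, the sketch needs four runs: three that install and reboot, and a final one that finds nothing left to do.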