I have a WebJob scheduled to run every 10 minutes via settings.job ('0 0/10 * * *'). It had been working fine, but last night the job simply stopped being called. Looking in eventlog.xml around the time of the last call, I see the following:
<EventData>
<Data>7192</Data>
<Data>LogCleanup</Data>
<Data>Role environment . FAILED TO INITIALIZE. hr: -2147024891</Data>
</EventData>
There were no more calls after this. I manually ran the job from the portal this morning; it worked fine and is being called every 10 minutes again as expected. My NLog internal log file recorded the following for the last scheduled run:
2016-12-15 21:40:02.6449 Error Error has been raised. Exception: Microsoft.WindowsAzure.Storage.StorageException: The remote server returned an error: (409) Conflict. ---> System.Net.WebException: The remote server returned an error: (409) Conflict.
at System.Net.HttpWebRequest.GetResponse()
at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteSync[T](RESTCommand`1 cmd, IRetryPolicy policy, OperationContext operationContext)
--- End of inner exception stack trace ---
at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteSync[T](RESTCommand`1 cmd, IRetryPolicy policy, OperationContext operationContext)
at Microsoft.WindowsAzure.Storage.Table.TableOperation.Execute(CloudTableClient client, CloudTable table, TableRequestOptions requestOptions, OperationContext operationContext)
at Microsoft.WindowsAzure.Storage.Table.CloudTable.Execute(TableOperation operation, TableRequestOptions requestOptions, OperationContext operationContext)
at NLog.AzureTableStorage.AzureTableStorageTarget.Write(LogEventInfo logEvent)
at NLog.Targets.Target.Write(AsyncLogEventInfo logEvent)
Request Information
RequestID:1153c4ed-0002-000e-611b-57d353000000
RequestDate:Thu, 15 Dec 2016 21:40:02 GMT
StatusMessage:Conflict
ErrorCode:EntityAlreadyExists
The errors don't make any sense to me, but the bigger question is why the job just stopped being called. It does not seem right that one run fails with an unexplained error and the scheduler then stops calling the job.
How reliable are WebJobs? What kind of checks do I need in place to validate that they are being called?
Please make sure you have AlwaysOn enabled for your Web App. CRON-scheduled jobs require this (see the documentation). The runtime will actually emit a warning to the logs if we detect that you don't have AlwaysOn enabled.
Please check your WebJob logs for this warning; you should see it. The idea of that log entry was to help users self-diagnose this, but perhaps you didn't see it? We also show a warning in the portal for the WebJob if we detect you have continuous WebJobs but AlwaysOn is not enabled. That warning appears next to the AlwaysOn setting on the settings page for your Web App.
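While checking the configuration, it may also be worth confirming the schedule itself. The following is only a sketch of what a settings.job for a 10-minute schedule could look like, assuming the intent is "at second 0 of every 10th minute". WebJobs CRON expressions use six fields (second, minute, hour, day, month, day of week), whereas the expression quoted in the question shows only five, which may just be a transcription slip.

```json
{
  "schedule": "0 */10 * * * *"
}
```

This file sits alongside the WebJob's executable. The schedule is only honoured reliably while the site stays loaded, which is why AlwaysOn matters here.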