SOS_SCHEDULER_YIELD spike on Availability Group

331 Views Asked by At

I am about to lose my mind on this one.

We have a 3 server availability group from which our applications read from all 3 servers. Runs great 99.9% of the time. Every now and then we get a spike in SOS_SCHEDULER_YIELD. When this happens a lot of our queries timeout. Usually doesn't last longer than a minute. We have a task that captures wait stats every 2 minutes (image below).

8a is the primary server in the availability group. As you can see, SOS_SCHEDULER_YIELD spikes from 122,000 at 10:40 to 4,000,000 at 10:42 and back to 85,000 at 10:44. The other servers spike to around 2,000,000.

These servers are all virtual. 8a and 8c are on the same host while 8b is in a different local datacenter. Servers use the SAN in the datacenter in which they reside so 8a and 8c use the same SAN.

No jobs were running at the time. The Server admin did not see any issues on the servers themselves. The host for 8b CPU usage spiked from 43% at 10:40 to 70% at 1045 and the host for the other 2 spiked from 42% to 62% at the same time. Both dropped back down at the 10:50 mark.

I need ideas on what could cause this type of behavior and/or ideas on how to troubleshoot. I understand the SOS_SCHEDULER_YIELD may be an indicator and not the problem itself. I just know that when I start getting timeouts on these servers, SOS_SCHEDULER_YIELD spikes and spikes high. Thanks in advance for the ideas.

Wait Stats Monitor

2

There are 2 best solutions below

1
On

I would suggest you read this article by Paul Randal. It explain the SOS_SCHEDULER_YIELD wait type. What cause the SOS_SCHEDULER_YIELD to show up? Read it here.

Around the time of 1042am, try to execute the query below. From there check the query and query plan and analyze it from there.

    SELECT
    [er].[session_id],
    [es].[program_name],
    [est].text,
    [er].[database_id],
    [eqp].[query_plan],
    [er].[cpu_time]
FROM sys.dm_exec_requests [er]
INNER JOIN sys.dm_exec_sessions [es] ON
    [es].[session_id] = [er].[session_id]
OUTER APPLY sys.dm_exec_sql_text ([er].[sql_handle]) [est]
OUTER APPLY sys.dm_exec_query_plan ([er].[plan_handle]) [eqp]
WHERE
    [es].[is_user_process] = 1
    AND [er].[last_Wait_type] = N'SOS_SCHEDULER_YIELD'
ORDER BY
    [er].[session_id];
GO
0
On

We exerienced the same today and it was due to a virtualization issue. We had another VM on the same physical host which spiked to 100% CPU and due to overcommitment and missing reservation, took the database VM with it. I think this is almost impossible to track down from within the Database / Database VM. The SOS_SCHEDULER_YIELD waits were actually the whole VM waiting vor CPU cycles.