Sunday, July 12, 2015

Troubleshooting High-CPU Utilization for SQL Server



The objective of this FAQ is to outline the basic steps in troubleshooting high CPU utilization on  a server hosting a SQL Server instance.
The first and the most common step if you suspect high CPU utilization (or are alerted for it) is to login to the physical server and check the Windows Task Manager. The Performance tab will show the high utilization as shown below:



Next, we need to determine which process is responsible for the high CPU consumption. The Processes tab of the Task Manager will show this information:


Note that to see all processes you should select Show processes from all user.
In this case, SQL Server (sqlserver.exe) is consuming 99% of the CPU (a normal benchmark for max CPU utilization is about 50-60%).
Next we examine the scheduler data. Scheduler is a component of SQLOS which evenly distributes load amongst CPUs. The query below returns the important columns for CPU troubleshooting.
Note – if your server is under severe stress and you are unable to login to SSMS, you can use another machine’s SSMS to login to the server through DAC – Dedicated Administrator Connection (see http://msdn.microsoft.com/en-us/library/ms189595.aspx for details on using DAC)
SELECT  scheduler_id
        ,cpu_id
        ,status
        ,runnable_tasks_count
        ,active_workers_count
        ,load_factor
        ,yield_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255



See below for the BOL definitions for the above columns.
scheduler_id – ID of the scheduler. All schedulers that are used to run regular queries have ID numbers less than 1048576. Those schedulers that have IDs greater than or equal to 1048576 are used internally by SQL Server, such as the dedicated administrator connection scheduler.
cpu_id – ID of the CPU with which this scheduler is associated.
status – Indicates the status of the scheduler.
runnable_tasks_count – Number of workers, with tasks assigned to them that are waiting to be scheduled on the runnable queue.
active_workers_count – Number of workers that are active. An active worker is never preemptive, must have an associated task, and is either running, runnable, or suspended.
current_tasks_count - Number of current tasks that are associated with this scheduler.
load_factor – Internal value that indicates the perceived load on this scheduler.
yield_count – Internal value that is used to indicate progress on this scheduler.
                                                               
Now to interpret the above data. There are four schedulers and each assigned to a different CPU. All the CPUs are ready to accept user queries as they all are ONLINE.
There are 294 active tasks in the output as per the current_tasks_count column. This count indicates how many activities currently associated with the schedulers. When a  task is complete, this number is decremented. The 294 is quite a high figure and indicates all four schedulers are extremely busy.
When a task is enqueued, the load_factor  value is incremented. This value is used to determine whether a new task should be put on this scheduler or another scheduler. The new task will be allocated to less loaded scheduler by SQLOS. The very high value of this column indicates all the schedulers have a high load.
There are 268 runnable tasks which mean all these tasks are assigned a worker and waiting to be scheduled on the runnable queue.  
The next step is  to identify which queries are demanding a lot of CPU time. The below query is useful for this purpose (note, in its current form,  it only shows the top 10 records).
SELECT TOP 10 st.text
               ,st.dbid
               ,st.objectid
               ,qs.total_worker_time
               ,qs.last_worker_time
               ,qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
ORDER BY qs.total_worker_time DESC
This query as total_worker_time as the measure of CPU load and is in descending order of the  total_worker_time to show the most expensive queries and their plans at the top:   

 


Note the BOL definitions for the important columns:
total_worker_time - Total amount of CPU time, in microseconds, that was consumed by executions of this plan since it was compiled.
last_worker_time - CPU time, in microseconds, that was consumed the last time the plan was executed.

I re-ran the same query again after few seconds and was returned the below output.





After few seconds the SP dbo.TestProc1 is shown in fourth place and once again the last_worker_time is the highest. This means the procedure TestProc1 consumes a CPU time continuously each time it executes.   

In this case, the primary cause for high CPU utilization was a stored procedure. You can view the execution plan by clicking on query_plan column to investigate why this is causing a high CPU load.
I have used SQL Server 2008 (SP1) to test all the queries used in this article.


Lock,Block and Deadlock in SQL Server

LOCK vs Block vs Deadlock in SQL Server

08MAY
Question :
What is the main difference between Lock , Block and Deadlock in SQL Server ?
Answer :
The Meaning of lock is :
Lock is a done by database when any connection access a same piece of data concurrently. One connection need to access Piece of data .

The Meaning of Block :
It occurs when two connections need access to same piece of data concurrently and the meanwhile another is blocked because at a particular time, only one connection can have access. SQL knows that once the blocking process finishes the resource will be available and so the blocked process will wait (until it times out), but it won’t be killed.

The Meaning of Deadlock :
Deadlock occurs when one connection is blocked and waiting for a second to complete its work, and this situation is again with another process as it waiting for first connection to release the lock. Hence deadlock occurs.
Example :
i have 2 processes. P1 & P2 trying to get to 2 resources R1 & R2.
P1 gets a lock on R1
and
P2 gets a lock on R2
THEN
P1 tries to get a lock on R2 but can’t because it is locked by P2
and
P2 tries to get a lock on R1 but can’t because it is locked by P1
in this point no process can finish because they are waiting on locked resources.  they are deadlocked. One of them must be killed to allow either of them to finish.

Details About CheckPoint

Checkpoint

Writes all dirty pages for the current database to disk. Dirty pages are data pages that have been entered into the buffer cache and modified, but not yet written to disk. Checkpoints save time during a later recovery by creating a point at which all dirty pages are guaranteed to have been written to disk.
For performance reasons, the Database Engine performs modifications to database pages in memory and does not write the page to disk after every change. However, periodically the Database Engine needs to perform a checkpoint to write these dirty pages to disk. Writing dirty pages to disk creates a known good point from which the Database Engine can start applying changes contained in the log during recovery after an unexpected shutdown or crash
 
Syntex: CHECKPOINT [ checkpoint_duration* ]
 
*checkpoint_duration is the amount of time, in seconds, for the checkpoint to complete. When checkpoint_duration is specified, the SQL Server Database Engine attempts to perform the checkpoint within the requested duration. The checkpoint_duration must be an expression of type int, and must be greater than zero. When this parameter is omitted, SQL Server Database Engine automatically adjusts the checkpoint duration to minimize the performance impact on database applications.

 

Events That Cause Checkpoints

Before a database backup, the Database Engine automatically performs a checkpoint so that all changes to the database pages are contained in the backup. In addition, checkpoints occur automatically when either of the following conditions occur:
  • The active portion of the log exceeds the size that the server could recover in the amount of time specified in the recovery intervalserver configuration option.
  • The log becomes 70 percent full, and the database is in log-truncate mode.
    A database is in log truncate mode when both these conditions are TRUE: the database is using the Simple recovery model, and, after execution of the last BACKUP DATABASE statement that referenced the database, one of the following events occurs:
    • A minimally logged operation is performed in the database, such as a minimally logged bulk copy operation or a minimally logged WRITETEXT statement is executed.
    • An ALTER DATABASE statement is executed that adds or deletes a file in the database.
Also, stopping a server issues a checkpoint in each database on the server. The following methods of stopping SQL Server perform checkpoints for each database:
  • Using SQL Server Configuration Manager.
  • Using SQL Server Management Studio.
  • Using the SHUTDOWN statement.( The SHUTDOWN WITH NOWAIT statement shuts down SQL Server without executing a checkpoint in each database. This may cause the subsequent restart to take a longer time than usual to recover the databases on the server )
  • Using the net stop mssqlserver command in a command-prompt window.
  • Using Services in Control Panel, selecting mssqlserver, and clicking Stop.
  • Bringing an instance offline in a cluster.
Permissions: CHECKPOINT permissions default to members of the sysadmin fixed server role and the db_owner and db_backupoperatorfixed database roles, and are not transferable.