Key Performance Counters for Windows Server using Performance Monitor

Every Windows Server ships with plenty of free tools to help you monitor and better understand what’s happening on the system. One of the most useful is called “Performance Monitor”. Don’t let the outdated UI fool you: Performance Monitor is an extremely useful tool that lets you log, collect, and visualize all your performance-related data.

Before you increase your VM size on your cloud provider or switch to SSDs, be sure to take a look at Performance Monitor. The results could surprise you.

The data you collect can be broken down into four main groups:

  • CPU
    • Spawning too many threads
    • Doing too much work at one time
    • Background job that could be eating up all the CPU cycles
    • Too many Garbage collector calls
  • Memory
    • Creating too much garbage
    • Memory leaks
    • Trying to load too much data into memory at one time
    • Not streaming in data
  • Disk
    • Writing to disk all the time
    • Using too much RAM at one time, so the OS starts paging to disk
    • Serving too much static content for your site (use a CDN)
  • Network
    • More traffic than your one machine has bandwidth for
    • Other applications on machine saturating connection

A hiccup in any one of these areas can have a cascading effect on performance, since each one can act as a bottleneck for the others.
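The four groups above can also be sampled from the command line with typeperf, which ships with Windows. As a rough sketch, the Python below only builds the argument list. The counter paths are the standard English counter names, while `COUNTERS`, `build_typeperf_args`, and the default output file name are hypothetical names chosen for illustration.

```python
# Counter paths per group. The paths are standard English counter names;
# the grouping dictionary itself is only an illustration.
COUNTERS = {
    "cpu": [
        r"\Processor(_Total)\% Processor Time",
        r"\Processor(_Total)\% Privileged Time",
        r"\Processor(_Total)\% User Time",
        r"\System\Processor Queue Length",
    ],
    "memory": [
        r"\Memory\Available MBytes",
        r"\Memory\Pages/sec",
    ],
    "disk": [
        r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
        r"\PhysicalDisk(_Total)\% Idle Time",
    ],
    "network": [
        r"\Network Interface(*)\Output Queue Length",
    ],
}


def build_typeperf_args(groups, interval_s=5, samples=12, out_file="perf.csv"):
    """Return an argument list for typeperf covering the requested groups."""
    paths = [path for group in groups for path in COUNTERS[group]]
    return [
        "typeperf", *paths,
        "-si", str(interval_s),   # seconds between samples
        "-sc", str(samples),      # number of samples to collect
        "-f", "CSV",              # write comma-separated output
        "-o", out_file,           # output file (hypothetical default name)
        "-y",                     # answer yes to prompts such as overwrite
    ]


if __name__ == "__main__":
    print(build_typeperf_args(["cpu", "disk"]))
```

On the server itself, the resulting list could be handed to `subprocess.run(args, check=True)`.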

On to the actual metrics you want to watch for, grouped by type:

CPU

% Processor Time:

  • What is it ?
    • Total time the processor was busy processing.
  • Why should I care ?
    • Gives a very general measure of how busy the processor is.
  • How does it help me ?
    • If this counter is consistently very high, use the more specific counters below to find out why.

% Privileged Time:

  • What is it ?
    • Total time the processor spent executing in kernel mode.
  • Why should I care ?
    • This measure covers only the kernel-related work the processor does, such as memory management.
  • How does it help me ?
    • If this counter stays above 20%, you may have a driver or hardware issue.

% User Time:

  • What is it ?
    • Total time the processor spent executing in any user application code.
  • Why should I care ?
    • Gives you an idea of how much work your application code forces the processor to do.
  • How does it help me ?
    • If this percentage is too high, it may be competing with privileged processor time. You always want some buffer between “user time” and “privileged time” so the system can run smoothly.
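As a minimal sketch of how the user and privileged counters can be read together, the function below averages paired samples and applies the 20% privileged-time figure from the text. The function name, the tuple layout, and the dictionary keys are assumptions made for this example.

```python
def summarize_cpu(samples, privileged_limit=20.0):
    """Average paired (user_pct, privileged_pct) samples from one log.

    privileged_limit defaults to the 20% figure mentioned in the text.
    """
    if not samples:
        raise ValueError("need at least one sample")
    count = len(samples)
    avg_user = sum(user for user, _ in samples) / count
    avg_privileged = sum(priv for _, priv in samples) / count
    return {
        "avg_user": avg_user,
        "avg_privileged": avg_privileged,
        # True when kernel-mode time is above the limit on average.
        "privileged_high": avg_privileged > privileged_limit,
    }


if __name__ == "__main__":
    print(summarize_cpu([(50.0, 10.0), (70.0, 30.0)]))
```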

Processor Queue Length:

  • What is it ?
    • Number of threads waiting for a core to become available.
  • Why should I care ?
    • Gives you an idea of how much work your machine is trying to do at any given time.
  • How does it help me ?
    • Divide this number by the core count of the machine. If the result is greater than 3, there is too much CPU pressure on the machine and a backlog has formed.
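The per-core arithmetic above can be sketched in a few lines of Python. The function names are hypothetical, and the limit of 3 is the figure from the text. For example, a queue length of 32 on an 8-core machine gives 4 per core, which is over the limit.

```python
import os


def queue_per_core(queue_length, cores=None):
    """Divide the processor queue length by the number of cores."""
    if cores is None:
        cores = os.cpu_count() or 1   # fall back to 1 if the count is unknown
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return queue_length / cores


def cpu_backlogged(queue_length, cores=None, limit=3.0):
    """True when the per-core queue exceeds the limit (3 in the text)."""
    return queue_per_core(queue_length, cores) > limit


if __name__ == "__main__":
    print(queue_per_core(32, cores=8), cpu_backlogged(32, cores=8))
```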

Process (*) \ Thread Count:

  • What is it ?
    • Number of threads currently active in this process.
  • Why should I care ?
    • More threads usually means more CPU utilization.
  • How does it help me ?
    • A high count could indicate that your application is spinning up too many threads at once.

Disk

Average Disk Queue Length

  • What is it ?
    • A simplified definition: the average number of disk operations (reads and writes) that were queued.
  • Why should I care ?
    • Gives you an idea of how saturated your disk is.
  • How does it help me ?
    • If the queue length stays over 2 for prolonged periods of time, it could indicate a disk bottleneck.
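What counts as "prolonged" is left open here, so the sketch below makes it an explicit parameter. It looks for the longest unbroken run of samples above the threshold of 2. The function names and the default of 12 consecutive samples are assumptions for illustration only.

```python
def longest_run_above(values, threshold):
    """Length of the longest unbroken run of samples above the threshold."""
    longest = current = 0
    for value in values:
        current = current + 1 if value > threshold else 0
        longest = max(longest, current)
    return longest


def disk_bottleneck_suspected(queue_lengths, threshold=2.0, min_samples=12):
    """True when the queue stayed above the threshold for min_samples in a row.

    min_samples is an assumed stand-in for "prolonged"; with a 5 second
    sample interval, 12 samples would cover one minute.
    """
    return longest_run_above(queue_lengths, threshold) >= min_samples


if __name__ == "__main__":
    print(longest_run_above([1, 3, 4, 1, 5, 6, 7], threshold=2))
```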

% Idle Time

  • What is it ?
    • How much time your disk spends doing nothing.
  • Why should I care ?
    • Gives you a picture of when the disk is free.
  • How does it help me ?
    • If idle time is low, the disk is doing a lot of work, which is okay as long as the “Average Disk Queue Length” stays below 2. High idle time, on the other hand, can indicate that the disk isn’t busy when it should be.
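The interaction between idle time and queue length can be written as a small decision function. This is only a sketch: the queue limit of 2 comes from the text, but the 20% cut-off for "low" idle time, the function name, and the labels it returns are assumptions.

```python
def disk_state(idle_pct, avg_queue_length, low_idle_pct=20.0, queue_limit=2.0):
    """Classify one disk sample from its idle time and queue length.

    low_idle_pct is an assumed cut-off for "low" idle time; queue_limit
    is the value of 2 used in the text.
    """
    busy = idle_pct < low_idle_pct
    queued = avg_queue_length > queue_limit
    if busy and queued:
        return "saturated"                  # busy and falling behind
    if busy:
        return "busy but keeping up"        # low idle time, short queue
    if queued:
        return "queueing despite idle time"
    return "idle"


if __name__ == "__main__":
    print(disk_state(idle_pct=10.0, avg_queue_length=5.0))
```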

Avg. Disk sec/Read & Avg. Disk sec/Write

  • What is it ?
    • The average time, in seconds, that a read or a write takes to complete (the latency of your disks).
  • Why should I care ?
    • Tells you how much lag there is on every disk operation.
  • How does it help me ?
    • Could indicate a hardware issue if the latency is too high.
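These counters report seconds, which are awkward to read for values this small. The sketch below converts samples to milliseconds and summarizes them. No latency limit is given here, so `limit_ms` is left as an input you choose yourself, and both function names are hypothetical.

```python
def latency_summary_ms(samples_sec):
    """Summarize disk latency samples (given in seconds) in milliseconds."""
    if not samples_sec:
        raise ValueError("need at least one sample")
    ms = [sample * 1000.0 for sample in samples_sec]
    return {"avg_ms": sum(ms) / len(ms), "max_ms": max(ms)}


def slow_samples(samples_sec, limit_ms):
    """Return the samples, in milliseconds, slower than a caller-chosen limit."""
    return [s * 1000.0 for s in samples_sec if s * 1000.0 > limit_ms]


if __name__ == "__main__":
    print(latency_summary_ms([0.002, 0.004, 0.030]))
```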

Memory

Available MBytes

  • What is it ?
    • Amount of physical memory, in megabytes, available to processes.
  • Why should I care ?
    • Lets you know if you’re running out of memory.
  • How does it help me ?
    • Could indicate a memory leak if it keeps decreasing.

Pages/sec

  • What is it ?
    • Rate at which pages are read from or written to disk to resolve hard page faults (slows down the whole system).
  • Why should I care ?
    • Hard page faults force the system to wait on the disk, which causes system-wide delays.
  • How does it help me ?
    • The higher this number gets, the more the system is running out of memory. The more it runs out of memory, the more page faults will occur.

Pool Nonpaged Bytes

  • What is it ?
    • Area of memory for objects that cannot be written to disk.
  • Why should I care ?
    • Eats up your available memory (RAM).
  • How does it help me ?
    • If it grows past 80%, the system could halt with a nonpaged pool depletion issue (Event ID 2019).
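The 80% figure needs a denominator. The sketch below assumes it means 80% of the nonpaged pool's maximum size, which the Pool Nonpaged Bytes counter does not report. `pool_limit_bytes` is therefore a hypothetical input you would have to obtain separately, and the function names are made up for this example.

```python
def pool_usage_pct(pool_bytes, pool_limit_bytes):
    """Nonpaged pool usage as a percentage of an assumed pool limit."""
    if pool_limit_bytes <= 0:
        raise ValueError("pool_limit_bytes must be positive")
    return 100.0 * pool_bytes / pool_limit_bytes


def nonpaged_pool_at_risk(pool_bytes, pool_limit_bytes, limit_pct=80.0):
    """True when usage is above limit_pct (80% is the figure from the text)."""
    return pool_usage_pct(pool_bytes, pool_limit_bytes) > limit_pct


if __name__ == "__main__":
    print(pool_usage_pct(50, 200), nonpaged_pool_at_risk(90, 100))
```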

Pool Paged Bytes

  • What is it ?
    • Area of memory for objects that can be written to disk, when not being used.
  • Why should I care ?
    • The more objects you have in this area, the more expensive it can be to retrieve them.
  • How does it help me ?
    • The bigger this value gets, the longer it can take to retrieve objects from memory.

Process (*) \ Private Bytes

  • What is it ?
    • The current size, in bytes, of the memory this process has allocated that cannot be shared with other running processes.
  • Why should I care ?
    • Tells you how much memory your process takes up.
  • How does it help me ?
    • If this value gets consistently bigger over time, it could indicate that your application has a memory leak.
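One way to make "consistently bigger over time" concrete is to fit a straight line through the samples and look at its slope. The sketch below uses an ordinary least-squares fit with the sample index as the x axis. This is a swapped-in technique, not something Performance Monitor does for you, and the function names and the default growth threshold are assumptions. A positive slope is only a hint, since a process that is still warming up also grows.

```python
def slope_per_sample(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def looks_like_leak(private_bytes, min_growth_per_sample=0.0):
    """True when Private Bytes trends upward faster than the given rate."""
    return slope_per_sample(private_bytes) > min_growth_per_sample


if __name__ == "__main__":
    print(slope_per_sample([100, 200, 300, 400]))
```

The same slope function applies to Available MBytes, where a steadily negative slope is the warning sign.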

Network

Output Queue Length

  • What is it ?
    • The length of the output packet queue.
  • Why should I care ?
    • The higher this number, the larger the backlog of queued packets and the longer it takes to send out data.
  • How does it help me ?
    • Helps indicate a network bottleneck. If it is greater than one, the system’s network is nearing capacity.
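To apply this check to logged data rather than the live graph, the sketch below reads a CSV log and lists the interfaces whose queue ever went above one. It assumes the PDH-CSV layout that typeperf and Performance Monitor CSV logs use: a header row of counter paths, then one row per sample with the timestamp in the first column. The function names are hypothetical.

```python
import csv
import io


def parse_pdh_csv(text):
    """Parse PDH-CSV text into {counter_path: [float, ...]}."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    series = {name: [] for name in header[1:]}   # column 0 is the timestamp
    for row in data:
        for name, cell in zip(header[1:], row[1:]):
            cell = cell.strip()
            if cell:                             # skip blank (missing) samples
                series[name].append(float(cell))
    return series


def saturated_interfaces(series, limit=1.0):
    """Counter paths whose Output Queue Length ever exceeds the limit."""
    return [
        name for name, values in series.items()
        if name.endswith(r"\Output Queue Length")
        and values and max(values) > limit
    ]
```

With a log saved to disk, the text can be read with `open(path, newline="").read()` and passed to `parse_pdh_csv`.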
