Wrong CPU attribution in Monitor, Mutex and Semaphore?
asthomas
Posts: 6
My program makes use of C#'s lock, which is a wrapper on the Monitor object.
The sampling period of my code has millions of calls to other functions, across about 10 relevant threads, and only 383 calls to the locked section of code.
My first thought was that my program must have a lot of contention for the lock and that contended locks use a lot of CPU. So I spent some time to ensure that there is no contention for the lock over the sampling period, and the CPU remains very high. So Ants says that 383 calls to an uncontended lock use 20% of the program's CPU.
So, I modified my program to remove the lock entirely. It's dangerous and I can't ship the program like that, but it provides an interesting test. Now when I profile the program the total CPU does not change, but that 20% that was being attributed to the lock() call is now being attributed to another callee of the function that contained the lock. In past runs Ants reported that callee had low CPU usage, but with the lock removed Ants reports high CPU on that function.
So I look down the stack, and I see that the CPU is now being attributed to a call to SemaphoreSlim.Release further down the calling stack. The SemaphoreSlim has at most 3 threads waiting on it, and they are idle waiting on the SemaphoreSlim about 93% of the time. Ants reports that the CPU is being used in the body of SemaphoreSlim.Release, not in any callee. Looking at the decompiled source for SemaphoreSlim.Release there does not seem to be much opportunity to consume so much CPU.
If I replace the SemaphoreSlim with a Semaphore, Ants attributes the CPU to the native Windows semaphore release.
I cannot remove the semaphore without breaking the program, so I don't know where the CPU gets attributed in that case.
For reference, over the period that I'm studying Ants reports that SemaphoreSlim.Release is using 784 CPU microseconds per call. For comparison, http://www.albahari.com/threading/part2.aspx claims that SemaphoreSlim should have overhead in the nanoseconds.
Based on all of this, do you have any insight? Have you seen something like this before? Is Ants mis-attributing the CPU usage in this program, or is this correct? How do we square what Ants reports with the performance we should expect from synchronization primitives? Using almost a CPU millisecond to release a semaphore seems excessive.
lock (myObject) { // do something }When I run my program with Ants, the Monitor.Enter and Monitor.Release calls appear to be using a huge amount of CPU (about 20% of my program's CPU usage). I have tried replacing lock() with a Mutex, like this:
// constructor calls WaitOne on the Mutex, Dispose calls Release using (new MyMutexLock(myMutex)) { // do something }That lets me see explicitly where the CPU is being attributed because Ants can see the source for MyMutexLock.constructor and MyMutexLock.Dispose. The CPU usage is reported by Ants to be slightly better than with Monitor, but not much, and is reported to be either in the Mutex.WaitOne or Mutex.Release calls.
The sampling period of my code has millions of calls to other functions, across about 10 relevant threads, and only 383 calls to the locked section of code.
My first thought was that my program must have a lot of contention for the lock and that contended locks use a lot of CPU. So I spent some time to ensure that there is no contention for the lock over the sampling period, and the CPU remains very high. So Ants says that 383 calls to an uncontended lock use 20% of the program's CPU.
So, I modified my program to remove the lock entirely. It's dangerous and I can't ship the program like that, but it provides an interesting test. Now when I profile the program the total CPU does not change, but that 20% that was being attributed to the lock() call is now being attributed to another callee of the function that contained the lock. In past runs Ants reported that callee had low CPU usage, but with the lock removed Ants reports high CPU on that function.
So I look down the stack, and I see that the CPU is now being attributed to a call to SemaphoreSlim.Release further down the calling stack. The SemaphoreSlim has at most 3 threads waiting on it, and they are idle waiting on the SemaphoreSlim about 93% of the time. Ants reports that the CPU is being used in the body of SemaphoreSlim.Release, not in any callee. Looking at the decompiled source for SemaphoreSlim.Release there does not seem to be much opportunity to consume so much CPU.
If I replace the SemaphoreSlim with a Semaphore, Ants attributes the CPU to the native Windows semaphore release.
I cannot remove the semaphore without breaking the program, so I don't know where the CPU gets attributed in that case.
For reference, over the period that I'm studying Ants reports that SemaphoreSlim.Release is using 784 CPU microseconds per call. For comparison, http://www.albahari.com/threading/part2.aspx claims that SemaphoreSlim should have overhead in the nanoseconds.
Based on all of this, do you have any insight? Have you seen something like this before? Is Ants mis-attributing the CPU usage in this program, or is this correct? How do we square what Ants reports with the performance we should expect from synchronization primitives? Using almost a CPU millisecond to release a semaphore seems excessive.