.NET multi-threaded application profiling does not "work"

Hi,

I am profiling a multi-threaded application. The line-level timings are incorrect, and the method times (with children) are also wrong.

For example, the app first loads data asynchronously and takes about 60 seconds to load all of it, which the real-time processing chart at the top of the profiler depicts accurately. However, the top method's time "with children" is shown as 140 seconds, which is obviously incorrect since the whole run takes 60 seconds. In addition, we log the processing time ourselves: a method that our logs show taking 3.2 seconds is reported by the profiler as taking 55 seconds!

I have tried turning off some of the settings under Tools > Options, such as Inlining and Adjust Timings..., but that makes no difference.

I have used the profiler in the past with a single-threaded program and was very happy with the results, but this time is quite different, so my question is:

Does it work for multi-threaded applications? If so, what do I need to adjust for the tool to provide correct analysis results?
Thanks,
Julien

Comments

  • Brian Donahue
    Hi Julien,

    It's not because the app is multithreaded. We get this sort of complaint a lot. Sometimes the numbers can be explained, and sometimes it's the profiler's own overhead skewing the results, because it works inside your process. I don't know if this is still valid, but we had this problem in 5.1:
    http://www.red-gate.com/supportcenter/C ... 000444.htm

    Admittedly, there are some trade-offs between accuracy and detail. Please see the grid in the product documentation: http://www.red-gate.com/supportcenter/C ... 20Profiler.
  • It's in the nature of profiling on modern CPUs that the more instrumentation you add, the harder it is to work out how the program would have behaved without the profiler. ANTS has to compromise, and instead tries to ensure that the relationship between method times remains constant. Turning down the detail level in the startup dialog will reduce the amount of instrumentation and improve ANTS's estimate of the application's performance.

    With multithreaded applications, you should probably consider using sampling mode wherever possible to get the best impression of how the application is performing. With instrumentation, the profiler may cause threads to be synchronised under heavy load where they would not normally be synchronised.

    It's also worth noting that ANTS counts CPU ticks and not real-time performance (except in sampling mode). This does provide a better representation of the work that an application must do, but technologies like SpeedStep and TurboBoost mean that any given clock tick can take a very variable amount of time. ANTS works out the clock speed at any given point in time and uses this to get the real time values, but because the CPU is varying in speed this doesn't necessarily correspond very well to the amount of work the application is actually doing at any given point.

    Finally, with multi-threaded applications the amount of 'time' spent in total is multiplied by the number of threads (or the number of running threads if you choose CPU time). This is particularly noticeable when using wallclock time (where having 3 threads for 3 minutes will give you 9 minutes of time in the call tree), but will also happen with CPU time. We do this because dividing the time between the threads doesn't actually make sense - especially with multicore systems where you really can do 9 minutes of work in 3 minutes of time. The timeline can be used to untangle this: when you select a method it will show a bar showing exactly when it was running: for a method that's running in parallel this will be shorter than the amount of time shown in the call tree. You can also select an individual thread using the drop-down to eliminate extra time caused by parallelism.
    Andrew Hunter
    Software Developer
    Red Gate Software Ltd.
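
The thread-time arithmetic described in the reply above (3 threads running for 3 minutes showing up as 9 minutes in the call tree) can be sketched with a small, language-neutral example. This is only an illustration of the arithmetic, not how ANTS computes its figures internally; the function names call_tree_time and timeline_time, the worker-N thread labels, and the interval data are all hypothetical.

```python
from typing import Dict, List, Tuple

# (start, end) of one run of a method on one thread, in seconds.
Interval = Tuple[float, float]


def call_tree_time(runs: Dict[str, List[Interval]]) -> float:
    """Time summed across all threads, as a per-thread total would add up."""
    return sum(end - start
               for intervals in runs.values()
               for start, end in intervals)


def timeline_time(runs: Dict[str, List[Interval]]) -> float:
    """Elapsed time during which at least one thread was in the method.

    Overlapping intervals are merged, so parallel runs are counted once.
    """
    ordered = sorted(iv for intervals in runs.values() for iv in intervals)
    total = 0.0
    current = None  # the merged interval currently being extended
    for start, end in ordered:
        if current is None or start > current[1]:
            if current is not None:
                total += current[1] - current[0]
            current = (start, end)
        else:
            current = (current[0], max(current[1], end))
    if current is not None:
        total += current[1] - current[0]
    return total


# Hypothetical data: three threads run the same method for 3 minutes each,
# fully in parallel.
runs = {
    "worker-1": [(0.0, 180.0)],
    "worker-2": [(0.0, 180.0)],
    "worker-3": [(0.0, 180.0)],
}

print(call_tree_time(runs) / 60)  # 9.0 minutes summed over threads
print(timeline_time(runs) / 60)   # 3.0 minutes of elapsed time
# Looking at a single thread removes the extra time caused by parallelism.
print(call_tree_time({"worker-2": runs["worker-2"]}) / 60)  # 3.0
```

Under these assumptions, a "with children" figure larger than the total run time (such as 140 seconds for a 60-second load) is what you would expect whenever several threads spend time in the same method concurrently.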