It seems the ANTS profiler is not aware of the performance impact of write barriers? At least in my sample, it didn't show the real place that executed very slowly (because of the write barrier).
Sorry for the delay, but I did eventually get an answer to this question after asking around.
The profiler measures the real time spent in any given place, so that should include time caused by write barriers. It's possible that something else the profiler does changes the behaviour of the CLR in such a way that the performance issue does not occur while profiling (line-level timing in particular will change this performance). Switching to one of the profiling modes marked as 'more accurate' should make the slowdown show up; sampling should be particularly representative of the real performance of the application.
In general, it's not possible to measure the performance of a single write barrier, only the cumulative effect of many of them over a longer period of time.
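To illustrate the point about cumulative measurement, here is a minimal sketch (not taken from the original sample) that times a large number of reference stores against the same number of integer stores using `Stopwatch`. The `Holder` and `WriteBarrierSketch` types, their members, and the iteration count are all hypothetical names chosen for this example. It assumes the JIT keeps both stores inside the loops. Any difference will vary with runtime version, GC settings, and hardware, so treat the numbers as a rough indication only.

```csharp
using System;
using System.Diagnostics;

// Hypothetical types for illustration only; not part of any profiler API.
sealed class Holder
{
    public object Ref = new object(); // reference field: stores go through the GC write barrier
    public int Value;                 // primitive field: stores need no write barrier
}

static class WriteBarrierSketch
{
    // Times 'iterations' stores of an object reference into a heap field.
    public static TimeSpan TimeReferenceStores(Holder holder, object a, object b, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // Alternate between two values so the store is not trivially redundant.
            holder.Ref = (i & 1) == 0 ? a : b;
        }
        sw.Stop();
        return sw.Elapsed;
    }

    // Times the same loop shape, but storing an int (no write barrier involved).
    public static TimeSpan TimeIntStores(Holder holder, int x, int y, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            holder.Value = (i & 1) == 0 ? x : y;
        }
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main()
    {
        const int iterations = 100000000; // large count: only the total is measurable
        var holder = new Holder();
        object a = new object();
        object b = new object();

        // Warm-up pass so JIT compilation is not included in the timings.
        TimeReferenceStores(holder, a, b, 1000);
        TimeIntStores(holder, 1, 2, 1000);

        TimeSpan refTime = TimeReferenceStores(holder, a, b, iterations);
        TimeSpan intTime = TimeIntStores(holder, 1, 2, iterations);

        Console.WriteLine("Reference stores: " + refTime.TotalMilliseconds + " ms");
        Console.WriteLine("Int stores:       " + intTime.TotalMilliseconds + " ms");
    }
}
```

Run it as a Release build without a debugger attached. Individual stores are far too short to time one by one, which is why the loop total is the only meaningful figure here.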