Monday, August 19, 2013

This blog post is mostly so I don't forget this kind of stuff.

http://software.intel.com/sites/default/files/m/a/d/2/2/e/15529-Intel_VTune_Using.pdf mentions "% execution stalled". This is the core i7 document rather than the Sandy Bridge document, but bear with me.

The formula is:

(UOPS_EXECUTED.CORE_STALL_CYCLES /(UOPS_
EXECUTED.CORE_ACTIVE_CYCLES +UOPS_EXECUTED.
CORE_STALL_CYCLES))* 100

However, there's no UOPS_EXECUTED.CORE_STALL_CYCLES in the PMC documentation, nor is it in the Intel SDM chapter on performance counters.

But wait! It kind of is there. There /is/ UOPS_EXECUTED.THREAD, which is "Counts the total number of uops to be executed per thread each cycle." In the same block, it says that to count stall cycles, set CMASK=1, INV=1. Ok, so how does one do that with PMC?

# pmcstat -S UOPS_EXECUTED.THREAD,inv,cmask=1 -T -w 5

Now, it seems to be showing me the ACPI wait and MWAIT functions as high sample events - which is odd, as I didn't think this particular PMC measured C1 and MWAIT states. I'll chase this up.

For Sandy Bridge it's UOPS_DISPATCHED.THREAD - this counts dispatched micro-operatons per-thread each cycle. CMASK=1,INV=1 counts the number of stall cycles.

No comments:

Post a Comment