Re: A proposal to enhance RISC-V HPM (Hardware Performance Monitor)


Andy Glew Si5
 

I am curious how x86 addresses this problem. How does it enable hypervisor mode sampling without similar issues?
x86's hardware performance monitoring interrupt is delivered to whatever is specified by the local APIC's LVT (Local Vector Table) entry for performance monitoring. This gets it out of being a special case, and just makes it like any other interrupt. The hypervisor or virtual machine manager has to be able to handle any interrupt appropriately. In a simple virtual machine architecture, such as the initial version of Intel VT that I worked on, all external interrupts go to the hypervisor, and then the hypervisor can decide if it wants to deliver them to a guest privilege level. Fancier virtual machine architectures, such as current Intel's, allow certain interrupts to be sent directly to the guest, without being caught by the hypervisor first.

There should not be any special handling for hardware performance monitoring interrupts. They should be just like any other interrupt or exception. There should be a uniform delegation architecture for all interrupts and traps. Eliminate as many special cases as possible.

For any given interrupt or exception, sometimes you want it to go straight to the hypervisor, sometimes you want it to go straight to the guest.

I say "hypervisor" here, but it might just as well be M-mode. Or, to generalize: sometimes you want it to go to the most privileged software level, sometimes to the least, sometimes to one of the privileged software levels in between. The interrupt architecture should support that.
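
A minimal sketch of what such uniform delegation could look like, with the performance monitoring interrupt treated like any other cause. Everything here is hypothetical and for illustration only: the `Level` names, the `route` function, and the cause strings are not from any specification. The shape is loosely analogous to RISC-V's mideleg/hideleg delegation registers, where each level may pass a cause one level down.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical privilege levels, least to most privileged."""
    GUEST = 0
    HYPERVISOR = 1
    MACHINE = 2


def route(cause, deleg):
    """Return the level that handles `cause`.

    `deleg` maps a level to the set of causes it delegates one level down.
    Every cause starts at the most privileged level; no cause is special.
    """
    level = max(Level)
    while level > min(Level) and cause in deleg.get(level, set()):
        level = Level(level - 1)
    return level


# Assumed example configuration: the HPM overflow interrupt is delegated
# all the way to the guest, the timer stops at the hypervisor, and
# machine check is never delegated.
deleg = {
    Level.MACHINE: {"hpm_overflow", "timer"},
    Level.HYPERVISOR: {"hpm_overflow"},
}
```

With this table, `route("hpm_overflow", deleg)` lands at the guest, `route("timer", deleg)` at the hypervisor, and `route("machine_check", deleg)` stays at the most privileged level, all through the same code path.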

--

There's a bit of funkiness with respect to precise performance monitoring exceptions, just like there is for machine check. If you go through a complicated interrupt vectoring mechanism, it may become difficult to be precise. In fact, that's one of the reasons why P6's original performance monitoring interrupt was imprecise (that, and the fact that it took several cycles to propagate from the unit where the event occurred to the performance counter logic, and not even a uniform number of cycles - there was a tree of wires with differing numbers of latches on different paths).

But that is okay-ish. You can have an interlock to prevent more instructions from retiring after the instruction where the precise performance monitor event has occurred, taking care to avoid deadlock, e.g. taking care that a higher priority interrupt can preempt while that interlock is in flight. Or you can add the mechanisms to provide appropriate sampling when interrupts are actually imprecise. Or you can add a new interrupt/exception delivery mechanism that basically does the first thing, but throws out some of the complexity of your legacy trap delivery mechanism. It's microarchitecture.

By the way, if your performance counter takes more than one cycle to propagate carry and detect overflow, you need such an interlock anyway. That is not a common problem, but at Intel circa 2000 we regularly imagined a "fireball" OOO core that ran at frequencies such that you could only propagate a carry across 16 bits at a time: if you are wire limited rather than logic limited, four cycles for a 64-bit counter. In fact, I believe that Intel PEBS (Precise Event Based Sampling) does not actually sample when the counter overflows; instead, when the counter overflows, it sets a bit that says: the next time this event is recognized, generate the interrupt. Which, if you think about it, is actually imprecise if more than one event occurs in any given cycle.
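
The arm-on-overflow behavior described above can be sketched as a toy software model. This is an illustration of the idea only, not Intel's implementation; `ArmedCounter`, `run`, and the event tags are hypothetical names, and the model sees one event at a time, so it does not capture the multiple-events-per-cycle case.

```python
class ArmedCounter:
    """Toy counter that arms on overflow and samples the NEXT event."""

    def __init__(self, width_bits, start):
        self.mask = (1 << width_bits) - 1
        self.value = start & self.mask
        self.armed = False

    def event(self, tag):
        """Count one event; return `tag` if this event is sampled, else None."""
        sampled = None
        if self.armed:
            # The overflow happened on an earlier event; sample this one.
            sampled = tag
            self.armed = False
        self.value = (self.value + 1) & self.mask
        if self.value == 0:
            # Overflow: do not sample now, just arm for the next event.
            self.armed = True
        return sampled


def run(counter, tags):
    """Feed events in order and collect the tags that were sampled."""
    return [t for t in tags if counter.event(t) is not None]
```

For a 4-bit counter preloaded with 13, the third event wraps the counter to zero and the fourth event is the one reported, so the sample is attributed one event after the overflow.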

However, propagating through the interrupt delivery logic probably takes more cycles than propagating a carry across 64 bits.
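
To make the carry arithmetic concrete, here is a small sketch that increments a counter while rippling the carry one 16-bit chunk per modeled cycle. `chunked_increment` is a hypothetical name, and the model is an assumption: it charges one cycle per chunk, which is the worst-case latency before the carry out of the top chunk (the overflow) is known.

```python
def chunked_increment(value, width=64, chunk=16):
    """Increment `value` modulo 2**width, one chunk of carry per cycle.

    Returns (new_value, overflowed, cycles), where `cycles` is the number
    of chunk steps needed before the overflow result is known.
    """
    chunk_mask = (1 << chunk) - 1
    carry = 1  # the increment enters as a carry into the lowest chunk
    result = 0
    cycles = 0
    for shift in range(0, width, chunk):
        part = ((value >> shift) & chunk_mask) + carry
        result |= (part & chunk_mask) << shift
        carry = part >> chunk  # carry into the next chunk, next cycle
        cycles += 1
    return result, bool(carry), cycles
```

With the defaults, `chunked_increment(2**64 - 1)` wraps to zero and reports overflow only after four chunk steps, which is why instructions retiring in the meantime would need the interlock.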




From: Alankao <alankao@...>
Sent: Tuesday, July 21, 2020 4:40PM
To: Tech-Privileged <tech-privileged@...>
Subject: Re: [RISC-V] [tech-privileged] A proposal to enhance RISC-V HPM (Hardware Performance Monitor)

 

On 7/21/2020 4:40 PM, alankao wrote:
Hi Andy,

Thank you for the hints as an Intel PMU architect.  My question is about the mode selection part as below.

It is not difficult to implement a mechanism by which an event is counted only in some privileged modes. Both Greg's and my approach can achieve this. But in practice, we found that profiling higher-privileged modes has some problems. Under a basic Unix-like RISC-V configuration, the kernel runs in S-mode and there is M-mode for platform-specific stuff.

Say we now want to sample M-mode software. The first implementation decision is which mode the HPM interrupt should go to. Everything is more controllable if the interrupt can just go to S-mode, but obviously there is no easy way for S-mode software, the kernel, to read general M-mode information like the mepc (Machine Exception Program Counter) register. The other route goes to M-mode, but since the RISC-V HPM interrupt has never been seriously/publicly discussed until this thread, the effort so far, including the current PMU SBI extension proposal, did not address this.

I am curious how x86 addresses this problem. How does it enable hypervisor mode sampling without similar issues?

I apologize for some of the language errors that occur far too frequently in my email. I use speech recognition much of the time, and far too often do not catch misrecognition errors. This can be quite embarrassing, amusing, and/or confusing. Typical errors are not spelling but homonyms, words that sound the same - e.g. "cash" instead of "cache".
