Re: A proposal to enhance RISC-V HPM (Hardware Performance Monitor)
Andy Glew Si5
x86's hardware performance monitoring interrupt is delivered to whatever is specified by the local APIC's LVT (Local Vector Table) entry for performance monitoring. This gets it out of being a special case, and just makes it like any other interrupt. The hypervisor or virtual machine manager has to be able to handle any interrupt appropriately. In a simple virtual machine architecture, such as the initial version of Intel VT that I worked on, all external interrupts go to the hypervisor, and then the hypervisor can decide if it wants to deliver them to a guest privilege level. Fancier virtual machine architectures, such as current Intel's, allow certain interrupts to be sent directly to the guest, without being caught by the hypervisor first.
There should not be any special handling for hardware performance monitoring interrupts. They should be just like any other interrupt or exception. There should be a uniform delegation architecture for all interrupts and traps. Eliminate as many special cases as possible.
For any given interrupt or exception, sometimes you want it to go straight to the hypervisor, sometimes you want it to go straight to the guest.
I say "hypervisor" here, but it might just as well be M-mode. Or, to generalize: sometimes you want it to go to the most privileged software level, sometimes to the least, sometimes to one of the privileged software levels in between. The interrupt architecture should support that.
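As a minimal sketch of what such uniform delegation could look like, the toy model below routes every cause, including a performance monitoring overflow, through the same per-level delegation sets. The level names, cause names, and the `route_trap` helper are all hypothetical, and the model ignores details such as the privilege level that was running when the trap occurred.

```python
# Privilege levels ordered from most to least privileged (illustrative names).
LEVELS = ["M", "HS", "VS"]


def route_trap(cause, deleg):
    """Return the level that handles `cause`.

    `deleg` maps a level name to the set of causes that level hands down
    to the next less privileged level. Every cause starts at the most
    privileged level and moves down while each level in turn delegates it.
    """
    target = 0
    while target + 1 < len(LEVELS) and cause in deleg.get(LEVELS[target], set()):
        target += 1
    return LEVELS[target]


# Hypothetical configuration: M hands timer and HPM overflow down to HS,
# and HS hands only the timer down to the guest.
deleg = {
    "M": {"timer", "hpm_overflow"},
    "HS": {"timer"},
}

for cause in ("machine_check", "hpm_overflow", "timer"):
    print(cause, "->", route_trap(cause, deleg))
```

In this configuration the machine check stays at M, the HPM overflow stops at HS, and the timer reaches VS; no cause needs a routing rule of its own.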
There's a bit of funkiness with respect to precise performance monitoring exceptions, just like there is for machine check. If you go through a complicated interrupt vectoring mechanism, it may become difficult to be precise. In fact, that's one of the reasons why P6's original performance monitoring interrupt was imprecise (that, and the fact that it took several cycles to propagate from the unit where the event occurred to the performance counter logic, and not even a uniform number of cycles - there was a tree of wires with differing numbers of latches on different paths).
But that is okay-ish. You can have an interlock to prevent more instructions from retiring after the instruction where the precise performance monitor event has occurred, taking care to avoid deadlock, e.g. taking care that a higher priority interrupt can preempt while that interlock is in flight. Or you can add mechanisms to provide appropriate sampling when interrupts are actually imprecise. Or you can add a new interrupt/exception delivery mechanism that basically does the first thing, but throws out some of the complexity of your legacy trap delivery mechanism. It's microarchitecture.
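To make the difference between the interlocked and the imprecise case concrete, here is a toy model of sample "skid". It assumes one instruction retires per cycle and a fixed delivery latency, neither of which holds on a real out-of-order core; the `sampled_pc` function and the trace are hypothetical.

```python
def sampled_pc(trace, event_index, latency, interlock):
    """Return the PC a sampling handler would attribute the event to.

    trace       -- PCs in retirement order (made-up data)
    event_index -- position in `trace` of the instruction that caused the event
    latency     -- instructions that retire while the interrupt is in flight
                   (toy assumption: one instruction retires per cycle)
    interlock   -- if True, retirement stalls as soon as the event is detected
    """
    if interlock:
        return trace[event_index]
    # Without an interlock the pipeline keeps retiring, so the sample skids.
    return trace[min(event_index + latency, len(trace) - 1)]


trace = [0x100, 0x104, 0x108, 0x10C, 0x110, 0x114]
print(hex(sampled_pc(trace, 1, 3, interlock=True)))   # 0x104
print(hex(sampled_pc(trace, 1, 3, interlock=False)))  # 0x110
```

With the interlock the sample lands on the instruction that caused the event; without it the sample lands three instructions later, which is the kind of error the second option has to compensate for.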
By the way, if your performance counter takes more than one cycle to propagate carry and detect overflow, you need such an interlock anyway. That is not a common problem, but at Intel circa 2000 we regularly imagined a "fireball" OOO core that ran at frequencies such that you could only propagate a carry across 16 bits at a time; if you are wire limited rather than logic limited, that is four cycles for a 64-bit counter. In fact, I believe that Intel PEBS (Precise Event Based Sampling) does not actually sample when the counter overflows; instead, when the counter overflows, it sets a bit that says "the next time this event is recognized, generate the interrupt". Which, if you think about it, is actually imprecise if more than one event occurs in any given cycle.
However, propagating through the interrupt delivery logic probably takes more cycles than propagating a carry across 64 bits.
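The carry arithmetic and the arm-on-overflow behaviour described above can be sketched as follows. This is a toy model, not a description of how any real PEBS implementation works; `carry_cycles` and `ArmedCounter` are hypothetical names, and the 4-bit counter width is chosen only to keep the example short.

```python
def carry_cycles(width_bits, chunk_bits):
    """Cycles needed to ripple a carry across the counter, one chunk per cycle."""
    return -(-width_bits // chunk_bits)  # ceiling division


class ArmedCounter:
    """Toy counter: overflow arms a flag; the next recognized event fires."""

    def __init__(self, width_bits, start):
        self.mask = (1 << width_bits) - 1
        self.value = start & self.mask
        self.armed = False

    def cycle(self, events):
        """Advance one cycle with `events` events; return True if an interrupt fires."""
        fire = self.armed and events > 0
        if fire:
            self.armed = False
        total = self.value + events
        if total > self.mask:
            self.armed = True  # overflow only arms; it does not fire by itself
        self.value = total & self.mask
        return fire


print(carry_cycles(64, 16))  # 4

counter = ArmedCounter(width_bits=4, start=14)
print([counter.cycle(n) for n in (1, 2, 0, 3)])  # [False, False, False, True]
```

The counter overflows in the second cycle but fires only in the fourth, the next cycle that contains an event. Three events occur in that cycle, and nothing in the model says which of them the sample belongs to, which is the imprecision noted above.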
From: Alankao <alankao@...>
Sent: Tuesday, July 21, 2020 4:40PM
To: Tech-Privileged <tech-privileged@...>
Subject: Re: [RISC-V] [tech-privileged] A proposal to enhance RISC-V HPM (Hardware Performance Monitor)
On 7/21/2020 4:40 PM, alankao wrote:
Thank you for the hints as an Intel PMU architect. My question is about the mode selection part as below.
It is not difficult to implement a mechanism by which an event is counted only in certain privilege modes. Both Greg's approach and mine can achieve this. But in practice, we found that profiling higher-privileged modes has some problems. Under a basic Unix-like RISC-V configuration, the kernel runs in S-mode and M-mode handles platform-specific stuff.
Say we now want to sample M-mode software. The first implementation decision is which mode the HPM interrupt should go to. Everything is more controllable if the interrupt can just go to S-mode, but obviously there is no easy way for S-mode software, the kernel, to read general M-mode information such as the mepc (Machine Exception Program Counter) register. The other route goes to M-mode, but since the RISC-V HPM interrupt had never been seriously or publicly discussed until this thread, the effort so far, including the current PMU SBI extension proposal, has not addressed this.
I am curious how x86 addresses this problem. How does it enable hypervisor-mode sampling without similar issues?
I apologize for some of the language errors that occur far too frequently in my email. I use speech recognition much of the time, and far too often do not catch misrecognition errors. This can be quite embarrassing, amusing, and/or confusing. Typical errors are not spelling but homonyms, words that sound the same - e.g. "cash" instead of "cache".