Date
1 - 18 of 18
Questions on HPMs
Beeman Strong
Hi there, I'm working on PMU definition at Rivos, and had some questions about the HPM architecture (including Sscofpmf extension). I started just a couple of weeks ago, and while I tried to do my homework by digging through old messages I may have missed some, so forgive me if some of these questions have been answered before. 1) I notice that the instret and cycle counters have been excluded from filtering/sampling support with Sscofpmf. Doesn't the inability to filter these counters based on privilege level (PL) create problems for exposing them to lower PLs? For instance, a guest OS running in VS may be confused when a guest page fault causes the instret value to be inflated by the HS fault handler. Such cross-PL exposure could even be a side-channel threat, enabling code in U to discern the number of instructions retired handling an ECALL in S/M. 2) I haven't seen a clear definition of what instret counts, other than: * The minstret CSR counts the number of instructions the hart has retired. * As ECALL and EBREAK cause synchronous exceptions, they are not considered to retire, and should not increment the minstret CSR. From this I'm inferring that neither interrupts nor instructions that take exceptions increment instret. Is that right? thanks, beeman |
|
Greg Favor
On Thu, Oct 28, 2021 at 3:21 PM Beeman Strong <beeman@...> wrote:
Andrew can confirm, but I believe the answer is Yes. Which makes sense since in both cases an instruction is not successfully executed (and one doesn't want double-counting of an instruction that first takes an exception and then successfully executes after return from the exception, e.g. a page fault). Greg |
|
Greg Favor
On Thu, Oct 28, 2021 at 3:21 PM Beeman Strong <beeman@...> wrote:
This lack or omission starts from the fact that there are no mhpmevent CSRs for those fixed-function counters. (Also note that one could program one of the other "general function" counters to count instret or cycles.) This is one of a number of issues or desires that various people have raised - which can all translate into other (separate) fast track extensions that would be separately considered by the community and justified or not as to whether they should be standardized. Any security considerations around exposing, for example, the cycle counter to User level are a whole distinct topic of discussion of the pros (needs by User level for access to the equivalent of an x86 TSC) and the security cons, and of how best to balance all that in defining some arch functionality. Lastly, security concerns exist around exposing many of the architectural and microarchitectural events that could be counted, i.e. this is a broad issue that isn't necessarily fully addressed by having mode-based filtering. For example, limiting U-mode to only count cycles spent in U-mode is still revealing some information about non-U-mode execution (which a security attack could try to leverage in a controlled way). Greg |
|
atishp@...
On Thu, 2021-10-28 at 15:21 -0700, Beeman Strong wrote:
Hi there,Yes. This was discussed before with Greg. This issue can be avoided by allowing other programmable counters (hpmcounter3-31) monitor the instret/cycle event as well. The current implementation in Qemu[1] already supports this workaround. It was also discussed that a future extension may address this issue if implementing this in hardware is too difficult or expensive. Greg can elaborate more on this if required. [1] https://lists.nongnu.org/archive/html/qemu-devel/2021-10/msg05795.html For instance, a guest OS running in VS may be confused when a guest-- Regards, Atish |
|
andrew@...
On Thu, Oct 28, 2021 at 4:18 PM Greg Favor <gfavor@...> wrote:
👍
|
|
Ved Shanbhogue
I do not think it avoids the issue but works around it. Its not an ideal workaround as the performance analyst now loses some of the programmable performance counters... Programmable counters would be more expensive to build and this places pressure on those limited resources. |
|
On 29/10/21, 5:19 AM, "tech-privileged@... on behalf of Vedvyas Shanbhogue" <tech-privileged@... on behalf of ved@...> wrote:
> > Yes. This was discussed before with Greg. This issue can be avoided by > allowing other programmable counters (hpmcounter3-31) monitor the > instret/cycle event as well. > > The current implementation in Qemu[1] already supports this workaround. > I do not think it avoids the issue but works around it. Its not an ideal workaround as the performance analyst now loses some of the programmable performance counters... Programmable counters would be more expensive to build and this places pressure on those limited resources. [Anup] I agree this is not an ideal work-around. [Anup] I think the fast-track extension for CYCLE and INSTRET filtering should be straight forward (i.e. defines only two CSRs) because most of counter filtering functionality is already defined by Sscofpmf extension. [Anup] I suggest Greg (or someone else) can just propose a write-up for this fast-track extension. Regards, Anup |
|
atishp@...
On Thu, 2021-10-28 at 18:49 -0500, Vedvyas Shanbhogue wrote:
Completely agree. In fact, this was the exact point raised during theI do not think it avoids the issue but works around it. Its not an past discussion. I think all of us agree that we probably need another extension to solve this issue properly. In the meantime, the current OpenSBI implementation tries to reduce some of the pressure by choosing the programmable counter for cycle/instret only if sscof extension is implemented. This can be improved further by selecting the programmable counter only if perf sampling is requested from user for cycle/instret. -- Regards, Atish |
|
Ved Shanbhogue
On Thu, Oct 28, 2021 at 04:36:19PM -0700, Andrew Waterman wrote:
On Thu, Oct 28, 2021 at 4:18 PM Greg Favor <gfavor@...> wrote:A related question. With privilege mode based inhibits the treatment of instructionsOn Thu, Oct 28, 2021 at 3:21 PM Beeman Strong <beeman@...> wrote:👍2) I haven't seen a clear definition of what instret counts, other than:Andrew can confirm, but I believe the answer is Yes. Which makes sense that cross privilege mode is not explicitly defined in the Sscofmpf extension. For example, ECALL and EBREAK will not count as instruction retired but they may count other events. To take a hypothetical example, if there was a event for a uncompressed instruction starting at unliagned address. Likewise SRET and MRET are eligible to be counted as instructions and may also trigger other events. In all these cases where the instruction leads to a privilege domain change it would be good clarify that the inhibits at the originating privilege domain that apply for the counting. I hope thats the agreed upon expected behavior? regards ved |
|
Ved Shanbhogue
On Fri, Oct 29, 2021 at 05:10:31AM +0000, Anup Patel wrote:
Since Sscofmpf is still not ratified a second extension could be avoided if its just folded into the current extension. Is that an option? regards ved |
|
Beeman Strong
You just beat me, I had a mail to the same effect in progress. For instructions I agree that events, like instret, should be based on the originating PL. Other uarch events, like cycles or stalls, will naturally depend on implementation but ideally would be associated with the originating PL as much as possible. For interrupts or exceptions, I think it's the opposite: events will ideally depend on the destination PL. Since these operations don't increment instret I'm not sure there's a hard and fast rule here, so perhaps it's just up to implementations how they handle these cases. But suffice it to say that events that count interrupts or page faults, for instance, would not be expected by a user profiler. From an implementation perspective, the best approach would seem to be to increment most non-uarch events (like instret) at retire, using the PL at retire. xRET will require special treatment, to increment events before the PL change. On Fri, Oct 29, 2021 at 8:22 AM Ved Shanbhogue <ved@...> wrote: On Thu, Oct 28, 2021 at 04:36:19PM -0700, Andrew Waterman wrote: |
|
Beeman Strong
Thanks guys. It would be good to clarify this in the spec, so there is no confusion. x86's INST_RETIRED.ALL, for instance, does increment for interrupts and exceptions. On Thu, Oct 28, 2021 at 4:36 PM Andrew Waterman <andrew@...> wrote:
|
|
Greg Favor
On Fri, Oct 29, 2021 at 9:13 AM Beeman Strong <beeman@...> wrote:
Exactly. Even if the after-effect of an instruction is to establish a new execution context for the target or next instruction, the instruction itself is considered as being fetched/decoded/executed in its starting context (privilege mode, translation context, etc.).
Agreed. For some events this would be straightforward; for others it will be fuzzier or not possible. There are many types of uarch events that are hard or not possible to clearly associate with a causative instruction and hence a PL (or even with just a causative PL for some types of events).
ARM and its attempt at standardizing arch (and uarch) events explicitly acknowledges that there are ones that cannot be required across all implementations to be non-speculative - and hence ARM allows an implementation to implement either a speculative or non-speculative version of the defined event. Greg |
|
Greg Favor
On Fri, Oct 29, 2021 at 9:15 AM Beeman Strong <beeman@...> wrote:
I don't argue against a non-normative clarification, but a direct literal reading of the RV text (i.e. only completed/retired instructions are counted) leaves no ambiguity. Taking an interrupt or an exception does not correspond to execution of any instruction in the program. It's an interesting event, but it's not the execution and retirement of an instruction. (Similarly, executing an instruction, causing cache/TLB misses, and causing other lasting uarch effects, but ultimately not retiring the instruction, must not be counted even though real effects with performance implications have been caused.) As far as adding clarification because another architecture has a different definition of something (in this case a specific perf event), this starts down a long slippery slope - across the RV arch specs - of countering people's presumptions based on x86 or ARM or MIPS or .... Currently the RV arch specs expect people to fully read the spec and to not presume meanings or behaviors that may happen to be true in other architectures. Other architectures take the same tack. Conversely, in practice, there will be people that only do a quick or loose read of some arch text, don't read other related sections of the arch specs, and bring in their own interpretative presumptions. Trying to defend against all that would ultimately result in lots and lots of clarifications. (And discussions over where to draw the line.) Greg |
|
Beeman Strong
Fair enough. But it would be nice to have the full definition of instret in one place. To discern that instructions that cause exceptions don't increment the counter requires reading the ECALL/EBREAK description. Perhaps the following change would be reasonable: M-mode includes a basic hardware performance-monitoring facility. The mcycle CSR counts the number of clock cycles executed by the processor core on which the hart is running. The minstret CSR counts the number of instructions the hart has retired. The mcycle and minstret registers have 64-bit precision on all RV32 and RV64 systems. ========== NEW non-normative text ========== Instructions that cause synchronous exceptions, including ECALL and EBREAK, are not considered to retire and hence do not increment the minstret CSR. On Fri, Oct 29, 2021 at 9:57 AM Greg Favor <gfavor@...> wrote:
|
|
andrew@...
Ordinarily I'd recommend against the duplication of information, but I've got to agree this information would be more usefully presented next to the description of the counters than at the description of ECALL/EBREAK. But instead of adding this to the priv spec next to the minstret definition, I'm going to recommend adding it to the unpriv spec next to the instret definition, where it's more visible. On Fri, Oct 29, 2021 at 12:34 PM Beeman Strong <beeman@...> wrote:
|
|
Greg Favor
On Fri, Oct 29, 2021 at 8:26 AM Ved Shanbhogue <ved@...> wrote: Since Sscofmpf is still not ratified a second extension could be avoided if its just Over the past year RVI (the TSC especially) has clearly moved towards expecting extensions to be developed as smaller separate extensions instead of being glommed together in longer development efforts. TG's going forward will all have tightly focused and well defined charters, with a more bounded timeline to produce a specific ratified standard. The intent is to standardize basic important functionality sooner, and then come back around in a second (or even third) development phase to standardize secondary features, additional features, and/or feature enhancements. (The CMO group is an example of this.) Fast track extensions are intended to focus on one limited architecture extension; further fast track extensions can add on further features. Also note that adding functionality to an extension while in development is not just a matter of writing a larger spec, but of expanding the amount of work to be done with respect to all the required DoD (Definition of Done) tasks (e.g. ACTs, Spike/QEMU/SAIL support, toolchain and software support, PoCs). Greg |
|
Ved Shanbhogue
On Sat, Oct 30, 2021 at 01:36:07AM -0700, Greg Favor wrote:
On Fri, Oct 29, 2021 at 8:26 AM Ved Shanbhogue <ved@...> wrote:Totally agree its not just a matter of writing a larger specification. My observationSince Sscofmpf is still not ratified a second extension could be avoidedOver the past year RVI (the TSC especially) has clearly moved towards in this specific case was that its not creating fundamentally new architecture. The CSR we discussed would likely follow the rules of other mhpmevent CSR and not likely create unique new architecture. Further I was observing that software had workaround being built in to address this issue and it may ease software enabling by not needing to carry the workaround. A new extension would be fine as well. Software can pick between invoking the work around or not based on existence of the new extension. regards ved |
|