Re: Fast-track extension proposal for "Hardware Performance Monitor count overflow and mode-based event filtering"

Ernie Edgar

The Debug Spec has a count inhibit control bit already -- See the stopcount field in dcsr.  


On Wed, Feb 3, 2021 at 12:47 PM Greg Favor <gfavor@...> wrote:
Good question.  A couple of quick thoughts:

- D-mode is not a required part of the Debug spec, i.e. a fully compliant implementation of the full Debug spec may not use an "execution-based" approach and hence would not support D-mode.  I don't think this forces anything; I'm just noting that full support of the Debug spec does not imply presence of D-mode.

- The Priv and Unpriv architectures consistently say nothing about debug features (other than EBREAK with very little definition).  All things debug (and trace) are separated and encapsulated into the Debug and Trace specs.  Properly it should be the Debug spec that adds a D-mode filter bit as a new feature.  This is no different than it also being responsible to add H extension-related new features within the Debug spec.  And other new features that may arise because of other new extensions.  It would be up to the Debug TG to decide whether to add a D-mode filter bit in mhpmeventn, provide such a bit in a debug-related register, or whatever other option that it might settle on.


On Wed, Feb 3, 2021 at 11:31 AM Brian Grayson <brian.grayson@...> wrote:
Another thought from one of my debug coworkers: what do we want to do about debug (D-mode)? Having a bit that says "inhibit counting when we are in D-mode" would allow debuggers to do accurate perfmon on M-mode code. This support could be part of the debug extension, but that adds a tie-in across extensions. If we shift all the inhibit bits down and make debug the highest, that keeps things sane, and implementations that don't implement D-mode can ignore the bit.


On Wed, Feb 3, 2021 at 11:19 AM Greg Favor <gfavor@...> wrote:
Dang.  I think that typo crept in at the last second.  The bit numbering for the bottom three bits should be 58, 57, and 56 - resulting in the full top byte of mhpmevent being covered.


On Wed, Feb 3, 2021 at 8:57 AM Brian Grayson <brian.grayson@...> wrote:
I noticed another typo that I don't think has been pointed out -- reuse of bit 59.

bit [59]  VSINH       -  If set, then counting of events in VS-mode is inhibited
bit [59]  VUINH       -  If set, then counting of events in VU-mode is inhibited


On Mon, Feb 1, 2021 at 2:08 PM Greg Favor <gfavor@...> wrote:
On Mon, Feb 1, 2021 at 10:00 AM Brian Grayson <brian.grayson@...> wrote:
Given the discussions about cache-ops and the name for them on tech-cmo and the desire to avoid "co", "COF" (which can also mean "change of flow") may not be the best choice for the extension short-name. What about just "Sshpm", as this extension is what really allows the HPM to be well-utilized by tools like perf? Or is that too confusing since hpm already exists?

Using 'hpm' probably would be a bit confusing.  But I'll look into alternatives.  Btw, since a new extension naming standard is being developed, ultimately this (and all other extensions) will need to conform to the new scheme (although the 'Ss' part of this name is expected to be consistent with that new scheme).  Also note that CMO group extensions will have "Zi*" names and the concern over use of "co" or "cop" as a root name was particularly in that context (i.e. wrt other Unpriv spec extensions; while this extension in the "S" name space for Priv extensions).  But in any case I'll explore alternatives that may be acceptable.
Is there a reason there is no mcountovf? It would simplify the software for an M-mode tool, and for cores that don't have an S-mode.

This has been discussed (with the lead architects; I'll stop repeatedly mentioning this).  And in standard RISC philosophy form, it was considered to have insufficient justification.  For a core with S-mode and if M-mode wants to examine the bits for counters that have not been "delegated" down to S-mode via mcounteren, then M-mode can either use a three-instruction sequence to read a version of scountovf unaffected by mcounteren, or it can directly check the individual mhpmevent.OF bits that it cares about.  The latter also applies for a core without S-mode that implements this extension.  (Further, I imagine a "no S-mode" CPU probably only implements a small number of counters.)
How is overflow defined for an implementation that implements 32<n<64 bits in the counter registers? Although the registers are architecturally 64 bits, an implementation may not want to support all of them.

The Priv spec says "The mhpmcounters are WARL registers that support up to 64 bits of precision".  This allows complete flexibility for how many implemented bits there are.

Since count values are defined as unsigned, there is always an equivalent unsigned 64-bit current count value irrespective of the implemented size.  So overflow is well-defined (modulo the issue down below).
Mandating full 64-bit counters may make an implementation area-prohibitive for the smallest of perfmon-enabled embedded cores. I think this could be specified like this: "An implementation may implement less than 64 bits for the hpmcounter CSRs. On such an implementation, software can query the bit width of the hmpcounter registers by taking advantage of the WARL behavior: writing all 1's and reading back to see which bits retained the set value.

This would be an issue to raise with the existing Priv spec, not with this extension.  But as noted above, this isn't really an issue since it is already comprehended by the Priv spec.
Also, on such implementations, overflow is defined to occur when the highest implemented bit transitions from 1 to 0." Given that, software can do the right thing regardless of implemented bit width.

Good point.  The current proposed definition doesn't properly comprehend the WARL nature of the hpmcounter registers.  I'll switch to a definition along the lines of what you describe (I agree that that is what is needed).  Count values remain as unsigned values and overflow is unsigned overflow of the implemented bits.

Join to automatically receive all group messages.