Given the discussions about cache-ops and the name for them on tech-cmo and the desire to avoid "co", "COF" (which can also mean "change of flow") may not be the best choice for the extension short-name. What about just "Sshpm", as this extension is what really allows the HPM to be well-utilized by tools like perf? Or is that too confusing since hpm already exists?
Using 'hpm' probably would be a bit confusing. But I'll look into alternatives. Btw, since a new extension naming standard is being developed, ultimately this (and all other extensions) will need to conform to the new scheme (although the 'Ss' part of this name is expected to be consistent with that new scheme). Also note that CMO group extensions will have "Zi*" names, and the concern over use of "co" or "cop" as a root name was particularly in that context (i.e. wrt other Unpriv spec extensions, whereas this extension is in the "S" name space for Priv extensions). But in any case I'll explore alternatives that may be acceptable.
Is there a reason there is no mcountovf? It would simplify the software for an M-mode tool, and for cores that don't have an S-mode.
This has been discussed (with the lead architects; I'll stop repeatedly mentioning this). In keeping with standard RISC philosophy, it was considered to have insufficient justification. For a core with S-mode, if M-mode wants to examine the bits for counters that have not been "delegated" down to S-mode via mcounteren, then M-mode can either use a three-instruction sequence to read a version of scountovf unaffected by mcounteren, or it can directly check the individual mhpmevent.OF bits that it cares about. The latter also applies to a core without S-mode that implements this extension. (Further, I imagine a "no S-mode" CPU probably only implements a small number of counters.)
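As a rough illustration of the second option (checking the individual mhpmevent.OF bits), here is a minimal Python model, not real CSR access. It assumes OF sits in the top bit (bit 63) of each mhpmevent register and that bit n of the resulting bitmap corresponds to counter n; the function name and the register contents are hypothetical.

```python
# Assumption: OF is the most significant bit of each 64-bit mhpmevent register.
OF_BIT = 63


def overflow_bitmap(mhpmevent, counters_of_interest):
    """Build a scountovf-style bitmap from individual mhpmevent OF bits.

    mhpmevent: dict mapping counter number -> 64-bit mhpmevent value
    counters_of_interest: counter numbers M-mode cares about
    """
    bitmap = 0
    for n in counters_of_interest:
        if (mhpmevent[n] >> OF_BIT) & 1:
            bitmap |= 1 << n
    return bitmap


# Hypothetical register contents: counters 3 and 5 have OF set, counter 4 does not.
mhpmevent = {3: (1 << 63) | 0x11, 4: 0x22, 5: (1 << 63) | 0x33}
bitmap = overflow_bitmap(mhpmevent, [3, 4, 5])
print(bin(bitmap))
```

On real hardware each dictionary lookup would be a CSR read of the corresponding mhpmevent register, which is why this only stays cheap when few counters are of interest.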
How is overflow defined for an implementation that implements 32<n<64 bits in the counter registers? Although the registers are architecturally 64 bits, an implementation may not want to support all of them.
The Priv spec says "The mhpmcounters are WARL registers that support up to 64 bits of precision". This allows complete flexibility for how many implemented bits there are.
Since count values are defined as unsigned, there is always an equivalent unsigned 64-bit current count value irrespective of the implemented size. So overflow is well-defined (modulo the issue down below).
Mandating full 64-bit counters may make an implementation area-prohibitive for the smallest of perfmon-enabled embedded cores. I think this could be specified like this: "An implementation may implement fewer than 64 bits for the hpmcounter CSRs. On such an implementation, software can query the bit width of the hpmcounter registers by taking advantage of the WARL behavior: writing all 1's and reading back to see which bits retained the set value.
This would be an issue to raise with the existing Priv spec, not with this extension. But as noted above, this isn't really an issue since it is already comprehended by the Priv spec.
Also, on such implementations, overflow is defined to occur when the highest implemented bit transitions from 1 to 0." Given that, software can do the right thing regardless of implemented bit width.
Good point. The current proposed definition doesn't properly comprehend the WARL nature of the hpmcounter registers. I'll switch to a definition along the lines of what you describe (I agree that that is what is needed). Count values remain as unsigned values and overflow is unsigned overflow of the implemented bits.
Greg
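For reference, the width probe and the "unsigned overflow of the implemented bits" definition discussed in this thread can be sketched with a toy Python model. This is only an illustration: `WarlCounter` and `probe_width` are hypothetical names, and the model assumes the implemented bits are the contiguous low-order bits of the register and that unimplemented bits read as zero.

```python
MASK64 = (1 << 64) - 1


class WarlCounter:
    """Toy model of an hpmcounter that implements fewer than 64 bits."""

    def __init__(self, implemented_bits):
        # Assumption: implemented bits are the contiguous low-order bits.
        self._mask = (1 << implemented_bits) - 1
        self._value = 0

    def write(self, value):
        # WARL behavior in this model: unimplemented bits silently drop to 0.
        self._value = value & MASK64 & self._mask

    def read(self):
        return self._value

    def increment(self, n=1):
        """Advance the count; return True on unsigned overflow of the
        implemented bits (the top implemented bit wraps from 1 to 0)."""
        total = self._value + n
        self._value = total & self._mask
        return total > self._mask


def probe_width(counter):
    """Write all 1s, read back, and count the bits that stuck."""
    saved = counter.read()
    counter.write(MASK64)
    width = counter.read().bit_length()
    counter.write(saved)  # restore the original count
    return width


c = WarlCounter(40)
print(probe_width(c))        # width discovered via the WARL probe
c.write((1 << 40) - 1)       # largest representable count
print(c.increment(), c.read())
```

Because the count is treated as unsigned and the wrap is detected on the implemented bits only, the same software logic works for any implemented width up to 64.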