Re: A proposal to enhance RISC-V HPM (Hardware Performance Monitor)

Greg Favor

Ah, I see.  The 'marked' bit is state associated with and managed by the running code, not state associated with a counter.  A counter could then be configured (via its event selection) to count a selected event only while the current 'marked' bit is set, or only while it is clear.

As you note, for everyone using a perf-style approach, this 'marked' bit is not so useful.  For the bare-metal embedded customers that might want it, this bit of state would need to be added to some other existing or new CSR, distinct from the current hpmcounter/mhpmevent CSRs.  That sounds like a separate (small) extension, orthogonal to the current discussion, targeted at this embedded segment.


On Mon, Jul 20, 2020 at 10:26 PM Brian Grayson <brian.grayson@...> wrote:
The 'marked bit' in my proposal is different from a per-counter active bit, and may be a bit hard to explain well, but I'll try.

To me, there are two very different performance monitor approaches for system software:

- Linux-style, where one uses a tool like perf, and perfmon state is saved and restored on context switches. In this case, a 'marked bit' is not needed, as one can just control everything on a per-process basis.

- embedded bare-metal whole-system performance monitoring, where there is no support like perf. This is where the value of the 'marked bit' becomes more obvious: one configures the counters to be free-running, except for the masking. For example, consider an embedded application that has its bread-and-butter ordinary work but also some kind of exceptional work (bad packet checksum, new route to establish, network link up/down, etc., as networking examples). One could set the marked bit on entry to these routines and clear it on exit, allowing easy profiling of everything, of just the ordinary work, or of just the exceptional work. The same could be done by having each entry/exit point reprogram N counters, or by altering a global mcountinhibit, but both of those approaches fall short. The first forces either a recompile of the system software whenever you want to change events, or a swap-in/swap-out of perfmon state (just like a context switch) when entering/leaving these routines, whereas the marked bit just requires setting a single bit. The second (using mcountinhibit) forces one to choose between counting and not counting; it doesn't allow you to count marked-bit activities on one counter, non-marked-bit activities on another counter, and all activities (both marked and unmarked) on a third counter.
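
To make the intended semantics concrete, here is a small behavioural model in Python. It is only a sketch of the idea, not a proposed encoding: the names `Filter`, `Counter`, `Hpm`, `signal` and `run_workload` are all made up for illustration, and no such CSR field exists in the current spec. It models one global marked bit plus three free-running counters that select the same event with different marked-bit filters.

```python
from dataclasses import dataclass
from enum import Enum


class Filter(Enum):
    # Hypothetical per-counter filter, imagined as part of the event selection.
    ALL = "all"            # count regardless of the marked bit
    MARKED = "marked"      # count only while the marked bit is set
    UNMARKED = "unmarked"  # count only while the marked bit is clear


@dataclass
class Counter:
    event: str
    filt: Filter
    value: int = 0


class Hpm:
    """Toy model: one global marked bit, several free-running counters."""

    def __init__(self, counters):
        self.marked = False
        self.counters = counters

    def signal(self, event, n=1):
        """Report n occurrences of an event to every counter that selects it."""
        for c in self.counters:
            if c.event != event:
                continue
            if (c.filt is Filter.ALL
                    or (c.filt is Filter.MARKED and self.marked)
                    or (c.filt is Filter.UNMARKED and not self.marked)):
                c.value += n


def run_workload():
    hpm = Hpm([
        Counter("cycles", Filter.ALL),
        Counter("cycles", Filter.MARKED),
        Counter("cycles", Filter.UNMARKED),
    ])
    hpm.signal("cycles", 100)   # ordinary work
    hpm.marked = True           # entry to an exceptional routine
    hpm.signal("cycles", 30)    # exceptional work
    hpm.marked = False          # exit from the routine
    hpm.signal("cycles", 50)    # ordinary work again
    return [c.value for c in hpm.counters]
```

Note that the routine entry/exit only toggles the single marked bit; none of the counters is reprogrammed or inhibited, which is the point of the comparison with the two alternatives above.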

If all of the embedded customers will be using a perf-style approach with context-switching of perfmon registers, I think I can agree that the marked bit is not as useful, but I don't think that will be true for all of our customers.

