Fast-track extension proposal for "Hardware Performance Monitor count overflow and mode-based event filtering"


Greg Favor
 

Hi all,

Recently the TSC established a lightweight "fast track" architecture extension process that small, straightforward, relatively uncontentious arch extension proposals can utilize.  This is the second of two small Privileged-architecture extensions, both requested by a number of people and companies over the past year, that Andrew and I discussed moving through this process sooner rather than later (especially since this entails much more than simply developing a spec).  The following starts with an intro for context and then provides the draft spec.

Note that the draft spec is written as the actual changes to be made to existing paragraphs of Priv spec text (or additional paragraphs and/or sections within the existing text).  The surrounding sentence(s) of a change are included for context.  Text in square brackets is temporary commentary that is not part of the proposed spec changes.

In anticipation of some questions that may arise in people's minds, I'll note that this extension has been extensively reviewed by the lead architects of the Privileged and Hypervisor architectures for consistency with the current architecture (including little things like extension, CSR, and bit/field names).  Various changes were made along the way because of this.

===============================================================================
Introduction

The current Privileged specification defines mhpmevent CSRs to select and control event counting by the associated hpmcounter CSRs, but does not standardize any fields within these CSRs.  For at least Linux-class rich-OS systems it is desirable to standardize certain basic features that are broadly desired (these have come up over the past year plus on RISC-V lists and have been the subject of past proposals).  This enables standard upstream software support, eliminating the need for each implementation to provide its own custom software support.  (Implementations are free, of course, to not implement this extension.)

This proposal accomplishes exactly this within the existing mhpmevent CSRs (and correspondingly avoids the unnecessary creation of whole new sets of CSRs, beyond a single new CSR).

Below is a one-page draft spec of the proposal, which sticks to addressing two basic, well-understood needs that have been requested by various people.  The proposed extension name is "Sscof" ('Ss' for Privileged arch and Supervisor-level extensions, and 'cof' for Count Overflow and Filtering).  There are other features that various people may desire (and that even I would desire) that don't have clear-cut, non-contentious, and relatively broad support.  These can be grist for separate discussions and possibly another arch extension by a motivated party that gathers a sufficient degree of consensus.

One such feature worth highlighting is a WrEn bit in mhpmevent that would allow lower privilege modes that can read the associated hpmcounter CSR (based on the *counteren CSRs) to also write it, in essence enabling direct S/VS-mode and U/VU-mode write access instead of always requiring OpenSBI calls up to M-mode.  But this feature has had some contention, involves some details to properly support virtualization, and requires allocating a second set of "User-Read-Write" hpmcounter CSR numbers (since the current hpmcounter CSRs are "User-Read-Only").  If there is a broad upwelling of support and justification for this feature, and some party is willing to put together a complete spec (including virtualization support), then this could be another fast-track extension.

Lastly note that the new count overflow interrupt will be treated as a standard local interrupt that is assigned to bit 13 in the mip/mie/sip/sie registers.  (This has been discussed and agreed to with key Priv Arch people.)

This posting starts an initial review period (over the next few weeks) for people to provide feedback, questions, comments, etc.

================================================================================
Proposed Spec

=======================================================================
=======================  Machine-Level ISA Additions  ========================

Hardware Performance Monitor
[ This extension expands the hardware performance monitor description and extends the mhpmevent registers to 64 bits (in RV32) as follows: ]

The hardware performance monitor includes 29 additional 64-bit event counters and 29 associated 64-bit event selector registers - the mhpmcounter3–mhpmcounter31 and mhpmevent3–mhpmevent31 CSRs.

The mhpmcounters are WARL registers that support up to 64 bits of precision on RV32 and RV64. 

The mhpmeventn registers are WARL registers that control which event causes the corresponding counter to increment and what happens when the corresponding count overflows.  Only a few bits are defined here.  Beyond these, the actual selection and meaning of events is defined by the platform, but (mhpmevent == 0) is defined to mean "no event", in which case the corresponding counter is never incremented.  Typically the lower bits of mhpmevent will be used for event selection purposes.

On RV32 only, reads of the mcycle, minstret, mhpmcountern, and mhpmeventn CSRs return the low 32 bits, while reads of the mcycleh, minstreth, mhpmcounternh, and mhpmeventnh CSRs return bits 63–32 of the corresponding counter or event selector.  [ The proposed CSR numbers for mhpmeventnh are 0x723 - 0x73F. ]
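Because RV32 software sees each 64-bit counter as two 32-bit CSRs, a counter that keeps running can carry from the low half into the high half between the two reads. The sketch below shows the common high/low/high retry pattern for that case. It is a minimal off-target sketch: `csr_read32_fn`, `read_counter64_rv32`, and the `fake_counter` stand-in are hypothetical names used only for illustration; on real hardware each reader would be a single csrr of the low or high CSR. The mhpmeventn/mhpmeventnh pair does not change on its own, so no retry is needed there, but note that on RV32 the OF and xINH bits all live in mhpmeventnh (OF becomes bit 31 of the high half).

```c
#include <stdint.h>

/* Reader for one 32-bit CSR half. On real RV32 hardware this would be a single
   csrr of, e.g., mhpmcounter3 (low half) or mhpmcounter3h (high half); a function
   pointer is used here only so the logic can run off-target. */
typedef uint32_t (*csr_read32_fn)(void *ctx);

/* Read a free-running 64-bit counter as two halves. If the high half changes
   between its two reads, a carry out of the low half happened in between,
   so the whole read is retried. */
static uint64_t read_counter64_rv32(csr_read32_fn read_hi, csr_read32_fn read_lo, void *ctx)
{
    uint32_t hi, lo, hi2;
    do {
        hi  = read_hi(ctx);
        lo  = read_lo(ctx);
        hi2 = read_hi(ctx);
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical off-target stand-in: every access advances the count by `step`. */
struct fake_counter { uint64_t value; uint64_t step; };

static uint32_t fake_read_hi(void *ctx)
{
    struct fake_counter *c = ctx;
    uint32_t half = (uint32_t)(c->value >> 32);
    c->value += c->step;
    return half;
}

static uint32_t fake_read_lo(void *ctx)
{
    struct fake_counter *c = ctx;
    uint32_t half = (uint32_t)c->value;
    c->value += c->step;
    return half;
}
```

Starting the fake counter at 0xFFFFFFFE forces a carry during the first attempt; the loop detects the changed high half and the second attempt returns a consistent value.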

The following bits are added to mhpmevent:

bit [63]  OF     -  Overflow status and interrupt disable bit that is set when counter overflows

bit [62]  MINH   -  If set, then counting of events in M-mode is inhibited
bit [61]  SINH   -  If set, then counting of events in S/HS-mode is inhibited
bit [60]  UINH   -  If set, then counting of events in U-mode is inhibited
bit [59]  VSINH  -  If set, then counting of events in VS-mode is inhibited
bit [59]  VUINH  -  If set, then counting of events in VU-mode is inhibited
bit [58]  0      -  Reserved for possible future modes
bit [57]  0      -  Reserved for possible future modes

Each of the five 'x'INH bits, when set, inhibits counting of events while in privilege mode 'x'.  If all of these bits are zero, events are counted in all modes.
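A minimal sketch of how software might test the mode filter, assuming the bit positions listed above. Two assumptions are labeled in the code: the draft lists both VSINH and VUINH at bit 59, so this sketch places VUINH at bit 58; and bits 55:0 are treated as the platform-defined event selector. The macro, enum, and function names are hypothetical.

```c
#include <stdint.h>

/* mhpmevent control bits from this draft. ASSUMPTION: VUINH is placed at bit 58,
   treating its listing at bit 59 (shared with VSINH) as a typo. */
#define MHPMEVENT_OF    (UINT64_C(1) << 63)
#define MHPMEVENT_MINH  (UINT64_C(1) << 62)
#define MHPMEVENT_SINH  (UINT64_C(1) << 61)
#define MHPMEVENT_UINH  (UINT64_C(1) << 60)
#define MHPMEVENT_VSINH (UINT64_C(1) << 59)
#define MHPMEVENT_VUINH (UINT64_C(1) << 58)

/* ASSUMPTION for illustration: bits 55:0 hold the platform-defined event
   selector; bits 63:56 are OF, the xINH bits, and the reserved bits. */
#define MHPMEVENT_SELECTOR_MASK ((UINT64_C(1) << 56) - 1)

enum priv_mode { MODE_M, MODE_S, MODE_U, MODE_VS, MODE_VU };

/* The xINH bit that inhibits counting while in `mode`. */
static uint64_t inh_bit(enum priv_mode mode)
{
    switch (mode) {
    case MODE_M:  return MHPMEVENT_MINH;
    case MODE_S:  return MHPMEVENT_SINH;   /* S or HS mode */
    case MODE_U:  return MHPMEVENT_UINH;
    case MODE_VS: return MHPMEVENT_VSINH;
    case MODE_VU: return MHPMEVENT_VUINH;
    }
    return 0;
}

/* Nonzero if an event occurring in `mode` is counted: a zero selector means
   "no event"; otherwise the mode's xINH bit decides. */
static int counts_in_mode(uint64_t mhpmevent, enum priv_mode mode)
{
    if ((mhpmevent & MHPMEVENT_SELECTOR_MASK) == 0)
        return 0;
    return (mhpmevent & inh_bit(mode)) == 0;
}
```

For example, with a made-up selector value 0x12, writing `0x12 | MHPMEVENT_MINH | MHPMEVENT_SINH` would count that event everywhere except M-mode and S/HS-mode, and configures the counter in a single CSR write.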

The OF bit is set when the corresponding hpmcounter overflows, and remains set until written by software.  Since hpmcounter values are unsigned values, overflow is defined as unsigned overflow.  [ This matches x86 and ARMv8. ]  Note that there is no loss of information after an overflow since the counter wraps around and keeps counting while the sticky OF bit remains set.  [ For a 64-bit counter it will be an awfully long time before another overflow could possibly occur. ]
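Because counts are unsigned, overflow is simply a wrap modulo 2^64, which a one-line comparison detects. The same arithmetic gives the usual way to get an interrupt after a chosen number of events: preload the counter with 2^64 minus the period. A minimal sketch, with hypothetical function names, assuming full 64-bit counters:

```c
#include <stdint.h>

/* Advance a 64-bit counter by `n` events and report whether it wrapped.
   Unsigned addition wraps modulo 2^64, so a result smaller than the old
   value means the counter overflowed. */
static int counter_add(uint64_t *counter, uint64_t n)
{
    uint64_t old = *counter;
    *counter = old + n;
    return *counter < old;
}

/* Initial value that makes the counter overflow on the `period`-th event
   (period >= 1): 2^64 - period, computed by unsigned negation. */
static uint64_t preload_for_period(uint64_t period)
{
    return (uint64_t)0 - period;
}
```

After the wrap the counter simply keeps counting from zero, which is why the sticky OF bit plus the counter value loses no information.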

If supervisor mode is implemented, the 32-bit scountovf register contains read-only shadow copies of the OF bits in all 32 mhpmevent registers.

If an hpmcounter overflows while the associated OF bit is zero, then a "count overflow interrupt request" is generated.  If the OF bit is one, then no interrupt request is generated.  Consequently the OF bit also functions as a count overflow interrupt disable for the associated hpmcounter.

----------------------------  Non-Normative Text    ----------------------------
There are no separate overflow status and overflow interrupt enable bits.  In practice, enabling overflow interrupt generation (by clearing the OF bit) is done in conjunction with initializing the counter to a starting value.  Once a counter has overflowed, it and the OF bit must be reinitialized before another overflow interrupt can be generated.
----------------------------------------------------------------------------------------
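The rules above (OF is sticky, a request is generated only if OF was clear, and generating a request sets both OF and LCOFIP) can be captured in a small software model. This is a minimal behavioral sketch, not a hardware description; `struct hpm_model`, `hpm_count`, and `hpm_arm` are hypothetical names, and a single counter is modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Behavioral model of one counter under this draft. */
struct hpm_model {
    uint64_t counter;   /* mhpmcounterN */
    bool of;            /* mhpmeventN.OF */
    bool lcofip;        /* mip.LCOFIP, shared by all counters */
};

/* Count `n` events. On wrap, OF is set; an overflow interrupt request (which
   sets LCOFIP) is raised only if OF was clear beforehand. Returns true if a
   request was raised. */
static bool hpm_count(struct hpm_model *m, uint64_t n)
{
    uint64_t old = m->counter;
    m->counter = old + n;
    if (m->counter >= old)
        return false;               /* no overflow */
    bool request = !m->of;          /* OF set acts as interrupt disable */
    m->of = true;
    if (request)
        m->lcofip = true;
    return request;
}

/* Re-arm: reinitialize the counter and clear OF together, so the counter
   overflows (and requests an interrupt) on the `period`-th event. */
static void hpm_arm(struct hpm_model *m, uint64_t period)
{
    m->counter = (uint64_t)0 - period;
    m->of = false;
}
```

A second wrap while OF is still set leaves LCOFIP untouched, which is why software re-arms the counter before expecting another interrupt.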

This "count overflow interrupt request" signal is treated as a standard local interrupt that corresponds to bit 13 in the mip/mie/sip/sie registers.  The mip/sip LCOFIP and mie/sie LCOFIE bits are respectively the interrupt-pending and interrupt-enable bits for this interrupt.  ('LCOFI' represents 'Local Count Overflow Interrupt'.)  [ This proposal doesn't try to introduce per-privilege mode overflow interrupt request signals.  ARMv8 doesn't have this and I don't think x86 does either. ]
 
Generation of a "count overflow interrupt request" by an hpmcounter sets the LCOFIP bit in the mip/sip registers and sets the associated OF bit.  The LCOFIP bit is cleared by software after servicing the count overflow interrupt resulting from one or more count overflows.

----------------------------  Non-Normative Text    ----------------------------
Software can maintain a bit mask to distinguish newly overflowed counters (yet to be serviced by an overflow interrupt handler) from overflowed counters that have already been serviced or that are configured to not generate an interrupt on overflow.
----------------------------------------------------------------------------------------
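One way to keep the bit mask described in the note above, as a minimal sketch: the handler takes a snapshot of the OF bits (e.g. from scountovf, where bit X mirrors mhpmeventX.OF) and filters out counters it has already serviced or has deliberately configured not to interrupt. `struct ovf_tracker`, `take_new_overflows`, and `rearm` are hypothetical names; the snapshot is passed in rather than read from the CSR so the logic runs anywhere.

```c
#include <stdint.h>

/* Hypothetical handler bookkeeping. `handled` marks counters whose overflow
   was already processed; `no_irq` marks counters configured not to interrupt
   (e.g. left with OF preset to 1). */
struct ovf_tracker {
    uint32_t handled;
    uint32_t no_irq;
};

/* Given a snapshot `ovf` of the OF bits, return the counters that newly
   overflowed and still need servicing, and mark them as handled. */
static uint32_t take_new_overflows(struct ovf_tracker *t, uint32_t ovf)
{
    uint32_t fresh = ovf & ~t->handled & ~t->no_irq;
    t->handled |= fresh;
    return fresh;
}

/* Call when software re-arms counter `x` (x < 32) and clears its OF bit. */
static void rearm(struct ovf_tracker *t, unsigned x)
{
    t->handled &= ~(UINT32_C(1) << x);
}
```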

Machine Interrupt Registers (mip and mie)
[ This extension adds the description of the LCOFIP/LCOFIE bits in these registers (and modifies related text) as follows: ]

LCOFIP is added to mip in Figure 3.14 as bit 13.  LCOFIE is added to mie in Figure 3.15 as bit 13.

If the Sscof extension is implemented, bits mip.LCOFIP and mie.LCOFIE are the interrupt-pending and interrupt-enable bits for local count overflow interrupts.  LCOFIP is read-write in mip and reflects the occurrence of a local count overflow interrupt request resulting from any of the mhpmeventn.OF bits being set.   If the Sscof extension is not implemented, these LCOFIP and LCOFIE bits are hardwired to zeros.

Multiple simultaneous interrupts destined for different privilege modes are handled in decreasing order of destined privilege mode. Multiple simultaneous interrupts destined for the same privilege mode are handled in the following decreasing priority order: MEI, MSI, MTI, SEI, SSI, STI, LCOFI.
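The same-mode priority order above can be expressed as a lookup over the standard mip/mie bit numbers, with LCOFI at bit 13 as proposed here. A minimal sketch with hypothetical names; it only resolves interrupts already known to target the same privilege mode and does not model the cross-mode ordering.

```c
#include <stdint.h>

/* Standard mip/mie bit numbers, plus LCOFI at bit 13 as proposed. */
enum {
    IRQ_SSI = 1, IRQ_MSI = 3, IRQ_STI = 5, IRQ_MTI = 7,
    IRQ_SEI = 9, IRQ_MEI = 11, IRQ_LCOFI = 13
};

/* Decreasing priority order for interrupts destined for the same mode. */
static const int irq_priority[] = {
    IRQ_MEI, IRQ_MSI, IRQ_MTI, IRQ_SEI, IRQ_SSI, IRQ_STI, IRQ_LCOFI
};

/* Given the bits that are both pending and enabled (mip & mie) for one
   destination mode, return the interrupt to take first, or -1 if none. */
static int highest_priority_irq(uint64_t pending_enabled)
{
    for (unsigned i = 0; i < sizeof irq_priority / sizeof irq_priority[0]; i++)
        if (pending_enabled & (UINT64_C(1) << irq_priority[i]))
            return irq_priority[i];
    return -1;
}
```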

=========================================================================
=======================  Supervisor-Level ISA Additions  ========================

Supervisor Interrupt Registers (sip and sie)
[ This extension adds the description of the LCOFIP/LCOFIE bits in these registers (and modifies related text) as follows: ]

LCOFIP is added to sip in Figure 4.6 as bit 13.  LCOFIE is added to sie in Figure 4.7 as bit 13.

If the Sscof extension is implemented, bits sip.LCOFIP and sie.LCOFIE are the interrupt-pending and interrupt-enable bits for local count overflow interrupts.  LCOFIP is read-write in sip and reflects the occurrence of a local count overflow interrupt request resulting from any of the mhpmeventn.OF bits being set.  If the Sscof extension is not implemented, these LCOFIP and LCOFIE bits are hardwired to zeros. 

Each standard interrupt type (LCOFI, SEI, STI, or SSI) may not be implemented, in which case the corresponding interrupt-pending and interrupt-enable bits are hardwired to zeros.  All bits in sip and sie are WARL fields.

Multiple simultaneous interrupts destined for supervisor mode are handled in the following decreasing priority order: SEI, SSI, STI, LCOFI.

Supervisor Count Overflow (scountovf)
[ This extension adds this new CSR. ]

The scountovf CSR is a 32-bit read-only register that contains shadow copies of the OF bits in the 32 mhpmevent CSRs - where scountovf bit X corresponds to mhpmeventX.  The proposed CSR number is 0xD33.

This register enables supervisor-level overflow interrupt handler software to quickly and easily determine which counter(s) have overflowed (without needing to make an execution environment call or series of calls ultimately up to M-mode).  [ ARMv8 and x86 have a similar register for the same reasons. ]

Read access to bit X is subject to the same mcounteren (or mcounteren and hcounteren) CSRs that mediate access to the hpmcounter CSRs by S-mode (or VS-mode).  In M and S modes, scountovf bit X is readable when mcounteren bit X is set, and otherwise reads as zero.  Similarly, in VS mode, scountovf bit X is readable when mcounteren bit X and hcounteren bit X are both set, and otherwise reads as zero. 
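A minimal sketch of the value a read of scountovf would return under the access rule above. Function names are hypothetical, and the OF bits are gathered from a plain array standing in for the mhpmevent CSRs; only mhpmevent3 through mhpmevent31 exist, so bits 0 to 2 stay zero here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Collect mhpmeventX.OF (bit 63) into bit X, for X = 3..31. */
static uint32_t collect_of_bits(const uint64_t mhpmevent[32])
{
    uint32_t bits = 0;
    for (unsigned x = 3; x < 32; x++)
        if (mhpmevent[x] >> 63)
            bits |= UINT32_C(1) << x;
    return bits;
}

/* Value seen when reading scountovf: in M and S modes, bit X is visible only
   if mcounteren bit X is set; in VS mode, hcounteren bit X must also be set.
   Hidden bits read as zero. */
static uint32_t scountovf_read(uint32_t of_bits, uint32_t mcounteren,
                               uint32_t hcounteren, bool in_vs_mode)
{
    uint32_t visible = of_bits & mcounteren;
    if (in_vs_mode)
        visible &= hcounteren;
    return visible;
}
```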


Greg Favor
 

One typo crept by me and some other pre-reviewers:  scountovf contains shadow copies of the OF bits in the 29 mhpmevent CSRs (i.e. mhpmevent3-mhpmevent31).

Greg


On Sun, Jan 31, 2021 at 10:38 PM Greg Favor <gfavor@...> wrote:


Phil McCoy
 

Could you clarify how this extension interacts with mideleg?  I assume interrupt 13 would be taken in M-mode by default unless it is delegated to S-mode, but it would be nice to state this explicitly.

For implementations that support the Hypervisor extension, hideleg, hvip, hip, hie, vsip and vsie would also be of interest.

Thanks,
Phil


Greg Favor
 

On Mon, Feb 1, 2021 at 6:29 AM Phil McCoy <pnm@...> wrote:
Could you clarify how this extension interacts with mideleg?  I assume interrupt 13 would be taken in M-mode by default unless it is delegated to S-mode, but it would be nice to state this explicitly.

You're correct.  Standard mideleg functionality applies.  I'll incorporate a clarification note.
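To make the standard mideleg behavior concrete for interrupt 13, here is a minimal sketch. Names are hypothetical and the Hypervisor extension (hideleg, VS-level registers) is ignored. An undelegated LCOFI targets M-mode; a delegated one targets S-mode. An interrupt is taken when executing in a less-privileged mode than its target, or in the target mode with that mode's global enable set, and never while in a more-privileged mode.

```c
#include <stdbool.h>
#include <stdint.h>

#define LCOFI_BIT 13

enum priv { PRIV_U = 0, PRIV_S = 1, PRIV_M = 3 };

/* Mode an LCOFI traps to: M by default, S if mideleg bit 13 is set. */
static enum priv lcofi_target(uint64_t mideleg)
{
    return ((mideleg >> LCOFI_BIT) & 1) ? PRIV_S : PRIV_M;
}

/* Whether a local count overflow interrupt is taken right now. `mie` covers
   both M and S views, since sie.LCOFIE aliases mie.LCOFIE when delegated. */
static bool lcofi_taken(uint64_t mip, uint64_t mie, uint64_t mideleg,
                        enum priv cur, bool mstatus_mie, bool mstatus_sie)
{
    uint64_t bit = UINT64_C(1) << LCOFI_BIT;
    if (!(mip & mie & bit))
        return false;                  /* not pending or not enabled */
    enum priv target = lcofi_target(mideleg);
    if (cur < target)
        return true;                   /* lower mode: always globally enabled */
    if (cur > target)
        return false;                  /* higher mode: never taken */
    return target == PRIV_M ? mstatus_mie : mstatus_sie;
}
```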
 
For implementations that support the Hypervisor extension, hideleg, hvip, hip, hie, vsip and vsie would also be of interest.

As you note, this starts getting into adding a number of bits and associated functionality.  The broader arch consistency question is whether this is the best path for the architecture going forward as other local interrupts come into being.

This has specifically been discussed with the lead Priv architects and the view is that there is a different and better way to support "delegation" of local interrupts into a VM.  (In particular, the new virtualization-aware next gen interrupt architecture will properly support this.  A working group on this will be starting shortly; I believe a public announcement is imminent.)

Greg


Brian Grayson
 

Given the discussions about cache-ops and the name for them on tech-cmo and the desire to avoid "co", "COF" (which can also mean "change of flow") may not be the best choice for the extension short-name. What about just "Sshpm", as this extension is what really allows the HPM to be well-utilized by tools like perf? Or is that too confusing since hpm already exists?

I like the concept of putting overflow and filtering control into the mhpmevent registers -- single write to completely configure a counter.

Is there a reason there is no mcountovf? It would simplify the software for an M-mode tool, and for cores that don't have an S-mode.

How is overflow defined for an implementation that implements 32<n<64 bits in the counter registers? Although the registers are architecturally 64 bits, an implementation may not want to support all of them. Mandating full 64-bit counters may make an implementation area-prohibitive for the smallest of perfmon-enabled embedded cores. I think this could be specified like this: "An implementation may implement fewer than 64 bits for the hpmcounter CSRs. On such an implementation, software can query the bit width of the hpmcounter registers by taking advantage of the WARL behavior: writing all 1's and reading back to see which bits retained the set value. Also, on such implementations, overflow is defined to occur when the highest implemented bit transitions from 1 to 0." Given that, software can do the right thing regardless of implemented bit width.

Brian


Greg Favor
 

On Mon, Feb 1, 2021 at 10:00 AM Brian Grayson <brian.grayson@...> wrote:
Given the discussions about cache-ops and the name for them on tech-cmo and the desire to avoid "co", "COF" (which can also mean "change of flow") may not be the best choice for the extension short-name. What about just "Sshpm", as this extension is what really allows the HPM to be well-utilized by tools like perf? Or is that too confusing since hpm already exists?

Using 'hpm' probably would be a bit confusing, but I'll look into alternatives.  Btw, since a new extension naming standard is being developed, ultimately this (and every other extension) will need to conform to the new scheme (although the 'Ss' part of this name is expected to be consistent with it).  Also note that CMO group extensions will have "Zi*" names, and the concern over using "co" or "cop" as a root name arose in that context (i.e. wrt other Unpriv spec extensions), whereas this extension is in the "S" namespace for Priv extensions.  But in any case I'll explore alternatives that may be acceptable.
 
Is there a reason there is no mcountovf? It would simplify the software for an M-mode tool, and for cores that don't have an S-mode.

This has been discussed (with the lead architects; I'll stop repeatedly mentioning this).  In standard RISC philosophy form, it was considered to have insufficient justification.  For a core with S-mode, if M-mode wants to examine the bits for counters that have not been "delegated" down to S-mode via mcounteren, it can either use a three-instruction sequence to read a version of scountovf unaffected by mcounteren, or directly check the individual mhpmevent.OF bits that it cares about.  The latter also applies to a core without S-mode that implements this extension.  (Further, I imagine a "no S-mode" CPU probably only implements a small number of counters.)
 
How is overflow defined for an implementation that implements 32<n<64 bits in the counter registers? Although the registers are architecturally 64 bits, an implementation may not want to support all of them.

The Priv spec says "The mhpmcounters are WARL registers that support up to 64 bits of precision".  This allows complete flexibility for how many implemented bits there are.

Since count values are defined as unsigned, there is always an equivalent unsigned 64-bit current count value irrespective of the implemented size.  So overflow is well-defined (modulo the issue down below).
 
Mandating full 64-bit counters may make an implementation area-prohibitive for the smallest of perfmon-enabled embedded cores. I think this could be specified like this: "An implementation may implement fewer than 64 bits for the hpmcounter CSRs. On such an implementation, software can query the bit width of the hpmcounter registers by taking advantage of the WARL behavior: writing all 1's and reading back to see which bits retained the set value.

This would be an issue to raise with the existing Priv spec, not with this extension.  But as noted above, this isn't really an issue since it is already comprehended by the Priv spec.
 
Also, on such implementations, overflow is defined to occur when the highest implemented bit transitions from 1 to 0." Given that, software can do the right thing regardless of implemented bit width.

Good point.  The current proposed definition doesn't properly comprehend the WARL nature of the hpmcounter registers.  I'll switch to a definition along the lines of what you describe (I agree that that is what is needed).  Count values remain as unsigned values and overflow is unsigned overflow of the implemented bits.
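A minimal sketch of the behavior agreed here, under the assumption that the implemented counter bits are contiguous from bit 0 and unimplemented upper bits read as zero. `struct warl_counter` and the helper functions are hypothetical and model the CSR in software. Width is probed by writing all ones and reading back; overflow is the unsigned wrap of the implemented bits, i.e. the top implemented bit going from 1 to 0 as the count returns to zero.

```c
#include <stdint.h>

/* Hypothetical software model of a WARL counter with `width` implemented
   bits (1..64); unimplemented upper bits read as zero. */
struct warl_counter { unsigned width; uint64_t value; };

static uint64_t width_mask(unsigned width)
{
    return width >= 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;
}

static void warl_write(struct warl_counter *c, uint64_t v) { c->value = v & width_mask(c->width); }
static uint64_t warl_read(const struct warl_counter *c) { return c->value; }

/* Write all ones, read back, and count the low-order bits that stuck.
   Note that this clobbers the current count. */
static unsigned probe_width(struct warl_counter *c)
{
    warl_write(c, UINT64_MAX);
    uint64_t v = warl_read(c);
    unsigned n = 0;
    while (n < 64 && ((v >> n) & 1))
        n++;
    return n;
}

/* Count one event; returns 1 on unsigned overflow of the implemented bits. */
static int warl_increment(struct warl_counter *c)
{
    uint64_t next = (c->value + 1) & width_mask(c->width);
    c->value = next;
    return next == 0;
}
```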
 
Greg


Brian Grayson
 

I noticed another typo that I don't think has been pointed out -- reuse of bit 59.

bit [59]  VSINH       -  If set, then counting of events in VS-mode is inhibited
bit [59]  VUINH       -  If set, then counting of events in VU-mode is inhibited

Brian

On Mon, Feb 1, 2021 at 2:08 PM Greg Favor <gfavor@...> wrote:
On Mon, Feb 1, 2021 at 10:00 AM Brian Grayson <brian.grayson@...> wrote:
Given the discussions about cache-ops and the name for them on tech-cmo and the desire to avoid "co", "COF" (which can also mean "change of flow") may not be the best choice for the extension short-name. What about just "Sshpm", as this extension is what really allows the HPM to be well-utilized by tools like perf? Or is that too confusing since hpm already exists?

Using 'hpm' probably would be a bit confusing.  But I'll look into alternatives.  Btw, since a new extension naming standard is being developed, ultimately this (and all other extensions) will need to conform to the new scheme (although the 'Ss' part of this name is expected to be consistent with that new scheme).  Also note that CMO group extensions will have "Zi*" names and the concern over use of "co" or "cop" as a root name was particularly in that context (i.e. wrt other Unpriv spec extensions, whereas this extension is in the "S" name space for Priv extensions).  But in any case I'll explore alternatives that may be acceptable.
 
Is there a reason there is no mcountovf? It would simplify the software for an M-mode tool, and for cores that don't have an S-mode.

This has been discussed (with the lead architects; I'll stop repeatedly mentioning this).  And in standard RISC philosophy form, it was considered to have insufficient justification.  For a core with S-mode, if M-mode wants to examine the bits for counters that have not been "delegated" down to S-mode via mcounteren, then M-mode can either use a three-instruction sequence to read a version of scountovf unaffected by mcounteren, or directly check the individual mhpmevent.OF bits that it cares about.  The latter also applies to a core without S-mode that implements this extension.  (Further, I imagine a "no S-mode" CPU probably implements only a small number of counters.)
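A minimal sketch of the second option for M-mode (checking the OF bits directly), written as a Python model rather than real CSR accesses. The dictionary of register values, the function names, and the convention that bit n of the result stands for counter n are assumptions for illustration; OF is taken to be bit 63 as in the proposal.

```python
OF_BIT = 63  # mhpmevent.OF in this proposal


def overflow_bitmap(mhpmevent: dict) -> int:
    """Collect mhpmevent<n>.OF bits into a mask with bit n set per counter.

    `mhpmevent` maps counter index n (3..31) to a 64-bit register value and
    stands in for the CSR reads an M-mode handler would perform.
    """
    mask = 0
    for n, value in mhpmevent.items():
        if not 3 <= n <= 31:
            raise ValueError(f"there is no mhpmevent{n}")
        if (value >> OF_BIT) & 1:
            mask |= 1 << n
    return mask


def m_mode_overflows(ovf_mask: int, mcounteren: int) -> int:
    """Keep only overflowed counters that were not delegated via mcounteren."""
    return ovf_mask & ~mcounteren & 0xFFFFFFFF
```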
 
How is overflow defined for an implementation that implements 32<n<64 bits in the counter registers? Although the registers are architecturally 64 bits, an implementation may not want to support all of them.

The Priv spec says "The mhpmcounters are WARL registers that support up to 64 bits of precision".  This allows complete flexibility for how many implemented bits there are.

Since count values are defined as unsigned, there is always an equivalent unsigned 64-bit current count value irrespective of the implemented size.  So overflow is well-defined (modulo the issue down below).
 
Mandating full 64-bit counters may make an implementation area-prohibitive for the smallest of perfmon-enabled embedded cores. I think this could be specified like this: "An implementation may implement less than 64 bits for the hpmcounter CSRs. On such an implementation, software can query the bit width of the hpmcounter registers by taking advantage of the WARL behavior: writing all 1's and reading back to see which bits retained the set value.

This would be an issue to raise with the existing Priv spec, not with this extension.  But as noted above, this isn't really an issue since it is already comprehended by the Priv spec.
 
Also, on such implementations, overflow is defined to occur when the highest implemented bit transitions from 1 to 0." Given that, software can do the right thing regardless of implemented bit width.

Good point.  The current proposed definition doesn't properly comprehend the WARL nature of the hpmcounter registers.  I'll switch to a definition along the lines of what you describe (I agree that that is what is needed).  Count values remain as unsigned values and overflow is unsigned overflow of the implemented bits.
 
Greg


Greg Favor
 

Dang.  I think that typo crept in at the last second.  The bit numbering for the bottom three bits should be 58, 57, and 56 - resulting in the full top byte of mhpmevent being covered.

Thanks,
Greg
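To make the collision concrete, here is a small Python check (hypothetical helper names) comparing the inhibit-bit layout as first posted with the corrected numbering from this reply, where the bottom three entries move to bits 58, 57, and 56 so the fields cover the full top byte of mhpmevent.

```python
from collections import defaultdict

# mhpmevent top-byte layout as first posted (VUINH reuses bit 59)
AS_POSTED = [("OF", 63), ("MINH", 62), ("SINH", 61), ("UINH", 60),
             ("VSINH", 59), ("VUINH", 59), ("reserved", 58), ("reserved", 57)]

# Corrected layout: the bottom three entries move to 58, 57, 56
CORRECTED = [("OF", 63), ("MINH", 62), ("SINH", 61), ("UINH", 60),
             ("VSINH", 59), ("VUINH", 58), ("reserved", 57), ("reserved", 56)]


def collisions(layout):
    """Map each bit claimed by more than one field to the field names."""
    by_bit = defaultdict(list)
    for name, bit in layout:
        by_bit[bit].append(name)
    return {bit: names for bit, names in by_bit.items() if len(names) > 1}


def covers_top_byte(layout):
    """True if the fields occupy bits 63..56 exactly once each."""
    return sorted(bit for _, bit in layout) == list(range(56, 64))
```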
 

On Wed, Feb 3, 2021 at 8:57 AM Brian Grayson <brian.grayson@...> wrote:
I noticed another typo that I don't think has been pointed out -- reuse of bit 59.

bit [59]  VSINH       -  If set, then counting of events in VS-mode is inhibited
bit [59]  VUINH       -  If set, then counting of events in VU-mode is inhibited

Brian



Brian Grayson
 

Another thought from one of my debug coworkers: what do we want to do about debug (D-mode)? Having a bit that says "inhibit counting when we are in D-mode" would allow debuggers to do accurate perfmon on M-mode code. This support could be part of the debug extension, but that adds a tie-in across extensions. If we shift all the inhibit bits down and make debug the highest, that keeps things sane, and implementations that don't implement D-mode can ignore the bit.

Brian

On Wed, Feb 3, 2021 at 11:19 AM Greg Favor <gfavor@...> wrote:
Dang.  I think that typo crept in at the last second.  The bit numbering for the bottom three bits should be 58, 57, and 56 - resulting in the full top byte of mhpmevent being covered.

Thanks,
Greg
 



Greg Favor
 

Good question.  A couple of quick thoughts:

- D-mode is not a required part of the Debug spec, i.e. a fully compliant implementation of the full Debug spec may not use an "execution-based" approach and hence would not support D-mode.  I don't think this forces anything; I'm just noting that full support of the Debug spec does not imply presence of D-mode.

- The Priv and Unpriv architectures consistently say nothing about debug features (other than EBREAK with very little definition).  All things debug (and trace) are separated and encapsulated into the Debug and Trace specs.  Properly it should be the Debug spec that adds a D-mode filter bit as a new feature.  This is no different than it also being responsible to add H extension-related new features within the Debug spec.  And other new features that may arise because of other new extensions.  It would be up to the Debug TG to decide whether to add a D-mode filter bit in mhpmeventn, provide such a bit in a debug-related register, or whatever other option that it might settle on.

Greg



On Wed, Feb 3, 2021 at 11:31 AM Brian Grayson <brian.grayson@...> wrote:
Another thought from one of my debug coworkers: what do we want to do about debug (D-mode)? Having a bit that says "inhibit counting when we are in D-mode" would allow debuggers to do accurate perfmon on M-mode code. This support could be part of the debug extension, but that adds a tie-in across extensions. If we shift all the inhibit bits down and make debug the highest, that keeps things sane, and implementations that don't implement D-mode can ignore the bit.

Brian



Ernie Edgar
 

The Debug Spec has a count inhibit control bit already -- see the stopcount field in dcsr.

Ernie


On Wed, Feb 3, 2021 at 12:47 PM Greg Favor <gfavor@...> wrote:
Good question.  A couple of quick thoughts:

- D-mode is not a required part of the Debug spec, i.e. a fully compliant implementation of the full Debug spec may not use an "execution-based" approach and hence would not support D-mode.  I don't think this forces anything; I'm just noting that full support of the Debug spec does not imply presence of D-mode.

- The Priv and Unpriv architectures consistently say nothing about debug features (other than EBREAK with very little definition).  All things debug (and trace) are separated and encapsulated into the Debug and Trace specs.  Properly it should be the Debug spec that adds a D-mode filter bit as a new feature.  This is no different than it also being responsible to add H extension-related new features within the Debug spec.  And other new features that may arise because of other new extensions.  It would be up to the Debug TG to decide whether to add a D-mode filter bit in mhpmeventn, provide such a bit in a debug-related register, or whatever other option that it might settle on.

Greg





Brian Grayson
 

It's been three weeks since this proposal was floated, and feedback has been provided on the list. Everyone I've checked with off-list has been okay with the spec.

Does anyone object to moving it forward, towards fast-track ratification? Is there anything else required before it begins the 45-day public review?

Greg, do you want to publish the latest version with the tweaks that you made based on the earlier feedback, for reference?

Thanks.

Brian


On Mon, Feb 1, 2021 at 12:38 AM Greg Favor <gfavor@...> wrote:
Hi all,

Recently the TSC established a lightweight "fast track" architecture extension process that small, straightforward, relatively uncontentious arch extension proposals can utilize.  This is the second of two Privileged architecture related small extensions - that a number of people/companies have expressed desire for over the past year - that Andrew and I discussed trying to help move through this process sooner rather than later (especially since this entails much more than simply developing a spec).  The following starts with an intro for context, and then provides the draft spec.

Note that the draft spec is written as the actual changes to be made to existing paragraphs of Priv spec text (or additional paragraphs and/or sections within the existing text).  The surrounding sentence(s) of a change are included for context.  Text in square brackets is temporary commentary that is not part of the proposed spec changes.

In anticipation of some questions that may arise in people's minds, I'll note that this extension has been extensively reviewed by the lead architects of the Privileged and Hypervisor architectures for consistency with the current architecture (including little things like extension, CSR, and bit/field names).  Various changes were made along the way because of this.

===============================================================================
Introduction

The current Privileged specification defines mhpmevent CSRs to select and control event counting by the associated hpmcounter CSRs, but does not standardize any fields within these CSRs.  For at least Linux-class rich-OS systems it is desirable to standardize certain basic features that are broadly desired (they have come up over the past year or more on RISC-V lists, and have been the subject of past proposals).  This enables standard upstream software support, eliminating the need for implementations to provide their own custom software support.  (Implementations are, of course, free not to implement this extension.)

This proposal accomplishes exactly this within the existing mhpmevent CSRs (and thereby avoids unnecessarily creating whole new sets of CSRs; only one new CSR is added).

Below is a one-page draft spec of the proposal - which sticks to addressing two basic well-understood needs that have been requested by various people.  The proposed extension name is "Sscof" ('Ss' for Privileged arch and Supervisor-level extensions, and 'cof' for Count Overflow and Filtering).  There are other features that various people may desire (and that even I would desire) that don't have clear-cut, non-contentious, and relatively broad support.  These can be grist for separate discussions and possibly another arch extension by a motivated party that gathers a sufficient degree of consensus.

One such feature worth highlighting, though, is a WrEn bit in mhpmevent that would allow lower privilege modes that can read the associated hpmcounter CSR (based on the *counteren CSRs) to also write it, in essence enabling direct S/VS-mode and U/VU-mode write access instead of always requiring OpenSBI calls up to M-mode.  But this feature has had some contention, involves some details to properly support virtualization, and requires allocating a second set of "User-Read-Write" hpmcounter CSR numbers (since the current hpmcounter CSRs are "User-Read-Only").  If there is a broad upwelling of support and justification for this feature, and some party is willing to put together a complete spec (including virtualization support), then this could be another fast-track extension.

Lastly note that the new count overflow interrupt will be treated as a standard local interrupt that is assigned to bit 13 in the mip/mie/sip/sie registers.  (This has been discussed and agreed to with key Priv Arch people.)

This posting to this email list starts an initial review period (over the next few weeks) for people to provide feedback, questions, comments, etc.

================================================================================
Proposed Spec

=======================================================================
=======================  Machine-Level ISA Additions  ========================

Hardware Performance Monitor
[ This extension expands the hardware performance monitor description and extends the mhpmevent registers to 64 bits (in RV32) as follows: ]

The hardware performance monitor includes 29 additional 64-bit event counters and 29 associated 64-bit event selector registers - the mhpmcounter3–mhpmcounter31 and mhpmevent3–mhpmevent31 CSRs.

The mhpmcounters are WARL registers that support up to 64 bits of precision on RV32 and RV64. 

The mhpmeventn registers are WARL registers that control which event causes the corresponding counter to increment and what happens when the corresponding count overflows. Currently just a few bits are defined here.  Beyond this, the actual selection and meaning of events is defined by the platform, but (mhpmevent == 0) is defined to mean "no event", in which case the corresponding counter is never incremented.  Typically the lower bits of mhpmevent will be used for event selection purposes.

On RV32 only, reads of the mcycle, minstret, mhpmcountern, and mhpmeventn CSRs return the low 32 bits, while reads of the mcycleh, minstreth, mhpmcounternh, and mhpmeventnh CSRs return bits 63–32 of the corresponding counter or event selector.  [ The proposed CSR numbers for mhpmeventnh are 0x723 - 0x73F. ]
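A short Python sketch of the RV32 split described in the preceding paragraph. It computes CSR numbers (mhpmeventn at 0x320+n as in the existing Priv spec, and the proposed mhpmeventnh at 0x720+n, giving 0x723-0x73F) and joins 32-bit halves. The read_counter64 helper is not part of this proposal; it illustrates the usual high/low/high retry pattern for reading a running 64-bit counter through two 32-bit CSRs, with plain callables standing in for CSR reads.

```python
MASK32 = 0xFFFFFFFF


def mhpmevent_csr(n: int, high: bool = False) -> int:
    """CSR number of mhpmevent<n> (0x323-0x33F) or, on RV32, the proposed
    mhpmevent<n>h (0x723-0x73F)."""
    if not 3 <= n <= 31:
        raise ValueError(f"there is no mhpmevent{n}")
    return (0x720 if high else 0x320) + n


def combine(lo: int, hi: int) -> int:
    """Join the low (bits 31-0) and high (bits 63-32) halves."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def read_counter64(read_lo, read_hi) -> int:
    """Read a running 64-bit counter through two 32-bit CSRs.

    Re-reads the high half and retries if it changed, so a carry out of
    the low half between the two reads cannot produce a torn value.
    """
    while True:
        hi = read_hi()
        lo = read_lo()
        if read_hi() == hi:
            return combine(lo, hi)
```

For the mostly static mhpmevent registers a plain combine() of the two halves is probably enough; the retry loop matters for hpmcounters that keep counting between the two reads.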

The following bits are added to mhpmevent:

bit [63]  OF     -  Overflow status and interrupt disable bit that is set when counter overflows

bit [62]  MINH   -  If set, then counting of events in M-mode is inhibited
bit [61]  SINH   -  If set, then counting of events in S/HS-mode is inhibited
bit [60]  UINH   -  If set, then counting of events in U-mode is inhibited
bit [59]  VSINH  -  If set, then counting of events in VS-mode is inhibited
bit [59]  VUINH  -  If set, then counting of events in VU-mode is inhibited
bit [58]  0      -  Reserved for possible future modes
bit [57]  0      -  Reserved for possible future modes

Each of the five 'x'INH bits, when set, inhibits counting of events while in privilege mode 'x'.  If all of these bits are zero, events are counted in all modes.
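The filtering rule can be sketched as a Python predicate. This is an illustration under stated assumptions: it uses the corrected bit numbering posted in this thread (VUINH at bit 58, not the duplicated bit 59 shown in the draft), treats "S" as S/HS-mode, and the event-select code 0x12 in the usage is a made-up platform value.

```python
# Inhibit-bit positions with the corrected numbering (VUINH at bit 58).
# "S" covers S/HS-mode.
INH_BIT = {"M": 62, "S": 61, "U": 60, "VS": 59, "VU": 58}


def make_event(select: int, inhibit_modes=()) -> int:
    """Build an mhpmevent value: low bits select the event (platform-defined;
    `select` is a hypothetical code), top bits inhibit the listed modes."""
    value = select
    for mode in inhibit_modes:
        value |= 1 << INH_BIT[mode]
    return value


def counts_in_mode(mhpmevent: int, mode: str) -> bool:
    """Whether the associated counter would count while in `mode`."""
    if mhpmevent == 0:  # "no event": the counter never increments
        return False
    return ((mhpmevent >> INH_BIT[mode]) & 1) == 0


# Usage: count a hypothetical event 0x12 only in U-mode and VU-mode.
user_only = make_event(0x12, {"M", "S", "VS"})
```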

The OF bit is set when the corresponding hpmcounter overflows, and remains set until written by software.  Since hpmcounter values are unsigned values, overflow is defined as unsigned overflow.  [ This matches x86 and ARMv8. ]  Note that there is no loss of information after an overflow since the counter wraps around and keeps counting while the sticky OF bit remains set.  [ For a 64-bit counter it will be an awfully long time before another overflow could possibly occur. ]
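A small Python model of the sticky-OF behavior, showing why no information is lost: provided at most one wrap occurs between reads (a sticky OF bit cannot record a second one), the number of events since the counter was initialized can be recovered from the start value, the current value, and OF. The function names are hypothetical.

```python
def advance(value: int, of: bool, events: int, width: int = 64):
    """Add `events` to a counter; OF becomes set on unsigned overflow and
    stays set (sticky) once set. Returns (new_value, new_of)."""
    mask = (1 << width) - 1
    total = value + events
    return total & mask, of or total > mask


def events_since(start: int, value: int, of: bool, width: int = 64) -> int:
    """Events counted since the counter was set to `start` with OF clear,
    assuming at most one wrap has occurred."""
    return value - start + ((1 << width) if of else 0)
```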

If supervisor mode is implemented, the 32-bit scountovf register contains read-only shadow copies of the OF bits in the 29 mhpmevent registers (mhpmevent3-mhpmevent31).

If an hpmcounter overflows while the associated OF bit is zero, then a "count overflow interrupt request" is generated.  If the OF bit is one, then no interrupt request is generated.  Consequently the OF bit also functions as a count overflow interrupt disable for the associated hpmcounter.

----------------------------  Non-Normative Text    ----------------------------
There are no separate overflow status and overflow interrupt enable bits.  In practice, enabling overflow interrupt generation (by clearing the OF bit) is done in conjunction with initializing the counter to a starting value.  Once a counter has overflowed, it and the OF bit must be reinitialized before another overflow interrupt can be generated.
----------------------------------------------------------------------------------------
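As a sketch of that re-initialization, software that wants an interrupt after N more events would typically preload the counter with 2^64 - N and clear OF in the same step. The helper names are hypothetical, and the sketch assumes N >= 1.

```c
#include <stdint.h>

#define MHPMEVENT_OF (UINT64_C(1) << 63)

/* Preload value that overflows after exactly 'period' more events
 * (period >= 1): 2^64 - period, computed with unsigned wrap. */
static uint64_t hpm_preload(uint64_t period)
{
    return (uint64_t)0 - period;
}

/* Re-arm a counter: write the preload value and return the mhpmevent value
 * with OF cleared, which also re-enables the overflow interrupt request. */
static uint64_t hpm_rearm(uint64_t mhpmevent, uint64_t *counter, uint64_t period)
{
    *counter = hpm_preload(period);
    return mhpmevent & ~MHPMEVENT_OF;
}
```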

This "count overflow interrupt request" signal is treated as a standard local interrupt that corresponds to bit 13 in the mip/mie/sip/sie registers.  The mip/sip LCOFIP and mie/sie LCOFIE bits are respectively the interrupt-pending and interrupt-enable bits for this interrupt.  ('LCOFI' represents 'Local Count Overflow Interrupt'.)  [ This proposal doesn't try to introduce per-privilege mode overflow interrupt request signals.  ARMv8 doesn't have this and I don't think x86 does either. ]
 
Generation of a "count overflow interrupt request" by an hpmcounter sets the LCOFIP bit in the mip/sip registers and sets the associated OF bit.  The LCOFIP bit is cleared by software after servicing the count overflow interrupt resulting from one or more count overflows.
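A host-side C model of how an overflow becomes a request under the rules above: the request is raised only if OF was clear, and OF ends up set either way. `struct hart_model` and `on_overflow` are hypothetical names, and mip is modeled as a plain 64-bit value with LCOFIP at bit 13.

```c
#include <stdint.h>

#define MHPMEVENT_OF (UINT64_C(1) << 63)
#define MIP_LCOFIP   (UINT64_C(1) << 13)

/* Hypothetical per-hart state: mhpmevent[0..31] and mip. */
struct hart_model {
    uint64_t event[32];
    uint64_t mip;
};

/* Called when counter n (n < 32) overflows. OF acts as the per-counter
 * interrupt disable: only a 0 -> 1 transition raises LCOFIP. */
static void on_overflow(struct hart_model *h, unsigned n)
{
    if (!(h->event[n] & MHPMEVENT_OF))
        h->mip |= MIP_LCOFIP;          /* count overflow interrupt request */
    h->event[n] |= MHPMEVENT_OF;
}
```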

----------------------------  Non-Normative Text    ----------------------------
Software can maintain a bit mask to distinguish newly overflowed counters (yet to be serviced by an overflow interrupt handler) from overflowed counters that have already been serviced or that are configured to not generate an interrupt on overflow.
----------------------------------------------------------------------------------------
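One way such a mask might look in C, as a sketch only: `collect_new_overflows` is a hypothetical helper that takes a snapshot of the OF bits (for example from scountovf) and a software-owned ignore mask covering counters already serviced or armed with OF=1.

```c
#include <stdint.h>

/* Report counters whose OF bit is set but which are not in *ignore_mask,
 * write their indices to out[] in ascending order, and add them to the
 * mask. Returns the number of indices written. */
static unsigned collect_new_overflows(uint32_t of_bits, uint32_t *ignore_mask,
                                      unsigned out[32])
{
    uint32_t pending = of_bits & ~*ignore_mask;
    unsigned count = 0;
    for (unsigned n = 0; n < 32; n++) {
        if (pending & (UINT32_C(1) << n)) {
            out[count++] = n;
            *ignore_mask |= UINT32_C(1) << n;
        }
    }
    return count;
}
```

When a counter is later re-armed (OF cleared), software would also clear its bit from the mask.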

Machine Interrupt Registers (mip and mie)
[ This extension adds the description of the LCOFIP/LCOFIE bits in these registers (and modifies related text) as follows: ]

LCOFIP is added to mip in Figure 3.14 as bit 13.  LCOFIE is added to mie in Figure 3.15 as bit 13.

If the Sscof extension is implemented, bits mip.LCOFIP and mie.LCOFIE are the interrupt-pending and interrupt-enable bits for local count overflow interrupts.  LCOFIP is read-write in mip and reflects the occurrence of a local count overflow interrupt request resulting from any of the mhpmeventn.OF bits being set.   If the Sscof extension is not implemented, these LCOFIP and LCOFIE bits are hardwired to zeros.

Multiple simultaneous interrupts destined for different privilege modes are handled in decreasing order of destined privilege mode. Multiple simultaneous interrupts destined for the same privilege mode are handled in the following decreasing priority order: MEI, MSI, MTI, SEI, SSI, STI, LCOFI.
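To illustrate the same-privilege ordering, here is a C sketch that picks the highest-priority interrupt that is both pending and enabled, using the standard cause numbers (SSI=1, MSI=3, STI=5, MTI=7, SEI=9, MEI=11) plus LCOFI=13 as proposed here. It deliberately ignores delegation and the cross-privilege rule, and the function name is hypothetical. The supervisor-level order is the same list restricted to SEI, SSI, STI, LCOFI.

```c
#include <stdint.h>

enum {
    IRQ_SSI = 1, IRQ_MSI = 3, IRQ_STI = 5, IRQ_MTI = 7,
    IRQ_SEI = 9, IRQ_MEI = 11, IRQ_LCOFI = 13
};

/* Same-privilege priority order from the text, highest first. */
static const int m_priority[] = {
    IRQ_MEI, IRQ_MSI, IRQ_MTI, IRQ_SEI, IRQ_SSI, IRQ_STI, IRQ_LCOFI
};

/* Highest-priority cause that is pending in mip and enabled in mie,
 * or -1 if none. */
static int highest_priority_irq(uint64_t mip, uint64_t mie)
{
    uint64_t ready = mip & mie;
    for (unsigned i = 0; i < sizeof m_priority / sizeof m_priority[0]; i++)
        if (ready & (UINT64_C(1) << m_priority[i]))
            return m_priority[i];
    return -1;
}
```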

=========================================================================
=======================  Supervisor-Level ISA Additions  ========================

Supervisor Interrupt Registers (sip and sie)
[ This extension adds the description of the LCOFIP/LCOFIE bits in these registers (and modifies related text) as follows: ]

LCOFIP is added to sip in Figure 4.6 as bit 13.  LCOFIE is added to sie in Figure 4.7 as bit 13.

If the Sscof extension is implemented, bits sip.LCOFIP and sie.LCOFIE are the interrupt-pending and interrupt-enable bits for local count overflow interrupts.  LCOFIP is read-write in sip and reflects the occurrence of a local count overflow interrupt request resulting from any of the mhpmeventn.OF bits being set.  If the Sscof extension is not implemented, these LCOFIP and LCOFIE bits are hardwired to zeros. 

Each standard interrupt type (LCOFI, SEI, STI, or SSI) may not be implemented, in which case the corresponding interrupt-pending and interrupt-enable bits are hardwired to zeros.  All bits in sip and sie are WARL fields.

Multiple simultaneous interrupts destined for supervisor mode are handled in the following decreasing priority order: SEI, SSI, STI, LCOFI.

Supervisor Count Overflow (scountovf)
[ This extension adds this new CSR. ]

The scountovf CSR is a 32-bit read-only register that contains shadow copies of the OF bits in the 29 mhpmevent CSRs (mhpmevent3–mhpmevent31) - where scountovf bit X corresponds to mhpmeventX.  The proposed CSR number is 0xD33.
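A C sketch of the bit mapping, with a hypothetical function name. Since there is no mhpmevent0–mhpmevent2, the sketch assumes scountovf bits 0–2 simply read as zero.

```c
#include <stdint.h>

#define MHPMEVENT_OF (UINT64_C(1) << 63)

/* Build the scountovf view: bit X mirrors mhpmeventX.OF for X = 3..31.
 * Entries 0..2 of the array are ignored (assumed to read as zero). */
static uint32_t scountovf_from_events(const uint64_t mhpmevent[32])
{
    uint32_t v = 0;
    for (unsigned x = 3; x < 32; x++)
        if (mhpmevent[x] & MHPMEVENT_OF)
            v |= UINT32_C(1) << x;
    return v;
}
```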

This register enables supervisor-level overflow interrupt handler software to quickly and easily determine which counter(s) have overflowed (without needing to make an execution environment call or series of calls ultimately up to M-mode).  [ ARMv8 and x86 have a similar register for the same reasons. ]

Read access to bit X is subject to the same mcounteren (or mcounteren and hcounteren) CSRs that mediate access to the hpmcounter CSRs by S-mode (or VS-mode).  In M and S modes, scountovf bit X is readable when mcounteren bit X is set, and otherwise reads as zero.  Similarly, in VS mode, scountovf bit X is readable when mcounteren bit X and hcounteren bit X are both set, and otherwise reads as zero. 
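The access rule above reduces to masking, as in this C sketch. The enum and function names are hypothetical, and the sketch follows the text as proposed (M and S modes filtered by mcounteren; VS mode by mcounteren AND hcounteren).

```c
#include <stdint.h>

enum reader_mode { READ_M, READ_S, READ_VS };

/* Value of scountovf as seen by a reader; hidden bits read as zero. */
static uint32_t scountovf_visible(uint32_t of_bits, uint32_t mcounteren,
                                  uint32_t hcounteren, enum reader_mode mode)
{
    uint32_t mask = mcounteren;
    if (mode == READ_VS)
        mask &= hcounteren;
    return of_bits & mask;
}
```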


Greg Favor
 

Cc'ing tech-priv since others may be wondering about the answer to Brian's question.

Brian,

As Mark touched on below, there is a whole "Definition of Done" checklist of items that needs to be done (including software support, Spike and Sail models, OACR review, PoC, ...).  So that is the next order of business that I need to work on - where "I" doesn't mean me doing it all (or even having the expertise to do all those things).  Btw, would you or anyone else be willing and able to help out with one of the DoD checklist items?  Any help will be greatly appreciated and will help move this ball towards the goal line.

Greg


On Wed, Feb 24, 2021 at 3:54 PM Mark Himelstein <markhimelstein@...> wrote:
please check out the ratification policy for next steps.


On Wed, Feb 24, 2021 at 3:43 PM Brian Grayson <brian.grayson@...> wrote:
It's been three weeks since this proposal has been floated, and feedback was provided on the list. Everyone I've checked with off-list has been okay with the spec.

Does anyone object to moving it forward, towards fast-track ratification? Is there anything else required before it begins the 45-day public review?

Greg, do you want to publish the latest version with the tweaks that you made based on the earlier feedback, for reference?

Thanks.

Brian

On Mon, Feb 1, 2021 at 12:38 AM Greg Favor <gfavor@...> wrote:

--
Mark I Himelstein
CTO RISC-V International
+1-408-250-6611
twitter @mark_riscv


Greg Favor
 

Just to emphasize:  If a number of people can pitch in on various DoD pieces, that would help a lot.  Spec-wise we're in good shape (even though there is work to be done), and I've already submitted this extension to the OACR committee for review.  So it's the other items that are the non-trivial hurdles to get over (including two notable items I forgot to mention: defining coverage for arch compatibility tests and then actually creating them).

Greg


On Wed, Feb 24, 2021 at 5:01 PM Greg Favor <gfavor@...> wrote:
Cc'ing tech-priv since others may be wondering about the answer to Brian's question.

Brian,

As Mark touched on below, there is a whole "Definition of Done" checklist of items that needs to be done (including software support, Spike and Sail models, OACR review, PoC, ...).  So that is the next order of business that I need to work on - where "I" doesn't mean me doing it all (or even having the expertise to do all those things).  Btw, would you or anyone else be willing and able to help out with one of the DoD checklist items?  Any help will be greatly appreciated and will help move this ball towards the goal line.

Greg


Brian Grayson
 

How would one make compliance tests for this extension? For example, how can one test the overflow exception when there is no standard on how to configure any of the HPM event counters to count, in order to cause an overflow?

Usually, anything having to do with performance monitors is not implemented in functional simulators. Would an exemption be needed for that? Of course, the CSRs need to be supported in the simulators, but expecting a simulator to implement any of the events (that aren't standardized), in order to allow it to cause an overflow exception, seems a tall order, and of limited usefulness to the community.

In fact, a lot of the DoD appears to be oriented towards instruction extensions, and not CSR/behavior extensions like this one. It seems like most of items 4, 5, 6, 7, and 8 are all waiver-worthy, for example.

Brian



On Wed, Feb 24, 2021 at 7:06 PM Greg Favor <gfavor@...> wrote:
Just to emphasize:  If a number of people can pitch in on various DoD pieces, that would help a lot.  Spec-wise we're in good shape (even though there is work to be done), and I've already submitted this extension to the OACR committee for review.  So it's the other items that are the non-trivial hurdles to get over (including two notable items I forgot to mention: defining coverage for arch compatibility tests and then actually creating them).

Greg


On Wed, Feb 24, 2021 at 5:01 PM Greg Favor <gfavor@...> wrote:
Cc'ing tech-priv since others may be wondering about the answer to Brian's question.

Brian,

As Mark touched on below, there is a whole "Definition of Done" checklist of items that needs to be done (including software support, Spike and Sail models, OACR review, PoC, ...).  So that is the next order of business that I need to work on - where "I" doesn't mean me doing it all (or even having the expertise to do all those things).  Btw, would you or anyone else be willing and able to help out with one of the DoD checklist items?  Any help will be greatly appreciated and will help move this ball towards the goal line.

Greg

On Wed, Feb 24, 2021 at 3:54 PM Mark Himelstein <markhimelstein@...> wrote:
please check out the ratification policy for next steps.


On Wed, Feb 24, 2021 at 3:43 PM Brian Grayson <brian.grayson@...> wrote:
It's been three weeks since this proposal has been floated, and feedback was provided on the list. Everyone I've checked with off-list has been okay with the spec.

Does anyone object to moving it forward, towards fast-track ratification? Is there anything else required before it begins the 45-day public review?

Greg, do you want to publish the latest version with the tweaks that you made based on the earlier feedback, for reference?

Thanks.

Brian



atishp@...
 

On Wed, 2021-02-24 at 17:01 -0800, Greg Favor wrote:
Cc'ing tech-priv since others may be wondering about the answer to
Brian's question.

Brian,

As Mark touched on below, there is a whole "Definition of Done"
checklist of items that needs to be done (including software support,
Spike and Sail models, OACR review, PoC, ...).  So that is the next
order of business that I need to work on - where "I" doesn't mean me
doing it all (or even having the expertise to do all those things). 
Btw, would you or anyone else be willing and able to help out with
one of the DoD checklist items?  Any help will be greatly appreciated
and will help move this ball towards the goal line.

Hi Greg,

I am working on the SBI PMU extension implementation in OpenSBI and the Linux kernel. I will update the SBI PMU extension based on the Sscofpmf extension as well.

I can also work on implementing the Sscofpmf extension in QEMU, along with the required software changes in OpenSBI and the Linux kernel, if that is okay with you.

Do we require anything else for the PoC part of the DoD policy?

Greg

On Wed, Feb 24, 2021 at 3:54 PM Mark Himelstein <markhimelstein@...> wrote:
please check out the ratification policy for next steps.

https://docs.google.com/document/d/1-UlaSGqk59_myeuPMrV9gyuaIgnmFzGh5Gfy_tpViwM/edit


--
Regards,
Atish


Greg Favor
 

On Wed, Feb 24, 2021 at 5:31 PM Brian Grayson <brian.grayson@...> wrote:
How would one make compliance tests for this extension? For example, how can one test the overflow exception when there is no standard on how to configure any of the HPM event counters to count, in order to cause an overflow?

These are the sorts of questions I'm about to start grappling with. :) 

Usually, anything having to do with performance monitors is not implemented in functional simulators. Would an exemption be needed for that?

Very possibly (tbd).  That doesn't mean that basic aspects of the extension can't be tested, just that testing the core functionality at the heart of the extension may be problematic.
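One basic aspect that looks testable without any standardized event is the scountovf shadow. Per the draft, OF "remains set until written by software", so, assuming an M-mode write can set mhpmeventX.OF directly, a test can set a few OF bits and then check what S-mode or VS-mode reads from scountovf under various mcounteren/hcounteren settings. The sketch below models only the expected read value. The function and parameter names are hypothetical, and treating bits 0-2 as read-only zero (there is no mhpmevent0-2) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 0-2 would correspond to mhpmevent0-2, which do not exist;
 * they are assumed to read as zero. */
#define SCOUNTOVF_VALID_MASK UINT32_C(0xFFFFFFF8)

/* Hypothetical model of an S-mode or VS-mode read of scountovf.
 * of_bits: bit X set iff mhpmeventX.OF is 1 (e.g. after an M-mode write).
 * Following the draft, bit X is visible only when mcounteren bit X is set
 * and, in VS-mode, when hcounteren bit X is also set; otherwise it reads 0. */
static uint32_t scountovf_read(uint32_t of_bits, uint32_t mcounteren,
                               uint32_t hcounteren, bool in_vs_mode) {
    uint32_t visible = mcounteren & SCOUNTOVF_VALID_MASK;
    if (in_vs_mode)
        visible &= hcounteren;   /* VS-mode needs both enables */
    return of_bits & visible;
}
```

A test built on this idea would need no events at all: set OF in mhpmevent3 and mhpmevent5, enable only counter 3 in mcounteren, and expect S-mode to see only bit 3.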
 
Of course, the CSRs need to be supported in the simulators, but expecting a simulator to implement any of the events (that aren't standardized), in order to allow it to cause an overflow exception, seems a tall order, and of limited usefulness to the community.

Agreed.
 
In fact, a lot of the DoD appears to be oriented towards instruction extensions, and not CSR/behavior extensions like this one. It seems like most of items 4, 5, 6, 7, and 8 are all waiver-worthy, for example.

The question to be answered is how much of each of 4-8 can be done.  To get waivers there will need to be a concrete assessment and explanation of what can and can't be done.  It's hard to imagine that not even having the added CSRs and *ip/*ie bits put into these various models will be deemed acceptable.  That said, I agree that there is only so much that can usefully be done.
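As a rough illustration of what adding the CSRs and *ip/*ie bits to a model involves at the encoding level, here is a sketch using the draft's numbers: LCOFIP/LCOFIE at bit 13 of mip/mie/sip/sie, mhpmeventnh in the proposed 0x723-0x73F range, and OF at bit 63 of mhpmevent. The helper names are hypothetical and not taken from Spike, Sail, or QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Interrupt number proposed for LCOFI: bit 13 of mip/mie/sip/sie. */
#define LCOFI_IRQ   13u
#define MIP_LCOFIP  (UINT64_C(1) << LCOFI_IRQ)  /* same position as mie.LCOFIE */

/* CSR number of mhpmeventn (n = 3..31). */
static unsigned mhpmevent_csr(unsigned n) {
    assert(n >= 3 && n <= 31);
    return 0x320u + n;
}

/* RV32 high half mhpmeventnh, using the proposed 0x723-0x73F range. */
static unsigned mhpmeventh_csr(unsigned n) {
    assert(n >= 3 && n <= 31);
    return 0x720u + n;
}

/* On RV32 the 64-bit event selector is split across two CSRs, so the
 * OF bit (bit 63) shows up as bit 31 of mhpmeventnh. */
static uint32_t rv32_mhpmevent_lo(uint64_t v) { return (uint32_t)v; }
static uint32_t rv32_mhpmevent_hi(uint64_t v) { return (uint32_t)(v >> 32); }
```

Note that each mhpmeventnh number is simply the corresponding mhpmeventn number plus 0x400, which is how the proposed range lines up with the existing mhpmevent3-mhpmevent31 numbers.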

Greg


Allen Baum
 

I think I have trouble answering questions concisely.

From a Definition of Done perspective, you're supposed to have architectural tests that will pass when run on the Sail model and on Spike; more specifically, they should generate the same signature.

The signature for this case would have to show that the interrupt occurred where it was expected, on both the reference model and the device under test.
We do that by comparing the trap state (xEPC, xCAUSE, xIP, xTVAL) recorded by the interrupt handler.
That works well for synchronous traps, but less well for asynchronous events such as interrupts.
If you can't inject the interrupt at a deterministic point, xEPC will differ.
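To make the xEPC problem concrete: if the signature is a field-by-field record of trap state, any skew in where the interrupt lands fails the comparison even when xCAUSE and xIP match. The sketch below assumes RV64 and a hypothetical trap_sig struct; the expected xCAUSE is the interrupt bit (MSB) combined with exception code 13, matching the draft's bit-13 assignment for LCOFI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of the trap state a test's handler could record. */
typedef struct {
    uint64_t epc;    /* xEPC  */
    uint64_t cause;  /* xCAUSE */
    uint64_t ip;     /* xIP   */
    uint64_t tval;   /* xTVAL */
} trap_sig;

/* RV64 xCAUSE for interrupt 13 (LCOFI): interrupt bit in the MSB,
 * exception code 13 in the low bits. */
#define CAUSE_LCOFI ((UINT64_C(1) << 63) | 13u)

/* Exact comparison, as a signature check effectively performs. */
static bool sig_equal(const trap_sig *a, const trap_sig *b) {
    return a->epc == b->epc && a->cause == b->cause &&
           a->ip == b->ip && a->tval == b->tval;
}
```

With this kind of check, a device that takes the interrupt one instruction later than the reference model (xEPC off by 4) fails, even though it behaved legally.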

The only way I see to pick a deterministic point is to interrupt on the INSTRET counter, which is as close to architectural as you might get.
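Interrupting at a deterministic count relies on preloading the counter so that it wraps after a known number of events. Below is a hypothetical software model of one counter with the draft's sticky OF bit: the counter wraps as an unsigned 64-bit value, OF is set on every overflow, and an interrupt request (setting LCOFIP) is generated only if OF was clear. It models a generic counter assumed to count retired instructions; the draft itself has no OF bit for minstret.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one hpmcounter plus the proposed OF bit. */
typedef struct {
    uint64_t count;   /* 64-bit counter value */
    bool of;          /* sticky overflow bit (mhpmevent bit 63) */
    bool lcofip;      /* local count overflow interrupt pending */
} hpm_model;

/* Start value such that the n-th increment wraps the counter to zero. */
static uint64_t preload_for_overflow_after(uint64_t n) {
    assert(n > 0);
    return (uint64_t)0 - n;        /* 2^64 - n via unsigned wraparound */
}

/* Apply one counted event; returns true if this increment generated
 * a count overflow interrupt request. */
static bool hpm_increment(hpm_model *c) {
    c->count += 1;                 /* unsigned, wraps modulo 2^64 */
    if (c->count != 0)
        return false;              /* no overflow on this increment */
    bool request = !c->of;         /* OF=1 suppresses the request */
    c->of = true;                  /* OF is set on every overflow */
    if (request)
        c->lcofip = true;
    return request;
}
```

For example, preloading with preload_for_overflow_after(5) makes the fifth counted event the overflow point, which is the deterministic event the test would key on.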

There are (at least) 4 problems with this:
 
1. As has been pointed out, counter events are implementation specific.
    The only solution I can see is to choose instret, which is writable, required, and deterministic.
2. Oops, the current definition of the extension doesn't include instret.
    This can be fixed by adding another overflow bit, or by further standardizing event selection to include the equivalent of instret.
3. There is no architectural requirement bounding the interval from an interrupt event occurring to the first instruction of the handler being executed.
4. What happens when an implementation retires more than one instruction per cycle?
    These last two are related: even if the event occurs at a deterministic point, the interrupt still won't necessarily occur at a deterministic point.

I think we have possible solutions from a framework perspective, but they're a ways off.
This sometimes requires the reference model (Sail or possibly Spike) to be able to replicate the possible behaviors.

For this particular case there is a test structure that might work:
  disable counter interrupts until after the point that the counter overflows, and enable it at some deterministic point later.
Again: I don't know if there is an architectural requirement bounding the interval between an instruction enabling an interrupt and the first instruction of the trap handler executing.
But if there is, AND we can guarantee a universal architectural event, we have a chance of this working.
Otherwise, we would need a modified version of the framework (e.g. one that somehow allows fuzzy comparisons for some specific signature values).
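The test structure proposed above can be sketched as a toy Python model. This is hypothetical and not taken from Sail, Spike, or any real test framework. It assumes, which is exactly the open architectural question, that a pending and enabled interrupt is taken before the next instruction executes. Under that assumption the trapping instruction depends only on where the interrupt is enabled, provided the overflow happens first.

```python
# Hypothetical toy model, not from Sail, Spike, or any real test framework.
# ASSUMPTION: a pending and enabled interrupt is taken before the next
# instruction executes (the architecture does not guarantee this latency).

def trap_index(overflow_at: int, enable_at: int, num_instrs: int):
    """Return the index of the instruction at which the overflow trap is taken
    (what xEPC would record), or None if no trap occurs."""
    pending = False  # stands in for mip.LCOFIP
    enabled = False  # stands in for mie.LCOFIE together with the global enable
    for pc in range(num_instrs):
        if pending and enabled:
            return pc
        if pc == overflow_at:   # the counter overflows while executing this instruction
            pending = True
        if pc == enable_at:     # this instruction enables the interrupt
            enabled = True
    return None


# The overflow point varies, but the trap lands on the same instruction
# as long as the overflow precedes the enabling instruction.
for overflow_at in (2, 5, 9):
    print(overflow_at, trap_index(overflow_at, enable_at=10, num_instrs=20))
```

If the overflow instead happens after the enabling instruction, the trap point again tracks the overflow, which is why the test has to keep the interrupt disabled until after the counter has overflowed.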


On Wed, Feb 24, 2021 at 5:55 PM Atish Patra <atish.patra@...> wrote:
On Wed, 2021-02-24 at 17:01 -0800, Greg Favor wrote:
> Cc'ing tech-priv since others may be wondering about the answer to
> Brian's question.
>
> Brian,
>
> As Mark touched on below, there is a whole "Definition of Done"
> checklist of items that needs to be done (including software support,
> Spike and Sail models, OACR review, PoC, ...).  So that is the next
> order of business that I need to work on - where "I" doesn't mean me
> doing it all (or even having the expertise to do all those things). 
> Btw, would you or anyone else be willing and able to help out with
> one of the DoD checklist items?  Any help will be greatly appreciated
> and will help move this ball towards the goal line.
>

Hi Greg,
I am working on the SBI PMU extension implementation in OpenSBI & Linux
kernel. I will update the SBI PMU extension based on the Sscofpmf extension
as well.

I can work on implementing the Sscofpmf extension in Qemu and the required
software changes in OpenSBI & Linux kernel as well if that is okay with
you.

Do we require anything else for the PoC part of the DoD policy?

> Greg
>
> On Wed, Feb 24, 2021 at 3:54 PM Mark Himelstein <
> markhimelstein@...> wrote:
> > please check out the ratification policy for next steps.
> >
> > https://docs.google.com/document/d/1-UlaSGqk59_myeuPMrV9gyuaIgnmFzGh5Gfy_tpViwM/edit
> >
> >
> > On Wed, Feb 24, 2021 at 3:43 PM Brian Grayson <
> > brian.grayson@...> wrote:
> > > It's been three weeks since this proposal has been floated, and
> > > feedback was provided on the list. Everyone I've checked with
> > > off-list has been okay with the spec.
> > >
> > > Does anyone object to moving it forward, towards fast-track
> > > ratification? Is there anything else required before it begins
> > > the 45-day public review?
> > >
> > > Greg, do you want to publish the latest version with the tweaks
> > > that you made based on the earlier feedback, for reference?
> > >
> > > Thanks.
> > >
> > > Brian
> > >
> > > On Mon, Feb 1, 2021 at 12:38 AM Greg Favor <
> > > gfavor@...> wrote:
> > > > Hi all,
> > > >
> > > > Recently the TSC established a lightweight "fast track"
> > > > architecture extension process that small, straightforward,
> > > > relatively uncontentious arch extension proposals can utilize. 
> > > > This is the second of two Privileged architecture related small
> > > > extensions - that a number of people/companies have expressed
> > > > desire for over the past year - that Andrew and I discussed
> > > > trying to help move through this process sooner than later
> > > > (especially since this entails much more than simply developing
> > > > a spec).  The following starts with an intro for context, and
> > > > then provides the draft spec.
> > > >
> > > > Note that the draft spec is written as the actual changes to be
> > > > made to existing paragraphs of Priv spec text (or additional
> > > > paragraphs and/or sections within the existing text).  The
> > > > surrounding sentence(s) of a change are included for context. 
> > > > Text in square brackets is temporary commentary that is not
> > > > part of the proposed spec changes.
> > > >
> > > > In anticipation of some questions that may arise in people's
> > > > minds, I'll note that this extension has been
> > > > extensively reviewed by the lead architects of the Privileged
> > > > and Hypervisor architectures for consistency with the current
> > > > architecture (including little things like extension, CSR, and
> > > > bit/field names).  Various changes were made along the way
> > > > because of this.
> > > >
> > > > ===============================================================================
> > > > Introduction
> > > >
> > > > The current Privileged specification defines mhpmevent CSRs to
> > > > select and control event counting by the associated hpmcounter
> > > > CSRs, but provides no standardization of any fields within
> > > > these CSRs.  For at least Linux-class rich-OS systems it is
> > > > desirable to standardize certain basic features that are
> > > > broadly desired (and have come up over the past year plus on
> > > > RISC-V lists, as well as have been the subject of past
> > > > proposals).  This enables there to be standard upstream
> > > > software support that eliminates the need for implementations
> > > > to provide their own custom software support.  (Implementations
> > > > are free, of course, to not implement this extension.)
> > > >
> > > > This proposal serves to accomplish exactly this within the
> > > > existing mhpmevent CSRs (and correspondingly avoids the
> > > > unnecessary creation of whole new sets of CSRs - past just one
> > > > new CSR).
> > > >
> > > > Below is a one-page draft spec of the proposal - which sticks
> > > > to addressing two basic well-understood needs that have been
> > > > requested by various people.  The proposed extension name is
> > > > "Sscof" ('Ss' for Privileged arch and Supervisor-level
> > > > extensions, and 'cof' for Count Overflow and Filtering).  There
> > > > are other features that various people may desire (and that
> > > > even I would desire) that don't have clear-cut,
> > > > non-contentious, and relatively broad support.  These can be grist
> > > > for separate discussions and possibly another arch extension by
> > > > a motivated party that gathers a sufficient degree of
> > > > consensus.
> > > >
> > > > One such feature worth highlighting, though, is a WrEn
> > > > bit in mhpmevent that allows lower privilege modes that can
> > > > read the associated hpmcounter CSR (based on the *counteren
> > > > CSRs) to also write it.  In essence, this would enable direct
> > > > S/VS-mode and U/VU-mode write access instead of always
> > > > requiring OpenSBI calls up to M-mode.  But this feature has had
> > > > some contention, involves some details to properly support
> > > > virtualization, and requires allocating a second set of
> > > > "User-Read-Write" hpmcounter CSR numbers (since the current
> > > > hpmcounter CSRs are "User-Read-Only").  If there is a broad
> > > > upwelling of support and justification for this feature, and
> > > > some party willing to put together a complete spec (including
> > > > virtualization support), then this could be another fast-track
> > > > extension.
> > > >
> > > > Lastly note that the new count overflow interrupt will be
> > > > treated as a standard local interrupt that is assigned to bit
> > > > 13 in the mip/mie/sip/sie registers.  (This has been discussed
> > > > and agreed to with key Priv Arch people.)
> > > >
> > > > This posting to this email list starts an initial review period
> > > > (over the next few weeks) for people to provide feedback,
> > > > questions, comments, etc.
> > > >
> > > > ===============================================================================
> > > > Proposed Spec
> > > >
> > > > ===============================================================================
> > > > =======================  Machine-Level ISA Additions  ========================
> > > >
> > > > Hardware Performance Monitor
> > > > [ This extension expands the hardware performance monitor
> > > > description and extends the mhpmevent registers to 64 bits (in
> > > > RV32) as follows: ]
> > > >
> > > > The hardware performance monitor includes 29 additional 64-bit
> > > > event counters and 29 associated 64-bit event selector
> > > > registers - the mhpmcounter3–mhpmcounter31 and
> > > > mhpmevent3–mhpmevent31 CSRs.
> > > >
> > > > The mhpmcounters are WARL registers that support up to 64 bits
> > > > of precision on RV32 and RV64. 
> > > >
> > > > The mhpmeventn registers are WARL registers that control which
> > > > event causes the corresponding counter to increment and what
> > > > happens when the corresponding count overflows. Currently just
> > > > a few bits are defined here.  Past this, the actual selection
> > > > and meaning of events is defined by the platform, but
> > > > (mhpmevent == 0) is defined to mean "no event" and that the
> > > > corresponding counter will never be incremented.  Typically the
> > > > lower bits of mhpmevent will be used for event selection
> > > > purposes.  
> > > >
> > > > On RV32 only, reads of the mcycle, minstret, mhpmcountern, and
> > > > mhpmeventn CSRs return the low 32 bits, while reads of the
> > > > mcycleh, minstreth, mhpmcounternh, and mhpmeventnh CSRs return
> > > > bits 63–32 of the corresponding counter or event selector.  [
> > > > The proposed CSR numbers for mhpmeventnh are 0x723 - 0x73F. ]
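A minimal Python sketch of the RV32 split described above. It assumes mhpmevent3 sits at its existing CSR number 0x323 and that mhpmeventnh follows the proposed numbering 0x723 through 0x73F; the helper names are hypothetical and only illustrate the arithmetic.

```python
MHPMEVENT3 = 0x323    # existing CSR number of mhpmevent3
MHPMEVENT3H = 0x723   # proposed CSR number of mhpmevent3h
MASK32 = 0xFFFF_FFFF


def mhpmevent_csrs(n: int) -> tuple[int, int]:
    """Return (low CSR, high CSR) numbers for mhpmevent<n>, 3 <= n <= 31."""
    if not 3 <= n <= 31:
        raise ValueError("mhpmevent index must be in 3..31")
    return MHPMEVENT3 + (n - 3), MHPMEVENT3H + (n - 3)


def split_rv32(value: int) -> tuple[int, int]:
    """Split a 64-bit mhpmevent value into what RV32 reads from
    mhpmeventn (bits 31:0) and mhpmeventnh (bits 63:32)."""
    value &= (1 << 64) - 1
    return value & MASK32, value >> 32


def join_rv32(low: int, high: int) -> int:
    """Reassemble the 64-bit value from the two 32-bit halves."""
    return ((high & MASK32) << 32) | (low & MASK32)
```

One consequence worth noting: on RV32 all the newly defined upper bits (OF and the xINH bits) live in mhpmeventnh, with OF at bit 31 of that CSR.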
> > > >
> > > > The following bits are added to mhpmevent:
> > > >
> > > > bit [63]  OF     -  Overflow status and interrupt disable bit that is set when counter overflows
> > > >
> > > > bit [62]  MINH   -  If set, then counting of events in M-mode is inhibited
> > > > bit [61]  SINH   -  If set, then counting of events in S/HS-mode is inhibited
> > > > bit [60]  UINH   -  If set, then counting of events in U-mode is inhibited
> > > > bit [59]  VSINH  -  If set, then counting of events in VS-mode is inhibited
> > > > bit [58]  VUINH  -  If set, then counting of events in VU-mode is inhibited
> > > > bit [57]  0      -  Reserved for possible future modes
> > > > bit [56]  0      -  Reserved for possible future modes
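The bit layout could be captured as masks along these lines. This Python sketch assumes the five inhibit bits occupy consecutive bits 62 down to 58, with bits 57 and 56 reserved. It also treats bits 55:0 as the event selector, which is purely an illustrative assumption since event selection is platform-defined; make_mhpmevent is a hypothetical helper.

```python
# Assumed positions: OF at 63, the five xINH bits at 62..58, reserved 57..56.
OF    = 1 << 63
MINH  = 1 << 62
SINH  = 1 << 61
UINH  = 1 << 60
VSINH = 1 << 59
VUINH = 1 << 58
RESERVED = (1 << 57) | (1 << 56)   # reserved for possible future modes
INH_ALL = MINH | SINH | UINH | VSINH | VUINH

# ASSUMPTION: the platform-defined event selector lives in bits 55:0.
EVENT_SEL_MASK = (1 << 56) - 1


def make_mhpmevent(event_sel: int, inhibit: int = 0, of: bool = False) -> int:
    """Hypothetical helper that composes an mhpmevent value."""
    if event_sel & ~EVENT_SEL_MASK:
        raise ValueError("event selector overlaps the defined upper bits")
    if inhibit & ~INH_ALL:
        raise ValueError("inhibit may only contain xINH bits")
    return event_sel | inhibit | (OF if of else 0)
```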
> > > >
> > > > Each of the five 'x'INH bits, when set, inhibits counting of
> > > > events while in privilege mode 'x'.  All-zeroes for these bits
> > > > results in counting of events in all modes.
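The mode-based filtering reduces to a single bit test per mode. In this Python sketch the Mode enum's values are assumed bit positions (62 down to 58, S standing for S/HS-mode), and counts_in_mode is a hypothetical name; it also honors the rule that an all-zero mhpmevent means "no event".

```python
from enum import Enum


class Mode(Enum):
    """Privilege modes filtered by the xINH bits; values are assumed bit positions."""
    M = 62
    S = 61   # S/HS-mode
    U = 60
    VS = 59
    VU = 58


def counts_in_mode(mhpmevent: int, mode: Mode) -> bool:
    """True if the event selected by mhpmevent would be counted in `mode`."""
    if mhpmevent == 0:          # "no event": the counter never increments
        return False
    return ((mhpmevent >> mode.value) & 1) == 0   # xINH clear -> counting allowed
```

For example, setting MINH and SINH on an event restricts counting to U, VS, and VU modes, which is a typical user-space profiling configuration.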
> > > >
> > > > The OF bit is set when the corresponding hpmcounter overflows,
> > > > and remains set until written by software.  Since hpmcounter
> > > > values are unsigned values, overflow is defined as unsigned
> > > > overflow.  [ This matches x86 and ARMv8. ]  Note that there is
> > > > no loss of information after an overflow since the counter
> > > > wraps around and keeps counting while the sticky OF bit remains
> > > > set.  [ For a 64-bit counter it will be an awfully long time
> > > > before another overflow could possibly occur. ]
> > > >
> > > > If supervisor mode is implemented, the 32-bit scountovf
> > > > register contains read-only shadow copies of the OF bits in all
> > > > 32 mhpmevent registers.
> > > >
> > > > If an hpmcounter overflows while the associated OF bit is zero,
> > > > then a "count overflow interrupt request" is generated.  If the
> > > > OF bit is one, then no interrupt request is generated. 
> > > > Consequently the OF bit also functions as a count overflow
> > > > interrupt disable for the associated hpmcounter.
> > > >
> > > > ----------------------------  Non-Normative Text  ----------------------------
> > > > There are no separate overflow status and overflow interrupt
> > > > enable bits.  In practice, enabling overflow interrupt
> > > > generation (by clearing the OF bit) is done in conjunction with
> > > > initializing the counter to a starting value.  Once a counter
> > > > has overflowed, it and the OF bit must be reinitialized before
> > > > another overflow interrupt can be generated.
> > > > -------------------------------------------------------------------------------
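The counter and OF rules above can be modeled in a few lines. This Python sketch (HpmCounter and its methods are hypothetical names) shows unsigned wraparound with continued counting, the sticky OF bit, an interrupt request only when OF was previously clear, and re-arming by preloading the counter and clearing OF together.

```python
MASK64 = (1 << 64) - 1
OF_BIT = 1 << 63


class HpmCounter:
    """Toy model of one hpmcounter and its mhpmevent OF bit (hypothetical)."""

    def __init__(self, counter: int = 0, event: int = 0):
        self.counter = counter & MASK64
        self.event = event & MASK64

    def increment(self, n: int = 1) -> bool:
        """Count n events (0 <= n < 2**64). Returns True if this step generates
        a count overflow interrupt request."""
        total = self.counter + n
        self.counter = total & MASK64            # unsigned wraparound; counting continues
        if total > MASK64:                       # unsigned overflow
            request = not (self.event & OF_BIT)  # OF already set: no new request
            self.event |= OF_BIT                 # OF is sticky until software writes it
            return request
        return False

    def arm(self, period: int) -> None:
        """Preload so the next overflow happens after `period` events
        (0 < period <= 2**64) and clear OF to re-enable the interrupt."""
        self.counter = (-period) & MASK64        # equals 2**64 - period
        self.event &= ~OF_BIT
```

Preloading with 2**64 minus the sampling period is the usual way such a counter is turned into a sampling interrupt source.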
> > > >
> > > > This "count overflow interrupt request" signal is treated as a
> > > > standard local interrupt that corresponds to bit 13 in the
> > > > mip/mie/sip/sie registers.  The mip/sip LCOFIP and mie/sie
> > > > LCOFIE bits are respectively the interrupt-pending and
> > > > interrupt-enable bits for this interrupt.  ('LCOFI' represents
> > > > 'Local Count Overflow Interrupt'.)  [ This proposal doesn't try
> > > > to introduce per-privilege mode overflow interrupt request
> > > > signals.  ARMv8 doesn't have this and I don't think x86 does
> > > > either. ]
> > > >  
> > > > Generation of a "count overflow interrupt request" by an
> > > > hpmcounter sets the LCOFIP bit in the mip/sip registers and
> > > > sets the associated OF bit.  The LCOFIP bit is cleared by
> > > > software after servicing the count overflow interrupt resulting
> > > > from one or more count overflows.
> > > >
> > > > ----------------------------  Non-Normative Text  ----------------------------
> > > > Software can maintain a bit mask to distinguish newly
> > > > overflowed counters (yet to be serviced by an overflow
> > > > interrupt handler) from overflowed counters that
> > > > have already been serviced or that are configured to not
> > > > generate an interrupt on overflow.
> > > > -------------------------------------------------------------------------------
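A minimal Python sketch of that software bit mask, assuming the handler reads the 32-bit OF shadow (scountovf) and keeps its own hypothetical `known` mask of counters whose OF bit is already accounted for: overflows serviced earlier, or counters deliberately left with OF=1 so they never interrupt.

```python
MASK32 = 0xFFFF_FFFF


def take_new_overflows(scountovf: int, known: int) -> tuple[int, int]:
    """Return (newly overflowed counters, updated known mask)."""
    new = scountovf & ~known & MASK32
    return new, known | new


def counter_indices(mask: int) -> list[int]:
    """List the counter numbers whose bits are set in `mask`."""
    return [i for i in range(32) if (mask >> i) & 1]
```

When software later re-arms a counter by clearing its OF bit, it would also clear that counter's bit in `known` so the next overflow is recognized as new.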
> > > >
> > > > Machine Interrupt Registers (mip and mie)
> > > > [ This extension adds the description of the LCOFIP/LCOFIE bits
> > > > in these registers (and modifies related text) as follows: ]
> > > >
> > > > LCOFIP is added to mip in Figure 3.14 as bit 13.  LCOFIE is
> > > > added to mie in Figure 3.15 as bit 13.
> > > >
> > > > If the Sscof extension is implemented, bits mip.LCOFIP and
> > > > mie.LCOFIE are the interrupt-pending and interrupt-enable bits
> > > > for local count overflow interrupts.  LCOFIP is read-write in
> > > > mip and reflects the occurrence of a local count overflow
> > > > interrupt request resulting from any of the mhpmeventn.OF bits
> > > > being set.   If the Sscof extension is not implemented, these
> > > > LCOFIP and LCOFIE bits are hardwired to zeros.
> > > >
> > > > Multiple simultaneous interrupts destined for different
> > > > privilege modes are handled in decreasing order of destined
> > > > privilege mode. Multiple simultaneous interrupts destined for
> > > > the same privilege mode are handled in the following decreasing
> > > > priority order: MEI, MSI, MTI, SEI, SSI, STI, LCOFI.
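The priority rule can be sketched as a table walk. This Python sketch uses the standard mip/mie bit positions (which match the interrupt cause codes) plus bit 13 for LCOFI as proposed, and assumes all candidates are destined for the same privilege mode, so delegation is not modeled. The supervisor-level order SEI, SSI, STI, LCOFI is a subsequence of the same table.

```python
# (name, mip/mie bit) from highest to lowest priority within one privilege mode.
PRIORITY = (
    ("MEI", 11), ("MSI", 3), ("MTI", 7),
    ("SEI", 9), ("SSI", 1), ("STI", 5),
    ("LCOFI", 13),
)


def highest_priority(pending: int, enabled: int):
    """Return (name, bit) of the interrupt to take, or None if none is ready."""
    ready = pending & enabled
    for name, bit in PRIORITY:
        if (ready >> bit) & 1:
            return name, bit
    return None
```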
> > > >
> > > > ===============================================================================
> > > > =======================  Supervisor-Level ISA Additions  =====================
> > > >
> > > > Supervisor Interrupt Registers (sip and sie)
> > > > [ This extension adds the description of the LCOFIP/LCOFIE bits
> > > > in these registers (and modifies related text) as follows: ]
> > > >
> > > > LCOFIP is added to sip in Figure 4.6 as bit 13.  LCOFIE is
> > > > added to sie in Figure 4.7 as bit 13.
> > > >
> > > > If the Sscof extension is implemented, bits sip.LCOFIP and
> > > > sie.LCOFIE are the interrupt-pending and interrupt-enable bits
> > > > for local count overflow interrupts.  LCOFIP is read-write in
> > > > sip and reflects the occurrence of a local count overflow
> > > > interrupt request resulting from any of the mhpmeventn.OF bits
> > > > being set.  If the Sscof extension is not implemented, these
> > > > LCOFIP and LCOFIE bits are hardwired to zeros. 
> > > >
> > > > Each standard interrupt type (LCOFI, SEI, STI, or SSI) may not
> > > > be implemented, in which case the corresponding
> > > > interrupt-pending and interrupt-enable bits are hardwired to
> > > > zeros.  All bits in sip and sie are WARL fields.
> > > >
> > > > Multiple simultaneous interrupts destined for supervisor mode
> > > > are handled in the following decreasing priority order: SEI,
> > > > SSI, STI, LCOFI.
> > > >
> > > > Supervisor Count Overflow (scountovf)
> > > > [ This extension adds this new CSR. ]
> > > >
> > > > The scountovf CSR is a 32-bit read-only register that contains
> > > > shadow copies of the OF bits in the 32 mhpmevent CSRs -
> > > > where scountovf bit X corresponds to mhpmeventX.  The proposed
> > > > CSR number is 0xD33.
> > > >
> > > > This register enables supervisor-level overflow interrupt
> > > > handler software to quickly and easily determine which
> > > > counter(s) have overflowed (without needing to make an
> > > > execution environment call or series of calls ultimately up to
> > > > M-mode).  [ ARMv8 and x86 have a similar register for the same
> > > > reasons. ]
> > > >
> > > > Read access to bit X is subject to the same mcounteren (or
> > > > mcounteren and hcounteren) CSRs that mediate access to the
> > > > hpmcounter CSRs by S-mode (or VS-mode).  In M and S
> > > > modes, scountovf bit X is readable when mcounteren bit X is
> > > > set, and otherwise reads as zero.  Similarly, in VS
> > > > mode, scountovf bit X is readable when mcounteren bit X and
> > > > hcounteren bit X are both set, and otherwise reads as zero. 
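The access rule in the paragraph above amounts to masking the OF shadow with the relevant counter-enable CSRs. A minimal Python sketch (read_scountovf is a hypothetical name; modes other than M, S, and VS are not described by the text and are simply rejected here):

```python
MASK32 = 0xFFFF_FFFF


def read_scountovf(of_bits: int, mode: str, mcounteren: int, hcounteren: int = 0) -> int:
    """Model a read of scountovf; bit X of of_bits holds mhpmeventX.OF."""
    of_bits &= MASK32
    if mode in ("M", "S"):
        return of_bits & mcounteren & MASK32
    if mode == "VS":
        return of_bits & mcounteren & hcounteren & MASK32
    # U/VU and other modes are not covered by the text; not modeled here.
    raise ValueError(f"unsupported mode: {mode}")
```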
> > > >

--
Regards,
Atish






Allen Baum
 

Addendum:
 Without some kind of event standardization, even after all of this we can only test one counter... not great coverage.

On Wed, Feb 24, 2021 at 8:03 PM Allen Baum <allen.baum@...> wrote:
I think I have trouble answering questions concisely.

From a Definition of Done perspective, you're supposed to have architectural tests that will pass when run on the Sail model and on Spike -
more specifically, generate the same signature. 

The signature for this case would have to be that it interrupted where it was expected on both the reference model and the device under test
We do that by comparing trap state: xEPC, xCAUSE, xIP, xTVAL by the interrupt handler.
That works well for synchronous traps, but works less well for asynchronous events, e.g. interrupts.
If you can't inject the interrupt at a deterministic point, xEPC will differ.

The only way I see to pick a deterministic point is to interrupt on the INSTRET counter, which is as close to architectural as you might get.

There are (at least) 4 problems with this:
 
1. As has been pointed out, counter events are implementation specific.
    The only solution I can see is to choose instret, which is writable, required, and deterministic
2. Oops, the current definition of the instruction doesn't include instret.
    This can be fixed by adding another overflow bit , or by further standardizing event selection to include the equivalent 
3 There is no architectural requirement that the interval from an interrupt event occurring to the first instruction of the handler being executed.
4. what happens when an implementation retires more than instruction per cycle? 
    These last 2 are related: even if the event occurs at a deterministic point, the interrupt still won't necessarily occur at a deterministic point

I think we have possible solutions from a framework perspective, but they're ways off .
This sometimes requires the reference model (Sail or possibly Spike) to be able to replicate the possible behaviors.

For this particular case there may be a test structure that might work: 
  disable counter interrupts until after the point that the counter overflows, and enable it at some deterministic point later.
Again: I don't know if there an architectural requirement for the interval between an instruction enabling an interrupt and the first instruction of a trap handler executing
But, if there is AND we can guarantee a universal architectural event - we have a chance of this working.
Otherwise, we would have to have a modified version of the framework (e.g. one that somehow allows fuzzy comparisons for some specific signature values)

On Wed, Feb 24, 2021 at 5:55 PM Atish Patra <atish.patra@...> wrote:
On Wed, 2021-02-24 at 17:01 -0800, Greg Favor wrote:
> Cc'ing tech-priv since others may be wondering about the answer to
> Brian's question.
>
> Brian,
>
> As Mark touched on below, there is a whole "Definition of Done"
> checklist of items that needs to be done (including software support,
> Spike and Sail models, OACR review, PoC, ...).  So that is the next
> order of business that I need to work on - where "I" doesn't mean me
> doing it all (or even having the expertise to do all those things). 
> Btw, would you or anyone else be willing and able to help out with
> one of the DoD checklist items?  Any help will be greatly appreciated
> and will help move this ball towards the goal line.
>

Hi Greg,
I am working on the SBI PMU extension implementation in OpenSBI & Linux
kernel. I will update the SBI PMU extension based on Sscofpmf extension
as well.

I can work on implementing the Sscofpmf extension in Qemu and required
software changes in OpenSBI & Linux kernel as well if that is okay with
you.

Do we require anything else for the PoC part of the DoD policy?

> Greg
>
> On Wed, Feb 24, 2021 at 3:54 PM Mark Himelstein <
> markhimelstein@...> wrote:
> > please check out the ratification policy for next steps.
> >
> > https://docs.google.com/document/d/1-UlaSGqk59_myeuPMrV9gyuaIgnmFzGh5Gfy_tpViwM/edit
> >
> >
> > On Wed, Feb 24, 2021 at 3:43 PM Brian Grayson <
> > brian.grayson@...> wrote:
> > > It's been three weeks since this proposal has been floated, and
> > > feedback was provided on the list. Everyone I've checked with
> > > off-list has been okay with the spec.
> > >
> > > Does anyone object to moving it forward, towards fast-track
> > > ratification? Is there anything else required before it begins
> > > the 45-day public review?
> > >
> > > Greg, do you want to publish the latest version with the tweaks
> > > that you made based on the earlier feedback, for reference?
> > >
> > > Thanks.
> > >
> > > Brian
> > >
> > > On Mon, Feb 1, 2021 at 12:38 AM Greg Favor <
> > > gfavor@...> wrote:
> > > > Hi all,
> > > >
> > > > Recently the TSC established a lightweight "fast track"
> > > > architecture extension process that small, straightforward,
> > > > relatively uncontentious arch extension proposals can utilize. 
> > > > This is the second of two Privileged architecture related small
> > > > extensions - that a number of people/companies have expressed
> > > > desire for over the past year - that Andrew and I discussed
> > > > trying to help move through this process sooner than later
> > > > (especially since this entails much more than simply developing
> > > > a spec).  The following starts with an intro for context, and
> > > > then provides the draft spec.
> > > >
> > > > Note that the draft spec is written as the actual changes to be
> > > > made to existing paragraphs of Priv spec text (or additional
> > > > paragraphs and/or sections within the existing text).  The
> > > > surrounding sentence(s) of a change are included for context. 
> > > > Text in square brackets is temporary commentary that is not
> > > > part of the proposed spec changes.
> > > >
> > > > In anticipation of some questions that may arise in people's
> > > > minds, I'll note that this extension has been
> > > > extensively reviewed by the lead architects of the Privileged
> > > > and Hypervisor architectures for consistency with the current
> > > > architecture (including little things like extension, CSR, and
> > > > bit/field names).  Various changes were made along the way
> > > > because of this.
> > > >
> > > > ===============================================================
> > > > ================
> > > > Introduction
> > > >
> > > > The current Privileged specification defines mhpmevent CSRs to
> > > > select and control event counting by the associated hpmcounter
> > > > CSRs, but provides no standardization of any fields within
> > > > these CSRs.  For at least Linux-class rich-OS systems it is
> > > > desirable to standardize certain basic features that are
> > > > broadly desired (and have come up over the past year plus on
> > > > RISC-V lists, as well as have been the subject of past
> > > > proposals).  This enables there to be standard upstream
> > > > software support that eliminates the need for implementations
> > > > to provide their own custom software support.  (Implementations
> > > > are free, of course, to not implement this extension.)
> > > >
> > > > This proposal serves to accomplish exactly this within the
> > > > existing mhpmevent CSRs (and correspondingly avoids the
> > > > unnecessary creation of whole new sets of CSRs - past just one
> > > > new CSR).
> > > >
> > > > Below is a one-page draft spec of the proposal - which sticks
> > > > to addressing two basic well-understood needs that have been
> > > > requested by various people.  The proposed extension name is
> > > > "Sscof" ('Ss' for Privileged arch and Supervisor-level
> > > > extensions, and 'cof' for Count Overflow and Filtering).  There
> > > > are other features that various people may desire (and that
> > > > even I would desire) that don't have clear-cut, non-
> > > > contentious, and relatively broad support.  These can be grist
> > > > for separate discussions and possibly another arch extension by
> > > > a motivated party that gathers a sufficient degree of
> > > > concensus.
> > > >
> > > > Although one such feature worth highlighting is having a WrEn
> > > > bit in mhpmevent that allows lower privilege modes that can
> > > > read the associated hpmcounter CSR (based on the *counteren
> > > > CSRs) to also be able to write it.  In essence enabling direct
> > > > S/VS-mode and U/VU-mode write access instead of always
> > > > requiring OpenSBI calls up to M-mode.  But this feature has had
> > > > some contention, involves some details to properly support
> > > > virtualization, and requires allocating a second set of "User-
> > > > Read-Write" hpmcounter CSR numbers (since the current
> > > > hpmcounter CSRs are "User-Read-Only").  If there is a broad
> > > > upwelling of support and justification for this feature, and
> > > > some party willing to put together a complete spec (including
> > > > virtualization support), then this could be another fast-track
> > > > extension.
> > > >
> > > > Lastly note that the new count overflow interrupt will be
> > > > treated as a standard local interrupt that is assigned to bit
> > > > 13 in the mip/mie/sip/sie registers.  (This has been discussed
> > > > and agreed to with key Priv Arch people.)
> > > >
> > > > This posting to this email list starts an initial review period
> > > > (over the next few weeks) for people to provide feedback,
> > > > questions, comments, etc.
> > > >
> > > > ===============================================================
> > > > =================
> > > > Proposed Spec
> > > >
> > > > ===============================================================
> > > > ========
> > > > =======================  Machine-Level ISA Additions 
> > > > ========================
> > > >
> > > > Hardware Performance Monitor
> > > > [ This extension expands the hardware performance monitor
> > > > description and extends the mhpmevent registers to 64 bits (in
> > > > RV32) as follows: ]
> > > >
> > > > The hardware performance monitor includes 29 additional 64-bit
> > > > event counters and 29 associated 64-bit event selector
> > > > registers - the mhpmcounter3–mhpmcounter31 and
> > > > mhpmevent3–mhpmevent31 CSRs.
> > > >
> > > > The mhpmcounters are WARL registers that support up to 64 bits
> > > > of precision on RV32 and RV64. 
> > > >
> > > > The mhpmeventn registers are WARL registers that control which
> > > > event causes the corresponding counter to increment and what
> > > > happens when the corresponding count overflows. Currently just
> > > > a few bits are defined here.  Past this, the actual selection
> > > > and meaning of events is defined by the platform, but
> > > > (mhpmevent == 0) is defined to mean “no event" and that the
> > > > corresponding counter will never be incremented.  Typically the
> > > > lower bits of mhpmevent will be used for event selection
> > > > purposes.  
> > > >
> > > > On RV32 only, reads of the mcycle, minstret, mhpmcountern, and
> > > > mhpmeventn CSRs return the low 32 bits, while reads of the
> > > > mcycleh, minstreth, mhpmcounternh, and mhpmeventnh CSRs return
> > > > bits 63–32 of the corresponding counter or event selector.  [
> > > > The proposed CSR numbers for mhpmeventnh are 0x723 - 0x73F. ]
> > > >
> > > > The following bits are added to mhpmevent:
> > > >
> > > > bit [63]  OF            -  Overflow status and interrupt
> > > > disable bit that is set when counter overflows
> > > >
> > > > bit [62]  MINH        -  If set, then counting of events in M-
> > > > mode is inhibited
> > > > bit [61]  SINH         -  If set, then counting of events in
> > > > S/HS-mode is inhibited
> > > > bit [60]  UINH         -  If set, then counting of events in U-
> > > > mode is inhibited
> > > > bit [59]  VSINH       -  If set, then counting of events in VS-
> > > > mode is inhibited
> > > > bit [59]  VUINH       -  If set, then counting of events in VU-
> > > > mode is inhibited
> > > > bit [58]  0                -  Reserved for possible future
> > > > modes
> > > > bit [57]  0                -  Reserved for possible future
> > > > modes
> > > >
> > > > Each of the five 'x'INH bits, when set, inhibit counting of
> > > > events while in privilege mode 'x'.  All-zeroes for these bits
> > > > results in counting of events in all modes.
> > > >
> > > > The OF bit is set when the corresponding hpmcounter overflows,
> > > > and remains set until written by software.  Since hpmcounter
> > > > values are unsigned values, overflow is defined as unsigned
> > > > overflow.  [ This matches x86 and ARMv8. ]  Note that there is
> > > > no loss of information after an overflow since the counter
> > > > wraps around and keeps counting while the sticky OF bit remains
> > > > set.  [ For a 64-bit counter it will be an awfully long time
> > > > before another overflow could possibly occur. ]
> > > >
> > > > If supervisor mode is implemented, the 32-bit scountovf
> > > > register contains read-only shadow copies of the OF bits in all
> > > > 32 mhpmevent registers.
> > > >
> > > > If an hpmcounter overflows while the associated OF bit is zero,
> > > > then a "count overflow interrupt request" is generated.  If the
> > > > OF bit is one, then no interrupt request is generated. 
> > > > Consequently the OF bit also functions as a count overflow
> > > > interrupt disable for the associated hpmcounter.
> > > >
> > > > ----------------------------  Non-Normative Text    ----------------------------
> > > > There are no separate overflow status and overflow interrupt
> > > > enable bits.  In practice, enabling overflow interrupt
> > > > generation (by clearing the OF bit) is done in conjunction with
> > > > initializing the counter to a starting value.  Once a counter
> > > > has overflowed, it and the OF bit must be reinitialized before
> > > > another overflow interrupt can be generated.
> > > > ----------------------------------------------------------------------------------
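One common way to pick the starting value is to make the counter overflow after a chosen number of events. For a 64-bit counter that value is 2^64 - N, which unsigned negation computes directly. The helper name below is hypothetical. On a system where S-mode cannot write mhpmevent directly, clearing OF would go through an execution environment call.

```c
#include <stdint.h>

/* Starting value that makes a 64-bit counter overflow after exactly
 * `period` more events (period >= 1). Unsigned negation yields
 * 2^64 - period. */
static uint64_t hpm_start_value(uint64_t period)
{
    return (uint64_t)0 - period;
}
```

For example, with `period = 1000` the counter starts at 2^64 - 1000. After 999 events it is still non-zero, and the 1000th event wraps it to 0 and sets OF.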
> > > >
> > > > This "count overflow interrupt request" signal is treated as a
> > > > standard local interrupt that corresponds to bit 13 in the
> > > > mip/mie/sip/sie registers.  The mip/sip LCOFIP and mie/sie
> > > > LCOFIE bits are respectively the interrupt-pending and
> > > > interrupt-enable bits for this interrupt.  ('LCOFI' represents
> > > > 'Local Count Overflow Interrupt'.)  [ This proposal doesn't try
> > > > to introduce per-privilege mode overflow interrupt request
> > > > signals.  ARMv8 doesn't have this and I don't think x86 does
> > > > either. ]
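A sketch of the bit-13 masks for mip/mie (and sip/sie). The macro and function names are illustrative. Whether the interrupt is actually taken also depends on global enables and privilege mode, which this check ignores.

```c
#include <stdbool.h>
#include <stdint.h>

#define LCOFI_BIT   13
#define LCOFIP_MASK ((uint64_t)1 << LCOFI_BIT) /* in mip / sip */
#define LCOFIE_MASK ((uint64_t)1 << LCOFI_BIT) /* in mie / sie */

/* True if the local count overflow interrupt is both pending and enabled
 * (global interrupt enables are not modeled). */
static bool lcofi_pending_enabled(uint64_t ip, uint64_t ie)
{
    return (ip & LCOFIP_MASK) && (ie & LCOFIE_MASK);
}
```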
> > > >  
> > > > Generation of a "count overflow interrupt request" by an
> > > > hpmcounter sets the LCOFIP bit in the mip/sip registers and
> > > > sets the associated OF bit.  The LCOFIP bit is cleared by
> > > > software after servicing the count overflow interrupt resulting
> > > > from one or more count overflows.
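The set/clear sequence for LCOFIP can be modeled per hart as follows (a sketch; `struct hart_model`, `counter_overflow`, and `service_lcofi` are hypothetical). Two overflows before the handler runs leave a single pending bit. Software clears it once both have been serviced.

```c
#include <stdint.h>

#define LCOFIP_MASK ((uint64_t)1 << 13)

/* Hypothetical per-hart model: mip plus one OF bit per counter
 * (bit X = counter X, X < 32). */
struct hart_model {
    uint64_t mip;
    uint32_t of;
};

/* Counter `idx` overflows: a request (OF was clear) sets LCOFIP;
 * OF is set in either case. */
static void counter_overflow(struct hart_model *h, unsigned idx)
{
    uint32_t bit = (uint32_t)1 << idx;
    if (!(h->of & bit))
        h->mip |= LCOFIP_MASK;
    h->of |= bit;
}

/* Handler epilogue: after servicing the overflow(s), software clears LCOFIP. */
static void service_lcofi(struct hart_model *h)
{
    h->mip &= ~LCOFIP_MASK;
}
```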
> > > >
> > > > ----------------------------  Non-Normative Text    ----------------------------
> > > > Software can maintain a bit mask to distinguish newly
> > > > overflowed counters (yet to be serviced by an overflow
> > > > interrupt handler) from overflowed counters that
> > > > have already been serviced or that are configured to not
> > > > generate an interrupt on overflow.
> > > > ----------------------------------------------------------------------------------
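One possible form of that bookkeeping is sketched below. `serviced` and `no_irq` are software-maintained masks, not architectural state, and the function name is hypothetical. The OF bits passed in could be, for example, a snapshot of scountovf.

```c
#include <stdint.h>

/* Returns the OF bits that belong to counters not yet serviced and not
 * configured to skip overflow interrupts, then records them as serviced. */
static uint32_t take_new_overflows(uint32_t of, uint32_t *serviced,
                                   uint32_t no_irq)
{
    uint32_t fresh = of & ~(*serviced | no_irq);
    *serviced |= fresh;
    return fresh;
}
```

When software later re-arms a counter (reinitializing it and clearing its OF bit), it would also clear that counter's bit in `serviced`.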
> > > >
> > > > Machine Interrupt Registers (mip and mie)
> > > > [ This extension adds the description of the LCOFIP/LCOFIE bits
> > > > in these registers (and modifies related text) as follows: ]
> > > >
> > > > LCOFIP is added to mip in Figure 3.14 as bit 13.  LCOFIE is
> > > > added to mie in Figure 3.15 as bit 13.
> > > >
> > > > If the Sscof extension is implemented, bits mip.LCOFIP and
> > > > mie.LCOFIE are the interrupt-pending and interrupt-enable bits
> > > > for local count overflow interrupts.  LCOFIP is read-write in
> > > > mip and reflects the occurrence of a local count overflow
> > > > interrupt request resulting from any of the mhpmeventn.OF bits
> > > > being set.   If the Sscof extension is not implemented, these
> > > > LCOFIP and LCOFIE bits are hardwired to zeros.
> > > >
> > > > Multiple simultaneous interrupts destined for different
> > > > privilege modes are handled in decreasing order of destined
> > > > privilege mode. Multiple simultaneous interrupts destined for
> > > > the same privilege mode are handled in the following decreasing
> > > > priority order: MEI, MSI, MTI, SEI, SSI, STI, LCOFI.
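A sketch of the same-mode priority order using the standard interrupt cause numbers, with LCOFI at 13 as defined above. As a simplification it ignores destination-mode ordering, delegation, and global enables.

```c
#include <stdint.h>

/* Interrupt cause numbers (= bit positions in mip/mie). */
enum { SSI = 1, MSI = 3, STI = 5, MTI = 7, SEI = 9, MEI = 11, LCOFI = 13 };

/* Decreasing priority for interrupts destined for the same mode. */
static const int prio_order[] = { MEI, MSI, MTI, SEI, SSI, STI, LCOFI };

/* Cause number of the highest-priority pending-and-enabled interrupt,
 * or -1 if none. */
static int highest_pending(uint64_t ip, uint64_t ie)
{
    uint64_t ready = ip & ie;
    for (unsigned i = 0; i < sizeof prio_order / sizeof prio_order[0]; i++)
        if (ready & ((uint64_t)1 << prio_order[i]))
            return prio_order[i];
    return -1;
}
```

Removing the M-level entries leaves SEI, SSI, STI, LCOFI, which is the supervisor-level order this proposal gives.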
> > > >
> > > > ========================================================================
> > > > =======================  Supervisor-Level ISA Additions  ========================
> > > >
> > > > Supervisor Interrupt Registers (sip and sie)
> > > > [ This extension adds the description of the LCOFIP/LCOFIE bits
> > > > in these registers (and modifies related text) as follows: ]
> > > >
> > > > LCOFIP is added to sip in Figure 4.6 as bit 13.  LCOFIE is
> > > > added to sie in Figure 4.7 as bit 13.
> > > >
> > > > If the Sscof extension is implemented, bits sip.LCOFIP and
> > > > sie.LCOFIE are the interrupt-pending and interrupt-enable bits
> > > > for local count overflow interrupts.  LCOFIP is read-write in
> > > > sip and reflects the occurrence of a local count overflow
> > > > interrupt request resulting from any of the mhpmeventn.OF bits
> > > > being set.  If the Sscof extension is not implemented, these
> > > > LCOFIP and LCOFIE bits are hardwired to zeros. 
> > > >
> > > > Each standard interrupt type (LCOFI, SEI, STI, or SSI) may not
> > > > be implemented, in which case the corresponding interrupt-
> > > > pending and interrupt-enable bits are hardwired to zeros.  All
> > > > bits in sip and sie are WARL fields.
> > > >
> > > > Multiple simultaneous interrupts destined for supervisor mode
> > > > are handled in the following decreasing priority order: SEI,
> > > > SSI, STI, LCOFI.
> > > >
> > > > Supervisor Count Overflow (scountovf)
> > > > [ This extension adds this new CSR. ]
> > > >
> > > > The scountovf CSR is a 32-bit read-only register that contains
> > > > shadow copies of the OF bits in the 32 mhpmevent CSRs -
> > > > where scountovf bit X corresponds to mhpmeventX.  The proposed
> > > > CSR number is 0xD33.
> > > >
> > > > This register enables supervisor-level overflow interrupt
> > > > handler software to quickly and easily determine which
> > > > counter(s) have overflowed (without needing to make an
> > > > execution environment call or series of calls ultimately up to
> > > > M-mode).  [ ARMv8 and x86 have a similar register for the same
> > > > reasons. ]
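A sketch of how a supervisor-level handler might turn an scountovf value into a list of overflowed counter indices. The function name is hypothetical. The value is taken as a parameter because on real hardware it would come from a CSR read of the proposed number 0xD33, which is outside this portable sketch.

```c
#include <stdint.h>

/* Writes each X with scountovf bit X set into `out` (capacity 32) and
 * returns how many were found. */
static unsigned overflowed_counters(uint32_t scountovf, unsigned out[32])
{
    unsigned n = 0;
    for (unsigned x = 0; x < 32; x++)
        if (scountovf & ((uint32_t)1 << x))
            out[n++] = x;
    return n;
}
```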
> > > >
> > > > Read access to bit X is subject to the same mcounteren (or
> > > > mcounteren and hcounteren) CSRs that mediate access to the
> > > > hpmcounter CSRs by S-mode (or VS-mode).  In M and S
> > > > modes, scountovf bit X is readable when mcounteren bit X is
> > > > set, and otherwise reads as zero.  Similarly, in VS
> > > > mode, scountovf bit X is readable when mcounteren bit X and
> > > > hcounteren bit X are both set, and otherwise reads as zero. 
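The access rule can be expressed as a mask, following this draft's wording literally (including its M-mode case). The enum and function names are hypothetical. Bits that fail the counter-enable check read as zero.

```c
#include <stdint.h>

enum read_mode { RD_M, RD_S, RD_VS };

/* scountovf as seen in `mode`: gated by mcounteren in M and S modes,
 * and by mcounteren AND hcounteren in VS mode. */
static uint32_t scountovf_view(uint32_t of_bits, enum read_mode mode,
                               uint32_t mcounteren, uint32_t hcounteren)
{
    uint32_t mask = mcounteren;
    if (mode == RD_VS)
        mask &= hcounteren;
    return of_bits & mask;
}
```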
> > > >

--
Regards,
Atish

Greg Favor
 

On Wed, Feb 24, 2021 at 5:55 PM Atish Patra <Atish.Patra@...> wrote:
> I can work on implementing the Sscofpmf extension in Qemu and required
> software changes in OpenSBI & Linux kernel as well if that is okay with
> you.

Thanks.  That would be great.  Let me also send you (tomorrow) the "final" version as submitted to the OACR committee.

> Do we require anything else for the PoC part of the DoD policy?

Let me get back to you on that (probably after I talk with Andrew).

Greg