
Re: Proposal v2: SBI PMU Extension

Zong Li
 

On Thu, Jul 9, 2020 at 4:47 PM Anup Patel <Anup.Patel@...> wrote:



-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 09 July 2020 14:09
To: Anup Patel <Anup.Patel@...>
Cc: Brian Grayson <brian.grayson@...>; Atish Patra
<Atish.Patra@...>; andrew@...; tech-
unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU
Extension

On Thu, Jul 9, 2020 at 3:57 PM Anup Patel <Anup.Patel@...> wrote:

Based on my previous reply…



To monitor RAW event 0x12345678, the user-space perf tool will create a
user-space perf RAW event (i.e. perf_event_attr.type == 4 and
perf_event_attr.config == 0x12345678). The Linux RISC-V PMU driver will
allocate and map a matching HARDWARE counter which supports the
corresponding SBI RAW event (event_idx.type = 2, event_idx.code = 0x678
and event_idx.info = 0x12345). Finally, the SBI_PMU_COUNTER_START call
implemented by OpenSBI will write 0x12345678 (or some platform-specific
translated value of 0x12345678) to the appropriate mhpmeventX CSR.



(Note: above we assume mhpmcounterX supports monitoring RAW event
0x12345678 and OpenSBI is aware of this)
The Linux PMU driver should be aware of this as well, because
SBI_PMU_COUNTER_START takes a counter_idx parameter, which is supplied
by the Linux PMU driver.
The SBI_PMU_COUNTER_DESCRIBE call is designed with this aspect in mind. This
call provides the list of event_idx values supported by a given counter_idx
(including RAW events), so when allocating a counter in event_add() we can
easily find a matching counter.
Each element of Event_List consists of event_idx.type and
event_idx.code, so how does it represent a raw event like the one above?
(event_idx.info = 0x12345, event_idx.type = 0x2, event_idx.code = 0x678)
There seems to be a conflict if we have the raw events 0xXXXXX678 and
0xYYYYY678.


This means Linux RISC-V PMU only needs to deal with SBI calls for handling
all types of SBI PMU counters and events. In other words, no need to parse
DT/ACPI for event mappings.

Regards,
Anup





Regards,

Anup





From: Brian Grayson <brian.grayson@...>
Sent: 09 July 2020 12:35
To: Zong Li <zong.li@...>
Cc: Anup Patel <Anup.Patel@...>; Atish Patra
<Atish.Patra@...>; andrew@...;
tech-unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU
Extension



My question is, let's say I know that putting the value 0x12345678 into the
mhpmevent3 register gets me the event I want, and there is no support for
that event in the SBI spec/API. Will this API allow me to program such an
event, basically bypassing the usual mapping functionality? perf basically
allows you to say "I know this event number is not one you know about, but
it's the value I want placed directly into the hardware." I want to ensure that
the full capabilities of the hardware will still be accessible through the SBI
spec in some sort of "raw" mode, and I didn't see a way for that to happen
right now. We don't want to restrict users to the lowest common
denominator of functionality.



Brian





On Wed, Jul 8, 2020 at 10:27 PM Zong Li <zong.li@...> wrote:

On Thu, Jul 9, 2020 at 1:06 AM Brian Grayson <brian.grayson@...>
wrote:

Would there be a raw style interface to access all the SBI-unaware
events, like perf's rNNN support?
Following up on this question: in our current proposal, S-mode software only
knows the event_idx, and M-mode firmware takes care of the mapping. My
concern is that S-mode software doesn't seem to understand the
meaning of each event_idx; it just gets the array of all supported
event_idx values, but can't know which one is for what. The same applies
to U-mode programs: for the rNNN interface, we would normally refer to the
processor-specific documentation for these details, but now users won't
know what value they should give. Please correct me if I missed
something. Thanks.

How would this work on a multicore system -- would the SBI calls only
handle the current hart's counters? That seems easiest to deal with.

Brian


Re: Proposal v2: SBI PMU Extension

Anup Patel
 

-----Original Message-----
From: tech-unixplatformspec@... <tech-
unixplatformspec@...> On Behalf Of Zong Li
Sent: 09 July 2020 14:42
To: Anup Patel <Anup.Patel@...>
Cc: Brian Grayson <brian.grayson@...>; Atish Patra
<Atish.Patra@...>; andrew@...; tech-
unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU
Extension

On Thu, Jul 9, 2020 at 4:47 PM Anup Patel <Anup.Patel@...> wrote:



-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 09 July 2020 14:09
To: Anup Patel <Anup.Patel@...>
Cc: Brian Grayson <brian.grayson@...>; Atish Patra
<Atish.Patra@...>; andrew@...; tech-
unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU
Extension

On Thu, Jul 9, 2020 at 3:57 PM Anup Patel <Anup.Patel@...> wrote:

Based on my previous reply…



To monitor RAW event 0x12345678, the user-space perf tool will create a
user-space perf RAW event (i.e. perf_event_attr.type == 4 and
perf_event_attr.config == 0x12345678). The Linux RISC-V PMU driver
will allocate and map a matching HARDWARE counter which supports
the corresponding SBI RAW event (event_idx.type = 2,
event_idx.code = 0x678 and event_idx.info = 0x12345). Finally, the
SBI_PMU_COUNTER_START call implemented by OpenSBI will write
0x12345678 (or some platform-specific translated value of 0x12345678) to
the appropriate mhpmeventX CSR.



(Note: above we assume mhpmcounterX supports monitoring RAW event
0x12345678 and OpenSBI is aware of this)
The Linux PMU driver should be aware of this as well, because
SBI_PMU_COUNTER_START takes a counter_idx parameter, which is supplied
by the Linux PMU driver.
The SBI_PMU_COUNTER_DESCRIBE call is designed with this aspect in mind. This
call provides the list of event_idx values supported by a given counter_idx
(including RAW events), so when allocating a counter in event_add() we
can easily find a matching counter.
Each element of Event_List consists of event_idx.type and event_idx.code,
so how does it represent a raw event like the one above? (event_idx.info = 0x12345,
event_idx.type = 0x2, event_idx.code = 0x678) There seems to be a conflict if
we have the raw events 0xXXXXX678 and 0xYYYYY678.
The above break-up of RAW event 0x12345678 is only one of the possible
ways, shown as an example.
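
For illustration, the example break-up quoted above (code = low 12 bits, info = remaining upper bits, type = 0x2) could be sketched in C as follows. This is only a sketch of that one possible split; the names `sbi_raw_event`, `raw_config_split` and `raw_config_join` are hypothetical and not part of the proposal, and a platform is free to choose a different mapping.

```c
#include <assert.h>
#include <stdint.h>

/* Value 0x2 is the HARDWARE RAW type from the proposal; the macro name is made up. */
#define SBI_PMU_EVENT_TYPE_HW_RAW 2u

/* Hypothetical holder for the three event_idx fields of a RAW event. */
struct sbi_raw_event {
    unsigned type;   /* event_idx.type */
    unsigned code;   /* event_idx.code (12 bits in the v2 layout) */
    uint64_t info;   /* event_idx.info */
};

/* Split a perf RAW config value the way the example in this thread does. */
static struct sbi_raw_event raw_config_split(uint64_t config)
{
    struct sbi_raw_event ev;

    ev.type = SBI_PMU_EVENT_TYPE_HW_RAW;
    ev.code = (unsigned)(config & 0xfff);  /* low 12 bits */
    ev.info = config >> 12;                /* everything above them */
    return ev;
}

/* Inverse of the split above: rebuild the original RAW config value. */
static uint64_t raw_config_join(struct sbi_raw_event ev)
{
    assert(ev.code <= 0xfff);
    return (ev.info << 12) | ev.code;
}
```

With this split, 0x12345678 yields code 0x678 and info 0x12345, matching the numbers used in the example.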

If we have two different RAW events 0xXXXXX678 and 0xYYYYY678 then
platform can show it as:
event_idx.type == 2 and event_idx.code == X and event_idx.info = 0xXXXXX678
event_idx.type == 2 and event_idx.code == Y and event_idx.info = 0xYYYYY678
OR
event_idx.type == 2 and event_idx.code == Z and event_idx.info = 0xXXXXX678
event_idx.type == 2 and event_idx.code == Z and event_idx.info = 0xYYYYY678

Further, the OpenSBI platform support can translate RAW event_idx into
final mhpmeventX CSR value using platform callback or describing it in DT/ACPI
for OpenSBI generic platform.

In other words, we have to distinguish all HARDWARE and SOFTWARE
events of a platform using the 16-bit combination of event_idx.type and
event_idx.code. The event_idx.info is only an additional parameter to an event.

Regards,
Anup





Re: Proposal v2: SBI PMU Extension

Greg Favor
 

Anup,

I missed a point in the v1 email thread where I was requesting for event_idx.code to be 16 bits along with a 4-bit event_idx.type (which you seemed to agree with), and then you asked whether it would be fine to have event_idx be XLEN bits wide - and I agreed.  But I didn't notice in your attached example that event_idx.code was still 12 bits.  It would be great if this is acceptable:

event_idx[XLEN-1:20] = info

event_idx[19:16] = type

event_idx[15:0] = code


Greg



On Sun, Jul 5, 2020 at 9:35 AM Anup Patel <Anup.Patel@...> wrote:
Hi All,

We don't have a dedicated RISC-V PMU extension but we do have HARDWARE
performance counters such as the CYCLE CSR, INSTRET CSR, and HPMCOUNTER
CSRs. A RISC-V implementation can support monitoring various HARDWARE
events using a limited number of HPMCOUNTER CSRs.

In addition to HARDWARE performance counters, an SBI implementation
(e.g. OpenSBI, Xvisor, KVM, etc.) can provide SOFTWARE counters for
events such as number of RFENCEs, number of IPIs, number of misaligned
load/store instructions, number of illegal instructions, etc.

We propose an SBI PMU extension which tries to cover the CYCLE CSR, INSTRET
CSR, HPMCOUNTER CSRs and SOFTWARE counters provided by the SBI implementation.

To define the SBI PMU extension, we first define counter_idx, which is a
logical number assigned to a counter, and event_idx, which is an encoded
number representing the HARDWARE/SOFTWARE event to be monitored.

The SBI PMU event_idx is an XLEN-bit wide number encoded as follows:
event_idx[XLEN-1:16] = info
event_idx[15:12] = type
event_idx[11:0] = code
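
As a rough sketch of the layout above (12-bit code, 4-bit type, remaining bits for info), packing and unpacking could look like the following C helpers. The helper and macro names are hypothetical, and `unsigned long` is used here as a stand-in for an XLEN-wide value.

```c
#include <assert.h>

/* Field widths taken from the v2 layout: info | type[15:12] | code[11:0]. */
#define SBI_PMU_EVENT_CODE_BITS  12u
#define SBI_PMU_EVENT_TYPE_BITS  4u
#define SBI_PMU_EVENT_CODE_MASK  ((1ul << SBI_PMU_EVENT_CODE_BITS) - 1)
#define SBI_PMU_EVENT_TYPE_MASK  ((1ul << SBI_PMU_EVENT_TYPE_BITS) - 1)
#define SBI_PMU_EVENT_INFO_SHIFT (SBI_PMU_EVENT_CODE_BITS + SBI_PMU_EVENT_TYPE_BITS)

/* Pack the three fields into one event_idx value. */
static inline unsigned long sbi_pmu_event_idx(unsigned long info,
                                              unsigned type, unsigned code)
{
    assert(type <= SBI_PMU_EVENT_TYPE_MASK);
    assert(code <= SBI_PMU_EVENT_CODE_MASK);
    return (info << SBI_PMU_EVENT_INFO_SHIFT) |
           ((unsigned long)type << SBI_PMU_EVENT_CODE_BITS) |
           (unsigned long)code;
}

/* Extract event_idx.type (bits 15:12). */
static inline unsigned sbi_pmu_event_type(unsigned long idx)
{
    return (unsigned)((idx >> SBI_PMU_EVENT_CODE_BITS) & SBI_PMU_EVENT_TYPE_MASK);
}

/* Extract event_idx.code (bits 11:0). */
static inline unsigned sbi_pmu_event_code(unsigned long idx)
{
    return (unsigned)(idx & SBI_PMU_EVENT_CODE_MASK);
}

/* Extract event_idx.info (bits XLEN-1:16). */
static inline unsigned long sbi_pmu_event_info(unsigned long idx)
{
    return idx >> SBI_PMU_EVENT_INFO_SHIFT;
}
```

The low 16 bits (type plus code) are the part that an Event_List entry would carry.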

If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE event,
the event_idx.info is optional and can be passed zero whereas the
event_idx.code can be one of the following values:
enum sbi_pmu_hw_id {
    SBI_PMU_HW_CPU_CYCLES              = 0,
    SBI_PMU_HW_INSTRUCTIONS            = 1,
    SBI_PMU_HW_CACHE_REFERENCES        = 2,
    SBI_PMU_HW_CACHE_MISSES            = 3,
    SBI_PMU_HW_BRANCH_INSTRUCTIONS     = 4,
    SBI_PMU_HW_BRANCH_MISSES           = 5,
    SBI_PMU_HW_BUS_CYCLES              = 6,
    SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
    SBI_PMU_HW_STALLED_CYCLES_BACKEND  = 8,
    SBI_PMU_HW_REF_CPU_CYCLES          = 9,
    SBI_PMU_HW_MAX,                    /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE
CACHE event, the event_idx.info is optional and can be passed zero
whereas the event_idx.code is encoded as follows:
event_idx.code[11:3] = cache_id
event_idx.code[2:1] = op_id
event_idx.code[0:0] = result_id
enum sbi_pmu_hw_cache_id {
    SBI_PMU_HW_CACHE_L1D  = 0,
    SBI_PMU_HW_CACHE_L1I  = 1,
    SBI_PMU_HW_CACHE_LL   = 2,
    SBI_PMU_HW_CACHE_DTLB = 3,
    SBI_PMU_HW_CACHE_ITLB = 4,
    SBI_PMU_HW_CACHE_BPU  = 5,
    SBI_PMU_HW_CACHE_NODE = 6,
    SBI_PMU_HW_CACHE_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_id {
    SBI_PMU_HW_CACHE_OP_READ     = 0,
    SBI_PMU_HW_CACHE_OP_WRITE    = 1,
    SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
    SBI_PMU_HW_CACHE_OP_MAX,     /* non-ABI */
};
enum sbi_pmu_hw_cache_op_result_id {
    SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
    SBI_PMU_HW_CACHE_RESULT_MISS   = 1,
    SBI_PMU_HW_CACHE_RESULT_MAX,   /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
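
To make the HARDWARE CACHE code layout concrete, here is a small C sketch that packs cache_id, op_id and result_id into the 12-bit event_idx.code as described above. The function names are hypothetical; only the bit positions come from the proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Pack code[11:3] = cache_id, code[2:1] = op_id, code[0] = result_id. */
static inline uint16_t sbi_pmu_hw_cache_code(unsigned cache_id,
                                             unsigned op_id,
                                             unsigned result_id)
{
    assert(cache_id < (1u << 9));   /* 9 bits available */
    assert(op_id < (1u << 2));      /* 2 bits available */
    assert(result_id < (1u << 1));  /* 1 bit available */
    return (uint16_t)((cache_id << 3) | (op_id << 1) | result_id);
}

/* Field extractors for the same layout. */
static inline unsigned sbi_pmu_hw_cache_id(uint16_t code)
{
    return (code >> 3) & 0x1ffu;
}

static inline unsigned sbi_pmu_hw_cache_op(uint16_t code)
{
    return (code >> 1) & 0x3u;
}

static inline unsigned sbi_pmu_hw_cache_result(uint16_t code)
{
    return code & 0x1u;
}
```

For example, a DTLB (3) write (1) access (0) event would have code (3 << 3) | (1 << 1) | 0 = 26.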

If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE RAW
event, both event_idx.info and event_idx.code are platform dependent.

If event_idx.type == 0xf then it is SOFTWARE event. For SOFTWARE event,
event_idx.info is SBI implementation specific and event_idx.code can be
one of the following:
enum sbi_pmu_sw_id {
    SBI_PMU_SW_MISALIGNED_LOAD        = 0,
    SBI_PMU_SW_MISALIGNED_STORE       = 1,
    SBI_PMU_SW_ILLEGAL_INSN           = 2,
    SBI_PMU_SW_LOCAL_SET_TIMER        = 3,
    SBI_PMU_SW_LOCAL_IPI              = 4,
    SBI_PMU_SW_LOCAL_FENCE_I          = 5,
    SBI_PMU_SW_LOCAL_SFENCE_VMA       = 6,
    SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID  = 7,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA      = 8,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA      = 10,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
    SBI_PMU_SW_MAX,                   /* non-ABI */
};

In future, more events can be defined without breaking ABI compatibility
of SBI calls.

Using definition of counter_idx and event_idx, we can potentially have
the following SBI calls:

1. SBI_PMU_NUM_COUNTERS
   This call will return the number of COUNTERs
2. SBI_PMU_COUNTER_DESCRIBE
   This call takes two parameters: 1) counter_idx 2) physical address
   It will write the description of SBI PMU counter at specified physical
   address. The details of the SBI PMU counter written at specified
   physical address are as follows:
   1. Name (64 bytes)
   2. CSR_Number (2 bytes)
      (CSR_Number <= 0xfff means counter is a RISC-V CSR)
      (CSR_Number > 0xfff means counter is a SBI implementation counter)
      (E.g. CSR_Number == 0xC02 implies the INSTRET CSR; 0xC03 implies HPMCOUNTER3)
   3. CSR_Width (2 bytes)
      (Number of CSR bits implemented in HW)
   4. Event_Count (2 bytes)
      (Number of events in Event_List array)
   5. Event_List (2 * Event_Count bytes)
      (This is an array of 16bit values where each 16bit value is the
       supported event_idx.type and event_idx.code combination)
3. SBI_PMU_COUNTER_SET_PHYS_ADDR
   This call takes two parameters: 1) counter_idx 2) physical address
   It will set the physical address of memory location where the SBI
   implementation will write the 64bit SOFTWARE counter. This SBI call
   is only for counters not mapped to any CSR (i.e. only for counters
   with CSR_Number > 0xfff).
4. SBI_PMU_COUNTER_START
   This call takes two parameters: 1) counter_idx 2) event_idx
   It will inform SBI implementation to configure and start/enable
   specified counter on the calling HART to monitor specific event.
   This SBI call will fail if the counter is not present or if the
   specified event_idx is not supported by the counter.
5. SBI_PMU_COUNTER_STOP
   This call takes one parameter: 1) counter_idx
   It will inform SBI implementation to stop/disable specified counters
   on the calling HART. This SBI call will fail for counters which are
   not present.
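
As a sketch of how a driver might consume the SBI_PMU_COUNTER_DESCRIBE output, the C code below parses the byte layout listed above (64-byte name, then three 2-byte fields, then the Event_List) and checks whether a counter supports a given type+code pair. The proposal does not state a byte order, so little-endian is an assumption here, and all identifiers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets derived from the field sizes listed in the proposal. */
enum {
    PMU_DESC_NAME_LEN    = 64,
    PMU_DESC_CSR_NUM_OFF = 64,
    PMU_DESC_WIDTH_OFF   = 66,
    PMU_DESC_COUNT_OFF   = 68,
    PMU_DESC_EVENTS_OFF  = 70
};

/* Parsed view of one counter description; pointers alias the caller's buffer. */
struct pmu_counter_desc {
    const char    *name;        /* up to 64 bytes, possibly not NUL-terminated */
    uint16_t       csr_number;
    uint16_t       csr_width;
    uint16_t       event_count;
    const uint8_t *event_list;  /* event_count entries of 2 bytes each */
};

/* Read a 16-bit little-endian value (byte order is an assumption). */
static uint16_t pmu_rd16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((unsigned)p[1] << 8));
}

static struct pmu_counter_desc pmu_desc_parse(const uint8_t *buf)
{
    struct pmu_counter_desc d;

    assert(buf != NULL);
    d.name        = (const char *)buf;
    d.csr_number  = pmu_rd16le(buf + PMU_DESC_CSR_NUM_OFF);
    d.csr_width   = pmu_rd16le(buf + PMU_DESC_WIDTH_OFF);
    d.event_count = pmu_rd16le(buf + PMU_DESC_COUNT_OFF);
    d.event_list  = buf + PMU_DESC_EVENTS_OFF;
    return d;
}

/* CSR_Number <= 0xfff means the counter is a RISC-V CSR. */
static int pmu_desc_is_csr(const struct pmu_counter_desc *d)
{
    return d->csr_number <= 0xfff;
}

/* Linear search of Event_List for a 16-bit type+code combination. */
static int pmu_desc_supports(const struct pmu_counter_desc *d, uint16_t type_code)
{
    size_t i;

    for (i = 0; i < d->event_count; i++)
        if (pmu_rd16le(d->event_list + 2 * i) == type_code)
            return 1;
    return 0;
}
```

A driver could run this once per counter_idx at boot and cache the result, then use `pmu_desc_supports()` when picking a counter for a new event.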

From the above, the RISC-V PMU driver will use most of the SBI calls at boot
time. Only SBI_PMU_COUNTER_START needs to be called, once, before using a
counter. A counter is read by reading its CSR (for CSR_Number <= 0xfff) or by
reading its memory location (for CSR_Number > 0xfff). Counter overflow
handling will have to be done in software by the Linux kernel.
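
One common way to handle overflow in software is to compute deltas between successive reads modulo the implemented counter width (CSR_Width from the DESCRIBE call). The sketch below shows that idea in C; it is not taken from the proposal, the function names are hypothetical, and it is only correct if the counter wraps at most once between two reads.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the implemented bits of a counter that is `width` bits wide. */
static uint64_t pmu_counter_mask(unsigned width)
{
    assert(width >= 1 && width <= 64);
    return width == 64 ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
}

/*
 * Number of events between two raw reads of a `width`-bit counter.
 * Unsigned subtraction followed by masking gives the right answer
 * even when the counter wrapped once between `prev` and `now`.
 */
static uint64_t pmu_counter_delta(uint64_t prev, uint64_t now, unsigned width)
{
    return (now - prev) & pmu_counter_mask(width);
}
```

The kernel would accumulate these deltas into a 64-bit software total, sampling often enough that a second wrap cannot occur between reads.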

Using the SBI PMU extension, the M-mode runtime firmware (or Hypervisors)
can provide a standardized view of HARDWARE/SOFTWARE counters and events
to S-mode (or VS-mode) software.

The M-mode runtime firmware (OpenSBI) will need to know following
platform dependent information:
1. Possible event_idx values allowed (or supported) by a HARDWARE
   counter (i.e. HPMCOUNTER)
2. Mapping of event_idx for HARDWARE event to HPMEVENT CSR value
3. Mapping of event_idx for HARDWARE CACHE event to HPMEVENT CSR value
4. Mapping of event_idx for HARDWARE RAW event to HPMEVENT CSR value
5. Additional platform-specific programming required by any event_idx

All platform-dependent information mentioned above can be obtained
by M-mode runtime firmware (OpenSBI) from platform-specific code. The
DT/ACPI can also be used to describe 1), 2), 3), and 4) mentioned
above, but 5) will always require platform-specific code.

Regards,
Anup


Re: Proposal v2: SBI PMU Extension

Greg Favor
 

Anup,

Can I request that the default code path for turning an event_idx value into a value to write into mhpmeventX be to simply write event_idx[XLEN-1:0] into mhpmeventX?

Future hardware can then (optionally) choose to implement their mhpmeventX's with this standardized format and avoid the need to provide implementation-specific code for translating event_idx to the value to write into the implementation's mhpmeventX's.  Other past and future implementations can of course do whatever they want and provide a custom piece of "translation" code for use within OpenSBI.

In essence, this SBI PMU extension serves to standardize the low twenty bits (event_idx.type + event_idx.code) of mhpmeventX in a way that meshes fully and cleanly with what Linux perf currently supports (which is great), while type==RAW (and the event_idx.info field) provides support for whatever else an implementation wants to do with all the other bits of mhpmeventX.

Greg

On Thu, Jul 9, 2020 at 12:57 AM Anup Patel <Anup.Patel@...> wrote:

Based on my previous reply…

 

To monitor RAW event 0x12345678, the user-space perf tool will create a user-space perf RAW event (i.e. perf_event_attr.type == 4 and perf_event_attr.config == 0x12345678). The Linux RISC-V PMU driver will allocate and map a matching HARDWARE counter which supports the corresponding SBI RAW event (event_idx.type = 2, event_idx.code = 0x678 and event_idx.info = 0x12345). Finally, the SBI_PMU_COUNTER_START call implemented by OpenSBI will write 0x12345678 (or some platform-specific translated value of 0x12345678) to the appropriate mhpmeventX CSR.

 

(Note: above we assume mhpmcounterX supports monitoring RAW event 0x12345678 and OpenSBI is aware of this)

 

Regards,

Anup



Re: Proposal v2: SBI PMU Extension

Anup Patel
 

Hi Greg,

 

My bad for the confusion.

 

Yes, I had agreed to a 16-bit event_idx.code and 4-bit event_idx.type. I chose a 12-bit event_idx.code and 4-bit event_idx.type so that we need only 2 bytes per event in the Event_List of the SBI_PMU_COUNTER_DESCRIBE call.

 

If you go for 16bits event_idx.code and 4bits event_idx.type then we need 4bytes (uint32_t) per-event in Event_List of SBI_PMU_COUNTER_DESCRIBE call and we only have 12bits for event_idx.info on RV32.
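
The trade-off between the two layouts can be checked with a little arithmetic. The C sketch below computes the bits left for event_idx.info and the size of one Event_List entry, assuming entries are rounded up to a power-of-two number of bytes (that rounding is an assumption consistent with the uint32_t mentioned above). The function names are hypothetical.

```c
#include <assert.h>

/* Bits remaining for event_idx.info once type and code are placed. */
static unsigned pmu_info_bits(unsigned xlen, unsigned type_bits, unsigned code_bits)
{
    assert(type_bits + code_bits <= xlen);
    return xlen - type_bits - code_bits;
}

/* Smallest power-of-two byte count that holds type+code for one list entry. */
static unsigned pmu_event_list_entry_bytes(unsigned type_bits, unsigned code_bits)
{
    unsigned bytes = 1;

    while (bytes * 8 < type_bits + code_bits)
        bytes *= 2;
    return bytes;
}
```

With a 4-bit type, a 12-bit code leaves 16 info bits on RV32 and needs 2 bytes per entry, while a 16-bit code leaves 12 info bits on RV32 and needs 4 bytes per entry.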

 

Do you still suggest using a 16-bit event_idx.code and 4-bit event_idx.type?

 

Regards,

Anup

 

From: Greg Favor <gfavor@...>
Sent: 10 July 2020 04:31
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Andrew Waterman <andrew@...>
Subject: Re: Proposal v2: SBI PMU Extension

 

Anup,

 

I missed a point in the v1 email thread where I was requesting for event_idx.code to be 16 bits along with a 4-bit event_idx.type (which you seemed to agree with), and then you asked whether it would be fine to have event_idx be XLEN bits wide - and I agreed.  But I didn't notice in your attached example that event_idx.code was still 12 bits.  It would be great if this is acceptable:

 

event_idx[XLEN-1:20] = info

event_idx[19:16] = type

event_idx[15:0] = code

 

Greg

 

 



Re: Proposal v2: SBI PMU Extension

Anup Patel
 

Hi Greg,

 

We have already considered what you are requesting. In fact, that is the right way to go.

 

Translating event_idx[XLEN-1:0] to a platform-specific mhpmeventX CSR value will be optional in OpenSBI (M-mode runtime firmware). If a platform does not want to translate the event_idx value then we will write event_idx as-is into the mhpmeventX CSR.
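
A minimal sketch of this optional translation, assuming it is exposed as a per-platform callback (the structure and function names below are hypothetical and are not OpenSBI's actual API): when no callback is provided, the event_idx value is used unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical platform hook: maps an event_idx to an mhpmeventX value. */
typedef unsigned long (*pmu_xlate_fn)(unsigned long event_idx);

struct pmu_platform_ops {
    pmu_xlate_fn event_to_mhpmevent;  /* NULL means "no translation" */
};

/* Value the firmware would write to mhpmeventX for a given event_idx. */
static unsigned long pmu_mhpmevent_value(const struct pmu_platform_ops *ops,
                                         unsigned long event_idx)
{
    assert(ops != NULL);
    if (ops->event_to_mhpmevent)
        return ops->event_to_mhpmevent(event_idx);
    return event_idx;  /* default path: write event_idx as-is */
}

/* Made-up vendor encoding for illustration: keep only the low 8 bits. */
static unsigned long example_vendor_xlate(unsigned long event_idx)
{
    return event_idx & 0xfful;
}
```

Hardware that adopts the standardized event_idx format in its mhpmeventX registers would simply leave the callback unset.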

 

Regarding total number of bits used by event_idx.type and event_idx.code, I am fine with both 16bits and 20bits approaches. If there is no further objection for using 20bits approach then I will go with that.

 

Regards,

Anup

 

From: Greg Favor <gfavor@...>
Sent: 10 July 2020 04:48
To: Anup Patel <Anup.Patel@...>
Cc: Brian Grayson <brian.grayson@...>; Zong Li <zong.li@...>; Atish Patra <Atish.Patra@...>; andrew@...; tech-unixplatformspec@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension

 

Anup,

 

Can I request that the default code path for turning an event_idx value into a value to write into mhpmeventX be to simply write event_idx[XLEN-1:0] into mhpmeventX?

 

Future hardware can then (optionally) choose to implement their mhpmeventX's with this standardized format and avoid the need to provide implementation-specific code for translating event_idx to the value to write into the implementation's mhpmeventX's.  Other past and future implementations can of course do whatever they want and provide a custom piece of "translation" code for use within OpenSBI.

 

In essence, this SBI PMU extension serves to standardize the low twenty bits (event_idx.type + event_idx.code) of mhpmeventX in a way that meshes fully and cleanly with what Linux perf currently supports (which is great), while type==RAW (and the event_idx.info field) provides support for whatever else an implementation wants to do with all the other bits of mhpmeventX.

 

Greg

 


 


Re: Proposal v2: SBI PMU Extension

Greg Favor
 

On Thu, Jul 9, 2020 at 9:02 PM Anup Patel <Anup.Patel@...> wrote:

Translating event_idx[XLEN-1:0] to a platform-specific mhpmeventX CSR value will be optional in OpenSBI (M-mode runtime firmware). If a platform does not want to translate the event_idx value then we will write event_idx as-is into the mhpmeventX CSR.


Great.
 

Regarding total number of bits used by event_idx.type and event_idx.code, I am fine with both 16bits and 20bits approaches. If there is no further objection for using 20bits approach then I will go with that.


Yes, that would also be good.

Thanks,
Greg
 

 

Regards,

Anup

 


Re: Proposal v2: SBI PMU Extension

Greg Favor
 

On Thu, Jul 9, 2020 at 8:52 PM Anup Patel <Anup.Patel@...> wrote:

Hi Greg,

 

My bad for the confusion.

 

Yes, I had agreed to a 16-bit event_idx.code and 4-bit event_idx.type. I chose a 12-bit event_idx.code and 4-bit event_idx.type so that we need only 2 bytes per event in the Event_List of the SBI_PMU_COUNTER_DESCRIBE call.

 

If you go for 16bits event_idx.code and 4bits event_idx.type then we need 4bytes (uint32_t) per-event in Event_List of SBI_PMU_COUNTER_DESCRIBE call and we only have 12bits for event_idx.info on RV32.

 

Do you still suggest using a 16-bit event_idx.code and 4-bit event_idx.type?


I feel that RV32 designs are going to be more embedded oriented and more cost-conscious - such that they are not going to have all sorts of fancy features that would require 12 bits (plus unused event_idx.code bits), let alone even more bits.  Even in our RV64 design we were fitting everything into the low 32 bits (until adding space for the event_idx.type field pushed that over by 4 bits).

Greg

 

Regards,

Anup

 

From: Greg Favor <gfavor@...>
Sent: 10 July 2020 04:31
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Andrew Waterman <andrew@...>
Subject: Re: Proposal v2: SBI PMU Extension

 

Anup,

 

I missed a point in the v1 email thread where I was requesting for event_idx.code to be 16 bits along with a 4-bit event_idx.type (which you seemed to agree with), and then you asked whether it would be fine to have event_idx be XLEN bits wide - and I agreed.  But I didn't notice in your attached example that event_idx.code was still 12 bits.  It would be great if this is acceptable:

 

event_idx[XLEN-1:20] = info

event_idx[19:16] = type

event_idx[15:0] = code
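
A minimal C sketch of this layout, assuming RV64 so that a uint64_t can stand in for an XLEN-wide value. The macro and function names are illustrative only and are not part of the proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths of the layout above: code = bits [15:0],
 * type = bits [19:16], info = everything above bit 19. */
#define EVENT_CODE_BITS  16u
#define EVENT_TYPE_BITS  4u
#define EVENT_INFO_SHIFT (EVENT_CODE_BITS + EVENT_TYPE_BITS)

/* Assumption: XLEN == 64, so event_idx fits a uint64_t. */
static uint64_t event_idx_pack(uint64_t info, unsigned type, unsigned code)
{
    assert(type < (1u << EVENT_TYPE_BITS));
    assert(code < (1u << EVENT_CODE_BITS));
    assert(info < (UINT64_C(1) << (64 - EVENT_INFO_SHIFT)));
    return (info << EVENT_INFO_SHIFT) |
           ((uint64_t)type << EVENT_CODE_BITS) |
           (uint64_t)code;
}

static unsigned event_idx_code(uint64_t idx)
{
    return (unsigned)(idx & ((1u << EVENT_CODE_BITS) - 1));
}

static unsigned event_idx_type(uint64_t idx)
{
    return (unsigned)((idx >> EVENT_CODE_BITS) & ((1u << EVENT_TYPE_BITS) - 1));
}

static uint64_t event_idx_info(uint64_t idx)
{
    return idx >> EVENT_INFO_SHIFT;
}
```

With these helpers, type 0x2 with code 0x4567 and info 0x123 packs to 0x12324567, and the three accessors recover the original fields.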

 

Greg

 

 

On Sun, Jul 5, 2020 at 9:35 AM Anup Patel <Anup.Patel@...> wrote:

Hi All,

We don't have a dedicated RISC-V PMU extension but we do have HARDWARE
performance counters such as CYCLE CSR, INSTRET CSR, and HPMCOUNTER
CSRs. A RISC-V implementation can support monitoring various HARDWARE
events using limited number of HPMCOUNTER CSRs.

In addition to HARDWARE performance counters, a SBI implementation
(e.g. OpenSBI, Xvisor, KVM, etc) can provide SOFTWARE counters for
events such as number of RFENCEs, number of IPIs, number of misaligned
load/store instructions, number of illegal instructions, etc.

We propose SBI PMU extension which tries to cover CYCLE CSR, INSTRET CSR,
HPMCOUNTER CSRs and SOFTWARE counters provided by SBI implementation.

To define SBI PMU extension, we first define counter_idx which is a
logical number assigned to a counter and event_idx which is an encoded
number representing the HARDWARE/SOFTWARE event to be monitored.

The SBI PMU event_idx is a XLEN bits wide number encoded as follows:
event_idx[XLEN-1:16] = info
event_idx[15:12] = type
event_idx[11:0] = code

If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE event,
the event_idx.info is optional and can be passed zero whereas the
event_idx.code can be one of the following values:
enum sbi_pmu_hw_id {
    SBI_PMU_HW_CPU_CYCLES              = 0,
    SBI_PMU_HW_INSTRUCTIONS            = 1,
    SBI_PMU_HW_CACHE_REFERENCES        = 2,
    SBI_PMU_HW_CACHE_MISSES            = 3,
    SBI_PMU_HW_BRANCH_INSTRUCTIONS     = 4,
    SBI_PMU_HW_BRANCH_MISSES           = 5,
    SBI_PMU_HW_BUS_CYCLES              = 6,
    SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
    SBI_PMU_HW_STALLED_CYCLES_BACKEND  = 8,
    SBI_PMU_HW_REF_CPU_CYCLES          = 9,
    SBI_PMU_HW_MAX,                    /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE
CACHE event, the event_idx.info is optional and can be passed zero
whereas the event_idx.code is encoded as follows:
event_idx.code[11:3] = cache_id
event_idx.code[2:1] = op_id
event_idx.code[0:0] = result_id
enum sbi_pmu_hw_cache_id {
    SBI_PMU_HW_CACHE_L1D  = 0,
    SBI_PMU_HW_CACHE_L1I  = 1,
    SBI_PMU_HW_CACHE_LL   = 2,
    SBI_PMU_HW_CACHE_DTLB = 3,
    SBI_PMU_HW_CACHE_ITLB = 4,
    SBI_PMU_HW_CACHE_BPU  = 5,
    SBI_PMU_HW_CACHE_NODE = 6,
    SBI_PMU_HW_CACHE_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_id {
    SBI_PMU_HW_CACHE_OP_READ     = 0,
    SBI_PMU_HW_CACHE_OP_WRITE    = 1,
    SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
    SBI_PMU_HW_CACHE_OP_MAX,     /* non-ABI */
};
enum sbi_pmu_hw_cache_op_result_id {
    SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
    SBI_PMU_HW_CACHE_RESULT_MISS   = 1,
    SBI_PMU_HW_CACHE_RESULT_MAX,   /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
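
As an illustration of the HARDWARE CACHE encoding above, the following C sketch builds the 12-bit event_idx.code from the three IDs and then prepends event_idx.type == 0x1. The function names are hypothetical, and it assumes the 12-bit code layout of this proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Builds the 12-bit event_idx.code of a HARDWARE CACHE event:
 * code[11:3] = cache_id, code[2:1] = op_id, code[0] = result_id. */
static uint16_t hw_cache_code(unsigned cache_id, unsigned op_id,
                              unsigned result_id)
{
    assert(cache_id < (1u << 9));
    assert(op_id < (1u << 2));
    assert(result_id < (1u << 1));
    return (uint16_t)((cache_id << 3) | (op_id << 1) | result_id);
}

/* Prepends event_idx.type == 0x1 in bits [15:12], giving the 16-bit
 * type+code value used by the 12-bit code layout. */
static uint16_t hw_cache_type_code(unsigned cache_id, unsigned op_id,
                                   unsigned result_id)
{
    return (uint16_t)((0x1u << 12) |
                      hw_cache_code(cache_id, op_id, result_id));
}
```

For example, an L1D (0) read (0) miss (1) gives code 0x001 and type+code 0x1001, while a DTLB (3) write (1) miss (1) gives code 0x01b.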

If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE RAW
event, both event_idx.info and event_idx.code are platform dependent.

If event_idx.type == 0xf then it is SOFTWARE event. For SOFTWARE event,
event_idx.info is SBI implementation specific and event_idx.code can be
one of the following:
enum sbi_pmu_sw_id {
    SBI_PMU_SW_MISALIGNED_LOAD        = 0,
    SBI_PMU_SW_MISALIGNED_STORE       = 1,
    SBI_PMU_SW_ILLEGAL_INSN           = 2,
    SBI_PMU_SW_LOCAL_SET_TIMER        = 3,
    SBI_PMU_SW_LOCAL_IPI              = 4,
    SBI_PMU_SW_LOCAL_FENCE_I          = 5,
    SBI_PMU_SW_LOCAL_SFENCE_VMA       = 6,
    SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID  = 7,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA      = 8,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA      = 10,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
    SBI_PMU_SW_MAX,                   /* non-ABI */
};

In future, more events can be defined without breaking ABI compatibility
of SBI calls.

Using definition of counter_idx and event_idx, we can potentially have
the following SBI calls:

1. SBI_PMU_NUM_COUNTERS
   This call will return the number of COUNTERs
2. SBI_PMU_COUNTER_DESCRIBE
   This call takes two parameters: 1) counter_idx 2) physical address
   It will write the description of SBI PMU counter at specified physical
   address. The details of the SBI PMU counter written at specified
   physical address are as follows:
   1. Name (64 bytes)
   2. CSR_Number (2 bytes)
      (CSR_Number <= 0xfff means counter is a RISC-V CSR)
      (CSR_Number > 0xfff means counter is a SBI implementation counter)
      (E.g. CSR_Number == 0xC02 implies the INSTRET CSR and
       CSR_Number == 0xC03 implies the HPMCOUNTER3 CSR)
   3. CSR_Width (2 bytes)
      (Number of CSR bits implemented in HW)
   4. Event_Count (2 bytes)
      (Number of events in Event_List array)
   5. Event_List (2 * Event_Count bytes)
      (This is an array of 16bit values where each 16bit value is the
       supported event_idx.type and event_idx.code combination)
3. SBI_PMU_COUNTER_SET_PHYS_ADDR
   This call takes two parameters: 1) counter_idx 2) physical address
   It will set the physical address of memory location where the SBI
   implementation will write the 64bit SOFTWARE counter. This SBI call
   is only for counters not mapped to any CSR (i.e. only for counters
   with CSR_Number > 0xfff).
4. SBI_PMU_COUNTER_START
   This call takes two parameters: 1) counter_idx 2) event_idx
   It will inform SBI implementation to configure and start/enable
   specified counter on the calling HART to monitor specific event.
   This SBI call will fail if the counter is not present or if the
   specified event_idx is not supported by the counter.
5. SBI_PMU_COUNTER_STOP
   This call takes one parameter: 1) counter_idx
   It will inform SBI implementation to stop/disable specified counters
   on the calling HART. This SBI call will fail for counters which are
   not present.
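
The counter description written by SBI_PMU_COUNTER_DESCRIBE could be viewed from C roughly as follows. This is only a sketch: the struct and function names are hypothetical, and it assumes the fields are laid out back to back in native byte order, which the proposal does not spell out. The Event_List array is taken to follow the 70-byte fixed part directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the fixed part of a counter description; field
 * order and sizes follow the list above (64 + 2 + 2 + 2 bytes). */
struct sbi_pmu_counter_hdr {
    char     name[64];
    uint16_t csr_number;  /* <= 0xfff: RISC-V CSR, > 0xfff: SBI counter */
    uint16_t csr_width;   /* number of CSR bits implemented in HW */
    uint16_t event_count; /* number of entries in Event_List */
};

_Static_assert(sizeof(struct sbi_pmu_counter_hdr) == 70,
               "fixed part must be 70 bytes with no padding");

static int counter_is_csr(const struct sbi_pmu_counter_hdr *hdr)
{
    return hdr->csr_number <= 0xfff;
}

/* Returns 1 if the 16-bit type+code value appears in Event_List. */
static int counter_supports_event(const uint16_t *event_list,
                                  uint16_t event_count, uint16_t type_code)
{
    assert(event_list != NULL || event_count == 0);
    for (uint16_t i = 0; i < event_count; i++)
        if (event_list[i] == type_code)
            return 1;
    return 0;
}
```

A driver could run counter_supports_event() over each counter's description at allocation time to find a counter that supports the requested type+code value.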

From above, the RISC-V PMU driver will use most of the SBI calls at boot
time. Only SBI_PMU_COUNTER_START needs to be called once before using a
counter. A counter is read by reading its CSR (for CSR_Number <= 0xfff) OR
by reading its memory location (for CSR_Number > 0xfff). The counter
overflow handling will have to be done in software by the Linux kernel.
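
One possible shape for the software overflow handling is sketched below in C. It is an assumption about how a driver might do it, not something specified by the proposal: the driver keeps the previous raw reading and computes the delta modulo 2^CSR_Width, which is correct as long as the counter wraps at most once between two reads. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Events counted between two raw reads of a counter that implements only
 * `width` bits (CSR_Width), assuming at most one wrap between the reads. */
static uint64_t counter_delta(uint64_t prev, uint64_t now, unsigned width)
{
    assert(width >= 1 && width <= 64);
    uint64_t mask = (width == 64) ? UINT64_MAX
                                  : ((UINT64_C(1) << width) - 1);
    return (now - prev) & mask;
}
```

For a 32-bit counter that moved from 0xfffffff0 to 0x10, the helper reports 0x20 events.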

Using the SBI PMU extension, the M-mode runtime firmware (or Hypervisors)
can provide a standardized view of HARDWARE/SOFTWARE counters and events
to S-mode (or VS-mode) software.

The M-mode runtime firmware (OpenSBI) will need to know following
platform dependent information:
1. Possible event_idx values allowed (or supported) by a HARDWARE
   counter (i.e. HPMCOUNTER)
2. Mapping of event_idx for HARDWARE event to HPMEVENT CSR value
3. Mapping of event_idx for HARDWARE CACHE event to HPMEVENT CSR value
4. Mapping of event_idx for HARDWARE RAW event to HPMEVENT CSR value
5. Additional platform-specific programming required by any event_idx

All platform dependent information mentioned above can be obtained
by M-mode runtime firmware (OpenSBI) from platform specific code. The
DT/ACPI can also be used to describe 1), 2), 3), and 4) mentioned
above but 5) will always require platform specific code.

Regards,
Anup


Re: Proposal v2: SBI PMU Extension

Zong Li
 

On Thu, Jul 9, 2020 at 6:21 PM Anup Patel <Anup.Patel@...> wrote:



-----Original Message-----
From: tech-unixplatformspec@... <tech-
unixplatformspec@...> On Behalf Of Zong Li
Sent: 09 July 2020 14:42
To: Anup Patel <Anup.Patel@...>
Cc: Brian Grayson <brian.grayson@...>; Atish Patra
<Atish.Patra@...>; andrew@...; tech-
unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU
Extension

On Thu, Jul 9, 2020 at 4:47 PM Anup Patel <Anup.Patel@...> wrote:



-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 09 July 2020 14:09
To: Anup Patel <Anup.Patel@...>
Cc: Brian Grayson <brian.grayson@...>; Atish Patra
<Atish.Patra@...>; andrew@...; tech-
unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU
Extension

On Thu, Jul 9, 2020 at 3:57 PM Anup Patel <Anup.Patel@...> wrote:

Based on my previous reply…



To monitor RAW event 0x12345678, user-space perf tool will create a
user-space perf RAW event (i.e. perf_event_attr.type == 4 and
perf_event_attr.config == 0x12345678). The Linux RISC-V PMU driver
will allocate and map a matching HARDWARE counter which supports the
corresponding SBI RAW event (event_idx.type = 2,
event_idx.code = 0x678 and event_idx.info = 0x12345). Finally, the
SBI_PMU_COUNTER_START call implemented by OpenSBI will write
0x12345678 (or some platform specific translated value of 0x12345678) to
the appropriate mhpmeventX CSR.



(Note: above we assume mhpmcounterX supports monitoring RAW event
0x12345678 and OpenSBI is aware of this)
The Linux PMU driver should be aware of this as well, because
SBI_PMU_COUNTER_START takes a parameter for counter_idx which is fed
by Linux PMU driver.
The SBI_PMU_COUNTER_DESCRIBE call is designed considering this aspect.
This call provides the list of event_idx supported by a given counter_idx
(including RAW events) so when allocating a counter in event_add() we
can easily find a matching counter.
Each element of Event_list consists of event_idx.type and event_idx.code,
so how does it represent a raw event like the one above (event_idx.info =
0x12345, event_idx.type = 0x2, event_idx.code = 0x678)? It seems to
conflict if we have the raw events 0xXXXXX678 and 0xYYYYY678.
The above break-up of 0x12345678 RAW event is only one of the possible
ways shown as example.

If we have two different RAW events 0xXXXXX678 and 0xYYYYY678 then
platform can show it as:
event_idx.type == 2 and event_idx.code == X and event_idx.info = 0xXXXXX678
event_idx.type == 2 and event_idx.code == Y and event_idx.info = 0xYYYYY678
OR
event_idx.type == 2 and event_idx.code == Z and event_idx.info = 0xXXXXX678
event_idx.type == 2 and event_idx.code == Z and event_idx.info = 0xYYYYY678

Further, the OpenSBI platform support can translate RAW event_idx into
final mhpmeventX CSR value using platform callback or describing it in DT/ACPI
for OpenSBI generic platform.
I understand that mapping of event_idx to HPMEVENT CSR value is
platform-dependent, and m-mode firmware of each platform knows how to
translate it. But some points about Event_list and event_idx are still
ambiguous. Could you please help to clarify the following question:

Do RAW events need to be mapped (i.e. "event_idx.type == 0x2,
event_idx.code == 0x1" corresponds to some HPMEVENT value)
OR do we just use raw data for event_idx (i.e. event_idx.info and
event_idx.code hold the raw data directly)? Each option has its own
issues, please see the cases as follows:
1) RAW events should be mapped like HW events and HW cache events:
- We assume event_idx.code == 0x1 corresponds to HPMEVENT
value 0x1234, and event_idx.code == 0x2 corresponds to HPMEVENT
value 0x5678, so in s-mode software, I expect that I would get the event
support list "Event_list" which has the elements 0x2001 and 0x2002.
When the user mode perf tool creates a RAW event 0x1234, Linux PMU
driver tries to find an appropriate counter by checking the event support
list "Event_list", but Linux PMU driver doesn't know the relationship
between 0x1234 and 0x2001, so it couldn't decide if this counter is good
for it.
2) Use RAW data of RAW event:
- The size of HPMEVENT CSR is MXLEN, so it could be 64-bits. The size of
event_idx.info + event_idx.code is only 59 bits, it doesn't seem
to be enough.
In addition, the event support list "Event_list" consists of
event_idx.type and event_idx.code; it doesn't include event_idx.info,
so there would be conflicts in Event_list if there are events which
have the same low 12 bits. For example, 0xXXXXX678 is 0x2678 in
Event_list, and 0xYYYYY678 is also 0x2678 in Event_list. When the user
mode perf tool creates a RAW event 0xXXXXX678, the Linux PMU driver
tries to find an appropriate counter by checking the event support
list "Event_list", but it can't decide if this counter is suitable,
because it doesn't know whether the 0x2678 in Event_list means
0xXXXXX678 or 0xYYYYY678.
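
The collision in case 2) can be reproduced with a few lines of C. The sketch assumes the Event_list entry of a RAW event is formed from type 0x2 and the low 12 bits of the raw value, as in the example above. The function names and the two sample raw values are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Event_list entry for a RAW event when the raw value is used directly:
 * type 0x2 in bits [15:12], low 12 bits of the raw value as the code. */
static uint16_t raw_event_list_entry(uint64_t raw)
{
    return (uint16_t)((0x2u << 12) | (raw & 0xfffu));
}

/* Two different raw values that share their low 12 bits produce the
 * same Event_list entry, so the list cannot tell them apart. */
static void show_collision(void)
{
    assert(raw_event_list_entry(0xAAAAA678u) ==
           raw_event_list_entry(0xBBBBB678u));
}
```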

If I understand the above correctly, we should come up with a new format
for Event_list and event_idx or provide a new SBI call to fix that.


In other words, we have to distinguish all HARDWARE and SOFTWARE
events of a platform using 16bit event_idx.type and event_idx.code. The
event_idx.info is only an additional parameter to an event.

Regards,
Anup



This means Linux RISC-V PMU only needs to deal with SBI calls for
handling all types of SBI PMU counters and events. In other words, no
need to parse DT/ACPI for event mappings.

Regards,
Anup





Regards,

Anup





From: Brian Grayson <brian.grayson@...>
Sent: 09 July 2020 12:35
To: Zong Li <zong.li@...>
Cc: Anup Patel <Anup.Patel@...>; Atish Patra
<Atish.Patra@...>; andrew@...;
tech-unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU
Extension



My question is, let's say I know that putting the value 0x12345678
into the mhpmevent3 register gets me the event I want, and there is no
support for that event in the SBI spec/API. Will this API allow me
to program such an event, basically bypassing the usual mapping
functionality? perf basically allows you to say "I know this event
number is not one you know about, but it's the value I want placed
directly into the hardware." I want to ensure that the full
capabilities of the hardware will still be accessible through the
SBI spec in some sort of "raw" mode, and I didn't see a way for that
to happen right now. We don't want to restrict users to the lowest
common denominator of functionality.



Brian





On Wed, Jul 8, 2020 at 10:27 PM Zong Li <zong.li@...> wrote:

On Thu, Jul 9, 2020 at 1:06 AM Brian Grayson <brian.grayson@...> wrote:

Would there be a raw style interface to access all the SBI-unaware
events, like perf's rNNN support?
Following up on this question: in our current proposal, s-mode software
only knows the event_idx, and m-mode firmware takes care of the mapping.
My question is that s-mode software doesn't seem to understand the
meaning of each event_idx; that is, it just gets the array of
all supported event_idx, but can't know which one is for what.
This also happens for u-mode programs: for the rNNN interface,
we would normally refer to the processor-specific documentation
for these details, and now users won't know what value
they should give. Please correct me if I miss something. Thanks.

How would this work on a multicore system -- would the SBI calls only
handle the current hart's counters? That seems easiest to deal with.

Brian


Re: Proposal v2: SBI PMU Extension

Zong Li
 

On Fri, Jul 10, 2020 at 11:52 AM Anup Patel <anup.patel@...> wrote:

Hi Greg,



My bad for the confusion.



Yes, I had agreed to 16bits event_idx.code and 4bits event_idx.type. I chose 12bits event_idx.code and 4bits event_idx.type so that we need only 2bytes per-event in Event_List of SBI_PMU_COUNTER_DESCRIBE call.



If you go for 16bits event_idx.code and 4bits event_idx.type then we need 4bytes (uint32_t) per-event in Event_List of SBI_PMU_COUNTER_DESCRIBE call and we only have 12bits for event_idx.info on RV32.
It seems to me that using 4bytes per-event for only 20 bits of data
would waste space; it is heavy, especially when we have many events.




Do you still suggest using 16bits event_idx.code and 4bits event_idx.type ??



Regards,

Anup





Re: Proposal v2: SBI PMU Extension

Anup Patel
 

-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 10 July 2020 12:24
To: Anup Patel <Anup.Patel@...>
Cc: Brian Grayson <brian.grayson@...>; Atish Patra
<Atish.Patra@...>; andrew@...; tech-
unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU
Extension

On Thu, Jul 9, 2020 at 6:21 PM Anup Patel <Anup.Patel@...> wrote:



-----Original Message-----
From: tech-unixplatformspec@... <tech-
unixplatformspec@...> On Behalf Of Zong Li
Sent: 09 July 2020 14:42
To: Anup Patel <Anup.Patel@...>
Cc: Brian Grayson <brian.grayson@...>; Atish Patra
<Atish.Patra@...>; andrew@...; tech-
unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU
Extension

On Thu, Jul 9, 2020 at 4:47 PM Anup Patel <Anup.Patel@...> wrote:



-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 09 July 2020 14:09
To: Anup Patel <Anup.Patel@...>
Cc: Brian Grayson <brian.grayson@...>; Atish Patra
<Atish.Patra@...>; andrew@...; tech-
unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI
PMU Extension

On Thu, Jul 9, 2020 at 3:57 PM Anup Patel <Anup.Patel@...>
wrote:

Based on my previous reply…



To monitor RAW event 0x12345678, user-space perf tool will create a
user-space perf RAW event (i.e. perf_event_attr.type == 4 and
perf_event_attr.config == 0x12345678). The Linux RISC-V PMU
driver will allocate and map a matching HARDWARE counter which
supports the corresponding SBI RAW event (event_idx.type = 2,
event_idx.code = 0x678 and event_idx.info = 0x12345).
Finally, the SBI_PMU_COUNTER_START call implemented by OpenSBI
will write 0x12345678 (or some platform specific translated value of
0x12345678) to the appropriate mhpmeventX CSR.



(Note: above we assume mhpmcounterX supports monitoring RAW event
0x12345678 and OpenSBI is aware of this)
The Linux PMU driver should be aware of this as well, because
SBI_PMU_COUNTER_START takes a parameter for counter_idx which is fed
by Linux PMU driver.
The SBI_PMU_COUNTER_DESCRIBE call is designed considering this aspect.
This call provides the list of event_idx supported by a given
counter_idx (including RAW events) so when allocating a counter in
event_add() we can easily find a matching counter.
Each element of Event_list consists of event_idx.type and
event_idx.code, so how does it represent a raw event like the one above
(event_idx.info = 0x12345, event_idx.type = 0x2, event_idx.code =
0x678)? It seems to conflict if we have the raw events 0xXXXXX678 and
0xYYYYY678.

The above break-up of 0x12345678 RAW event is only one of the possible
ways shown as example.

If we have two different RAW events 0xXXXXX678 and 0xYYYYY678 then
platform can show it as:
event_idx.type == 2 and event_idx.code == X and event_idx.info = 0xXXXXX678
event_idx.type == 2 and event_idx.code == Y and event_idx.info = 0xYYYYY678
OR
event_idx.type == 2 and event_idx.code == Z and event_idx.info = 0xXXXXX678
event_idx.type == 2 and event_idx.code == Z and event_idx.info = 0xYYYYY678

Further, the OpenSBI platform support can translate RAW event_idx into
final mhpmeventX CSR value using platform callback or describing it in
DT/ACPI for OpenSBI generic platform.
I understand that mapping of event_idx to HPMEVENT CSR value is platform-
dependent, and m-mode firmware of each platform knows how to translate
it. But there are still some problems which are ambiguous about Event_list
and event_idx. Could you please help to clarify the question as
follows:
Okay, let me clarify your queries related to RAW events.

As mentioned previously, all events (including RAW events) will be
uniquely identified by event_idx.type and event_idx.code only. The
event_idx.info does not identify an event; rather, event_idx.info is
only an additional parameter.

It is not mandatory to assign event_idx.code for RAW events based
on actual value programmed in hpmevent CSR. A platform can always
assign pseudo event_idx.code for RAW events and translate it to
final hpmevent CSR value.

For RAW events, the Linux RISC-V PMU driver will create event_idx as
follows:
event_idx.info = perf_event_attr.config[59:12]
event_idx.type = 2
event_idx.code = perf_event_attr.config[11:0]
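
A C sketch of this split, assuming RV64 and the 12-bit code layout (info in bits [63:16], type in bits [15:12], code in bits [11:0]). The macro and function names are hypothetical. Because event_idx has room for only 60 bits of config, the sketch rejects a config value that uses bits [63:60].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SBI_PMU_EVENT_TYPE_RAW 0x2u /* event_idx.type for HARDWARE RAW */

/* Returns 0 and stores the event_idx on success, or -1 when config uses
 * bits [63:60], which have no room in a 64-bit event_idx. */
static int raw_config_to_event_idx(uint64_t config, uint64_t *event_idx)
{
    assert(event_idx != NULL);
    if (config >> 60)
        return -1;
    uint64_t code = config & 0xfffu; /* perf_event_attr.config[11:0]  */
    uint64_t info = config >> 12;    /* perf_event_attr.config[59:12] */
    *event_idx = (info << 16) |
                 ((uint64_t)SBI_PMU_EVENT_TYPE_RAW << 12) |
                 code;
    return 0;
}
```

For perf_event_attr.config == 0x12345678 this yields info 0x12345, type 2 and code 0x678, that is event_idx == 0x123452678.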

Currently, we are encoding event_info in event_idx along with
event_idx.type and event_idx.code so most significant 4bits of
perf_event_attr.config cannot be accommodated in event_idx.


Do RAW events need to be mapped (i.e. "event_idx.type == 0x2,
event_idx.code == 0x1" corresponds to some HPMEVENT value) OR do we just
use raw data for event_idx (i.e. event_idx.info and event_idx.code hold the
raw data directly)? Each option has its own issues, please see the
cases as follows:
1) RAW events should be mapped like HW events and HW cache events:
- We assume event_idx.code == 0x1 corresponds to HPMEVENT
value 0x1234, and event_idx.code == 0x2 corresponds to HPMEVENT
value 0x5678, so in s-mode software, I expect that I would get the event
support list "Event_list" which has the elements 0x2001 and 0x2002.
When the user mode perf tool creates a RAW event 0x1234, Linux PMU
driver tries to find an appropriate counter by checking the event support
list "Event_list", but Linux PMU driver doesn't know the relationship
between 0x1234 and 0x2001, so it couldn't decide if this counter is good
for it.
It is not possible to treat RAW events exactly like HW events and HW cache
events because event_idx.code and event_idx.info will come from user-space.

Taking your example of event_idx.code == 0x1 for HPMEVENT value 0x1234
and event_idx.code == 0x2 for HPMEVENT value 0x5678: this means
OpenSBI will translate 0x2001 to 0x1234 and 0x2002 to 0x5678 because
the platform has chosen pseudo event_idx.code values for RAW events.
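
Such a translation could be a simple platform table in the firmware. The C sketch below is hypothetical and is not OpenSBI code: it uses only the two example pairs from this discussion, and the type and function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row per pseudo RAW event: 16-bit type+code -> HPMEVENT CSR value. */
struct raw_event_map {
    uint16_t type_code;
    uint64_t hpmevent;
};

/* The two example pairs discussed above. */
static const struct raw_event_map raw_event_table[] = {
    { 0x2001, 0x1234 },
    { 0x2002, 0x5678 },
};

/* Returns 0 and stores the HPMEVENT value, or -1 for an unknown event. */
static int translate_raw_event(uint16_t type_code, uint64_t *hpmevent)
{
    assert(hpmevent != NULL);
    size_t n = sizeof(raw_event_table) / sizeof(raw_event_table[0]);
    for (size_t i = 0; i < n; i++) {
        if (raw_event_table[i].type_code == type_code) {
            *hpmevent = raw_event_table[i].hpmevent;
            return 0;
        }
    }
    return -1;
}
```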

The platform vendor will have to provide SW documentation of the
chosen pseudo values 0x2001 and 0x2002 and mention that users
will need to use values 0x2001 and 0x2002 for trying RAW events
using perf tools.

In fact, documenting and translating RAW event_idx values is totally
a platform's responsibility.

My suggestion of deriving event_idx.code and event_idx.info values
from actual HPMEVENT CSR value of RAW events only helps simplify
documentation for platform vendors.

Clearly based on above Linux RISC-V PMU driver does not need to know
the final values being programmed in HPMEVENT CSRs.

2) Use RAW data of RAW event:
- The size of HPMEVENT CSR is MXLEN, so it could be 64-bits. The size of
event_idx.info + event_idx.code is only 59 bits, it doesn't seem to be
enough.
On RV64, event_idx.info + event_idx.code is 60bits (not 59 bits)
On RV32, event_idx.info + event_idx.code is 28bits (XLEN minus the 4bit type)

As suggested by Greg, this is large enough to support lots of events.

To avoid this limitation, initially I had kept separate "event_info"
parameter to SBI_PMU_COUNTER_START call and event_idx had
only two fields (type & code).

In addition, the event support list "Event_list" consists of event_idx.type
and event_idx.code, it doesn't include event_idx.info, so it would be
conflicting
Like mentioned above, event_idx.info is only a parameter in configuring
event. The event_idx.info does not help uniquely identify an event.

The info field in event_idx is only to reduce parameters for
SBI_PMU_COUNTER_START.

I think it's better to move the "info" field out of event_idx to avoid
further confusion.

in Event_list if there are events which have the same low 12 bits.
For example, 0xXXXXX678 is 0x2678 in Event_list, and 0xYYYYY678 is also
0x2678 in Event_list. When the user mode perf tool creates a RAW event
0xXXXXX678, the Linux PMU driver tries to find an appropriate counter by
checking the event support list "Event_list", but it can't decide if
this counter is suitable, because it doesn't know whether the 0x2678 in
Event_list means 0xXXXXX678 or 0xYYYYY678.

If I understand the above correctly, we should come up with a new format
for Event_list and event_idx or provide a new SBI call to fix that.
As mentioned above, if two different RAW HPMEVENT CSR values
have the same lower 12bits then the platform vendor should assign pseudo
event_idx.code values for these events and translate them in OpenSBI.

I don't see why we need new format for Event_list.

Regards,
Anup



In other words, we have to distinguish all HARDWARE and SOFTWARE
events of a platform using 16bit event_idx.type and event_idx.code.
The event_idx.info is only additional parameter to a event.

Regards,
Anup



This means Linux RISC-V PMU only needs to deal with SBI calls for
handling all types of SBI PMU counters and events. In other words,
no need to parse DT/ACPI for event mappings.

Regards,
Anup





Regards,

Anup





From: Brian Grayson <brian.grayson@...>
Sent: 09 July 2020 12:35
To: Zong Li <zong.li@...>
Cc: Anup Patel <Anup.Patel@...>; Atish Patra
<Atish.Patra@...>; andrew@...;
tech-unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI
PMU Extension



My question is, let's say I know that putting the value 0x12345678
into the mhpmevent3 register gets me the event I want, and there is no
support for that event in the SBI spec/API. Will this API allow
me to program such an event, basically bypassing the usual
mapping functionality? perf basically allows you to say "I know
this event number is not one you know about, but it's the value
I want placed directly into the hardware." I want to ensure that
the full capabilities of the hardware will still be accessible
through the SBI spec in some sort of "raw" mode, and I didn't
see a way for that to happen right now. We don't want to
restrict users to the lowest common denominator of functionality.



Brian





On Wed, Jul 8, 2020 at 10:27 PM Zong Li <zong.li@...> wrote:

On Thu, Jul 9, 2020 at 1:06 AM Brian Grayson
<brian.grayson@...>
wrote:

Would there be a raw style interface to access all the SBI-unaware
events, like perf's rNNN support?
Following on from this question: in our current proposal, s-mode
software only knows the event_idx, and m-mode firmware takes care of
the mapping. My question is that s-mode software doesn't seem to
understand the meaning of each event_idx; that is, it just gets the
array of all supported event_idx values, but can't know which one is
for what. This also affects u-mode programs: for the rNNN interface,
we would normally refer to the processor-specific documentation for
these details, and now users won't know what value they should give.
Please correct me if I'm missing something. Thanks.

How would this work on a multicore system -- would the SBI calls only
handle the current hart's counters? That seems easiest to deal with.

Brian


Re: Proposal v2: SBI PMU Extension

Anup Patel
 

Hi Greg,

 

Using 20 bits for event_idx.type and event_idx.code is fine with me.

However, the “info” field in event_idx seems to be creating confusion. I suggest we have a separate “event_info” parameter in the SBI_PMU_COUNTER_START call and remove the “info” field from event_idx.

The translation of event_idx+event_info to the HPMEVENT CSR value will be optional for OpenSBI platform support code. By default, OpenSBI will write a value <xyz> to the HPMEVENT CSR where the lower 20 bits of <xyz> are event_idx and the upper XLEN-20 bits of <xyz> are the lower XLEN-20 bits of event_info.

Does this sound okay?

 

Regards,

Anup

 

 

From: Greg Favor <gfavor@...>
Sent: 10 July 2020 11:10
To: Anup Patel <Anup.Patel@...>
Cc: Brian Grayson <brian.grayson@...>; Zong Li <zong.li@...>; Atish Patra <Atish.Patra@...>; andrew@...; tech-unixplatformspec@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension

 

On Thu, Jul 9, 2020 at 9:02 PM Anup Patel <Anup.Patel@...> wrote:

Translating event_idx[XLEN-1:0] to a platform-specific mhpmeventX CSR value will be optional in OpenSBI (M-mode runtime firmware). If a platform does not want to translate the event_idx value then we will write event_idx as-is to the mhpmeventX CSR.

 

Great.

 

Regarding the total number of bits used by event_idx.type and event_idx.code, I am fine with both the 16-bit and 20-bit approaches. If there is no further objection to the 20-bit approach then I will go with that.

 

Yes, that would also be good.

 

Thanks,

Greg

 

 

Regards,

Anup

 


Re: Proposal v2: SBI PMU Extension

Zong Li
 

On Fri, Jul 10, 2020 at 5:51 PM Anup Patel <Anup.Patel@...> wrote:



-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 10 July 2020 12:24
To: Anup Patel <Anup.Patel@...>
Cc: Brian Grayson <brian.grayson@...>; Atish Patra
<Atish.Patra@...>; andrew@...; tech-
unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU
Extension

On Thu, Jul 9, 2020 at 6:21 PM Anup Patel <Anup.Patel@...> wrote:



-----Original Message-----
From: tech-unixplatformspec@... <tech-
unixplatformspec@...> On Behalf Of Zong Li
Sent: 09 July 2020 14:42
To: Anup Patel <Anup.Patel@...>
Cc: Brian Grayson <brian.grayson@...>; Atish Patra
<Atish.Patra@...>; andrew@...; tech-
unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU
Extension

On Thu, Jul 9, 2020 at 4:47 PM Anup Patel <Anup.Patel@...> wrote:



-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 09 July 2020 14:09
To: Anup Patel <Anup.Patel@...>
Cc: Brian Grayson <brian.grayson@...>; Atish Patra
<Atish.Patra@...>; andrew@...; tech-
unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI
PMU Extension

On Thu, Jul 9, 2020 at 3:57 PM Anup Patel <Anup.Patel@...>
wrote:

Based on my previous reply…



To monitor RAW event 0x12345678, the user-space perf tool will create a
user-space perf RAW event (i.e. perf_event_attr.type == 4 and
perf_event_attr.config == 0x12345678). The Linux RISC-V PMU driver will
allocate and map a matching HARDWARE counter which supports the
corresponding SBI RAW event (event_idx.type = 2, event_idx.code = 0x678
and event_idx.info = 0x12345). Finally, the SBI_PMU_COUNTER_START call
implemented by OpenSBI will write 0x12345678 (or some platform-specific
translated value of 0x12345678) to the appropriate mhpmeventX CSR.



(Note: above we assume mhpmcounterX supports monitoring RAW event
0x12345678 and OpenSBI is aware of this)
The Linux PMU driver should be aware of this as well, because
SBI_PMU_COUNTER_START takes a counter_idx parameter which is fed by the
Linux PMU driver.
The SBI_PMU_COUNTER_DESCRIBE call is designed with this aspect in mind.
This call provides the list of event_idx values supported by a given
counter_idx (including RAW events) so when allocating a counter in
event_add() we can easily find a matching counter.
Each element of Event_list consists of event_idx.type and
event_idx.code; how does it represent a raw event like the one above
(event_idx.info = 0x12345, event_idx.type = 0x2, event_idx.code =
0x678)? It seems to conflict if we have the raw events 0xXXXXX678 and
0xYYYYY678.

The above break-up of 0x12345678 RAW event is only one of the possible
ways shown as example.

If we have two different RAW events 0xXXXXX678 and 0xYYYYY678 then
platform can show it as:
event_idx.type == 2 and event_idx.code == X and event_idx.info = 0xXXXXX678
event_idx.type == 2 and event_idx.code == Y and event_idx.info = 0xYYYYY678
OR
event_idx.type == 2 and event_idx.code == Z and event_idx.info = 0xXXXXX678
event_idx.type == 2 and event_idx.code == Z and event_idx.info = 0xYYYYY678

Further, the OpenSBI platform support can translate RAW event_idx into
final mhpmeventX CSR value using platform callback or describing it in
DT/ACPI for OpenSBI generic platform.
I understand that the mapping of event_idx to HPMEVENT CSR value is
platform-dependent, and the m-mode firmware of each platform knows how to
translate it. But there are still some ambiguities around Event_list
and event_idx. Could you please help clarify the following questions:
Okay, let me clarify your queries related to RAW events.

Like mentioned previously, all events (including RAW events) will be
uniquely identified by event_idx.type and event_idx.code only. The
event_idx.info does not identify an event; rather, event_idx.info is
only an additional parameter.

It is not mandatory to assign event_idx.code for RAW events based
on actual value programmed in hpmevent CSR. A platform can always
assign pseudo event_idx.code for RAW events and translate it to
final hpmevent CSR value.

For RAW events, the Linux RISC-V PMU driver will create event_idx as
follows:
event_idx.info = perf_event_attr.config[59:12]
event_idx.type = 2
event_idx.code = perf_event_attr.config[11:0]

Currently, we are encoding event_info in event_idx along with
event_idx.type and event_idx.code so the most significant 4 bits of
perf_event_attr.config cannot be accommodated in event_idx.
That is what I am concerned about. It doesn't make sense to me
to abandon any bits; we shouldn't force vendors to shrink the event
encoding space. The spec allows using the full MXLEN bits for the value
of the HPMEVENT CSR. It is good to see that we separate event_info
from event_idx. At least we have the opportunity to reserve the complete
MXLEN bits for RAW events.



Do the RAW events need to be mapped (i.e. "event_idx.type == 0x2,
event_idx.code == 0x1" corresponds to some HPMEVENT value) OR do we just
use raw data for event_idx (i.e. event_idx.info and event_idx.code hold the
raw data directly)? Each option has its own issues; please see the
cases as follows:
1) RAW events should be mapped like HW events and HW cache events:
- We assume event_idx.code == 0x1 corresponds to HPMEVENT
value 0x1234, and event_idx.code == 0x2 corresponds to HPMEVENT
value 0x5678, so in s-mode software, I expect that I would get the event
support list "Event_list" which has the elements 0x2001 and 0x2002.
When the user-mode perf tool creates a RAW event 0x1234, the Linux PMU
driver tries to find an appropriate counter by checking the event support
list "Event_list", but the Linux PMU driver doesn't know the relationship
between 0x1234 and 0x2001, so it can't decide whether this counter is
suitable.
It is not possible to treat RAW events exactly like HW events and HW cache
events because event_idx.code and event_idx.info will come from user-space.

Taking your example event_idx.code == 0x1 for HPMEVENT value 0x1234
and event_idx.code == 0x2 for HPMEVENT value 0x5678. This means
OpenSBI will translate 0x2001 to 0x1234 and 0x2002 to 0x5678 because
the platform has chosen pseudo event_idx.code values for RAW events.

The platform vendor will have to provide SW documentation of the
chosen pseudo values 0x2001 and 0x2002 and mention that users
will need to use values 0x2001 and 0x2002 for trying RAW events
using perf tools.

In fact, documenting and translating RAW event_idx values is totally
a platform's responsibility.
Actually, I would rather not use pseudo event_idx.code values for RAW
events. Using the actual HPMEVENT CSR value is more straightforward: the
vendor then only needs to provide their platform spec, without the extra
effort of documenting the mapping of pseudo event_idx values.
That is also why I think the length of event_idx.code is far from enough
for RAW events. But if the consensus is that pseudo event_idx.code can be
an optional way to handle RAW events, I would respect the decision.

Maybe separating event_idx.type is clearer for SBI_PMU_COUNTER_START:
take one parameter for the event type, and one parameter for event_idx,
which consists of event_idx.info[63:16] + event_idx.code[15:0]


My suggestion of deriving event_idx.code and event_idx.info values
from actual HPMEVENT CSR value of RAW events only helps simplify
documentation for platform vendors.

Clearly based on above Linux RISC-V PMU driver does not need to know
the final values being programmed in HPMEVENT CSRs.

2) Use RAW data of RAW event:
- The size of HPMEVENT CSR is MXLEN, so it could be 64-bits. The size of
event_idx.info + event_idx.code is only 59 bits, it doesn't seem to be
enough.
On RV64, event_idx.info + event_idx.code is 60bits (not 59 bits)
On RV32, event_idx.info + event_idx.code is 24bits

As suggested by Greg, this is large enough to support lots of events.

To avoid this limitation, initially I had kept separate "event_info"
parameter to SBI_PMU_COUNTER_START call and event_idx had
only two fields (type & code).

In addition, the event support list "Event_list" consists of event_idx.type
and event_idx.code; it doesn't include event_idx.info, so there would be a
conflict
Like mentioned above, event_idx.info is only a parameter in configuring
an event. The event_idx.info does not help uniquely identify an event.

The info field in event_idx is only to reduce parameters for
SBI_PMU_COUNTER_START.

I think it's better to move the "info" field out of event_idx to avoid
further confusion.

in Event_list if there are events which have the same low 12 bits.
For example, 0xXXXXX678 is 0x2678 in Event_list, and 0xYYYYY678 is also
0x2678 in Event_list. When the user-mode perf tool creates a RAW event
0xXXXXX678, the Linux PMU driver tries to find an appropriate counter by
checking the event support list "Event_list", but it can't decide whether
this counter is suitable, because it doesn't know whether the 0x2678 in
Event_list means 0xXXXXX678 or 0xYYYYY678.

If I understand the above correctly, we should come up with a new format
for Event_list and event_idx, or provide a new SBI call to fix that.
Like mentioned above, if two different RAW HPMEVENT CSR values
have the same lower 12 bits then the platform vendor should assign pseudo
event_idx.code values for these events and translate them in OpenSBI.

I don't see why we need a new format for Event_list.

Regards,
Anup




Re: Proposal v2: SBI PMU Extension

Greg Favor
 

Anup,

That would be a perfectly fine way to go.

Thanks,
Greg

On Fri, Jul 10, 2020 at 3:54 AM Anup Patel <Anup.Patel@...> wrote:

Hi Greg,

 

Using 20 bits for event_idx.type and event_idx.code is fine with me.

However, the “info” field in event_idx seems to be creating confusion. I suggest we have a separate “event_info” parameter in the SBI_PMU_COUNTER_START call and remove the “info” field from event_idx.

The translation of event_idx+event_info to the HPMEVENT CSR value will be optional for OpenSBI platform support code. By default, OpenSBI will write a value <xyz> to the HPMEVENT CSR where the lower 20 bits of <xyz> are event_idx and the upper XLEN-20 bits of <xyz> are the lower XLEN-20 bits of event_info.

Does this sound okay?

 

Regards,

Anup

 

 


 


Re: Proposal v2: SBI PMU Extension

Anup Patel
 

-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 10 July 2020 21:23
To: Anup Patel <Anup.Patel@...>
Cc: Brian Grayson <brian.grayson@...>; Atish Patra
<Atish.Patra@...>; andrew@...; tech-
unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU
Extension

On Fri, Jul 10, 2020 at 5:51 PM Anup Patel <Anup.Patel@...> wrote:



-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 10 July 2020 12:24
To: Anup Patel <Anup.Patel@...>
Cc: Brian Grayson <brian.grayson@...>; Atish Patra
<Atish.Patra@...>; andrew@...; tech-
unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU
Extension

On Thu, Jul 9, 2020 at 6:21 PM Anup Patel <Anup.Patel@...> wrote:



-----Original Message-----
From: tech-unixplatformspec@... <tech-
unixplatformspec@...> On Behalf Of Zong Li
Sent: 09 July 2020 14:42
To: Anup Patel <Anup.Patel@...>
Cc: Brian Grayson <brian.grayson@...>; Atish Patra
<Atish.Patra@...>; andrew@...; tech-
unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI
PMU Extension

On Thu, Jul 9, 2020 at 4:47 PM Anup Patel <Anup.Patel@...>
wrote:



-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 09 July 2020 14:09
To: Anup Patel <Anup.Patel@...>
Cc: Brian Grayson <brian.grayson@...>; Atish Patra
<Atish.Patra@...>; andrew@...; tech-
unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2:
SBI PMU Extension

On Thu, Jul 9, 2020 at 3:57 PM Anup Patel
<Anup.Patel@...>
wrote:

Based on my previous reply…



To monitor RAW event 0x12345678, the user-space perf tool will create a
user-space perf RAW event (i.e. perf_event_attr.type == 4 and
perf_event_attr.config == 0x12345678). The Linux RISC-V PMU driver will
allocate and map a matching HARDWARE counter which supports the
corresponding SBI RAW event (event_idx.type = 2, event_idx.code = 0x678
and event_idx.info = 0x12345). Finally, the SBI_PMU_COUNTER_START call
implemented by OpenSBI will write 0x12345678 (or some platform-specific
translated value of 0x12345678) to the appropriate mhpmeventX CSR.



(Note: above we assume mhpmcounterX supports monitoring RAW event
0x12345678 and OpenSBI is aware of this)
The Linux PMU driver should be aware of this as well, because
SBI_PMU_COUNTER_START takes a counter_idx parameter which is fed by the
Linux PMU driver.
The SBI_PMU_COUNTER_DESCRIBE call is designed with this aspect in mind.
This call provides the list of event_idx values supported by a given
counter_idx (including RAW events) so when allocating a counter in
event_add() we can easily find a matching counter.
Each element of Event_list consists of event_idx.type and
event_idx.code; how does it represent a raw event like the one above
(event_idx.info = 0x12345, event_idx.type = 0x2, event_idx.code =
0x678)? It seems to conflict if we have the raw events 0xXXXXX678 and
0xYYYYY678.

The above break-up of 0x12345678 RAW event is only one of the
possible ways shown as example.

If we have two different RAW events 0xXXXXX678 and 0xYYYYY678 then
platform can show it as:
event_idx.type == 2 and event_idx.code == X and event_idx.info = 0xXXXXX678
event_idx.type == 2 and event_idx.code == Y and event_idx.info = 0xYYYYY678
OR
event_idx.type == 2 and event_idx.code == Z and event_idx.info = 0xXXXXX678
event_idx.type == 2 and event_idx.code == Z and event_idx.info = 0xYYYYY678

Further, the OpenSBI platform support can translate RAW event_idx
into final mhpmeventX CSR value using platform callback or
describing it in DT/ACPI for OpenSBI generic platform.
I understand that the mapping of event_idx to HPMEVENT CSR value is
platform-dependent, and the m-mode firmware of each platform knows how to
translate it. But there are still some ambiguities around Event_list
and event_idx. Could you please help clarify the following questions:
Okay, let me clarify your queries related to RAW events.

Like mentioned previously, all events (including RAW events) will be
uniquely identified by event_idx.type and event_idx.code only. The
event_idx.info does not identify an event; rather, event_idx.info is
only an additional parameter.

It is not mandatory to assign event_idx.code for RAW events based on
actual value programmed in hpmevent CSR. A platform can always assign
pseudo event_idx.code for RAW events and translate it to final
hpmevent CSR value.

For RAW events, the Linux RISC-V PMU driver will create event_idx as
follows:
event_idx.info = perf_event_attr.config[59:12]
event_idx.type = 2
event_idx.code = perf_event_attr.config[11:0]

Currently, we are encoding event_info in event_idx along with
event_idx.type and event_idx.code so the most significant 4 bits of
perf_event_attr.config cannot be accommodated in event_idx.
That is what I am concerned about. It doesn't make sense to me
to abandon any bits; we shouldn't force vendors to shrink the event
encoding space. The spec allows using the full MXLEN bits for the value
of the HPMEVENT CSR. It is good to see that we separate event_info
from event_idx. At least we have the opportunity to reserve the complete
MXLEN bits for RAW events.
Okay, we will have a separate event_info parameter so that we don't
have to abandon bits and event_idx will only have "type" and "code".




Do the RAW events need to be mapped (i.e. "event_idx.type == 0x2,
event_idx.code == 0x1" corresponds to some HPMEVENT value) OR do we just
use raw data for event_idx (i.e. event_idx.info and event_idx.code hold the
raw data directly)? Each option has its own issues; please see the
cases as follows:
1) RAW events should be mapped like HW events and HW cache events:
- We assume event_idx.code == 0x1 corresponds to HPMEVENT
value 0x1234, and event_idx.code == 0x2 corresponds to HPMEVENT
value 0x5678, so in s-mode software, I expect that I would get the event
support list "Event_list" which has the elements 0x2001 and 0x2002.
When the user-mode perf tool creates a RAW event 0x1234, the Linux PMU
driver tries to find an appropriate counter by checking the event support
list "Event_list", but the Linux PMU driver doesn't know the relationship
between 0x1234 and 0x2001, so it can't decide whether this counter is
suitable.
It is not possible to treat RAW events exactly like HW events and HW
cache events because event_idx.code and event_idx.info will come from
user-space.

Taking your example event_idx.code == 0x1 for HPMEVENT value 0x1234
and event_idx.code == 0x2 for HPMEVENT value 0x5678. This means
OpenSBI will translate 0x2001 to 0x1234 and 0x2002 to 0x5678 because
the platform has chosen pseudo event_idx.code values for RAW events.

The platform vendor will have to provide SW documentation of the
chosen pseudo values 0x2001 and 0x2002 and mention that users will
need to use values 0x2001 and 0x2002 for trying RAW events using perf
tools.

In fact, documenting and translating RAW event_idx values is totally a
platform's responsibility.
Actually, I would rather not use pseudo event_idx.code values for RAW
events. Using the actual HPMEVENT CSR value is more straightforward: the
vendor then only needs to provide their platform spec, without the extra
effort of documenting the mapping of pseudo event_idx values.
That is also why I think the length of event_idx.code is far from enough
for RAW events. But if the consensus is that pseudo event_idx.code can be
an optional way to handle RAW events, I would respect the decision.

Maybe separating event_idx.type is clearer for SBI_PMU_COUNTER_START:
take one parameter for the event type, and one parameter for event_idx,
which consists of event_idx.info[63:16] + event_idx.code[15:0]
Okay, let me think about this more. I will try to address your concern
in the v3 proposal.

Regards,
Anup


Proposal v3: SBI PMU Extension

Anup Patel
 

Hi All,

We don't have a dedicated RISC-V PMU extension but we do have HARDWARE
performance counters such as CYCLE CSR, INSTRET CSR, and HPMCOUNTER
CSRs. A RISC-V implementation can support monitoring various HARDWARE
events using limited number of HPMCOUNTER CSRs.

In addition to HARDWARE performance counters, a SBI implementation
(e.g. OpenSBI, Xvisor, KVM, etc) can provide SOFTWARE counters for
events such as number of RFENCEs, number of IPIs, number of misaligned
load/store instructions, number of illegal instructions, etc.

We propose SBI PMU extension, which will help S-mode (or VS-mode)
software to discover and configure HARDWARE/SOFTWARE counters. The SBI
PMU extension will only manage per-HART (or per-CPU) HARDWARE/SOFTWARE
counters which include CYCLE CSR, INSTRET CSR, HPMCOUNTER CSRs and
SOFTWARE counters provided by SBI implementation.

Using SBI PMU extension, a SBI implementation (OpenSBI, KVM, or Xvisor)
will provide a standardized view of HARDWARE/SOFTWARE counters and
events to S-mode (or VS-mode) software.

To define SBI PMU extension, we first define counter_idx which is a
logical number assigned to a counter and event_idx which is an encoded
number representing the HARDWARE/SOFTWARE event to be monitored. A
HARDWARE/SOFTWARE event can also have additional configuration/details
referred to as event_info.

The SBI PMU event_idx is a 20-bit wide number encoded as follows:
event_idx[19:16] = type
event_idx[15:0] = code
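
To illustrate the encoding above, here is a minimal C sketch for packing and unpacking event_idx. The macro and helper names are hypothetical (they are not part of the proposal); only the bit layout is taken from the text.

```c
#include <stdint.h>

/* Hypothetical names; only the [19:16]/[15:0] layout comes from the proposal. */
#define EX_EVENT_IDX_CODE_MASK   0xffffu
#define EX_EVENT_IDX_TYPE_MASK   0xfu
#define EX_EVENT_IDX_TYPE_SHIFT  16

/* Build a 20-bit event_idx: type in bits [19:16], code in bits [15:0]. */
static inline uint32_t ex_event_idx(uint32_t type, uint32_t code)
{
    return ((type & EX_EVENT_IDX_TYPE_MASK) << EX_EVENT_IDX_TYPE_SHIFT) |
           (code & EX_EVENT_IDX_CODE_MASK);
}

/* Extract event_idx.type (bits [19:16]). */
static inline uint32_t ex_event_type(uint32_t event_idx)
{
    return (event_idx >> EX_EVENT_IDX_TYPE_SHIFT) & EX_EVENT_IDX_TYPE_MASK;
}

/* Extract event_idx.code (bits [15:0]). */
static inline uint32_t ex_event_code(uint32_t event_idx)
{
    return event_idx & EX_EVENT_IDX_CODE_MASK;
}
```

For example, ex_event_idx(0xf, 4) yields 0xf0004, i.e. a SOFTWARE event (type 0xf) with code 4.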

If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE event,
the event_info is not required whereas the event_idx.code can be one
of the following values:
enum sbi_pmu_hw_id {
    SBI_PMU_HW_CPU_CYCLES = 0,
    SBI_PMU_HW_INSTRUCTIONS = 1,
    SBI_PMU_HW_CACHE_REFERENCES = 2,
    SBI_PMU_HW_CACHE_MISSES = 3,
    SBI_PMU_HW_BRANCH_INSTRUCTIONS = 4,
    SBI_PMU_HW_BRANCH_MISSES = 5,
    SBI_PMU_HW_BUS_CYCLES = 6,
    SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
    SBI_PMU_HW_STALLED_CYCLES_BACKEND = 8,
    SBI_PMU_HW_REF_CPU_CYCLES = 9,
    SBI_PMU_HW_MAX, /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE
CACHE event, the event_info is not required whereas the event_idx.code
is encoded as follows:
event_idx.code[15:3] = cache_id
event_idx.code[2:1] = op_id
event_idx.code[0:0] = result_id

enum sbi_pmu_hw_cache_id {
    SBI_PMU_HW_CACHE_L1D = 0,
    SBI_PMU_HW_CACHE_L1I = 1,
    SBI_PMU_HW_CACHE_LL = 2,
    SBI_PMU_HW_CACHE_DTLB = 3,
    SBI_PMU_HW_CACHE_ITLB = 4,
    SBI_PMU_HW_CACHE_BPU = 5,
    SBI_PMU_HW_CACHE_NODE = 6,
    SBI_PMU_HW_CACHE_MAX, /* non-ABI */
};

enum sbi_pmu_hw_cache_op_id {
    SBI_PMU_HW_CACHE_OP_READ = 0,
    SBI_PMU_HW_CACHE_OP_WRITE = 1,
    SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
    SBI_PMU_HW_CACHE_OP_MAX, /* non-ABI */
};

enum sbi_pmu_hw_cache_op_result_id {
    SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
    SBI_PMU_HW_CACHE_RESULT_MISS = 1,
    SBI_PMU_HW_CACHE_RESULT_MAX, /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
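
As a rough sketch of the HARDWARE CACHE code layout above, the following C helper packs cache_id, op_id and result_id into event_idx.code. The function name is hypothetical; the field positions are the ones listed in the proposal.

```c
#include <stdint.h>

/*
 * Pack a HARDWARE CACHE event code (hypothetical helper):
 *   code[15:3] = cache_id, code[2:1] = op_id, code[0:0] = result_id
 */
static inline uint32_t ex_hw_cache_code(uint32_t cache_id, uint32_t op_id,
                                        uint32_t result_id)
{
    return ((cache_id & 0x1fffu) << 3) |
           ((op_id & 0x3u) << 1) |
           (result_id & 0x1u);
}
```

For instance, an L1D (0) read (0) miss (1) packs to code 0x1, and a DTLB (3) write (1) access (0) packs to code 0x1a. Combined with event_idx.type == 0x1, the first case gives event_idx 0x10001.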

If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE
RAW event, the event_idx.code should be zero and the event_info
parameter passed to SBI_PMU_COUNTER_SET_EVENT call (described below)
will have the RAW event value to be programmed in MHPMEVENT CSR (i.e.
the SBI implementation will not derive MHPMEVENT CSR value from
event_idx + event_info).

If event_idx.type == 0xf then it is SOFTWARE event. For SOFTWARE
event, the event_info is not required whereas the event_idx.code
can be one of the following:
enum sbi_pmu_sw_id {
    SBI_PMU_SW_MISALIGNED_LOAD = 0,
    SBI_PMU_SW_MISALIGNED_STORE = 1,
    SBI_PMU_SW_ILLEGAL_INSN = 2,
    SBI_PMU_SW_LOCAL_SET_TIMER = 3,
    SBI_PMU_SW_LOCAL_IPI = 4,
    SBI_PMU_SW_LOCAL_FENCE_I = 5,
    SBI_PMU_SW_LOCAL_SFENCE_VMA = 6,
    SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID = 7,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA = 8,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA = 10,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
    SBI_PMU_SW_MAX, /* non-ABI */
};

In future, more events can be defined without breaking compatibility
of SBI calls.

Using definition of counter_idx and event_idx, we can potentially have
the following SBI calls:

1. SBI_PMU_NUM_COUNTERS
This call will return the number of COUNTERs

2. SBI_PMU_COUNTER_GET_CSR
This call takes one parameter:
1) counter_idx
It will provide the CSR_Number and CSR_Width of underlying counter.
The value returned by SBI call is encoded as follows:
return_value[11:0] = CSR_Number
return_value[19:12] = CSR_Width (Number of bits implemented in HW)
If CSR_Number == 0xfff then it is SOFTWARE counter otherwise it is
HARDWARE counter. This SBI call will fail for counters which are not
present.
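
A minimal C sketch of how S-mode software might decode the SBI_PMU_COUNTER_GET_CSR return value described above. The struct and function names are hypothetical; the field layout and the 0xfff SOFTWARE marker come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded return value. */
struct ex_counter_csr_info {
    uint16_t csr_number;  /* return_value[11:0] */
    uint8_t  csr_width;   /* return_value[19:12], bits implemented in HW */
    bool     is_software; /* true when CSR_Number == 0xfff */
};

static inline struct ex_counter_csr_info ex_decode_counter_csr(unsigned long ret)
{
    struct ex_counter_csr_info info;

    info.csr_number  = (uint16_t)(ret & 0xfff);
    info.csr_width   = (uint8_t)((ret >> 12) & 0xff);
    info.is_software = (info.csr_number == 0xfff);
    return info;
}
```

For example, a return value of 0x40c00 decodes to CSR number 0xc00 (the CYCLE CSR) with a width of 64 bits, so it is a HARDWARE counter.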

3. SBI_PMU_COUNTER_SET_EVENT
This call takes three parameters:
1) counter_idx
2) event_idx
3) event_info
It will select an event to be monitored by the given counter. If this
SBI call is not used for a counter to select an event then the
counter will monitor the default event selected for it at boot-time.
This SBI call will fail for counters which are not present. It will
also fail if the specified event_idx + event_info combination is not
supported by the given counter.

4. SBI_PMU_COUNTER_SET_PHYS_ADDR
This call takes two parameters:
1) counter_idx
2) 8byte aligned physical address
It will set the physical address of the memory location where the SBI
implementation will write the 64-bit SOFTWARE counter. This SBI call
is only for counters not mapped to any CSR (i.e. only for counters
with CSR_Number == 0xfff).

5. SBI_PMU_COUNTER_START
This call takes two parameters:
1) counter_idx
2) initial_value
It will inform SBI implementation to start/enable specified counter
with specified initial value. This SBI call will fail for counters
which are not present.

6. SBI_PMU_COUNTER_STOP
This call takes one parameter:
1) counter_idx
It will inform SBI implementation to stop/disable the specified counter
on the calling HART. This SBI call will fail for counters which are
not present.

The M-mode runtime firmware (OpenSBI) Development Notes:

1. The M-mode runtime firmware will have to translate SBI PMU
event_idx and event_info into a platform-dependent MHPMEVENT CSR
value before starting/enabling a HARDWARE counter.

2. The M-mode runtime firmware (OpenSBI) will need to know following
platform dependent information:
A) Possible event_idx values allowed (or supported) by a HARDWARE
counter (i.e. HPMCOUNTER)
B) Mapping of event_idx for HARDWARE/CACHE event to MHPMEVENT CSR
value. This is optional for platform. By default, OpenSBI will
write a value <xyz> to MHPMEVENT CSR where lower 20bits of <xyz>
are event_idx and upper XLEN-20 bits of <xyz> are lower XLEN-20
bits of event_info
C) Additional platform-specific programming required for selecting
event_idx + event_info combination. This is also optional for
platform.

3. All platform-dependent information mentioned above can be obtained
by M-mode runtime firmware (OpenSBI) from platform-specific code.
The DT/ACPI can also be used to describe 2.A and 2.B mentioned above
but 2.C will always require platform-specific code.
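
As a sketch of the default (untranslated) MHPMEVENT value from 2.B above, assuming XLEN == 64: the lower 20 bits carry event_idx and the upper XLEN-20 bits carry the lower XLEN-20 bits of event_info. The function name is hypothetical, and this is not the actual OpenSBI code. Per the proposal, HARDWARE RAW events are handled differently (event_info is used as the MHPMEVENT value directly), so this only illustrates the HARDWARE/CACHE default.

```c
#include <stdint.h>

/*
 * Default MHPMEVENT value when the platform provides no translation
 * (hypothetical helper, RV64 assumed):
 *   value[19:0]  = event_idx
 *   value[63:20] = event_info[43:0]
 */
static inline uint64_t ex_default_mhpmevent(uint32_t event_idx,
                                            uint64_t event_info)
{
    /* The left shift discards the upper 20 bits of event_info. */
    return (event_info << 20) | ((uint64_t)event_idx & 0xfffffu);
}
```

For example, event_idx 0x10001 with event_info 0x5 gives 0x510001.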

Linux RISC-V PMU Driver Development Notes:

1. Driver probe
The Linux RISC-V driver can be platform driver with "riscv,sbi-pmu"
as DT compatible string and optional "interrupts" DT property. The
"interrupts" DT property if available should specify an edge-triggered
overflow interrupt for each HART. When "interrupts" DT property is
present, we might also need another DT property for mapping HARTID
to entries in "interrupts" DT property. The platform driver probe
will:
A) Need to ensure that underlying SBI implementation provides
SBI PMU extension using sbi_probe_extension() API of arch/riscv.
B) Detect number of counters using SBI_PMU_NUM_COUNTERS call
C) Get CSR details of each counter using SBI_PMU_COUNTER_GET_CSR
call. If the counter is a SOFTWARE counter then use the
SBI_PMU_COUNTER_SET_PHYS_ADDR call to set memory location
of counter. The driver can skip this in driver probe and instead
do this lazily in the add() callback mentioned below.

2. event_init() callback
The event_init() callback will primarily translate user-space
perf_event_attr to SBI PMU event_idx and event_info. It can do
this in following way:
A) perf_event_attr.type == PERF_TYPE_HARDWARE
event_idx.type = 0x0
event_idx.code = Value from enum sbi_pmu_hw_id based on
perf_event_attr.config
event_info = 0
B) perf_event_attr.type == PERF_TYPE_HW_CACHE
event_idx.type = 0x1
event_idx.code.cache_id = Value from enum sbi_pmu_hw_cache_id
based on perf_event_attr.config
event_idx.code.op_id = Value from enum sbi_pmu_hw_cache_op_id
based on perf_event_attr.config
event_idx.code.result_id = Value from enum sbi_pmu_hw_cache_op_result_id
based on perf_event_attr.config
event_info = 0
C) perf_event_attr.type == PERF_TYPE_RAW and
perf_event_attr.config[63:63] == 0
event_idx.type = 0x2
event_idx.code = 0x0
event_info = perf_event_attr.config[62:0]
D) perf_event_attr.type == PERF_TYPE_RAW and
perf_event_attr.config[63:63] == 1
event_idx.type = 0xf
event_idx.code = Value from enum sbi_pmu_sw_id based on
perf_event_attr.config
event_info = 0
(Note: event_init() will fail if it is not able to figure out
event_idx and event_info value corresponding to perf_event_attr)
(Note: event_init() will not assign counter to perf_event because
it will be done by the add() callback)

3. add() callback
The add() callback of Linux RISC-V PMU driver will find a
free counter on current CPU/HART such that the perf_event
event_idx + event_info combination is supported by the counter.
To check-and-set event_idx + event_info combination for a
counter, we will use the SBI_PMU_COUNTER_SET_EVENT call.
The counter allocation and SBI_PMU_COUNTER_SET_EVENT call
can be further optimized by looking at CSR details.
For example:
A) For event_idx.type == 0 and
event_idx.code == SBI_PMU_HW_CPU_CYCLES, we should
prefer counter mapping to CYCLE CSR and skip doing
SBI_PMU_COUNTER_SET_EVENT call.
B) For event_idx.type == 0 and
event_idx.code == SBI_PMU_HW_INSTRUCTIONS, we should
prefer counter mapping to INSTRET CSR and skip doing
SBI_PMU_COUNTER_SET_EVENT call.
C) For event_idx.type == 0xf, only prefer counters mapping
to 0xfff CSR (i.e. SOFTWARE counters).

4. del() callback
The del() callback of Linux RISC-V PMU driver will release
or free the counter.

5. start() callback
The start() callback of Linux RISC-V PMU driver will start
the counter using the SBI_PMU_COUNTER_START call.

6. stop() callback
The stop() callback of Linux RISC-V PMU driver will stop
the counter using the SBI_PMU_COUNTER_STOP call.
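
The event_init() translation rules listed above could be sketched roughly as below. This is a standalone illustration, not kernel code: riscv_pmu_map_event() and struct sbi_pmu_event are hypothetical names, the PERF_TYPE_* values are local stand-ins mirroring include/uapi/linux/perf_event.h, and the placement of the sbi_pmu_sw_id in config[15:0] for SOFTWARE events is an assumption, since the proposal does not say where it sits in perf_event_attr.config.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins mirroring values in include/uapi/linux/perf_event.h. */
enum { PERF_TYPE_HARDWARE = 0, PERF_TYPE_HW_CACHE = 3, PERF_TYPE_RAW = 4 };

/* Upper bounds taken from the enums in the proposal. */
#define SBI_PMU_HW_MAX              10
#define SBI_PMU_HW_CACHE_MAX        7
#define SBI_PMU_HW_CACHE_OP_MAX     3
#define SBI_PMU_HW_CACHE_RESULT_MAX 2
#define SBI_PMU_SW_MAX              12

struct sbi_pmu_event {
    uint32_t event_idx;   /* type in [19:16], code in [15:0] */
    uint64_t event_info;
};

/* Returns 0 on success, -1 if no SBI event matches (type, config). */
static inline int riscv_pmu_map_event(uint32_t type, uint64_t config,
                                      struct sbi_pmu_event *ev)
{
    uint32_t cache_id, op_id, result_id, sw_id;

    assert(ev);
    ev->event_info = 0;

    switch (type) {
    case PERF_TYPE_HARDWARE:
        if (config >= SBI_PMU_HW_MAX)
            return -1;
        ev->event_idx = (0x0u << 16) | (uint32_t)config;
        return 0;
    case PERF_TYPE_HW_CACHE:
        /* perf packs cache id, op and result into config bytes 0, 1, 2. */
        cache_id = (uint32_t)(config & 0xff);
        op_id = (uint32_t)((config >> 8) & 0xff);
        result_id = (uint32_t)((config >> 16) & 0xff);
        if (cache_id >= SBI_PMU_HW_CACHE_MAX ||
            op_id >= SBI_PMU_HW_CACHE_OP_MAX ||
            result_id >= SBI_PMU_HW_CACHE_RESULT_MAX)
            return -1;
        ev->event_idx = (0x1u << 16) | (cache_id << 3) |
                        (op_id << 1) | result_id;
        return 0;
    case PERF_TYPE_RAW:
        if (config >> 63) {
            /* Assumption: the sbi_pmu_sw_id sits in config[15:0]. */
            sw_id = (uint32_t)(config & 0xffff);
            if (sw_id >= SBI_PMU_SW_MAX)
                return -1;
            ev->event_idx = (0xfu << 16) | sw_id;
        } else {
            ev->event_idx = 0x2u << 16;
            ev->event_info = config;  /* bit 63 is clear: config[62:0] */
        }
        return 0;
    default:
        return -1;
    }
}
```

A real driver would return a proper errno value instead of -1 and would take the perf_event itself as argument.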

Regards,
Anup


Re: Proposal v3: SBI PMU Extension

Zong Li
 

On Mon, Jul 13, 2020 at 9:12 PM Anup Patel <Anup.Patel@...> wrote:

Hi All,

We don't have a dedicated RISC-V PMU extension but we do have HARDWARE
performance counters such as CYCLE CSR, INSTRET CSR, and HPMCOUNTER
CSRs. A RISC-V implementation can support monitoring various HARDWARE
events using limited number of HPMCOUNTER CSRs.

In addition to HARDWARE performance counters, a SBI implementation
(e.g. OpenSBI, Xvisor, KVM, etc) can provide SOFTWARE counters for
events such as number of RFENCEs, number of IPIs, number of misaligned
load/store instructions, number of illegal instructions, etc.

We propose SBI PMU extension, which will help S-mode (or VS-mode)
software to discover and configure HARDWARE/SOFTWARE counters. The SBI
PMU extension will only manage per-HART (or per-CPU) HARDWARE/SOFTWARE
counters which include CYCLE CSR, INSTRET CSR, HPMCOUNTER CSRs and
SOFTWARE counters provided by SBI implementation.

Using SBI PMU extension, a SBI implementation (OpenSBI, KVM, or Xvisor)
will provide a standardized view of HARDWARE/SOFTWARE counters and
events to S-mode (or VS-mode) software.

To define SBI PMU extension, we first define counter_idx which is a
logical number assigned to a counter and event_idx which is an encoded
number representing the HARDWARE/SOFTWARE event to be monitored. A
HARDWARE/SOFTWARE event can also have additional configuration/details
referred to as event_info.

The SBI PMU event_idx is a 20bits wide number encoded as follows:
event_idx[19:16] = type
event_idx[15:0] = code
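
For illustration, the 20-bit layout above could be packed and unpacked with helpers along these lines. The macro and function names are hypothetical and chosen only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout from the proposal: type in bits [19:16], code in [15:0]. */
#define SBI_PMU_EVENT_IDX_CODE_MASK  0xffffu
#define SBI_PMU_EVENT_IDX_TYPE_SHIFT 16
#define SBI_PMU_EVENT_IDX_TYPE_MASK  0xfu

/* Hypothetical helper: build a 20-bit event_idx from type and code. */
static inline uint32_t sbi_pmu_event_idx(uint32_t type, uint32_t code)
{
    assert(type <= SBI_PMU_EVENT_IDX_TYPE_MASK);
    assert(code <= SBI_PMU_EVENT_IDX_CODE_MASK);
    return (type << SBI_PMU_EVENT_IDX_TYPE_SHIFT) | code;
}

/* Extract the 4-bit type field. */
static inline uint32_t sbi_pmu_event_type(uint32_t event_idx)
{
    return (event_idx >> SBI_PMU_EVENT_IDX_TYPE_SHIFT) &
           SBI_PMU_EVENT_IDX_TYPE_MASK;
}

/* Extract the 16-bit code field. */
static inline uint32_t sbi_pmu_event_code(uint32_t event_idx)
{
    return event_idx & SBI_PMU_EVENT_IDX_CODE_MASK;
}
```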

If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE event,
the event_info is not required whereas the event_idx.code can be one
of the following values:
enum sbi_pmu_hw_id {
SBI_PMU_HW_CPU_CYCLES = 0,
SBI_PMU_HW_INSTRUCTIONS = 1,
SBI_PMU_HW_CACHE_REFERENCES = 2,
SBI_PMU_HW_CACHE_MISSES = 3,
SBI_PMU_HW_BRANCH_INSTRUCTIONS = 4,
SBI_PMU_HW_BRANCH_MISSES = 5,
SBI_PMU_HW_BUS_CYCLES = 6,
SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
SBI_PMU_HW_STALLED_CYCLES_BACKEND = 8,
SBI_PMU_HW_REF_CPU_CYCLES = 9,
SBI_PMU_HW_MAX, /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE
CACHE event, the event_info is not required whereas the event_idx.code
is encoded as follows:
event_idx.code[15:3] = cache_id
event_idx.code[2:1] = op_id
event_idx.code[0:0] = result_id
enum sbi_pmu_hw_cache_id {
SBI_PMU_HW_CACHE_L1D = 0,
SBI_PMU_HW_CACHE_L1I = 1,
SBI_PMU_HW_CACHE_LL = 2,
SBI_PMU_HW_CACHE_DTLB = 3,
SBI_PMU_HW_CACHE_ITLB = 4,
SBI_PMU_HW_CACHE_BPU = 5,
SBI_PMU_HW_CACHE_NODE = 6,
SBI_PMU_HW_CACHE_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_id {
SBI_PMU_HW_CACHE_OP_READ = 0,
SBI_PMU_HW_CACHE_OP_WRITE = 1,
SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
SBI_PMU_HW_CACHE_OP_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_result_id {
SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
SBI_PMU_HW_CACHE_RESULT_MISS = 1,
SBI_PMU_HW_CACHE_RESULT_MAX, /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
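
As a small worked example of the HARDWARE CACHE encoding above, the code field could be assembled like this. The helper names are hypothetical; the bit positions are the ones given in the proposal.

```c
#include <assert.h>
#include <stdint.h>

/* code[15:3] = cache_id, code[2:1] = op_id, code[0:0] = result_id */
static inline uint32_t sbi_pmu_cache_code(uint32_t cache_id, uint32_t op_id,
                                          uint32_t result_id)
{
    assert(cache_id <= 0x1fffu && op_id <= 0x3u && result_id <= 0x1u);
    return (cache_id << 3) | (op_id << 1) | result_id;
}

/* Full event_idx for a HARDWARE CACHE event (event_idx.type == 0x1). */
static inline uint32_t sbi_pmu_cache_event_idx(uint32_t cache_id,
                                               uint32_t op_id,
                                               uint32_t result_id)
{
    return (0x1u << 16) | sbi_pmu_cache_code(cache_id, op_id, result_id);
}
```

For instance, an L1D read miss (cache_id 0, op_id 0, result_id 1) gives code 0x1 and event_idx 0x10001.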

If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE
RAW event, the event_idx.code should be zero and the event_info
parameter passed to SBI_PMU_COUNTER_SET_EVENT call (described below)
will have the RAW event value to be programmed in MHPMEVENT CSR (i.e.
the SBI implementation will not derive MHPMEVENT CSR value from
event_idx + event_info).

If event_idx.type == 0xf then it is SOFTWARE event. For SOFTWARE
event, the event_info is not required whereas the event_idx.code
can be one of the following:
enum sbi_pmu_sw_id {
SBI_PMU_SW_MISALIGNED_LOAD = 0,
SBI_PMU_SW_MISALIGNED_STORE = 1,
SBI_PMU_SW_ILLEGAL_INSN = 2,
SBI_PMU_SW_LOCAL_SET_TIMER = 3,
SBI_PMU_SW_LOCAL_IPI = 4,
SBI_PMU_SW_LOCAL_FENCE_I = 5,
SBI_PMU_SW_LOCAL_SFENCE_VMA = 6,
SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID = 7,
SBI_PMU_SW_LOCAL_HFENCE_GVMA = 8,
SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
SBI_PMU_SW_LOCAL_HFENCE_VVMA = 10,
SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
SBI_PMU_SW_MAX, /* non-ABI */
};

In future, more events can be defined without breaking compatibility
of SBI calls.

Using definition of counter_idx and event_idx, we can potentially have
the following SBI calls:

1. SBI_PMU_NUM_COUNTERS
This call will return the number of COUNTERs

2. SBI_PMU_COUNTER_GET_CSR
This call takes one parameter:
1) counter_idx
It will provide the CSR_Number and CSR_Width of underlying counter.
The value returned by SBI call is encoded as follows:
return_value[11:0] = CSR_Number
return_value[19:12] = CSR_Width (Number of bits implemented in HW)
If CSR_Number == 0xfff then it is SOFTWARE counter otherwise it is
HARDWARE counter. This SBI call will fail for counters which are not
present.

3. SBI_PMU_COUNTER_SET_EVENT
This call takes three parameters:
1) counter_idx
2) event_idx
3) event_info
It will select an event to be monitored by given counter. If this
SBI call is not used for a counter to select an event then the
counter will monitor default event selected for it at boot-time.
This SBI call will fail for counters which are not present. It will
also fail if specified event_idx + event_info combination is not
supported by given counter.

It also seems to fail if the specified event is not supported by the given
counter, right? Then the Linux driver could try to allocate the next free
counter when this SBI call returns failure.

Apart from this question above, this version of the proposal is great to me.

Thanks,
Zong


4. SBI_PMU_COUNTER_SET_PHYS_ADDR
This call takes two parameters:
1) counter_idx
2) 8byte aligned physical address
It will set the physical address of memory location where the SBI
implementation will write the 64bit SOFTWARE counter. This SBI call
is only for counters not mapped to any CSR (i.e. only for counters
with CSR_Number == 0xfff).

5. SBI_PMU_COUNTER_START
This call takes two parameters:
1) counter_idx
2) initial_value
It will inform SBI implementation to start/enable specified counter
with specified initial value. This SBI call will fail for counters
which are not present.

6. SBI_PMU_COUNTER_STOP
This call takes one parameter:
1) counter_idx
It will inform SBI implementation to stop/disable specified counters
on the calling HART. This SBI call will fail for counters which are
not present.
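
To make the SBI_PMU_COUNTER_GET_CSR return encoding concrete, here is a minimal decoding sketch. The struct and function names are hypothetical. The second helper assumes that a SOFTWARE counter is identified by CSR_Number == 0xfff, as stated for SBI_PMU_COUNTER_GET_CSR, and checks the 8-byte alignment required by SBI_PMU_COUNTER_SET_PHYS_ADDR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SBI_PMU_CSR_SOFTWARE 0xfffu  /* CSR_Number marking a SOFTWARE counter */

struct sbi_pmu_counter_info {
    uint16_t csr_number;   /* return_value[11:0]  */
    uint8_t  csr_width;    /* return_value[19:12] */
    bool     is_software;
};

/* Split the value returned by SBI_PMU_COUNTER_GET_CSR into its fields. */
static inline struct sbi_pmu_counter_info
sbi_pmu_decode_get_csr(unsigned long ret)
{
    struct sbi_pmu_counter_info info;

    info.csr_number = (uint16_t)(ret & 0xfffu);
    info.csr_width = (uint8_t)((ret >> 12) & 0xffu);
    info.is_software = (info.csr_number == SBI_PMU_CSR_SOFTWARE);
    return info;
}

/* True if SBI_PMU_COUNTER_SET_PHYS_ADDR makes sense for this counter/address. */
static inline bool sbi_pmu_phys_addr_ok(const struct sbi_pmu_counter_info *info,
                                        uint64_t phys_addr)
{
    assert(info);
    return info->is_software && (phys_addr & 0x7u) == 0;
}
```

For example, a return value of 0x40c00 would describe a 64-bit counter backed by CSR 0xc00 (CYCLE).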

The M-mode runtime firmware (OpenSBI) Development Notes:

1. The M-mode runtime firmware will have to translate SBI PMU
event_idx and event_info into platform dependent MHPMEVENT CSR
value before starting/enabling a HARDWARE counter.

2. The M-mode runtime firmware (OpenSBI) will need to know following
platform dependent information:
A) Possible event_idx values allowed (or supported) by a HARDWARE
counter (i.e. HPMCOUNTER)
B) Mapping of event_idx for HARDWARE/CACHE event to MHPMEVENT CSR
value. This is optional for platform. By default, OpenSBI will
write a value <xyz> to MHPMEVENT CSR where lower 20bits of <xyz>
are event_idx and upper XLEN-20 bits of <xyz> are lower XLEN-20
bits of event_info
C) Additional platform-specific programming required for selecting
event_idx + event_info combination. This is also optional for
platform.

3. All platform dependent information mentioned above, can be obtained
by M-mode runtime firmware (OpenSBI) from platform specific code.
The DT/ACPI can also be used to describe 2.A and 2.B mentioned above
but 2.C will always require platform specific code.

Linux RISC-V PMU Driver Development Notes:

1. Driver probe
The Linux RISC-V driver can be platform driver with "riscv,sbi-pmu"
as DT compatible string and optional "interrupts" DT property. The
"interrupts" DT property if available should specify an edge-triggered
overflow interrupt for each HART. When "interrupts" DT property is
present, we might also need another DT property for mapping HARTID
to entries in "interrupts" DT property. The platform driver probe
will:
A) Need to ensure that underlying SBI implementation provides
SBI PMU extension using sbi_probe_extension() API of arch/riscv.
B) Detect number of counters using SBI_PMU_NUM_COUNTERS call
C) Get CSR details of each counter using SBI_PMU_COUNTER_GET_CSR
call. If the counter is a SOFTWARE counter then use the
SBI_PMU_COUNTER_SET_PHYS_ADDR call to set memory location
of counter. The driver can skip this in driver probe and instead
do this lazily in the add() callback mentioned below.

2. event_init() callback
The event_init() callback will primarily translate user-space
perf_event_attr to SBI PMU event_idx and event_info. It can do
this in following way:
A) perf_event_attr.type == PERF_TYPE_HARDWARE
event_idx.type = 0x0
event_idx.code = Value from enum sbi_pmu_hw_id based on
perf_event_attr.config
event_info = 0
B) perf_event_attr.type == PERF_TYPE_HW_CACHE
event_idx.type = 0x1
event_idx.code.cache_id = Value from enum sbi_pmu_hw_cache_id
based on perf_event_attr.config
event_idx.code.op_id = Value from enum sbi_pmu_hw_cache_op_id
based on perf_event_attr.config
event_idx.code.result_id = Value from enum sbi_pmu_hw_cache_op_result_id
based on perf_event_attr.config
event_info = 0
C) perf_event_attr.type == PERF_TYPE_RAW and
perf_event_attr.config[63:63] == 0
event_idx.type = 0x2
event_idx.code = 0x0
event_info = perf_event_attr.config[62:0]
D) perf_event_attr.type == PERF_TYPE_RAW and
perf_event_attr.config[63:63] == 1
event_idx.type = 0xf
event_idx.code = Value from enum sbi_pmu_sw_id based on
perf_event_attr.config
event_info = 0
(Note: event_init() will fail if it is not able to figure out
event_idx and event_info value corresponding to perf_event_attr)
(Note: event_init() will not assign counter to perf_event because
it will be done by the add() callback)

3. add() callback
The add() callback of Linux RISC-V PMU driver will find a
free counter on current CPU/HART such that the perf_event
event_idx + event_info combination is supported by the counter.
To check-and-set event_idx + event_info combination for a
counter, we will use the SBI_PMU_COUNTER_SET_EVENT call.
The counter allocation and SBI_PMU_COUNTER_SET_EVENT call
can be further optimized by looking at CSR details.
For example:
A) For event_idx.type == 0 and
event_idx.code == SBI_PMU_HW_CPU_CYCLES, we should
prefer counter mapping to CYCLE CSR and skip doing
SBI_PMU_COUNTER_SET_EVENT call.
B) For event_idx.type == 0 and
event_idx.code == SBI_PMU_HW_INSTRUCTIONS, we should
prefer counter mapping to INSTRET CSR and skip doing
SBI_PMU_COUNTER_SET_EVENT call.
C) For event_idx.type == 0xf, only prefer counters mapping
to 0xfff CSR (i.e. SOFTWARE counters).

4. del() callback
The del() callback of Linux RISC-V PMU driver will release
or free the counter.

5. start() callback
The start() callback of Linux RISC-V PMU driver will start
the counter using the SBI_PMU_COUNTER_START call.

6. stop() callback
The stop() callback of Linux RISC-V PMU driver will stop
the counter using the SBI_PMU_COUNTER_STOP call.
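
The counter allocation in the add() callback, including the retry on SBI_PMU_COUNTER_SET_EVENT failure, might look roughly like the sketch below. Everything here is illustrative: struct pmu_counter, pmu_alloc_counter() and the set_event_fn callback are hypothetical, and mock_set_event() merely stands in for the real SBI ecall so the logic can be exercised on its own. The CYCLE and INSTRET CSR numbers (0xc00, 0xc02) are the standard RISC-V ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CSR_CYCLE    0xc00u
#define CSR_INSTRET  0xc02u
#define CSR_SOFTWARE 0xfffu  /* marker used by SBI_PMU_COUNTER_GET_CSR */

struct pmu_counter {
    uint16_t csr_number;     /* cached from SBI_PMU_COUNTER_GET_CSR */
    bool     in_use;
};

/* Stand-in for the SBI_PMU_COUNTER_SET_EVENT ecall: returns 0 on success. */
typedef int (*set_event_fn)(unsigned int counter_idx, uint32_t event_idx,
                            uint64_t event_info);

/* Mock: counter 3 accepts any hardware event, counter 4 any software event. */
static inline int mock_set_event(unsigned int counter_idx, uint32_t event_idx,
                                 uint64_t event_info)
{
    bool sw_event = ((event_idx >> 16) & 0xfu) == 0xfu;

    (void)event_info;
    return (counter_idx == (sw_event ? 4u : 3u)) ? 0 : -1;
}

/* Returns the allocated counter_idx, or -1 if no free counter fits. */
static inline int pmu_alloc_counter(struct pmu_counter *ctrs, unsigned int n,
                                    uint32_t event_idx, uint64_t event_info,
                                    set_event_fn set_event)
{
    uint32_t type = (event_idx >> 16) & 0xfu;
    uint32_t code = event_idx & 0xffffu;
    uint32_t fixed_csr = 0;
    unsigned int i;

    assert(ctrs && set_event);

    if (type == 0 && code == 0)        /* SBI_PMU_HW_CPU_CYCLES */
        fixed_csr = CSR_CYCLE;
    else if (type == 0 && code == 1)   /* SBI_PMU_HW_INSTRUCTIONS */
        fixed_csr = CSR_INSTRET;

    /* Prefer the fixed counter; no SET_EVENT call is needed for it. */
    for (i = 0; fixed_csr && i < n; i++) {
        if (!ctrs[i].in_use && ctrs[i].csr_number == fixed_csr) {
            ctrs[i].in_use = true;
            return (int)i;
        }
    }

    /* Otherwise try each free counter until SET_EVENT accepts the event. */
    for (i = 0; i < n; i++) {
        if (ctrs[i].in_use)
            continue;
        /* SOFTWARE events only go to SOFTWARE counters. */
        if (type == 0xfu && ctrs[i].csr_number != CSR_SOFTWARE)
            continue;
        if (set_event(i, event_idx, event_info) == 0) {
            ctrs[i].in_use = true;
            return (int)i;
        }
    }
    return -1;
}
```

The del() callback would then simply clear in_use for the released counter.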

Regards,
Anup


Re: Proposal v3: SBI PMU Extension

Anup Patel
 

-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 14 July 2020 09:02
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Atish Patra
<Atish.Patra@...>; andrew@...; gfavor@...
Subject: Re: Proposal v3: SBI PMU Extension

On Mon, Jul 13, 2020 at 9:12 PM Anup Patel <Anup.Patel@...> wrote:

Hi All,

We don't have a dedicated RISC-V PMU extension but we do have HARDWARE
performance counters such as CYCLE CSR, INSTRET CSR, and HPMCOUNTER
CSRs. A RISC-V implementation can support monitoring various HARDWARE
events using limited number of HPMCOUNTER CSRs.

In addition to HARDWARE performance counters, a SBI implementation
(e.g. OpenSBI, Xvisor, KVM, etc) can provide SOFTWARE counters for
events such as number of RFENCEs, number of IPIs, number of misaligned
load/store instructions, number of illegal instructions, etc.

We propose SBI PMU extension, which will help S-mode (or VS-mode)
software to discover and configure HARDWARE/SOFTWARE counters. The SBI
PMU extension will only manage per-HART (or per-CPU) HARDWARE/SOFTWARE
counters which include CYCLE CSR, INSTRET CSR, HPMCOUNTER CSRs and
SOFTWARE counters provided by SBI implementation.

Using SBI PMU extension, a SBI implementation (OpenSBI, KVM, or Xvisor)
will provide a standardized view of HARDWARE/SOFTWARE counters and
events to S-mode (or VS-mode) software.

To define SBI PMU extension, we first define counter_idx which is a
logical number assigned to a counter and event_idx which is an encoded
number representing the HARDWARE/SOFTWARE event to be monitored. A
HARDWARE/SOFTWARE event can also have additional configuration/details
referred to as event_info.

The SBI PMU event_idx is a 20bits wide number encoded as follows:
event_idx[19:16] = type
event_idx[15:0] = code

If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE
event, the event_info is not required whereas the event_idx.code can
be one of the following values:
enum sbi_pmu_hw_id {
SBI_PMU_HW_CPU_CYCLES = 0,
SBI_PMU_HW_INSTRUCTIONS = 1,
SBI_PMU_HW_CACHE_REFERENCES = 2,
SBI_PMU_HW_CACHE_MISSES = 3,
SBI_PMU_HW_BRANCH_INSTRUCTIONS = 4,
SBI_PMU_HW_BRANCH_MISSES = 5,
SBI_PMU_HW_BUS_CYCLES = 6,
SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
SBI_PMU_HW_STALLED_CYCLES_BACKEND = 8,
SBI_PMU_HW_REF_CPU_CYCLES = 9,
SBI_PMU_HW_MAX, /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE
CACHE event, the event_info is not required whereas the event_idx.code
is encoded as follows:
event_idx.code[15:3] = cache_id
event_idx.code[2:1] = op_id
event_idx.code[0:0] = result_id
enum sbi_pmu_hw_cache_id {
SBI_PMU_HW_CACHE_L1D = 0,
SBI_PMU_HW_CACHE_L1I = 1,
SBI_PMU_HW_CACHE_LL = 2,
SBI_PMU_HW_CACHE_DTLB = 3,
SBI_PMU_HW_CACHE_ITLB = 4,
SBI_PMU_HW_CACHE_BPU = 5,
SBI_PMU_HW_CACHE_NODE = 6,
SBI_PMU_HW_CACHE_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_id {
SBI_PMU_HW_CACHE_OP_READ = 0,
SBI_PMU_HW_CACHE_OP_WRITE = 1,
SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
SBI_PMU_HW_CACHE_OP_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_result_id {
SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
SBI_PMU_HW_CACHE_RESULT_MISS = 1,
SBI_PMU_HW_CACHE_RESULT_MAX, /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE
RAW event, the event_idx.code should be zero and the event_info
parameter passed to SBI_PMU_COUNTER_SET_EVENT call (described below)
will have the RAW event value to be programmed in MHPMEVENT CSR (i.e.
the SBI implementation will not derive MHPMEVENT CSR value from
event_idx + event_info).

If event_idx.type == 0xf then it is SOFTWARE event. For SOFTWARE
event, the event_info is not required whereas the event_idx.code can
be one of the following:
enum sbi_pmu_sw_id {
SBI_PMU_SW_MISALIGNED_LOAD = 0,
SBI_PMU_SW_MISALIGNED_STORE = 1,
SBI_PMU_SW_ILLEGAL_INSN = 2,
SBI_PMU_SW_LOCAL_SET_TIMER = 3,
SBI_PMU_SW_LOCAL_IPI = 4,
SBI_PMU_SW_LOCAL_FENCE_I = 5,
SBI_PMU_SW_LOCAL_SFENCE_VMA = 6,
SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID = 7,
SBI_PMU_SW_LOCAL_HFENCE_GVMA = 8,
SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
SBI_PMU_SW_LOCAL_HFENCE_VVMA = 10,
SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
SBI_PMU_SW_MAX, /* non-ABI */
};

In future, more events can be defined without breaking compatibility
of SBI calls.

Using definition of counter_idx and event_idx, we can potentially have
the following SBI calls:

1. SBI_PMU_NUM_COUNTERS
This call will return the number of COUNTERs

2. SBI_PMU_COUNTER_GET_CSR
This call takes one parameter:
1) counter_idx
It will provide the CSR_Number and CSR_Width of underlying counter.
The value returned by SBI call is encoded as follows:
return_value[11:0] = CSR_Number
return_value[19:12] = CSR_Width (Number of bits implemented in HW)
If CSR_Number == 0xfff then it is SOFTWARE counter otherwise it is
HARDWARE counter. This SBI call will fail for counters which are not
present.

3. SBI_PMU_COUNTER_SET_EVENT
This call takes three parameters:
1) counter_idx
2) event_idx
3) event_info
It will select an event to be monitored by given counter. If this
SBI call is not used for a counter to select an event then the
counter will monitor default event selected for it at boot-time.
This SBI call will fail for counters which are not present. It will
also fail if specified event_idx + event_info combination is not
supported by given counter.

It also seems to fail if the specified event is not supported by the given
counter, right? Then the Linux driver could try to allocate the next free
counter when this SBI call returns failure.

Yes, this call will fail if event_idx + event_info combination is not supported
by given counter_idx. It is expected that Linux driver will try another
free counter if SBI_PMU_COUNTER_SET_EVENT call fails. I have suggested
a few ideas on how to reduce SBI_PMU_COUNTER_SET_EVENT calls by
looking at CSR number assigned to counter.


Apart from this question above, this version of the proposal is great to me.

Cool 😊

Regards,
Anup


Thanks,
Zong


4. SBI_PMU_COUNTER_SET_PHYS_ADDR
This call takes two parameters:
1) counter_idx
2) 8byte aligned physical address
It will set the physical address of memory location where the SBI
implementation will write the 64bit SOFTWARE counter. This SBI call
is only for counters not mapped to any CSR (i.e. only for counters
with CSR_Number == 0xfff).

5. SBI_PMU_COUNTER_START
This call takes two parameters:
1) counter_idx
2) initial_value
It will inform SBI implementation to start/enable specified counter
with specified initial value. This SBI call will fail for counters
which are not present.

6. SBI_PMU_COUNTER_STOP
This call takes one parameter:
1) counter_idx
It will inform SBI implementation to stop/disable specified counters
on the calling HART. This SBI call will fail for counters which are
not present.

The M-mode runtime firmware (OpenSBI) Development Notes:

1. The M-mode runtime firmware will have to translate SBI PMU
event_idx and event_info into platform dependent MHPMEVENT CSR
value before starting/enabling a HARDWARE counter.

2. The M-mode runtime firmware (OpenSBI) will need to know following
platform dependent information:
A) Possible event_idx values allowed (or supported) by a HARDWARE
counter (i.e. HPMCOUNTER)
B) Mapping of event_idx for HARDWARE/CACHE event to MHPMEVENT CSR
value. This is optional for platform. By default, OpenSBI will
write a value <xyz> to MHPMEVENT CSR where lower 20bits of <xyz>
are event_idx and upper XLEN-20 bits of <xyz> are lower XLEN-20
bits of event_info
C) Additional platform-specific programming required for selecting
event_idx + event_info combination. This is also optional for
platform.

3. All platform dependent information mentioned above, can be obtained
by M-mode runtime firmware (OpenSBI) from platform specific code.
The DT/ACPI can also be used to describe 2.A and 2.B mentioned above
but 2.C will always require platform specific code.

Linux RISC-V PMU Driver Development Notes:

1. Driver probe
The Linux RISC-V driver can be platform driver with "riscv,sbi-pmu"
as DT compatible string and optional "interrupts" DT property. The
"interrupts" DT property if available should specify an edge-triggered
overflow interrupt for each HART. When "interrupts" DT property is
present, we might also need another DT property for mapping HARTID
to entries in "interrupts" DT property. The platform driver probe
will:
A) Need to ensure that underlying SBI implementation provides
SBI PMU extension using sbi_probe_extension() API of arch/riscv.
B) Detect number of counters using SBI_PMU_NUM_COUNTERS call
C) Get CSR details of each counter using SBI_PMU_COUNTER_GET_CSR
call. If the counter is a SOFTWARE counter then use the
SBI_PMU_COUNTER_SET_PHYS_ADDR call to set memory location
of counter. The driver can skip this in driver probe and instead
do this lazily in the add() callback mentioned below.

2. event_init() callback
The event_init() callback will primarily translate user-space
perf_event_attr to SBI PMU event_idx and event_info. It can do
this in following way:
A) perf_event_attr.type == PERF_TYPE_HARDWARE
event_idx.type = 0x0
event_idx.code = Value from enum sbi_pmu_hw_id based on
perf_event_attr.config
event_info = 0
B) perf_event_attr.type == PERF_TYPE_HW_CACHE
event_idx.type = 0x1
event_idx.code.cache_id = Value from enum sbi_pmu_hw_cache_id
based on perf_event_attr.config
event_idx.code.op_id = Value from enum sbi_pmu_hw_cache_op_id
based on perf_event_attr.config
event_idx.code.result_id = Value from enum sbi_pmu_hw_cache_op_result_id
based on perf_event_attr.config
event_info = 0
C) perf_event_attr.type == PERF_TYPE_RAW and
perf_event_attr.config[63:63] == 0
event_idx.type = 0x2
event_idx.code = 0x0
event_info = perf_event_attr.config[62:0]
D) perf_event_attr.type == PERF_TYPE_RAW and
perf_event_attr.config[63:63] == 1
event_idx.type = 0xf
event_idx.code = Value from enum sbi_pmu_sw_id based on
perf_event_attr.config
event_info = 0
(Note: event_init() will fail if it is not able to figure out
event_idx and event_info value corresponding to perf_event_attr)
(Note: event_init() will not assign counter to perf_event because
it will be done by the add() callback)

3. add() callback
The add() callback of Linux RISC-V PMU driver will find a
free counter on current CPU/HART such that the perf_event
event_idx + event_info combination is supported by the counter.
To check-and-set event_idx + event_info combination for a
counter, we will use the SBI_PMU_COUNTER_SET_EVENT call.
The counter allocation and SBI_PMU_COUNTER_SET_EVENT call
can be further optimized by looking at CSR details.
For example:
A) For event_idx.type == 0 and
event_idx.code == SBI_PMU_HW_CPU_CYCLES, we should
prefer counter mapping to CYCLE CSR and skip doing
SBI_PMU_COUNTER_SET_EVENT call.
B) For event_idx.type == 0 and
event_idx.code == SBI_PMU_HW_INSTRUCTIONS, we should
prefer counter mapping to INSTRET CSR and skip doing
SBI_PMU_COUNTER_SET_EVENT call.
C) For event_idx.type == 0xf, only prefer counters mapping
to 0xfff CSR (i.e. SOFTWARE counters).

4. del() callback
The del() callback of Linux RISC-V PMU driver will release
or free the counter.

5. start() callback
The start() callback of Linux RISC-V PMU driver will start
the counter using the SBI_PMU_COUNTER_START call.

6. stop() callback
The stop() callback of Linux RISC-V PMU driver will stop
the counter using the SBI_PMU_COUNTER_STOP call.

Regards,
Anup


Re: Proposal v3: SBI PMU Extension

Brian Grayson
 

Should there also be a way to atomically specify start/stop for a set of counters, or is the latency of N SBI start/stop calls short enough that starting or stopping N counters will not take that long? For a lot of cores today, N is very small, like 2 for some cores, but as RISC-V cores continue to grow in capability, N could easily become 4 to 8 for the core, another set in the L2, another set in the L3, etc.

Brian

On Mon, Jul 13, 2020 at 10:41 PM Anup Patel <anup.patel@...> wrote:


> -----Original Message-----
> From: Zong Li <zong.li@...>
> Sent: 14 July 2020 09:02
> To: Anup Patel <Anup.Patel@...>
> Cc: tech-unixplatformspec@...; Atish Patra
> <Atish.Patra@...>; andrew@...; gfavor@...
> Subject: Re: Proposal v3: SBI PMU Extension
>
> On Mon, Jul 13, 2020 at 9:12 PM Anup Patel <Anup.Patel@...> wrote:
> >
> > Hi All,
> >
> > We don't have a dedicated RISC-V PMU extension but we do have HARDWARE
> > performance counters such as CYCLE CSR, INSTRET CSR, and HPMCOUNTER
> > CSRs. A RISC-V implementation can support monitoring various HARDWARE
> > events using limited number of HPMCOUNTER CSRs.
> >
> > In addition to HARDWARE performance counters, a SBI implementation
> > (e.g. OpenSBI, Xvisor, KVM, etc) can provide SOFTWARE counters for
> > events such as number of RFENCEs, number of IPIs, number of misaligned
> > load/store instructions, number of illegal instructions, etc.
> >
> > We propose SBI PMU extension, which will help S-mode (or VS-mode)
> > software to discover and configure HARDWARE/SOFTWARE counters. The SBI
> > PMU extension will only manage per-HART (or per-CPU) HARDWARE/SOFTWARE
> > counters which include CYCLE CSR, INSTRET CSR, HPMCOUNTER CSRs and
> > SOFTWARE counters provided by SBI implementation.
> >
> > Using SBI PMU extension, a SBI implementation (OpenSBI, KVM, or Xvisor)
> > will provide a standardized view of HARDWARE/SOFTWARE counters and
> > events to S-mode (or VS-mode) software.
> >
> > To define SBI PMU extension, we first define counter_idx which is a
> > logical number assigned to a counter and event_idx which is an encoded
> > number representing the HARDWARE/SOFTWARE event to be monitored. A
> > HARDWARE/SOFTWARE event can also have additional configuration/details
> > referred to as event_info.
> >
> > The SBI PMU event_idx is a 20bits wide number encoded as follows:
> > event_idx[19:16] = type
> > event_idx[15:0] = code
> >
> > If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE
> > event, the event_info is not required whereas the event_idx.code can
> > be one of the following values:
> > enum sbi_pmu_hw_id {
> >     SBI_PMU_HW_CPU_CYCLES              = 0,
> >     SBI_PMU_HW_INSTRUCTIONS            = 1,
> >     SBI_PMU_HW_CACHE_REFERENCES        = 2,
> >     SBI_PMU_HW_CACHE_MISSES            = 3,
> >     SBI_PMU_HW_BRANCH_INSTRUCTIONS     = 4,
> >     SBI_PMU_HW_BRANCH_MISSES           = 5,
> >     SBI_PMU_HW_BUS_CYCLES              = 6,
> >     SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
> >     SBI_PMU_HW_STALLED_CYCLES_BACKEND  = 8,
> >     SBI_PMU_HW_REF_CPU_CYCLES          = 9,
> >     SBI_PMU_HW_MAX,                    /* non-ABI */
> > };
> > (NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
> >
> > If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE
> > CACHE event, the event_info is not required whereas the event_idx.code
> > is encoded as follows:
> > event_idx.code[15:3] = cache_id
> > event_idx.code[2:1] = op_id
> > event_idx.code[0:0] = result_id
> > enum sbi_pmu_hw_cache_id {
> >     SBI_PMU_HW_CACHE_L1D  = 0,
> >     SBI_PMU_HW_CACHE_L1I  = 1,
> >     SBI_PMU_HW_CACHE_LL   = 2,
> >     SBI_PMU_HW_CACHE_DTLB = 3,
> >     SBI_PMU_HW_CACHE_ITLB = 4,
> >     SBI_PMU_HW_CACHE_BPU  = 5,
> >     SBI_PMU_HW_CACHE_NODE = 6,
> >     SBI_PMU_HW_CACHE_MAX, /* non-ABI */
> > };
> > enum sbi_pmu_hw_cache_op_id {
> >     SBI_PMU_HW_CACHE_OP_READ     = 0,
> >     SBI_PMU_HW_CACHE_OP_WRITE    = 1,
> >     SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
> >     SBI_PMU_HW_CACHE_OP_MAX,     /* non-ABI */
> > };
> > enum sbi_pmu_hw_cache_op_result_id {
> >     SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
> >     SBI_PMU_HW_CACHE_RESULT_MISS   = 1,
> >     SBI_PMU_HW_CACHE_RESULT_MAX,   /* non-ABI */
> > };
> > (NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
> >
> > If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE
> > RAW event, the event_idx.code should be zero and the event_info
> > parameter passed to SBI_PMU_COUNTER_SET_EVENT call (described below)
> > will have the RAW event value to be programmed in MHPMEVENT CSR (i.e.
> > the SBI implementation will not derive MHPMEVENT CSR value from
> > event_idx + event_info).
> >
> > If event_idx.type == 0xf then it is SOFTWARE event. For SOFTWARE
> > event, the event_info is not required whereas the event_idx.code can
> > be one of the following:
> > enum sbi_pmu_sw_id {
> >     SBI_PMU_SW_MISALIGNED_LOAD        = 0,
> >     SBI_PMU_SW_MISALIGNED_STORE       = 1,
> >     SBI_PMU_SW_ILLEGAL_INSN           = 2,
> >     SBI_PMU_SW_LOCAL_SET_TIMER        = 3,
> >     SBI_PMU_SW_LOCAL_IPI              = 4,
> >     SBI_PMU_SW_LOCAL_FENCE_I          = 5,
> >     SBI_PMU_SW_LOCAL_SFENCE_VMA       = 6,
> >     SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID  = 7,
> >     SBI_PMU_SW_LOCAL_HFENCE_GVMA      = 8,
> >     SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
> >     SBI_PMU_SW_LOCAL_HFENCE_VVMA      = 10,
> >     SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
> >     SBI_PMU_SW_MAX,                   /* non-ABI */
> > };
> >
> > In future, more events can be defined without breaking compatibility
> > of SBI calls.
> >
> > Using definition of counter_idx and event_idx, we can potentially have
> > the following SBI calls:
> >
> > 1. SBI_PMU_NUM_COUNTERS
> >    This call will return the number of COUNTERs
> >
> > 2. SBI_PMU_COUNTER_GET_CSR
> >    This call takes one parameter:
> >       1) counter_idx
> >    It will provide the CSR_Number and CSR_Width of underlying counter.
> >    The value returned by SBI call is encoded as follows:
> >       return_value[11:0] = CSR_Number
> >       return_value[19:12] = CSR_Width (Number of bits implemented in HW)
> >    If CSR_Number == 0xfff then it is SOFTWARE counter otherwise it is
> >    HARDWARE counter. This SBI call will fail for counters which are not
> >    present.
> >
> > 3. SBI_PMU_COUNTER_SET_EVENT
> >    This call takes three parameters:
> >       1) counter_idx
> >       2) event_idx
> >       3) event_info
> >    It will select an event to be monitored by given counter. If this
> >    SBI call is not used for a counter to select an event then the
> >    counter will monitor default event selected for it at boot-time.
> >    This SBI call will fail for counters which are not present. It will
> >    also fail if specified event_idx + event_info combination is not
> >    supported by given counter.
>
> It also seems to fail if the specified event is not supported by the given
> counter, right? Then the Linux driver could try to allocate the next free
> counter when this SBI call returns failure.

Yes, this call will fail if the event_idx + event_info combination is not
supported by the given counter_idx. The Linux driver is expected to try
another free counter if the SBI_PMU_COUNTER_SET_EVENT call fails. I have
suggested a few ideas on how to reduce SBI_PMU_COUNTER_SET_EVENT calls by
looking at CSR number assigned to counter.
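
As a rough illustration of the fallback described above, here is a minimal
C sketch of a driver-side loop that tries each free counter until
SBI_PMU_COUNTER_SET_EVENT succeeds. The names mock_sbi_pmu_counter_set_event(),
alloc_counter(), NUM_COUNTERS and the used_mask bitmap are hypothetical and
not part of the proposal; the mock only stands in for the real SBI call.

```c
#define NUM_COUNTERS 4UL

/*
 * Hypothetical stand-in for the SBI_PMU_COUNTER_SET_EVENT call; the real
 * call would be an ecall into the SBI implementation. Returns 0 on success
 * and a negative value on failure. In this mock only counter 2 accepts
 * event_idx 0x00003, and counters >= NUM_COUNTERS are "not present".
 */
static int mock_sbi_pmu_counter_set_event(unsigned long counter_idx,
                                          unsigned long event_idx,
                                          unsigned long event_info)
{
    (void)event_info;
    if (counter_idx >= NUM_COUNTERS)
        return -1;
    return (counter_idx == 2 && event_idx == 0x00003UL) ? 0 : -1;
}

/*
 * Walk the free counters and keep the first one for which the SET_EVENT
 * call succeeds. Bit i of *used_mask is set when counter i is in use.
 * Returns the allocated counter_idx, or -1 if no counter accepts the event.
 */
static int alloc_counter(unsigned long *used_mask,
                         unsigned long event_idx,
                         unsigned long event_info)
{
    for (unsigned long i = 0; i < NUM_COUNTERS; i++) {
        if (*used_mask & (1UL << i))
            continue; /* already allocated to another perf_event */
        if (mock_sbi_pmu_counter_set_event(i, event_idx, event_info) == 0) {
            *used_mask |= 1UL << i;
            return (int)i;
        }
    }
    return -1;
}
```

In the worst case this costs one SBI call per free counter, which is why
narrowing the candidates first is attractive.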

>
> Apart from the question above, this version of the proposal looks great to me.

Cool 😊

Regards,
Anup

>
> Thanks,
> Zong
>
> >
> > 4. SBI_PMU_COUNTER_SET_PHYS_ADDR
> >    This call takes two parameters:
> >       1) counter_idx
> >       2) 8byte aligned physical address
> >    It will set the physical address of memory location where the SBI
> >    implementation will write the 64bit SOFTWARE counter. This SBI call
> >    is only for counters not mapped to any CSR (i.e. only for counters
> >    with CSR_Number == 0xfff).
> >
> > 5. SBI_PMU_COUNTER_START
> >    This call takes two parameters:
> >       1) counter_idx
> >       2) initial_value
> >    It will inform SBI implementation to start/enable specified counter
> >    with specified initial value. This SBI call will fail for counters
> >    which are not present.
> >
> > 6. SBI_PMU_COUNTER_STOP
> >    This call takes one parameter:
> >       1) counter_idx
> >    It will inform SBI implementation to stop/disable specified counter
> >    on the calling HART. This SBI call will fail for counters which are
> >    not present.
> >
> > The M-mode runtime firmware (OpenSBI) Development Notes:
> >
> > 1. The M-mode runtime firmware will have to translate SBI PMU
> >    event_idx and event_info into platform dependent MHPMEVENT CSR
> >    value before starting/enabling a HARDWARE counter.
> >
> > 2. The M-mode runtime firmware (OpenSBI) will need to know following
> >    platform dependent information:
> >    A) Possible event_idx values allowed (or supported) by a HARDWARE
> >       counter (i.e. HPMCOUNTER)
> >    B) Mapping of event_idx for HARDWARE/CACHE event to MHPMEVENT CSR
> >       value. This is optional for platform. By default, OpenSBI will
> >       write a value <xyz> to MHPMEVENT CSR where lower 20bits of <xyz>
> >       are event_idx and upper XLEN-20 bits of <xyz> are lower XLEN-20
> >       bits of event_info
> >    C) Additional platform-specific programming required for selecting
> >       event_idx + event_info combination. This is also optional for
> >       platform.
> >
> > 3. All platform dependent information mentioned above can be obtained
> >    by M-mode runtime firmware (OpenSBI) from platform specific code.
> >    The DT/ACPI can also be used to describe 2.A and 2.B mentioned above
> >    but 2.C will always require platform specific code.
> >
> > Linux RISC-V PMU Driver Development Notes:
> >
> > 1. Driver probe
> >    The Linux RISC-V driver can be platform driver with "riscv,sbi-pmu"
> >    as DT compatible string and optional "interrupts" DT property. The
> >    "interrupts" DT property if available should specify an edge-triggered
> >    overflow interrupt for each HART. When "interrupts" DT property is
> >    present, we might also need another DT property for mapping HARTID
> >    to entries in "interrupts" DT property. The platform driver probe
> >    will:
> >    A) Need to ensure that underlying SBI implementation provides
> >       SBI PMU extension using sbi_probe_extension() API of arch/riscv.
> >    B) Detect number of counters using SBI_PMU_NUM_COUNTERS call
> >    C) Get CSR details of each counter using SBI_PMU_COUNTER_GET_CSR
> >       call. If the counter is a SOFTWARE counter then use the
> >       SBI_PMU_COUNTER_SET_PHYS_ADDR call to set memory location
> >       of counter. The driver can skip this in driver probe and instead
> >       do this lazily in add() callback mentioned below.
> >
> > 2. event_init() callback
> >    The event_init() callback will primarily translate user-space
> >    perf_event_attr to SBI PMU event_idx and event_info. It can do
> >    this in following way:
> >    A) perf_event_attr.type == PERF_TYPE_HARDWARE
> >       event_idx.type = 0x0
> >       event_idx.code = Value from enum sbi_pmu_hw_id based on
> >                            perf_event_attr.config
> >       event_info = 0
> >    B) perf_event_attr.type == PERF_TYPE_HW_CACHE
> >       event_idx.type = 0x1
> >       event_idx.code.cache_id = Value from enum sbi_pmu_hw_cache_id
> >                                 based on perf_event_attr.config
> >       event_idx.code.op_id = Value from enum sbi_pmu_hw_cache_op_id
> >                              based on perf_event_attr.config
> >       event_idx.code.result_id = Value from enum sbi_pmu_hw_cache_op_result_id
> >                                  based on perf_event_attr.config
> >       event_info = 0
> >    C) perf_event_attr.type == PERF_TYPE_RAW and
> >       perf_event_attr.config[63:63] == 0
> >       event_idx.type = 0x2
> >       event_idx.code = 0x0
> >       event_info = perf_event_attr.config[62:0]
> >    D) perf_event_attr.type == PERF_TYPE_RAW and
> >       perf_event_attr.config[63:63] == 1
> >       event_idx.type = 0xf
> >       event_idx.code = Value from enum sbi_pmu_sw_id based on
> >                        perf_event_attr.config
> >       event_info = 0
> >    (Note: event_init() will fail if it is not able to figure out
> >     event_idx and event_info value corresponding to perf_event_attr)
> >    (Note: event_init() will not assign counter to perf_event because
> >     it will be done by add())
> >
> > 3. add() callback
> >    The add() callback of Linux RISC-V PMU driver will find a
> >    free counter on current CPU/HART such that the perf_event
> >    event_idx + event_info combination is supported by the counter.
> >    To check-and-set event_idx + event_info combination for a
> >    counter, we will use the SBI_PMU_COUNTER_SET_EVENT call.
> >    The counter allocation and SBI_PMU_COUNTER_SET_EVENT call
> >    can be further optimized by looking at CSR details.
> >    For example:
> >    A) For event_idx.type == 0 and
> >       event_idx.code == SBI_PMU_HW_CPU_CYCLES, we should
> >       prefer counter mapping to CYCLE CSR and skip doing
> >       SBI_PMU_COUNTER_SET_EVENT call.
> >    B) For event_idx.type == 0 and
> >       event_idx.code == SBI_PMU_HW_INSTRUCTIONS, we should
> >       prefer counter mapping to INSTRET CSR and skip doing
> >       SBI_PMU_COUNTER_SET_EVENT call.
> >    C) For event_idx.type == 0xf, only consider counters mapping
> >       to 0xfff CSR (i.e. SOFTWARE counters).
> >
> > 4. del() callback
> >    The del() callback of Linux RISC-V PMU driver will release
> >    or free the counter.
> >
> > 5. start() callback
> >    The start() callback of Linux RISC-V PMU driver will start
> >    the counter using the SBI_PMU_COUNTER_START call.
> >
> > 6. stop() callback
> >    The stop() callback of Linux RISC-V PMU driver will stop
> >    the counter using the SBI_PMU_COUNTER_STOP call.
> >
> > Regards,
> > Anup
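
For reference, the return-value layout of SBI_PMU_COUNTER_GET_CSR quoted
above (CSR_Number in bits [11:0], CSR_Width in bits [19:12], and 0xfff
marking a SOFTWARE counter) could be unpacked roughly as follows. This is
only a sketch; struct counter_csr and decode_get_csr() are hypothetical
names used for illustration and do not appear in the proposal.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields. */
struct counter_csr {
    uint16_t csr_number;  /* return_value[11:0] */
    uint8_t  csr_width;   /* return_value[19:12], bits implemented in HW */
    int      is_software; /* non-zero when CSR_Number == 0xfff */
};

/* Unpack the value returned by SBI_PMU_COUNTER_GET_CSR. */
static struct counter_csr decode_get_csr(unsigned long return_value)
{
    struct counter_csr c;

    c.csr_number  = (uint16_t)(return_value & 0xFFF);
    c.csr_width   = (uint8_t)((return_value >> 12) & 0xFF);
    c.is_software = (c.csr_number == 0xFFF);
    return c;
}
```

For example, a return value of 0x40C00 would decode to CSR number 0xC00
(the CYCLE CSR) with a width of 64 bits.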




Re: Proposal v3: SBI PMU Extension

Anup Patel
 

One SBI call to start/stop N counters will certainly be faster than N SBI calls.

 

We did not include SBI calls to start/stop a set of counters because Linux perf drivers only require a mechanism to start/stop one counter.

 

Regards,

Anup

 

From: Brian Grayson <brian.grayson@...>
Sent: 14 July 2020 18:58
To: Anup Patel <Anup.Patel@...>
Cc: Zong Li <zong.li@...>; tech-unixplatformspec@...; Atish Patra <Atish.Patra@...>; andrew@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v3: SBI PMU Extension

 

Should there also be a way to atomically specify start/stop for a set of counters, or is the latency of N SBI start/stop calls short enough that starting or stopping N counters will not take that long? For a lot of cores today, N is very small, like 2 for some cores, but as RISC-V cores continue to grow in capability, N could easily become 4 to 8 for the core, another set in the L2, another set in the L3, etc.

 

Brian

 

On Mon, Jul 13, 2020 at 10:41 PM Anup Patel <anup.patel@...> wrote:



> -----Original Message-----
> From: Zong Li <zong.li@...>
> Sent: 14 July 2020 09:02
> To: Anup Patel <Anup.Patel@...>
> Cc: tech-unixplatformspec@...; Atish Patra
> <Atish.Patra@...>; andrew@...; gfavor@...
> Subject: Re: Proposal v3: SBI PMU Extension
>
> On Mon, Jul 13, 2020 at 9:12 PM Anup Patel <Anup.Patel@...> wrote:
> >
> > Hi All,
> >
> > We don't have a dedicated RISC-V PMU extension but we do have HARDWARE
> > performance counters such as CYCLE CSR, INSTRET CSR, and HPMCOUNTER
> > CSRs. A RISC-V implementation can support monitoring various HARDWARE
> > events using a limited number of HPMCOUNTER CSRs.
> >
> > In addition to HARDWARE performance counters, a SBI implementation
> > (e.g. OpenSBI, Xvisor, KVM, etc) can provide SOFTWARE counters for
> > events such as number of RFENCEs, number of IPIs, number of misaligned
> > load/store instructions, number of illegal instructions, etc.
> >
> > We propose SBI PMU extension, which will help S-mode (or VS-mode)
> > software to discover and configure HARDWARE/SOFTWARE counters. The SBI
> > PMU extension will only manage per-HART (or per-CPU) HARDWARE/SOFTWARE
> > counters which include CYCLE CSR, INSTRET CSR, HPMCOUNTER CSRs and
> > SOFTWARE counters provided by SBI implementation.
> >
> > Using SBI PMU extension, a SBI implementation (OpenSBI, KVM, or
> > Xvisor) will provide a standardized view of HARDWARE/SOFTWARE counters
> > and events to S-mode (or VS-mode) software.
> >
> > To define SBI PMU extension, we first define counter_idx which is a
> > logical number assigned to a counter and event_idx which is an encoded
> > number representing the HARDWARE/SOFTWARE event to be monitored. A
> > HARDWARE/SOFTWARE event can also have additional configuration/details
> > referred to as event_info.
> >
> > The SBI PMU event_idx is a 20-bit wide number encoded as follows:
> > event_idx[19:16] = type
> > event_idx[15:0] = code
> >
> > If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE
> > event, the event_info is not required whereas the event_idx.code can
> > be one of the following values:
> > enum sbi_pmu_hw_id {
> >     SBI_PMU_HW_CPU_CYCLES              = 0,
> >     SBI_PMU_HW_INSTRUCTIONS            = 1,
> >     SBI_PMU_HW_CACHE_REFERENCES        = 2,
> >     SBI_PMU_HW_CACHE_MISSES            = 3,
> >     SBI_PMU_HW_BRANCH_INSTRUCTIONS     = 4,
> >     SBI_PMU_HW_BRANCH_MISSES           = 5,
> >     SBI_PMU_HW_BUS_CYCLES              = 6,
> >     SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
> >     SBI_PMU_HW_STALLED_CYCLES_BACKEND  = 8,
> >     SBI_PMU_HW_REF_CPU_CYCLES          = 9,
> >     SBI_PMU_HW_MAX,                    /* non-ABI */
> > };
> > (NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
> >
> > If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE
> > CACHE event, the event_info is not required whereas the event_idx.code
> > is encoded as follows:
> > event_idx.code[15:3] = cache_id
> > event_idx.code[2:1] = op_id
> > event_idx.code[0:0] = result_id
> > enum sbi_pmu_hw_cache_id {
> >     SBI_PMU_HW_CACHE_L1D  = 0,
> >     SBI_PMU_HW_CACHE_L1I  = 1,
> >     SBI_PMU_HW_CACHE_LL   = 2,
> >     SBI_PMU_HW_CACHE_DTLB = 3,
> >     SBI_PMU_HW_CACHE_ITLB = 4,
> >     SBI_PMU_HW_CACHE_BPU  = 5,
> >     SBI_PMU_HW_CACHE_NODE = 6,
> >     SBI_PMU_HW_CACHE_MAX, /* non-ABI */
> > };
> > enum sbi_pmu_hw_cache_op_id {
> >     SBI_PMU_HW_CACHE_OP_READ     = 0,
> >     SBI_PMU_HW_CACHE_OP_WRITE    = 1,
> >     SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
> >     SBI_PMU_HW_CACHE_OP_MAX,     /* non-ABI */
> > };
> > enum sbi_pmu_hw_cache_op_result_id {
> >     SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
> >     SBI_PMU_HW_CACHE_RESULT_MISS   = 1,
> >     SBI_PMU_HW_CACHE_RESULT_MAX,   /* non-ABI */
> > };
> > (NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
> >
> > If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE
> > RAW event, the event_idx.code should be zero and the event_info
> > parameter passed to SBI_PMU_COUNTER_SET_EVENT call (described below)
> > will have the RAW event value to be programmed in MHPMEVENT CSR (i.e.
> > the SBI implementation will not derive MHPMEVENT CSR value from
> > event_idx + event_info).
> >
> > If event_idx.type == 0xf then it is SOFTWARE event. For SOFTWARE
> > event, the event_info is not required whereas the event_idx.code can
> > be one of the following:
> > enum sbi_pmu_sw_id {
> >     SBI_PMU_SW_MISALIGNED_LOAD        = 0,
> >     SBI_PMU_SW_MISALIGNED_STORE       = 1,
> >     SBI_PMU_SW_ILLEGAL_INSN           = 2,
> >     SBI_PMU_SW_LOCAL_SET_TIMER        = 3,
> >     SBI_PMU_SW_LOCAL_IPI              = 4,
> >     SBI_PMU_SW_LOCAL_FENCE_I          = 5,
> >     SBI_PMU_SW_LOCAL_SFENCE_VMA       = 6,
> >     SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID  = 7,
> >     SBI_PMU_SW_LOCAL_HFENCE_GVMA      = 8,
> >     SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
> >     SBI_PMU_SW_LOCAL_HFENCE_VVMA      = 10,
> >     SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
> >     SBI_PMU_SW_MAX,                   /* non-ABI */
> > };
> >
> > In future, more events can be defined without breaking
> > compatibility of SBI calls.
> >
> > Using definition of counter_idx and event_idx, we can potentially have
> > the following SBI calls:
> >
> > 1. SBI_PMU_NUM_COUNTERS
> >    This call will return the number of COUNTERs
> >
> > 2. SBI_PMU_COUNTER_GET_CSR
> >    This call takes one parameter:
> >       1) counter_idx
> >    It will provide the CSR_Number and CSR_Width of underlying counter.
> >    The value returned by SBI call is encoded as follows:
> >       return_value[11:0] = CSR_Number
> >       return_value[19:12] = CSR_Width (Number of bits implemented in HW)
> >    If CSR_Number == 0xfff then it is SOFTWARE counter otherwise it is
> >    HARDWARE counter. This SBI call will fail for counters which are not
> >    present.
> >
> > 3. SBI_PMU_COUNTER_SET_EVENT
> >    This call takes three parameters:
> >       1) counter_idx
> >       2) event_idx
> >       3) event_info
> >    It will select an event to be monitored by given counter. If this
> >    SBI call is not used for a counter to select an event then the
> >    counter will monitor the default event selected for it at boot-time.
> >    This SBI call will fail for counters which are not present. It will
> >    also fail if specified event_idx + event_info combination is not
> >    supported by given counter.
>
> It also seems to fail if the specified event is not supported by the given
> counter, right? Then the Linux driver could try to allocate the next free
> counter when this SBI call returns failure.

Yes, this call will fail if the event_idx + event_info combination is not
supported by the given counter_idx. The Linux driver is expected to try
another free counter if the SBI_PMU_COUNTER_SET_EVENT call fails. I have
suggested a few ideas on how to reduce SBI_PMU_COUNTER_SET_EVENT calls by
looking at CSR number assigned to counter.
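
One way to picture the CSR-number shortcut mentioned above is the C sketch
below. It picks a free counter purely from its CSR number before any
SBI_PMU_COUNTER_SET_EVENT call is made. The CSR numbers 0xC00 (cycle) and
0xC02 (instret) are the standard RISC-V user-level counter CSRs, and 0xfff
marks a SOFTWARE counter as in the proposal; struct counter_desc and
pick_by_csr() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define CSR_CYCLE               0xC00 /* standard RISC-V cycle CSR */
#define CSR_INSTRET             0xC02 /* standard RISC-V instret CSR */
#define CSR_SOFTWARE            0xFFF /* SOFTWARE counter marker (proposal) */

#define EVENT_TYPE_HW           0x0
#define EVENT_TYPE_SW           0xF
#define SBI_PMU_HW_CPU_CYCLES   0
#define SBI_PMU_HW_INSTRUCTIONS 1

/* Hypothetical per-counter state cached from SBI_PMU_COUNTER_GET_CSR. */
struct counter_desc {
    uint16_t csr;
    bool     in_use;
};

/*
 * Return the index of a free counter whose CSR number matches the event,
 * or -1 when no shortcut applies and the caller has to fall back to
 * probing counters with SBI_PMU_COUNTER_SET_EVENT. For the cycle and
 * instret cases the proposal suggests skipping SET_EVENT altogether; for
 * SOFTWARE events the match only narrows the candidates.
 */
static int pick_by_csr(const struct counter_desc *c, int n, uint32_t event_idx)
{
    uint32_t type = (event_idx >> 16) & 0xF;
    uint32_t code = event_idx & 0xFFFF;
    uint16_t want;

    if (type == EVENT_TYPE_HW && code == SBI_PMU_HW_CPU_CYCLES)
        want = CSR_CYCLE;
    else if (type == EVENT_TYPE_HW && code == SBI_PMU_HW_INSTRUCTIONS)
        want = CSR_INSTRET;
    else if (type == EVENT_TYPE_SW)
        want = CSR_SOFTWARE;
    else
        return -1;

    for (int i = 0; i < n; i++) {
        if (!c[i].in_use && c[i].csr == want)
            return i;
    }
    return -1;
}
```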

>
> Apart from the question above, this version of the proposal looks great to me.

Cool 😊

Regards,
Anup

>
> Thanks,
> Zong
>
> >
> > 4. SBI_PMU_COUNTER_SET_PHYS_ADDR
> >    This call takes two parameters:
> >       1) counter_idx
> >       2) 8byte aligned physical address
> >    It will set the physical address of memory location where the SBI
> >    implementation will write the 64bit SOFTWARE counter. This SBI call
> >    is only for counters not mapped to any CSR (i.e. only for counters
> >    with CSR_Number == 0xfff).
> >
> > 5. SBI_PMU_COUNTER_START
> >    This call takes two parameters:
> >       1) counter_idx
> >       2) initial_value
> >    It will inform SBI implementation to start/enable specified counter
> >    with specified initial value. This SBI call will fail for counters
> >    which are not present.
> >
> > 6. SBI_PMU_COUNTER_STOP
> >    This call takes one parameter:
> >       1) counter_idx
> >    It will inform SBI implementation to stop/disable specified counter
> >    on the calling HART. This SBI call will fail for counters which are
> >    not present.
> >
> > The M-mode runtime firmware (OpenSBI) Development Notes:
> >
> > 1. The M-mode runtime firmware will have to translate SBI PMU
> >    event_idx and event_info into platform dependent MHPMEVENT CSR
> >    value before starting/enabling a HARDWARE counter.
> >
> > 2. The M-mode runtime firmware (OpenSBI) will need to know following
> >    platform dependent information:
> >    A) Possible event_idx values allowed (or supported) by a HARDWARE
> >       counter (i.e. HPMCOUNTER)
> >    B) Mapping of event_idx for HARDWARE/CACHE event to MHPMEVENT CSR
> >       value. This is optional for platform. By default, OpenSBI will
> >       write a value <xyz> to MHPMEVENT CSR where lower 20bits of <xyz>
> >       are event_idx and upper XLEN-20 bits of <xyz> are lower XLEN-20
> >       bits of event_info
> >    C) Additional platform-specific programming required for selecting
> >       event_idx + event_info combination. This is also optional for
> >       platform.
> >
> > 3. All platform dependent information mentioned above can be obtained
> >    by M-mode runtime firmware (OpenSBI) from platform specific code.
> >    The DT/ACPI can also be used to describe 2.A and 2.B mentioned above
> >    but 2.C will always require platform specific code.
> >
> > Linux RISC-V PMU Driver Development Notes:
> >
> > 1. Driver probe
> >    The Linux RISC-V driver can be platform driver with "riscv,sbi-pmu"
> >    as DT compatible string and optional "interrupts" DT property. The
> >    "interrupts" DT property if available should specify an edge-triggered
> >    overflow interrupt for each HART. When "interrupts" DT property is
> >    present, we might also need another DT property for mapping HARTID
> >    to entries in "interrupts" DT property. The platform driver probe
> >    will:
> >    A) Need to ensure that underlying SBI implementation provides
> >       SBI PMU extension using sbi_probe_extension() API of arch/riscv.
> >    B) Detect number of counters using SBI_PMU_NUM_COUNTERS call
> >    C) Get CSR details of each counter using SBI_PMU_COUNTER_GET_CSR
> >       call. If the counter is a SOFTWARE counter then use the
> >       SBI_PMU_COUNTER_SET_PHYS_ADDR call to set memory location
> >       of counter. The driver can skip this in driver probe and instead
> >       do this lazily in add() callback mentioned below.
> >
> > 2. event_init() callback
> >    The event_init() callback will primarily translate user-space
> >    perf_event_attr to SBI PMU event_idx and event_info. It can do
> >    this in following way:
> >    A) perf_event_attr.type == PERF_TYPE_HARDWARE
> >       event_idx.type = 0x0
> >       event_idx.code = Value from enum sbi_pmu_hw_id based on
> >                            perf_event_attr.config
> >       event_info = 0
> >    B) perf_event_attr.type == PERF_TYPE_HW_CACHE
> >       event_idx.type = 0x1
> >       event_idx.code.cache_id = Value from enum sbi_pmu_hw_cache_id
> >                                 based on perf_event_attr.config
> >       event_idx.code.op_id = Value from enum sbi_pmu_hw_cache_op_id
> >                              based on perf_event_attr.config
> >       event_idx.code.result_id = Value from enum sbi_pmu_hw_cache_op_result_id
> >                                  based on perf_event_attr.config
> >       event_info = 0
> >    C) perf_event_attr.type == PERF_TYPE_RAW and
> >       perf_event_attr.config[63:63] == 0
> >       event_idx.type = 0x2
> >       event_idx.code = 0x0
> >       event_info = perf_event_attr.config[62:0]
> >    D) perf_event_attr.type == PERF_TYPE_RAW and
> >       perf_event_attr.config[63:63] == 1
> >       event_idx.type = 0xf
> >       event_idx.code = Value from enum sbi_pmu_sw_id based on
> >                        perf_event_attr.config
> >       event_info = 0
> >    (Note: event_init() will fail if it is not able to figure out
> >     event_idx and event_info value corresponding to perf_event_attr)
> >    (Note: event_init() will not assign counter to perf_event because
> >     it will be done by add())
> >
> > 3. add() callback
> >    The add() callback of Linux RISC-V PMU driver will find a
> >    free counter on current CPU/HART such that the perf_event
> >    event_idx + event_info combination is supported by the counter.
> >    To check-and-set event_idx + event_info combination for a
> >    counter, we will use the SBI_PMU_COUNTER_SET_EVENT call.
> >    The counter allocation and SBI_PMU_COUNTER_SET_EVENT call
> >    can be further optimized by looking at CSR details.
> >    For example:
> >    A) For event_idx.type == 0 and
> >       event_idx.code == SBI_PMU_HW_CPU_CYCLES, we should
> >       prefer counter mapping to CYCLE CSR and skip doing
> >       SBI_PMU_COUNTER_SET_EVENT call.
> >    B) For event_idx.type == 0 and
> >       event_idx.code == SBI_PMU_HW_INSTRUCTIONS, we should
> >       prefer counter mapping to INSTRET CSR and skip doing
> >       SBI_PMU_COUNTER_SET_EVENT call.
> >    C) For event_idx.type == 0xf, only consider counters mapping
> >       to 0xfff CSR (i.e. SOFTWARE counters).
> >
> > 4. del() callback
> >    The del() callback of Linux RISC-V PMU driver will release
> >    or free the counter.
> >
> > 5. start() callback
> >    The start() callback of Linux RISC-V PMU driver will start
> >    the counter using the SBI_PMU_COUNTER_START call.
> >
> > 6. stop() callback
> >    The stop() callback of Linux RISC-V PMU driver will stop
> >    the counter using the SBI_PMU_COUNTER_STOP call.
> >
> > Regards,
> > Anup
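
The bit layouts in the quoted proposal (event_idx[19:16] = type,
event_idx[15:0] = code, the cache_id/op_id/result_id split of the cache
code, and the default MHPMEVENT value used when the platform provides no
mapping) can be sketched in C as below. The helper names are hypothetical,
and the sketch assumes XLEN = 64 so that the upper 44 bits of the default
MHPMEVENT value carry the lower 44 bits of event_info.

```c
#include <stdint.h>

/* event_idx[19:16] = type, event_idx[15:0] = code */
static uint32_t make_event_idx(uint32_t type, uint32_t code)
{
    return ((type & 0xF) << 16) | (code & 0xFFFF);
}

/* HARDWARE CACHE code: [15:3] = cache_id, [2:1] = op_id, [0:0] = result_id */
static uint32_t make_cache_code(uint32_t cache_id, uint32_t op_id,
                                uint32_t result_id)
{
    return ((cache_id & 0x1FFF) << 3) | ((op_id & 0x3) << 1) |
           (result_id & 0x1);
}

/*
 * Default MHPMEVENT value described for HARDWARE/CACHE events: the lower
 * 20 bits are event_idx and the upper XLEN-20 bits are the lower XLEN-20
 * bits of event_info (XLEN = 64 assumed here). For HARDWARE RAW events the
 * proposal instead programs event_info as-is.
 */
static uint64_t default_mhpmevent(uint32_t event_idx, uint64_t event_info)
{
    return (event_info << 20) | (uint64_t)(event_idx & 0xFFFFF);
}
```

For example, a last-level cache write miss (cache_id = 2, op_id = 1,
result_id = 1) gives code 0x13 and event_idx 0x10013.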
