Proposal v4: SBI PMU Extension


Anup Patel
 

Hi All,

We don't have a dedicated RISC-V PMU extension for all privilege modes,
but we do have M-mode HARDWARE performance counters such as the MCYCLE,
MINSTRET, and MHPMCOUNTER CSRs, which are read-only for S-mode and
U-mode. A RISC-V implementation can support monitoring of various
HARDWARE events using a limited number of HARDWARE performance counters.

In addition to HARDWARE performance counters, an SBI implementation
(e.g. OpenSBI, Xvisor, KVM, etc.) can provide SOFTWARE counters for
events such as the number of RFENCEs, IPIs, misaligned load/store
instructions, illegal instructions, etc.

We propose an SBI PMU extension, which will help S-mode (or VS-mode)
software discover and configure HARDWARE/SOFTWARE counters. The SBI
PMU extension will only manage per-HART (or per-CPU) HARDWARE/SOFTWARE
counters.

Using the SBI PMU extension, an SBI implementation (OpenSBI, KVM, or Xvisor)
will provide a standardized view of HARDWARE/SOFTWARE counters and
events to S-mode (or VS-mode) software.

Before defining the SBI PMU extension calls, we first define the counter_idx,
event_idx, and event_info entities. The counter_idx is a logical number
assigned to each HARDWARE/SOFTWARE counter. The event_idx represents a
HARDWARE/SOFTWARE event, whereas event_info represents additional
configuration/parameters for the event.

The event_idx is a 20-bit wide number encoded as follows:
event_idx[19:16] = type
event_idx[15:0] = code
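As a minimal sketch of this encoding, event_idx can be packed and unpacked with plain shifts and masks. The macro and helper names below are hypothetical (the proposal only defines the numeric type values 0x0, 0x1, 0x2, and 0xf):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the event_idx.type values defined in the proposal. */
#define SBI_PMU_EVENT_TYPE_HW        0x0u
#define SBI_PMU_EVENT_TYPE_HW_CACHE  0x1u
#define SBI_PMU_EVENT_TYPE_HW_RAW    0x2u
#define SBI_PMU_EVENT_TYPE_SW        0xfu

/* Pack a 4-bit type and a 16-bit code into a 20-bit event_idx. */
static inline uint32_t sbi_pmu_event_idx(uint32_t type, uint32_t code)
{
    return ((type & 0xfu) << 16) | (code & 0xffffu);
}

/* Extract event_idx[19:16]. */
static inline uint32_t sbi_pmu_event_type(uint32_t event_idx)
{
    return (event_idx >> 16) & 0xfu;
}

/* Extract event_idx[15:0]. */
static inline uint32_t sbi_pmu_event_code(uint32_t event_idx)
{
    return event_idx & 0xffffu;
}
```

For example, the SOFTWARE event SBI_PMU_SW_LOCAL_IPI (code 4) encodes to event_idx 0xf0004, and a HARDWARE RAW event, whose code must be zero, always uses event_idx 0x20000.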

If event_idx.type == 0x0 then it is a HARDWARE event. For a HARDWARE event,
the event_info is optional and can be zero, whereas the event_idx.code
can be one of the following values:
enum sbi_pmu_hw_id {
    SBI_PMU_HW_CPU_CYCLES              = 0,
    SBI_PMU_HW_INSTRUCTIONS            = 1,
    SBI_PMU_HW_CACHE_REFERENCES        = 2,
    SBI_PMU_HW_CACHE_MISSES            = 3,
    SBI_PMU_HW_BRANCH_INSTRUCTIONS     = 4,
    SBI_PMU_HW_BRANCH_MISSES           = 5,
    SBI_PMU_HW_BUS_CYCLES              = 6,
    SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
    SBI_PMU_HW_STALLED_CYCLES_BACKEND  = 8,
    SBI_PMU_HW_REF_CPU_CYCLES          = 9,
    SBI_PMU_HW_MAX,                    /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 0x1 then it is a HARDWARE CACHE event. For a HARDWARE
CACHE event, the event_info is optional and can be zero, whereas the
event_idx.code is encoded as follows:
event_idx.code[15:3] = cache_id
event_idx.code[2:1] = op_id
event_idx.code[0:0] = result_id
enum sbi_pmu_hw_cache_id {
    SBI_PMU_HW_CACHE_L1D  = 0,
    SBI_PMU_HW_CACHE_L1I  = 1,
    SBI_PMU_HW_CACHE_LL   = 2,
    SBI_PMU_HW_CACHE_DTLB = 3,
    SBI_PMU_HW_CACHE_ITLB = 4,
    SBI_PMU_HW_CACHE_BPU  = 5,
    SBI_PMU_HW_CACHE_NODE = 6,
    SBI_PMU_HW_CACHE_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_id {
    SBI_PMU_HW_CACHE_OP_READ     = 0,
    SBI_PMU_HW_CACHE_OP_WRITE    = 1,
    SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
    SBI_PMU_HW_CACHE_OP_MAX,     /* non-ABI */
};
enum sbi_pmu_hw_cache_op_result_id {
    SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
    SBI_PMU_HW_CACHE_RESULT_MISS   = 1,
    SBI_PMU_HW_CACHE_RESULT_MAX,   /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
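The cache_id/op_id/result_id packing above can be sketched as follows. The function names are hypothetical, and the enum values are written as plain numbers so the snippet stands alone:

```c
#include <assert.h>
#include <stdint.h>

/* Build event_idx.code for a HARDWARE CACHE event:
 *   code[15:3] = cache_id, code[2:1] = op_id, code[0:0] = result_id */
static inline uint32_t sbi_pmu_cache_code(uint32_t cache_id, uint32_t op_id,
                                          uint32_t result_id)
{
    return ((cache_id & 0x1fffu) << 3) | ((op_id & 0x3u) << 1) |
           (result_id & 0x1u);
}

/* Full 20-bit event_idx: type 0x1 in bits [19:16], code in bits [15:0]. */
static inline uint32_t sbi_pmu_cache_event_idx(uint32_t cache_id,
                                               uint32_t op_id,
                                               uint32_t result_id)
{
    return (0x1u << 16) | sbi_pmu_cache_code(cache_id, op_id, result_id);
}
```

For example, an L1D read miss (cache_id 0, op_id 0, result_id 1) gives code 0x1 and event_idx 0x10001, while a DTLB write access (3, 1, 0) gives code 0x1a.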

If event_idx.type == 0x2 then it is a HARDWARE RAW event. For a HARDWARE
RAW event, the event_idx.code should be zero and the event_info
parameter passed to the SBI_PMU_COUNTER_CONFIG_MATCHING call (described
below) will hold the RAW event value to be programmed into the MHPMEVENT
CSR (i.e. the SBI implementation will not derive the MHPMEVENT CSR value
from event_idx and event_info).

If event_idx.type == 0xf then it is a SOFTWARE event. For a SOFTWARE
event, the event_info is optional and can be zero, whereas the
event_idx.code can be one of the following:
enum sbi_pmu_sw_id {
    SBI_PMU_SW_MISALIGNED_LOAD        = 0,
    SBI_PMU_SW_MISALIGNED_STORE       = 1,
    SBI_PMU_SW_ILLEGAL_INSN           = 2,
    SBI_PMU_SW_LOCAL_SET_TIMER        = 3,
    SBI_PMU_SW_LOCAL_IPI              = 4,
    SBI_PMU_SW_LOCAL_FENCE_I          = 5,
    SBI_PMU_SW_LOCAL_SFENCE_VMA       = 6,
    SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID  = 7,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA      = 8,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA      = 10,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
    SBI_PMU_SW_MAX,                   /* non-ABI */
};

In the future, more events can be defined without breaking SBI call
backward compatibility.

Using the above definitions of counter_idx, event_idx, and event_info,
we can potentially have the following SBI calls:

1. SBI_PMU_NUM_COUNTERS
   Returns the number of counters.

2. SBI_PMU_COUNTER_GET_CSR
   This call takes one parameter:
      1) counter_idx
   It provides the CSR_Number and CSR_Width of the underlying counter.
   The value returned by the SBI call is encoded as follows:
      return_value[11:0] = CSR_Number
      return_value[19:12] = CSR_Width (number of bits implemented in HW)
      return_value[XLEN-1:20] = Reserved
   If CSR_Number == 0xfff then it is a SOFTWARE counter; otherwise it is
   a HARDWARE counter. This SBI call will fail for counters which are not
   present.

3. SBI_PMU_COUNTER_CONFIG_MATCHING
   This call takes four parameters:
      1) counter_idx_base
      2) counter_idx_mask
      3) event_idx
      4) event_info
   It finds and configures a counter, from a set of counters, which can
   monitor the specified event. The counter_idx_base and counter_idx_mask
   parameters represent the set of counters, whereas the event_idx and
   event_info represent the event to monitor. Upon success, the SBI call
   will return the counter_idx of the counter which has been configured
   to monitor the specified event. This SBI call will fail if it is unable
   to find a counter which can monitor the specified event, or if the set
   of counters specified via counter_idx_base and counter_idx_mask
   contains an invalid counter.

4. SBI_PMU_COUNTER_SET_PHYS_ADDR
   This call takes two parameters:
      1) counter_idx
      2) 8-byte aligned physical address
   It sets the physical address of the memory location where the SBI
   implementation will write the 64-bit SOFTWARE counter. This SBI call
   is only for counters not mapped to any CSR (i.e. only for counters
   with CSR_Number == 0xfff).

5. SBI_PMU_COUNTER_START
   This call takes two parameters:
      1) counter_idx
      2) initial_value
   It requests the SBI implementation to start/enable the specified
   counter with the specified initial value. This SBI call will fail for
   counters which are not present.

6. SBI_PMU_COUNTER_STOP
   This call takes one parameter:
      1) counter_idx
   It requests the SBI implementation to stop/disable the specified
   counter on the calling HART. This SBI call will fail for counters
   which are not present.
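Two encodings that S-mode software and the SBI implementation must agree on in the calls above can be sketched in C. Decoding the SBI_PMU_COUNTER_GET_CSR return value follows the bit layout given in call 2. The counter-set selection for SBI_PMU_COUNTER_CONFIG_MATCHING assumes that bit i of counter_idx_mask selects counter (counter_idx_base + i), which the proposal does not spell out, and the `supported` bitmap is a hypothetical stand-in for the platform knowledge of which counters can monitor the event. All names are hypothetical, and the actual MHPMEVENT programming is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SBI_PMU_CSR_NUM_SOFTWARE 0xfffu  /* CSR_Number reported for SOFTWARE counters */
#define ULONG_BITS (sizeof(unsigned long) * 8)

/* Fields decoded from the SBI_PMU_COUNTER_GET_CSR return value. */
struct sbi_pmu_csr_info {
    uint32_t csr_num;   /* return_value[11:0] */
    uint32_t csr_width; /* return_value[19:12], bits implemented in HW */
    bool is_sw;         /* true when CSR_Number == 0xfff */
};

static struct sbi_pmu_csr_info sbi_pmu_decode_csr(unsigned long ret)
{
    struct sbi_pmu_csr_info info;

    info.csr_num = (uint32_t)(ret & 0xfffUL);
    info.csr_width = (uint32_t)((ret >> 12) & 0xffUL);
    info.is_sw = (info.csr_num == SBI_PMU_CSR_NUM_SOFTWARE);
    return info;
}

/* Pick a counter for SBI_PMU_COUNTER_CONFIG_MATCHING.
 * Assumption: bit i of mask selects counter_idx (base + i).
 * supported: hypothetical bitmap of counter_idx values able to monitor
 * the requested event. Returns the chosen counter_idx, or -1 on failure. */
static long sbi_pmu_pick_counter(unsigned long base, unsigned long mask,
                                 unsigned long num_counters,
                                 unsigned long supported)
{
    unsigned long i;

    assert(num_counters <= ULONG_BITS); /* keeps the bitmap shifts defined */

    /* The call fails if any selected counter does not exist. */
    for (i = 0; i < ULONG_BITS; i++)
        if (((mask >> i) & 1UL) && base + i >= num_counters)
            return -1;

    /* Return the first selected counter that can monitor the event. */
    for (i = 0; i < ULONG_BITS; i++)
        if (((mask >> i) & 1UL) && ((supported >> (base + i)) & 1UL))
            return (long)(base + i);

    return -1; /* no counter in the set can monitor the event */
}
```

Validating the whole set before searching mirrors the stated rule that the call fails when the set contains an invalid counter, even if a valid matching counter is also present.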

The OpenSBI (M-mode runtime firmware) Development Notes:

1. The OpenSBI firmware will translate event_idx and event_info into a
   platform-dependent MHPMEVENT CSR value before starting/enabling a
   HARDWARE counter.

2. The OpenSBI firmware will need to know the following platform-dependent
   information:
   A) Possible event_idx values allowed (or supported) by a HARDWARE
      counter (i.e. MHPMCOUNTER)
   B) Mapping of event_idx for HARDWARE/CACHE events to the MHPMEVENT CSR
      value. This is optional; by default, OpenSBI will write a value
      <xyz> to the MHPMEVENT CSR, where the lower 20 bits of <xyz> are
      event_idx and the upper XLEN-20 bits of <xyz> are the lower XLEN-20
      bits of event_info.
   C) Additional platform-specific programming required for selecting an
      event_idx + event_info combination. This is also optional for a
      platform.

3. All the platform-dependent information mentioned above can be obtained
   by OpenSBI firmware from platform-specific code. DT/ACPI can also be
   used to describe 2.A and 2.B above, but 2.C will always require
   platform-specific code.
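The default MHPMEVENT value in note 2.B is a shift-and-OR. A sketch for both XLEN values (function names hypothetical) shows which event_info bits survive:

```c
#include <assert.h>
#include <stdint.h>

/* Default MHPMEVENT value from note 2.B for XLEN = 64:
 * bits [19:0] = event_idx, bits [63:20] = event_info[43:0].
 * Shifting a uint64_t left by 20 discards event_info bits above bit 43. */
static inline uint64_t sbi_pmu_default_mhpmevent64(uint32_t event_idx,
                                                   uint64_t event_info)
{
    return (event_info << 20) | (event_idx & 0xfffffu);
}

/* Same layout for XLEN = 32: only event_info[11:0] survives. */
static inline uint32_t sbi_pmu_default_mhpmevent32(uint32_t event_idx,
                                                   uint32_t event_info)
{
    return (event_info << 20) | (event_idx & 0xfffffu);
}
```

On RV32, only event_info[11:0] reaches MHPMEVENT under this default layout.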

Linux RISC-V PMU Driver Development Notes:

1. Driver probe
   The Linux RISC-V PMU driver can be a platform driver with "riscv,pmu"
   as the DT compatible string and an optional "interrupts" DT property.
   The "interrupts" DT property, if available, should specify the overflow
   interrupt for each HART. When the "interrupts" DT property is present,
   we might also need another DT property for mapping HARTIDs to entries
   in the "interrupts" DT property. The platform driver probe will:
   A) Ensure that the underlying SBI implementation provides the
      SBI PMU extension, using the sbi_probe_extension() API of arch/riscv.
   B) Detect the number of counters using the SBI_PMU_NUM_COUNTERS call.
   C) Get the CSR details of each counter using the SBI_PMU_COUNTER_GET_CSR
      call. If the counter is a SOFTWARE counter, then use the
      SBI_PMU_COUNTER_SET_PHYS_ADDR call to set the memory location
      of the counter. The driver can skip this in driver probe and
      instead do it lazily in the add() callback mentioned below.

2. event_init() callback
   The event_init() callback will primarily translate the user-space
   perf_event_attr to SBI PMU event_idx and event_info. It can do
   this in the following way:
   A) perf_event_attr.type == PERF_TYPE_HARDWARE
      event_idx.type = 0x0
      event_idx.code = Value from enum sbi_pmu_hw_id based on
                       perf_event_attr.config
      event_info = 0
   B) perf_event_attr.type == PERF_TYPE_HW_CACHE
      event_idx.type = 0x1
      event_idx.code.cache_id = Value from enum sbi_pmu_hw_cache_id
                                based on perf_event_attr.config
      event_idx.code.op_id = Value from enum sbi_pmu_hw_cache_op_id
                             based on perf_event_attr.config
      event_idx.code.result_id = Value from enum sbi_pmu_hw_cache_op_result_id
                                 based on perf_event_attr.config
      event_info = 0
   C) perf_event_attr.type == PERF_TYPE_RAW and
      perf_event_attr.config[63:63] == 0
      event_idx.type = 0x2
      event_idx.code = 0x0
      event_info = perf_event_attr.config[62:0]
   D) perf_event_attr.type == PERF_TYPE_RAW and
      perf_event_attr.config[63:63] == 1
      event_idx.type = 0xf
      event_idx.code = Value from enum sbi_pmu_sw_id based on
                       perf_event_attr.config
      event_info = 0
   (Note: event_init() will fail if it is not able to figure out the
    event_idx and event_info values corresponding to perf_event_attr.)
   (Note: event_init() will not assign a counter to the perf_event because
    that will be done by the add() callback.)

3. add() callback
   The add() callback of the Linux RISC-V PMU driver will find a free
   counter on the current CPU/HART such that the event_idx and event_info
   combination is supported by the counter. To find and configure a
   counter, from a set of counters, to monitor the event_idx and
   event_info combination, we will use the SBI_PMU_COUNTER_CONFIG_MATCHING
   call.

4. del() callback
   The del() callback of the Linux RISC-V PMU driver will release or
   free the counter.

5. start() callback
   The start() callback of the Linux RISC-V PMU driver will start the
   counter using the SBI_PMU_COUNTER_START call.

6. stop() callback
   The stop() callback of the Linux RISC-V PMU driver will stop the
   counter using the SBI_PMU_COUNTER_STOP call.
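A condensed sketch of the event_init() translation in note 2 follows. It is not the actual Linux driver: the PERF_TYPE_* values are copied from Linux <linux/perf_event.h> so the example builds without kernel headers, the *_MAX limits mirror the enums above, the function name is hypothetical, and it assumes that the sbi_pmu_sw_id of a SOFTWARE event is carried in config[15:0], which the proposal leaves open. It also relies on the Linux convention that a PERF_TYPE_HW_CACHE config packs cache_id | op_id << 8 | result_id << 16.

```c
#include <assert.h>
#include <stdint.h>

/* perf_event_attr.type values from Linux <linux/perf_event.h>. */
#define PERF_TYPE_HARDWARE 0
#define PERF_TYPE_HW_CACHE 3
#define PERF_TYPE_RAW      4

/* Limits mirroring the *_MAX members of the SBI PMU enums. */
#define SBI_PMU_HW_MAX              10
#define SBI_PMU_HW_CACHE_MAX         7
#define SBI_PMU_HW_CACHE_OP_MAX      3
#define SBI_PMU_HW_CACHE_RESULT_MAX  2
#define SBI_PMU_SW_MAX              12

/* Translate (type, config) into event_idx/event_info as in note 2.
 * Returns 0 on success, or -1 if no mapping exists (event_init() fails).
 * Assumption: a SOFTWARE event's sbi_pmu_sw_id is in config[15:0]. */
static int sbi_pmu_translate(uint32_t type, uint64_t config,
                             uint32_t *event_idx, uint64_t *event_info)
{
    *event_info = 0;
    switch (type) {
    case PERF_TYPE_HARDWARE:                   /* event_idx.type = 0x0 */
        if (config >= SBI_PMU_HW_MAX)
            return -1;
        *event_idx = (uint32_t)config;
        return 0;
    case PERF_TYPE_HW_CACHE: {                 /* event_idx.type = 0x1 */
        uint32_t cache_id = config & 0xff;
        uint32_t op_id = (config >> 8) & 0xff;
        uint32_t result_id = (config >> 16) & 0xff;

        if (cache_id >= SBI_PMU_HW_CACHE_MAX ||
            op_id >= SBI_PMU_HW_CACHE_OP_MAX ||
            result_id >= SBI_PMU_HW_CACHE_RESULT_MAX)
            return -1;
        *event_idx = (0x1u << 16) | (cache_id << 3) | (op_id << 1) | result_id;
        return 0;
    }
    case PERF_TYPE_RAW:
        if (!(config >> 63)) {                 /* HARDWARE RAW, type 0x2 */
            *event_idx = 0x2u << 16;
            *event_info = config & ~(1ULL << 63); /* config[62:0] */
            return 0;
        }
        if ((config & 0xffff) >= SBI_PMU_SW_MAX) /* SOFTWARE, type 0xf */
            return -1;
        *event_idx = (0xfu << 16) | (uint32_t)(config & 0xffff);
        return 0;
    default:
        return -1;
    }
}
```

For instance, config 0x10000 under PERF_TYPE_HW_CACHE (L1D read miss) yields event_idx 0x10001, and PERF_TYPE_RAW with config (1ULL << 63) | 4 yields the SOFTWARE event 0xf0004.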

Regards,
Anup


Greg Favor
 

Anup,

What does SBI_PMU_NUM_COUNTERS return insofar as distinguishing hardware versus software counters?

Greg

On Thu, Aug 6, 2020 at 6:52 AM Anup Patel <Anup.Patel@...> wrote:


Jonathan Behrens <behrensj@...>
 

I like this proposal! A couple comments...

In a couple places you say "the event_info is optional and can be zero". Does this mean that SBI providers must ignore the field, or that non-zero values are reserved (meaning software must set it to zero), or that other values have SBI implementation specific semantics? Or something else?

The OpenSBI firmware will need to know following platform dependent information: [...]

    B) Mapping of event_idx for HARDWARE/CACHE event to MHPMEVENT CSR value. This is optional and by default OpenSBI will write a value <xyz> to MHPMEVENT CSR where lower 20bits of <xyz> is event_idx and upper XLEN-20 bits of <xyz> are lower XLEN-20 bits of event_info

This seems to contradict the previous point. By saying how OpenSBI is going to use the event_info field, you are effectively requiring that the OS properly set it. I'd rather the SBI provider just return an error if it can't figure out the proper mappings to MHPMEVENT CSR values. That way there is no risk that the S-mode software accidentally ends up tracking the wrong performance counter because it didn't know what to set event_info to. Put another way, the SBI provider is supposed to be the one that deals with platform specific issues, so the operating system doesn't have to.

It would also be nice, if possible, to pin down how S-mode software can learn the meanings of raw counters. Is it sufficient to look at the tuple of (mvendorid, marchid, mimpid)? Even just some commentary text with guidance would be helpful here.

Finally, I wanted to ask about the SBI_PMU_COUNTER_SET_PHYS_ADDR function. Apologies if this has been answered already, but I think this might not work well with the enhanced PMP proposal that is designed to allow most of DRAM to be marked as S/U-mode only. The proposal allows regions to be shared between M-mode and S/U-mode but presumably an implementation would prefer to require only a single shared region with all counters instead of needing to use NUM_COUNTERS number of PMP entries. This could be enabled by making the interface be SBI_PMU_COUNTER_GET_PHYS_ADDR so the firmware gets to pick the locations. On this front, another thing to watch is the memory attributes proposals coming out of the virtual memory task group: shared mappings might have performance costs (to avoid issues with mismatches between M-mode and S-mode attributes).

Jonathan

On Thu, Aug 6, 2020 at 9:52 AM Anup Patel via lists.riscv.org <anup.patel=wdc.com@...> wrote:
Hi All,

We don't have a dedicated RISC-V PMU extension for all privilege modes
but we do have M-mode HARDWARE performance counters such as MCYCLE CSR,
MINSTRET CSR, and MHPMCOUNTER CSRs which are read-only for S-mode and
U-mode. A RISC-V implementation can support monitoring of various
HARDWARE events using limited number of HARDWARE performance counters.

In addition to HARDWARE performance counters, a SBI implementation
(e.g. OpenSBI, Xvisor, KVM, etc) can provide SOFTWARE counters for
events such as number of RFENCEs, number of IPIs, number of misaligned
load/store instructions, number of illegal instructions, etc.

We propose SBI PMU extension, which will help S-mode (or VS-mode)
software to discover and configure HARDWARE/SOFTWARE counters. The SBI
PMU extension will only manage per-HART (or per-CPU) HARDWARE/SOFTWARE
counters.

Using SBI PMU extension, a SBI implementation (OpenSBI, KVM, or Xvisor)
will provide a standardized view of HARDWARE/SOFTWARE counters and
events to S-mode (or VS-mode) software.

Before defining SBI PMU extension calls, we first define counter_idx,
event_idx, and event_info entities. The counter_idx is a logical number
assigned to each HARDWARE/SOFTWARE counter. The event_idx represents a
HARDWARE/SOFTWARE event whereas event_info represents additional
configuration/parameters for the event.

The event_idx is a 20bits wide number encoded as follows:
event_idx[19:16] = type
event_idx[15:0] = code

If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE event,
the event_info is optional and can be zero whereas the event_idx.code
can be one of the following values:
enum sbi_pmu_hw_id {
    SBI_PMU_HW_CPU_CYCLES              = 0,
    SBI_PMU_HW_INSTRUCTIONS            = 1,
    SBI_PMU_HW_CACHE_REFERENCES        = 2,
    SBI_PMU_HW_CACHE_MISSES            = 3,
    SBI_PMU_HW_BRANCH_INSTRUCTIONS     = 4,
    SBI_PMU_HW_BRANCH_MISSES           = 5,
    SBI_PMU_HW_BUS_CYCLES              = 6,
    SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
    SBI_PMU_HW_STALLED_CYCLES_BACKEND  = 8,
    SBI_PMU_HW_REF_CPU_CYCLES          = 9,
    SBI_PMU_HW_MAX,                    /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE
CACHE event, the event_info is optional and can be zero whereas the
event_idx.code is encoded as follows:
event_idx.code[15:3] = cache_id
event_idx.code[2:1] = op_id
event_idx.code[0:0] = result_id
enum sbi_pmu_hw_cache_id {
    SBI_PMU_HW_CACHE_L1D  = 0,
    SBI_PMU_HW_CACHE_L1I  = 1,
    SBI_PMU_HW_CACHE_LL   = 2,
    SBI_PMU_HW_CACHE_DTLB = 3,
    SBI_PMU_HW_CACHE_ITLB = 4,
    SBI_PMU_HW_CACHE_BPU  = 5,
    SBI_PMU_HW_CACHE_NODE = 6,
    SBI_PMU_HW_CACHE_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_id {
    SBI_PMU_HW_CACHE_OP_READ     = 0,
    SBI_PMU_HW_CACHE_OP_WRITE    = 1,
    SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
    SBI_PMU_HW_CACHE_OP_MAX,     /* non-ABI */
};
enum sbi_pmu_hw_cache_op_result_id {
    SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
    SBI_PMU_HW_CACHE_RESULT_MISS   = 1,
    SBI_PMU_HW_CACHE_RESULT_MAX,   /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE
RAW event, the event_idx.code should be zero and the event_info
parameter passed to SBI_PMU_COUNTER_CONFIG_MATCHING call (described
below) will have the RAW event value to be programmed in MHPMEVENT
CSR (i.e. the SBI implementation will not derive MHPMEVENT CSR value
from event_idx and event_info).

If event_idx.type == 0xf then it is SOFTWARE event. For SOFTWARE
event, the event_info is optional and can be zero whereas the
event_idx.code can be one of the following:
enum sbi_pmu_sw_id {
    SBI_PMU_SW_MISALIGNED_LOAD        = 0,
    SBI_PMU_SW_MISALIGNED_STORE       = 1,
    SBI_PMU_SW_ILLEGAL_INSN           = 2,
    SBI_PMU_SW_LOCAL_SET_TIMER        = 3,
    SBI_PMU_SW_LOCAL_IPI              = 4,
    SBI_PMU_SW_LOCAL_FENCE_I          = 5,
    SBI_PMU_SW_LOCAL_SFENCE_VMA       = 6,
    SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID  = 7,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA      = 8,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA      = 10,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
    SBI_PMU_SW_MAX,                   /* non-ABI */
};

In future, more events can be defined without breaking SBI call
backward-compatibility.

Using above definitions of counter_idx, event_idx, and event_info
we can potentially have following SBI calls:

1. SBI_PMU_NUM_COUNTERS
   Return the number of COUNTERs

2. SBI_PMU_COUNTER_GET_CSR
   This call takes one parameter:
      1) counter_idx
   Provide the CSR_Number and CSR_Width of underlying counter.
   The value returned by SBI call is encoded as follows:
      return_value[11:0] = CSR_Number
      return_value[19:12] = CSR_Width (Number of bits implemented in HW)
          return_value[XLEN-1:20] = Reserved
   If CSR_Number == 0xfff then it is SOFTWARE counter otherwise it is
   HARDWARE counter. This SBI call will fail for counters which are not
   present.

3. SBI_PMU_COUNTER_CONFIG_MATCHING
   This call takes three parameter:
      1) counter_idx_base
      2) counter_idx_mask
      3) event_idx
      4) event_info
   Find and configure a counter from a set of counters which can monitor
   specified event. The counter_idx_base and counter_idx_mask parameters
   represent the set of counters whereas the event_idx and event_info
   represent the event to monitor. Upon success the SBI call will return
   the counter_idx of the counter which has been configured to monitor
   specified event.  This SBI call will fail if it is unable to find a
   counter which can monitor specified event or the set of counters
   specified via counter_idx_base and counter_idx_mask has an invalid
   counter.

4. SBI_PMU_COUNTER_SET_PHYS_ADDR
   This call takes two parameters:
      1) counter_idx
      2) 8byte aligned physical address
   It will set the physical address of memory location where the SBI
   implementation will write the 64bit SOFTWARE counter. This SBI call
   is only for counters not mapped to any CSR (i.e. only for counters
   with CSR_Number == 0xfff).

5. SBI_PMU_COUNTER_START
   This call takes two parameters:
      1) counter_idx
      2) initial_value
   It will inform SBI implementation to start/enable specified counter
   with specified initial value. This SBI call will fail for counters
   which are not present.

6. SBI_PMU_COUNTER_STOP
   This call takes one parameter:
      1) counter_idx
   It will inform SBI implementation to stop/disable specified counters
   on the calling HART. This SBI call will fail for counters which are
   not present.

The OpenSBI (M-mode runtime firmware) Development Notes:

1. The OpenSBI firmware will translate event_idx and event_into into
   platform dependent MHPMEVENT CSR value before starting/enabling a
   HARDWARE counter.

2. The OpenSBI firmware will need to know following platform dependent
   information:
   A) Possible event_idx values allowed (or supported) by a HARDWARE
      counter (i.e. MHPMCOUNTER)
   B) Mapping of event_idx for HARDWARE/CACHE event to MHPMEVENT CSR
      value. This is optional and by default OpenSBI will write a value
          <xyz> to MHPMEVENT CSR where lower 20bits of <xyz> is event_idx
          and upper XLEN-20 bits of <xyz> are lower XLEN-20 bits of event_info
   C) Additional platform-specific programming required for selecting
      event_idx + event_info combination is also optional for platform.

3. All platform dependent information mentioned above, can be obtained
   by OpenSBI firmware from platform specific code. The DT/ACPI can
   also be used to describe 2.A and 2.B mentioned above but 2.C will
   always require platform specific code.

Linux RISC-V PMU Driver Development Notes:

1. Driver probe
   The Linux RISC-V driver can be a platform driver with "riscv,pmu"
   as DT compatible string and an optional "interrupts" DT property. If
   present, the "interrupts" DT property should specify the overflow
   interrupt for each HART. When the "interrupts" DT property is present,
   we might also need another DT property for mapping HARTIDs to entries
   in the "interrupts" DT property. The platform driver probe will:
   A) Ensure that the underlying SBI implementation provides the
      SBI PMU extension using the sbi_probe_extension() API of arch/riscv.
   B) Detect the number of counters using the SBI_PMU_NUM_COUNTERS call.
   C) Get CSR details of each counter using the SBI_PMU_COUNTER_GET_CSR
      call. If the counter is a SOFTWARE counter, then use the
      SBI_PMU_COUNTER_SET_PHYS_ADDR call to set the memory location
      of the counter. The driver can skip this in driver probe and
      instead do it lazily in the add() callback described below.

2. event_init() callback
   The event_init() callback will primarily translate the user-space
   perf_event_attr to an SBI PMU event_idx and event_info. It can do
   this in the following way:
   A) perf_event_attr.type == PERF_TYPE_HARDWARE
      event_idx.type = 0x0
      event_idx.code = Value from enum sbi_pmu_hw_id based on
                       perf_event_attr.config
      event_info = 0
   B) perf_event_attr.type == PERF_TYPE_HW_CACHE
      event_idx.type = 0x1
      event_idx.code.cache_id = Value from enum sbi_pmu_hw_cache_id
                                based on perf_event_attr.config
      event_idx.code.op_id = Value from enum sbi_pmu_hw_cache_op_id
                             based on perf_event_attr.config
      event_idx.code.result_id = Value from enum
                                 sbi_pmu_hw_cache_op_result_id based on
                                 perf_event_attr.config
      event_info = 0
   C) perf_event_attr.type == PERF_TYPE_RAW and
      perf_event_attr.config[63:63] == 0
      event_idx.type = 0x2
      event_idx.code = 0x0
      event_info = perf_event_attr.config[62:0]
   D) perf_event_attr.type == PERF_TYPE_RAW and
      perf_event_attr.config[63:63] == 1
      event_idx.type = 0xf
      event_idx.code = Value from enum sbi_pmu_sw_id based on
                       perf_event_attr.config
      event_info = 0
   (Note: event_init() will fail if it is not able to figure out the
    event_idx and event_info values corresponding to perf_event_attr)
   (Note: event_init() will not assign a counter to the perf_event because
    that will be done by the add() callback)

3. add() callback
   The add() callback of the Linux RISC-V PMU driver will find a free
   counter on the current CPU/HART that supports the event_idx and
   event_info combination. To find and configure such a counter from
   a set of counters, we will use the SBI_PMU_COUNTER_CONFIG_MATCHING
   call.

4. del() callback
   The del() callback of the Linux RISC-V PMU driver will release or
   free the counter.

5. start() callback
   The start() callback of the Linux RISC-V PMU driver will start the
   counter using the SBI_PMU_COUNTER_START call.

6. stop() callback
   The stop() callback of the Linux RISC-V PMU driver will stop the
   counter using the SBI_PMU_COUNTER_STOP call.
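
The event_init() translation rules A) to D) could look roughly like the C function below. It is a sketch under stated assumptions: the PERF_TYPE_* values and the HW_CACHE config packing (cache_id | op_id << 8 | result_id << 16) are taken from the Linux perf_event ABI, the *_MAX bounds come from the enums in this proposal, pmu_translate_attr() is a hypothetical name, and for D) the SOFTWARE event code is assumed to be perf_event_attr.config[62:0] used directly.

```c
#include <errno.h>
#include <stdint.h>

/* perf_event_attr.type values, as in <linux/perf_event.h>. */
#define PERF_TYPE_HARDWARE 0
#define PERF_TYPE_HW_CACHE 3
#define PERF_TYPE_RAW      4

/* Number of valid values in each enum of this proposal (the *_MAX entries). */
#define SBI_PMU_HW_MAX              10
#define SBI_PMU_HW_CACHE_MAX         7
#define SBI_PMU_HW_CACHE_OP_MAX      3
#define SBI_PMU_HW_CACHE_RESULT_MAX  2
#define SBI_PMU_SW_MAX              12

/* event_idx[19:16] = type, event_idx[15:0] = code */
static uint32_t make_event_idx(uint32_t type, uint32_t code)
{
    return ((type & 0xFu) << 16) | (code & 0xFFFFu);
}

/* Hypothetical event_init() helper: 0 on success, -EINVAL if unmappable. */
int pmu_translate_attr(uint32_t attr_type, uint64_t config,
                       uint32_t *event_idx, uint64_t *event_info)
{
    const uint64_t low63 = config & ~(UINT64_C(1) << 63);

    switch (attr_type) {
    case PERF_TYPE_HARDWARE:        /* A) type 0x0, code = sbi_pmu_hw_id */
        if (config >= SBI_PMU_HW_MAX)
            return -EINVAL;
        *event_idx = make_event_idx(0x0, (uint32_t)config);
        *event_info = 0;
        return 0;
    case PERF_TYPE_HW_CACHE: {      /* B) type 0x1, code = cache|op|result */
        uint32_t cache_id = (uint32_t)(config & 0xFF);
        uint32_t op_id = (uint32_t)((config >> 8) & 0xFF);
        uint32_t result_id = (uint32_t)((config >> 16) & 0xFF);

        if (cache_id >= SBI_PMU_HW_CACHE_MAX ||
            op_id >= SBI_PMU_HW_CACHE_OP_MAX ||
            result_id >= SBI_PMU_HW_CACHE_RESULT_MAX)
            return -EINVAL;
        /* code[15:3] = cache_id, code[2:1] = op_id, code[0] = result_id */
        *event_idx = make_event_idx(0x1, (cache_id << 3) | (op_id << 1) | result_id);
        *event_info = 0;
        return 0;
    }
    case PERF_TYPE_RAW:
        if (!(config >> 63)) {      /* C) HARDWARE RAW: type 0x2, code 0 */
            *event_idx = make_event_idx(0x2, 0);
            *event_info = low63;
        } else {                    /* D) SOFTWARE: type 0xf */
            if (low63 >= SBI_PMU_SW_MAX)
                return -EINVAL;
            *event_idx = make_event_idx(0xF, (uint32_t)low63);
            *event_info = 0;
        }
        return 0;
    default:
        return -EINVAL;
    }
}
```

For example, a DTLB write access (config 0x103) becomes event_idx 0x1001A: type 0x1 with code (3 << 3) | (1 << 1) | 0.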

Regards,
Anup




Greg Favor
 

On Thu, Aug 6, 2020 at 11:36 AM Jonathan Behrens <behrensj@...> wrote:
It would also be nice if possible to pin down how S-mode software can learn the meanings of raw counters. Is it sufficient to look at the tuple of (mvendorid, marchid, mimpid)? Even just some commentary text with guidance could be helpful here.

But by definition RAW events are all implementation-specific (unless or until an arch extension standardizes a set of RAW events and their encodings).  At best, the software discovery method that the tech-config TG has started developing could maybe be used to provide this implementation-specific information.  (Or unstandardized code could do its own discovery based on looking at things like mvendorid/marchid/mimpid.)
 
Finally, I wanted to ask about the SBI_PMU_COUNTER_SET_PHYS_ADDR function. Apologies if this has been answered already, but I think this might not work well with the enhanced PMP proposal that is designed to allow most of DRAM to be marked as S/U-mode only.

Interesting point.  Requiring M mode to have access to most S/U mode memory would defeat a lot of the purpose and security benefits of Enhanced PMP (aka PMPv2).
 
The proposal allows regions to be shared between M-mode and S/U-mode but presumably an implementation would prefer to require only a single shared region with all counters instead of needing to use NUM_COUNTERS number of PMP entries. This could be enabled by making the interface be SBI_PMU_COUNTER_GET_PHYS_ADDR so the firmware gets to pick the locations.

A key question is, in any case, who allocates the memory where software counters are placed?  How is that memory allocated in coordination with the OS or hypervisor?

Shouldn't the OS or hypervisor do the allocation and then tell M-mode the address of that block of memory?  Then M-mode can allocate space for individual counters from that.  And, in the context of PMPv2, that block of memory would be allocated from an existing "shared" PMP region.
 
On this front, another thing to watch is the memory attributes proposals coming out of the virtual memory task group: shared mappings might have performance costs (to avoid issues with mismatches between M-mode and S-mode attributes).

All the more reason that the OS/hypervisor should be allocating the block of memory for software counters.  The OS/hypervisor will be aware of the memory attributes set up in the page tables and can make sure to use appropriate attribute settings in the PTEs that map this memory.

Greg
 


Anup Patel
 

Hi Greg,

The SBI_PMU_NUM_COUNTERS call will return the total number of counters (HARDWARE as well as SOFTWARE).

We can distinguish between HARDWARE and SOFTWARE counters using the CSR_Number returned by the SBI_PMU_COUNTER_GET_CSR call.

(Note: CSR_Number = 0xfff means it is a SOFTWARE counter)

Regards,
Anup

From: Greg Favor <gfavor@...>
Sent: 07 August 2020 00:01
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Atish Patra <Atish.Patra@...>; Andrew Waterman <andrew@...>
Subject: Re: Proposal v4: SBI PMU Extension

Anup,

What does SBI_PMU_NUM_COUNTERS return insofar as distinguishing hardware versus software counters?

Greg

On Thu, Aug 6, 2020 at 6:52 AM Anup Patel <Anup.Patel@...> wrote:


Greg Favor
 

Anup,

Wouldn't software want to more easily and directly know from SBI_PMU_NUM_COUNTERS how many hardware counters and software counters there are, instead of having to then call SBI_PMU_COUNTER_GET_CSR N times to figure that out?

Or are you expecting that software is going to have to call SBI_PMU_COUNTER_GET_CSR N times in any case (after first calling SBI_PMU_NUM_COUNTERS to get 'N')?

Greg

P.S. I'm guessing that the valid/supported counters are the first N values of counter_idx, from 0 to N-1.  Yes?

On Thu, Aug 6, 2020 at 10:33 PM Anup Patel <Anup.Patel@...> wrote:


Anup Patel
 

Hi Greg,

We want to allow RISC-V implementation-specific CSRs (apart from the various HPMCOUNTER CSRs) to be used as HARDWARE counters. For this reason, we have decoupled counter_idx from the CSR number, and we treat counter_idx as a logical number assigned to each HARDWARE/SOFTWARE counter.

The S-mode software needs to call SBI_PMU_COUNTER_GET_CSR only once for each counter which can be done at boot-time OR lazily once before using the counter.
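
A C sketch of this pattern, which also shows how S-mode could split the N counters into HARDWARE and SOFTWARE ones: query each counter_idx at most once, decode CSR_Number from return_value[11:0] and CSR_Width from return_value[19:12], and cache the result. It assumes valid counter_idx values are 0 to N-1 (as guessed in Greg's P.S., not confirmed in this reply). pmu_get_csr_fn stands in for the real ecall, whose function ID this proposal does not define, and fake_get_csr() is a hypothetical host-side stand-in so the sketch can run; all names here are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PMU_MAX_COUNTERS 64
#define PMU_CSR_SOFTWARE 0xfff

/* Stand-in for the SBI_PMU_COUNTER_GET_CSR ecall: returns an SBI error code
 * (0 on success) and stores the encoded return_value in *value. */
typedef long (*pmu_get_csr_fn)(unsigned long counter_idx, unsigned long *value);

struct pmu_counter_info {
    bool valid;          /* GET_CSR already called for this counter */
    uint16_t csr_number; /* return_value[11:0] */
    uint8_t csr_width;   /* return_value[19:12] */
};

struct pmu_cache {
    size_t num_counters; /* from SBI_PMU_NUM_COUNTERS */
    pmu_get_csr_fn get_csr;
    struct pmu_counter_info info[PMU_MAX_COUNTERS];
};

/* Lazily fetch and decode counter details: one SBI call per counter_idx. */
const struct pmu_counter_info *pmu_get_info(struct pmu_cache *c, size_t idx)
{
    unsigned long value;

    if (idx >= c->num_counters || idx >= PMU_MAX_COUNTERS)
        return NULL;
    if (!c->info[idx].valid) {
        if (c->get_csr(idx, &value) != 0)
            return NULL;
        c->info[idx].csr_number = (uint16_t)(value & 0xfff);
        c->info[idx].csr_width = (uint8_t)((value >> 12) & 0xff);
        c->info[idx].valid = true;
    }
    return &c->info[idx];
}

/* Boot-time variant: walk all counters and count HARDWARE vs SOFTWARE. */
void pmu_count_kinds(struct pmu_cache *c, size_t *num_hw, size_t *num_sw)
{
    *num_hw = *num_sw = 0;
    for (size_t i = 0; i < c->num_counters; i++) {
        const struct pmu_counter_info *ci = pmu_get_info(c, i);

        if (!ci)
            continue;
        if (ci->csr_number == PMU_CSR_SOFTWARE)
            (*num_sw)++;
        else
            (*num_hw)++;
    }
}

/* Hypothetical host-side stand-in for the ecall, for illustration only:
 * counters 0-2 are cycle (0xc00), instret (0xc02) and hpmcounter3 (0xc03),
 * all 64 bits wide; every other counter is a SOFTWARE counter. */
unsigned int fake_get_csr_calls;
long fake_get_csr(unsigned long counter_idx, unsigned long *value)
{
    static const unsigned long csr[] = { 0xc00, 0xc02, 0xc03 };
    unsigned long num = counter_idx < 3 ? csr[counter_idx] : PMU_CSR_SOFTWARE;

    fake_get_csr_calls++;
    *value = (64UL << 12) | num; /* CSR_Width = 64 */
    return 0;
}
```

Because results are cached, a driver can call pmu_count_kinds() at probe time or simply call pmu_get_info() from add(); either way GET_CSR is issued once per counter.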

Regards,
Anup

From: Greg Favor <gfavor@...>
Sent: 07 August 2020 11:13
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Atish Patra <Atish.Patra@...>; Andrew Waterman <andrew@...>
Subject: Re: Proposal v4: SBI PMU Extension

 

Anup,

 

Wouldn't software want to more easily and directly know from SBI_PMU_NUM_COUNTERS how many hardware counters and software counters there are, instead of having to then call SBI_PMU_COUNTER_GET_CSR N times to figure that out?

 

Or are you expecting that software is going to have to call SBI_PMU_COUNTER_GET_CSR N times in any case (after first calling SBI_PMU_NUM_COUNTERS to get 'N')?

 

Greg

 

P.S. I'm guessing that the valid/supported counters are the first N value of counter_idx from 0 to N-1.  Yes?

 

On Thu, Aug 6, 2020 at 10:33 PM Anup Patel <Anup.Patel@...> wrote:

Hi Greg,

 

The SBI_PMU_NUM_COUNTERS call will return total number of counters (HARDWARE as well as SOFTWARE).

 

We can distinguish between HARDWARE and SOFTWARE counters using the CSR_Number returned by SBI_PMU_COUNTER_GET_CSR call.

(Note: CSR_Number = 0xfff means it is SOFTWARE counter)

 

Regards,

Anup

 

From: Greg Favor <gfavor@...>
Sent: 07 August 2020 00:01
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Atish Patra <Atish.Patra@...>; Andrew Waterman <andrew@...>
Subject: Re: Proposal v4: SBI PMU Extension

 

Anup,

What does SBI_PMU_NUM_COUNTERS return insofar as distinguishing hardware versus software counters?

Greg

 

On Thu, Aug 6, 2020 at 6:52 AM Anup Patel <Anup.Patel@...> wrote:

Hi All,

We don't have a dedicated RISC-V PMU extension for all privilege modes,
but we do have M-mode HARDWARE performance counters such as the MCYCLE,
MINSTRET, and MHPMCOUNTER CSRs, which S-mode and U-mode can only read
(through the CYCLE, INSTRET, and HPMCOUNTER CSRs). A RISC-V
implementation can monitor various HARDWARE events using a limited
number of HARDWARE performance counters.

In addition to HARDWARE performance counters, an SBI implementation
(e.g. OpenSBI, Xvisor, KVM, etc.) can provide SOFTWARE counters for
events such as the number of RFENCEs, IPIs, misaligned load/store
instructions, illegal instructions, etc.

We propose an SBI PMU extension, which will help S-mode (or VS-mode)
software discover and configure HARDWARE/SOFTWARE counters. The SBI
PMU extension will only manage per-HART (or per-CPU) HARDWARE/SOFTWARE
counters.

Using the SBI PMU extension, an SBI implementation (OpenSBI, KVM, or
Xvisor) will provide a standardized view of HARDWARE/SOFTWARE counters
and events to S-mode (or VS-mode) software.

Before defining the SBI PMU extension calls, we first define the
counter_idx, event_idx, and event_info entities. The counter_idx is a
logical number assigned to each HARDWARE/SOFTWARE counter. The
event_idx represents a HARDWARE/SOFTWARE event, whereas event_info
represents additional configuration/parameters for the event.

The event_idx is a 20-bit number encoded as follows:
event_idx[19:16] = type
event_idx[15:0] = code
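A minimal C sketch of the event_idx layout above, for illustration only. The helper and macro names are hypothetical; only the bit positions (type in [19:16], code in [15:0]) come from this proposal.

```c
#include <stdint.h>

/* event_idx is 20 bits wide: type in [19:16], code in [15:0]. */
#define SBI_PMU_EVENT_IDX_TYPE_SHIFT 16
#define SBI_PMU_EVENT_IDX_TYPE_MASK  0xFu
#define SBI_PMU_EVENT_IDX_CODE_MASK  0xFFFFu

/* Hypothetical helper: build an event_idx from its type and code fields. */
static inline uint32_t sbi_pmu_event_idx(uint32_t type, uint32_t code)
{
    return ((type & SBI_PMU_EVENT_IDX_TYPE_MASK) << SBI_PMU_EVENT_IDX_TYPE_SHIFT) |
           (code & SBI_PMU_EVENT_IDX_CODE_MASK);
}

/* Hypothetical helpers: extract each field back out of an event_idx. */
static inline uint32_t sbi_pmu_event_type(uint32_t event_idx)
{
    return (event_idx >> SBI_PMU_EVENT_IDX_TYPE_SHIFT) & SBI_PMU_EVENT_IDX_TYPE_MASK;
}

static inline uint32_t sbi_pmu_event_code(uint32_t event_idx)
{
    return event_idx & SBI_PMU_EVENT_IDX_CODE_MASK;
}
```

For example, sbi_pmu_event_idx(0xf, 4) yields 0xF0004, the SOFTWARE event SBI_PMU_SW_LOCAL_IPI defined below.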

If event_idx.type == 0x0, then it is a HARDWARE event. For a HARDWARE
event, the event_info is optional and can be zero, whereas the
event_idx.code can be one of the following values:
enum sbi_pmu_hw_id {
    SBI_PMU_HW_CPU_CYCLES              = 0,
    SBI_PMU_HW_INSTRUCTIONS            = 1,
    SBI_PMU_HW_CACHE_REFERENCES        = 2,
    SBI_PMU_HW_CACHE_MISSES            = 3,
    SBI_PMU_HW_BRANCH_INSTRUCTIONS     = 4,
    SBI_PMU_HW_BRANCH_MISSES           = 5,
    SBI_PMU_HW_BUS_CYCLES              = 6,
    SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
    SBI_PMU_HW_STALLED_CYCLES_BACKEND  = 8,
    SBI_PMU_HW_REF_CPU_CYCLES          = 9,
    SBI_PMU_HW_MAX,                    /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 0x1, then it is a HARDWARE CACHE event. For a
HARDWARE CACHE event, the event_info is optional and can be zero,
whereas the event_idx.code is encoded as follows:
event_idx.code[15:3] = cache_id
event_idx.code[2:1] = op_id
event_idx.code[0:0] = result_id
enum sbi_pmu_hw_cache_id {
    SBI_PMU_HW_CACHE_L1D  = 0,
    SBI_PMU_HW_CACHE_L1I  = 1,
    SBI_PMU_HW_CACHE_LL   = 2,
    SBI_PMU_HW_CACHE_DTLB = 3,
    SBI_PMU_HW_CACHE_ITLB = 4,
    SBI_PMU_HW_CACHE_BPU  = 5,
    SBI_PMU_HW_CACHE_NODE = 6,
    SBI_PMU_HW_CACHE_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_id {
    SBI_PMU_HW_CACHE_OP_READ     = 0,
    SBI_PMU_HW_CACHE_OP_WRITE    = 1,
    SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
    SBI_PMU_HW_CACHE_OP_MAX,     /* non-ABI */
};
enum sbi_pmu_hw_cache_op_result_id {
    SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
    SBI_PMU_HW_CACHE_RESULT_MISS   = 1,
    SBI_PMU_HW_CACHE_RESULT_MAX,   /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
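The same idea applied to HARDWARE CACHE events: a hypothetical helper (not part of the proposal) that packs cache_id, op_id, and result_id into event_idx.code as laid out above and sets event_idx.type to 0x1.

```c
#include <stdint.h>

/* HARDWARE CACHE event: event_idx.type = 0x1 and
 * code[15:3] = cache_id, code[2:1] = op_id, code[0:0] = result_id. */
#define SBI_PMU_EVENT_TYPE_HW_CACHE 0x1u

/* Hypothetical helper: pack the three cache fields into a full event_idx. */
static inline uint32_t sbi_pmu_cache_event_idx(uint32_t cache_id, uint32_t op_id,
                                               uint32_t result_id)
{
    uint32_t code = ((cache_id & 0x1FFFu) << 3) | /* 13-bit cache_id field */
                    ((op_id & 0x3u) << 1) |       /* 2-bit op_id field */
                    (result_id & 0x1u);           /* 1-bit result_id field */
    return (SBI_PMU_EVENT_TYPE_HW_CACHE << 16) | code;
}
```

Under this layout, an L1D read miss (SBI_PMU_HW_CACHE_L1D, SBI_PMU_HW_CACHE_OP_READ, SBI_PMU_HW_CACHE_RESULT_MISS) encodes to event_idx 0x10001, and an LL write access encodes to 0x10012.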

If event_idx.type == 0x2, then it is a HARDWARE RAW event. For a
HARDWARE RAW event, the event_idx.code should be zero and the
event_info parameter passed to the SBI_PMU_COUNTER_CONFIG_MATCHING call
(described below) will hold the RAW event value to be programmed into
the MHPMEVENT CSR (i.e. the SBI implementation will not derive the
MHPMEVENT CSR value from event_idx and event_info).

If event_idx.type == 0xf, then it is a SOFTWARE event. For a SOFTWARE
event, the event_info is optional and can be zero, whereas the
event_idx.code can be one of the following:
enum sbi_pmu_sw_id {
    SBI_PMU_SW_MISALIGNED_LOAD        = 0,
    SBI_PMU_SW_MISALIGNED_STORE       = 1,
    SBI_PMU_SW_ILLEGAL_INSN           = 2,
    SBI_PMU_SW_LOCAL_SET_TIMER        = 3,
    SBI_PMU_SW_LOCAL_IPI              = 4,
    SBI_PMU_SW_LOCAL_FENCE_I          = 5,
    SBI_PMU_SW_LOCAL_SFENCE_VMA       = 6,
    SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID  = 7,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA      = 8,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA      = 10,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
    SBI_PMU_SW_MAX,                   /* non-ABI */
};

In the future, more events can be defined without breaking SBI call
backward compatibility.

Using the above definitions of counter_idx, event_idx, and event_info,
we can potentially have the following SBI calls:

1. SBI_PMU_NUM_COUNTERS
   Returns the total number of counters (HARDWARE as well as SOFTWARE).

2. SBI_PMU_COUNTER_GET_CSR
   This call takes one parameter:
      1) counter_idx
   Returns the CSR_Number and CSR_Width of the underlying counter,
   encoded in the return value as follows:
      return_value[11:0]      = CSR_Number
      return_value[19:12]     = CSR_Width (number of bits implemented in HW)
      return_value[XLEN-1:20] = Reserved
   If CSR_Number == 0xfff, then it is a SOFTWARE counter; otherwise it
   is a HARDWARE counter. This SBI call will fail for counters which are
   not present.

3. SBI_PMU_COUNTER_CONFIG_MATCHING
   This call takes four parameters:
      1) counter_idx_base
      2) counter_idx_mask
      3) event_idx
      4) event_info
   Find and configure a counter, from a set of counters, which can
   monitor the specified event. The counter_idx_base and counter_idx_mask
   parameters represent the set of counters, whereas event_idx and
   event_info represent the event to monitor. Upon success, the SBI call
   will return the counter_idx of the counter which has been configured
   to monitor the specified event. This SBI call will fail if it is
   unable to find a counter which can monitor the specified event, or if
   the set of counters specified via counter_idx_base and
   counter_idx_mask contains an invalid counter.

4. SBI_PMU_COUNTER_SET_PHYS_ADDR
   This call takes two parameters:
      1) counter_idx
      2) 8-byte aligned physical address
   It will set the physical address of the memory location where the SBI
   implementation will write the 64-bit SOFTWARE counter. This SBI call
   is only for counters not mapped to any CSR (i.e. only for counters
   with CSR_Number == 0xfff).

5. SBI_PMU_COUNTER_START
   This call takes two parameters:
      1) counter_idx
      2) initial_value
   It will inform the SBI implementation to start/enable the specified
   counter with the specified initial value. This SBI call will fail for
   counters which are not present.

6. SBI_PMU_COUNTER_STOP
   This call takes one parameter:
      1) counter_idx
   It will inform the SBI implementation to stop/disable the specified
   counter on the calling HART. This SBI call will fail for counters
   which are not present.
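The proposal does not pin down exactly how counter_idx_base and counter_idx_mask describe the set. The following C sketch models the matching step of SBI_PMU_COUNTER_CONFIG_MATCHING from the SBI implementation's side, assuming bit i of counter_idx_mask selects counter_idx_base + i, and modelling each counter's capability as a hypothetical bitmap of supported event_idx.type values. The counter table, function name, and choice of SBI error codes are illustrative, not part of the proposal.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define SBI_ERR_FAILED        (-1L)
#define SBI_ERR_INVALID_PARAM (-3L)

#define NUM_COUNTERS 8
#define HW_TYPES ((1u << 0x0) | (1u << 0x1) | (1u << 0x2)) /* HARDWARE, CACHE, RAW */
#define SW_TYPES (1u << 0xf)                               /* SOFTWARE */

/* Hypothetical per-hart counter table: counters 0-5 HARDWARE, 6-7 SOFTWARE. */
struct counter_slot {
    bool          in_use;
    uint32_t      supported_types; /* bit t set => can count event_idx.type t */
    uint32_t      event_idx;
    unsigned long event_info;
};

static struct counter_slot counters[NUM_COUNTERS] = {
    { false, HW_TYPES, 0, 0 }, { false, HW_TYPES, 0, 0 },
    { false, HW_TYPES, 0, 0 }, { false, HW_TYPES, 0, 0 },
    { false, HW_TYPES, 0, 0 }, { false, HW_TYPES, 0, 0 },
    { false, SW_TYPES, 0, 0 }, { false, SW_TYPES, 0, 0 },
};

/* Assumes bit i of counter_idx_mask selects counter_idx_base + i. */
static long pmu_config_matching(unsigned long base, unsigned long mask,
                                uint32_t event_idx, unsigned long event_info)
{
    const unsigned long nbits = sizeof(mask) * CHAR_BIT;
    uint32_t type = (event_idx >> 16) & 0xFu;
    unsigned long i;

    /* Fail if the set names a counter that does not exist. */
    for (i = 0; i < nbits; i++)
        if (((mask >> i) & 1UL) && base + i >= NUM_COUNTERS)
            return SBI_ERR_INVALID_PARAM;

    /* Configure the first free counter in the set that supports the event type. */
    for (i = 0; i < nbits; i++) {
        struct counter_slot *c;
        if (!((mask >> i) & 1UL))
            continue;
        c = &counters[base + i];
        if (c->in_use || !((c->supported_types >> type) & 1u))
            continue;
        c->in_use = true;
        c->event_idx = event_idx;
        c->event_info = event_info; /* a real implementation would program MHPMEVENT here */
        return (long)(base + i);
    }
    return SBI_ERR_FAILED;
}
```

Validating the whole set before allocating anything follows the failure rule in call 3: a request that names a non-existent counter fails even if an earlier counter in the set could have matched.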

The OpenSBI (M-mode runtime firmware) Development Notes:

1. The OpenSBI firmware will translate event_idx and event_info into a
   platform-dependent MHPMEVENT CSR value before starting/enabling a
   HARDWARE counter.

2. The OpenSBI firmware will need to know the following
   platform-dependent information:
   A) Possible event_idx values allowed (or supported) by each HARDWARE
      counter (i.e. MHPMCOUNTER)
   B) Mapping of event_idx for HARDWARE/CACHE events to MHPMEVENT CSR
      values. This is optional; by default OpenSBI will write a value
      <xyz> to the MHPMEVENT CSR where the lower 20 bits of <xyz> are
      event_idx and the upper XLEN-20 bits of <xyz> are the lower
      XLEN-20 bits of event_info
   C) Additional platform-specific programming required for selecting an
      event_idx + event_info combination; this is also optional for a
      platform.

3. All the platform-dependent information mentioned above can be
   obtained by OpenSBI firmware from platform-specific code. DT/ACPI can
   also be used to describe 2.A and 2.B mentioned above, but 2.C will
   always require platform-specific code.
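A sketch of the default MHPMEVENT value from 2.B for XLEN = 64, assuming the platform provides no event mapping of its own. The function name is hypothetical. This default applies to HARDWARE/CACHE events; for HARDWARE RAW events the proposal passes the MHPMEVENT value directly in event_info.

```c
#include <stdint.h>

/* Hypothetical helper for XLEN = 64: MHPMEVENT[19:0] = event_idx and
 * MHPMEVENT[63:20] = event_info[43:0] (the lower XLEN-20 bits). */
static inline uint64_t default_mhpmevent_value(uint32_t event_idx, uint64_t event_info)
{
    /* Shifting left by 20 drops event_info[63:44], as the default rule requires. */
    return (event_info << 20) | (uint64_t)(event_idx & 0xFFFFFu);
}
```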

Linux RISC-V PMU Driver Development Notes:

1. Driver probe
   The Linux RISC-V PMU driver can be a platform driver with "riscv,pmu"
   as its DT compatible string and an optional "interrupts" DT property.
   The "interrupts" DT property, if available, should specify the
   overflow interrupt for each HART. When the "interrupts" DT property is
   present, we might also need another DT property for mapping HARTIDs to
   entries in the "interrupts" DT property. The platform driver probe
   will:
   A) Ensure that the underlying SBI implementation provides the SBI PMU
      extension, using the sbi_probe_extension() API of arch/riscv.
   B) Detect the number of counters using the SBI_PMU_NUM_COUNTERS call.
   C) Get the CSR details of each counter using the
      SBI_PMU_COUNTER_GET_CSR call. If the counter is a SOFTWARE counter,
      then use the SBI_PMU_COUNTER_SET_PHYS_ADDR call to set the memory
      location of the counter. The driver can skip this in driver probe
      and instead do it lazily in the add() callback mentioned below.

2. event_init() callback
   The event_init() callback will primarily translate the user-space
   perf_event_attr to an SBI PMU event_idx and event_info. It can do
   this in the following way:
   A) perf_event_attr.type == PERF_TYPE_HARDWARE
      event_idx.type = 0x0
      event_idx.code = Value from enum sbi_pmu_hw_id based on
                       perf_event_attr.config
      event_info = 0
   B) perf_event_attr.type == PERF_TYPE_HW_CACHE
      event_idx.type = 0x1
      event_idx.code.cache_id = Value from enum sbi_pmu_hw_cache_id
                                based on perf_event_attr.config
      event_idx.code.op_id = Value from enum sbi_pmu_hw_cache_op_id
                             based on perf_event_attr.config
      event_idx.code.result_id = Value from enum
                                 sbi_pmu_hw_cache_op_result_id based on
                                 perf_event_attr.config
      event_info = 0
   C) perf_event_attr.type == PERF_TYPE_RAW and
      perf_event_attr.config[63:63] == 0
      event_idx.type = 0x2
      event_idx.code = 0x0
      event_info = perf_event_attr.config[62:0]
   D) perf_event_attr.type == PERF_TYPE_RAW and
      perf_event_attr.config[63:63] == 1
      event_idx.type = 0xf
      event_idx.code = Value from enum sbi_pmu_sw_id based on
                       perf_event_attr.config
      event_info = 0
   (Note: event_init() will fail if it is not able to figure out the
    event_idx and event_info values corresponding to perf_event_attr)
   (Note: event_init() will not assign a counter to the perf_event
    because that will be done by the add() callback)

3. add() callback
   The add() callback of the Linux RISC-V PMU driver will find a free
   counter on the current CPU/HART such that the event_idx and
   event_info combination is supported by the counter. To find and
   configure a counter to monitor the event_idx and event_info
   combination from a set of counters, we will use the
   SBI_PMU_COUNTER_CONFIG_MATCHING call.

4. del() callback
   The del() callback of the Linux RISC-V PMU driver will release or
   free the counter.

5. start() callback
   The start() callback of the Linux RISC-V PMU driver will start the
   counter using the SBI_PMU_COUNTER_START call.

6. stop() callback
   The stop() callback of the Linux RISC-V PMU driver will stop the
   counter using the SBI_PMU_COUNTER_STOP call.
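Step 2 (event_init()) can be sketched in C as below. This is a hypothetical, self-contained translation routine, not the actual Linux driver: it defines its own copies of the PERF_TYPE_* values (which match <linux/perf_event.h>), uses perf's PERF_TYPE_HW_CACHE config layout (cache id in bits [7:0], op in [15:8], result in [23:16]), and assumes that for case D the low bits of perf_event_attr.config carry an sbi_pmu_sw_id value directly.

```c
#include <stdint.h>

/* perf_event_attr.type values, as in <linux/perf_event.h>. */
#define PERF_TYPE_HARDWARE 0u
#define PERF_TYPE_HW_CACHE 3u
#define PERF_TYPE_RAW      4u

struct sbi_pmu_event {
    uint32_t event_idx;
    uint64_t event_info;
};

/* Hypothetical translation following cases A-D; returns 0 on success, -1 otherwise. */
static int riscv_pmu_translate(uint32_t type, uint64_t config,
                               struct sbi_pmu_event *out)
{
    const uint64_t bit63 = 1ULL << 63;

    switch (type) {
    case PERF_TYPE_HARDWARE:                  /* case A */
        if (config > 9)                       /* past SBI_PMU_HW_REF_CPU_CYCLES */
            return -1;
        out->event_idx = (0x0u << 16) | (uint32_t)config;
        out->event_info = 0;
        return 0;
    case PERF_TYPE_HW_CACHE: {                /* case B */
        uint32_t cache_id  = (uint32_t)(config & 0xFF);
        uint32_t op_id     = (uint32_t)((config >> 8) & 0xFF);
        uint32_t result_id = (uint32_t)((config >> 16) & 0xFF);
        if (cache_id > 6 || op_id > 2 || result_id > 1)
            return -1;
        out->event_idx = (0x1u << 16) | (cache_id << 3) | (op_id << 1) | result_id;
        out->event_info = 0;
        return 0;
    }
    case PERF_TYPE_RAW:
        if (config & bit63) {                 /* case D: SOFTWARE event */
            uint64_t code = config & ~bit63;
            if (code > 11)                    /* past SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID */
                return -1;
            out->event_idx = (0xFu << 16) | (uint32_t)code;
            out->event_info = 0;
        } else {                              /* case C: HARDWARE RAW event */
            out->event_idx = 0x2u << 16;      /* event_idx.code = 0 */
            out->event_info = config;         /* config[62:0]; bit 63 is already 0 */
        }
        return 0;
    default:
        return -1;
    }
}
```

Because the SBI enums mirror the perf ones, cases A and B need only range checks and bit shuffling; the interesting policy decision is case D, where the proposal repurposes config bit 63 to select SOFTWARE events.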

Regards,
Anup


Greg Favor
 

Thanks.  That's what I was speculating in the second half of my question, i.e. software will have to call SBI_PMU_COUNTER_GET_CSR N times no matter what (after obtaining 'N' from a call to SBI_PMU_NUM_COUNTERS).  Then software will know which counters are hardware counters to later allocate from and which are software counters to later allocate from.

Consequently there is no value in SBI_PMU_NUM_COUNTERS returning separate values for the number of hardware and software counters.
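The probe-time pass described here, one SBI_PMU_COUNTER_GET_CSR call per counter after SBI_PMU_NUM_COUNTERS, can be sketched as follows. Everything below is hypothetical: sbi_pmu_counter_get_csr() is a stand-in for the real SBI call, backed by made-up table data, and counters are assumed to be numbered 0 to N-1 (the assumption from the earlier P.S.).

```c
#include <stdint.h>

#define SBI_PMU_CSR_NUM_SOFTWARE 0xFFFu

/* Made-up data: encoded GET_CSR return values for counters 0..4. */
static const long fake_get_csr_table[] = {
    (64L << 12) | 0xC00, /* cycle, 64 bits */
    (64L << 12) | 0xC02, /* instret, 64 bits */
    (48L << 12) | 0xC03, /* hpmcounter3, 48 bits implemented */
    (64L << 12) | 0xFFF, /* SOFTWARE counter */
    (64L << 12) | 0xFFF, /* SOFTWARE counter */
};

/* Hypothetical stand-in for the ecall. A real SBI call returns an error code
 * and a value separately; this mock folds them into one long for brevity. */
static long sbi_pmu_counter_get_csr(unsigned long counter_idx)
{
    if (counter_idx >= sizeof(fake_get_csr_table) / sizeof(fake_get_csr_table[0]))
        return -3; /* counter not present */
    return fake_get_csr_table[counter_idx];
}

struct pmu_counter_info {
    uint16_t csr_num;   /* return_value[11:0] */
    uint8_t  csr_width; /* return_value[19:12]; reserved upper bits ignored */
};

/* Classify counters 0..num_counters-1 with one GET_CSR call each.
 * Returns the HARDWARE count and stores the SOFTWARE count in *num_sw,
 * or returns -1 if any call fails. */
static int pmu_probe_counters(unsigned long num_counters,
                              struct pmu_counter_info *info, int *num_sw)
{
    int num_hw = 0;

    *num_sw = 0;
    for (unsigned long i = 0; i < num_counters; i++) {
        long ret = sbi_pmu_counter_get_csr(i);
        if (ret < 0)
            return -1;
        info[i].csr_num = (uint16_t)(ret & 0xFFF);
        info[i].csr_width = (uint8_t)((ret >> 12) & 0xFF);
        if (info[i].csr_num == SBI_PMU_CSR_NUM_SOFTWARE)
            (*num_sw)++;
        else
            num_hw++;
    }
    return num_hw;
}
```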

Greg


On Thu, Aug 6, 2020 at 10:50 PM Anup Patel <Anup.Patel@...> wrote:



Anup Patel
 

Hi Jonathan,

 

I agree “the event_info is optional and can be zero” is totally misleading for HARDWARE and CACHE events. Thanks for pointing this out. Eventually, event_info will be used to pass configuration flags (event filter, overflow interrupt, etc.) on RISC-V systems having enhanced HPMCOUNTERs (Greg’s proposal).

How about “the event_info is additional configuration and can be zero by default”? Suggestions?

You are right about the SBI_PMU_COUNTER_SET_PHYS_ADDR call. I did not consider the ePMP proposal. Thanks for pointing this out.

Your idea of an SBI_PMU_COUNTER_GET_PHYS_ADDR call looks good to me, but (as you already mentioned) we are inviting issues related to shared mappings (or conflicting cache attributes) with both the SBI_PMU_COUNTER_SET_PHYS_ADDR and SBI_PMU_COUNTER_GET_PHYS_ADDR calls.

I have another suggestion. How about replacing the SBI_PMU_COUNTER_GET_PHYS_ADDR call with an SBI_PMU_SOFTWARE_COUNTER_READ call to read SOFTWARE counters? The SBI_PMU_SOFTWARE_COUNTER_READ call will always fail for HARDWARE counters. The downside here is the overhead of an SBI call to read a SOFTWARE counter.
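A rough model of the SBI_PMU_SOFTWARE_COUNTER_READ alternative, in which SOFTWARE counter values stay in firmware-private memory so no M-mode/S-mode shared mapping is needed. The fw_* function names, the counter table, and the error codes chosen are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define SBI_SUCCESS            0L
#define SBI_ERR_NOT_SUPPORTED (-2L)
#define SBI_ERR_INVALID_PARAM (-3L)
#define NUM_COUNTERS 4

/* Hypothetical per-hart counter state kept in firmware-private memory. */
struct counter_state {
    bool     is_software; /* CSR_Number == 0xfff */
    bool     running;
    uint64_t value;       /* only meaningful for SOFTWARE counters */
};

static struct counter_state counters[NUM_COUNTERS] = {
    { false, false, 0 }, { false, false, 0 }, /* HARDWARE counters */
    { true, false, 0 },  { true, false, 0 },  /* SOFTWARE counters */
};

struct fw_read_ret { long error; uint64_t value; };

long fw_counter_start(unsigned long idx, uint64_t initial_value)
{
    if (idx >= NUM_COUNTERS)
        return SBI_ERR_INVALID_PARAM;
    /* A HARDWARE counter would also have its counter CSR programmed here; omitted. */
    counters[idx].value = initial_value;
    counters[idx].running = true;
    return SBI_SUCCESS;
}

long fw_counter_stop(unsigned long idx)
{
    if (idx >= NUM_COUNTERS)
        return SBI_ERR_INVALID_PARAM;
    counters[idx].running = false;
    return SBI_SUCCESS;
}

/* Called inside the SBI implementation whenever a counted event occurs. */
void fw_counter_event(unsigned long idx)
{
    if (idx < NUM_COUNTERS && counters[idx].is_software && counters[idx].running)
        counters[idx].value++;
}

/* The proposed read call: always fails for HARDWARE counters. */
struct fw_read_ret fw_software_counter_read(unsigned long idx)
{
    struct fw_read_ret r = { SBI_SUCCESS, 0 };

    if (idx >= NUM_COUNTERS)
        r.error = SBI_ERR_INVALID_PARAM;
    else if (!counters[idx].is_software)
        r.error = SBI_ERR_NOT_SUPPORTED;
    else
        r.value = counters[idx].value;
    return r;
}
```

The trade-off mentioned above shows up directly: every read is a call into the SBI implementation rather than a plain load from shared memory.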

 

Regards,

Anup

 

From: Jonathan Behrens <behrensj@...>
Sent: 07 August 2020 00:06
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Atish Patra <Atish.Patra@...>; Andrew Waterman <andrew@...>; Greg Favor <gfavor@...>
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v4: SBI PMU Extension

 

I like this proposal! A couple of comments...

In a couple of places you say "the event_info is optional and can be zero". Does this mean that SBI providers must ignore the field, that non-zero values are reserved (meaning software must set it to zero), or that other values have SBI-implementation-specific semantics? Or something else?

The OpenSBI firmware will need to know following platform dependent information: [...]

    B) Mapping of event_idx for HARDWARE/CACHE event to MHPMEVENT CSR value. This is optional and by default OpenSBI will write a value <xyz> to MHPMEVENT CSR where lower 20bits of <xyz> is event_idx and upper XLEN-20 bits of <xyz> are lower XLEN-20 bits of event_info

This seems to contradict the previous point. By saying how OpenSBI is going to use the event_info field, you are effectively requiring that the OS properly set it. I'd rather the SBI provider just return an error if it can't figure out the proper mapping to an MHPMEVENT CSR value. That way there is no risk that S-mode software accidentally ends up tracking the wrong performance counter because it didn't know what to set event_info to. Put another way, the SBI provider is supposed to be the one that deals with platform-specific issues, so the operating system doesn't have to.

It would also be nice, if possible, to pin down how S-mode software can learn the meanings of raw counters. Is it sufficient to look at the tuple of (mvendorid, marchid, mimpid)? Even just some commentary text with guidance could be helpful here.

Finally, I wanted to ask about the SBI_PMU_COUNTER_SET_PHYS_ADDR function. Apologies if this has been answered already, but I think this might not work well with the enhanced PMP proposal, which is designed to allow most of DRAM to be marked as S/U-mode only. The proposal allows regions to be shared between M-mode and S/U-mode, but presumably an implementation would prefer to require only a single shared region with all counters instead of needing NUM_COUNTERS PMP entries. This could be enabled by making the interface SBI_PMU_COUNTER_GET_PHYS_ADDR so the firmware gets to pick the locations. On this front, another thing to watch is the memory attributes proposals coming out of the virtual memory task group: shared mappings might have performance costs (to avoid issues with mismatches between M-mode and S-mode attributes).

Jonathan

 

On Thu, Aug 6, 2020 at 9:52 AM Anup Patel via lists.riscv.org <anup.patel=wdc.com@...> wrote:

Hi All,

We don't have a dedicated RISC-V PMU extension for all privilege modes
but we do have M-mode HARDWARE performance counters such as MCYCLE CSR,
MINSTRET CSR, and MHPMCOUNTER CSRs which are read-only for S-mode and
U-mode. A RISC-V implementation can support monitoring of various
HARDWARE events using limited number of HARDWARE performance counters.

In addition to HARDWARE performance counters, a SBI implementation
(e.g. OpenSBI, Xvisor, KVM, etc) can provide SOFTWARE counters for
events such as number of RFENCEs, number of IPIs, number of misaligned
load/store instructions, number of illegal instructions, etc.

We propose SBI PMU extension, which will help S-mode (or VS-mode)
software to discover and configure HARDWARE/SOFTWARE counters. The SBI
PMU extension will only manage per-HART (or per-CPU) HARDWARE/SOFTWARE
counters.

Using SBI PMU extension, a SBI implementation (OpenSBI, KVM, or Xvisor)
will provide a standardized view of HARDWARE/SOFTWARE counters and
events to S-mode (or VS-mode) software.

Before defining SBI PMU extension calls, we first define counter_idx,
event_idx, and event_info entities. The counter_idx is a logical number
assigned to each HARDWARE/SOFTWARE counter. The event_idx represents a
HARDWARE/SOFTWARE event whereas event_info represents additional
configuration/parameters for the event.

The event_idx is a 20bits wide number encoded as follows:
event_idx[19:16] = type
event_idx[15:0] = code

If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE event,
the event_info is optional and can be zero whereas the event_idx.code
can be one of the following values:
enum sbi_pmu_hw_id {
    SBI_PMU_HW_CPU_CYCLES              = 0,
    SBI_PMU_HW_INSTRUCTIONS            = 1,
    SBI_PMU_HW_CACHE_REFERENCES        = 2,
    SBI_PMU_HW_CACHE_MISSES            = 3,
    SBI_PMU_HW_BRANCH_INSTRUCTIONS     = 4,
    SBI_PMU_HW_BRANCH_MISSES           = 5,
    SBI_PMU_HW_BUS_CYCLES              = 6,
    SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
    SBI_PMU_HW_STALLED_CYCLES_BACKEND  = 8,
    SBI_PMU_HW_REF_CPU_CYCLES          = 9,
    SBI_PMU_HW_MAX,                    /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE
CACHE event, the event_info is optional and can be zero whereas the
event_idx.code is encoded as follows:
event_idx.code[15:3] = cache_id
event_idx.code[2:1] = op_id
event_idx.code[0:0] = result_id
enum sbi_pmu_hw_cache_id {
    SBI_PMU_HW_CACHE_L1D  = 0,
    SBI_PMU_HW_CACHE_L1I  = 1,
    SBI_PMU_HW_CACHE_LL   = 2,
    SBI_PMU_HW_CACHE_DTLB = 3,
    SBI_PMU_HW_CACHE_ITLB = 4,
    SBI_PMU_HW_CACHE_BPU  = 5,
    SBI_PMU_HW_CACHE_NODE = 6,
    SBI_PMU_HW_CACHE_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_id {
    SBI_PMU_HW_CACHE_OP_READ     = 0,
    SBI_PMU_HW_CACHE_OP_WRITE    = 1,
    SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
    SBI_PMU_HW_CACHE_OP_MAX,     /* non-ABI */
};
enum sbi_pmu_hw_cache_op_result_id {
    SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
    SBI_PMU_HW_CACHE_RESULT_MISS   = 1,
    SBI_PMU_HW_CACHE_RESULT_MAX,   /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
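Below is a minimal sketch of how a caller might assemble a HARDWARE CACHE event_idx from the three fields above. The function name is hypothetical, and the enum values are written as plain numbers so that the example stands alone.

```c
#include <stdint.h>

/* Hypothetical helper: build a HARDWARE CACHE event_idx (type 0x1) using
 *   code[15:3] = cache_id, code[2:1] = op_id, code[0:0] = result_id */
static inline uint32_t sbi_pmu_cache_event_idx(uint32_t cache_id,
                                               uint32_t op_id,
                                               uint32_t result_id)
{
    uint32_t code = ((cache_id & 0x1FFFu) << 3) |  /* 13-bit cache_id  */
                    ((op_id & 0x3u) << 1) |        /* 2-bit op_id      */
                    (result_id & 0x1u);            /* 1-bit result_id  */

    return (0x1u << 16) | code;                    /* type 0x1 in [19:16] */
}
```

For example, an L1 data cache read miss (cache_id 0, op_id 0, result_id 1) encodes as event_idx 0x10001. A last-level cache write access (2, 1, 0) encodes as 0x10012.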

If event_idx.type == 0x2 then it is a HARDWARE RAW event. For a
HARDWARE RAW event, the event_idx.code should be zero and the
event_info parameter passed to the SBI_PMU_COUNTER_CONFIG_MATCHING call
(described below) will carry the RAW event value to be programmed into
the MHPMEVENT CSR (i.e. the SBI implementation will not derive the
MHPMEVENT CSR value from event_idx and event_info).

If event_idx.type == 0xf then it is a SOFTWARE event. For a SOFTWARE
event, the event_info is optional and can be zero, whereas the
event_idx.code can be one of the following:
enum sbi_pmu_sw_id {
    SBI_PMU_SW_MISALIGNED_LOAD        = 0,
    SBI_PMU_SW_MISALIGNED_STORE       = 1,
    SBI_PMU_SW_ILLEGAL_INSN           = 2,
    SBI_PMU_SW_LOCAL_SET_TIMER        = 3,
    SBI_PMU_SW_LOCAL_IPI              = 4,
    SBI_PMU_SW_LOCAL_FENCE_I          = 5,
    SBI_PMU_SW_LOCAL_SFENCE_VMA       = 6,
    SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID  = 7,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA      = 8,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA      = 10,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
    SBI_PMU_SW_MAX,                   /* non-ABI */
};
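The sketch below checks whether an event_idx names one of the events defined so far. It takes its bounds from the *_MAX values of the enums above and rejects the currently unassigned types 0x3 to 0xe. New events may be added later, so a check like this only reflects the current version of the proposal. The function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bounds copied from the (non-ABI) *_MAX enumerators above. */
#define EV_HW_MAX        10u  /* SBI_PMU_HW_MAX */
#define EV_CACHE_MAX      7u  /* SBI_PMU_HW_CACHE_MAX */
#define EV_CACHE_OP_MAX   3u  /* SBI_PMU_HW_CACHE_OP_MAX */
#define EV_SW_MAX        12u  /* SBI_PMU_SW_MAX */

/* Hypothetical validity check against this version of the proposal. */
static bool sbi_pmu_event_idx_valid(uint32_t event_idx)
{
    uint32_t type = event_idx >> 16;
    uint32_t code = event_idx & 0xFFFFu;

    if (event_idx >> 20)          /* wider than 20 bits */
        return false;

    switch (type) {
    case 0x0:  /* HARDWARE: code is an sbi_pmu_hw_id */
        return code < EV_HW_MAX;
    case 0x1:  /* HARDWARE CACHE: result_id is 1 bit, so any value is valid */
        return (code >> 3) < EV_CACHE_MAX &&
               ((code >> 1) & 0x3u) < EV_CACHE_OP_MAX;
    case 0x2:  /* HARDWARE RAW: code must be zero */
        return code == 0;
    case 0xf:  /* SOFTWARE: code is an sbi_pmu_sw_id */
        return code < EV_SW_MAX;
    default:   /* types 0x3-0xe are not defined yet */
        return false;
    }
}
```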

In the future, more events can be defined without breaking SBI call
backward compatibility.

Using the above definitions of counter_idx, event_idx, and event_info,
we can potentially have the following SBI calls:

1. SBI_PMU_NUM_COUNTERS
   Returns the total number of HARDWARE and SOFTWARE counters.

2. SBI_PMU_COUNTER_GET_CSR
   This call takes one parameter:
      1) counter_idx
   Returns the CSR_Number and CSR_Width of the underlying counter.
   The value returned by the SBI call is encoded as follows:
      return_value[11:0] = CSR_Number
      return_value[19:12] = CSR_Width (number of bits implemented in HW)
      return_value[XLEN-1:20] = Reserved
   If CSR_Number == 0xfff then it is a SOFTWARE counter; otherwise it is
   a HARDWARE counter. This SBI call will fail for counters which are not
   present.
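Here is a sketch of how S-mode software might decode the SBI_PMU_COUNTER_GET_CSR return value. The struct and function names are hypothetical, and the values in the usage note are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

#define SBI_PMU_CSR_NUM_SOFTWARE 0xfffu  /* CSR_Number of a SOFTWARE counter */

/* Hypothetical decoded form of the return value. */
struct pmu_counter_csr {
    uint16_t csr_num;     /* return_value[11:0]  */
    uint8_t  csr_width;   /* return_value[19:12] */
    bool     is_software; /* csr_num == 0xfff    */
};

static struct pmu_counter_csr pmu_decode_counter_csr(unsigned long ret)
{
    struct pmu_counter_csr c;

    c.csr_num     = (uint16_t)(ret & 0xfffu);
    c.csr_width   = (uint8_t)((ret >> 12) & 0xffu);
    c.is_software = (c.csr_num == SBI_PMU_CSR_NUM_SOFTWARE);
    return c;  /* bits [XLEN-1:20] are reserved and ignored */
}
```

For instance, a return value of 0x30C03 would decode to CSR_Number 0xC03 (hpmcounter3) with 48 implemented bits. A value of 0x40FFF would denote a SOFTWARE counter that reports a width of 64.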

3. SBI_PMU_COUNTER_CONFIG_MATCHING
   This call takes four parameters:
      1) counter_idx_base
      2) counter_idx_mask
      3) event_idx
      4) event_info
   Find and configure a counter, from a set of counters, which can
   monitor the specified event. The counter_idx_base and counter_idx_mask
   parameters represent the set of counters, whereas the event_idx and
   event_info represent the event to monitor. Upon success, the SBI call
   returns the counter_idx of the counter which has been configured to
   monitor the specified event. This SBI call will fail if it is unable
   to find a counter which can monitor the specified event, or if the set
   of counters specified via counter_idx_base and counter_idx_mask
   contains an invalid counter.
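The text above does not spell out how counter_idx_base and counter_idx_mask encode the set of counters. The sketch below assumes that bit i of counter_idx_mask selects counter (counter_idx_base + i) and shows what the firmware-side search might look like. The capable bitmap stands in for platform data describing which counters can monitor the requested event_idx/event_info. PMU_NUM_COUNTERS and all other names are hypothetical.

```c
#define PMU_NUM_COUNTERS 32u  /* hypothetical number of counters on this HART */

/*
 * Sketch of the search behind SBI_PMU_COUNTER_CONFIG_MATCHING, assuming
 * bit i of counter_idx_mask selects counter (counter_idx_base + i).
 * capable: bitmap of counters that can monitor the requested
 *          event_idx/event_info (hypothetically derived from platform data).
 * Returns the selected counter_idx, or -1 on failure.
 */
static long pmu_config_matching(unsigned long base, unsigned long mask,
                                unsigned long capable)
{
    const unsigned int nbits = sizeof(mask) * 8;
    unsigned int i;

    /* Fail if the requested set contains a counter that does not exist. */
    for (i = 0; i < nbits; i++)
        if (((mask >> i) & 1UL) && base + i >= PMU_NUM_COUNTERS)
            return -1;

    /* Select the first counter in the set that can monitor the event. */
    for (i = 0; i < nbits; i++) {
        unsigned long idx = base + i;

        if (((mask >> i) & 1UL) && ((capable >> idx) & 1UL))
            return (long)idx;  /* real firmware would now program MHPMEVENT */
    }
    return -1;
}
```

Validating the whole set before searching means that a request naming a non-existent counter fails even if a valid match appears earlier in the set. This follows the failure rule stated above.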

4. SBI_PMU_COUNTER_SET_PHYS_ADDR
   This call takes two parameters:
      1) counter_idx
      2) 8-byte aligned physical address
   It sets the physical address of the memory location where the SBI
   implementation will write the 64-bit SOFTWARE counter. This SBI call
   is only for counters not mapped to any CSR (i.e. only for counters
   with CSR_Number == 0xfff).
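This call has two constraints: the counter must be a SOFTWARE counter, and the address must be 8-byte aligned. Both reduce to a simple check. The function below is a hypothetical firmware-side sketch, not OpenSBI code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check for SBI_PMU_COUNTER_SET_PHYS_ADDR: only SOFTWARE
 * counters (CSR_Number == 0xfff) qualify, and the address must be 8-byte
 * aligned so that the 64-bit counter is naturally aligned. */
static bool pmu_set_phys_addr_ok(uint16_t csr_num, uint64_t paddr)
{
    return csr_num == 0xfff && (paddr & 0x7u) == 0;
}
```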

5. SBI_PMU_COUNTER_START
   This call takes two parameters:
      1) counter_idx
      2) initial_value
   It informs the SBI implementation to start/enable the specified
   counter with the specified initial value. This SBI call will fail for
   counters which are not present.

6. SBI_PMU_COUNTER_STOP
   This call takes one parameter:
      1) counter_idx
   It informs the SBI implementation to stop/disable the specified
   counter on the calling HART. This SBI call will fail for counters
   which are not present.

The OpenSBI (M-mode runtime firmware) Development Notes:

1. The OpenSBI firmware will translate event_idx and event_info into a
   platform-dependent MHPMEVENT CSR value before starting/enabling a
   HARDWARE counter.

2. The OpenSBI firmware will need to know the following
   platform-dependent information:
   A) Possible event_idx values allowed (or supported) by each HARDWARE
      counter (i.e. MHPMCOUNTER)
   B) Mapping of event_idx for HARDWARE/CACHE events to the MHPMEVENT
      CSR value. This is optional; by default OpenSBI will write a value
      <xyz> to the MHPMEVENT CSR where the lower 20 bits of <xyz> are
      event_idx and the upper XLEN-20 bits of <xyz> are the lower
      XLEN-20 bits of event_info.
   C) Any additional platform-specific programming required for selecting
      an event_idx + event_info combination (also optional for a platform).
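For XLEN = 64, the default MHPMEVENT value described in 2.B can be computed as shown below. This is a sketch, and the function name is hypothetical. Shifting event_info left by 20 keeps its lower 44 (XLEN-20) bits and places them in bits [63:20].

```c
#include <stdint.h>

/* Default MHPMEVENT value for XLEN = 64:
 *   bits [19:0]  = event_idx
 *   bits [63:20] = event_info[43:0] (upper 20 bits of event_info are dropped) */
static inline uint64_t pmu_default_mhpmevent(uint32_t event_idx,
                                             uint64_t event_info)
{
    return (event_info << 20) | (event_idx & 0xFFFFFu);
}
```

On RV32 the same expression would use a 32-bit type, so only the lower 12 bits of event_info would survive.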

3. All of the platform-dependent information mentioned above can be
   obtained by the OpenSBI firmware from platform-specific code. DT/ACPI
   can also be used to describe 2.A and 2.B above, but 2.C will always
   require platform-specific code.

Linux RISC-V PMU Driver Development Notes:

1. Driver probe
   The Linux RISC-V PMU driver can be a platform driver with "riscv,pmu"
   as the DT compatible string and an optional "interrupts" DT property.
   The "interrupts" DT property, if available, should specify the
   overflow interrupt for each HART. When the "interrupts" DT property is
   present, we might also need another DT property for mapping HARTIDs to
   entries in the "interrupts" DT property. The platform driver probe will:
   A) Ensure that the underlying SBI implementation provides the SBI PMU
      extension, using the sbi_probe_extension() API of arch/riscv.
   B) Detect the number of counters using the SBI_PMU_NUM_COUNTERS call.
   C) Get the CSR details of each counter using the
      SBI_PMU_COUNTER_GET_CSR call. If the counter is a SOFTWARE counter,
      then use the SBI_PMU_COUNTER_SET_PHYS_ADDR call to set the memory
      location of the counter. The driver can skip this in driver probe
      and instead do it lazily in the add() callback mentioned below.

2. event_init() callback
   The event_init() callback will primarily translate the user-space
   perf_event_attr to an SBI PMU event_idx and event_info. It can do
   this in the following way:
   A) perf_event_attr.type == PERF_TYPE_HARDWARE
      event_idx.type = 0x0
      event_idx.code = Value from enum sbi_pmu_hw_id based on
                       perf_event_attr.config
      event_info = 0
   B) perf_event_attr.type == PERF_TYPE_HW_CACHE
      event_idx.type = 0x1
      event_idx.code.cache_id = Value from enum sbi_pmu_hw_cache_id
                                based on perf_event_attr.config
      event_idx.code.op_id = Value from enum sbi_pmu_hw_cache_op_id
                             based on perf_event_attr.config
      event_idx.code.result_id = Value from enum
                                 sbi_pmu_hw_cache_op_result_id based on
                                 perf_event_attr.config
      event_info = 0
   C) perf_event_attr.type == PERF_TYPE_RAW and
      perf_event_attr.config[63:63] == 0
      event_idx.type = 0x2
      event_idx.code = 0x0
      event_info = perf_event_attr.config[62:0]
   D) perf_event_attr.type == PERF_TYPE_RAW and
      perf_event_attr.config[63:63] == 1
      event_idx.type = 0xf
      event_idx.code = Value from enum sbi_pmu_sw_id based on
                       perf_event_attr.config
      event_info = 0
   (Note: event_init() will fail if it is not able to figure out the
    event_idx and event_info values corresponding to perf_event_attr)
   (Note: event_init() will not assign a counter to the perf_event
    because that is done by the add() callback)
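Translation rules A) to D) can be sketched in C as follows. The PERF_TYPE_* values and the HW_CACHE config layout (cache id in bits [7:0], op in [15:8], result in [23:16]) are copied from the Linux perf ABI. The struct and function names are hypothetical. Rule D only says "based on perf_event_attr.config", so this sketch assumes that config[62:0] holds the sbi_pmu_sw_id value directly.

```c
#include <stdint.h>

/* perf_event_attr.type values, copied from <linux/perf_event.h>. */
#define PERF_TYPE_HARDWARE 0u
#define PERF_TYPE_HW_CACHE 3u
#define PERF_TYPE_RAW      4u

/* Counts taken from the *_MAX enumerators earlier in this proposal. */
#define SBI_HW_MAX        10u
#define SBI_CACHE_MAX      7u
#define SBI_CACHE_OP_MAX   3u
#define SBI_CACHE_RES_MAX  2u
#define SBI_SW_MAX        12u

#define RAW_SW_BIT (1ULL << 63)   /* config[63] selects SOFTWARE (rule D) */

/* Hypothetical output of event_init(). */
struct sbi_pmu_event {
    uint32_t event_idx;
    uint64_t event_info;
};

/* Returns 0 on success, -1 if perf_event_attr cannot be mapped. */
static int pmu_event_init(uint32_t type, uint64_t config,
                          struct sbi_pmu_event *ev)
{
    ev->event_info = 0;

    switch (type) {
    case PERF_TYPE_HARDWARE:            /* A) perf ids equal sbi_pmu_hw_id */
        if (config >= SBI_HW_MAX)
            return -1;
        ev->event_idx = (0x0u << 16) | (uint32_t)config;
        return 0;

    case PERF_TYPE_HW_CACHE: {          /* B) perf: id | op << 8 | result << 16 */
        uint32_t cache_id  = (uint32_t)(config & 0xff);
        uint32_t op_id     = (uint32_t)((config >> 8) & 0xff);
        uint32_t result_id = (uint32_t)((config >> 16) & 0xff);

        if ((config >> 24) || cache_id >= SBI_CACHE_MAX ||
            op_id >= SBI_CACHE_OP_MAX || result_id >= SBI_CACHE_RES_MAX)
            return -1;
        ev->event_idx = (0x1u << 16) | (cache_id << 3) | (op_id << 1) | result_id;
        return 0;
    }

    case PERF_TYPE_RAW:
        if (!(config & RAW_SW_BIT)) {   /* C) HARDWARE RAW */
            ev->event_idx  = 0x2u << 16;
            ev->event_info = config;    /* config[63] is 0, so this is config[62:0] */
            return 0;
        }
        config &= ~RAW_SW_BIT;          /* D) SOFTWARE; assumed code = config[62:0] */
        if (config >= SBI_SW_MAX)
            return -1;
        ev->event_idx = (0xFu << 16) | (uint32_t)config;
        return 0;

    default:                            /* e.g. PERF_TYPE_SOFTWARE: not handled here */
        return -1;
    }
}
```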

3. add() callback
   The add() callback of the Linux RISC-V PMU driver will find a free
   counter on the current CPU/HART such that the event_idx and
   event_info combination is supported by the counter. To find and
   configure a counter that can monitor the event_idx and event_info
   combination from a set of counters, it will use the
   SBI_PMU_COUNTER_CONFIG_MATCHING call.

4. del() callback
   The del() callback of the Linux RISC-V PMU driver will release (free)
   the counter.
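On the driver side, add() and del() only need a per-HART record of which counter_idx values are in use. In this sketch the free-counter mask would be passed to SBI_PMU_COUNTER_CONFIG_MATCHING with counter_idx_base = 0, assuming that bit i of the mask selects counter i. The struct and helper names are hypothetical, and the SBI call itself is left out.

```c
/* Hypothetical per-HART bookkeeping in the Linux driver. */
struct riscv_pmu_hart {
    unsigned long num_counters;  /* from SBI_PMU_NUM_COUNTERS */
    unsigned long used_mask;     /* bit n set => counter_idx n is in use */
};

/* add(): mask of free counters to pass as counter_idx_mask (base = 0). */
static unsigned long pmu_free_mask(const struct riscv_pmu_hart *h)
{
    unsigned long all = (h->num_counters >= sizeof(unsigned long) * 8)
                        ? ~0UL : ((1UL << h->num_counters) - 1);

    return all & ~h->used_mask;
}

/* add(): record the counter_idx returned by the SBI call. */
static void pmu_claim(struct riscv_pmu_hart *h, unsigned long counter_idx)
{
    h->used_mask |= 1UL << counter_idx;
}

/* del(): release the counter so a later add() can reuse it. */
static void pmu_release(struct riscv_pmu_hart *h, unsigned long counter_idx)
{
    h->used_mask &= ~(1UL << counter_idx);
}
```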

5. start() callback
   The start() callback of the Linux RISC-V PMU driver will start the
   counter using the SBI_PMU_COUNTER_START call.

6. stop() callback
   The stop() callback of the Linux RISC-V PMU driver will stop the
   counter using the SBI_PMU_COUNTER_STOP call.

Regards,
Anup


Greg Favor
 

I realize that in my comments below on Jonathan's SBI_PMU_COUNTER_SET_PHYS_ADDR question, I should have asked Anup whether all allocation of counters is to be done by the OS/hypervisor, with M-mode firmware only needing to be told the address of each individual software counter (via SBI_PMU_COUNTER_SET_PHYS_ADDR). It seems like that is the case.

Also note that since both S/HS mode and M mode have to understand where "shared" PMPv2 memory resides, that issue doesn't force the need for M-mode to be the one allocating memory for software counters.

Greg

On Thu, Aug 6, 2020 at 1:03 PM Greg Favor <gfavor@...> wrote:
On Thu, Aug 6, 2020 at 11:36 AM Jonathan Behrens <behrensj@...> wrote:
Finally, I wanted to ask about the SBI_PMU_COUNTER_SET_PHYS_ADDR function. Apologies if this has been answered already, but I think this might not work well with the enhanced PMP proposal that is designed to allow most of DRAM to be marked as S/U-mode only.

Interesting point.  Requiring M mode to have access to most S/U mode memory would defeat a lot of the purpose and security benefits of Enhanced PMP (aka PMPv2).
 
The proposal allows regions to be shared between M-mode and S/U-mode but presumably an implementation would prefer to require only a single shared region with all counters instead of needing to use NUM_COUNTERS number of PMP entries. This could be enabled by making the interface be SBI_PMU_COUNTER_GET_PHYS_ADDR so the firmware gets to pick the locations.

A key question is, in any case, who allocates the memory where software counters are placed?  How is that memory allocated in coordination with the OS or hypervisor?

Shouldn't the OS or hypervisor do the allocation and then tell M-mode the address of that block of memory?  Then M-mode can allocate space for individual counters from that.  And, in the context of PMPv2, that block of memory would be allocated from an existing "shared" PMP region.
 
On this front, another thing to watch is the memory attributes proposals coming out of the virtual memory task group: shared mappings might have performance costs (to avoid issues with mismatches between M-mode and S-mode attributes).

All the more reason that the OS/hypervisor should be allocating the block of memory for software counters.  The  OS/hypervisor will be aware of the memory attributes set up in the page tables and can make sure to use appropriate attribute settings in the PTEs that map this memory.

Greg
 


Greg Favor
 

On Thu, Aug 6, 2020 at 11:36 PM Anup Patel <Anup.Patel@...> wrote:

Hi Jonathan,

 

I agree “the event_info is optional and can be zero” is totally misleading for HARDWARE and CACHE events. Thanks for pointing this out. Eventually, the event_info will be used to pass configuration flags (event filter, overflow interrupt, etc.) on RISC-V systems having enhanced HPMCOUNTERs (Greg’s proposal).

 

How about “the event_info is additional configuration and can be zero by default” ? Suggestions ??


Allowing event_info to be zero by default is OK.  But as some of the higher bits of mhpmevent CSRs become standardized (or someone has non-standard bits up there), event_info will need to be non-zero for designs that implement any of these bits.
 

 

You are right about the SBI_PMU_COUNTER_SET_PHYS_ADDR call. I did not consider the ePMP proposal. Thanks for pointing this out.

 

Your idea of an SBI_PMU_COUNTER_GET_PHYS_ADDR call looks good to me, but (as you already mentioned) we are inviting issues related to shared mappings (or conflicting cache attributes) with both the SBI_PMU_COUNTER_SET_PHYS_ADDR and SBI_PMU_COUNTER_GET_PHYS_ADDR calls.

 

I have another suggestion. How about replacing SBI_PMU_COUNTER_GET_PHYS_ADDR call with SBI_PMU_SOFTWARE_COUNTER_READ call to read SOFTWARE counters. The SBI_PMU_SOFTWARE_COUNTER_READ call will always fail for HARDWARE counters. The downside here is the overhead of SBI call to read SOFTWARE counter.


That approach could be much lower performance if one is periodically sampling a software counter.  In essence sampling hardware counters will be quite fast, while sampling software counters will be quite slow.

Further, does this approach imply that S/HS mode software has to call M-mode to increment a software counter (which would be horrible performance-wise)?  Otherwise if S/HS mode is still getting the PA for the counter, then this approach didn't avoid the PMP issue.

Greg
 


Greg Favor
 

If my last comments are valid, then it seems like we're back to having either SBI_PMU_COUNTER_GET_PHYS_ADDR or SBI_PMU_COUNTER_SET_PHYS_ADDR.  In which case (since both S/HS and M modes need to understand where the shared PMP region(s) are), it seems like the choice boils down to which side should be doing the allocation of memory for software counters.  I think, as far as managing attribute consistency between M-mode accesses and S/HS-mode accesses to a software counter, S/HS mode should be able to avoid mismatched attributes in its page tables with either choice.

Although what could be tricky is that software will want to use an AMO instruction to increment a software counter, but in many systems AMOs may only be supported to cacheable memory.  Which would require the PMA (as well as the page tables) for the shared region to specify the cacheable attribute (to avoid mismatched attributes).  At which point this wouldn't be a problem.

Greg

On Thu, Aug 6, 2020 at 11:46 PM Greg Favor via lists.riscv.org <gfavor=ventanamicro.com@...> wrote:
On Thu, Aug 6, 2020 at 11:36 PM Anup Patel <Anup.Patel@...> wrote:

Your idea of an SBI_PMU_COUNTER_GET_PHYS_ADDR call looks good to me, but (as you already mentioned) we are inviting issues related to shared mappings (or conflicting cache attributes) with both the SBI_PMU_COUNTER_SET_PHYS_ADDR and SBI_PMU_COUNTER_GET_PHYS_ADDR calls.

 

I have another suggestion. How about replacing SBI_PMU_COUNTER_GET_PHYS_ADDR call with SBI_PMU_SOFTWARE_COUNTER_READ call to read SOFTWARE counters. The SBI_PMU_SOFTWARE_COUNTER_READ call will always fail for HARDWARE counters. The downside here is the overhead of SBI call to read SOFTWARE counter.

 


Anup Patel
 

The SBI_PMU_SOFTWARE_COUNTER_READ call certainly has a performance issue. Its only benefit is that there is no shared mapping. Anyway, we can’t compromise performance, so let’s drop this idea.

 

Regards,

Anup

 


Anup Patel
 

For the SBI_PMU_COUNTER_GET_PHYS_ADDR call, both OpenSBI and hypervisors will have to show SOFTWARE counter memory as reserved memory regions in the device tree. This is straightforward for OpenSBI, but it will become complicated for KVM because KVM user-space will have to allocate software counter memory and inform the KVM kernel module of its location using additional RISC-V specific IOCTLs.

 

I am leaning back to the SBI_PMU_COUNTER_SET_PHYS_ADDR call because it is much cleaner/simpler if S/HS-mode allocates memory for SOFTWARE counters. To handle PMPv2 security features, we might end up configuring a separate PMP region for each SOFTWARE counter whenever the SBI_PMU_COUNTER_SET_PHYS_ADDR call is received in OpenSBI.

 

Regards,

Anup

 

From: Greg Favor <gfavor@...>
Sent: 07 August 2020 12:26
To: Greg Favor <gfavor@...>
Cc: Anup Patel <Anup.Patel@...>; Jonathan Behrens <behrensj@...>; tech-unixplatformspec@...; Atish Patra <Atish.Patra@...>; Andrew Waterman <andrew@...>
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v4: SBI PMU Extension

 

If my last comments are valid, then it seems like we're back to having either SBI_PMU_COUNTER_GET_PHYS_ADDR or SBI_PMU_COUNTER_SET_PHYS_ADDR.  In which case (since both S/HS and M modes need to understand where the shared PMP region(s) are), it seems like the choice boils down to which side should be doing the allocation of memory for software counters.  I think, as far as managing attribute consistency between M-mode accesses and S/HS-mode accesses to a software counter, S/HS mode should be able to avoid mismatched attributes in its page tables with either choice.

 

Although what could be tricky is that software will want to use an AMO instruction to increment a software counter, but in many systems AMOs may only be supported to cacheable memory.  Which would require the PMA (as well as the page tables) for the shared region to specify the cacheable attribute (to avoid mismatched attributes).  At which point this wouldn't be a problem.

 

Greg

 

 


Jonathan Behrens <behrensj@...>
 

I agree “the event_info is optional and can be zero” is totally misleading for HARDWARE and CACHE events. Thanks for pointing this out. Eventually, the event_info will be used to pass configuration flags (event filter, overflow interrupt, etc.) on RISC-V systems having enhanced HPMCOUNTERs (Greg’s proposal).

 

How about “the event_info is additional configuration and can be zero by default” ? Suggestions ??


How about "Non-zero values for event_info are reserved. Future versions of this specification may use them to pass configuration flags like event filter, overflow interrupt, etc."

I am leaning back to the SBI_PMU_COUNTER_SET_PHYS_ADDR call because it is much cleaner/simpler if S/HS-mode allocates memory for SOFTWARE counters. To handle PMPv2 security features, we might end up configuring a separate PMP region for each SOFTWARE counter whenever the SBI_PMU_COUNTER_SET_PHYS_ADDR call is received in OpenSBI.

 

There likely wouldn't be enough PMP entries to have one per counter. Currently the max possible is 16 entries and some implementations have fewer. Even if the number gets increased to 64 PMP entries, people aren't going to want to devote a significant fraction to this one feature. And now that I think about it, there was talk about trying to lock all PMP entries at boot, which would prevent any dynamic entries at all.

Alternate proposal: Have the device tree specify which regions of memory are shared with M-mode, and then require the OS to allocate out of those. KVM and other hypervisors would just say "all of memory", while OpenSBI could designate a few KB of shared memory if it was using PMPv2, or also say "all of memory" if not.

Jonathan

On Fri, Aug 7, 2020 at 4:34 AM Anup Patel via lists.riscv.org <anup.patel=wdc.com@...> wrote:

For the SBI_PMU_COUNTER_GET_PHYS_ADDR call, both OpenSBI and hypervisors will have to show SOFTWARE counter memory as reserved memory regions in the device tree. This is straightforward for OpenSBI, but it will become complicated for KVM because KVM user-space will have to allocate software counter memory and inform the KVM kernel module of its location using additional RISC-V specific IOCTLs.

 

I am leaning back to the SBI_PMU_COUNTER_SET_PHYS_ADDR call because it is much cleaner/simpler if S/HS-mode allocates memory for SOFTWARE counters. To handle PMPv2 security features, we might end up configuring a separate PMP region for each SOFTWARE counter whenever the SBI_PMU_COUNTER_SET_PHYS_ADDR call is received in OpenSBI.

 

Regards,

Anup

 



Greg Favor
 

On Fri, Aug 7, 2020 at 7:35 AM Jonathan Behrens <behrensj@...> wrote:

I am leaning back to the SBI_PMU_COUNTER_SET_PHYS_ADDR call because it is much cleaner/simpler if S/HS-mode allocates memory for SOFTWARE counters. To handle PMPv2 security features, we might end up configuring a separate PMP region for each SOFTWARE counter whenever the SBI_PMU_COUNTER_SET_PHYS_ADDR call is received in OpenSBI.

 

There likely wouldn't be enough PMP entries to have one per counter. Currently the max possible is 16 entries and some implementations have fewer. Even if the number gets increased to 64 PMP entries, people aren't going to want to devote a significant fraction to this one feature. And now that I think about it, there was talk about trying to lock all PMP entries at boot, which would prevent any dynamic entries at all.

What I imagine is that a block of memory is allocated up front for software counters, and with PMPv2 that is made from within a "shared" region of memory.  Individual counters are then allocated from that block of memory.

When that shared region is configured in a PMP at boot time, the sizing and layout of it of course would reflect the fact that a block of it is going to be used for software counters (as well as reflecting all the other data structures that space is being allowed for).
 

Alternate proposal: Have the device tree specify which regions of memory are shared with M-mode, and then require the OS to allocate out of those. KVM and other hypervisors would just say "all of memory", while OpenSBI could designate a few KB of shared memory if it was using PMPv2, or also say "all of memory" if not.

It sounds like a guest OS, based on the DT it was given, will have an understanding of what it thinks are "shared" regions.  Those would be mapped by the hypervisor to actual system address space and an actual "shared" region.  Although with many VMs and the coarse 4KB granularity of mapping addresses, this would probably become a bit of a problem.  Which really comes down to how would software counters be virtualized if both a guest OS and M-mode can be accessing them?

Which begs the question of why does M-mode need to access an OS's software counters?  If the answer could be turned into a No, then this whole issue goes away.

Greg


Anup Patel
 

Instead of having a separate physical address for each SOFTWARE counter, we can have a single base physical address for all SOFTWARE counters. Each SOFTWARE counter will be written at address = <base> + <counter_idx> * 8. This means OpenSBI will only need one PMPv2 entry to map all SOFTWARE counters.
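The single-base layout reduces to simple address arithmetic. The sketch below uses hypothetical helper names. The offset is derived from counter_idx, so slots belonging to HARDWARE counters stay unused, and the shared region has to be sized for all counters rather than only the SOFTWARE ones.

```c
#include <stdint.h>

/* SOFTWARE counter n lives at base + n * 8 (8-byte, naturally aligned slots). */
static inline uint64_t sw_counter_paddr(uint64_t base, unsigned long counter_idx)
{
    return base + (uint64_t)counter_idx * 8u;
}

/* Region size covering counter_idx values [0, num_counters). */
static inline uint64_t sw_counter_region_size(unsigned long num_counters)
{
    return (uint64_t)num_counters * 8u;
}
```

For example, with a hypothetical base of 0x80400000, counter_idx 5 maps to 0x80400028, and 16 counters need a 128-byte region.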

 

As I mentioned previously, device-tree-based shared memory details will make things complicated for KVM RISC-V, because we will need to add a special RISC-V specific IOCTL in KVM so that KVM user-space can inform KVM kernel space of the shared memory location.

 

The SOFTWARE counters mentioned in the SBI PMU extension are for counting SBI implementation events (e.g. number of misaligned load/store traps, number of RFENCEs handled, number of IPIs injected, etc.). These SOFTWARE counters have nothing to do with Linux perf software counters. Similarly, hypervisors can provide their own SBI SOFTWARE counters for hypervisor-specific events.

 

Regards,

Anup

 

From: Greg Favor <gfavor@...>
Sent: 07 August 2020 22:42
To: Jonathan Behrens <behrensj@...>
Cc: Anup Patel <Anup.Patel@...>; tech-unixplatformspec@...; Atish Patra <Atish.Patra@...>; Andrew Waterman <andrew@...>
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v4: SBI PMU Extension

 

On Fri, Aug 7, 2020 at 7:35 AM Jonathan Behrens <behrensj@...> wrote:

I am leaning back to the SBI_PMU_COUNTER_SET_PHYS_ADDR call because it is much cleaner/simpler if S/HS-mode allocates memory for SOFTWARE counters. To handle PMPv2 security features, we might end up configuring a separate PMP region for each SOFTWARE counter whenever the SBI_PMU_COUNTER_SET_PHYS_ADDR call is received in OpenSBI.

 

There likely wouldn't be enough PMP entries to have one per counter. Currently the max possible is 16 entries and some implementations have fewer. Even if the number gets increased to 64 PMP entries, people aren't going to want to devote a significant fraction to this one feature. And now that I think about it, there was talk about trying to lock all PMP entries at boot, which would prevent any dynamic entries at all.

 

What I imagine is that a block of memory is allocated up front for software counters, and with PMPv2 that is made from within a "shared" region of memory.  Individual counters are then allocated from that block of memory.

 

When that shared region is configured in a PMP at boot time, the sizing and layout of it of course would reflect the fact that a block of it is going to be used for software counters (as well as reflecting all the other data structures that space is being allowed for).

 

 

Alternate proposal: Have the device tree specify which regions of memory are shared with M-mode, and then require the OS to allocate out of those. KVM and other hypervisors would just say "all of memory", while OpenSBI could designate a few KB of shared memory if it was using PMPv2, or also say "all of memory" if not.

 

It sounds like a guest OS, based on the DT it was given, will have an understanding of what it thinks are "shared" regions.  Those would be mapped by the hypervisor to actual system address space and an actual "shared" region.  Although with many VMs and the coarse 4KB granularity of mapping addresses, this would probably become a bit of a problem.  Which really comes down to how would software counters be virtualized if both a guest OS and M-mode can be accessing them?

 

Which begs the question of why does M-mode need to access an OS's software counters?  If the answer could be turned into a No, then this whole issue goes away.

 

Greg

 


Greg Favor
 

Ahh, I didn't realize that these were only M-mode software counters, i.e. they are only controlled and incremented by M mode. That makes things clearer now.

In which case, why not go with the SBI_PMU_SOFTWARE_COUNTER_READ approach you had mentioned yesterday?  That would be a much cleaner, simpler solution.  And for these M-mode only software counters, the call overhead of sampling these once in a while would be alright.  (My performance concerns yesterday were wrt sampling of OS/hypervisor software counters.)

Greg

On Fri, Aug 7, 2020 at 10:40 AM Anup Patel <Anup.Patel@...> wrote:

The SOFTWARE counters mentioned in the SBI PMU extension are for counting SBI implementation events (e.g. number of misaligned load/store traps, number of RFENCEs handled, number of IPIs injected, etc.). These SOFTWARE counters have nothing to do with Linux perf software counters. Similarly, hypervisors can provide their own SBI SOFTWARE counters for hypervisor-specific events.

 

Regards,

Anup



Jonathan Behrens <behrensj@...>
 

Instead of SBI_PMU_SOFTWARE_COUNTER_READ, we could even just trap-and-emulate accesses to HPMCOUNTERx. The performance difference should be relatively minimal (at least compared to the shared memory approach).

Jonathan


On Fri, Aug 7, 2020 at 2:01 PM Greg Favor via lists.riscv.org <gfavor=ventanamicro.com@...> wrote:
Ahh, I didn't realize that these were only M-mode software counters, i.e. they are only controlled and incremented by M mode. That makes things clearer now.

In which case, why not go with the SBI_PMU_SOFTWARE_COUNTER_READ approach you had mentioned yesterday?  That would be a much cleaner, simpler solution.  And for these M-mode only software counters, the call overhead of sampling these once in a while would be alright.  (My performance concerns yesterday were wrt sampling of OS/hypervisor software counters.)

Greg




Anup Patel

Trap-and-emulate of illegal instructions is a bit expensive, particularly when the MTVAL/STVAL CSR does not hold the illegal instruction encoding, because in OpenSBI/hypervisors we end up doing an unprivileged access to read the illegal instruction.

An SBI call to read a SOFTWARE counter will be much faster because we don't need to read and decode instructions.

Hypervisors will also expose HS-mode software counters (i.e. controlled and incremented by HS-mode) to VS-mode as SBI PMU SOFTWARE counters, so the trapping overhead will still be there for hypervisors if we go with the SBI_PMU_SOFTWARE_COUNTER_READ approach.

Regards,

Anup
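The extra cost of the trap-and-emulate path can be sketched as follows. The fast path applies only when MTVAL already holds the full instruction encoding. Implementations are allowed to write zero to MTVAL on illegal-instruction traps, and then M-mode must read the instruction from memory at MEPC, which in practice means an unprivileged load with MSTATUS.MPRV set. Here that load is modelled as a lookup into a caller-supplied array so the sketch runs on a host; the function name and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return the bits of the instruction that trapped as illegal.
 * Fast path: MTVAL holds a full 32-bit encoding (low two bits 0b11).
 * Slow path: otherwise (for example MTVAL reads as zero), read the
 * instruction at MEPC. In M-mode this would be an unprivileged load with
 * MSTATUS.MPRV set; here it is a lookup into text[], which models the
 * trapped context's code starting at address text_base. */
uint32_t get_trapped_insn(uintptr_t mtval, uintptr_t mepc,
                          const uint32_t *text, uintptr_t text_base,
                          int *slow_path)
{
    if ((mtval & 0x3u) == 0x3u) {
        *slow_path = 0;
        return (uint32_t)mtval;
    }
    *slow_path = 1;
    return text[(mepc - text_base) / 4];
}
```

With an SBI call, the function ID passed in a6 already identifies the requested operation, so neither the fetch nor the decode step is needed. In a guest, however, the ecall still traps into the hypervisor, which is the remaining overhead noted above.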

From: Jonathan Behrens <behrensj@...>
Sent: 07 August 2020 23:48
To: Greg Favor <gfavor@...>
Cc: Anup Patel <Anup.Patel@...>; tech-unixplatformspec@...; Atish Patra <Atish.Patra@...>; Andrew Waterman <andrew@...>
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v4: SBI PMU Extension

 
