Proposal v3: SBI PMU Extension


Anup Patel
 

Hi All,

We don't have a dedicated RISC-V PMU extension but we do have HARDWARE
performance counters such as CYCLE CSR, INSTRET CSR, and HPMCOUNTER
CSRs. A RISC-V implementation can support monitoring various HARDWARE
events using a limited number of HPMCOUNTER CSRs.

In addition to HARDWARE performance counters, a SBI implementation
(e.g. OpenSBI, Xvisor, KVM, etc) can provide SOFTWARE counters for
events such as number of RFENCEs, number of IPIs, number of misaligned
load/store instructions, number of illegal instructions, etc.

We propose SBI PMU extension, which will help S-mode (or VS-mode)
software to discover and configure HARDWARE/SOFTWARE counters. The SBI
PMU extension will only manage per-HART (or per-CPU) HARDWARE/SOFTWARE
counters which include CYCLE CSR, INSTRET CSR, HPMCOUNTER CSRs and
SOFTWARE counters provided by SBI implementation.

Using SBI PMU extension, a SBI implementation (OpenSBI, KVM, or Xvisor)
will provide a standardized view of HARDWARE/SOFTWARE counters and
events to S-mode (or VS-mode) software.

To define SBI PMU extension, we first define counter_idx which is a
logical number assigned to a counter and event_idx which is an encoded
number representing the HARDWARE/SOFTWARE event to be monitored. A
HARDWARE/SOFTWARE event can also have additional configuration/details
referred to as event_info.

The SBI PMU event_idx is a 20-bit wide number encoded as follows:
    event_idx[19:16] = type
    event_idx[15:0]  = code
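
To make the bit layout concrete, here is a minimal C sketch of packing and unpacking event_idx. The macro and helper names (SBI_PMU_EVENT_IDX_* and sbi_pmu_event_*()) are illustrative assumptions and not part of the proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Layout from the proposal: event_idx[19:16] = type, event_idx[15:0] = code. */
#define SBI_PMU_EVENT_IDX_CODE_MASK  0xffffu
#define SBI_PMU_EVENT_IDX_TYPE_MASK  0xfu
#define SBI_PMU_EVENT_IDX_TYPE_SHIFT 16

/* Hypothetical helper: pack a 4-bit type and a 16-bit code into event_idx. */
static inline uint32_t sbi_pmu_event_idx(uint32_t type, uint32_t code)
{
    assert(type <= SBI_PMU_EVENT_IDX_TYPE_MASK);
    assert(code <= SBI_PMU_EVENT_IDX_CODE_MASK);
    return (type << SBI_PMU_EVENT_IDX_TYPE_SHIFT) | code;
}

/* Hypothetical helper: extract the type field from event_idx. */
static inline uint32_t sbi_pmu_event_type(uint32_t event_idx)
{
    return (event_idx >> SBI_PMU_EVENT_IDX_TYPE_SHIFT) &
           SBI_PMU_EVENT_IDX_TYPE_MASK;
}

/* Hypothetical helper: extract the code field from event_idx. */
static inline uint32_t sbi_pmu_event_code(uint32_t event_idx)
{
    return event_idx & SBI_PMU_EVENT_IDX_CODE_MASK;
}
```

With this layout, type 0xf with code 2 is encoded as 0xf0002, and type 0x1 with code 5 as 0x10005.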

If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE event,
the event_info is not required whereas the event_idx.code can be one
of the following values:
enum sbi_pmu_hw_id {
    SBI_PMU_HW_CPU_CYCLES = 0,
    SBI_PMU_HW_INSTRUCTIONS = 1,
    SBI_PMU_HW_CACHE_REFERENCES = 2,
    SBI_PMU_HW_CACHE_MISSES = 3,
    SBI_PMU_HW_BRANCH_INSTRUCTIONS = 4,
    SBI_PMU_HW_BRANCH_MISSES = 5,
    SBI_PMU_HW_BUS_CYCLES = 6,
    SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
    SBI_PMU_HW_STALLED_CYCLES_BACKEND = 8,
    SBI_PMU_HW_REF_CPU_CYCLES = 9,
    SBI_PMU_HW_MAX, /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE
CACHE event, the event_info is not required whereas the event_idx.code
is encoded as follows:
    event_idx.code[15:3] = cache_id
    event_idx.code[2:1]  = op_id
    event_idx.code[0:0]  = result_id
enum sbi_pmu_hw_cache_id {
    SBI_PMU_HW_CACHE_L1D = 0,
    SBI_PMU_HW_CACHE_L1I = 1,
    SBI_PMU_HW_CACHE_LL = 2,
    SBI_PMU_HW_CACHE_DTLB = 3,
    SBI_PMU_HW_CACHE_ITLB = 4,
    SBI_PMU_HW_CACHE_BPU = 5,
    SBI_PMU_HW_CACHE_NODE = 6,
    SBI_PMU_HW_CACHE_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_id {
    SBI_PMU_HW_CACHE_OP_READ = 0,
    SBI_PMU_HW_CACHE_OP_WRITE = 1,
    SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
    SBI_PMU_HW_CACHE_OP_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_result_id {
    SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
    SBI_PMU_HW_CACHE_RESULT_MISS = 1,
    SBI_PMU_HW_CACHE_RESULT_MAX, /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
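
As a rough illustration of the HARDWARE CACHE encoding above, the following C sketch builds event_idx.code from its three fields and then the full event_idx with type 0x1. The helper names are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: code[15:3] = cache_id, code[2:1] = op_id, code[0] = result_id. */
static inline uint32_t sbi_pmu_hw_cache_code(uint32_t cache_id,
                                             uint32_t op_id,
                                             uint32_t result_id)
{
    assert(cache_id <= 0x1fffu); /* 13-bit field */
    assert(op_id <= 0x3u);       /* 2-bit field */
    assert(result_id <= 0x1u);   /* 1-bit field */
    return (cache_id << 3) | (op_id << 1) | result_id;
}

/* Hypothetical helper: full event_idx for a HARDWARE CACHE event (type 0x1). */
static inline uint32_t sbi_pmu_hw_cache_event_idx(uint32_t cache_id,
                                                  uint32_t op_id,
                                                  uint32_t result_id)
{
    return (0x1u << 16) | sbi_pmu_hw_cache_code(cache_id, op_id, result_id);
}
```

For example, a DTLB write access (cache_id 3, op_id 1, result_id 0) gives code 0x1a and event_idx 0x1001a.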

If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE
RAW event, the event_idx.code should be zero and the event_info
parameter passed to SBI_PMU_COUNTER_SET_EVENT call (described below)
will have the RAW event value to be programmed in MHPMEVENT CSR (i.e.
the SBI implementation will not derive MHPMEVENT CSR value from
event_idx + event_info).

If event_idx.type == 0xf then it is SOFTWARE event. For SOFTWARE
event, the event_info is not required whereas the event_idx.code
can be one of the following:
enum sbi_pmu_sw_id {
    SBI_PMU_SW_MISALIGNED_LOAD = 0,
    SBI_PMU_SW_MISALIGNED_STORE = 1,
    SBI_PMU_SW_ILLEGAL_INSN = 2,
    SBI_PMU_SW_LOCAL_SET_TIMER = 3,
    SBI_PMU_SW_LOCAL_IPI = 4,
    SBI_PMU_SW_LOCAL_FENCE_I = 5,
    SBI_PMU_SW_LOCAL_SFENCE_VMA = 6,
    SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID = 7,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA = 8,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA = 10,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
    SBI_PMU_SW_MAX, /* non-ABI */
};

In future, more events can be defined without breaking compatibility
of the SBI calls.

Using definition of counter_idx and event_idx, we can potentially have
the following SBI calls:

1. SBI_PMU_NUM_COUNTERS
This call will return the number of COUNTERs

2. SBI_PMU_COUNTER_GET_CSR
This call takes one parameter:
1) counter_idx
It will provide the CSR_Number and CSR_Width of underlying counter.
The value returned by SBI call is encoded as follows:
return_value[11:0] = CSR_Number
return_value[19:12] = CSR_Width (Number of bits implemented in HW)
If CSR_Number == 0xfff then it is SOFTWARE counter otherwise it is
HARDWARE counter. This SBI call will fail for counters which are not
present.
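
A minimal C sketch of how a caller might decode the SBI_PMU_COUNTER_GET_CSR return value is shown below. The struct and helper names are hypothetical, and the mask helper assumes that CSR_Width is the plain number of implemented bits (1 to 64), as the text above states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CSR_Number value that marks a SOFTWARE counter in the proposal. */
#define SBI_PMU_CSR_NUM_SOFTWARE 0xfffu

/* Hypothetical container for the decoded return value. */
struct sbi_pmu_counter_csr {
    uint16_t csr_num; /* return_value[11:0] */
    uint8_t width;    /* return_value[19:12] */
};

static inline struct sbi_pmu_counter_csr sbi_pmu_decode_csr(unsigned long ret)
{
    struct sbi_pmu_counter_csr c;

    c.csr_num = (uint16_t)(ret & 0xfffu);
    c.width = (uint8_t)((ret >> 12) & 0xffu);
    return c;
}

static inline bool sbi_pmu_counter_is_software(struct sbi_pmu_counter_csr c)
{
    return c.csr_num == SBI_PMU_CSR_NUM_SOFTWARE;
}

/* Assumption: width counts implemented bits, so the counter wraps at 2^width. */
static inline uint64_t sbi_pmu_counter_mask(uint8_t width)
{
    assert(width >= 1 && width <= 64);
    return (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
}
```

For example, a return value of 0x40c00 decodes to CSR_Number 0xc00 (the CYCLE CSR) with a width of 64 bits.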

3. SBI_PMU_COUNTER_SET_EVENT
This call takes three parameters:
1) counter_idx
2) event_idx
3) event_info
It will select an event to be monitored by given counter. If this
SBI call is not used for a counter to select an event then the
counter will monitor the default event selected for it at boot-time.
This SBI call will fail for counters which are not present. It will
also fail if specified event_idx + event_info combination is not
supported by given counter.

4. SBI_PMU_COUNTER_SET_PHYS_ADDR
This call takes two parameters:
1) counter_idx
2) 8-byte aligned physical address
It will set the physical address of memory location where the SBI
implementation will write the 64-bit SOFTWARE counter. This SBI call
is only for counters not mapped to any CSR (i.e. only for counters
with CSR_Number == 0xfff).

5. SBI_PMU_COUNTER_START
This call takes two parameters:
1) counter_idx
2) initial_value
It will inform SBI implementation to start/enable specified counter
with specified initial value. This SBI call will fail for counters
which are not present.

6. SBI_PMU_COUNTER_STOP
This call takes one parameter:
1) counter_idx
It will inform SBI implementation to stop/disable specified counter
on the calling HART. This SBI call will fail for counters which are
not present.

The M-mode runtime firmware (OpenSBI) Development Notes:

1. The M-mode runtime firmware will have to translate SBI PMU
event_idx and event_info into platform dependent MHPMEVENT CSR
value before starting/enabling a HARDWARE counter.

2. The M-mode runtime firmware (OpenSBI) will need to know following
platform dependent information:
A) Possible event_idx values allowed (or supported) by a HARDWARE
counter (i.e. HPMCOUNTER)
B) Mapping of event_idx for HARDWARE/CACHE event to MHPMEVENT CSR
value. This is optional for platform. By default, OpenSBI will
write a value <xyz> to MHPMEVENT CSR where lower 20 bits of <xyz>
are event_idx and upper XLEN-20 bits of <xyz> are lower XLEN-20
bits of event_info
C) Additional platform-specific programming required for selecting
event_idx + event_info combination. This is also optional for
platform.

3. All platform dependent information mentioned above, can be obtained
by M-mode runtime firmware (OpenSBI) from platform specific code.
The DT/ACPI can also be used to describe 2.A and 2.B mentioned above
but 2.C will always require platform specific code.
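
The default MHPMEVENT value described in 2.B can be sketched in C as follows. This is only an illustration of the bit arithmetic; the function name and the explicit xlen parameter are assumptions, and real firmware would know XLEN at build time.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: lower 20 bits are event_idx, upper XLEN-20 bits
 * are the lower XLEN-20 bits of event_info.
 */
static inline uint64_t sbi_pmu_default_mhpmevent(uint32_t event_idx,
                                                 uint64_t event_info,
                                                 unsigned int xlen)
{
    uint64_t info_mask;

    assert(xlen == 32 || xlen == 64);
    /* Keep only the XLEN-20 low bits of event_info. */
    info_mask = (UINT64_C(1) << (xlen - 20)) - 1;
    return ((event_info & info_mask) << 20) |
           (event_idx & UINT32_C(0xfffff));
}
```

For example, on RV64 an event_idx of 0x10005 with event_info 0x3 yields 0x310005. On RV32 only the low 12 bits of event_info survive.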

Linux RISC-V PMU Driver Development Notes:

1. Driver probe
The Linux RISC-V driver can be platform driver with "riscv,sbi-pmu"
as DT compatible string and optional "interrupts" DT property. The
"interrupts" DT property if available should specify an edge-triggered
overflow interrupt for each HART. When "interrupts" DT property is
present, we might also need another DT property for mapping HARTID
to entries in "interrupts" DT property. The platform driver probe
will:
A) Need to ensure that underlying SBI implementation provides
SBI PMU extension using sbi_probe_extension() API of arch/riscv.
B) Detect number of counters using SBI_PMU_NUM_COUNTERS call
C) Get CSR details of each counter using SBI_PMU_COUNTER_GET_CSR
call. If the counter is a SOFTWARE counter then use the
SBI_PMU_COUNTER_SET_PHYS_ADDR call to set memory location
of counter. The driver can skip this in driver probe and instead
do this lazily in add() callback mentioned below.

2. event_init() callback
The event_init() callback will primarily translate user-space
perf_event_attr to SBI PMU event_idx and event_info. It can do
this in following way:
A) perf_event_attr.type == PERF_TYPE_HARDWARE
event_idx.type = 0x0
event_idx.code = Value from enum sbi_pmu_hw_id based on
perf_event_attr.config
event_info = 0
B) perf_event_attr.type == PERF_TYPE_HW_CACHE
event_idx.type = 0x1
event_idx.code.cache_id = Value from enum sbi_pmu_hw_cache_id
based on perf_event_attr.config
event_idx.code.op_id = Value from enum sbi_pmu_hw_cache_op_id
based on perf_event_attr.config
event_idx.code.result_id = Value from enum
sbi_pmu_hw_cache_op_result_id based on perf_event_attr.config
event_info = 0
C) perf_event_attr.type == PERF_TYPE_RAW and
perf_event_attr.config[63:63] == 0
event_idx.type = 0x2
event_idx.code = 0x0
event_info = perf_event_attr.config[62:0]
D) perf_event_attr.type == PERF_TYPE_RAW and
perf_event_attr.config[63:63] == 1
event_idx.type = 0xf
event_idx.code = Value from enum sbi_pmu_sw_id based on
perf_event_attr.config
event_info = 0
(Note: event_init() will fail if it is not able to figure out
event_idx and event_info value corresponding to perf_event_attr)
(Note: event_init() will not assign counter to perf_event because
it will be done by the add() callback)
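
The translation in cases A) to D) could look roughly like the C sketch below. The PERF_TYPE_* values and the HW cache config layout (cache_id | op_id << 8 | result_id << 16) mirror include/uapi/linux/perf_event.h. The struct, the function name, the *_MAX limits as macros, and the choice to take the SOFTWARE code from the low bits of config are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror enum perf_type_id in the Linux UAPI header. */
enum { PERF_TYPE_HARDWARE = 0, PERF_TYPE_HW_CACHE = 3, PERF_TYPE_RAW = 4 };

/* Limits taken from the *_MAX entries of the enums in the proposal. */
#define HW_MAX           10u
#define HW_CACHE_MAX      7u
#define HW_CACHE_OP_MAX   3u
#define HW_CACHE_RES_MAX  2u
#define SW_MAX           12u

struct sbi_pmu_event { /* hypothetical */
    uint32_t event_idx;
    uint64_t event_info;
};

/* Returns 0 on success, -1 if the attribute cannot be mapped. */
static int sbi_pmu_map_event(uint32_t type, uint64_t config,
                             struct sbi_pmu_event *out)
{
    assert(out != NULL);
    out->event_info = 0;

    switch (type) {
    case PERF_TYPE_HARDWARE:                       /* case A */
        if (config >= HW_MAX)
            return -1;
        out->event_idx = (0x0u << 16) | (uint32_t)config;
        return 0;
    case PERF_TYPE_HW_CACHE: {                     /* case B */
        uint32_t cache_id = (uint32_t)(config & 0xff);
        uint32_t op_id = (uint32_t)((config >> 8) & 0xff);
        uint32_t res_id = (uint32_t)((config >> 16) & 0xff);

        if ((config >> 24) || cache_id >= HW_CACHE_MAX ||
            op_id >= HW_CACHE_OP_MAX || res_id >= HW_CACHE_RES_MAX)
            return -1;
        out->event_idx = (0x1u << 16) | (cache_id << 3) |
                         (op_id << 1) | res_id;
        return 0;
    }
    case PERF_TYPE_RAW:
        if (config >> 63) {                        /* case D: SOFTWARE */
            uint64_t code = config & ~(UINT64_C(1) << 63);

            if (code >= SW_MAX)
                return -1;
            out->event_idx = (0xfu << 16) | (uint32_t)code;
        } else {                                   /* case C: HARDWARE RAW */
            out->event_idx = 0x2u << 16;
            out->event_info = config;              /* config[62:0] */
        }
        return 0;
    default:
        return -1;
    }
}
```

Returning -1 here corresponds to event_init() failing when no event_idx and event_info can be derived from perf_event_attr.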

3. add() callback
The add() callback of Linux RISC-V PMU driver will find a
free counter on current CPU/HART such that the perf_event
event_idx + event_info combination is supported by the counter.
To check-and-set event_idx + event_info combination for a
counter, we will use the SBI_PMU_COUNTER_SET_EVENT call.
The counter allocation and SBI_PMU_COUNTER_SET_EVENT call
can be further optimized by looking at CSR details.
For example:
A) For event_idx.type == 0 and
event_idx.code == SBI_PMU_HW_CPU_CYCLES, we should
prefer counter mapping to CYCLE CSR and skip doing
SBI_PMU_COUNTER_SET_EVENT call.
B) For event_idx.type == 0 and
event_idx.code == SBI_PMU_HW_INSTRUCTIONS, we should
prefer counter mapping to INSTRET CSR and skip doing
SBI_PMU_COUNTER_SET_EVENT call.
C) For event_idx.type == 0xf, only prefer counters mapping
to 0xfff CSR (i.e. SOFTWARE counters).
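
The allocation strategy above can be sketched in C as follows. This is not driver code: struct pmu_counter, pmu_alloc_counter() and the set_event_fn callback are hypothetical, and mock_set_event() is a stand-in for the real SBI_PMU_COUNTER_SET_EVENT call that pretends only counter 3 accepts hardware events and only counter 4 accepts software events.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CSR_CYCLE    0xc00u
#define CSR_INSTRET  0xc02u
#define CSR_SOFTWARE 0xfffu /* CSR_Number of a SOFTWARE counter */

struct pmu_counter { /* hypothetical per-HART bookkeeping */
    uint16_t csr_num;
    bool in_use;
};

/* Stand-in for SBI_PMU_COUNTER_SET_EVENT: returns 0 on success. */
typedef int (*set_event_fn)(size_t counter_idx, uint32_t event_idx,
                            uint64_t event_info);

/* Returns the allocated counter index, or -1 if none fits. */
static int pmu_alloc_counter(struct pmu_counter *ctrs, size_t n,
                             uint32_t event_idx, uint64_t event_info,
                             set_event_fn set_event)
{
    uint32_t type = (event_idx >> 16) & 0xf;
    uint32_t code = event_idx & 0xffff;
    size_t i;

    assert(ctrs != NULL && set_event != NULL);

    /* Cases A and B: prefer CYCLE/INSTRET and skip the SET_EVENT call. */
    if (type == 0 && (code == 0 || code == 1)) {
        uint16_t want = (code == 0) ? CSR_CYCLE : CSR_INSTRET;

        for (i = 0; i < n; i++) {
            if (!ctrs[i].in_use && ctrs[i].csr_num == want) {
                ctrs[i].in_use = true;
                return (int)i;
            }
        }
    }

    /* General case: try each free counter until SET_EVENT succeeds. */
    for (i = 0; i < n; i++) {
        if (ctrs[i].in_use)
            continue;
        /* Case C: software events only go to software counters. */
        if (type == 0xf && ctrs[i].csr_num != CSR_SOFTWARE)
            continue;
        if (set_event(i, event_idx, event_info) == 0) {
            ctrs[i].in_use = true;
            return (int)i;
        }
    }
    return -1;
}

/* Mock: counter 3 takes hardware events, counter 4 takes software events. */
static int mock_set_event(size_t idx, uint32_t event_idx, uint64_t info)
{
    bool sw = ((event_idx >> 16) & 0xf) == 0xf;

    (void)info;
    return (sw ? idx == 4 : idx == 3) ? 0 : -1;
}
```

When SET_EVENT fails for one counter, the loop simply moves on to the next free counter, which keeps the number of SBI calls proportional to the number of free candidates.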

4. del() callback
The del() callback of Linux RISC-V PMU driver will release
or free the counter.

5. start() callback
The start() callback of Linux RISC-V PMU driver will start
the counter using the SBI_PMU_COUNTER_START call.

6. stop() callback
The stop() callback of Linux RISC-V PMU driver will stop
the counter using the SBI_PMU_COUNTER_STOP call.

Regards,
Anup


Zong Li
 

On Mon, Jul 13, 2020 at 9:12 PM Anup Patel <Anup.Patel@...> wrote:

> 3. SBI_PMU_COUNTER_SET_EVENT
> This call takes three parameters:
> 1) counter_idx
> 2) event_idx
> 3) event_info
> It will select an event to be monitored by given counter. If this
> SBI call is not used for a counter to select an event then the
> counter will monitor the default event selected for it at boot-time.
> This SBI call will fail for counters which are not present. It will
> also fail if specified event_idx + event_info combination is not
> supported by given counter.

It also seems to fail if the specified event is not supported by the given
counter, right? Then the Linux driver could try to allocate the next free
counter when this SBI call returns failure.

Apart from this question above, this version of the proposal is great to me.

Thanks,
Zong




Anup Patel
 

-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 14 July 2020 09:02
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Atish Patra
<Atish.Patra@...>; andrew@...; gfavor@...
Subject: Re: Proposal v3: SBI PMU Extension

On Mon, Jul 13, 2020 at 9:12 PM Anup Patel <Anup.Patel@...> wrote:

> > 3. SBI_PMU_COUNTER_SET_EVENT
> > This call takes three parameters:
> > 1) counter_idx
> > 2) event_idx
> > 3) event_info
> > It will select an event to be monitored by given counter. If this
> > SBI call is not used for a counter to select an event then the
> > counter will monitor the default event selected for it at boot-time.
> > This SBI call will fail for counters which are not present. It will
> > also fail if specified event_idx + event_info combination is not
> > supported by given counter.
>
> It also seems to fail if the specified event is not supported by the given
> counter, right? Then the Linux driver could try to allocate the next free
> counter when this SBI call returns failure.

Yes, this call will fail if the event_idx + event_info combination is not
supported by the given counter_idx. It is expected that the Linux driver will
try another free counter if the SBI_PMU_COUNTER_SET_EVENT call fails. I have
suggested a few ideas on how to reduce SBI_PMU_COUNTER_SET_EVENT calls by
looking at the CSR number assigned to a counter.


> Apart from this question above, this version of the proposal is great to me.

Cool 😊

Regards,
Anup


Thanks,
Zong


4. SBI_PMU_COUNTER_SET_PHYS_ADDR
This call takes two parameters:
1) counter_idx
2) 8byte aligned physical address
It will set the physical address of memory location where the SBI
implementation will write the 64bit SOFTWARE counter. This SBI call
is only for counters not mapped to any CSR (i.e. only for counters
with CSR_Number > 0xfff).

5. SBI_PMU_COUNTER_START
This call takes two parameters:
1) counter_idx
2) initial_value
It will inform SBI implementation to start/enable specified counter
with specified initial value. This SBI call will fail for counters
which are not present.

6. SBI_PMU_COUNTER_STOP
This call takes one parameter:
1) counter_idx
It will inform SBI implementation to stop/disable specified counters
on the calling HART. This SBI call will fail for counters which are
not present.

The M-mode runtime firmware (OpenSBI) Development Notes:

1. The M-mode runtime firmware will have to translate SBI PMU
event_idx and event_into into platform dependent MHPMEVENT CSR
value before starting/enabling a HARDWARE counter.

2. The M-mode runtime firmware (OpenSBI) will need to know following
platform dependent information:
A) Possible event_idx values allowed (or supported) by a HARDWARE
counter (i.e. HPMCOUNTER)
B) Mapping of event_idx for HARDWARE/CACHE event to MHPMEVENT
CSR
value. This is optional for platform. By default, OpenSBI will
write a value <xyz> to MHPMEVENT CSR where lower 20bits of <xyz>
are event_idx and upper XLEN-20 bits of <xyz> are lower XLEN-20
bits of event_info
C) Additional platform-specific progamming required for selecting
event_idx + event_info combination. This is also optional for
platform.

3. All platform dependent information mentioned above, can be obtained
by M-mode runtime firmware (OpenSBI) from platform specific code.
The DT/ACPI can also be used to describe 2.A and 2.B mentioned above
but 2.C will always require platform specific code.

Linux RISC-V PMU Driver Development Notes:

1. Driver probe
The Linux RISC-V driver can be platform driver with "riscv,sbi-pmu"
as DT compatible string and optional "interrupts" DT property. The
"interrupts" DT property if available should specify an edge-triggered
overflow interrupt for each HART. When "interrupts" DT property is
present, we might also need another DT property for mapping HARTID
to entries in "interrupts" DT property. The platform driver probe
will:
A) Need to ensure that underlying SBI implementation provides
SBI PMU extension using sbi_probe_extension() API of arch/riscv.
B) Detect number of counters using SBI_PMU_NUM_COUNTERS call
C) Get CSR details of each counter using SBI_PMU_COUNTER_GET_CSR
call. If the counter is a SOFTWARE counter then use the
SBI_PMU_COUNTER_SET_PHYS_ADDR call to set memory location
of counter. The driver skip this in driver probe and instead
do this lazily in add() callback mentioned below.

2. event_init() callback
The event_init() callback will primarily translate user-space
perf_event_attr to SBI PMU event_idx and event_info. It can do
this in following way:
A) perf_event_attr.type == PERF_TYPE_HARDWARE
event_idx.type = 0x0
event_idx.code = Value from enum sbi_pmu_hw_id based on
perf_event_attr.config
event_info = 0
B) perf_event_attr.type == PERF_TYPE_HW_CACHE
event_idx.type = 0x1
event_idx.code.cache_id = Value from enum sbi_pmu_hw_cache_id
based on perf_event_attr.config
event_idx.code.op_id = Value from enum sbi_pmu_hw_op_id
based on perf_event_attr.config
event_idx.code.result_id = Value from enum sbi_pmu_hw_result_id
based on perf_event_attr.config
event_info = 0
C) perf_event_attr.type == PERF_TYPE_RAW and
perf_event_attr.config[63:63] == 0
event_idx.type = 0x2
event_idx.code = 0x0
event_info = perf_event_attr.config[62:0]
D) perf_event_attr.type == PERF_TYPE_RAW and
perf_event_attr.config[63:63] == 1
event_idx.type = 0xf
event_idx.code = Value from enum sbi_pmu_sw_id based on
perf_event_attr.config
event_info = 0
(Note: event_init() will fail if it is not able to figure out
event_idx and event_info value corresponding to perf_event_attr)
(Note: event_init() will not assign counter to perf_event because
it will be done by event_add())

3. add() callback
The add() callback of Linux RISC-V PMU driver will find a
free counter on current CPU/HART such that the perf_event
event_idx + event_info combination is supported by the counter.
To check-and-set event_idx + event_info combination for a
counter, we will use the SBI_PMU_COUNTER_SET_EVENT call.
The counter allocation and SBI_PMU_COUNTER_SET_EVENT call
can be futher optimized by looking at CSR details.
For example:
A) For event_idx.type == 0 and
event_idx.code == SBI_PMU_HW_CPU_CYCLES, we should
prefer counter mapping to CYCLE CSR and skip doing
SBI_PMU_COUNTER_SET_EVENT call.
B) For event_idx.type == 0 and
event_idx.code == SBI_PMU_HW_INSTRUCTIONS, we should
prefer counter mapping to INSTRET CSR and skip doing
SBI_PMU_COUNTER_SET_EVENT call.
C) For event_idx.type == 0xf, consider only counters mapping
to 0xfff CSR (i.e. SOFTWARE counters).
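
A rough C sketch of the allocation strategy above is given below. It is
not the actual driver. struct pmu_counter, pmu_alloc_counter() and the
set_event_fn callback (standing in for the SBI_PMU_COUNTER_SET_EVENT
call) are hypothetical names used only for illustration, and
demo_set_event() is a dummy stand-in for firmware so the sketch can be
exercised on a host machine. The CSR numbers 0xc00 (cycle) and 0xc02
(instret) are the standard RISC-V ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CSR_CYCLE    0xc00u /* RISC-V cycle CSR number */
#define CSR_INSTRET  0xc02u /* RISC-V instret CSR number */
#define CSR_SOFTWARE 0xfffu /* CSR_Number reported for SOFTWARE counters */

/* Hypothetical per-HART bookkeeping filled from SBI_PMU_COUNTER_GET_CSR. */
struct pmu_counter {
    uint16_t csr;  /* CSR_Number of the counter */
    bool     busy; /* already assigned to a perf_event */
};

/* Hypothetical wrapper for SBI_PMU_COUNTER_SET_EVENT; returns 0 on success. */
typedef int (*set_event_fn)(size_t counter_idx, uint32_t event_idx,
                            uint64_t event_info);

/* Returns the allocated counter_idx, or -1 if no free counter fits. */
int pmu_alloc_counter(struct pmu_counter *ctrs, size_t n, uint32_t event_idx,
                      uint64_t event_info, set_event_fn set_event)
{
    uint32_t type = (event_idx >> 16) & 0xf;
    uint32_t code = event_idx & 0xffff;
    bool want_sw = (type == 0xf);
    uint16_t fixed = 0;
    size_t i;

    assert(ctrs != NULL && set_event != NULL);

    if (type == 0x0 && code == 0)      /* SBI_PMU_HW_CPU_CYCLES */
        fixed = CSR_CYCLE;
    else if (type == 0x0 && code == 1) /* SBI_PMU_HW_INSTRUCTIONS */
        fixed = CSR_INSTRET;

    /* Cases A and B: take the fixed-function counter without an SBI call. */
    for (i = 0; fixed && i < n; i++) {
        if (!ctrs[i].busy && ctrs[i].csr == fixed) {
            ctrs[i].busy = true;
            return (int)i;
        }
    }

    /*
     * General case: try each free counter of the matching kind and let
     * the SBI implementation reject unsupported combinations. Case C is
     * covered because SOFTWARE events only try counters with CSR 0xfff.
     */
    for (i = 0; i < n; i++) {
        if (ctrs[i].busy || (ctrs[i].csr == CSR_SOFTWARE) != want_sw)
            continue;
        if (set_event(i, event_idx, event_info) == 0) {
            ctrs[i].busy = true;
            return (int)i;
        }
    }
    return -1;
}

/* Dummy stand-in for firmware, for exercising the sketch: rejects counter 2. */
int demo_set_event(size_t counter_idx, uint32_t event_idx, uint64_t event_info)
{
    (void)event_idx;
    (void)event_info;
    return counter_idx == 2 ? -1 : 0;
}
```

The del() callback would then simply clear the busy flag of the counter
it releases.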

4. del() callback
The del() callback of Linux RISC-V PMU driver will release
or free the counter.

5. start() callback
The start() callback of Linux RISC-V PMU driver will start
the counter using the SBI_PMU_COUNTER_START call.

6. stop() callback
The stop() callback of Linux RISC-V PMU driver will stop
the counter using the SBI_PMU_COUNTER_STOP call.

Regards,
Anup


Brian Grayson
 

Should there also be a way to atomically specify start/stop for a set of counters, or is the latency of N SBI start/stop calls short enough that starting or stopping N counters will not take that long? For a lot of cores today, N is very small, like 2 for some cores, but as RISC-V cores continue to grow in capability, N could easily become 4 to 8 for the core, another set in the L2, another set in the L3, etc.

Brian

On Mon, Jul 13, 2020 at 10:41 PM Anup Patel <anup.patel@...> wrote:


> -----Original Message-----
> From: Zong Li <zong.li@...>
> Sent: 14 July 2020 09:02
> To: Anup Patel <Anup.Patel@...>
> Cc: tech-unixplatformspec@...; Atish Patra
> <Atish.Patra@...>; andrew@...; gfavor@...
> Subject: Re: Proposal v3: SBI PMU Extension
>
> On Mon, Jul 13, 2020 at 9:12 PM Anup Patel <Anup.Patel@...> wrote:
> >
> > Hi All,
> >
> > We don't have a dedicated RISC-V PMU extension but we do have HARDWARE
> > performance counters such as CYCLE CSR, INSTRET CSR, and HPMCOUNTER
> > CSRs. A RISC-V implementation can support monitoring various HARDWARE
> > events using limited number of HPMCOUNTER CSRs.
> >
> > In addition to HARDWARE performance counters, a SBI implementation
> > (e.g. OpenSBI, Xvisor, KVM, etc) can provide SOFTWARE counters for
> > events such as number of RFENCEs, number of IPIs, number of misaligned
> > load/store instructions, number of illegal instructions, etc.
> >
> > We propose SBI PMU extension, which will help S-mode (or VS-mode)
> > software to discover and configure HARDWARE/SOFTWARE counters. The SBI
> > PMU extension will only manage per-HART (or per-CPU) HARDWARE/SOFTWARE
> > counters which include CYCLE CSR, INSTRET CSR, HPMCOUNTER CSRs and
> > SOFTWARE counters provided by SBI implementation.
> >
> > Using SBI PMU extension, a SBI implementation (OpenSBI, KVM, or
> > Xvisor) will provide a standardized view of HARDWARE/SOFTWARE counters
> > and events to S-mode (or VS-mode) software.
> >
> > To define SBI PMU extension, we first define counter_idx which is a
> > logical number assigned to a counter and event_idx which is an encoded
> > number representing the HARDWARE/SOFTWARE event to be monitored. A
> > HARDWARE/SOFTWARE event can also have additional configuration/details
> > referred to as event_info.
> >
> > The SBI PMU event_idx is a 20bits wide number encoded as follows:
> > event_idx[19:16] = type
> > event_idx[15:0] = code
> >
> > If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE
> > event, the event_info is not required whereas the event_idx.code can
> > be one of the following values:
> > enum sbi_pmu_hw_id {
> >     SBI_PMU_HW_CPU_CYCLES              = 0,
> >     SBI_PMU_HW_INSTRUCTIONS            = 1,
> >     SBI_PMU_HW_CACHE_REFERENCES        = 2,
> >     SBI_PMU_HW_CACHE_MISSES            = 3,
> >     SBI_PMU_HW_BRANCH_INSTRUCTIONS     = 4,
> >     SBI_PMU_HW_BRANCH_MISSES           = 5,
> >     SBI_PMU_HW_BUS_CYCLES              = 6,
> >     SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
> >     SBI_PMU_HW_STALLED_CYCLES_BACKEND  = 8,
> >     SBI_PMU_HW_REF_CPU_CYCLES          = 9,
> >     SBI_PMU_HW_MAX,                    /* non-ABI */
> > };
> > (NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
> >
> > If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE
> > CACHE event, the event_info is not required whereas the event_idx.code
> > is encoded as follows:
> > event_idx.code[15:3] = cache_id
> > event_idx.code[2:1] = op_id
> > event_idx.code[0:0] = result_id
> > enum sbi_pmu_hw_cache_id {
> >     SBI_PMU_HW_CACHE_L1D  = 0,
> >     SBI_PMU_HW_CACHE_L1I  = 1,
> >     SBI_PMU_HW_CACHE_LL   = 2,
> >     SBI_PMU_HW_CACHE_DTLB = 3,
> >     SBI_PMU_HW_CACHE_ITLB = 4,
> >     SBI_PMU_HW_CACHE_BPU  = 5,
> >     SBI_PMU_HW_CACHE_NODE = 6,
> >     SBI_PMU_HW_CACHE_MAX, /* non-ABI */
> > };
> > enum sbi_pmu_hw_cache_op_id {
> >     SBI_PMU_HW_CACHE_OP_READ     = 0,
> >     SBI_PMU_HW_CACHE_OP_WRITE    = 1,
> >     SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
> >     SBI_PMU_HW_CACHE_OP_MAX,     /* non-ABI */
> > };
> > enum sbi_pmu_hw_cache_op_result_id {
> >     SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
> >     SBI_PMU_HW_CACHE_RESULT_MISS   = 1,
> >     SBI_PMU_HW_CACHE_RESULT_MAX,   /* non-ABI */
> > };
> > (NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
> >
> > If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE
> > RAW event, the event_idx.code should be zero and the event_info
> > parameter passed to SBI_PMU_COUNTER_SET_EVENT call (described below)
> > will have the RAW event value to be programmed in MHPMEVENT CSR (i.e.
> > the SBI implementation will not derive MHPMEVENT CSR value from
> > event_idx + event_info).
> >
> > If event_idx.type == 0xf then it is SOFTWARE event. For SOFTWARE
> > event, the event_info is not required whereas the event_idx.code can
> > be one of the following:
> > enum sbi_pmu_sw_id {
> >     SBI_PMU_SW_MISALIGNED_LOAD        = 0,
> >     SBI_PMU_SW_MISALIGNED_STORE       = 1,
> >     SBI_PMU_SW_ILLEGAL_INSN           = 2,
> >     SBI_PMU_SW_LOCAL_SET_TIMER        = 3,
> >     SBI_PMU_SW_LOCAL_IPI              = 4,
> >     SBI_PMU_SW_LOCAL_FENCE_I          = 5,
> >     SBI_PMU_SW_LOCAL_SFENCE_VMA       = 6,
> >     SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID  = 7,
> >     SBI_PMU_SW_LOCAL_HFENCE_GVMA      = 8,
> >     SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
> >     SBI_PMU_SW_LOCAL_HFENCE_VVMA      = 10,
> >     SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
> >     SBI_PMU_SW_MAX,                   /* non-ABI */
> > };
> >
> > In future, more events can be defined without breaking compatibility
> > of SBI calls.
> >
> > Using definition of counter_idx and event_idx, we can potentially have
> > the following SBI calls:
> >
> > 1. SBI_PMU_NUM_COUNTERS
> >    This call will return the number of COUNTERs
> >
> > 2. SBI_PMU_COUNTER_GET_CSR
> >    This call takes one parameter:
> >       1) counter_idx
> >    It will provide the CSR_Number and CSR_Width of underlying counter.
> >    The value returned by SBI call is encoded as follows:
> >       return_value[11:0] = CSR_Number
> >       return_value[19:12] = CSR_Width (Number of bits implemented in HW)
> >    If CSR_Number == 0xfff then it is SOFTWARE counter otherwise it is
> >    HARDWARE counter. This SBI call will fail for counters which are not
> >    present.
> >
> > 3. SBI_PMU_COUNTER_SET_EVENT
> >    This call takes three parameters:
> >       1) counter_idx
> >       2) event_idx
> >       3) event_info
> >    It will select an event to be monitored by given counter. If this
> >    SBI call is not used for a counter to select an event then the
> >    counter will monitor default event selected for it at boot-time.
> >    This SBI call will fail for counters which are not present. It will
> >    also fail if specified event_idx + event_info combination is not
> >    supported by given counter.
>
> It also seems to fail if the specified event is not supported by the given
> counter, right? Then the Linux driver could try to allocate the next free
> counter when this SBI call returns failure.

Yes, this call will fail if the event_idx + event_info combination is not
supported by the given counter_idx. It is expected that the Linux driver will
try another free counter if the SBI_PMU_COUNTER_SET_EVENT call fails. I have
suggested a few ideas on how to reduce SBI_PMU_COUNTER_SET_EVENT calls by
looking at the CSR number assigned to a counter.

>
> Apart from this question above, this version of the proposal is great to me.

Cool 😊

Regards,
Anup

>
> Thanks,
> Zong
>
> >
> > 4. SBI_PMU_COUNTER_SET_PHYS_ADDR
> >    This call takes two parameters:
> >       1) counter_idx
> >       2) 8byte aligned physical address
> >    It will set the physical address of memory location where the SBI
> >    implementation will write the 64bit SOFTWARE counter. This SBI call
> >    is only for counters not mapped to any CSR (i.e. only for counters
> >    with CSR_Number == 0xfff).
> >
> > 5. SBI_PMU_COUNTER_START
> >    This call takes two parameters:
> >       1) counter_idx
> >       2) initial_value
> >    It will inform SBI implementation to start/enable specified counter
> >    with specified initial value. This SBI call will fail for counters
> >    which are not present.
> >
> > 6. SBI_PMU_COUNTER_STOP
> >    This call takes one parameter:
> >       1) counter_idx
> >    It will inform SBI implementation to stop/disable specified counters
> >    on the calling HART. This SBI call will fail for counters which are
> >    not present.
> >
> > The M-mode runtime firmware (OpenSBI) Development Notes:
> >
> > 1. The M-mode runtime firmware will have to translate SBI PMU
> >    event_idx and event_info into platform dependent MHPMEVENT CSR
> >    value before starting/enabling a HARDWARE counter.
> >
> > 2. The M-mode runtime firmware (OpenSBI) will need to know following
> >    platform dependent information:
> >    A) Possible event_idx values allowed (or supported) by a HARDWARE
> >       counter (i.e. HPMCOUNTER)
> >    B) Mapping of event_idx for HARDWARE/CACHE event to MHPMEVENT CSR
> >       value. This is optional for platform. By default, OpenSBI will
> >       write a value <xyz> to MHPMEVENT CSR where lower 20bits of <xyz>
> >       are event_idx and upper XLEN-20 bits of <xyz> are lower XLEN-20
> >       bits of event_info
> >    C) Additional platform-specific programming required for selecting
> >       event_idx + event_info combination. This is also optional for
> >       platform.
> >
> > 3. All platform dependent information mentioned above, can be obtained
> >    by M-mode runtime firmware (OpenSBI) from platform specific code.
> >    The DT/ACPI can also be used to describe 2.A and 2.B mentioned above
> >    but 2.C will always require platform specific code.
> >
> > Linux RISC-V PMU Driver Development Notes:
> >
> > 1. Driver probe
> >    The Linux RISC-V driver can be platform driver with "riscv,sbi-pmu"
> >    as DT compatible string and optional "interrupts" DT property. The
> >    "interrupts" DT property if available should specify an edge-triggered
> >    overflow interrupt for each HART. When "interrupts" DT property is
> >    present, we might also need another DT property for mapping HARTID
> >    to entries in "interrupts" DT property. The platform driver probe
> >    will:
> >    A) Need to ensure that underlying SBI implementation provides
> >       SBI PMU extension using sbi_probe_extension() API of arch/riscv.
> >    B) Detect number of counters using SBI_PMU_NUM_COUNTERS call
> >    C) Get CSR details of each counter using SBI_PMU_COUNTER_GET_CSR
> >       call. If the counter is a SOFTWARE counter then use the
> >       SBI_PMU_COUNTER_SET_PHYS_ADDR call to set memory location
> >       of counter. The driver can skip this step in driver probe and
> >       instead do it lazily in the add() callback described below.
> >
> > 2. event_init() callback
> >    The event_init() callback will primarily translate user-space
> >    perf_event_attr to SBI PMU event_idx and event_info. It can do
> >    this in following way:
> >    A) perf_event_attr.type == PERF_TYPE_HARDWARE
> >       event_idx.type = 0x0
> >       event_idx.code = Value from enum sbi_pmu_hw_id based on
> >                            perf_event_attr.config
> >       event_info = 0
> >    B) perf_event_attr.type == PERF_TYPE_HW_CACHE
> >       event_idx.type = 0x1
> >       event_idx.code.cache_id = Value from enum sbi_pmu_hw_cache_id
> >                                     based on perf_event_attr.config
> >       event_idx.code.op_id = Value from enum sbi_pmu_hw_cache_op_id
> >                                  based on perf_event_attr.config
> >       event_idx.code.result_id = Value from enum sbi_pmu_hw_cache_op_result_id
> >                                      based on perf_event_attr.config
> >       event_info = 0
> >    C) perf_event_attr.type == PERF_TYPE_RAW and
> >       perf_event_attr.config[63:63] == 0
> >       event_idx.type = 0x2
> >       event_idx.code = 0x0
> >       event_info = perf_event_attr.config[62:0]
> >    D) perf_event_attr.type == PERF_TYPE_RAW and
> >       perf_event_attr.config[63:63] == 1
> >       event_idx.type = 0xf
> >       event_idx.code = Value from enum sbi_pmu_sw_id based on
> >                        perf_event_attr.config
> >       event_info = 0
> >    (Note: event_init() will fail if it is not able to figure out
> >     event_idx and event_info value corresponding to perf_event_attr)
> >    (Note: event_init() will not assign a counter to the perf_event because
> >     that will be done by the add() callback)
> >
> > 3. add() callback
> >    The add() callback of Linux RISC-V PMU driver will find a
> >    free counter on current CPU/HART such that the perf_event
> >    event_idx + event_info combination is supported by the counter.
> >    To check-and-set event_idx + event_info combination for a
> >    counter, we will use the SBI_PMU_COUNTER_SET_EVENT call.
> >    The counter allocation and SBI_PMU_COUNTER_SET_EVENT call
> >    can be further optimized by looking at CSR details.
> >    For example:
> >    A) For event_idx.type == 0 and
> >       event_idx.code == SBI_PMU_HW_CPU_CYCLES, we should
> >       prefer counter mapping to CYCLE CSR and skip doing
> >       SBI_PMU_COUNTER_SET_EVENT call.
> >    B) For event_idx.type == 0 and
> >       event_idx.code == SBI_PMU_HW_INSTRUCTIONS, we should
> >       prefer counter mapping to INSTRET CSR and skip doing
> >       SBI_PMU_COUNTER_SET_EVENT call.
> >    C) For event_idx.type == 0xf, consider only counters mapping
> >       to 0xfff CSR (i.e. SOFTWARE counters).
> >
> > 4. del() callback
> >    The del() callback of Linux RISC-V PMU driver will release
> >    or free the counter.
> >
> > 5. start() callback
> >    The start() callback of Linux RISC-V PMU driver will start
> >    the counter using the SBI_PMU_COUNTER_START call.
> >
> > 6. stop() callback
> >    The stop() callback of Linux RISC-V PMU driver will stop
> >    the counter using the SBI_PMU_COUNTER_STOP call.
> >
> > Regards,
> > Anup




Anup Patel
 

One SBI call to start/stop N counters will certainly be faster than N SBI calls.

 

We did not include SBI calls to start/stop a set of counters because Linux perf drivers only require a mechanism to start/stop one counter at a time.

 

Regards,

Anup

 

From: Brian Grayson <brian.grayson@...>
Sent: 14 July 2020 18:58
To: Anup Patel <Anup.Patel@...>
Cc: Zong Li <zong.li@...>; tech-unixplatformspec@...; Atish Patra <Atish.Patra@...>; andrew@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v3: SBI PMU Extension

 

Should there also be a way to atomically specify start/stop for a set of counters, or is the latency of N SBI start/stop calls short enough that starting or stopping N counters will not take that long? For a lot of cores today, N is very small, like 2 for some cores, but as RISC-V cores continue to grow in capability, N could easily become 4 to 8 for the core, another set in the L2, another set in the L3, etc.

 

Brian

 



Zong Li
 

On Tue, Jul 14, 2020 at 11:40 AM Anup Patel <Anup.Patel@...> wrote:



-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 14 July 2020 09:02
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Atish Patra
<Atish.Patra@...>; andrew@...; gfavor@...
Subject: Re: Proposal v3: SBI PMU Extension

On Mon, Jul 13, 2020 at 9:12 PM Anup Patel <Anup.Patel@...> wrote:

Hi All,

We don't have a dedicated RISC-V PMU extension but we do have HARDWARE
performance counters such as CYCLE CSR, INSTRET CSR, and HPMCOUNTER
CSRs. A RISC-V implementation can support monitoring various HARDWARE
events using limited number of HPMCOUNTER CSRs.

In addition to HARDWARE performance counters, a SBI implementation
(e.g. OpenSBI, Xvisor, KVM, etc) can provide SOFTWARE counters for
events such as number of RFENCEs, number of IPIs, number of misaligned
load/store instructions, number of illegal instructions, etc.

We propose SBI PMU extension, which will help S-mode (or VS-mode)
software to discover and configure HARDWARE/SOFTWARE counters. The SBI
PMU extension will only manage per-HART (or per-CPU) HARDWARE/SOFTWARE
counters which include CYCLE CSR, INSTRET CSR, HPMCOUNTER CSRs and
SOFTWARE counters provided by SBI implementation.

Using SBI PMU extension, a SBI implementation (OpenSBI, KVM, or
Xvisor) will provide a standardized view of HARDWARE/SOFTWARE counters
and events to S-mode (or VS-mode) software.

To define SBI PMU extension, we first define counter_idx which is a
logical number assigned to a counter and event_idx which is an encoded
number representing the HARDWARE/SOFTWARE event to be monitored. A
HARDWARE/SOFTWARE event can also have additional configuration/details
referred to as event_info.

The SBI PMU event_idx is a 20bits wide number encoded as follows:
event_idx[19:16] = type
event_idx[15:0] = code

If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE
event, the event_info is not required whereas the event_idx.code can
be one of the following values:
enum sbi_pmu_hw_id {
SBI_PMU_HW_CPU_CYCLES = 0,
SBI_PMU_HW_INSTRUCTIONS = 1,
SBI_PMU_HW_CACHE_REFERENCES = 2,
SBI_PMU_HW_CACHE_MISSES = 3,
SBI_PMU_HW_BRANCH_INSTRUCTIONS = 4,
SBI_PMU_HW_BRANCH_MISSES = 5,
SBI_PMU_HW_BUS_CYCLES = 6,
SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
SBI_PMU_HW_STALLED_CYCLES_BACKEND = 8,
SBI_PMU_HW_REF_CPU_CYCLES = 9,
SBI_PMU_HW_MAX, /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE
CACHE event, the event_info is not required whereas the event_idx.code
is encoded as follows:
event_idx.code[15:3] = cache_id
event_idx.code[2:1] = op_id
event_idx.code[0:0] = result_id
enum sbi_pmu_hw_cache_id {
SBI_PMU_HW_CACHE_L1D = 0,
SBI_PMU_HW_CACHE_L1I = 1,
SBI_PMU_HW_CACHE_LL = 2,
SBI_PMU_HW_CACHE_DTLB = 3,
SBI_PMU_HW_CACHE_ITLB = 4,
SBI_PMU_HW_CACHE_BPU = 5,
SBI_PMU_HW_CACHE_NODE = 6,
SBI_PMU_HW_CACHE_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_id {
SBI_PMU_HW_CACHE_OP_READ = 0,
SBI_PMU_HW_CACHE_OP_WRITE = 1,
SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
SBI_PMU_HW_CACHE_OP_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_result_id {
SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
SBI_PMU_HW_CACHE_RESULT_MISS = 1,
SBI_PMU_HW_CACHE_RESULT_MAX, /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
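
A small sketch of how the three cache fields might be combined into
event_idx.code, and then into a full event_idx with type 0x1. The helper
names are made up for illustration; the shifts follow the bit ranges
listed above.

```c
#include <stdint.h>

/*
 * Hypothetical helper: build event_idx.code for a HARDWARE CACHE event.
 * cache_id goes to code[15:3], op_id to code[2:1], result_id to code[0].
 */
static inline uint32_t sbi_pmu_hw_cache_code(uint32_t cache_id,
                                             uint32_t op_id,
                                             uint32_t result_id)
{
    return ((cache_id & 0x1FFFu) << 3) |
           ((op_id & 0x3u) << 1) |
           (result_id & 0x1u);
}

/* Hypothetical helper: full event_idx with type 0x1 in bits [19:16]. */
static inline uint32_t sbi_pmu_hw_cache_event_idx(uint32_t cache_id,
                                                  uint32_t op_id,
                                                  uint32_t result_id)
{
    return (0x1u << 16) | sbi_pmu_hw_cache_code(cache_id, op_id, result_id);
}
```

With the enum values above, a DTLB (3) write (1) access (0) event would
get code 0x1A and event_idx 0x1001A.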

If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE
RAW event, the event_idx.code should be zero and the event_info
parameter passed to SBI_PMU_COUNTER_SET_EVENT call (described below)
will have the RAW event value to be programmed in MHPMEVENT CSR (i.e.
the SBI implementation will not derive MHPMEVENT CSR value from
event_idx + event_info).

If event_idx.type == 0xf then it is SOFTWARE event. For SOFTWARE
event, the event_info is not required whereas the event_idx.code can
be one of the following:
enum sbi_pmu_sw_id {
SBI_PMU_SW_MISALIGNED_LOAD = 0,
SBI_PMU_SW_MISALIGNED_STORE = 1,
SBI_PMU_SW_ILLEGAL_INSN = 2,
SBI_PMU_SW_LOCAL_SET_TIMER = 3,
SBI_PMU_SW_LOCAL_IPI = 4,
SBI_PMU_SW_LOCAL_FENCE_I = 5,
SBI_PMU_SW_LOCAL_SFENCE_VMA = 6,
SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID = 7,
SBI_PMU_SW_LOCAL_HFENCE_GVMA = 8,
SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
SBI_PMU_SW_LOCAL_HFENCE_VVMA = 10,
SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
SBI_PMU_SW_MAX, /* non-ABI */
};

In future, more events can be defined without breaking compatibility
of SBI calls.

Using definition of counter_idx and event_idx, we can potentially have
the following SBI calls:

1. SBI_PMU_NUM_COUNTERS
This call will return the number of COUNTERs

2. SBI_PMU_COUNTER_GET_CSR
This call takes one parameter:
1) counter_idx
It will provide the CSR_Number and CSR_Width of underlying counter.
The value returned by SBI call is encoded as follows:
return_value[11:0] = CSR_Number
return_value[19:12] = CSR_Width (Number of bits implemented in HW)
If CSR_Number == 0xfff then it is SOFTWARE counter otherwise it is
HARDWARE counter. This SBI call will fail for counters which are not
present.
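
For illustration, S-mode software could decode the value returned by
SBI_PMU_COUNTER_GET_CSR along these lines. The struct and function names
are hypothetical; only the field positions and the 0xfff marker come
from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* CSR_Number value that marks a SOFTWARE counter in the proposal. */
#define SBI_PMU_CSR_NUMBER_SW 0xFFFu

/* Hypothetical container for the decoded fields. */
struct sbi_pmu_csr_info {
    uint16_t csr_number;  /* return_value[11:0] */
    uint8_t  csr_width;   /* return_value[19:12] */
    bool     is_software; /* true when csr_number == 0xfff */
};

/* Hypothetical helper: split the SBI return value into its fields. */
static inline struct sbi_pmu_csr_info sbi_pmu_decode_csr(unsigned long return_value)
{
    struct sbi_pmu_csr_info info;

    info.csr_number = (uint16_t)(return_value & 0xFFFu);
    info.csr_width = (uint8_t)((return_value >> 12) & 0xFFu);
    info.is_software = (info.csr_number == SBI_PMU_CSR_NUMBER_SW);
    return info;
}
```

A return value of 0x40C00 would decode to CSR number 0xC00 (the CYCLE
CSR) with a width of 64 bits.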

3. SBI_PMU_COUNTER_SET_EVENT
This call takes three parameters:
1) counter_idx
2) event_idx
3) event_info
It will select an event to be monitored by given counter. If this
SBI call is not used for a counter to select an event then the
counter will monitor default event selected for it at boot-time.
This SBI call will fail for counters which are not present. It will
also fail if specified event_idx + event_info combination is not
supported by given counter.

It also seems to fail if the specified event is not supported by the given
counter, right? Then the Linux driver could try to allocate the next free
counter when this SBI call returns failure.

Yes, this call will fail if event_idx + event_info combination is not supported
by given counter_idx. It is expected that Linux driver will try another
free counter if SBI_PMU_COUNTER_SET_EVENT call fails. I have suggested a
few ideas on how to reduce SBI_PMU_COUNTER_SET_EVENT calls by
looking at CSR number assigned to counter.

Could it put the bitmap of counters which support the given event into
ret.value ($a1) if it fails for the given counter? Then S-mode software can
conveniently find which counter to try next for the event. But there is a
constraint: the maximum number of counters would have to be assumed to be
less than XLEN. Do you think it is feasible?



Apart from this question above, this version of the proposal is great to me.
Cool

Regards,
Anup


Thanks,
Zong


4. SBI_PMU_COUNTER_SET_PHYS_ADDR
This call takes two parameters:
1) counter_idx
2) 8-byte aligned physical address
It will set the physical address of memory location where the SBI
implementation will write the 64-bit SOFTWARE counter. This SBI call
is only for counters not mapped to any CSR (i.e. only for counters
with CSR_Number == 0xfff).

5. SBI_PMU_COUNTER_START
This call takes two parameters:
1) counter_idx
2) initial_value
It will inform SBI implementation to start/enable specified counter
with specified initial value. This SBI call will fail for counters
which are not present.

6. SBI_PMU_COUNTER_STOP
This call takes one parameter:
1) counter_idx
It will inform SBI implementation to stop/disable specified counter
on the calling HART. This SBI call will fail for counters which are
not present.

The M-mode runtime firmware (OpenSBI) Development Notes:

1. The M-mode runtime firmware will have to translate SBI PMU
event_idx and event_info into platform dependent MHPMEVENT CSR
value before starting/enabling a HARDWARE counter.

2. The M-mode runtime firmware (OpenSBI) will need to know following
platform dependent information:
A) Possible event_idx values allowed (or supported) by a HARDWARE
counter (i.e. HPMCOUNTER)
B) Mapping of event_idx for HARDWARE/CACHE event to MHPMEVENT CSR
value. This is optional for platform. By default, OpenSBI will
write a value <xyz> to MHPMEVENT CSR where lower 20 bits of <xyz>
are event_idx and upper XLEN-20 bits of <xyz> are lower XLEN-20
bits of event_info
C) Additional platform-specific programming required for selecting
event_idx + event_info combination. This is also optional for
platform.

3. All platform dependent information mentioned above, can be obtained
by M-mode runtime firmware (OpenSBI) from platform specific code.
The DT/ACPI can also be used to describe 2.A and 2.B mentioned above
but 2.C will always require platform specific code.
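
The default MHPMEVENT value described in 2.B could be computed roughly as
below. This is only a sketch that assumes XLEN == 64; the function name
is hypothetical and a real platform may override this mapping entirely.

```c
#include <stdint.h>

/*
 * Hypothetical helper, assuming XLEN == 64: the lower 20 bits of the
 * result are event_idx and the upper 44 bits are the lower 44 bits of
 * event_info, as described for the default mapping.
 */
static inline uint64_t sbi_pmu_default_mhpmevent(uint32_t event_idx,
                                                 uint64_t event_info)
{
    uint64_t low = (uint64_t)(event_idx & 0xFFFFFu);
    uint64_t high = event_info << 20; /* bits of event_info above bit 43 are dropped */

    return high | low;
}
```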

Linux RISC-V PMU Driver Development Notes:

1. Driver probe
The Linux RISC-V driver can be a platform driver with "riscv,sbi-pmu"
as DT compatible string and optional "interrupts" DT property. The
"interrupts" DT property if available should specify an edge-triggered
overflow interrupt for each HART. When "interrupts" DT property is
present, we might also need another DT property for mapping HARTID
to entries in "interrupts" DT property. The platform driver probe
will:
A) Need to ensure that underlying SBI implementation provides
SBI PMU extension using sbi_probe_extension() API of arch/riscv.
B) Detect number of counters using SBI_PMU_NUM_COUNTERS call
C) Get CSR details of each counter using SBI_PMU_COUNTER_GET_CSR
call. If the counter is a SOFTWARE counter then use the
SBI_PMU_COUNTER_SET_PHYS_ADDR call to set memory location
of counter. The driver can skip this in driver probe and instead
do this lazily in add() callback mentioned below.

2. event_init() callback
The event_init() callback will primarily translate user-space
perf_event_attr to SBI PMU event_idx and event_info. It can do
this in following way:
A) perf_event_attr.type == PERF_TYPE_HARDWARE
event_idx.type = 0x0
event_idx.code = Value from enum sbi_pmu_hw_id based on
perf_event_attr.config
event_info = 0
B) perf_event_attr.type == PERF_TYPE_HW_CACHE
event_idx.type = 0x1
event_idx.code.cache_id = Value from enum sbi_pmu_hw_cache_id
based on perf_event_attr.config
event_idx.code.op_id = Value from enum sbi_pmu_hw_cache_op_id
based on perf_event_attr.config
event_idx.code.result_id = Value from enum sbi_pmu_hw_cache_op_result_id
based on perf_event_attr.config
event_info = 0
C) perf_event_attr.type == PERF_TYPE_RAW and
perf_event_attr.config[63:63] == 0
event_idx.type = 0x2
event_idx.code = 0x0
event_info = perf_event_attr.config[62:0]
D) perf_event_attr.type == PERF_TYPE_RAW and
perf_event_attr.config[63:63] == 1
event_idx.type = 0xf
event_idx.code = Value from enum sbi_pmu_sw_id based on
perf_event_attr.config
event_info = 0
(Note: event_init() will fail if it is not able to figure out
event_idx and event_info value corresponding to perf_event_attr)
(Note: event_init() will not assign counter to perf_event because
it will be done by the add() callback)
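
A rough, self-contained sketch of the translation in cases A) to D). The
DEMO_* constants mirror enum perf_type_id from the Linux perf_event.h
header, and the PERF_TYPE_HW_CACHE config layout (cache id in bits 0-7,
op in bits 8-15, result in bits 16-23) is the one documented there. The
function and struct names are hypothetical, and taking the SOFTWARE
event code from the low 16 bits of config is an assumption, since the
proposal does not spell that detail out.

```c
#include <stdint.h>

/* Values mirror enum perf_type_id in the Linux perf_event.h UAPI header. */
enum {
    DEMO_PERF_TYPE_HARDWARE = 0,
    DEMO_PERF_TYPE_HW_CACHE = 3,
    DEMO_PERF_TYPE_RAW      = 4,
};

/* Hypothetical result of the translation. */
struct demo_sbi_event {
    uint32_t event_idx;
    uint64_t event_info;
};

/* Returns 0 on success, -1 if the attribute cannot be translated. */
static int demo_event_init(uint32_t type, uint64_t config,
                           struct demo_sbi_event *out)
{
    switch (type) {
    case DEMO_PERF_TYPE_HARDWARE:
        if (config >= 10)                 /* SBI_PMU_HW_MAX */
            return -1;
        out->event_idx = (0x0u << 16) | (uint32_t)config;
        out->event_info = 0;
        return 0;
    case DEMO_PERF_TYPE_HW_CACHE: {
        uint32_t cache_id = (uint32_t)(config & 0xFF);
        uint32_t op_id = (uint32_t)((config >> 8) & 0xFF);
        uint32_t result_id = (uint32_t)((config >> 16) & 0xFF);

        if (cache_id >= 7 || op_id >= 3 || result_id >= 2)
            return -1;
        out->event_idx = (0x1u << 16) | (cache_id << 3) |
                         (op_id << 1) | result_id;
        out->event_info = 0;
        return 0;
    }
    case DEMO_PERF_TYPE_RAW:
        if (config >> 63) {
            /* Assumption: SOFTWARE event code is in config[15:0]. */
            uint32_t code = (uint32_t)(config & 0xFFFF);

            if (code >= 12)               /* SBI_PMU_SW_MAX */
                return -1;
            out->event_idx = (0xFu << 16) | code;
            out->event_info = 0;
        } else {
            out->event_idx = 0x2u << 16;
            out->event_info = config & ~(1ULL << 63); /* config[62:0] */
        }
        return 0;
    default:
        return -1;
    }
}
```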

3. add() callback
The add() callback of Linux RISC-V PMU driver will find a
free counter on current CPU/HART such that the perf_event
event_idx + event_info combination is supported by the counter.
To check-and-set event_idx + event_info combination for a
counter, we will use the SBI_PMU_COUNTER_SET_EVENT call.
The counter allocation and SBI_PMU_COUNTER_SET_EVENT call
can be further optimized by looking at CSR details.
For example:
A) For event_idx.type == 0 and
event_idx.code == SBI_PMU_HW_CPU_CYCLES, we should
prefer counter mapping to CYCLE CSR and skip doing
SBI_PMU_COUNTER_SET_EVENT call.
B) For event_idx.type == 0 and
event_idx.code == SBI_PMU_HW_INSTRUCTIONS, we should
prefer counter mapping to INSTRET CSR and skip doing
SBI_PMU_COUNTER_SET_EVENT call.
C) For event_idx.type == 0xf, only prefer counters mapping
to 0xfff CSR (i.e. SOFTWARE counters).
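
The allocation strategy above might look roughly like this. Everything
here is a sketch: the counter table, the function names and the mock
standing in for the SBI_PMU_COUNTER_SET_EVENT ecall are hypothetical.
0xC00 and 0xC02 are the CYCLE and INSTRET CSR numbers.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CSR_CYCLE   0xC00u
#define DEMO_CSR_INSTRET 0xC02u
#define DEMO_CSR_SW      0xFFFu

/* Hypothetical per-HART view of a counter, filled from GET_CSR at probe. */
struct demo_counter {
    uint16_t csr_number;
    bool in_use;
};

/* Stand-in for the SBI_PMU_COUNTER_SET_EVENT ecall; returns 0 on success. */
typedef int (*demo_set_event_fn)(unsigned int counter_idx,
                                 uint32_t event_idx, uint64_t event_info);

/* Mock for experimentation: pretend only counter 3 accepts the event. */
static int demo_set_event_mock(unsigned int counter_idx,
                               uint32_t event_idx, uint64_t event_info)
{
    (void)event_idx;
    (void)event_info;
    return counter_idx == 3 ? 0 : -1;
}

/* Returns the allocated counter index, or -1 if no counter fits. */
static int demo_add(struct demo_counter *ctrs, unsigned int n,
                    uint32_t event_idx, uint64_t event_info,
                    demo_set_event_fn set_event)
{
    uint32_t type = (event_idx >> 16) & 0xFu;
    uint32_t code = event_idx & 0xFFFFu;
    unsigned int i;

    /* Cases A and B: prefer CYCLE/INSTRET and skip the SBI call. */
    if (type == 0 && (code == 0 || code == 1)) {
        uint16_t want = (code == 0) ? DEMO_CSR_CYCLE : DEMO_CSR_INSTRET;

        for (i = 0; i < n; i++) {
            if (!ctrs[i].in_use && ctrs[i].csr_number == want) {
                ctrs[i].in_use = true;
                return (int)i;
            }
        }
    }

    for (i = 0; i < n; i++) {
        if (ctrs[i].in_use)
            continue;
        /* Case C: SOFTWARE events only go to SOFTWARE counters. */
        if (type == 0xFu && ctrs[i].csr_number != DEMO_CSR_SW)
            continue;
        if (set_event(i, event_idx, event_info) == 0) {
            ctrs[i].in_use = true;
            return (int)i;
        }
    }
    return -1;
}
```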

4. del() callback
The del() callback of Linux RISC-V PMU driver will release
or free the counter.

5. start() callback
The start() callback of Linux RISC-V PMU driver will start
the counter using the SBI_PMU_COUNTER_START call.

6. stop() callback
The stop() callback of Linux RISC-V PMU driver will stop
the counter using the SBI_PMU_COUNTER_STOP call.

Regards,
Anup


Anup Patel
 

-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 15 July 2020 07:40
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Atish Patra
<Atish.Patra@...>; andrew@...; gfavor@...
Subject: Re: Proposal v3: SBI PMU Extension

On Tue, Jul 14, 2020 at 11:40 AM Anup Patel <Anup.Patel@...> wrote:



-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 14 July 2020 09:02
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Atish Patra
<Atish.Patra@...>; andrew@...;
gfavor@...
Subject: Re: Proposal v3: SBI PMU Extension

On Mon, Jul 13, 2020 at 9:12 PM Anup Patel <Anup.Patel@...>
wrote:

Yes, this is feasible and can further reduce SBI calls but if we go this
route then SBI_PMU_COUNTER_SET_EVENT name is not appropriate.

How about this ??

3. SBI_PMU_COUNTER_CONFIG_MATCHING
This call takes four parameters:
1) counter_idx_base
2) counter_idx_mask
3) event_idx
4) event_info
It will find and configure a counter from a set of counters which can
monitor specified event. The counter_idx_base and counter_idx_mask
parameters represent the set of counters whereas the event_idx and
event_info represent the event to monitor. Upon success the SBI call
will return the counter_idx of the counter which has been configured
to monitor specified event. This SBI call will fail if it is unable to find
a counter which can monitor specified event. It will also fail if the set of
counters specified via counter_idx_base and counter_idx_mask
has an invalid counter.
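
To make the base + mask idea concrete, here is a sketch of how an SBI
implementation might search the requested set. It assumes bit N of
counter_idx_mask selects counter counter_idx_base + N, which the text
above suggests but does not state outright. The counter count, function
names and the "supports" mock are hypothetical, and the actual
configuration of the chosen counter is left out.

```c
#include <stdint.h>

#define DEMO_NUM_COUNTERS 8UL /* hypothetical number of counters */

/* Stand-in for the platform check "can this counter monitor the event?". */
typedef int (*demo_supports_fn)(unsigned long counter_idx,
                                uint32_t event_idx, uint64_t event_info);

/* Mock for experimentation: pretend counters 3 and 5 support the event. */
static int demo_supports_mock(unsigned long counter_idx,
                              uint32_t event_idx, uint64_t event_info)
{
    (void)event_idx;
    (void)event_info;
    return counter_idx == 3 || counter_idx == 5;
}

/* Returns the matching counter_idx, or -1 on failure. */
static long demo_config_matching(unsigned long base, unsigned long mask,
                                 uint32_t event_idx, uint64_t event_info,
                                 demo_supports_fn supports)
{
    const unsigned int nbits = 8 * sizeof(unsigned long);
    unsigned int bit;

    /* Fail if the set names a counter that does not exist. */
    for (bit = 0; bit < nbits; bit++) {
        if (!(mask & (1UL << bit)))
            continue;
        if (base >= DEMO_NUM_COUNTERS || bit >= DEMO_NUM_COUNTERS - base)
            return -1;
    }

    /* Pick the first counter in the set that can monitor the event. */
    for (bit = 0; bit < nbits; bit++) {
        if ((mask & (1UL << bit)) &&
            supports(base + bit, event_idx, event_info))
            return (long)(base + bit);
    }
    return -1;
}
```

With this shape, the caller needs a single SBI call per event instead of
probing counters one at a time.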






Greg Favor
 

On Tue, Jul 14, 2020 at 9:32 PM Anup Patel <Anup.Patel@...> wrote:
Yes, this is feasible and can further reduce SBI calls but if we go this
route then SBI_PMU_COUNTER_SET_EVENT name is not appropriate.

Why not have SBI_PMU_COUNTER_SET_EVENT return a 32b hpmcounter bit mask when it fails (that - as Zong suggested - identifies hardware counters that do support the requested event)?  Then SBI_PMU_COUNTER_SET_EVENT doesn't need to change.  The caller can simply call it again with a different counter_idx that it knows should succeed.

Or have SBI_PMU_COUNTER_SET_EVENT take a boolean argument indicating whether to try and set up just the specified counter, or to examine all counters and try to pick one that supports the requested event?  Then no bit mask ever needs to be returned.

Also, in any case, I assume the SBI routine needs to take the 'mcounteren' CSR into account and only set up a counter that mcounteren makes available to lower privilege modes?  (And any returned bit mask would also reflect only counters that have their mcounteren bits set to '1'.)

Greg
 


Greg Favor
 

Anup,

What is the plan with regards to scounteren and hcounteren?  Is the caller (whether an OS or a hypervisor) supposed to take into account the relevant *counteren CSR's when specifying counter_idx in the call to SBI_PMU_COUNTER_SET_EVENT?  And the M-mode SBI_PMU_COUNTER_SET_EVENT routine only worries about mcounteren?

It seems like this is necessary since the M-mode SBI_PMU_COUNTER_SET_EVENT routine won't know which privilege mode was the original requester (and hence which other *counteren CSR's are relevant).

Greg




Zong Li
 

On Wed, Jul 15, 2020 at 12:32 PM Anup Patel <Anup.Patel@...> wrote:



-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 15 July 2020 07:40
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Atish Patra
<Atish.Patra@...>; andrew@...; gfavor@...
Subject: Re: Proposal v3: SBI PMU Extension

On Tue, Jul 14, 2020 at 11:40 AM Anup Patel <Anup.Patel@...> wrote:



-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 14 July 2020 09:02
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Atish Patra
<Atish.Patra@...>; andrew@...;
gfavor@...
Subject: Re: Proposal v3: SBI PMU Extension

On Mon, Jul 13, 2020 at 9:12 PM Anup Patel <Anup.Patel@...>
wrote:

Hi All,

We don't have a dedicated RISC-V PMU extension but we do have
HARDWARE
performance counters such as CYCLE CSR, INSTRET CSR, and
HPMCOUNTER CSRs. A RISC-V implementation can support monitoring
various HARDWARE events using limited number of HPMCOUNTER
CSRs.

In addition to HARDWARE performance counters, a SBI implementation
(e.g. OpenSBI, Xvisor, KVM, etc) can provide SOFTWARE counters for
events such as number of RFENCEs, number of IPIs, number of
misaligned load/store instructions, number of illegal instructions, etc.

We propose SBI PMU extension, which will help S-mode (or VS-mode)
software to discover and configure HARDWARE/SOFTWARE counters.
The
SBI
PMU extension will only manage per-HART (or per-CPU)
HARDWARE/SOFTWARE
counters which include CYCLE CSR, INSTRET CSR, HPMCOUNTER CSRs
and
SOFTWARE counters provided by SBI implementation.

Using SBI PMU extension, a SBI implementation (OpenSBI, KVM, or
Xvisor) will provide a standardized view of HARDWARE/SOFTWARE
counters
and events to S-mode (or VS-mode) software.

To define SBI PMU extension, we first define counter_idx which is
a logical number assigned to a counter and event_idx which is an
encoded number representing the HARDWARE/SOFTWARE event to be
monitored.
A
HARDWARE/SOFTWARE event can also have additional
configuration/details
referred to as event_info.

The SBI PMU event_idx is a 20bits wide number encoded as follows:
event_idx[19:16] = type
event_idx[15:0] = code

If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE
event, the event_info is not required whereas the event_idx.code
can be one of the following values:
enum sbi_pmu_hw_id {
SBI_PMU_HW_CPU_CYCLES = 0,
SBI_PMU_HW_INSTRUCTIONS = 1,
SBI_PMU_HW_CACHE_REFERENCES = 2,
SBI_PMU_HW_CACHE_MISSES = 3,
SBI_PMU_HW_BRANCH_INSTRUCTIONS = 4,
SBI_PMU_HW_BRANCH_MISSES = 5,
SBI_PMU_HW_BUS_CYCLES = 6,
SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
SBI_PMU_HW_STALLED_CYCLES_BACKEND = 8,
SBI_PMU_HW_REF_CPU_CYCLES = 9,
SBI_PMU_HW_MAX, /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 0x1 then it is HARDWARE CACHE event. For
HARDWARE
CACHE event, the event_info is not required whereas the
event_idx.code is encoded as follows:
event_idx.code[15:3] = cache_id
event_idx.code[2:1] = op_id
event_idx.code[0:0] = result_id
enum sbi_pmu_hw_cache_id {
SBI_PMU_HW_CACHE_L1D = 0,
SBI_PMU_HW_CACHE_L1I = 1,
SBI_PMU_HW_CACHE_LL = 2,
SBI_PMU_HW_CACHE_DTLB = 3,
SBI_PMU_HW_CACHE_ITLB = 4,
SBI_PMU_HW_CACHE_BPU = 5,
SBI_PMU_HW_CACHE_NODE = 6,
SBI_PMU_HW_CACHE_MAX, /* non-ABI */ }; enum
sbi_pmu_hw_cache_op_id
{
SBI_PMU_HW_CACHE_OP_READ = 0,
SBI_PMU_HW_CACHE_OP_WRITE = 1,
SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
SBI_PMU_HW_CACHE_OP_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_result_id {
SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
SBI_PMU_HW_CACHE_RESULT_MISS = 1,
SBI_PMU_HW_CACHE_RESULT_MAX, /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 0x2 then it is HARDWARE RAW event. For
HARDWARE RAW event, the event_idx.code should be zero and the
event_info parameter passed to SBI_PMU_COUNTER_SET_EVENT call
(described
below)
will have the RAW event value to be programmed in MHPMEVENT CSR
(i.e.
the SBI implementation will not derive MHPMEVENT CSR value from
event_idx + event_info).

If event_idx.type == 0xf then it is SOFTWARE event. For SOFTWARE
event, the event_info is not required whereas the event_idx.code
can be one of the following:
enum sbi_pmu_sw_id {
SBI_PMU_SW_MISALIGNED_LOAD = 0,
SBI_PMU_SW_MISALIGNED_STORE = 1,
SBI_PMU_SW_ILLEGAL_INSN = 2,
SBI_PMU_SW_LOCAL_SET_TIMER = 3,
SBI_PMU_SW_LOCAL_IPI = 4,
SBI_PMU_SW_LOCAL_FENCE_I = 5,
SBI_PMU_SW_LOCAL_SFENCE_VMA = 6,
SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID = 7,
SBI_PMU_SW_LOCAL_HFENCE_GVMA = 8,
SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
SBI_PMU_SW_LOCAL_HFENCE_VVMA = 10,
SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
SBI_PMU_SW_MAX, /* non-ABI */
};

In future, more events can be defined without breaking SBI call
compatibility of SBI calls.

Using definition of counter_idx and event_idx, we can potentially
have the following SBI calls:

1. SBI_PMU_NUM_COUNTERS
This call will return the number of COUNTERs

2. SBI_PMU_COUNTER_GET_CSR
This call takes one parameter:
1) counter_idx
It will provide the CSR_Number and CSR_Width of underlying counter.
The value returned by SBI call is encoded as follows:
return_value[11:0] = CSR_Number
return_value[19:12] = CSR_Width (Number of bits
implemented in
HW)
If CSR_Number == 0xfff then it is SOFTWARE counter otherwise it is
HARDWARE counter. This SBI call will fail for counters which are not
present.

3. SBI_PMU_COUNTER_SET_EVENT
This call takes three parameters:
1) counter_idx
2) event_idx
3) event_info
It will select an event to be monitored by given counter. If this
SBI call is not used for a counter to select an event then the
counter will monitor default event selected for it at boot-time.
This SBI call will fail for counters which are not present. It will
also fail if specified event_idx + event_info combination is not
supported by given counter.
It also seems to fail if the specified event is not supported by the
given counter, right? Then the Linux driver could try to allocate the
next free counter when this SBI call returns failure.
Yes, this call will fail if event_idx + event_info combination is not
supported by given counter_idx. It is expected that Linux driver will
try another free counter if SBI_PMU_COUNTER_SET_EVENT call fails. I
have suggested a few ideas on how to reduce SBI_PMU_COUNTER_SET_EVENT
calls by looking at CSR number assigned to counter.
Could it put the bitmap of counters which support the given event into
ret.value ($a1) if it fails for the given counter? Then S-mode software can
conveniently find which counter to try next for the event. But there
is a constraint: the maximum number of counters needs to be assumed to be
less than XLEN. Do you think it is feasible?
Yes, this is feasible and can further reduce SBI calls but if we go this
route then SBI_PMU_COUNTER_SET_EVENT name is not appropriate.

How about this ??

3. SBI_PMU_COUNTER_CONFIG_MATCHING
This call takes four parameters:
1) counter_idx_base
2) counter_idx_mask
3) event_idx
4) event_info
It will find and configure a counter from a set of counters which can
monitor specified event. The counter_idx_base and counter_idx_mask
parameters represent the set of counters whereas the event_idx and
event_info represent the event to monitor. Upon success the SBI call
will return the counter_idx of the counter which has been configured
to monitor specified event. This SBI call will fail if it is unable to find
a counter which can monitor specified event. It will also fail if the set of
counters specified via counter_idx_base and counter_idx_mask
has an invalid counter.
It looks like the m-mode firmware would be responsible for selecting
a suitable counter and taking the allocation work for s-mode software?
Could you elaborate or give an example to show how counter_idx_base
and counter_idx_mask represent the set of counters? It seems like
the set of counters can be represented by one parameter, where each bit
corresponds to one counter_idx.




Apart from this question above, this version of the proposal is great to
me.

Cool

Regards,
Anup


Thanks,
Zong


4. SBI_PMU_COUNTER_SET_PHYS_ADDR
This call takes two parameters:
1) counter_idx
2) 8byte aligned physical address
It will set the physical address of memory location where the SBI
implementation will write the 64bit SOFTWARE counter. This SBI call
is only for counters not mapped to any CSR (i.e. only for counters
with CSR_Number == 0xfff).

5. SBI_PMU_COUNTER_START
This call takes two parameters:
1) counter_idx
2) initial_value
It will inform SBI implementation to start/enable specified counter
with specified initial value. This SBI call will fail for counters
which are not present.

6. SBI_PMU_COUNTER_STOP
This call takes one parameter:
1) counter_idx
It will inform SBI implementation to stop/disable specified counter
on the calling HART. This SBI call will fail for counters which are
not present.
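
To make the SBI_PMU_COUNTER_GET_CSR return encoding above concrete, here is a minimal C sketch of how S-mode software might decode it. The struct and function names are hypothetical; only the bit layout and the 0xfff marker for SOFTWARE counters come from the proposal.

```c
#include <stdbool.h>
#include <stdint.h>

/* CSR_Number value that marks a SOFTWARE counter in the proposal. */
#define SBI_PMU_CSR_SOFTWARE 0xfffu

/* Hypothetical container for the decoded return value. */
struct sbi_pmu_csr_info {
    uint32_t csr_number; /* return_value[11:0] */
    uint32_t csr_width;  /* return_value[19:12], bits implemented in HW */
};

/* Split the value returned by SBI_PMU_COUNTER_GET_CSR into its fields. */
static inline struct sbi_pmu_csr_info sbi_pmu_decode_csr(unsigned long ret)
{
    struct sbi_pmu_csr_info info;

    info.csr_number = (uint32_t)(ret & 0xfffu);
    info.csr_width = (uint32_t)((ret >> 12) & 0xffu);
    return info;
}

/* A counter is a SOFTWARE counter when it is not backed by a real CSR. */
static inline bool sbi_pmu_is_sw_counter(struct sbi_pmu_csr_info info)
{
    return info.csr_number == SBI_PMU_CSR_SOFTWARE;
}
```

For instance, a return value of 0x40C00 would decode to CSR number 0xC00 (the CYCLE CSR) with a width of 64 bits.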

The M-mode runtime firmware (OpenSBI) Development Notes:

1. The M-mode runtime firmware will have to translate SBI PMU
event_idx and event_info into platform dependent MHPMEVENT CSR
value before starting/enabling a HARDWARE counter.

2. The M-mode runtime firmware (OpenSBI) will need to know following
platform dependent information:
A) Possible event_idx values allowed (or supported) by a HARDWARE
counter (i.e. HPMCOUNTER)
B) Mapping of event_idx for HARDWARE/CACHE event to MHPMEVENT CSR
value. This is optional for platform. By default, OpenSBI will
write a value <xyz> to MHPMEVENT CSR where lower 20bits of <xyz>
are event_idx and upper XLEN-20 bits of <xyz> are lower XLEN-20
bits of event_info
C) Additional platform-specific programming required for selecting
event_idx + event_info combination. This is also optional for
platform.

3. All platform dependent information mentioned above can be obtained
by M-mode runtime firmware (OpenSBI) from platform specific code.
The DT/ACPI can also be used to describe 2.A and 2.B mentioned above
but 2.C will always require platform specific code.
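
The default MHPMEVENT value described in 2.B can be sketched in C as follows. This is only an illustration of the bit layout stated above; the function name and the explicit xlen parameter are assumptions made so the sketch covers both RV32 and RV64 on any host.

```c
#include <stdint.h>

/*
 * Hypothetical helper: build the default <xyz> value for MHPMEVENT.
 * Lower 20 bits are event_idx, upper XLEN-20 bits are the lower
 * XLEN-20 bits of event_info. xlen is expected to be 32 or 64.
 */
static inline uint64_t sbi_pmu_default_mhpmevent(uint32_t event_idx,
                                                 uint64_t event_info,
                                                 unsigned int xlen)
{
    uint64_t value = (event_info << 20) | (event_idx & 0xfffffu);

    /* Drop everything above XLEN bits when emulating a narrower CSR. */
    if (xlen < 64)
        value &= (UINT64_C(1) << xlen) - 1;
    return value;
}
```

A platform that supplies its own mapping (2.B) or needs extra programming (2.C) would replace this default rather than build on it.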

Linux RISC-V PMU Driver Development Notes:

1. Driver probe
The Linux RISC-V driver can be platform driver with "riscv,sbi-pmu"
as DT compatible string and optional "interrupts" DT property. The
"interrupts" DT property if available should specify an edge-triggered
overflow interrupt for each HART. When "interrupts" DT property is
present, we might also need another DT property for mapping HARTID
to entries in "interrupts" DT property. The platform driver probe
will:
A) Need to ensure that underlying SBI implementation provides
SBI PMU extension using sbi_probe_extension() API of arch/riscv.
B) Detect number of counters using SBI_PMU_NUM_COUNTERS call
C) Get CSR details of each counter using SBI_PMU_COUNTER_GET_CSR
call. If the counter is a SOFTWARE counter then use the
SBI_PMU_COUNTER_SET_PHYS_ADDR call to set memory location
of counter. The driver can skip this in driver probe and instead
do this lazily in add() callback mentioned below.

2. event_init() callback
The event_init() callback will primarily translate user-space
perf_event_attr to SBI PMU event_idx and event_info. It can do
this in following way:
A) perf_event_attr.type == PERF_TYPE_HARDWARE
event_idx.type = 0x0
event_idx.code = Value from enum sbi_pmu_hw_id based on
perf_event_attr.config
event_info = 0
B) perf_event_attr.type == PERF_TYPE_HW_CACHE
event_idx.type = 0x1
event_idx.code.cache_id = Value from enum sbi_pmu_hw_cache_id
based on perf_event_attr.config
event_idx.code.op_id = Value from enum sbi_pmu_hw_cache_op_id
based on perf_event_attr.config
event_idx.code.result_id = Value from enum
sbi_pmu_hw_cache_op_result_id based on perf_event_attr.config
event_info = 0
C) perf_event_attr.type == PERF_TYPE_RAW and
perf_event_attr.config[63:63] == 0
event_idx.type = 0x2
event_idx.code = 0x0
event_info = perf_event_attr.config[62:0]
D) perf_event_attr.type == PERF_TYPE_RAW and
perf_event_attr.config[63:63] == 1
event_idx.type = 0xf
event_idx.code = Value from enum sbi_pmu_sw_id based on
perf_event_attr.config
event_info = 0
(Note: event_init() will fail if it is not able to figure out
event_idx and event_info value corresponding to perf_event_attr)
(Note: event_init() will not assign counter to perf_event because
it will be done by event_add())

3. add() callback
The add() callback of Linux RISC-V PMU driver will find a
free counter on current CPU/HART such that the perf_event
event_idx + event_info combination is supported by the counter.
To check-and-set event_idx + event_info combination for a
counter, we will use the SBI_PMU_COUNTER_SET_EVENT call.
The counter allocation and SBI_PMU_COUNTER_SET_EVENT call
can be further optimized by looking at CSR details.
For example:
A) For event_idx.type == 0 and
event_idx.code == SBI_PMU_HW_CPU_CYCLES, we should
prefer counter mapping to CYCLE CSR and skip doing
SBI_PMU_COUNTER_SET_EVENT call.
B) For event_idx.type == 0 and
event_idx.code == SBI_PMU_HW_INSTRUCTIONS, we should
prefer counter mapping to INSTRET CSR and skip doing
SBI_PMU_COUNTER_SET_EVENT call.
C) For event_idx.type == 0xf, only prefer counters mapping
to 0xfff CSR (i.e. SOFTWARE counters).

4. del() callback
The del() callback of Linux RISC-V PMU driver will release
or free the counter.

5. start() callback
The start() callback of Linux RISC-V PMU driver will start
the counter using the SBI_PMU_COUNTER_START call.

6. stop() callback
The stop() callback of Linux RISC-V PMU driver will stop
the counter using the SBI_PMU_COUNTER_STOP call.
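
The translation done by the event_init() callback (item 2 above) could look roughly like the C sketch below. It is not driver code: the struct and function names are hypothetical, the PERF_TYPE_* values and the PERF_TYPE_HW_CACHE config layout (cache_id, op_id << 8, result_id << 16) are copied from Linux perf_event.h, and taking the SOFTWARE event code from the low bits of config is an assumption, since the proposal does not spell that detail out.

```c
#include <stdint.h>

/* Values as defined in Linux include/uapi/linux/perf_event.h. */
#define PERF_TYPE_HARDWARE 0u
#define PERF_TYPE_HW_CACHE 3u
#define PERF_TYPE_RAW      4u

/* Hypothetical result of the translation. */
struct sbi_pmu_event {
    uint32_t event_idx;  /* type in [19:16], code in [15:0] */
    uint64_t event_info;
};

/* Returns 0 on success, -1 if the attribute cannot be translated. */
static int sbi_pmu_translate(uint32_t type, uint64_t config,
                             struct sbi_pmu_event *ev)
{
    ev->event_info = 0;

    switch (type) {
    case PERF_TYPE_HARDWARE:
        if (config > 9) /* SBI_PMU_HW_REF_CPU_CYCLES is the last code */
            return -1;
        ev->event_idx = (0x0u << 16) | (uint32_t)config;
        return 0;
    case PERF_TYPE_HW_CACHE: {
        uint32_t cache_id = (uint32_t)(config & 0xff);
        uint32_t op_id = (uint32_t)((config >> 8) & 0xff);
        uint32_t result_id = (uint32_t)((config >> 16) & 0xff);

        if ((config >> 24) || cache_id > 6 || op_id > 2 || result_id > 1)
            return -1;
        ev->event_idx = (0x1u << 16) | (cache_id << 3) | (op_id << 1) |
                        result_id;
        return 0;
    }
    case PERF_TYPE_RAW:
        if (config >> 63) {
            /* Assumption: SOFTWARE event code sits in the low bits. */
            uint64_t code = config & ~(UINT64_C(1) << 63);

            if (code > 11) /* SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID */
                return -1;
            ev->event_idx = (0xfu << 16) | (uint32_t)code;
        } else {
            ev->event_idx = 0x2u << 16;
            ev->event_info = config; /* config[62:0]; bit 63 is 0 here */
        }
        return 0;
    default:
        return -1;
    }
}
```

Counter assignment is deliberately left out of this sketch, since the notes above leave it to the add() callback.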

Regards,
Anup


Anup Patel
 

Hi Greg,

 

The SBI PMU extension provider for HS-mode is M-mode runtime firmware (OpenSBI) and for VS-mode the provider is HS-mode (Hypervisor).

 

We will enable HARDWARE counters in HCOUNTEREN CSR when hypervisor receives SBI_PMU_COUNTER_START call from VS-mode.

 

The standard interface between Linux user-space and Linux kernel-space is perf SYSCALLs/IOCTLs. Despite this, we can support apps that want direct HARDWARE CSR access by enabling HARDWARE counter in SCOUNTEREN CSR. The Linux RISC-V PMU driver can do this in the start() callback.

 

Regards,

Anup

 

From: tech-unixplatformspec@... <tech-unixplatformspec@...> On Behalf Of Greg Favor
Sent: 15 July 2020 11:08
To: Greg Favor <gfavor@...>
Cc: Anup Patel <Anup.Patel@...>; Zong Li <zong.li@...>; tech-unixplatformspec@...; Atish Patra <Atish.Patra@...>; andrew@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v3: SBI PMU Extension

 

Anup,

 

What is the plan with regards to scounteren and hcounteren?  Is the caller (whether an OS or a hypervisor) supposed to take into account the relevant *counteren CSR's when specifying counter_idx in the call to SBI_PMU_COUNTER_SET_EVENT?  And the M-mode SBI_PMU_COUNTER_SET_EVENT routine only worries about mcounteren?

 

It seems like this is necessary since the M-mode SBI_PMU_COUNTER_SET_EVENT routine won't know which privilege mode was the original requester (and hence which other *counteren CSR's are relevant).

 

Greg

 

 

On Tue, Jul 14, 2020 at 10:31 PM Greg Favor via lists.riscv.org <gfavor=ventanamicro.com@...> wrote:

On Tue, Jul 14, 2020 at 9:32 PM Anup Patel <Anup.Patel@...> wrote:

Yes, this is feasible and can further reduce SBI calls but if we go this
route then SBI_PMU_COUNTER_SET_EVENT name is not appropriate.

 

Why not have SBI_PMU_COUNTER_SET_EVENT return a 32b hpmcounter bit mask when it fails (that - as Zong suggested - identifies hardware counters that do support the requested event)?  Then SBI_PMU_COUNTER_SET_EVENT doesn't need to change.  The caller can simply call it again with a different counter_idx that it knows should succeed.

 

Or have SBI_PMU_COUNTER_SET_EVENT take a boolean argument indicating whether to try and set up just the specified counter, or to examine all counters and try to pick one that supports the requested event?  Then no bit mask ever needs to be returned.

 

Also, in any case, I assume the SBI routine needs to take the 'mcounteren' CSR into account and only set up a counter that mcounteren makes available to lower privilege modes?  (And any returned bit mask would also reflect only counters that have their mcounteren bits set to '1'.)

 

Greg

 

How about this ??

3. SBI_PMU_COUNTER_CONFIG_MATCHING
   This call takes four parameters:
      1) counter_idx_base
      2) counter_idx_mask
      3) event_idx
      4) event_info
   It will find and configure a counter from a set of counters which can
   monitor specified event. The counter_idx_base and counter_idx_mask
   parameters represent the set of counters whereas the event_idx and
   event_info represent the event to monitor. Upon success the SBI call
   will return the counter_idx of the counter which has been configured
   to monitor specified event.  This SBI call will fail if it is unable to find
   a counter which can monitor specified event. It will also fail if the set of
   counters specified via counter_idx_base and counter_idx_mask
   has an invalid counter.


Anup Patel
 

-----Original Message-----
From: tech-unixplatformspec@... <tech-
unixplatformspec@...> On Behalf Of Zong Li
Sent: 15 July 2020 13:36
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Atish Patra
<Atish.Patra@...>; andrew@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v3: SBI PMU
Extension

On Wed, Jul 15, 2020 at 12:32 PM Anup Patel <Anup.Patel@...> wrote:



-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 15 July 2020 07:40
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Atish Patra
<Atish.Patra@...>; andrew@...;
gfavor@...
Subject: Re: Proposal v3: SBI PMU Extension

On Tue, Jul 14, 2020 at 11:40 AM Anup Patel <Anup.Patel@...>
wrote:



-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 14 July 2020 09:02
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Atish Patra
<Atish.Patra@...>; andrew@...;
gfavor@...
Subject: Re: Proposal v3: SBI PMU Extension

On Mon, Jul 13, 2020 at 9:12 PM Anup Patel <Anup.Patel@...>
wrote:

Hi All,

We don't have a dedicated RISC-V PMU extension but we do have HARDWARE
performance counters such as CYCLE CSR, INSTRET CSR, and HPMCOUNTER
CSRs. A RISC-V implementation can support monitoring various HARDWARE
events using limited number of HPMCOUNTER CSRs.

In addition to HARDWARE performance counters, a SBI implementation
(e.g. OpenSBI, Xvisor, KVM, etc) can provide SOFTWARE counters for
events such as number of RFENCEs, number of IPIs, number of misaligned
load/store instructions, number of illegal instructions, etc.

We propose SBI PMU extension, which will help S-mode (or VS-mode)
software to discover and configure HARDWARE/SOFTWARE counters. The SBI
PMU extension will only manage per-HART (or per-CPU) HARDWARE/SOFTWARE
counters which include CYCLE CSR, INSTRET CSR, HPMCOUNTER CSRs and
SOFTWARE counters provided by SBI implementation.

Using SBI PMU extension, a SBI implementation (OpenSBI, KVM, or Xvisor)
will provide a standardized view of HARDWARE/SOFTWARE counters and
events to S-mode (or VS-mode) software.

To define SBI PMU extension, we first define counter_idx which is a
logical number assigned to a counter and event_idx which is an encoded
number representing the HARDWARE/SOFTWARE event to be monitored. A
HARDWARE/SOFTWARE event can also have additional configuration/details
referred to as event_info.

The SBI PMU event_idx is a 20bits wide number encoded as follows:
event_idx[19:16] = type
event_idx[15:0] = code

If event_idx.type == 0x0 then it is HARDWARE event. For
HARDWARE event, the event_info is not required whereas the
event_idx.code can be one of the following values:
enum sbi_pmu_hw_id {
SBI_PMU_HW_CPU_CYCLES = 0,
SBI_PMU_HW_INSTRUCTIONS = 1,
SBI_PMU_HW_CACHE_REFERENCES = 2,
SBI_PMU_HW_CACHE_MISSES = 3,
SBI_PMU_HW_BRANCH_INSTRUCTIONS = 4,
SBI_PMU_HW_BRANCH_MISSES = 5,
SBI_PMU_HW_BUS_CYCLES = 6,
SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
SBI_PMU_HW_STALLED_CYCLES_BACKEND = 8,
SBI_PMU_HW_REF_CPU_CYCLES = 9,
SBI_PMU_HW_MAX, /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE
CACHE event, the event_info is not required whereas the
event_idx.code is encoded as follows:
event_idx.code[15:3] = cache_id
event_idx.code[2:1] = op_id
event_idx.code[0:0] = result_id
enum sbi_pmu_hw_cache_id {
SBI_PMU_HW_CACHE_L1D = 0,
SBI_PMU_HW_CACHE_L1I = 1,
SBI_PMU_HW_CACHE_LL = 2,
SBI_PMU_HW_CACHE_DTLB = 3,
SBI_PMU_HW_CACHE_ITLB = 4,
SBI_PMU_HW_CACHE_BPU = 5,
SBI_PMU_HW_CACHE_NODE = 6,
SBI_PMU_HW_CACHE_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_id {
SBI_PMU_HW_CACHE_OP_READ = 0,
SBI_PMU_HW_CACHE_OP_WRITE = 1,
SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
SBI_PMU_HW_CACHE_OP_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_result_id {
SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
SBI_PMU_HW_CACHE_RESULT_MISS = 1,
SBI_PMU_HW_CACHE_RESULT_MAX, /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 0x2 then it is HARDWARE RAW event. For
HARDWARE RAW event, the event_idx.code should be zero and the
event_info parameter passed to SBI_PMU_COUNTER_SET_EVENT call
(described below) will have the RAW event value to be programmed
in MHPMEVENT CSR (i.e. the SBI implementation will not derive
MHPMEVENT CSR value from event_idx + event_info).

If event_idx.type == 0xf then it is SOFTWARE event. For
SOFTWARE event, the event_info is not required whereas the
event_idx.code can be one of the following:
enum sbi_pmu_sw_id {
SBI_PMU_SW_MISALIGNED_LOAD = 0,
SBI_PMU_SW_MISALIGNED_STORE = 1,
SBI_PMU_SW_ILLEGAL_INSN = 2,
SBI_PMU_SW_LOCAL_SET_TIMER = 3,
SBI_PMU_SW_LOCAL_IPI = 4,
SBI_PMU_SW_LOCAL_FENCE_I = 5,
SBI_PMU_SW_LOCAL_SFENCE_VMA = 6,
SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID = 7,
SBI_PMU_SW_LOCAL_HFENCE_GVMA = 8,
SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
SBI_PMU_SW_LOCAL_HFENCE_VVMA = 10,
SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
SBI_PMU_SW_MAX, /* non-ABI */
};

In future, more events can be defined without breaking
compatibility of SBI calls.

Using definition of counter_idx and event_idx, we can
potentially have the following SBI calls:

1. SBI_PMU_NUM_COUNTERS
This call will return the number of COUNTERs

2. SBI_PMU_COUNTER_GET_CSR
This call takes one parameter:
1) counter_idx
It will provide the CSR_Number and CSR_Width of underlying counter.
The value returned by SBI call is encoded as follows:
return_value[11:0] = CSR_Number
return_value[19:12] = CSR_Width (Number of bits implemented in HW)
If CSR_Number == 0xfff then it is SOFTWARE counter otherwise it is
HARDWARE counter. This SBI call will fail for counters which are not
present.

3. SBI_PMU_COUNTER_SET_EVENT
This call takes three parameters:
1) counter_idx
2) event_idx
3) event_info
It will select an event to be monitored by given counter. If this
SBI call is not used for a counter to select an event then the
counter will monitor default event selected for it at boot-time.
This SBI call will fail for counters which are not present. It will
also fail if specified event_idx + event_info combination is not
supported by given counter.
It also seems to fail if the specified event is not supported by
the given counter, right? Then the Linux driver could try to
allocate the next free counter when this SBI call returns failure.

Yes, this call will fail if event_idx + event_info combination is
not supported by given counter_idx. It is expected that Linux
driver will try another free counter if SBI_PMU_COUNTER_SET_EVENT
call fails. I have suggested a few ideas on how to reduce
SBI_PMU_COUNTER_SET_EVENT calls by looking at CSR number assigned
to counter.
Could it put the bitmap of counters which support the given event
into ret.value ($a1) if it fails for the given counter? Then S-mode
software can conveniently find which counter to try next for
the event. But there is a constraint: the maximum number of
counters needs to be assumed to be less than XLEN. Do you think it is feasible?
Yes, this is feasible and can further reduce SBI calls but if we go
this route then SBI_PMU_COUNTER_SET_EVENT name is not appropriate.

How about this ??

3. SBI_PMU_COUNTER_CONFIG_MATCHING
This call takes four parameters:
1) counter_idx_base
2) counter_idx_mask
3) event_idx
4) event_info
It will find and configure a counter from a set of counters which can
monitor specified event. The counter_idx_base and counter_idx_mask
parameters represent the set of counters whereas the event_idx and
event_info represent the event to monitor. Upon success the SBI call
will return the counter_idx of the counter which has been configured
to monitor specified event. This SBI call will fail if it is unable to find
a counter which can monitor specified event. It will also fail if the set of
counters specified via counter_idx_base and counter_idx_mask
has an invalid counter.
It looks like the m-mode firmware would be responsible for selecting a
suitable counter and taking the allocation work for s-mode software?
The allocation work of counters will still be owned by S-mode software.

Only selecting a counter from a set of available counters based on event
to be monitored will be done by SBI_PMU_COUNTER_CONFIG_MATCHING.

In fact, SBI_PMU_COUNTER_CONFIG_MATCHING is equivalent to
SBI_PMU_COUNTER_SET_EVENT if we pass "counter_idx_mask = 0x1"
and "counter_idx_base = counter_idx"

Could you elaborate or give an example to show how counter_idx_base and
counter_idx_mask represent the set of counters? It seems like the set of
Let's say Linux RISC-V PMU driver is tracking free/available counters
for each HART using a per-HART bitmap. Now let's assume that on
HART X we have free counters 3,6,8,9,13,20, ... and some user-space
app creates perf_event on HART X.

In this case, with SBI_PMU_COUNTER_SET_EVENT call we will have to
try each available counter one-by-one to find counter that supports
required event_idx + event_info combination in add() callback of
Linux RISC-V PMU driver.

Instead of this, using SBI_PMU_COUNTER_CONFIG_MATCHING call
we let SBI implementation select a matching counter from a set of
available counters.
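
As a sketch of the selection step described above, the C code below shows one way an SBI implementation might pick a counter from the set given by counter_idx_base and counter_idx_mask, where bit i of the mask selects counter_idx_base + i. All names are hypothetical, the capability callback stands in for platform-specific knowledge, and actually programming the chosen counter is left out.

```c
#include <stdint.h>

/* Hypothetical callback: non-zero if the counter can monitor the event. */
typedef int (*sbi_pmu_supports_fn)(unsigned long counter_idx,
                                   uint32_t event_idx, uint64_t event_info);

/*
 * Return the first counter in {base + i | bit i of mask is set} that
 * supports the event, or -1 if none does or the set names a counter
 * that does not exist.
 */
static long sbi_pmu_config_matching(unsigned long base, unsigned long mask,
                                    unsigned long num_counters,
                                    uint32_t event_idx, uint64_t event_info,
                                    sbi_pmu_supports_fn supports)
{
    const unsigned long nbits = sizeof(unsigned long) * 8;
    unsigned long i;

    /* Reject the request if any selected counter is not present. */
    for (i = 0; i < nbits; i++) {
        if (!((mask >> i) & 1UL))
            continue;
        if (base >= num_counters || i >= num_counters - base)
            return -1;
    }

    /* Pick the lowest-numbered selected counter that fits the event. */
    for (i = 0; i < nbits; i++) {
        if (((mask >> i) & 1UL) && supports(base + i, event_idx, event_info))
            return (long)(base + i);
    }
    return -1;
}

/* Made-up capability table: only counters 9 and 20 count cache misses. */
static int example_supports(unsigned long counter_idx,
                            uint32_t event_idx, uint64_t event_info)
{
    (void)event_info;
    return event_idx == 0x00003u && (counter_idx == 9 || counter_idx == 20);
}
```

With the free counters 3,6,8,9,13,20 from the example above, the driver could pass counter_idx_base = 3 and counter_idx_mask = 0x20469. Passing counter_idx_mask = 0x1 reduces the call to checking a single counter, which matches the SBI_PMU_COUNTER_SET_EVENT behaviour.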

counters can be represented by one parameter, where each bit corresponds to
one counter_idx.
We have to consider both RV32 and RV64 here. On RV32, XLEN = 32 so
having only one parameter will limit us to 32 counters. This is too strict
for RV32 and we don't have any room for SOFTWARE counters.

Let's not limit number of counters by XLEN.

The counter_idx_base and counter_idx_mask approach is similar to
"hart_mask_base" and "hart_mask" parameters of SBI_SEND_IPI call.

Regards,
Anup





Apart from this question above, this version of the proposal is
great to
me.

Cool

Regards,
Anup


Thanks,
Zong


4. SBI_PMU_COUNTER_SET_PHYS_ADDR
This call takes two parameters:
1) counter_idx
2) 8byte aligned physical address
It will set the physical address of memory location where the SBI
implementation will write the 64bit SOFTWARE counter. This SBI call
is only for counters not mapped to any CSR (i.e. only for counters
with CSR_Number == 0xfff).

5. SBI_PMU_COUNTER_START
This call takes two parameters:
1) counter_idx
2) initial_value
It will inform SBI implementation to start/enable specified counter
with specified initial value. This SBI call will fail for counters
which are not present.

6. SBI_PMU_COUNTER_STOP
This call takes one parameter:
1) counter_idx
It will inform SBI implementation to stop/disable specified counter
on the calling HART. This SBI call will fail for counters which are
not present.

The M-mode runtime firmware (OpenSBI) Development Notes:

1. The M-mode runtime firmware will have to translate SBI PMU
event_idx and event_info into platform dependent MHPMEVENT CSR
value before starting/enabling a HARDWARE counter.

2. The M-mode runtime firmware (OpenSBI) will need to know following
platform dependent information:
A) Possible event_idx values allowed (or supported) by a HARDWARE
counter (i.e. HPMCOUNTER)
B) Mapping of event_idx for HARDWARE/CACHE event to MHPMEVENT CSR
value. This is optional for platform. By default, OpenSBI will
write a value <xyz> to MHPMEVENT CSR where lower 20bits of <xyz>
are event_idx and upper XLEN-20 bits of <xyz> are lower XLEN-20
bits of event_info
C) Additional platform-specific programming required for selecting
event_idx + event_info combination. This is also optional for
platform.

3. All platform dependent information mentioned above can be obtained
by M-mode runtime firmware (OpenSBI) from platform specific code.
The DT/ACPI can also be used to describe 2.A and 2.B mentioned above
but 2.C will always require platform specific code.

Linux RISC-V PMU Driver Development Notes:

1. Driver probe
The Linux RISC-V driver can be platform driver with "riscv,sbi-pmu"
as DT compatible string and optional "interrupts" DT property. The
"interrupts" DT property if available should specify an edge-triggered
overflow interrupt for each HART. When "interrupts" DT property is
present, we might also need another DT property for mapping HARTID
to entries in "interrupts" DT property. The platform driver probe
will:
A) Need to ensure that underlying SBI implementation provides
SBI PMU extension using sbi_probe_extension() API of arch/riscv.
B) Detect number of counters using SBI_PMU_NUM_COUNTERS call
C) Get CSR details of each counter using SBI_PMU_COUNTER_GET_CSR
call. If the counter is a SOFTWARE counter then use the
SBI_PMU_COUNTER_SET_PHYS_ADDR call to set memory location
of counter. The driver can skip this in driver probe and instead
do this lazily in add() callback mentioned below.

2. event_init() callback
The event_init() callback will primarily translate user-space
perf_event_attr to SBI PMU event_idx and event_info. It can do
this in following way:
A) perf_event_attr.type == PERF_TYPE_HARDWARE
event_idx.type = 0x0
event_idx.code = Value from enum sbi_pmu_hw_id based on
perf_event_attr.config
event_info = 0
B) perf_event_attr.type == PERF_TYPE_HW_CACHE
event_idx.type = 0x1
event_idx.code.cache_id = Value from enum sbi_pmu_hw_cache_id
based on perf_event_attr.config
event_idx.code.op_id = Value from enum sbi_pmu_hw_cache_op_id
based on perf_event_attr.config
event_idx.code.result_id = Value from enum
sbi_pmu_hw_cache_op_result_id based on perf_event_attr.config
event_info = 0
C) perf_event_attr.type == PERF_TYPE_RAW and
perf_event_attr.config[63:63] == 0
event_idx.type = 0x2
event_idx.code = 0x0
event_info = perf_event_attr.config[62:0]
D) perf_event_attr.type == PERF_TYPE_RAW and
perf_event_attr.config[63:63] == 1
event_idx.type = 0xf
event_idx.code = Value from enum sbi_pmu_sw_id based on
perf_event_attr.config
event_info = 0
(Note: event_init() will fail if it is not able to figure out
event_idx and event_info value corresponding to perf_event_attr)
(Note: event_init() will not assign counter to perf_event because
it will be done by event_add())

3. add() callback
The add() callback of Linux RISC-V PMU driver will find a
free counter on current CPU/HART such that the perf_event
event_idx + event_info combination is supported by the counter.
To check-and-set event_idx + event_info combination for a
counter, we will use the SBI_PMU_COUNTER_SET_EVENT call.
The counter allocation and SBI_PMU_COUNTER_SET_EVENT call
can be further optimized by looking at CSR details.
For example:
A) For event_idx.type == 0 and
event_idx.code == SBI_PMU_HW_CPU_CYCLES, we should
prefer counter mapping to CYCLE CSR and skip doing
SBI_PMU_COUNTER_SET_EVENT call.
B) For event_idx.type == 0 and
event_idx.code == SBI_PMU_HW_INSTRUCTIONS, we should
prefer counter mapping to INSTRET CSR and skip doing
SBI_PMU_COUNTER_SET_EVENT call.
C) For event_idx.type == 0xf, only prefer counters mapping
to 0xfff CSR (i.e. SOFTWARE counters).

4. del() callback
The del() callback of Linux RISC-V PMU driver will release
or free the counter.

5. start() callback
The start() callback of Linux RISC-V PMU driver will start
the counter using the SBI_PMU_COUNTER_START call.

6. stop() callback
The stop() callback of Linux RISC-V PMU driver will stop
the counter using the SBI_PMU_COUNTER_STOP call.

Regards,
Anup