
Anup Patel
Hi All,
We don't have a dedicated RISC-V PMU extension, but we do have HARDWARE performance counters such as the CYCLE CSR, INSTRET CSR, and HPMCOUNTER CSRs. A RISC-V implementation can support monitoring various HARDWARE events using a limited number of HPMCOUNTER CSRs.
In addition to HARDWARE performance counters, an SBI implementation (e.g. OpenSBI, Xvisor, KVM, etc.) can provide SOFTWARE counters for events such as the number of RFENCEs, IPIs, misaligned load/store instructions, illegal instructions, etc.
We propose an SBI PMU extension which tries to cover the CYCLE CSR, INSTRET CSR, HPMCOUNTER CSRs and the SOFTWARE counters provided by the SBI implementation.
To define the SBI PMU extension, we first define counter_idx, which is a logical number assigned to a counter, and event_idx, which is an encoded number representing the HARDWARE/SOFTWARE event to be monitored.
The SBI PMU event_idx is an XLEN-bit wide number encoded as follows:
    event_idx[XLEN-1:16] = info
    event_idx[15:12]     = type
    event_idx[11:0]      = code
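A minimal C sketch of this encoding, for illustration only. The helper and enumerator names are hypothetical (the proposal defines only the bit positions and the type values), and unsigned long is used as a stand-in for an XLEN-bit value.

```c
#include <assert.h>

/* Hypothetical names; only the values 0x0, 0x1, 0x2 and 0xf come from the proposal. */
enum sbi_pmu_event_type {
    SBI_PMU_EVENT_TYPE_HW       = 0x0,
    SBI_PMU_EVENT_TYPE_HW_CACHE = 0x1,
    SBI_PMU_EVENT_TYPE_HW_RAW   = 0x2,
    SBI_PMU_EVENT_TYPE_SW       = 0xf
};

/* Pack info/type/code into an event_idx: info in [XLEN-1:16], type in [15:12], code in [11:0]. */
static inline unsigned long event_idx_make(unsigned long info, unsigned int type,
                                           unsigned int code)
{
    assert(type <= 0xf);   /* type is a 4-bit field */
    assert(code <= 0xfff); /* code is a 12-bit field */
    return (info << 16) | ((unsigned long)type << 12) | (unsigned long)code;
}

/* Field extractors, the inverse of event_idx_make(). */
static inline unsigned long event_idx_info(unsigned long event_idx)
{
    return event_idx >> 16;
}

static inline unsigned int event_idx_type(unsigned long event_idx)
{
    return (unsigned int)((event_idx >> 12) & 0xf);
}

static inline unsigned int event_idx_code(unsigned long event_idx)
{
    return (unsigned int)(event_idx & 0xfff);
}
```

For example, a SOFTWARE event (type 0xf) with code 4 and info 0 packs to 0xf004.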
If event_idx.type == 0x0 then it is a HARDWARE event. For a HARDWARE event, event_idx.info is optional and can be passed as zero, whereas event_idx.code can be one of the following values:

    enum sbi_pmu_hw_id {
        SBI_PMU_HW_CPU_CYCLES = 0,
        SBI_PMU_HW_INSTRUCTIONS = 1,
        SBI_PMU_HW_CACHE_REFERENCES = 2,
        SBI_PMU_HW_CACHE_MISSES = 3,
        SBI_PMU_HW_BRANCH_INSTRUCTIONS = 4,
        SBI_PMU_HW_BRANCH_MISSES = 5,
        SBI_PMU_HW_BUS_CYCLES = 6,
        SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
        SBI_PMU_HW_STALLED_CYCLES_BACKEND = 8,
        SBI_PMU_HW_REF_CPU_CYCLES = 9,
        SBI_PMU_HW_MAX, /* non-ABI */
    };

(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
If event_idx.type == 0x1 then it is a HARDWARE CACHE event. For a HARDWARE CACHE event, event_idx.info is optional and can be passed as zero, whereas event_idx.code is encoded as follows:
    event_idx.code[11:3] = cache_id
    event_idx.code[2:1]  = op_id
    event_idx.code[0:0]  = result_id

    enum sbi_pmu_hw_cache_id {
        SBI_PMU_HW_CACHE_L1D = 0,
        SBI_PMU_HW_CACHE_L1I = 1,
        SBI_PMU_HW_CACHE_LL = 2,
        SBI_PMU_HW_CACHE_DTLB = 3,
        SBI_PMU_HW_CACHE_ITLB = 4,
        SBI_PMU_HW_CACHE_BPU = 5,
        SBI_PMU_HW_CACHE_NODE = 6,
        SBI_PMU_HW_CACHE_MAX, /* non-ABI */
    };

    enum sbi_pmu_hw_cache_op_id {
        SBI_PMU_HW_CACHE_OP_READ = 0,
        SBI_PMU_HW_CACHE_OP_WRITE = 1,
        SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
        SBI_PMU_HW_CACHE_OP_MAX, /* non-ABI */
    };

    enum sbi_pmu_hw_cache_op_result_id {
        SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
        SBI_PMU_HW_CACHE_RESULT_MISS = 1,
        SBI_PMU_HW_CACHE_RESULT_MAX, /* non-ABI */
    };

(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
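As a sketch of how the HARDWARE CACHE code field could be composed and split again, assuming plain unsigned integers for the three IDs. The function names are hypothetical; only the bit positions come from the text above.

```c
#include <assert.h>

/* Compose event_idx.code for a HARDWARE CACHE event:
 * cache_id in code[11:3], op_id in code[2:1], result_id in code[0]. */
static inline unsigned int hw_cache_code(unsigned int cache_id, unsigned int op_id,
                                         unsigned int result_id)
{
    assert(cache_id <= 0x1ff); /* 9-bit field */
    assert(op_id <= 0x3);      /* 2-bit field */
    assert(result_id <= 0x1);  /* 1-bit field */
    return (cache_id << 3) | (op_id << 1) | result_id;
}

/* Extract the individual fields from a HARDWARE CACHE code. */
static inline unsigned int hw_cache_code_cache_id(unsigned int code)
{
    return (code >> 3) & 0x1ff;
}

static inline unsigned int hw_cache_code_op_id(unsigned int code)
{
    return (code >> 1) & 0x3;
}

static inline unsigned int hw_cache_code_result_id(unsigned int code)
{
    return code & 0x1;
}
```

For example, a DTLB (3) read (0) miss (1) gives code 0x19, so with type 0x1 and info 0 the full event_idx is 0x1019.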
If event_idx.type == 0x2 then it is a HARDWARE RAW event. For a HARDWARE RAW event, both event_idx.info and event_idx.code are platform dependent.
If event_idx.type == 0xf then it is a SOFTWARE event. For a SOFTWARE event, event_idx.info is SBI implementation specific and event_idx.code can be one of the following:

    enum sbi_pmu_sw_id {
        SBI_PMU_SW_MISALIGNED_LOAD = 0,
        SBI_PMU_SW_MISALIGNED_STORE = 1,
        SBI_PMU_SW_ILLEGAL_INSN = 2,
        SBI_PMU_SW_LOCAL_SET_TIMER = 3,
        SBI_PMU_SW_LOCAL_IPI = 4,
        SBI_PMU_SW_LOCAL_FENCE_I = 5,
        SBI_PMU_SW_LOCAL_SFENCE_VMA = 6,
        SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID = 7,
        SBI_PMU_SW_LOCAL_HFENCE_GVMA = 8,
        SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
        SBI_PMU_SW_LOCAL_HFENCE_VVMA = 10,
        SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
        SBI_PMU_SW_MAX, /* non-ABI */
    };
In future, more events can be defined without breaking ABI compatibility of SBI calls.
Using definition of counter_idx and event_idx, we can potentially have the following SBI calls:
1. SBI_PMU_NUM_COUNTERS
   This call will return the number of COUNTERs.

2. SBI_PMU_COUNTER_DESCRIBE
   This call takes two parameters: 1) counter_idx 2) physical address
   It will write the description of the SBI PMU counter at the specified physical address. The details written at the specified physical address are as follows:
   1. Name (64 bytes)
   2. CSR_Number (2 bytes)
      (CSR_Number <= 0xfff means the counter is a RISC-V CSR)
      (CSR_Number > 0xfff means the counter is an SBI implementation counter)
      (E.g. CSR_Number == 0xC02 implies the INSTRET CSR; the HPMCOUNTER3 CSR is 0xC03)
   3. CSR_Width (2 bytes)
      (Number of CSR bits implemented in HW)
   4. Event_Count (2 bytes)
      (Number of events in the Event_List array)
   5. Event_List (2 * Event_Count bytes)
      (This is an array of 16bit values where each 16bit value is a supported event_idx.type and event_idx.code combination)

3. SBI_PMU_COUNTER_SET_PHYS_ADDR
   This call takes two parameters: 1) counter_idx 2) physical address
   It will set the physical address of the memory location where the SBI implementation will write the 64bit SOFTWARE counter. This SBI call is only for counters not mapped to any CSR (i.e. only for counters with CSR_Number > 0xfff).

4. SBI_PMU_COUNTER_START
   This call takes two parameters: 1) counter_idx 2) event_idx
   It will inform the SBI implementation to configure and start/enable the specified counter on the calling HART to monitor the specified event. This SBI call will fail if the counter is not present or if the specified event_idx is not supported by the counter.

5. SBI_PMU_COUNTER_STOP
   This call takes one parameter: 1) counter_idx
   It will inform the SBI implementation to stop/disable the specified counter on the calling HART. This SBI call will fail for counters which are not present.
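To make the SBI_PMU_COUNTER_DESCRIBE layout concrete, here is a rough C sketch of how S-mode software might parse the description. It assumes the fields are packed back to back in the order listed and that the 16bit fields are little-endian; the proposal does not state byte order or padding, so both are assumptions. All struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DESC_NAME_LEN     64
#define DESC_HEADER_BYTES (DESC_NAME_LEN + 2 + 2 + 2) /* Name + CSR_Number + CSR_Width + Event_Count */

struct pmu_counter_desc {
    char name[DESC_NAME_LEN + 1]; /* NUL-terminated copy of Name */
    uint16_t csr_number;
    uint16_t csr_width;
    uint16_t event_count;
    const uint8_t *event_list;    /* points into the caller's buffer */
};

/* Read a 16-bit little-endian value (byte order is an assumption). */
static inline uint16_t rd16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Parse a description; returns 0 on success, -1 if the buffer is too short. */
static inline int pmu_desc_parse(const uint8_t *buf, size_t len, struct pmu_counter_desc *d)
{
    assert(buf != NULL && d != NULL);
    if (len < DESC_HEADER_BYTES)
        return -1;
    memcpy(d->name, buf, DESC_NAME_LEN);
    d->name[DESC_NAME_LEN] = '\0';
    d->csr_number = rd16le(buf + 64);
    d->csr_width = rd16le(buf + 66);
    d->event_count = rd16le(buf + 68);
    if (len - DESC_HEADER_BYTES < (size_t)2 * d->event_count)
        return -1;
    d->event_list = buf + DESC_HEADER_BYTES;
    return 0;
}

/* CSR_Number <= 0xfff means the counter is backed by a RISC-V CSR. */
static inline int pmu_desc_is_csr(const struct pmu_counter_desc *d)
{
    return d->csr_number <= 0xfff;
}

/* Check whether the type+code part (low 16 bits) of event_idx is in Event_List. */
static inline int pmu_desc_supports(const struct pmu_counter_desc *d, unsigned long event_idx)
{
    uint16_t want = (uint16_t)(event_idx & 0xffff);
    for (size_t i = 0; i < d->event_count; i++)
        if (rd16le(d->event_list + 2 * i) == want)
            return 1;
    return 0;
}
```

A driver could use pmu_desc_supports() to pick a counter before calling SBI_PMU_COUNTER_START, rather than relying on the call failing.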
From above, the RISC-V PMU driver will use most of the SBI calls at boot time. Only SBI_PMU_COUNTER_START needs to be called once before using a counter. A counter is read by reading the CSR (for CSR_Number <= 0xfff) or by reading the memory location (for CSR_Number > 0xfff). Counter overflow handling will have to be done in software by the Linux kernel.
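The proposal leaves overflow handling to the kernel without saying how. One common approach, shown here purely as an assumption, is to accumulate the difference between successive raw reads masked to CSR_Width bits. This is only correct if the counter wraps at most once between two reads. The names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the implemented counter bits (CSR_Width from the description). */
static inline uint64_t counter_mask(unsigned int width)
{
    assert(width >= 1 && width <= 64);
    return width == 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;
}

/* Events counted between two raw reads, tolerating a single wrap-around. */
static inline uint64_t counter_delta(uint64_t prev, uint64_t now, unsigned int width)
{
    return (now - prev) & counter_mask(width);
}

/* Software-maintained 64-bit total for a narrower hardware counter. */
struct counter_state {
    uint64_t last_raw;
    uint64_t total;
};

static inline void counter_update(struct counter_state *s, uint64_t raw, unsigned int width)
{
    s->total += counter_delta(s->last_raw, raw, width);
    s->last_raw = raw;
}
```

The kernel would have to call something like counter_update() often enough (for example from a periodic timer) that a counter of CSR_Width bits cannot wrap twice between calls.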
Using the SBI PMU extension, the M-mode runtime firmware (or Hypervisors) can provide a standardized view of HARDWARE/SOFTWARE counters and events to S-mode (or VS-mode) software.
The M-mode runtime firmware (OpenSBI) will need to know the following platform dependent information:
1. Possible event_idx values allowed (or supported) by a HARDWARE counter (i.e. HPMCOUNTER)
2. Mapping of event_idx for HARDWARE event to HPMEVENT CSR value
3. Mapping of event_idx for HARDWARE CACHE event to HPMEVENT CSR value
4. Mapping of event_idx for HARDWARE RAW event to HPMEVENT CSR value
5. Additional platform-specific programming required by any event_idx
All platform dependent information mentioned above can be obtained by the M-mode runtime firmware (OpenSBI) from platform specific code. The DT/ACPI can also be used to describe 1), 2), 3), and 4) mentioned above but 5) will always require platform specific code.
Regards, Anup
On Mon, Jul 6, 2020 at 12:35 AM Anup Patel <anup.patel@...> wrote: Hi All,
> To define SBI PMU extension, we first define counter_idx which is a logical number assigned to a counter and event_idx which is an encoded number representing the HARDWARE/SOFTWARE event to be monitored.

Is there more detail about counter_idx? I was wondering:
1. What is the ordering of logical numbers for HW and SW counters? I think that the logical numbers are assigned by OpenSBI.
2. How do we know the counter_idx of each HW and SW counter from S-mode? I guess that we need to know the logical numbers of all counters before we invoke an SBI call.
> 1. SBI_PMU_NUM_COUNTERS
>    This call will return the number of COUNTERs

Is it only for the SW counters, with the number of HW counters obtained from DT? Or does it return the number of both HW and SW counters? If so, how do we distinguish the number of HW counters from the number of SW counters? The other question is whether the number of SW counters is defined by the core of OpenSBI or is platform-dependent.

> 2. SBI_PMU_COUNTER_DESCRIBE
>    This call takes two parameters: 1) counter_idx 2) physical address
>    It will write the description of SBI PMU counter at specified physical address. The details of the SBI PMU counter written at specified physical address are as follows:
>    1. Name (64 bytes)
>    2. CSR_Number (2 bytes)
>       (CSR_Number <= 0xfff means counter is a RISC-V CSR)
>       (CSR_Number > 0xfff means counter is a SBI implementation counter)
>       (E.g. CSR_Number == 0xC02 imply HPMCOUNTER2 CSR)
>    3. CSR_Width (2 bytes)
>       (Number of CSR bits implemented in HW)
>    4. Event_Count (2 bytes)
>       (Number of events in Event_List array)
>    5. Event_List (2 * Event_Count bytes)
>       (This is an array of 16bit values where each 16bit value is the supported event_idx.type and event_idx.code combination)

What size should we allocate at this physical address? In my understanding, we need to allocate the pages in S-mode first and then pass their address as the second parameter, but we don't know Event_Count before we allocate the space, so the description might cross the boundary of the allocation if Event_Count is very big.

> 3. SBI_PMU_COUNTER_SET_PHYS_ADDR
>    This call takes two parameters: 1) counter_idx 2) physical address
>    It will set the physical address of memory location where the SBI implementation will write the 64bit SOFTWARE counter. This SBI call is only for counters not mapped to any CSR (i.e. only for counters with CSR_Number > 0xfff).
> 4. SBI_PMU_COUNTER_START
>    This call takes two parameters: 1) counter_idx 2) event_idx
>    It will inform SBI implementation to configure and start/enable specified counter on the calling HART to monitor specific event. This SBI call will fail for counters which are not present and specified event_idx is not supported by the counter.
> 5. SBI_PMU_COUNTER_STOP
>    This call takes one parameter: 1) counter_idx
>    It will inform SBI implementation to stop/disable specified counters on the calling HART. This SBI call will fail for counters which are not present.
> The M-mode runtime firmware (OpenSBI) will need to know following platform dependent information:
> 1. Possible event_idx values allowed (or supported) by a HARDWARE counter (i.e. HPMCOUNTER)
> 2. Mapping of event_idx for HARDWARE event to HPMEVENT CSR value
> 3. Mapping of event_idx for HARDWARE CACHE event to HPMEVENT CSR value
> 4. Mapping of event_idx for HARDWARE RAW event to HPMEVENT CSR value
> 5. Additional platform-specific programming required by any event_idx
>
> All platform dependent information mentioned above can be obtained by M-mode runtime firmware (OpenSBI) from platform specific code. The DT/ACPI can also be used to describe 1), 2), 3), and 4) mentioned above but 5) will always require platform specific code.

I would update the next version of the DT file to describe points 1) to 4). Thanks. As you mentioned before, it would be hard to keep the platform specific code in sync with the DT actually in use. I prefer to get 1), 2), 3) and 4) from DT first on each platform, and to use platform specific code only if DT is unavailable (the generic platform certainly uses DT); that way we could reduce the inconsistency as much as possible.

> Regards, Anup

Anup Patel
-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 06 July 2020 13:59
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Andrew Waterman <andrew@...>; Greg Favor <gfavor@...>
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
On Mon, Jul 6, 2020 at 12:35 AM Anup Patel <anup.patel@...> wrote:
>> To define SBI PMU extension, we first define counter_idx which is a logical number assigned to a counter and event_idx which is an encoded number representing the HARDWARE/SOFTWARE event to be monitored.

> Is there more detail about counter_idx? I was wondering that
> 1. What is the ordering of logical numbers for HW and SW counters? I think that the logical numbers are assigned by OpenSBI.

As mentioned here, counter_idx is a logical index for all available counters (i.e. HARDWARE and SOFTWARE). The SBI implementation (i.e. OpenSBI, Xvisor RISC-V, or KVM RISC-V) can assign counter_idx to HARDWARE and SOFTWARE counters in any order it likes.

> 2. How to know the logical number of counter_idx of each HW and SW counters from s-mode? I guess that we need to know the logical numbers of all counters before we invoke a SBI call.

The SBI_PMU_COUNTER_DESCRIBE call mentioned below will tell us whether a given counter_idx maps to a HARDWARE counter or a SOFTWARE counter, based on the CSR_Number info it returns.
>> 1. SBI_PMU_NUM_COUNTERS
>>    This call will return the number of COUNTERs

> Is it for the SW counters and we get the number of HW counters by DT? Or does it return the number of HW and SW counters both? If so, how to distinguish the number of HW and SW?

This call returns the total number of counters (i.e. both HARDWARE and SOFTWARE).

> The other question is that the number of SW counters is defined by the core of OpenSBI or platform-dependent?

The number of SW counters is defined by the SBI implementation (i.e. OpenSBI, Xvisor RISC-V, and KVM RISC-V). Most likely the SW counters will not include any platform-dependent SW counters, although this is a design choice of the SBI implementation.

>> 2. SBI_PMU_COUNTER_DESCRIBE
>>    This call takes two parameters: 1) counter_idx 2) physical address
>>    It will write the description of SBI PMU counter at specified physical address. The details of the SBI PMU counter written at specified physical address are as follows:
>>    1. Name (64 bytes)
>>    2. CSR_Number (2 bytes)
>>       (CSR_Number <= 0xfff means counter is a RISC-V CSR)
>>       (CSR_Number > 0xfff means counter is a SBI implementation counter)
>>       (E.g. CSR_Number == 0xC02 imply HPMCOUNTER2 CSR)
>>    3. CSR_Width (2 bytes)
>>       (Number of CSR bits implemented in HW)
>>    4. Event_Count (2 bytes)
>>       (Number of events in Event_List array)
>>    5. Event_List (2 * Event_Count bytes)
>>       (This is an array of 16bit values where each 16bit value is the supported event_idx.type and event_idx.code combination)

> What is the size we should allocate for this physical address? In my understanding, we need to allocate the pages in s-mode first, then pass the address of the pages to the second parameter, but we don't know the event_counter before we allocate the space for it, so it might across the boundary if event_count is very big.

Theoretically, Event_Count cannot be more than 65535. I think we should have an SBI_PMU_NUM_EVENTS call which will return the number of events supported by a given counter_idx. This will help S-mode software determine the amount of memory to allocate for SBI_PMU_COUNTER_DESCRIBE.
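As a rough illustration of the sizing this would enable: assuming the description fields are packed back to back as listed (64 + 2 + 2 + 2 header bytes followed by 2 bytes per event), the buffer size follows directly from the event count. The function names are hypothetical, and SBI_PMU_NUM_EVENTS is only a suggestion at this point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Name (64) + CSR_Number (2) + CSR_Width (2) + Event_Count (2); packing is an assumption. */
#define PMU_DESC_HEADER_BYTES (64 + 2 + 2 + 2)

/* Bytes needed for a description with event_count entries in Event_List. */
static inline size_t pmu_desc_bytes(uint16_t event_count)
{
    return PMU_DESC_HEADER_BYTES + (size_t)2 * event_count;
}

/* Number of whole pages needed to hold that description. */
static inline size_t pmu_desc_pages(uint16_t event_count, size_t page_size)
{
    assert(page_size > 0);
    return (pmu_desc_bytes(event_count) + page_size - 1) / page_size;
}
```

Under these assumptions the worst case (Event_Count == 65535) needs 131140 bytes, i.e. 33 pages of 4 KiB, which is why knowing the count up front is useful.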
>> The M-mode runtime firmware (OpenSBI) will need to know following platform dependent information:
>> 1. Possible event_idx values allowed (or supported) by a HARDWARE counter (i.e. HPMCOUNTER)
>> 2. Mapping of event_idx for HARDWARE event to HPMEVENT CSR value
>> 3. Mapping of event_idx for HARDWARE CACHE event to HPMEVENT CSR value
>> 4. Mapping of event_idx for HARDWARE RAW event to HPMEVENT CSR value
>> 5. Additional platform-specific programming required by any event_idx
>>
>> All platform dependent information mentioned above can be obtained by M-mode runtime firmware (OpenSBI) from platform specific code. The DT/ACPI can also be used to describe 1), 2), 3), and 4) mentioned above but 5) will always require platform specific code.

> I would update the next version of DT file to describe the points from 1) to 4). Thanks. As you mentioned before, it would be hard to sync the platform specific code with the DT of real use. I prefer to get 1), 2), 3) and 4) from DT first on each platform, and use platform specific code if DT is unavailable. (generic platform use DT certainly), then we could maximally reduce the inconsistency.

It should be the platform's choice how it wants to describe HARDWARE events and HARDWARE counters. The OpenSBI generic platform will tend to use DT based parsing of HARDWARE events and HARDWARE counters, but other platforms can do things differently.

The S-mode software (i.e. Linux) should not get HARDWARE events and HARDWARE counters from DT, because DT describes HARDWARE and will not include SOFTWARE events and SOFTWARE counters. Also, the SOFTWARE events and SOFTWARE counters for a given platform will change as OpenSBI continues to improve, so it will be hard to keep the DT in sync.

The best thing for S-mode software would be to depend on one method of discovering all counters and supported events, which is the SBI_PMU_COUNTER_DESCRIBE call. In other words, there is no need for a platform driver for the Linux RISC-V PMU driver; it can instead depend only on sbi_probe_extension() to detect the SBI PMU extension.

Regards, Anup
On Tue, Jul 7, 2020 at 12:21 AM Anup Patel <anup.patel@...> wrote:
-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 06 July 2020 13:59
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Andrew Waterman <andrew@...>; Greg Favor <gfavor@...>
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
On Mon, Jul 6, 2020 at 12:35 AM Anup Patel <anup.patel@...> wrote:
>>> To define SBI PMU extension, we first define counter_idx which is a logical number assigned to a counter and event_idx which is an encoded number representing the HARDWARE/SOFTWARE event to be monitored.

>> Is there more detail about counter_idx? I was wondering that
>> 1. What is the ordering of logical numbers for HW and SW counters? I think that the logical numbers are assigned by OpenSBI.

> Like mentioned here, counter_idx is a logical index for all available counters (i.e. HARDWARE and SOFTWARE). The SBI implementation (i.e. OpenSBI, Xvisor RISC-V, or KVM RISC-V) can assign counter_idx to HARDWARE and SOFTWARE counters in any order it likes.

>> 2. How to know the logical number of counter_idx of each HW and SW counters from s-mode? I guess that we need to know the logical numbers of all counters before we invoke a SBI call.

> The SBI_PMU_COUNTER_DESCRIBE call mentioned below will tell us whether given counter_idx maps to a HARDWARE counter or SOFTWARE counter based on CSR_Number info returned by SBI_PMU_COUNTER_DESCRIBE call.

OK, I assume here that counter_idx values are sequential and start from zero, so during initialization of the S-mode software we could first get the total number 'N' of counters from SBI_PMU_NUM_COUNTERS, then loop N times to identify the capability of each counter. Does that align with your idea?
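The discovery loop described above could look roughly like the following sketch. It rests on several assumptions: counter_idx runs from 0 to N-1 (the question being asked here), the two SBI calls are modelled as plain C function pointers rather than ecalls, the 16bit fields are little-endian, and the fake firmware reports Event_Count == 0 so a header-sized buffer is enough. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DESC_HEADER_BYTES 70 /* Name (64) + CSR_Number + CSR_Width + Event_Count */

/* Stand-ins for SBI_PMU_NUM_COUNTERS and SBI_PMU_COUNTER_DESCRIBE. */
typedef long (*num_counters_fn)(void);
typedef long (*counter_describe_fn)(unsigned long counter_idx, uint8_t *desc);

struct pmu_counts {
    unsigned int hw; /* counters with CSR_Number <= 0xfff */
    unsigned int sw; /* counters with CSR_Number > 0xfff */
};

/* Query N, then describe counters 0..N-1 and classify each by CSR_Number.
 * Returns N on success or a negative error from either call. */
static inline long pmu_discover(num_counters_fn num_counters,
                                counter_describe_fn describe,
                                struct pmu_counts *out)
{
    uint8_t desc[DESC_HEADER_BYTES];
    long n = num_counters();

    out->hw = 0;
    out->sw = 0;
    if (n < 0)
        return n;
    for (long idx = 0; idx < n; idx++) {
        long rc = describe((unsigned long)idx, desc);
        if (rc < 0)
            return rc;
        uint16_t csr = (uint16_t)(desc[64] | (desc[65] << 8));
        if (csr <= 0xfff)
            out->hw++;
        else
            out->sw++;
    }
    return n;
}

/* Fake firmware for illustration: two CSR-backed counters and one software counter. */
static const uint16_t fake_csr_numbers[3] = { 0xc00, 0xc02, 0x1000 };

static inline long fake_num_counters(void)
{
    return 3;
}

static inline long fake_describe(unsigned long idx, uint8_t *desc)
{
    if (idx >= 3)
        return -1;
    memset(desc, 0, DESC_HEADER_BYTES); /* empty name, Event_Count == 0 */
    desc[64] = (uint8_t)(fake_csr_numbers[idx] & 0xff);
    desc[65] = (uint8_t)(fake_csr_numbers[idx] >> 8);
    return 0;
}
```

With the fake firmware above, pmu_discover() reports two HARDWARE counters and one SOFTWARE counter. A real driver would also need a buffer large enough for each counter's Event_List.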
>>> 1. SBI_PMU_NUM_COUNTERS
>>>    This call will return the number of COUNTERs

>> Is it for the SW counters and we get the number of HW counters by DT? Or does it return the number of HW and SW counters both? If so, how to distinguish the number of HW and SW?

> This call returns total number of counters (i.e. HARDWARE and SOFTWARE both)

>> The other question is that the number of SW counters is defined by the core of OpenSBI or platform-dependent?

> Number of SW counters are defined by SBI implementation (i.e. OpenSBI, Xvisor RISC-V, and KVM RISC-V). Most likely SW counters will not include any platform-dependent SW counters although this is design choice of SBI implementation.

OK, I got it. That would be enough, thanks.

>>> 2. SBI_PMU_COUNTER_DESCRIBE
>>>    This call takes two parameters: 1) counter_idx 2) physical address
>>>    It will write the description of SBI PMU counter at specified physical address. The details of the SBI PMU counter written at specified physical address are as follows:
>>>    1. Name (64 bytes)
>>>    2. CSR_Number (2 bytes)
>>>       (CSR_Number <= 0xfff means counter is a RISC-V CSR)
>>>       (CSR_Number > 0xfff means counter is a SBI implementation counter)
>>>       (E.g. CSR_Number == 0xC02 imply HPMCOUNTER2 CSR)
>>>    3. CSR_Width (2 bytes)
>>>       (Number of CSR bits implemented in HW)
>>>    4. Event_Count (2 bytes)
>>>       (Number of events in Event_List array)
>>>    5. Event_List (2 * Event_Count bytes)
>>>       (This is an array of 16bit values where each 16bit value is the supported event_idx.type and event_idx.code combination)

>> What is the size we should allocate for this physical address? In my understanding, we need to allocate the pages in s-mode first, then pass the address of the pages to the second parameter, but we don't know the event_counter before we allocate the space for it, so it might across the boundary if event_count is very big.

> Theoretically, Event_Count cannot be more than 65535.
> I think we should have SBI_PMU_NUM_EVENTS calls which will return number of events supported by given counter_idx. This will help S-mode software to determine amount of memory to allocate for SBI_PMU_COUNTER_DESCRIBE.

Sounds good to me.
3. SBI_PMU_COUNTER_SET_PHYS_ADDR This call takes two parameters: 1) counter_idx 2) physical address It will set the physical address of memory location where the SBI implementation will write the 64bit SOFTWARE counter. This SBI call is only for counters not mapped to any CSR (i.e. only for counters with CSR_Number > 0xfff). 4. SBI_PMU_COUNTER_START This call takes two parameters: 1) counter_idx 2) event_idx It will inform SBI implementation to configure and start/enable specified counter on the calling HART to monitor specific event. This SBI call will fail for counters which are not present and specified event_idx is not supported by the counter. 5. SBI_PMU_COUNTER_STOP This call takes one parameter: 1) counter_idx It will inform SBI implementation to stop/disable specified counters on the calling HART. This SBI call will fail for counters which are not present.
From above, the RISC-V PMU driver will use most of the SBI calls at boot time. Only SBI_PMU_COUNTER_START to be used once before using the counter.
The counter is read either by reading the CSR (for CSR_Number <= 0xfff) or by reading the memory location (for CSR_Number > 0xfff). The counter overflow handling will have to be done in software by Linux kernel.
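One possible shape for the software overflow handling mentioned above is to accumulate deltas between successive reads, masked to the implemented counter width (the CSR_Width field of the describe call). This is only a sketch: the function name is made up, and it assumes the counter wraps at most once between two reads.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of events counted between two raw reads
 * of a counter that implements only `width` bits. Unsigned subtraction
 * followed by masking gives the right answer across a single wrap.
 */
uint64_t pmu_counter_delta(uint64_t prev, uint64_t now, unsigned width)
{
    uint64_t mask = (width >= 64) ? UINT64_MAX
                                  : ((UINT64_C(1) << width) - 1);
    return (now - prev) & mask;
}
```

S-mode software would add each delta to a 64-bit running total, so the total keeps growing even when the underlying counter is narrower than 64 bits.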
Using the SBI PMU extension, the M-mode runtime firmware (or Hypervisors) can provide a standardized view of HARDWARE/SOFTWARE counters and events to S-mode (or VS-mode) software.
The M-mode runtime firmware (OpenSBI) will need to know the following platform dependent information:
1. Possible event_idx values allowed (or supported) by a HARDWARE counter (i.e. HPMCOUNTER)
2. Mapping of event_idx for HARDWARE event to HPMEVENT CSR value
3. Mapping of event_idx for HARDWARE CACHE event to HPMEVENT CSR value
4. Mapping of event_idx for HARDWARE RAW event to HPMEVENT CSR value
5. Additional platform-specific programming required by any event_idx
All platform dependent information mentioned above can be obtained by M-mode runtime firmware (OpenSBI) from platform specific code. The DT/ACPI can also be used to describe 1), 2), 3), and 4) mentioned above but 5) will always require platform specific code.

I would update the next version of DT file to describe the points from 1) to 4). Thanks. As you mentioned before, it would be hard to sync the platform specific code with the DT actually in use. I prefer to get 1), 2), 3) and 4) from DT first on each platform, and use platform specific code if DT is unavailable (the generic platform uses DT, certainly); then we could maximally reduce the inconsistency.

It should be the platform's choice how it wants to describe HARDWARE events and HARDWARE counters. The OpenSBI generic platform will tend to use DT based parsing of HARDWARE events and HARDWARE counters but other platforms can do things differently.
The S-mode software (i.e. Linux) should not get HARDWARE events and HARDWARE counters from DT because DT describes HARDWARE and DT will not include SOFTWARE events and SOFTWARE counters. Also, SOFTWARE events and SOFTWARE counters will change for given platform as OpenSBI continues to improve so it will be hard to keep the DT in sync.
The best thing for S-mode software would be to depend on one method of discovering all counters and supported events which is the SBI_PMU_COUNTER_DESCRIBE call. In other words, no need for platform driver for Linux RISC-V PMU driver instead depend only on sbi_probe_extension() to detect SBI PMU extension.
OK, make sense. Regards, Anup
|
|
On Tue, 2020-07-07 at 11:05 +0800, Zong Li wrote: On Tue, Jul 7, 2020 at 12:21 AM Anup Patel <anup.patel@...> wrote:
-----Original Message----- From: Zong Li <zong.li@...> Sent: 06 July 2020 13:59 To: Anup Patel <Anup.Patel@...> Cc: tech-unixplatformspec@...; Andrew Waterman <andrew@...>; Greg Favor <gfavor@...> Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
On Mon, Jul 6, 2020 at 12:35 AM Anup Patel <anup.patel@...> wrote:
Hi All,
We don't have a dedicated RISC-V PMU extension but we do have HARDWARE performance counters such as CYCLE CSR, INSTRET CSR, and HPMCOUNTER CSRs. A RISC-V implementation can support monitoring various HARDWARE events using limited number of HPMCOUNTER CSRs.
In addition to HARDWARE performance counters, a SBI implementation (e.g. OpenSBI, Xvisor, KVM, etc) can provide SOFTWARE counters for events such as number of RFENCEs, number of IPIs, number of misaligned load/store instructions, number of illegal instructions, etc.
We propose SBI PMU extension which tries to cover CYCLE CSR, INSTRET CSR, HPMCOUNTER CSRs and SOFTWARE counters provided by SBI implementation.
To define SBI PMU extension, we first define counter_idx which is a logical number assigned to a counter and event_idx which is an encoded number representing the HARDWARE/SOFTWARE event to be monitored.

Is there more detail about counter_idx? I was wondering that
1. What is the ordering of logical numbers for HW and SW counters? I think that the logical numbers are assigned by OpenSBI.

Like mentioned here, counter_idx is a logical index for all available counters (i.e. HARDWARE and SOFTWARE). The SBI implementation (i.e. OpenSBI, Xvisor RISC-V, or KVM RISC-V) can assign counter_idx to HARDWARE and SOFTWARE counters in any order it likes.

2. How to know the logical number of counter_idx of each HW and SW counter from s-mode? I guess that we need to know the logical numbers of all counters before we invoke a SBI call.

The SBI_PMU_COUNTER_DESCRIBE call mentioned below will tell us whether given counter_idx maps to a HARDWARE counter or SOFTWARE counter based on CSR_Number info returned by SBI_PMU_COUNTER_DESCRIBE call.

OK, I assume the logical number of counter_idx is sequential and starts from zero here, so during initialization of s-mode software, we could get the total number 'N' of counters by SBI_PMU_NUM_COUNTERS first, then loop N times to identify the capability of each counter. Does it align with your ideas?

That's my understanding as well. Assigning continuous counter_idx may put a restriction on M-mode implementation. How about assigning some ranges for software vs hardware counters? Maybe split the hardware into different ranges as well based on event_idx.type. This also allows supervisor to know what type of the counter it is looking at without parsing the data written by the describe call.
The SBI PMU event_idx is a XLEN bits wide number encoded as follows: event_idx[XLEN-1:16] = info event_idx[15:12] = type event_idx[11:0] = code
If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE event, the event_idx.info is optional and can be passed zero whereas the event_idx.code can be one of the following values:

    enum sbi_pmu_hw_id {
        SBI_PMU_HW_CPU_CYCLES = 0,
        SBI_PMU_HW_INSTRUCTIONS = 1,
        SBI_PMU_HW_CACHE_REFERENCES = 2,
        SBI_PMU_HW_CACHE_MISSES = 3,
        SBI_PMU_HW_BRANCH_INSTRUCTIONS = 4,
        SBI_PMU_HW_BRANCH_MISSES = 5,
        SBI_PMU_HW_BUS_CYCLES = 6,
        SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
        SBI_PMU_HW_STALLED_CYCLES_BACKEND = 8,
        SBI_PMU_HW_REF_CPU_CYCLES = 9,
        SBI_PMU_HW_MAX, /* non-ABI */
    };

(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE CACHE event, the event_idx.info is optional and can be passed zero whereas the event_idx.code is encoded as follows:

    event_idx.code[11:3] = cache_id
    event_idx.code[2:1] = op_id
    event_idx.code[0:0] = result_id

    enum sbi_pmu_hw_cache_id {
        SBI_PMU_HW_CACHE_L1D = 0,
        SBI_PMU_HW_CACHE_L1I = 1,
        SBI_PMU_HW_CACHE_LL = 2,
        SBI_PMU_HW_CACHE_DTLB = 3,
        SBI_PMU_HW_CACHE_ITLB = 4,
        SBI_PMU_HW_CACHE_BPU = 5,
        SBI_PMU_HW_CACHE_NODE = 6,
        SBI_PMU_HW_CACHE_MAX, /* non-ABI */
    };

    enum sbi_pmu_hw_cache_op_id {
        SBI_PMU_HW_CACHE_OP_READ = 0,
        SBI_PMU_HW_CACHE_OP_WRITE = 1,
        SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
        SBI_PMU_HW_CACHE_OP_MAX, /* non-ABI */
    };

    enum sbi_pmu_hw_cache_op_result_id {
        SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
        SBI_PMU_HW_CACHE_RESULT_MISS = 1,
        SBI_PMU_HW_CACHE_RESULT_MAX, /* non-ABI */
    };

(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
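To make the HARDWARE CACHE code layout above concrete, here is a small sketch that packs cache_id, op_id and result_id into the 12-bit event_idx.code field. The bit positions come from the proposal; the function name is only illustrative and is not part of it.

```c
#include <stdint.h>

/*
 * Illustrative helper (name is an assumption): build event_idx.code
 * for a HARDWARE CACHE event.
 *   code[11:3] = cache_id, code[2:1] = op_id, code[0:0] = result_id
 * Each input is masked to the width of its field.
 */
uint16_t sbi_pmu_hw_cache_code(unsigned cache_id, unsigned op_id,
                               unsigned result_id)
{
    return (uint16_t)(((cache_id & 0x1ffu) << 3) |
                      ((op_id & 0x3u) << 1) |
                      (result_id & 0x1u));
}
```

For example, an L1D read miss (cache_id 0, op_id 0, result_id 1) gives code 0x001, and a last-level cache write access (cache_id 2, op_id 1, result_id 0) gives code 0x012.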
If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE RAW event, both event_idx.info and event_idx.code are platform dependent.
If event_idx.type == 0xf then it is SOFTWARE event. For SOFTWARE event, event_idx.info is SBI implementation specific and event_idx.code can be one of the following:

    enum sbi_pmu_sw_id {
        SBI_PMU_SW_MISALIGNED_LOAD = 0,
        SBI_PMU_SW_MISALIGNED_STORE = 1,
        SBI_PMU_SW_ILLEGAL_INSN = 2,
        SBI_PMU_SW_LOCAL_SET_TIMER = 3,
        SBI_PMU_SW_LOCAL_IPI = 4,
        SBI_PMU_SW_LOCAL_FENCE_I = 5,
        SBI_PMU_SW_LOCAL_SFENCE_VMA = 6,
        SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID = 7,
        SBI_PMU_SW_LOCAL_HFENCE_GVMA = 8,
        SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
        SBI_PMU_SW_LOCAL_HFENCE_VVMA = 10,
        SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
        SBI_PMU_SW_MAX, /* non-ABI */
    };
In future, more events can be defined without breaking ABI compatibility of SBI calls.
Using definition of counter_idx and event_idx, we can potentially have the following SBI calls:
1. SBI_PMU_NUM_COUNTERS This call will return the number of COUNTERs Is it for the SW counters and we get the number of HW counters by DT? Or does it return the number of HW and SW counters both? If so, how to distinguish the number of HW and SW? This call returns total number of counters (i.e. HARDWARE and SOFTWARE both)
The other question is that the number of SW counters is defined by the core of OpenSBI or platform-dependent? Number of SW counters are defined by SBI implementation (i.e. OpenSBI, Xvisor RISC-V, and KVM RISC-V). Most likely SW counters will not include any platform-dependent SW counters although this is design choice of SBI implementation. OK, I got it. It would be enough, thanks.
2. SBI_PMU_COUNTER_DESCRIBE This call takes two parameters: 1) counter_idx 2) physical address It will write the description of SBI PMU counter at specified physical address. The details of the SBI PMU counter written at specified physical address are as follows: 1. Name (64 bytes) 2. CSR_Number (2 bytes) (CSR_Number <= 0xfff means counter is a RISC-V CSR) (CSR_Number > 0xfff means counter is a SBI implementation counter) (E.g. CSR_Number == 0xC02 imply HPMCOUNTER2 CSR) 3. CSR_Width (2 bytes) (Number of CSR bits implemented in HW) 4. Event_Count (2 bytes) (Number of events in Event_List array) 5. Event_List (2 * Event_Count bytes) (This is an array of 16bit values where each 16bit value is the supported event_idx.type and event_idx.code combination) What is the size we should allocate for this physical address? In my understanding, we need to allocate the pages in s-mode first, then pass the address of the pages to the second parameter, but we don't know the event_counter before we allocate the space for it, so it might across the boundary if event_count is very big. Theoretically, Event_Count cannot be more than 65535.
I think we should have SBI_PMU_NUM_EVENTS calls which will return number of events supported by given counter_idx. This will help S-mode software to determine amount of memory to allocate for SBI_PMU_COUNTER_DESCRIBE.
Sounds good to me.
3. SBI_PMU_COUNTER_SET_PHYS_ADDR This call takes two parameters: 1) counter_idx 2) physical address It will set the physical address of memory location where the SBI implementation will write the 64bit SOFTWARE counter. This SBI call is only for counters not mapped to any CSR (i.e. only for counters with CSR_Number > 0xfff). 4. SBI_PMU_COUNTER_START This call takes two parameters: 1) counter_idx 2) event_idx It will inform SBI implementation to configure and start/enable specified counter on the calling HART to monitor specific event. This SBI call will fail for counters which are not present and specified event_idx is not supported by the counter. 5. SBI_PMU_COUNTER_STOP This call takes one parameter: 1) counter_idx It will inform SBI implementation to stop/disable specified counters on the calling HART. This SBI call will fail for counters which are not present.
From above, the RISC-V PMU driver will use most of the SBI calls at boot time. Only SBI_PMU_COUNTER_START to be used once before using the counter.
The counter is read either by reading the CSR (for CSR_Number <= 0xfff) or by reading the memory location (for CSR_Number > 0xfff). The counter overflow handling will have to be done in software by Linux kernel.
Using the SBI PMU extension, the M-mode runtime firmware (or Hypervisors) can provide a standardized view of HARDWARE/SOFTWARE counters and events to S-mode (or VS-mode) software.
The M-mode runtime firmware (OpenSBI) will need to know the following platform dependent information:
1. Possible event_idx values allowed (or supported) by a HARDWARE counter (i.e. HPMCOUNTER)
2. Mapping of event_idx for HARDWARE event to HPMEVENT CSR value
3. Mapping of event_idx for HARDWARE CACHE event to HPMEVENT CSR value
4. Mapping of event_idx for HARDWARE RAW event to HPMEVENT CSR value
5. Additional platform-specific programming required by any event_idx
All platform dependent information mentioned above can be obtained by M-mode runtime firmware (OpenSBI) from platform specific code. The DT/ACPI can also be used to describe 1), 2), 3), and 4) mentioned above but 5) will always require platform specific code.

I would update the next version of DT file to describe the points from 1) to 4). Thanks. As you mentioned before, it would be hard to sync the platform specific code with the DT actually in use. I prefer to get 1), 2), 3) and 4) from DT first on each platform, and use platform specific code if DT is unavailable (the generic platform uses DT, certainly); then we could maximally reduce the inconsistency.

It should be the platform's choice how it wants to describe HARDWARE events and HARDWARE counters. The OpenSBI generic platform will tend to use DT based parsing of HARDWARE events and HARDWARE counters but other platforms can do things differently.
The S-mode software (i.e. Linux) should not get HARDWARE events and HARDWARE counters from DT because DT describes HARDWARE and DT will not include SOFTWARE events and SOFTWARE counters. Also, SOFTWARE events and SOFTWARE counters will change for given platform as OpenSBI continues to improve so it will be hard to keep the DT in sync.
The best thing for S-mode software would be to depend on one method of discovering all counters and supported events which is the SBI_PMU_COUNTER_DESCRIBE call. In other words, no need for platform driver for Linux RISC-V PMU driver instead depend only on sbi_probe_extension() to detect SBI PMU extension.
OK, make sense.
Regards, Anup
-- Regards, Atish
|
|
> > > > The SBI PMU event_idx is a XLEN bits wide number encoded as
> > > > follows:
> > > > event_idx[XLEN-1:16] = info
> > > > event_idx[15:12] = type
> > > > event_idx[11:0] = code
Is there a reason you are limiting the event to 16 bits? On current designs, the mhpmeventX field is already >16 bits wide. I don't see an easy way to support that with this approach directly. (Or maybe I'm missing something?)
Are your event listings intended to represent a layer of mapping between SBI event numbers and actual how-to-program-the-silicon numbers (which would get around the issue above)? They do not match up with Rocket events, for example. Given the number of potential implementers, it is likely impossible to get agreement on a common base set of hardware-compatible events unless we act now, or yesteryear. :)
I would also recommend at least discussing adding an SBI call that allows one to do a wide write of many/all registers, otherwise writes become incredibly more expensive than reads. I can think of two use-cases that would be writing many of the counters at least several hundred times a second, and maybe even more rapidly.
Brian
|
|

Anup Patel
Hi Brian,
Please see my reply inline below..
Regards,
Anup
From: Brian Grayson <brian.grayson@...>
Sent: 08 July 2020 06:51
To: Atish Patra <Atish.Patra@...>
Cc: zong.li@...; Anup Patel <Anup.Patel@...>; andrew@...; tech-unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
> > > > The SBI PMU event_idx is a XLEN bits wide number encoded as
> > > > follows:
> > > > event_idx[XLEN-1:16] = info
> > > > event_idx[15:12] = type
> > > > event_idx[11:0] = code
Is there a reason you are limiting the event to 16 bits? On current designs, the mhpmeventX field is already >16 bits wide. I don't see an easy way to support that with this approach directly. (Or maybe I'm missing something?)
[Anup] The event_idx defined here is for providing uniform platform independent view of various HARDWARE/SOFTWARE events
[Anup] Actual number of bits in mhpmeventX will be platform dependent. The M-mode runtime firmware (OpenSBI) will translate event_idx to platform specific value for mhpmeventX CSR.
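For reference, the platform-independent event_idx layout being discussed (info in bits XLEN-1:16, type in bits 15:12, code in bits 11:0) could be packed and unpacked roughly as follows. The helper and macro names are illustrative only, and the sketch assumes XLEN matches the width of unsigned long on the target.

```c
/* Assumption: XLEN equals the width of unsigned long. */
typedef unsigned long sbi_event_idx_t;

/* event_idx.type values named in the proposal. */
#define SBI_PMU_EVENT_TYPE_HW        0x0u
#define SBI_PMU_EVENT_TYPE_HW_CACHE  0x1u
#define SBI_PMU_EVENT_TYPE_HW_RAW    0x2u
#define SBI_PMU_EVENT_TYPE_SW        0xfu

/* Illustrative helpers, not part of the proposal. */
sbi_event_idx_t sbi_pmu_event_idx(unsigned long info, unsigned type,
                                  unsigned code)
{
    return (info << 16) |
           ((unsigned long)(type & 0xfu) << 12) |
           (unsigned long)(code & 0xfffu);
}

unsigned sbi_pmu_event_type(sbi_event_idx_t idx)
{
    return (unsigned)((idx >> 12) & 0xfu);
}

unsigned sbi_pmu_event_code(sbi_event_idx_t idx)
{
    return (unsigned)(idx & 0xfffu);
}

unsigned long sbi_pmu_event_info(sbi_event_idx_t idx)
{
    return idx >> 16;
}
```

A SOFTWARE illegal-instruction event (type 0xf, code 2, info 0) would then be event_idx 0xf002; the width of the real mhpmeventX value never appears at this level.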
Are your event listings intended to represent a layer of mapping between SBI event numbers and actual how-to-program-the-silicon numbers (which would get around the issue above)? They do not match up with Rocket events, for example. Given the number of potential implementers, it is likely impossible to get agreement on a common base set of hardware-compatible events unless we act now, or yesteryear. :)
[Anup] Yes, event listings defined here are generic events. These SBI event numbers (i.e. event_idx) will be translated to platform specific value and then programmed into mhpmeventX CSR.
[Anup] It is not mandatory for a RISC-V platform to implement all generic events listed here. The M-mode runtime firmware (OpenSBI) will determine set of events (i.e. set of event_idx) supported for given counter from platform specific code.
[Anup] We don’t need to put effort into standardizing generic events for the RISC-V world because the Linux perf_event subsystem has already done that for us. The generic events listed here are aligned with the generic events of the Linux perf_event user-space ABI defined in <linux_source>/include/uapi/linux/perf_event.h. These generic Linux perf_events are common across architectures.
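The translation layer described in these replies could look something like the table lookup below inside the firmware's platform code. Everything here is a sketch: the structure, the function name, and especially the mhpmevent values are made-up placeholders, since real values are defined by each platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-platform mapping entry. */
struct pmu_event_map {
    uint16_t type_code;      /* event_idx[15:0]: type and code */
    uint64_t mhpmevent_val;  /* platform-specific value (placeholder) */
};

/* Placeholder table: the right-hand values are invented for illustration. */
static const struct pmu_event_map example_map[] = {
    { 0x0003, 0x0000000000000102ull }, /* HARDWARE, SBI_PMU_HW_CACHE_MISSES */
    { 0x0005, 0x0000000000004001ull }, /* HARDWARE, SBI_PMU_HW_BRANCH_MISSES */
};

/*
 * Look up the value to program into mhpmeventX for a generic event_idx.
 * Returns 0 and stores the value in *out on success, or -1 if the
 * platform does not support the event.
 */
int pmu_translate_event(unsigned long event_idx, uint64_t *out)
{
    uint16_t key = (uint16_t)(event_idx & 0xffffu);
    size_t i;

    for (i = 0; i < sizeof(example_map) / sizeof(example_map[0]); i++) {
        if (example_map[i].type_code == key) {
            *out = example_map[i].mhpmevent_val;
            return 0;
        }
    }
    return -1;
}
```

An event missing from the table is simply reported as unsupported, which matches the point that a platform need not implement every generic event.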
I would also recommend at least discussing adding an SBI call that allows one to do a wide write of many/all registers, otherwise writes become incredibly more expensive than reads. I can think of two use-cases that would be writing many of the counters at least several hundred times a second, and maybe even more rapidly.
[Anup] The SBI calls defined here are only for configuring and starting a counter.
[Anup] Both HARDWARE and SOFTWARE counter will be directly accessible to S-mode software (Linux)
[Anup] The Linux perf_event only expects interface to read a counter. Although, we should certainly have a parameter in SBI_PMU_COUNTER_START call to specify initial value of counter before it is used by S-mode software.
|
|

Anup Patel
Hi Atish,
-----Original Message----- From: Atish Patra <Atish.Patra@...> Sent: 08 July 2020 00:44 To: zong.li@...; Anup Patel <Anup.Patel@...> Cc: andrew@...; tech-unixplatformspec@...; gfavor@... Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
On Tue, 2020-07-07 at 11:05 +0800, Zong Li wrote:
On Tue, Jul 7, 2020 at 12:21 AM Anup Patel <anup.patel@...> wrote:
-----Original Message----- From: Zong Li <zong.li@...> Sent: 06 July 2020 13:59 To: Anup Patel <Anup.Patel@...> Cc: tech-unixplatformspec@...; Andrew Waterman <andrew@...>; Greg Favor <gfavor@...> Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
On Mon, Jul 6, 2020 at 12:35 AM Anup Patel <anup.patel@...> wrote:
Hi All,
We don't have a dedicated RISC-V PMU extension but we do have HARDWARE performance counters such as CYCLE CSR, INSTRET CSR, and HPMCOUNTER CSRs. A RISC-V implementation can support monitoring various HARDWARE events using limited number of HPMCOUNTER CSRs.
In addition to HARDWARE performance counters, a SBI implementation (e.g. OpenSBI, Xvisor, KVM, etc) can provide SOFTWARE counters for events such as number of RFENCEs, number of IPIs, number of misaligned load/store instructions, number of illegal instructions, etc.
We propose SBI PMU extension which tries to cover CYCLE CSR, INSTRET CSR, HPMCOUNTER CSRs and SOFTWARE counters provided by SBI implementation.
To define SBI PMU extension, we first define counter_idx which is a logical number assigned to a counter and event_idx which is an encoded number representing the HARDWARE/SOFTWARE event to be monitored.

Is there more detail about counter_idx? I was wondering that
1. What is the ordering of logical numbers for HW and SW counters? I think that the logical numbers are assigned by OpenSBI.

Like mentioned here, counter_idx is a logical index for all available counters (i.e. HARDWARE and SOFTWARE). The SBI implementation (i.e. OpenSBI, Xvisor RISC-V, or KVM RISC-V) can assign counter_idx to HARDWARE and SOFTWARE counters in any order it likes.

2. How to know the logical number of counter_idx of each HW and SW counter from s-mode? I guess that we need to know the logical numbers of all counters before we invoke a SBI call.

The SBI_PMU_COUNTER_DESCRIBE call mentioned below will tell us whether given counter_idx maps to a HARDWARE counter or SOFTWARE counter based on CSR_Number info returned by SBI_PMU_COUNTER_DESCRIBE call.

OK, I assume the logical number of counter_idx is sequential and starts from zero here, so during initialization of s-mode software, we could get the total number 'N' of counters by SBI_PMU_NUM_COUNTERS first, then loop N times to identify the capability of each counter. Does it align with your ideas?

That's my understanding as well. Assigning continuous counter_idx may put a restriction on M-mode implementation.

There is no restriction on M-mode runtime firmware in assigning counter_idx to various HARDWARE and SOFTWARE counters. In fact, counter_idx being a logical index helps M-mode software to implement a registration mechanism.

How about assigning some ranges for software vs hardware counters? Maybe split the hardware into different ranges as well based on event_idx.type.

I had done that initially but it will only increase SBI calls because we will need separate SBI calls to determine number of HARDWARE and SOFTWARE counters. Also, this makes things difficult if a RISC-V implementation has non-standard implementation specific CSR as HARDWARE counter.

This also allows supervisor to know what type of the counter it is looking at without parsing the data written by the describe call.

There is no real advantage of knowing type of counter from counter_idx over CSR_Number returned by SBI_PMU_COUNTER_DESCRIBE call because the SBI_PMU_COUNTER_DESCRIBE call will be called only at boot-time once for each counter and S-mode software can mark counters as HARDWARE/SOFTWARE at boot-time based on CSR_Number returned by SBI_PMU_COUNTER_DESCRIBE call.
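The boot-time marking of counters as HARDWARE or SOFTWARE from CSR_Number, as discussed in this exchange, might be sketched as below. The function names are hypothetical, and the array of CSR_Number values stands in for what S-mode software would have collected by calling SBI_PMU_COUNTER_DESCRIBE once for each counter_idx from 0 to N-1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Rule from the proposal: CSR_Number <= 0xfff means the counter is a
 * RISC-V CSR (HARDWARE); CSR_Number > 0xfff means it is an SBI
 * implementation counter (SOFTWARE).
 */
bool sbi_pmu_counter_is_hw(uint16_t csr_number)
{
    return csr_number <= 0xfffu;
}

/*
 * Hypothetical boot-time pass: csr_numbers[i] holds the CSR_Number
 * reported for counter_idx i. Returns how many counters are HARDWARE;
 * the remaining n - result counters are SOFTWARE.
 */
size_t sbi_pmu_count_hw(const uint16_t *csr_numbers, size_t n)
{
    size_t hw = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (sbi_pmu_counter_is_hw(csr_numbers[i]))
            hw++;
    }
    return hw;
}
```

Because the classification only needs the CSR_Number already returned by the describe call, no fixed counter_idx ranges are required for this step.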
The SBI PMU event_idx is a XLEN bits wide number encoded as follows: event_idx[XLEN-1:16] = info event_idx[15:12] = type event_idx[11:0] = code
If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE event, the event_idx.info is optional and can be passed zero whereas the event_idx.code can be one of the following values:

    enum sbi_pmu_hw_id {
        SBI_PMU_HW_CPU_CYCLES = 0,
        SBI_PMU_HW_INSTRUCTIONS = 1,
        SBI_PMU_HW_CACHE_REFERENCES = 2,
        SBI_PMU_HW_CACHE_MISSES = 3,
        SBI_PMU_HW_BRANCH_INSTRUCTIONS = 4,
        SBI_PMU_HW_BRANCH_MISSES = 5,
        SBI_PMU_HW_BUS_CYCLES = 6,
        SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
        SBI_PMU_HW_STALLED_CYCLES_BACKEND = 8,
        SBI_PMU_HW_REF_CPU_CYCLES = 9,
        SBI_PMU_HW_MAX, /* non-ABI */
    };

(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE CACHE event, the event_idx.info is optional and can be passed zero whereas the event_idx.code is encoded as follows:

    event_idx.code[11:3] = cache_id
    event_idx.code[2:1] = op_id
    event_idx.code[0:0] = result_id

    enum sbi_pmu_hw_cache_id {
        SBI_PMU_HW_CACHE_L1D = 0,
        SBI_PMU_HW_CACHE_L1I = 1,
        SBI_PMU_HW_CACHE_LL = 2,
        SBI_PMU_HW_CACHE_DTLB = 3,
        SBI_PMU_HW_CACHE_ITLB = 4,
        SBI_PMU_HW_CACHE_BPU = 5,
        SBI_PMU_HW_CACHE_NODE = 6,
        SBI_PMU_HW_CACHE_MAX, /* non-ABI */
    };

    enum sbi_pmu_hw_cache_op_id {
        SBI_PMU_HW_CACHE_OP_READ = 0,
        SBI_PMU_HW_CACHE_OP_WRITE = 1,
        SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
        SBI_PMU_HW_CACHE_OP_MAX, /* non-ABI */
    };

    enum sbi_pmu_hw_cache_op_result_id {
        SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
        SBI_PMU_HW_CACHE_RESULT_MISS = 1,
        SBI_PMU_HW_CACHE_RESULT_MAX, /* non-ABI */
    };

(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE RAW event, both event_idx.info and event_idx.code are platform dependent.
If event_idx.type == 0xf then it is SOFTWARE event. For SOFTWARE event, event_idx.info is SBI implementation specific and event_idx.code can be one of the following:

    enum sbi_pmu_sw_id {
        SBI_PMU_SW_MISALIGNED_LOAD = 0,
        SBI_PMU_SW_MISALIGNED_STORE = 1,
        SBI_PMU_SW_ILLEGAL_INSN = 2,
        SBI_PMU_SW_LOCAL_SET_TIMER = 3,
        SBI_PMU_SW_LOCAL_IPI = 4,
        SBI_PMU_SW_LOCAL_FENCE_I = 5,
        SBI_PMU_SW_LOCAL_SFENCE_VMA = 6,
        SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID = 7,
        SBI_PMU_SW_LOCAL_HFENCE_GVMA = 8,
        SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
        SBI_PMU_SW_LOCAL_HFENCE_VVMA = 10,
        SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
        SBI_PMU_SW_MAX, /* non-ABI */
    };
In future, more events can be defined without breaking ABI compatibility of SBI calls.
Using definition of counter_idx and event_idx, we can potentially have the following SBI calls:
1. SBI_PMU_NUM_COUNTERS This call will return the number of COUNTERs Is it for the SW counters and we get the number of HW counters by DT? Or does it return the number of HW and SW counters both? If so, how to distinguish the number of HW and SW? This call returns total number of counters (i.e. HARDWARE and SOFTWARE both)
The other question is that the number of SW counters is defined by the core of OpenSBI or platform-dependent? Number of SW counters are defined by SBI implementation (i.e. OpenSBI, Xvisor RISC-V, and KVM RISC-V). Most likely SW counters will not include any platform-dependent SW counters although this is design choice of SBI implementation. OK, I got it. It would be enough, thanks.
2. SBI_PMU_COUNTER_DESCRIBE This call takes two parameters: 1) counter_idx 2) physical address It will write the description of SBI PMU counter at specified physical address. The details of the SBI PMU counter written at specified physical address are as follows: 1. Name (64 bytes) 2. CSR_Number (2 bytes) (CSR_Number <= 0xfff means counter is a RISC-V CSR) (CSR_Number > 0xfff means counter is a SBI implementation counter) (E.g. CSR_Number == 0xC02 imply HPMCOUNTER2 CSR) 3. CSR_Width (2 bytes) (Number of CSR bits implemented in HW) 4. Event_Count (2 bytes) (Number of events in Event_List array) 5. Event_List (2 * Event_Count bytes) (This is an array of 16bit values where each 16bit value is the supported event_idx.type and event_idx.code combination) What is the size we should allocate for this physical address? In my understanding, we need to allocate the pages in s-mode first, then pass the address of the pages to the second parameter, but we don't know the event_counter before we allocate the space for it, so it might across the boundary if event_count is very big. Theoretically, Event_Count cannot be more than 65535.
I think we should have SBI_PMU_NUM_EVENTS calls which will return number of events supported by given counter_idx. This will help S-mode software to determine amount of memory to allocate for SBI_PMU_COUNTER_DESCRIBE.
Sounds good to me.
3. SBI_PMU_COUNTER_SET_PHYS_ADDR This call takes two parameters: 1) counter_idx 2) physical address It will set the physical address of memory location where the SBI implementation will write the 64bit SOFTWARE counter. This SBI call is only for counters not mapped to any CSR (i.e. only for counters with CSR_Number > 0xfff). 4. SBI_PMU_COUNTER_START This call takes two parameters: 1) counter_idx 2) event_idx It will inform SBI implementation to configure and start/enable specified counter on the calling HART to monitor specific event. This SBI call will fail for counters which are not present and specified event_idx is not supported by the counter. 5. SBI_PMU_COUNTER_STOP This call takes one parameter: 1) counter_idx It will inform SBI implementation to stop/disable specified counters on the calling HART. This SBI call will fail for counters which are not present.
From above, the RISC-V PMU driver will use most of the SBI calls at boot time. Only SBI_PMU_COUNTER_START to be used once before using the counter.
The counter is read either by reading the CSR (for CSR_Number <= 0xfff) or by reading the memory location (for CSR_Number > 0xfff). The counter overflow handling will have to be done in software by Linux kernel.
Using the SBI PMU extension, the M-mode runtime firmware (or Hypervisors) can provide a standardized view of HARDWARE/SOFTWARE counters and events to S-mode (or VS-mode) software.
The M-mode runtime firmware (OpenSBI) will need to know the following platform dependent information:
1. Possible event_idx values allowed (or supported) by a HARDWARE counter (i.e. HPMCOUNTER)
2. Mapping of event_idx for HARDWARE event to HPMEVENT CSR value
3. Mapping of event_idx for HARDWARE CACHE event to HPMEVENT CSR value
4. Mapping of event_idx for HARDWARE RAW event to HPMEVENT CSR value
5. Additional platform-specific programming required by any event_idx
All platform dependent information mentioned above can be obtained by M-mode runtime firmware (OpenSBI) from platform specific code. The DT/ACPI can also be used to describe 1), 2), 3), and 4) mentioned above but 5) will always require platform specific code.

I would update the next version of DT file to describe the points from 1) to 4). Thanks. As you mentioned before, it would be hard to sync the platform specific code with the DT actually in use. I prefer to get 1), 2), 3) and 4) from DT first on each platform, and use platform specific code if DT is unavailable (the generic platform uses DT, certainly); then we could maximally reduce the inconsistency.

It should be the platform's choice how it wants to describe HARDWARE events and HARDWARE counters. The OpenSBI generic platform will tend to use DT based parsing of HARDWARE events and HARDWARE counters but other platforms can do things differently.
The S-mode software (i.e. Linux) should not get HARDWARE events and HARDWARE counters from DT because DT describes HARDWARE and DT will not include SOFTWARE events and SOFTWARE counters. Also, SOFTWARE events and SOFTWARE counters will change for given platform as OpenSBI continues to improve so it will be hard to keep the DT in sync.
The best thing for S-mode software would be to depend on one method of discovering all counters and supported events which is the SBI_PMU_COUNTER_DESCRIBE call. In other words, no need for platform driver for Linux RISC-V PMU driver instead depend only on sbi_probe_extension() to detect SBI PMU extension.
OK, make sense.
Regards, Anup
-- Regards, Atish
Regards, Anup
|
|
On Wed, 2020-07-08 at 03:04 +0000, Anup Patel wrote: Hi Atish,
-----Original Message----- From: Atish Patra <Atish.Patra@...> Sent: 08 July 2020 00:44 To: zong.li@...; Anup Patel <Anup.Patel@...> Cc: andrew@...; tech-unixplatformspec@...; gfavor@... Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
On Tue, 2020-07-07 at 11:05 +0800, Zong Li wrote:
On Tue, Jul 7, 2020 at 12:21 AM Anup Patel <anup.patel@...> wrote:
-----Original Message----- From: Zong Li <zong.li@...> Sent: 06 July 2020 13:59 To: Anup Patel <Anup.Patel@...> Cc: tech-unixplatformspec@...; Andrew Waterman <andrew@...>; Greg Favor <gfavor@...> Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
On Mon, Jul 6, 2020 at 12:35 AM Anup Patel < anup.patel@...> wrote:
Hi All,
We don't have a dedicated RISC-V PMU extension but we do have HARDWARE performance counters such as CYCLE CSR, INSTRET CSR, and HPMCOUNTER CSRs. A RISC-V implementation can support monitoring various HARDWARE events using limited number of HPMCOUNTER CSRs.
In addition to HARDWARE performance counters, a SBI implementation (e.g. OpenSBI, Xvisor, KVM, etc) can provide SOFTWARE counters for events such as number of RFENCEs, number of IPIs, number of misaligned load/store instructions, number of illegal instructions, etc.
We propose SBI PMU extension which tries to cover CYCLE CSR, INSTRET CSR, HPMCOUNTER CSRs and SOFTWARE counters provided by SBI implementation.
To define SBI PMU extension, we first define counter_idx which is a logical number assigned to a counter and event_idx which is an encoded
Is there more detail about counter_idx? I was wondering:
1. What is the ordering of logical numbers for HW and SW counters? I think that the logical numbers are assigned by OpenSBI.
As mentioned here, counter_idx is a logical index for all available counters (i.e. HARDWARE and SOFTWARE). The SBI implementation (i.e. OpenSBI, Xvisor RISC-V, or KVM RISC-V) can assign counter_idx to HARDWARE and SOFTWARE counters in any order it likes.
2. How do we know the counter_idx of each HW and SW counter from S-mode? I guess that we need to know the logical numbers of all counters before we invoke an SBI call.
The SBI_PMU_COUNTER_DESCRIBE call mentioned below will tell us whether a given counter_idx maps to a HARDWARE counter or a SOFTWARE counter, based on the CSR_Number info returned by the SBI_PMU_COUNTER_DESCRIBE call.
OK, I assume counter_idx is sequential and starts from zero here, so during initialization of S-mode software, we could first get the total number 'N' of counters with SBI_PMU_NUM_COUNTERS, then loop N times to identify the capability of each counter. Does that align with your idea?
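The discovery loop described above could look roughly like the following sketch. The two `mock_sbi_pmu_*` functions and the `mock_csr_numbers` table are hypothetical stand-ins for the proposed SBI_PMU_NUM_COUNTERS and SBI_PMU_COUNTER_DESCRIBE calls (no real SBI wrapper is used here), and the describe result is reduced to the CSR_Number field only.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced, hypothetical view of what a describe call returns. */
struct pmu_counter_info {
    uint16_t csr_number;
};

/* Mock firmware state: three CSR-backed counters and one SBI counter. */
static const uint16_t mock_csr_numbers[] = { 0xC00, 0xC02, 0xC03, 0x1000 };

/* Stand-in for SBI_PMU_NUM_COUNTERS (assumption, not a real API). */
static unsigned long mock_sbi_pmu_num_counters(void)
{
    return sizeof(mock_csr_numbers) / sizeof(mock_csr_numbers[0]);
}

/* Stand-in for SBI_PMU_COUNTER_DESCRIBE (assumption, not a real API). */
static int mock_sbi_pmu_counter_describe(unsigned long counter_idx,
                                         struct pmu_counter_info *out)
{
    if (counter_idx >= mock_sbi_pmu_num_counters())
        return -1;
    out->csr_number = mock_csr_numbers[counter_idx];
    return 0;
}

struct pmu_summary {
    unsigned long hw; /* counters backed by a RISC-V CSR */
    unsigned long sw; /* counters provided by the SBI implementation */
};

/* Boot-time pass: query N, then describe counter_idx 0..N-1 once each. */
static struct pmu_summary pmu_discover(void)
{
    struct pmu_summary s = { 0, 0 };
    unsigned long n = mock_sbi_pmu_num_counters();

    for (unsigned long idx = 0; idx < n; idx++) {
        struct pmu_counter_info info;

        if (mock_sbi_pmu_counter_describe(idx, &info) != 0)
            continue;
        if (info.csr_number <= 0xfff)
            s.hw++;
        else
            s.sw++;
    }
    return s;
}
```

With the mock table above, the loop classifies three counters as HARDWARE and one as SOFTWARE using only the CSR_Number threshold from the proposal.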
That's my understanding as well. Assigning continuous counter_idx may put a restriction on M-mode implementation. How about assigning some ranges for software vs hardware counters? Maybe split the hardware into different ranges as well based on event_idx.type.
There is no restriction on M-mode runtime firmware in assigning counter_idx to various HARDWARE and SOFTWARE counters. In fact, counter_idx being a logical index helps M-mode software to implement a registration mechanism.
I had done that initially but it will only increase SBI calls because we will need separate SBI calls to determine number of HARDWARE and SOFTWARE counters.
I was suggesting to have fixed ranges for both event types.
Also, this makes things difficult if a RISC-V implementation has a non-standard implementation specific CSR as HARDWARE counter.
But I agree that it gets tricky with non-standard implementation specific counters.
This also allows the supervisor to know what type of counter it is looking at without parsing the data written by the describe call.
There is no real advantage of knowing type of counter from counter_idx over CSR_Number returned by SBI_PMU_COUNTER_DESCRIBE call because the SBI_PMU_COUNTER_DESCRIBE call will be called only at boot-time once for each counter and S-mode software can mark counters as HARDWARE/SOFTWARE at boot-time based on CSR_Number returned by SBI_PMU_COUNTER_DESCRIBE call.
My concern is that it may increase the boot time. For example, my current x86 desktop has 1679 counters. If a RISC-V desktop has that many counters (hopefully one day!! :)), there will be ~2k SBI calls and memory reads just to get perf working. I guess there will be even more counters in servers. Moreover, the supervisor OS may choose to configure only a few basic perf counters at boot time and defer configuring everything else until later, depending on the use case. Having a continuous logical counter_idx may prevent those kinds of optimizations. Correct?
number representing the HARDWARE/SOFTWARE event to be monitored.
The SBI PMU event_idx is a XLEN bits wide number encoded as follows:
    event_idx[XLEN-1:16] = info
    event_idx[15:12] = type
    event_idx[11:0] = code
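As a minimal sketch of the bit layout above, the helpers below pack and unpack an event_idx. The macro and function names are hypothetical (the proposal defines only the field positions and type values), and `unsigned long` is assumed to stand in for an XLEN-wide value.

```c
/* Field positions from the proposal:
 * info = [XLEN-1:16], type = [15:12], code = [11:0]. */
#define SBI_PMU_EVENT_CODE_MASK   0xfffUL
#define SBI_PMU_EVENT_TYPE_SHIFT  12
#define SBI_PMU_EVENT_TYPE_MASK   0xfUL
#define SBI_PMU_EVENT_INFO_SHIFT  16

/* Event type values named in the proposal. */
#define SBI_PMU_EVENT_TYPE_HW        0x0UL
#define SBI_PMU_EVENT_TYPE_HW_CACHE  0x1UL
#define SBI_PMU_EVENT_TYPE_HW_RAW    0x2UL
#define SBI_PMU_EVENT_TYPE_SW        0xfUL

/* Build an event_idx from its three fields (hypothetical helper). */
static inline unsigned long sbi_pmu_event_idx(unsigned long info,
                                              unsigned long type,
                                              unsigned long code)
{
    return (info << SBI_PMU_EVENT_INFO_SHIFT) |
           ((type & SBI_PMU_EVENT_TYPE_MASK) << SBI_PMU_EVENT_TYPE_SHIFT) |
           (code & SBI_PMU_EVENT_CODE_MASK);
}

/* Extract event_idx.type, bits [15:12]. */
static inline unsigned long sbi_pmu_event_type(unsigned long event_idx)
{
    return (event_idx >> SBI_PMU_EVENT_TYPE_SHIFT) & SBI_PMU_EVENT_TYPE_MASK;
}

/* Extract event_idx.code, bits [11:0]. */
static inline unsigned long sbi_pmu_event_code(unsigned long event_idx)
{
    return event_idx & SBI_PMU_EVENT_CODE_MASK;
}

/* Extract event_idx.info, bits [XLEN-1:16]. */
static inline unsigned long sbi_pmu_event_info(unsigned long event_idx)
{
    return event_idx >> SBI_PMU_EVENT_INFO_SHIFT;
}
```

For example, a SOFTWARE event (type 0xf) with code 2 and info 0 encodes to 0xf002 under this layout.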
If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE event, the event_idx.info is optional and can be passed zero whereas the event_idx.code can be one of the following values:
    enum sbi_pmu_hw_id {
        SBI_PMU_HW_CPU_CYCLES = 0,
        SBI_PMU_HW_INSTRUCTIONS = 1,
        SBI_PMU_HW_CACHE_REFERENCES = 2,
        SBI_PMU_HW_CACHE_MISSES = 3,
        SBI_PMU_HW_BRANCH_INSTRUCTIONS = 4,
        SBI_PMU_HW_BRANCH_MISSES = 5,
        SBI_PMU_HW_BUS_CYCLES = 6,
        SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
        SBI_PMU_HW_STALLED_CYCLES_BACKEND = 8,
        SBI_PMU_HW_REF_CPU_CYCLES = 9,
        SBI_PMU_HW_MAX, /* non-ABI */
    };
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE CACHE event, the event_idx.info is optional and can be passed zero whereas the event_idx.code is encoded as follows:
    event_idx.code[11:3] = cache_id
    event_idx.code[2:1] = op_id
    event_idx.code[0:0] = result_id
    enum sbi_pmu_hw_cache_id {
        SBI_PMU_HW_CACHE_L1D = 0,
        SBI_PMU_HW_CACHE_L1I = 1,
        SBI_PMU_HW_CACHE_LL = 2,
        SBI_PMU_HW_CACHE_DTLB = 3,
        SBI_PMU_HW_CACHE_ITLB = 4,
        SBI_PMU_HW_CACHE_BPU = 5,
        SBI_PMU_HW_CACHE_NODE = 6,
        SBI_PMU_HW_CACHE_MAX, /* non-ABI */
    };
    enum sbi_pmu_hw_cache_op_id {
        SBI_PMU_HW_CACHE_OP_READ = 0,
        SBI_PMU_HW_CACHE_OP_WRITE = 1,
        SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
        SBI_PMU_HW_CACHE_OP_MAX, /* non-ABI */
    };
    enum sbi_pmu_hw_cache_op_result_id {
        SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
        SBI_PMU_HW_CACHE_RESULT_MISS = 1,
        SBI_PMU_HW_CACHE_RESULT_MAX, /* non-ABI */
    };
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
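To illustrate the HARDWARE CACHE encoding above, here is a small sketch that builds event_idx.code from cache_id, op_id and result_id, and then a full event_idx with type 0x1 and info 0. The function names are hypothetical; the numeric arguments in the usage note come from the enums in the proposal.

```c
#define SBI_PMU_EVENT_TYPE_HW_CACHE 0x1UL

/* code[11:3] = cache_id, code[2:1] = op_id, code[0:0] = result_id. */
static inline unsigned long sbi_pmu_hw_cache_code(unsigned long cache_id,
                                                  unsigned long op_id,
                                                  unsigned long result_id)
{
    return ((cache_id & 0x1ffUL) << 3) |
           ((op_id & 0x3UL) << 1) |
           (result_id & 0x1UL);
}

/* Full event_idx for a cache event, with event_idx.info left as zero. */
static inline unsigned long sbi_pmu_hw_cache_event_idx(unsigned long cache_id,
                                                       unsigned long op_id,
                                                       unsigned long result_id)
{
    return (SBI_PMU_EVENT_TYPE_HW_CACHE << 12) |
           sbi_pmu_hw_cache_code(cache_id, op_id, result_id);
}
```

For instance, a DTLB (3) write (1) miss (1) gives code 0x1b and event_idx 0x101b.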
If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE RAW event, both event_idx.info and event_idx.code are platform dependent.
If event_idx.type == 0xf then it is SOFTWARE event. For SOFTWARE event, event_idx.info is SBI implementation specific and event_idx.code can be one of the following:
    enum sbi_pmu_sw_id {
        SBI_PMU_SW_MISALIGNED_LOAD = 0,
        SBI_PMU_SW_MISALIGNED_STORE = 1,
        SBI_PMU_SW_ILLEGAL_INSN = 2,
        SBI_PMU_SW_LOCAL_SET_TIMER = 3,
        SBI_PMU_SW_LOCAL_IPI = 4,
        SBI_PMU_SW_LOCAL_FENCE_I = 5,
        SBI_PMU_SW_LOCAL_SFENCE_VMA = 6,
        SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID = 7,
        SBI_PMU_SW_LOCAL_HFENCE_GVMA = 8,
        SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
        SBI_PMU_SW_LOCAL_HFENCE_VVMA = 10,
        SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
        SBI_PMU_SW_MAX, /* non-ABI */
    };
In future, more events can be defined without breaking ABI compatibility of SBI calls.
Using definition of counter_idx and event_idx, we can potentially have the following SBI calls:
1. SBI_PMU_NUM_COUNTERS
This call will return the number of COUNTERs
Is it for the SW counters only, with the number of HW counters coming from DT? Or does it return the number of both HW and SW counters? If so, how do we distinguish the number of HW and SW counters?
This call returns total number of counters (i.e. HARDWARE and SOFTWARE both)
The other question is whether the number of SW counters is defined by the core of OpenSBI or is platform-dependent.
The number of SW counters is defined by the SBI implementation (i.e. OpenSBI, Xvisor RISC-V, and KVM RISC-V). Most likely SW counters will not include any platform-dependent SW counters, although this is a design choice of the SBI implementation.
OK, I got it. It would be enough, thanks.
2. SBI_PMU_COUNTER_DESCRIBE
This call takes two parameters: 1) counter_idx 2) physical address
It will write the description of SBI PMU counter at specified physical address. The details of the SBI PMU counter written at specified physical address are as follows:
    1. Name (64 bytes)
    2. CSR_Number (2 bytes)
       (CSR_Number <= 0xfff means counter is a RISC-V CSR)
       (CSR_Number > 0xfff means counter is a SBI implementation counter)
       (E.g. CSR_Number == 0xC02 implies INSTRET CSR)
    3. CSR_Width (2 bytes)
       (Number of CSR bits implemented in HW)
    4. Event_Count (2 bytes)
       (Number of events in Event_List array)
    5. Event_List (2 * Event_Count bytes)
       (This is an array of 16bit values where each 16bit value is the supported event_idx.type and event_idx.code combination)
What is the size we should allocate for this physical address? In my understanding, we need to allocate the pages in S-mode first, then pass the address of the pages as the second parameter, but we don't know Event_Count before we allocate the space for it, so the description might cross the page boundary if Event_Count is very big.
Theoretically, Event_Count cannot be more than 65535.
I think we should have an SBI_PMU_NUM_EVENTS call which will return the number of events supported by a given counter_idx. This will help S-mode software to determine the amount of memory to allocate for SBI_PMU_COUNTER_DESCRIBE.
Sounds good to me.
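The sizing question above can be made concrete with a short sketch. It computes the buffer size for a given Event_Count from the field sizes listed in the proposal, and parses the fixed-size part of a description. The struct and function names are hypothetical, and little-endian byte order with no padding between fields is an assumption, since the proposal does not specify either.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SBI_PMU_DESC_NAME_LEN   64
/* Name + CSR_Number + CSR_Width + Event_Count. */
#define SBI_PMU_DESC_FIXED_LEN  (SBI_PMU_DESC_NAME_LEN + 2 + 2 + 2)

/* Hypothetical in-memory view of the fixed-size fields. */
struct sbi_pmu_desc_header {
    char name[SBI_PMU_DESC_NAME_LEN + 1]; /* extra byte for a terminator */
    uint16_t csr_number;
    uint16_t csr_width;
    uint16_t event_count;
};

/* Bytes needed for a description holding event_count 16-bit entries. */
static size_t sbi_pmu_desc_size(uint16_t event_count)
{
    return SBI_PMU_DESC_FIXED_LEN + 2u * (size_t)event_count;
}

/* Assumed little-endian 16-bit read. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is too small for its contents. */
static int sbi_pmu_desc_parse_header(const uint8_t *buf, size_t len,
                                     struct sbi_pmu_desc_header *out)
{
    if (len < SBI_PMU_DESC_FIXED_LEN)
        return -1;
    memcpy(out->name, buf, SBI_PMU_DESC_NAME_LEN);
    out->name[SBI_PMU_DESC_NAME_LEN] = '\0';
    out->csr_number = read_le16(buf + SBI_PMU_DESC_NAME_LEN);
    out->csr_width = read_le16(buf + SBI_PMU_DESC_NAME_LEN + 2);
    out->event_count = read_le16(buf + SBI_PMU_DESC_NAME_LEN + 4);
    if (len < sbi_pmu_desc_size(out->event_count))
        return -1;
    return 0;
}
```

Under these assumptions the worst case (Event_Count of 65535) needs 70 + 131070 = 131140 bytes, which is why knowing the event count up front, for example through the suggested SBI_PMU_NUM_EVENTS call, helps S-mode software size the allocation.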
3. SBI_PMU_COUNTER_SET_PHYS_ADDR
This call takes two parameters: 1) counter_idx 2) physical address
It will set the physical address of memory location where the SBI implementation will write the 64bit SOFTWARE counter. This SBI call is only for counters not mapped to any CSR (i.e. only for counters with CSR_Number > 0xfff).
4. SBI_PMU_COUNTER_START
This call takes two parameters: 1) counter_idx 2) event_idx
It will inform SBI implementation to configure and start/enable specified counter on the calling HART to monitor specific event. This SBI call will fail for counters which are not present or when the specified event_idx is not supported by the counter.
5. SBI_PMU_COUNTER_STOP
This call takes one parameter: 1) counter_idx
It will inform SBI implementation to stop/disable specified counter on the calling HART. This SBI call will fail for counters which are not present.
From above, the RISC-V PMU driver will use most of the SBI calls at boot time. Only SBI_PMU_COUNTER_START needs to be used once before using the counter.
The reading of counter is by reading CSR (for CSR_Number <= 0xfff) OR by reading memory location (for CSR_Number > 0xfff). The counter overflow handling will have to be done in software by Linux kernel.
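A rough sketch of the two points above: choosing the read path from CSR_Number, and handling wrap-around in software using CSR_Width. The helper names are hypothetical, and the delta computation assumes the counter wraps at most once between two samples.

```c
#include <stdbool.h>
#include <stdint.h>

/* CSR_Number <= 0xfff: read the CSR; otherwise read the memory location. */
static inline bool sbi_pmu_counter_is_csr(uint16_t csr_number)
{
    return csr_number <= 0xfff;
}

/* Mask covering the implemented counter bits (CSR_Width). */
static inline uint64_t sbi_pmu_counter_mask(unsigned int width)
{
    if (width >= 64)
        return UINT64_MAX;
    return (UINT64_C(1) << width) - 1;
}

/* Events counted between two samples, tolerating a single wrap-around. */
static inline uint64_t sbi_pmu_counter_delta(uint64_t prev, uint64_t now,
                                             unsigned int width)
{
    return (now - prev) & sbi_pmu_counter_mask(width);
}
```

For a 32-bit wide counter that moved from 0xfffffff0 to 0x10, the masked subtraction yields 0x20 rather than a huge bogus value.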
Using the SBI PMU extension, the M-mode runtime firmware (or Hypervisors) can provide a standardized view of HARDWARE/SOFTWARE counters and events to S-mode (or VS-mode) software.
The M-mode runtime firmware (OpenSBI) will need to know following platform dependent information:
1. Possible event_idx values allowed (or supported) by a HARDWARE counter (i.e. HPMCOUNTER)
2. Mapping of event_idx for HARDWARE event to HPMEVENT CSR value
3. Mapping of event_idx for HARDWARE CACHE event to HPMEVENT CSR value
4. Mapping of event_idx for HARDWARE RAW event to HPMEVENT CSR value
5. Additional platform-specific programming required by any event_idx
All platform dependent information mentioned above can be obtained by M-mode runtime firmware (OpenSBI) from platform specific code. The DT/ACPI can also be used to describe 1), 2), 3), and 4) mentioned above but 5) will always require platform specific code.
I would update the next version of DT file to describe the points from 1) to 4). Thanks. As you mentioned before, it would be hard to sync the platform specific code with the DT in real use. I prefer to get 1), 2), 3) and 4) from DT first on each platform, and use platform specific code if DT is unavailable (the generic platform uses DT, certainly); then we could reduce the inconsistency as much as possible.
It should be the platform's choice how it wants to describe HARDWARE events and HARDWARE counters. The OpenSBI generic platform will tend to use DT based parsing of HARDWARE events and HARDWARE counters but other platforms can do things differently.
The S-mode software (i.e. Linux) should not get HARDWARE events and HARDWARE counters from DT because DT describes HARDWARE and DT will not include SOFTWARE events and SOFTWARE counters. Also, SOFTWARE events and SOFTWARE counters will change for given platform as OpenSBI continues to improve so it will be hard to keep the DT in sync.
The best thing for S-mode software would be to depend on one method of discovering all counters and supported events, which is the SBI_PMU_COUNTER_DESCRIBE call. In other words, the Linux RISC-V PMU driver does not need a platform driver; it can depend only on sbi_probe_extension() to detect the SBI PMU extension.
OK, makes sense.
Regards, Anup
-- Regards, Atish Regards, Anup
-- Regards, Atish
On Wed, Jul 8, 2020 at 2:17 PM Atish Patra <atish.patra@...> wrote:
My concern is that it may increase the boot time. For example, my current x86 desktop has 1679 counters. If a RISC-V desktop has that many counters (hopefully one day!! :)), there will be ~2k SBI calls and memory reads just to get perf working. I guess there will be even more counters in servers.
Moreover, the supervisor OS may choose to configure only a few basic perf counters at boot time and defer configuring everything else until later, depending on the use case. Having a continuous logical counter_idx may prevent those kinds of optimizations. Correct?
Based on the optimization you mentioned, it sounds good to me if we have an SBI call to get the number of HW counters and the number of SW counters separately. If the S-mode OS knows the two numbers, it can lazily assign and query counters no matter whether counter_idx is continuous or not. If counter_idx numbering starts with the HW counters, the first counter_idx of the SW counters is the number of HW counters. I would suggest that SBI_PMU_NUM_COUNTERS take a parameter selecting whether it returns the total number of all counters, the number of SW counters only, or the number of HW counters only.
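The suggestion above might be sketched as follows. Everything here is hypothetical: the selector enum, the mock counter table and the function name are illustrations only, and the sketch assumes the numbering starts with all HW counters followed by all SW counters, which the proposal itself does not require.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical selector for a parameterised SBI_PMU_NUM_COUNTERS. */
enum pmu_counter_class {
    PMU_COUNTERS_ALL,
    PMU_COUNTERS_HW,
    PMU_COUNTERS_SW,
};

/* Mock firmware table: HW counters first, then SW counters (assumption). */
static const uint16_t mock_csr_numbers[] = {
    0xC00, 0xC02, 0xC03, 0x1000, 0x1001
};

/* Stand-in for the suggested call; not a real SBI interface. */
static unsigned long mock_sbi_pmu_num_counters(enum pmu_counter_class cls)
{
    unsigned long hw = 0, sw = 0;
    size_t n = sizeof(mock_csr_numbers) / sizeof(mock_csr_numbers[0]);

    for (size_t i = 0; i < n; i++) {
        if (mock_csr_numbers[i] <= 0xfff)
            hw++;
        else
            sw++;
    }
    switch (cls) {
    case PMU_COUNTERS_HW:
        return hw;
    case PMU_COUNTERS_SW:
        return sw;
    default:
        return hw + sw;
    }
}

/* With HW counters numbered first, SW counters start right after them. */
static unsigned long pmu_first_sw_counter_idx(void)
{
    return mock_sbi_pmu_num_counters(PMU_COUNTERS_HW);
}
```

With the mock table, the first SOFTWARE counter_idx would be 3, so an S-mode OS could defer describing indices 3 and above until a SOFTWARE event is actually requested.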
Anup Patel
-----Original Message----- From: Atish Patra <Atish.Patra@...> Sent: 08 July 2020 11:47 To: Anup Patel <Anup.Patel@...>; zong.li@... Cc: andrew@...; tech-unixplatformspec@...; gfavor@... Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
My concern is that it may increase the boot time. For example, my current x86 desktop has 1679 counters. If a RISC-V desktop has that many counters (hopefully one day!! :)), there will be ~2k SBI calls and memory reads just to get perf working. I guess there will be even more counters in servers.
Please look again. Not all of these counters are CPU counters. In real world systems, we have lots of counters in interconnect, IOMMU, and other peripherals. In fact, MMIO peripherals can also provide perf counters.
We have only 29 HARDWARE counter CSRs (i.e. HPMCOUNTER CSRs) so the number of HARDWARE counters is unlikely to go beyond 29. Also, it is fairly unlikely that a RISC-V system will have implementation specific CSRs as additional HARDWARE counters. Similarly, an SBI implementation will only provide a few SOFTWARE counters but the number of SW events will be quite big. Worst case, I think a RISC-V system will have 60+ counters (both HARDWARE and SOFTWARE) but the number of HARDWARE and SOFTWARE events will be much larger. In future, the number of counters might not change but the number of HARDWARE and SOFTWARE events will definitely grow.
I don't see how doing one SBI_PMU_COUNTER_DESCRIBE call for each counter will increase boot time. Even if we have reserved ranges of counter_idx for HARDWARE and SOFTWARE counters, we still cannot avoid doing an SBI_PMU_COUNTER_DESCRIBE call for each counter because each counter will only support a defined set of event_idx values. For example, INSTRET CSR only supports counting instructions while CYCLE CSR only supports counting machine cycles.
Moreover, the supervisor OS may choose to configure only a few basic perf counters at boot time and defer configuring everything else until later, depending on the use case. Having a continuous logical counter_idx may prevent those kinds of optimizations. Correct?
I don't see how a continuous logical counter_idx prevents optimizations. Irrespective of the counter_idx numbering scheme, we cannot avoid one SBI_PMU_COUNTER_DESCRIBE call per counter to know the events supported by that counter.
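The point that each counter supports only a defined set of events can be sketched as a lookup in the Event_List returned by the describe call. The function name and the two example lists are hypothetical, and the sketch assumes each 16-bit Event_List entry uses the same bit positions as the low 16 bits of event_idx (type in [15:12], code in [11:0]).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the counter's Event_List contains the type/code of event_idx. */
static bool sbi_pmu_counter_supports(const uint16_t *event_list,
                                     size_t event_count,
                                     unsigned long event_idx)
{
    /* Drop event_idx.info; only type and code are listed per counter. */
    uint16_t key = (uint16_t)(event_idx & 0xffffUL);

    for (size_t i = 0; i < event_count; i++) {
        if (event_list[i] == key)
            return true;
    }
    return false;
}

/* Illustrative Event_List contents, not taken from any real platform. */
static const uint16_t example_cycle_events[] = { 0x0000 };   /* HW, CPU_CYCLES */
static const uint16_t example_instret_events[] = { 0x0001 }; /* HW, INSTRUCTIONS */
```

A driver would run a check like this before calling SBI_PMU_COUNTER_START, so that a request to count instructions is never sent to a counter that only counts cycles.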
number representing the HARDWARE/SOFTWARE event to be
monitored.
The SBI PMU event_idx is a XLEN bits wide number encoded as follows: event_idx[XLEN-1:16] = info event_idx[15:12] = type event_idx[11:0] = code
If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE event, the event_idx.info is optional and can be passed zero whereas the event_idx.code can be one of the following values: enum sbi_pmu_hw_id { SBI_PMU_HW_CPU_CYCLES = 0, SBI_PMU_HW_INSTRUCTIONS = 1, SBI_PMU_HW_CACHE_REFERENCES = 2, SBI_PMU_HW_CACHE_MISSES = 3, SBI_PMU_HW_BRANCH_INSTRUCTIONS = 4, SBI_PMU_HW_BRANCH_MISSES = 5, SBI_PMU_HW_BUS_CYCLES = 6, SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7, SBI_PMU_HW_STALLED_CYCLES_BACKEND = 8, SBI_PMU_HW_REF_CPU_CYCLES = 9, SBI_PMU_HW_MAX, /* non-ABI */ }; (NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE CACHE event, the event_idx.info is optional and can be passed zero whereas the event_idx.code is encoded as follows:
    event_idx.code[11:3] = cache_id
    event_idx.code[2:1] = op_id
    event_idx.code[0:0] = result_id
enum sbi_pmu_hw_cache_id {
    SBI_PMU_HW_CACHE_L1D = 0,
    SBI_PMU_HW_CACHE_L1I = 1,
    SBI_PMU_HW_CACHE_LL = 2,
    SBI_PMU_HW_CACHE_DTLB = 3,
    SBI_PMU_HW_CACHE_ITLB = 4,
    SBI_PMU_HW_CACHE_BPU = 5,
    SBI_PMU_HW_CACHE_NODE = 6,
    SBI_PMU_HW_CACHE_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_id {
    SBI_PMU_HW_CACHE_OP_READ = 0,
    SBI_PMU_HW_CACHE_OP_WRITE = 1,
    SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
    SBI_PMU_HW_CACHE_OP_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_result_id {
    SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
    SBI_PMU_HW_CACHE_RESULT_MISS = 1,
    SBI_PMU_HW_CACHE_RESULT_MAX, /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
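As an illustration of the HARDWARE CACHE encoding, the 12-bit code could be assembled as in the sketch below. The helper name is hypothetical; only the bit positions come from the proposal.

```c
/* Sketch only: the helper name is hypothetical.
 * code[11:3] = cache_id, code[2:1] = op_id, code[0:0] = result_id */
static inline unsigned int sbi_pmu_hw_cache_code(unsigned int cache_id,
                                                 unsigned int op_id,
                                                 unsigned int result_id)
{
    return ((cache_id & 0x1ffu) << 3) |  /* 9 bits of cache_id  */
           ((op_id & 0x3u) << 1) |       /* 2 bits of op_id     */
           (result_id & 0x1u);           /* 1 bit of result_id  */
}
```

For example, a DTLB (3) write (1) miss (1) gives code 0x1b, so the full event_idx with type 0x1 and info 0 would be 0x101b.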
If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE RAW event, both event_idx.info and event_idx.code are platform dependent.
If event_idx.type == 0xf then it is SOFTWARE event. For SOFTWARE event, event_idx.info is SBI implementation specific and event_idx.code can be one of the following:
enum sbi_pmu_sw_id {
    SBI_PMU_SW_MISALIGNED_LOAD = 0,
    SBI_PMU_SW_MISALIGNED_STORE = 1,
    SBI_PMU_SW_ILLEGAL_INSN = 2,
    SBI_PMU_SW_LOCAL_SET_TIMER = 3,
    SBI_PMU_SW_LOCAL_IPI = 4,
    SBI_PMU_SW_LOCAL_FENCE_I = 5,
    SBI_PMU_SW_LOCAL_SFENCE_VMA = 6,
    SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID = 7,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA = 8,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA = 10,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
    SBI_PMU_SW_MAX, /* non-ABI */
};
In future, more events can be defined without breaking ABI compatibility of SBI calls.
Using definition of counter_idx and event_idx, we can potentially have the following SBI calls:
1. SBI_PMU_NUM_COUNTERS
This call will return the number of COUNTERs.
Is it for the SW counters only, with the number of HW counters coming from DT? Or does it return the number of both HW and SW counters? If so, how do we distinguish the number of HW counters from the number of SW counters?
This call returns the total number of counters (i.e. both HARDWARE and SOFTWARE).
The other question is whether the number of SW counters is defined by the core of OpenSBI or is platform-dependent?
The number of SW counters is defined by the SBI implementation (i.e. OpenSBI, Xvisor RISC-V, and KVM RISC-V). Most likely SW counters will not include any platform-dependent SW counters, although this is a design choice of the SBI implementation.
OK, I got it. That is enough, thanks.
2. SBI_PMU_COUNTER_DESCRIBE
This call takes two parameters: 1) counter_idx 2) physical address
It will write the description of the SBI PMU counter at the specified physical address. The details written at that address are as follows:
1. Name (64 bytes)
2. CSR_Number (2 bytes) (CSR_Number <= 0xfff means the counter is a RISC-V CSR) (CSR_Number > 0xfff means the counter is an SBI implementation counter) (E.g. CSR_Number == 0xC03 implies HPMCOUNTER3 CSR)
3. CSR_Width (2 bytes) (Number of CSR bits implemented in HW)
4. Event_Count (2 bytes) (Number of events in Event_List array)
5. Event_List (2 * Event_Count bytes) (This is an array of 16-bit values where each 16-bit value is a supported event_idx.type and event_idx.code combination)
What size should we allocate for this physical address? In my understanding, we need to allocate the pages in S-mode first, then pass the address of the pages as the second parameter, but we don't know Event_Count before we allocate the space, so the description might overrun the allocated space if Event_Count is very big.
Theoretically, Event_Count cannot be more than 65535.
I think we should have an SBI_PMU_NUM_EVENTS call which will return the number of events supported by the given counter_idx. This will help S-mode software determine the amount of memory to allocate for SBI_PMU_COUNTER_DESCRIBE.
Sounds good to me.
3. SBI_PMU_COUNTER_SET_PHYS_ADDR
This call takes two parameters: 1) counter_idx 2) physical address
It will set the physical address of the memory location where the SBI implementation will write the 64-bit SOFTWARE counter. This SBI call is only for counters not mapped to any CSR (i.e. only for counters with CSR_Number > 0xfff).
4. SBI_PMU_COUNTER_START
This call takes two parameters: 1) counter_idx 2) event_idx
It will inform the SBI implementation to configure and start/enable the specified counter on the calling HART to monitor the specified event. This SBI call will fail for counters which are not present or when the specified event_idx is not supported by the counter.
5. SBI_PMU_COUNTER_STOP
This call takes one parameter: 1) counter_idx
It will inform the SBI implementation to stop/disable the specified counter on the calling HART. This SBI call will fail for counters which are not present.
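To make the SBI_PMU_COUNTER_DESCRIBE layout and the buffer sizing question concrete, here is a rough C sketch. The struct and function names are hypothetical, and two things are assumed that the proposal does not state: the fields are stored back to back in native byte order, and each Event_List entry has the same layout as bits [15:0] of event_idx. The event count passed to the size helper would come from something like the suggested SBI_PMU_NUM_EVENTS call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-size part of the description: Name, CSR_Number, CSR_Width,
 * Event_Count. Names, packing and byte order are assumptions. */
struct sbi_pmu_counter_desc_hdr {
    char     name[64];
    uint16_t csr_number;
    uint16_t csr_width;
    uint16_t event_count;
};

/* Bytes S-mode would need to allocate for a counter with num_events
 * entries in Event_List (2 bytes per entry). */
static inline size_t sbi_pmu_counter_desc_size(size_t num_events)
{
    return sizeof(struct sbi_pmu_counter_desc_hdr) + 2 * num_events;
}

/* Check whether a counter's Event_List contains the type/code pair of
 * event_idx, so that a failing SBI_PMU_COUNTER_START can be avoided.
 * The info field is ignored because Event_List entries are 16 bits. */
static inline bool sbi_pmu_counter_supports(const uint16_t *event_list,
                                            size_t event_count,
                                            unsigned long event_idx)
{
    uint16_t key = (uint16_t)(event_idx & 0xffff);
    size_t i;

    for (i = 0; i < event_count; i++) {
        if (event_list[i] == key)
            return true;
    }
    return false;
}
```

With the 65535 upper bound on Event_Count, the largest possible description under these assumptions is 70 + 2 * 65535 = 131140 bytes.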
From above, the RISC-V PMU driver will use most of the SBI calls at boot time. Only SBI_PMU_COUNTER_START needs to be called once before using the counter.
The reading of a counter is by reading the CSR (for CSR_Number <= 0xfff) OR by reading the memory location (for CSR_Number > 0xfff). The counter overflow handling will have to be done in software by the Linux kernel.
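One possible shape for the software overflow handling is sketched below. It follows the SBI_PMU_COUNTER_DESCRIBE definition (CSR_Number <= 0xfff means the counter is a CSR) and uses CSR_Width to compute a wrap-aware delta between two reads. The function names are hypothetical, and the sketch assumes the counter wraps at most once between reads.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the counter is read from a RISC-V CSR, false if it is an
 * SBI implementation counter that is read from memory. */
static inline bool sbi_pmu_counter_is_csr(uint16_t csr_number)
{
    return csr_number <= 0xfff;
}

/* Events counted between two reads of a counter that implements only
 * 'width' bits. Assumes at most one wrap-around between the reads. */
static inline uint64_t sbi_pmu_counter_delta(uint64_t prev, uint64_t now,
                                             unsigned int width)
{
    uint64_t mask = (width >= 64) ? UINT64_MAX
                                  : ((UINT64_C(1) << width) - 1);

    return (now - prev) & mask;
}
```

For a 32-bit counter that moved from 0xfffffff0 to 0x10, the delta is 0x20 even though the raw value went down.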
Using the SBI PMU extension, the M-mode runtime firmware (or Hypervisors) can provide a standardized view of HARDWARE/SOFTWARE counters and events to S-mode (or VS-mode) software.
The M-mode runtime firmware (OpenSBI) will need to know the following platform-dependent information:
1. Possible event_idx values allowed (or supported) by a HARDWARE counter (i.e. HPMCOUNTER)
2. Mapping of event_idx for HARDWARE event to HPMEVENT CSR value
3. Mapping of event_idx for HARDWARE CACHE event to HPMEVENT CSR value
4. Mapping of event_idx for HARDWARE RAW event to HPMEVENT CSR value
5. Additional platform-specific programming required by any event_idx
All platform-dependent information mentioned above can be obtained by M-mode runtime firmware (OpenSBI) from platform-specific code. The DT/ACPI can also be used to describe 1), 2), 3), and 4) mentioned above but 5) will always require platform-specific code.
I would update the next version of the DT file to describe points 1) to 4). Thanks. As you mentioned before, it would be hard to keep the platform-specific code in sync with the DT in real use. I prefer to get 1), 2), 3) and 4) from DT first on each platform, and use platform-specific code if DT is unavailable (the generic platform certainly uses DT); then we could reduce the inconsistency as much as possible.
It should be the platform's choice how it wants to describe HARDWARE events and HARDWARE counters. The OpenSBI generic platform will tend to use DT-based parsing of HARDWARE events and HARDWARE counters but other platforms can do things differently.
The S-mode software (i.e. Linux) should not get HARDWARE events and HARDWARE counters from DT because DT describes HARDWARE and DT will not include SOFTWARE events and SOFTWARE counters. Also, SOFTWARE events and SOFTWARE counters will change for given platform as OpenSBI continues to improve so it will be hard to keep the DT in sync.
The best thing for S-mode software would be to depend on one method of discovering all counters and supported events, which is the SBI_PMU_COUNTER_DESCRIBE call. In other words, the Linux RISC-V PMU driver does not need a platform driver; it can depend only on sbi_probe_extension() to detect the SBI PMU extension.
OK, make sense.
Regards, Anup

Anup Patel
-----Original Message----- From: Zong Li <zong.li@...> Sent: 08 July 2020 12:21 To: Atish Patra <Atish.Patra@...> Cc: Anup Patel <Anup.Patel@...>; andrew@...; tech- unixplatformspec@...; gfavor@... Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
On Wed, Jul 8, 2020 at 2:17 PM Atish Patra <atish.patra@...> wrote:
On Wed, 2020-07-08 at 03:04 +0000, Anup Patel wrote:
Hi Atish,
-----Original Message----- From: Atish Patra <Atish.Patra@...> Sent: 08 July 2020 00:44 To: zong.li@...; Anup Patel <Anup.Patel@...> Cc: andrew@...; tech-unixplatformspec@...; gfavor@... Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
On Tue, 2020-07-07 at 11:05 +0800, Zong Li wrote:
On Tue, Jul 7, 2020 at 12:21 AM Anup Patel <anup.patel@...> wrote:
-----Original Message----- From: Zong Li <zong.li@...> Sent: 06 July 2020 13:59 To: Anup Patel <Anup.Patel@...> Cc: tech-unixplatformspec@...; Andrew Waterman <andrew@...>; Greg Favor <gfavor@...> Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
On Mon, Jul 6, 2020 at 12:35 AM Anup Patel < anup.patel@...> wrote:
Hi All,
We don't have a dedicated RISC-V PMU extension but we do have HARDWARE performance counters such as CYCLE CSR, INSTRET CSR, and HPMCOUNTER CSRs. A RISC-V implementation can support monitoring various HARDWARE events using limited number of HPMCOUNTER CSRs.
In addition to HARDWARE performance counters, a SBI implementation (e.g. OpenSBI, Xvisor, KVM, etc) can provide SOFTWARE counters for events such as number of RFENCEs, number of IPIs, number of misaligned load/store instructions, number of illegal instructions, etc.
We propose SBI PMU extension which tries to cover CYCLE CSR, INSTRET CSR, HPMCOUNTER CSRs and SOFTWARE counters provided by SBI implementation.
To define SBI PMU extension, we first define counter_idx which is a logical number assigned to a counter and event_idx which is an encoded
Is there more detail about counter_idx? I was wondering:
1. What is the ordering of logical numbers for HW and SW counters? I think that the logical numbers are assigned by OpenSBI.
As mentioned here, counter_idx is a logical index for all available counters (i.e. HARDWARE and SOFTWARE). The SBI implementation (i.e. OpenSBI, Xvisor RISC-V, or KVM RISC-V) can assign counter_idx to HARDWARE and SOFTWARE counters in any order it likes.
2. How do we know the logical counter_idx of each HW and SW counter from S-mode? I guess that we need to know the logical numbers of all counters before we invoke an SBI call.
The SBI_PMU_COUNTER_DESCRIBE call mentioned below will tell us whether a given counter_idx maps to a HARDWARE counter or a SOFTWARE counter, based on the CSR_Number info it returns.
OK, I assume the logical counter_idx is sequential and starts from zero here, so during initialization of S-mode software, we could get the total number 'N' of counters from SBI_PMU_NUM_COUNTERS first, then loop N times to identify the capability of each counter. Does that align with your idea?
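The boot-time loop described here might look roughly like the sketch below. Since the SBI calls are only proposed, they are modelled as caller-supplied function pointers; every name is hypothetical, the describe buffer is passed as a plain pointer instead of a physical address, and CSR_Number is assumed to sit right after the 64-byte Name in native byte order. A small fake firmware is included so the loop can be tried on a host machine.

```c
#include <stdint.h>
#include <string.h>

#define DESC_CSR_NUMBER_OFFSET 64  /* CSR_Number follows the 64-byte Name */
#define DESC_HDR_SIZE          70  /* Name + CSR_Number + CSR_Width + Event_Count */

/* Hypothetical wrappers around the proposed SBI calls. */
struct sbi_pmu_ops {
    long (*num_counters)(void);
    long (*counter_describe)(unsigned long counter_idx, void *buf);
};

struct pmu_counter_summary {
    unsigned long hw;  /* counters backed by a CSR  */
    unsigned long sw;  /* SBI implementation counters */
};

/* Query N once, then describe counter_idx 0..N-1 and classify each
 * counter by its CSR_Number. Returns 0 on success, -1 on failure. */
static int pmu_discover(const struct sbi_pmu_ops *ops, void *buf,
                        struct pmu_counter_summary *out)
{
    long n = ops->num_counters();
    uint16_t csr_number;
    long i;

    if (n < 0)
        return -1;
    out->hw = 0;
    out->sw = 0;
    for (i = 0; i < n; i++) {
        if (ops->counter_describe((unsigned long)i, buf) != 0)
            return -1;
        memcpy(&csr_number,
               (const unsigned char *)buf + DESC_CSR_NUMBER_OFFSET,
               sizeof(csr_number));
        if (csr_number <= 0xfff)
            out->hw++;
        else
            out->sw++;
    }
    return 0;
}

/* Fake firmware for host-side experiments: two CSR counters, one SW. */
static long fake_num_counters(void)
{
    return 3;
}

static long fake_counter_describe(unsigned long idx, void *buf)
{
    static const uint16_t csr[3] = { 0xC00, 0xC02, 0x1000 };

    if (idx >= 3)
        return -1;
    memset(buf, 0, DESC_HDR_SIZE);
    memcpy((unsigned char *)buf + DESC_CSR_NUMBER_OFFSET, &csr[idx],
           sizeof(csr[idx]));
    return 0;
}

static const struct sbi_pmu_ops fake_ops = {
    fake_num_counters,
    fake_counter_describe,
};
```

Calling pmu_discover(&fake_ops, buf, &summary) with a 70-byte buffer reports two HARDWARE counters and one SOFTWARE counter for this fake firmware.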
That's my understanding as well. Assigning continuous counter_idx may put a restriction on M-mode implementation.
There is no restriction on M-mode runtime firmware in assigning counter_idx to various HARDWARE and SOFTWARE counters. In fact, counter_idx being a logical index helps M-mode software to implement a registration mechanism.
How about assigning some ranges for software vs hardware counters? Maybe split the hardware into different ranges as well based on event_idx.type.
I had done that initially but it will only increase SBI calls because we will need separate SBI calls to determine the number of HARDWARE and SOFTWARE counters.
I was suggesting to have fixed ranges for both event types.
Also, this makes things difficult if a RISC-V implementation has a non-standard implementation-specific CSR as a HARDWARE counter.
But I agree that it gets tricky with non-standard implementation specific counters.
This also allows the supervisor to know what type of counter it is looking at without parsing the data written by the describe call.
There is no real advantage in knowing the type of a counter from counter_idx rather than from the CSR_Number returned by the SBI_PMU_COUNTER_DESCRIBE call, because SBI_PMU_COUNTER_DESCRIBE will be called only once per counter at boot time, and S-mode software can mark counters as HARDWARE/SOFTWARE at that point based on the returned CSR_Number.
My concern is that it may increase the booting time. For example, my current x86 desktop has 1679 counters. If a RISC-V desktop has those many counters (hopefully one day!! :)), there will be ~2k SBI calls and memory reads just to get perf working. I guess there will be even more counters in servers.
Moreover, the supervisor OS may choose to configure only a few basic perf counters at boot time and defer configuring everything else until later, depending on the use case. Having a continuous logical counter_idx may prevent those kinds of optimizations. Correct?
Based on the optimization you mentioned, it is good to me if we have an SBI call to get the number of HW and SW counters respectively. If the S-mode OS knows the separate numbers, then it can lazily assign and query counters no matter whether counter_idx is continuous or not. If counter_idx starts with the HW counters, the first counter_idx of the SW counters is the number of HW counters.
As mentioned in the previous reply, any optimization possible using fixed ranges for counter_idx can also be done using logical counter_idx. The biggest problem with fixed ranges for counter_idx is that it will be difficult to describe HARDWARE counters which map to implementation-specific CSRs.
I would suggest that SBI_PMU_NUM_COUNTERS take a parameter to return the total number of all counters, the number of SW counters only, or the number of HW counters only.
This is only required if we go for fixed-range counter_idx numbering.
Regards, Anup
number representing the HARDWARE/SOFTWARE event to be monitored.
The SBI PMU event_idx is a XLEN bits wide number encoded as follows: event_idx[XLEN-1:16] = info event_idx[15:12] = type event_idx[11:0] = code
If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE event, the event_idx.info is optional and can be passed zero whereas the event_idx.code can be one of the following values: enum sbi_pmu_hw_id { SBI_PMU_HW_CPU_CYCLES = 0, SBI_PMU_HW_INSTRUCTIONS = 1, SBI_PMU_HW_CACHE_REFERENCES = 2, SBI_PMU_HW_CACHE_MISSES = 3, SBI_PMU_HW_BRANCH_INSTRUCTIONS = 4, SBI_PMU_HW_BRANCH_MISSES = 5, SBI_PMU_HW_BUS_CYCLES = 6, SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7, SBI_PMU_HW_STALLED_CYCLES_BACKEND = 8, SBI_PMU_HW_REF_CPU_CYCLES = 9, SBI_PMU_HW_MAX, /* non-ABI */ }; (NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE CACHE event, the event_idx.info is optional and can be passed zero whereas the event_idx.code is encoded as follows: event_idx.code[11:3] = cache_id event_idx.code[2:1] = op_id event_idx.code[0:0] = result_id enum sbi_pmu_hw_cache_id { SBI_PMU_HW_CACHE_L1D = 0, SBI_PMU_HW_CACHE_L1I = 1, SBI_PMU_HW_CACHE_LL = 2, SBI_PMU_HW_CACHE_DTLB = 3, SBI_PMU_HW_CACHE_ITLB = 4, SBI_PMU_HW_CACHE_BPU = 5, SBI_PMU_HW_CACHE_NODE = 6, SBI_PMU_HW_CACHE_MAX, /* non-ABI */ }; enum sbi_pmu_hw_cache_op_id { SBI_PMU_HW_CACHE_OP_READ = 0, SBI_PMU_HW_CACHE_OP_WRITE = 1, SBI_PMU_HW_CACHE_OP_PREFETCH = 2, SBI_PMU_HW_CACHE_OP_MAX, /* non-ABI */ }; enum sbi_pmu_hw_cache_op_result_id { SBI_PMU_HW_CACHE_RESULT_ACCESS = 0, SBI_PMU_HW_CACHE_RESULT_MISS = 1, SBI_PMU_HW_CACHE_RESULT_MAX, /* non-ABI */ }; (NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE RAW event, both event_idx.info and event_idx.code are platform dependent.
If event_idx.type == 0xf then it is SOFTWARE event. For SOFTWARE event, event_idx.info is SBI implementation specific and event_idx.code can be one of the following: enum sbi_pmu_sw_id { SBI_PMU_SW_MISALIGNED_LOAD = 0, SBI_PMU_SW_MISALIGNED_STORE = 1, SBI_PMU_SW_ILLEGAL_INSN = 2, SBI_PMU_SW_LOCAL_SET_TIMER = 3, SBI_PMU_SW_LOCAL_IPI = 4, SBI_PMU_SW_LOCAL_FENCE_I = 5, SBI_PMU_SW_LOCAL_SFENCE_VMA = 6, SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID = 7, SBI_PMU_SW_LOCAL_HFENCE_GVMA = 8, SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9, SBI_PMU_SW_LOCAL_HFENCE_VVMA = 10, SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11, SBI_PMU_SW_MAX, /* non-ABI */ };
In future, more events can be defined without breaking ABI compatibility of SBI calls.
Using definition of counter_idx and event_idx, we can potentially have the following SBI calls:
1. SBI_PMU_NUM_COUNTERS This call will return the number of COUNTERs Is it for the SW counters and we get the number of HW counters by DT? Or does it return the number of HW and SW counters both? If so, how to distinguish the number of HW and SW? This call returns total number of counters (i.e. HARDWARE and SOFTWARE both)
The other question is that the number of SW counters is defined by the core of OpenSBI or platform-dependent? Number of SW counters are defined by SBI implementation (i.e. OpenSBI, Xvisor RISC-V, and KVM RISC-V). Most likely SW counters will not include any platform-dependent SW counters although this is design choice of SBI implementation. OK, I got it. It would be enough, thanks.
2. SBI_PMU_COUNTER_DESCRIBE
This call takes two parameters: 1) counter_idx 2) physical address
It will write the description of the SBI PMU counter at the specified physical address. The details written at that address are as follows:
1. Name (64 bytes)
2. CSR_Number (2 bytes) (CSR_Number <= 0xfff means the counter is a RISC-V CSR) (CSR_Number > 0xfff means the counter is an SBI implementation counter) (E.g. CSR_Number == 0xC03 implies HPMCOUNTER3 CSR)
3. CSR_Width (2 bytes) (Number of CSR bits implemented in HW)
4. Event_Count (2 bytes) (Number of events in Event_List array)
5. Event_List (2 * Event_Count bytes) (This is an array of 16-bit values where each 16-bit value is a supported event_idx.type and event_idx.code combination)
What size should we allocate for this physical address? In my understanding, we need to allocate the pages in S-mode first, then pass the address of the pages as the second parameter, but we don't know Event_Count before we allocate the space, so the description might overrun the allocated space if Event_Count is very big.
Theoretically, Event_Count cannot be more than 65535.
I think we should have SBI_PMU_NUM_EVENTS calls which will return number of events supported by given counter_idx. This will help S-mode software to determine amount of memory to allocate for SBI_PMU_COUNTER_DESCRIBE.
Sounds good to me.
3. SBI_PMU_COUNTER_SET_PHYS_ADDR
This call takes two parameters: 1) counter_idx 2) physical address
It will set the physical address of the memory location where the SBI implementation will write the 64-bit SOFTWARE counter. This SBI call is only for counters not mapped to any CSR (i.e. only for counters with CSR_Number > 0xfff).
4. SBI_PMU_COUNTER_START
This call takes two parameters: 1) counter_idx 2) event_idx
It will inform the SBI implementation to configure and start/enable the specified counter on the calling HART to monitor the specified event. This SBI call will fail for counters which are not present or when the specified event_idx is not supported by the counter.
5. SBI_PMU_COUNTER_STOP
This call takes one parameter: 1) counter_idx
It will inform the SBI implementation to stop/disable the specified counter on the calling HART. This SBI call will fail for counters which are not present.
From above, the RISC-V PMU driver will use most of the SBI calls at boot time. Only SBI_PMU_COUNTER_START needs to be called once before using the counter.
The reading of a counter is by reading the CSR (for CSR_Number <= 0xfff) OR by reading the memory location (for CSR_Number > 0xfff). The counter overflow handling will have to be done in software by the Linux kernel.
Using the SBI PMU extension, the M-mode runtime firmware (or Hypervisors) can provide a standardized view of HARDWARE/SOFTWARE counters and events to S-mode (or VS-mode) software.
The M-mode runtime firmware (OpenSBI) will need to know the following platform-dependent information:
1. Possible event_idx values allowed (or supported) by a HARDWARE counter (i.e. HPMCOUNTER)
2. Mapping of event_idx for HARDWARE event to HPMEVENT CSR value
3. Mapping of event_idx for HARDWARE CACHE event to HPMEVENT CSR value
4. Mapping of event_idx for HARDWARE RAW event to HPMEVENT CSR value
5. Additional platform-specific programming required by any event_idx
All platform-dependent information mentioned above can be obtained by M-mode runtime firmware (OpenSBI) from platform-specific code. The DT/ACPI can also be used to describe 1), 2), 3), and 4) mentioned above but 5) will always require platform-specific code.
I would update the next version of the DT file to describe points 1) to 4). Thanks. As you mentioned before, it would be hard to keep the platform-specific code in sync with the DT in real use. I prefer to get 1), 2), 3) and 4) from DT first on each platform, and use platform-specific code if DT is unavailable (the generic platform certainly uses DT); then we could reduce the inconsistency as much as possible.
It should be the platform's choice how it wants to describe HARDWARE events and HARDWARE counters. The OpenSBI generic platform will tend to use DT-based parsing of HARDWARE events and HARDWARE counters but other platforms can do things differently.
The S-mode software (i.e. Linux) should not get HARDWARE events and HARDWARE counters from DT because DT describes HARDWARE and DT will not include SOFTWARE events and SOFTWARE counters. Also, SOFTWARE events and SOFTWARE counters will change for given platform as OpenSBI continues to improve so it will be hard to keep the DT in sync.
The best thing for S-mode software would be to depend on one method of discovering all counters and supported events, which is the SBI_PMU_COUNTER_DESCRIBE call. In other words, the Linux RISC-V PMU driver does not need a platform driver; it can depend only on sbi_probe_extension() to detect the SBI PMU extension.
OK, make sense.
Regards, Anup
On Wed, Jul 8, 2020 at 4:45 PM Anup Patel <Anup.Patel@...> wrote:
-----Original Message----- From: Zong Li <zong.li@...> Sent: 08 July 2020 12:21 To: Atish Patra <Atish.Patra@...> Cc: Anup Patel <Anup.Patel@...>; andrew@...; tech- unixplatformspec@...; gfavor@... Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
On Wed, Jul 8, 2020 at 2:17 PM Atish Patra <atish.patra@...> wrote:
On Wed, 2020-07-08 at 03:04 +0000, Anup Patel wrote:
Hi Atish,
-----Original Message----- From: Atish Patra <Atish.Patra@...> Sent: 08 July 2020 00:44 To: zong.li@...; Anup Patel <Anup.Patel@...> Cc: andrew@...; tech-unixplatformspec@...; gfavor@... Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
On Tue, 2020-07-07 at 11:05 +0800, Zong Li wrote:
On Tue, Jul 7, 2020 at 12:21 AM Anup Patel <anup.patel@...> wrote:
-----Original Message----- From: Zong Li <zong.li@...> Sent: 06 July 2020 13:59 To: Anup Patel <Anup.Patel@...> Cc: tech-unixplatformspec@...; Andrew Waterman <andrew@...>; Greg Favor <gfavor@...> Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
On Mon, Jul 6, 2020 at 12:35 AM Anup Patel < anup.patel@...> wrote:
Hi All,
We don't have a dedicated RISC-V PMU extension but we do have HARDWARE performance counters such as CYCLE CSR, INSTRET CSR, and HPMCOUNTER CSRs. A RISC-V implementation can support monitoring various HARDWARE events using limited number of HPMCOUNTER CSRs.
In addition to HARDWARE performance counters, a SBI implementation (e.g. OpenSBI, Xvisor, KVM, etc) can provide SOFTWARE counters for events such as number of RFENCEs, number of IPIs, number of misaligned load/store instructions, number of illegal instructions, etc.
We propose SBI PMU extension which tries to cover CYCLE CSR, INSTRET CSR, HPMCOUNTER CSRs and SOFTWARE counters provided by SBI implementation.
To define SBI PMU extension, we first define counter_idx which is a logical number assigned to a counter and event_idx which is an encoded
Is there more detail about counter_idx? I was wondering:
1. What is the ordering of logical numbers for HW and SW counters? I think that the logical numbers are assigned by OpenSBI.
As mentioned here, counter_idx is a logical index for all available counters (i.e. HARDWARE and SOFTWARE). The SBI implementation (i.e. OpenSBI, Xvisor RISC-V, or KVM RISC-V) can assign counter_idx to HARDWARE and SOFTWARE counters in any order it likes.
2. How do we know the logical counter_idx of each HW and SW counter from S-mode? I guess that we need to know the logical numbers of all counters before we invoke an SBI call.
The SBI_PMU_COUNTER_DESCRIBE call mentioned below will tell us whether a given counter_idx maps to a HARDWARE counter or a SOFTWARE counter, based on the CSR_Number info it returns.
OK, I assume the logical counter_idx is sequential and starts from zero here, so during initialization of S-mode software, we could get the total number 'N' of counters from SBI_PMU_NUM_COUNTERS first, then loop N times to identify the capability of each counter. Does that align with your idea?
That's my understanding as well. Assigning continuous counter_idx may put a restriction on M-mode implementation.
There is no restriction on M-mode runtime firmware in assigning counter_idx to various HARDWARE and SOFTWARE counters. In fact, counter_idx being a logical index helps M-mode software to implement a registration mechanism.
How about assigning some ranges for software vs hardware counters? Maybe split the hardware into different ranges as well based on event_idx.type.
I had done that initially but it will only increase SBI calls because we will need separate SBI calls to determine the number of HARDWARE and SOFTWARE counters.
I was suggesting to have fixed ranges for both event types.
Also, this makes things difficult if a RISC-V implementation has a non-standard implementation-specific CSR as a HARDWARE counter.
But I agree that it gets tricky with non-standard implementation specific counters.
This also allows the supervisor to know what type of counter it is looking at without parsing the data written by the describe call.
There is no real advantage in knowing the type of a counter from counter_idx rather than from the CSR_Number returned by the SBI_PMU_COUNTER_DESCRIBE call, because SBI_PMU_COUNTER_DESCRIBE will be called only once per counter at boot time, and S-mode software can mark counters as HARDWARE/SOFTWARE at that point based on the returned CSR_Number.
My concern is that it may increase the booting time. For example, my current x86 desktop has 1679 counters. If a RISC-V desktop has those many counters (hopefully one day!! :)), there will be ~2k SBI calls and memory reads just to get perf working. I guess there will be even more counters in servers.
Moreover, the supervisor OS may choose to configure only a few basic perf counters at boot time and defer configuring everything else until later, depending on the use case. Having a continuous logical counter_idx may prevent those kinds of optimizations. Correct?
Based on the optimization you mentioned, it is good to me if we have an SBI call to get the number of HW and SW counters respectively. If the S-mode OS knows the separate numbers, then it can lazily assign and query counters no matter whether counter_idx is continuous or not. If counter_idx starts with the HW counters, the first counter_idx of the SW counters is the number of HW counters.
As mentioned in the previous reply, any optimization possible using fixed ranges for counter_idx can also be done using logical counter_idx.
The biggest problem with fixed ranges for counter_idx is that it will be difficult to describe HARDWARE counters which map to implementation-specific CSRs.
I would suggest that SBI_PMU_NUM_COUNTERS take a parameter to return the total number of all counters, the number of SW counters only, or the number of HW counters only.
This is only required if we go for fixed-range counter_idx numbering.
The key point is that we need to know the counter_idx ranges of the HW counters and the SW counters. Even if we use a continuous logical counter_idx, we still need to know which counters are HW and which are SW in order to query counter capabilities lazily. For example, we query only the capabilities of the basic counters, such as cycle and instret, at initialization. Later, when we want to monitor a software event, we query counter capabilities again by invoking SBI_PMU_COUNTER_DESCRIBE. At that point, if we know the first counter_idx of the SW counters, we can skip the counter_idx values that belong to HW counters. We would not need to know the number of HW and SW counters separately at the beginning if we were going to query the capabilities of all counters during the initial phase, because we would know both numbers after that.
Regards, Anup
number representing the HARDWARE/SOFTWARE event to be monitored.
The SBI PMU event_idx is a XLEN bits wide number encoded as follows: event_idx[XLEN-1:16] = info event_idx[15:12] = type event_idx[11:0] = code
If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE event, the event_idx.info is optional and can be passed zero whereas the event_idx.code can be one of the following values: enum sbi_pmu_hw_id { SBI_PMU_HW_CPU_CYCLES = 0, SBI_PMU_HW_INSTRUCTIONS = 1, SBI_PMU_HW_CACHE_REFERENCES = 2, SBI_PMU_HW_CACHE_MISSES = 3, SBI_PMU_HW_BRANCH_INSTRUCTIONS = 4, SBI_PMU_HW_BRANCH_MISSES = 5, SBI_PMU_HW_BUS_CYCLES = 6, SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7, SBI_PMU_HW_STALLED_CYCLES_BACKEND = 8, SBI_PMU_HW_REF_CPU_CYCLES = 9, SBI_PMU_HW_MAX, /* non-ABI */ }; (NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE CACHE event, the event_idx.info is optional and can be passed zero whereas the event_idx.code is encoded as follows: event_idx.code[11:3] = cache_id event_idx.code[2:1] = op_id event_idx.code[0:0] = result_id enum sbi_pmu_hw_cache_id { SBI_PMU_HW_CACHE_L1D = 0, SBI_PMU_HW_CACHE_L1I = 1, SBI_PMU_HW_CACHE_LL = 2, SBI_PMU_HW_CACHE_DTLB = 3, SBI_PMU_HW_CACHE_ITLB = 4, SBI_PMU_HW_CACHE_BPU = 5, SBI_PMU_HW_CACHE_NODE = 6, SBI_PMU_HW_CACHE_MAX, /* non-ABI */ }; enum sbi_pmu_hw_cache_op_id { SBI_PMU_HW_CACHE_OP_READ = 0, SBI_PMU_HW_CACHE_OP_WRITE = 1, SBI_PMU_HW_CACHE_OP_PREFETCH = 2, SBI_PMU_HW_CACHE_OP_MAX, /* non-ABI */ }; enum sbi_pmu_hw_cache_op_result_id { SBI_PMU_HW_CACHE_RESULT_ACCESS = 0, SBI_PMU_HW_CACHE_RESULT_MISS = 1, SBI_PMU_HW_CACHE_RESULT_MAX, /* non-ABI */ }; (NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE RAW event, both event_idx.info and event_idx.code are platform dependent.
If event_idx.type == 0xf then it is SOFTWARE event. For SOFTWARE event, event_idx.info is SBI implementation specific and event_idx.code can be one of the following: enum sbi_pmu_sw_id { SBI_PMU_SW_MISALIGNED_LOAD = 0, SBI_PMU_SW_MISALIGNED_STORE = 1, SBI_PMU_SW_ILLEGAL_INSN = 2, SBI_PMU_SW_LOCAL_SET_TIMER = 3, SBI_PMU_SW_LOCAL_IPI = 4, SBI_PMU_SW_LOCAL_FENCE_I = 5, SBI_PMU_SW_LOCAL_SFENCE_VMA = 6, SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID = 7, SBI_PMU_SW_LOCAL_HFENCE_GVMA = 8, SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9, SBI_PMU_SW_LOCAL_HFENCE_VVMA = 10, SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11, SBI_PMU_SW_MAX, /* non-ABI */ };
In future, more events can be defined without breaking ABI compatibility of SBI calls.
Using definition of counter_idx and event_idx, we can potentially have the following SBI calls:
1. SBI_PMU_NUM_COUNTERS This call will return the number of COUNTERs Is it for the SW counters and we get the number of HW counters by DT? Or does it return the number of HW and SW counters both? If so, how to distinguish the number of HW and SW? This call returns total number of counters (i.e. HARDWARE and SOFTWARE both)
The other question is that the number of SW counters is defined by the core of OpenSBI or platform-dependent? Number of SW counters are defined by SBI implementation (i.e. OpenSBI, Xvisor RISC-V, and KVM RISC-V). Most likely SW counters will not include any platform-dependent SW counters although this is design choice of SBI implementation. OK, I got it. It would be enough, thanks.
2. SBI_PMU_COUNTER_DESCRIBE
This call takes two parameters: 1) counter_idx 2) physical address
It will write the description of the SBI PMU counter at the specified physical address. The details written at that address are as follows:
1. Name (64 bytes)
2. CSR_Number (2 bytes) (CSR_Number <= 0xfff means the counter is a RISC-V CSR) (CSR_Number > 0xfff means the counter is an SBI implementation counter) (E.g. CSR_Number == 0xC03 implies HPMCOUNTER3 CSR)
3. CSR_Width (2 bytes) (Number of CSR bits implemented in HW)
4. Event_Count (2 bytes) (Number of events in Event_List array)
5. Event_List (2 * Event_Count bytes) (This is an array of 16-bit values where each 16-bit value is a supported event_idx.type and event_idx.code combination)
What size should we allocate for this physical address? In my understanding, we need to allocate the pages in S-mode first, then pass the address of the pages as the second parameter, but we don't know Event_Count before we allocate the space, so the description might overrun the allocated space if Event_Count is very big.
Theoretically, Event_Count cannot be more than 65535.
I think we should have SBI_PMU_NUM_EVENTS calls which will return number of events supported by given counter_idx. This will help S-mode software to determine amount of memory to allocate for SBI_PMU_COUNTER_DESCRIBE.
Sounds good to me.
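To make the sizing question concrete, here is a minimal sketch of how S-mode software might compute the buffer size once it knows Event_Count (for example from the suggested SBI_PMU_NUM_EVENTS call). The helper names are hypothetical, and the little-endian byte order and unpadded field layout are assumptions; the proposal does not state either.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed part of the description: Name + CSR_Number + CSR_Width + Event_Count. */
enum {
    PMU_DESC_NAME_LEN  = 64,
    PMU_DESC_FIXED_LEN = PMU_DESC_NAME_LEN + 2 + 2 + 2 /* 70 bytes */
};

/* Bytes S-mode must reserve for a counter that supports event_count events. */
static size_t pmu_desc_size(uint16_t event_count)
{
    return (size_t)PMU_DESC_FIXED_LEN + 2u * (size_t)event_count;
}

/* Read one 16-bit field; little-endian byte order is an assumption. */
static uint16_t pmu_desc_read16(const uint8_t *buf, size_t off)
{
    return (uint16_t)(buf[off] | ((uint16_t)buf[off + 1] << 8));
}

/* Event_Count sits after Name (64), CSR_Number (2) and CSR_Width (2). */
static uint16_t pmu_desc_event_count(const uint8_t *buf)
{
    return pmu_desc_read16(buf, PMU_DESC_NAME_LEN + 4);
}
```

Under these assumptions the worst case (Event_Count == 65535) needs 131140 bytes, which is more than 32 pages of 4 KiB, so a fixed single-page buffer would not be enough in general.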
3. SBI_PMU_COUNTER_SET_PHYS_ADDR
This call takes two parameters: 1) counter_idx 2) physical address
It will set the physical address of the memory location where the SBI implementation will write the 64-bit SOFTWARE counter. This SBI call is only for counters not mapped to any CSR (i.e. only for counters with CSR_Number > 0xfff).
4. SBI_PMU_COUNTER_START
This call takes two parameters: 1) counter_idx 2) event_idx
It will inform the SBI implementation to configure and start/enable the specified counter on the calling HART to monitor a specific event. This SBI call will fail for counters which are not present or when the specified event_idx is not supported by the counter.
5. SBI_PMU_COUNTER_STOP
This call takes one parameter: 1) counter_idx
It will inform the SBI implementation to stop/disable the specified counter on the calling HART. This SBI call will fail for counters which are not present.
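The failure conditions of SBI_PMU_COUNTER_START and SBI_PMU_COUNTER_STOP can be sketched with a small model of the firmware-side bookkeeping. This is only an illustration: the structure, function names and error codes below are hypothetical (the proposal only says the calls "will fail"), and matching on the low 16 bits follows the Event_List format, so event_idx.info is ignored here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error codes; the proposal does not define them. */
enum { PMU_OK = 0, PMU_ERR_NO_COUNTER = -1, PMU_ERR_BAD_EVENT = -2 };

struct pmu_counter {
    const uint16_t *events;   /* supported (type << 12 | code) values */
    size_t num_events;
    bool running;
    uint16_t active_event;
};

static int pmu_counter_start(struct pmu_counter *ctrs, size_t num_ctrs,
                             size_t counter_idx, uint64_t event_idx)
{
    if (counter_idx >= num_ctrs)
        return PMU_ERR_NO_COUNTER;          /* counter not present */

    struct pmu_counter *c = &ctrs[counter_idx];
    uint16_t key = (uint16_t)(event_idx & 0xffff);   /* type + code */

    for (size_t i = 0; i < c->num_events; i++) {
        if (c->events[i] == key) {
            c->active_event = key;
            c->running = true;
            return PMU_OK;
        }
    }
    return PMU_ERR_BAD_EVENT;               /* event not supported by counter */
}

static int pmu_counter_stop(struct pmu_counter *ctrs, size_t num_ctrs,
                            size_t counter_idx)
{
    if (counter_idx >= num_ctrs)
        return PMU_ERR_NO_COUNTER;
    ctrs[counter_idx].running = false;
    return PMU_OK;
}
```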
From above, the RISC-V PMU driver will use most of the SBI calls at boot time. Only SBI_PMU_COUNTER_START needs to be used once before using the counter.
The counter is read either by reading the CSR (for CSR_Number <= 0xfff) or by reading the memory location (for CSR_Number > 0xfff). Counter overflow handling will have to be done in software by the Linux kernel.
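As a rough illustration of software overflow handling, the sketch below computes the number of events between two reads of a counter that implements only CSR_Width bits. The function names are hypothetical, and the approach assumes the counter is sampled often enough that it wraps at most once between reads.

```c
#include <stdint.h>

/* Mask covering the low `width` implemented bits (1 <= width <= 64). */
static uint64_t pmu_width_mask(unsigned width)
{
    return width >= 64 ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
}

/* Events counted between two reads, tolerating at most one wraparound. */
static uint64_t pmu_counter_delta(uint64_t prev, uint64_t now, unsigned width)
{
    return (now - prev) & pmu_width_mask(width);
}

/* CSR_Number <= 0xfff: read a CSR; otherwise read the memory location. */
static int pmu_is_csr_counter(uint16_t csr_number)
{
    return csr_number <= 0xfff;
}
```

For a 32-bit wide counter that moved from 0xFFFFFFF0 to 0x10, the masked subtraction yields 0x20, which is the true number of events across the wrap.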
Using the SBI PMU extension, the M-mode runtime firmware (or Hypervisors) can provide a standardized view of HARDWARE/SOFTWARE counters and events to S-mode (or VS-mode) software.
The M-mode runtime firmware (OpenSBI) will need to know the following platform-dependent information:
1. Possible event_idx values allowed (or supported) by a HARDWARE counter (i.e. HPMCOUNTER)
2. Mapping of event_idx for HARDWARE event to HPMEVENT CSR value
3. Mapping of event_idx for HARDWARE CACHE event to HPMEVENT CSR value
4. Mapping of event_idx for HARDWARE RAW event to HPMEVENT CSR value
5. Additional platform-specific programming required by any event_idx
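One possible shape for items 1) to 4) is a per-platform lookup table, whether it is built from DT or from platform code. The sketch below is purely illustrative: the structure, the function name and every value in the table are made up and do not describe any real platform.

```c
#include <stddef.h>
#include <stdint.h>

struct pmu_hw_event_map {
    uint16_t event_key;       /* event_idx.type << 12 | event_idx.code */
    uint32_t counter_mask;    /* bit N set: HPMCOUNTER N can count it (item 1) */
    uint64_t hpmevent_value;  /* value for the HPMEVENT CSR (items 2 to 4) */
};

/* Made-up values for illustration only. */
static const struct pmu_hw_event_map example_map[] = {
    { 0x0003, 0x18, 0x0102 },   /* HARDWARE: cache misses */
    { 0x0005, 0x08, 0x0201 },   /* HARDWARE: branch misses */
    { 0x1001, 0x18, 0x0402 },   /* HARDWARE CACHE: L1D read miss */
};

/* Return the mapping if `counter` may monitor `event_key`, else NULL. */
static const struct pmu_hw_event_map *
pmu_find_event(const struct pmu_hw_event_map *map, size_t n,
               uint16_t event_key, unsigned counter)
{
    if (counter >= 32)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        if (map[i].event_key == event_key &&
            ((map[i].counter_mask >> counter) & 1u))
            return &map[i];
    }
    return NULL;
}
```

Item 5) does not fit a table like this, which matches the point that it will always need platform-specific code.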
All platform-dependent information mentioned above can be obtained by M-mode runtime firmware (OpenSBI) from platform-specific code. The DT/ACPI can also be used to describe 1), 2), 3), and 4) mentioned above but 5) will always require platform-specific code.

I would update the next version of the DT file to describe points 1) to 4). Thanks. As you mentioned before, it would be hard to sync the platform-specific code with the DT in real use. I prefer to get 1), 2), 3) and 4) from DT first on each platform, and use platform-specific code if DT is unavailable (the generic platform uses DT, certainly); then we could maximally reduce the inconsistency.

It should be the platform's choice how it wants to describe HARDWARE events and HARDWARE counters. The OpenSBI generic platform will tend to use DT-based parsing of HARDWARE events and HARDWARE counters but other platforms can do things differently.
The S-mode software (i.e. Linux) should not get HARDWARE events and HARDWARE counters from DT because DT describes HARDWARE and DT will not include SOFTWARE events and SOFTWARE counters. Also, SOFTWARE events and SOFTWARE counters will change for a given platform as OpenSBI continues to improve, so it will be hard to keep the DT in sync.
The best thing for S-mode software would be to depend on one method of discovering all counters and supported events, which is the SBI_PMU_COUNTER_DESCRIBE call. In other words, there is no need for a platform driver for the Linux RISC-V PMU driver; instead it can depend only on sbi_probe_extension() to detect the SBI PMU extension.
OK, makes sense.
Regards, Anup
-- Regards, Atish Regards, Anup -- Regards, Atish
|
|
Would there be a raw style interface to access all the SBI-unaware events, like perf's rNNN support?
How would this work on a multicore system -- would the SBI calls only handle the current hart's counters? That seems easiest to deal with.
Brian
|
|
On Thu, Jul 9, 2020 at 1:06 AM Brian Grayson <brian.grayson@...> wrote:
> Would there be a raw style interface to access all the SBI-unaware events, like perf's rNNN support?
Following up on this question: in our current proposal, S-mode software only knows the event_idx, and M-mode firmware takes care of the mapping. My question is that S-mode software doesn't seem to understand the meaning of each event_idx; that is, it just gets the array of all supported event_idx values, but can't know which one is for what. This also happens for U-mode programs: for the rNNN interface, normally we should refer to the processor-specific documentation for these details, and now users won't know what value they should give. Please correct me if I'm missing something. Thanks.
> How would this work on a multicore system -- would the SBI calls only handle the current hart's counters? That seems easiest to deal with.
> Brian
|
|
My question is, let's say I know that putting the value 0x12345678 into the mhpmevent3 register gets me the event I want, and there is no support for that event in the SBI spec/API. Will this API allow me to program such an event, basically bypassing the usual mapping functionality? perf basically allows you to say "I know this event number is not one you know about, but it's the value I want placed directly into the hardware." I want to ensure that the full capabilities of the hardware will still be accessible through the SBI spec in some sort of "raw" mode, and I didn't see a way for that to happen right now. We don't want to restrict users to the lowest common denominator of functionality.
On Wed, Jul 8, 2020 at 10:27 PM Zong Li < zong.li@...> wrote: On Thu, Jul 9, 2020 at 1:06 AM Brian Grayson <brian.grayson@...> wrote:
>
> Would there be a raw style interface to access all the SBI-unaware events, like perf's rNNN support?
>
Follow this question, in our current proposal, s-mode software only
knows the event_idx, and m-mode firmware takes care of the mapping, my
question is that s-mode software doesn't seem to understand the
meaning of each event_idx, that means, it just get the array of all
supported event_idx, but couldn't know which one is for what. This
also happened on u-mode program, for rNNN interface, normally, we
should refer to the processor specific documentation for getting these
details, and now, users won't know what value they should give. Please
correct me if I miss something. Thanks.
> How would this work on a multicore system -- would the SBI calls only handle the current hart's counters? That seems easiest to deal with.
>
> Brian
|
|
I think this need is covered by this excerpt from the v2 proposal:
If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE RAW event, both event_idx.info and event_idx.code are platform dependent.
Greg
|
|

Anup Patel
Like Greg already mentioned, SBI PMU event_idx.type == 0x2 is HARDWARE RAW event.
To monitor RAW events, the user-space perf tool will create a user-space perf RAW event (i.e. perf_event_attr.type == 4 and perf_event_attr.config == <hardware_specific_raw_event_idx>). The Linux RISC-V PMU driver will allocate and map a matching HARDWARE counter which supports the corresponding SBI RAW event (event_idx.type = 2, event_idx.code = perf_event_attr.config[11:0], and event_idx.info = perf_event_attr.config[59:12]).
The SBI PMU RAW events are mostly opaque to all software layers (i.e. user-space, Linux, hypervisors, and OpenSBI). Users will need to refer to HW specs for the semantics of RAW events when using RAW events with the perf tool.
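The bit slicing described above can be sketched as follows, assuming XLEN == 64 so that event_idx.info is 48 bits wide. The macro and function names are hypothetical; only the field positions come from the proposal.

```c
#include <stdint.h>

#define SBI_PMU_EVENT_TYPE_RAW 0x2u   /* hypothetical name for type 0x2 */

/* Build the SBI RAW event_idx from perf_event_attr.config (XLEN == 64). */
static uint64_t pmu_raw_event_idx(uint64_t perf_config)
{
    uint64_t code = perf_config & 0xfff;                     /* config[11:0]  */
    uint64_t info = (perf_config >> 12) &
                    ((UINT64_C(1) << 48) - 1);               /* config[59:12] */

    return (info << 16) |
           ((uint64_t)SBI_PMU_EVENT_TYPE_RAW << 12) |
           code;
}
```

Note that bits [63:60] of perf_event_attr.config have no place in this encoding, so a driver would have to reject or ignore them.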
Regards,
Anup
From: tech-unixplatformspec@... <tech-unixplatformspec@...>
On Behalf Of Greg Favor
Sent: 09 July 2020 12:43
To: Brian Grayson <brian.grayson@...>
Cc: Zong Li <zong.li@...>; Anup Patel <Anup.Patel@...>; Atish Patra <Atish.Patra@...>; andrew@...; tech-unixplatformspec@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
I think this need is covered by this excerpt from the v2 proposal:
If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE RAW
event, both event_idx.info and event_idx.code are platform dependent.
|
|

Anup Patel
Based on my previous reply…
To monitor RAW event 0x12345678, the user-space perf tool will create a user-space perf RAW event (i.e. perf_event_attr.type == 4 and perf_event_attr.config == 0x12345678). The Linux RISC-V PMU driver will allocate and map a matching HARDWARE counter which supports the corresponding SBI RAW event (event_idx.type = 2, event_idx.code = 0x678 and event_idx.info = 0x12345). Finally, the SBI_PMU_COUNTER_START call implemented by OpenSBI will write 0x12345678 (or some platform-specific translated value of 0x12345678) to the appropriate mhpmeventX CSR.
(Note: above we assume mhpmcounterX supports monitoring RAW event 0x12345678 and OpenSBI is aware of this)
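For the firmware side of this example, a minimal sketch of how the original raw value could be rebuilt from the event_idx passed to SBI_PMU_COUNTER_START is shown below. The function names are hypothetical, XLEN == 64 is assumed, and any platform-specific translation of the value is left out.

```c
#include <stdint.h>

/* Extract event_idx.type (bits [15:12]). */
static unsigned pmu_event_type(uint64_t event_idx)
{
    return (unsigned)((event_idx >> 12) & 0xf);
}

/* Rebuild the raw selector: info becomes bits [59:12], code bits [11:0]. */
static uint64_t pmu_raw_selector(uint64_t event_idx)
{
    uint64_t code = event_idx & 0xfff;   /* event_idx[11:0]      */
    uint64_t info = event_idx >> 16;     /* event_idx[XLEN-1:16] */

    return (info << 12) | code;
}
```

With event_idx.info = 0x12345, type = 2 and code = 0x678, the packed event_idx is 0x123452678 and the rebuilt selector is 0x12345678 again.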
Regards,
Anup
From: Brian Grayson <brian.grayson@...>
Sent: 09 July 2020 12:35
To: Zong Li <zong.li@...>
Cc: Anup Patel <Anup.Patel@...>; Atish Patra <Atish.Patra@...>; andrew@...; tech-unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
My question is, let's say I know that putting the value 0x12345678 into the mhpmevent3 register gets me the event I want, and there is no support for that event in the SBI spec/API. Will this API allow me to program such an event, basically
bypassing the usual mapping functionality? perf basically allows you to say "I know this event number is not one you know about, but it's the value I want placed directly into the hardware." I want to ensure that the full capabilities of the hardware will
still be accessible through the SBI spec in some sort of "raw" mode, and I didn't see a way for that to happen right now. We don't want to restrict users to the lowest common denominator of functionality.
|
|

Anup Patel
-----Original Message----- From: Zong Li <zong.li@...> Sent: 08 July 2020 15:06 To: Anup Patel <Anup.Patel@...> Cc: Atish Patra <Atish.Patra@...>; andrew@...; tech- unixplatformspec@...; gfavor@... Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
On Wed, Jul 8, 2020 at 4:45 PM Anup Patel <Anup.Patel@...> wrote:
-----Original Message----- From: Zong Li <zong.li@...> Sent: 08 July 2020 12:21 To: Atish Patra <Atish.Patra@...> Cc: Anup Patel <Anup.Patel@...>; andrew@...; tech- unixplatformspec@...; gfavor@... Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
On Wed, Jul 8, 2020 at 2:17 PM Atish Patra <atish.patra@...> wrote:
On Wed, 2020-07-08 at 03:04 +0000, Anup Patel wrote:
Hi Atish,
-----Original Message----- From: Atish Patra <Atish.Patra@...> Sent: 08 July 2020 00:44 To: zong.li@...; Anup Patel <Anup.Patel@...> Cc: andrew@...; tech-unixplatformspec@...; gfavor@... Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
On Tue, 2020-07-07 at 11:05 +0800, Zong Li wrote:
On Tue, Jul 7, 2020 at 12:21 AM Anup Patel <anup.patel@...> wrote:
-----Original Message----- From: Zong Li <zong.li@...> Sent: 06 July 2020 13:59 To: Anup Patel <Anup.Patel@...> Cc: tech-unixplatformspec@...; Andrew Waterman <andrew@...>; Greg Favor <gfavor@...> Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU Extension
On Mon, Jul 6, 2020 at 12:35 AM Anup Patel < anup.patel@...> wrote:
Hi All,
We don't have a dedicated RISC-V PMU extension but we do have HARDWARE performance counters such as CYCLE CSR, INSTRET CSR, and HPMCOUNTER CSRs. A RISC-V implementation can support monitoring various HARDWARE events using limited number of HPMCOUNTER CSRs.
In addition to HARDWARE performance counters, a SBI implementation (e.g. OpenSBI, Xvisor, KVM, etc) can provide SOFTWARE counters for events such as number of RFENCEs, number of IPIs, number of misaligned load/store instructions, number of illegal instructions, etc.
We propose SBI PMU extension which tries to cover CYCLE CSR, INSTRET CSR, HPMCOUNTER CSRs and SOFTWARE counters provided by SBI implementation.
To define SBI PMU extension, we first define counter_idx which is a logical number assigned to a counter and event_idx which is an encoded

Is there more detail about counter_idx? I was wondering:
1. What is the ordering of logical numbers for HW and SW counters? I think that the logical numbers are assigned by OpenSBI.

As mentioned here, counter_idx is a logical index for all available counters (i.e. HARDWARE and SOFTWARE). The SBI implementation (i.e. OpenSBI, Xvisor RISC-V, or KVM RISC-V) can assign counter_idx to HARDWARE and SOFTWARE counters in any order it likes.

2. How to know the logical number of counter_idx of each HW and SW counter from S-mode? I guess that we need to know the logical numbers of all counters before we invoke an SBI call.

The SBI_PMU_COUNTER_DESCRIBE call mentioned below will tell us whether a given counter_idx maps to a HARDWARE counter or SOFTWARE counter based on the CSR_Number info returned by the SBI_PMU_COUNTER_DESCRIBE call.

OK, I assume the logical number of counter_idx is sequential and starts from zero here, so during initialization of S-mode software, we could get the total number 'N' of counters by SBI_PMU_NUM_COUNTERS first, then loop N times to identify the capability of each counter. Does it align with your ideas?
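The enumeration loop described above could look roughly like the sketch below. The two mock_sbi_* functions and the table behind them are stand-ins invented for this example; a real driver would issue SBI ecalls and parse the description written to memory instead.

```c
#include <stddef.h>
#include <stdint.h>

struct pmu_counter_info {
    uint16_t csr_number;
    uint16_t csr_width;
};

/* Stand-in for the firmware side; contents are illustrative only. */
static const struct pmu_counter_info mock_fw[] = {
    { 0xC00, 64 },    /* CYCLE */
    { 0xC02, 64 },    /* INSTRET */
    { 0xC03, 48 },    /* HPMCOUNTER3 */
    { 0x1000, 64 },   /* SOFTWARE counter (CSR_Number > 0xfff) */
};

static size_t mock_sbi_pmu_num_counters(void)
{
    return sizeof(mock_fw) / sizeof(mock_fw[0]);
}

static int mock_sbi_pmu_counter_describe(size_t counter_idx,
                                         struct pmu_counter_info *out)
{
    if (counter_idx >= mock_sbi_pmu_num_counters())
        return -1;
    *out = mock_fw[counter_idx];
    return 0;
}

struct pmu_summary {
    size_t hw;
    size_t sw;
};

/* Walk counter_idx 0..N-1 and classify each counter by CSR_Number. */
static struct pmu_summary pmu_enumerate(void)
{
    struct pmu_summary s = { 0, 0 };
    size_t n = mock_sbi_pmu_num_counters();

    for (size_t idx = 0; idx < n; idx++) {
        struct pmu_counter_info info;

        if (mock_sbi_pmu_counter_describe(idx, &info) != 0)
            continue;
        if (info.csr_number <= 0xfff)
            s.hw++;
        else
            s.sw++;
    }
    return s;
}
```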
That's my understanding as well. Assigning continuous counter_idx may put a restriction on the M-mode implementation. How about assigning some ranges for software vs hardware counters? Maybe split the hardware into different ranges as well based on event_idx.type.

There is no restriction on M-mode runtime firmware in assigning counter_idx to various HARDWARE and SOFTWARE counters. In fact, counter_idx being a logical index helps M-mode software to implement a registration mechanism.

I had done that initially but it will only increase SBI calls because we will need separate SBI calls to determine the number of HARDWARE and SOFTWARE counters.

I was suggesting to have fixed ranges for both event types.

Also, this makes things difficult if a RISC-V implementation has a non-standard implementation-specific CSR as a HARDWARE counter.

But I agree that it gets tricky with non-standard implementation-specific counters.

This also allows the supervisor to know what type of counter it is looking at without parsing the data written by the describe call.

There is no real advantage of knowing the type of counter from counter_idx over the CSR_Number returned by the SBI_PMU_COUNTER_DESCRIBE call, because the SBI_PMU_COUNTER_DESCRIBE call will be called only once at boot time for each counter and S-mode software can mark counters as HARDWARE/SOFTWARE at boot time based on the CSR_Number returned by the SBI_PMU_COUNTER_DESCRIBE call.

My concern is that it may increase the booting time. For example, my current x86 desktop has 1679 counters. If a RISC-V desktop has that many counters (hopefully one day!! :)), there will be ~2k SBI calls and memory reads just to get perf working. I guess there will be even more counters in servers.

Moreover, the supervisor OS may choose to configure only a few basic perf counters at boot time and defer configuring everything else until later, depending on the use case. Having a continuous logical counter_idx may prevent those kinds of optimizations. Correct?

Based on the optimization you mentioned, it is good for me if we have an SBI call to get the number of HW and SW counters respectively. If the S-mode OS knows the separate numbers, then it can lazily assign and query counters no matter whether counter_idx is continuous or not. If counter_idx starts with the HW counters, the first counter_idx of the SW counters is the number of HW counters.

As mentioned in the previous reply, any optimization possible using fixed ranges for counter_idx can also be done using logical counter_idx.
The biggest problem with fixed ranges for counter_idx is that it will be difficult to describe HARDWARE counters which map to an implementation-specific CSR.

I would suggest that SBI_PMU_NUM_COUNTERS can take a parameter to return the total number of all counters, the number of SW counters only and the number of HW counters only.

This is only required if we go for fixed-range counter_idx numbering.

The key is we need to know the range of HW counters and SW counters in counter_idx values. Even if we use continuous logical counter_idx, we still need to know the HW counters and SW counters separately to lazily get the capability of a counter. For example, we just get the capability of basic counters at initialization, such as cycle and instret, and then we want to monitor a software event at some moment, so we try to get the capability of counters again by invoking SBI_PMU_COUNTER_DESCRIBE. At this moment, if we know what the first counter_idx of all SW counters is, then we could ignore the remaining counter_idx values of HW counters.

We don't need to know the number of HW and SW counters respectively at the beginning unless we are going to get the capability of all counters during the initial phase, because we will know the number of them after that.

The only difference between SBI PMU HARDWARE and SOFTWARE counters is how the counter value is read. For an SBI PMU HARDWARE counter, the value is read from some RISC-V CSR whereas for an SBI PMU SOFTWARE counter the value is read from some memory location. The S-mode software can do lazy programming of the memory location for SBI PMU SOFTWARE counters using the SBI_PMU_COUNTER_SET_PHYS_ADDR call. Apart from the SBI_PMU_COUNTER_SET_PHYS_ADDR call, the SBI call sequence is exactly the same for both SBI PMU HARDWARE and SOFTWARE counters. I am still not convinced why we need fixed-range counter_idx to distinguish HARDWARE and SOFTWARE counters.
Regards, Anup
Regards, Anup
number representing the HARDWARE/SOFTWARE event to be monitored.
The SBI PMU event_idx is a XLEN bits wide number encoded as follows: event_idx[XLEN-1:16] = info event_idx[15:12] = type event_idx[11:0] = code
If event_idx.type == 0x0 then it is HARDWARE event. For HARDWARE event, the event_idx.info is optional and can be passed zero whereas the event_idx.code can be one of the following values:
enum sbi_pmu_hw_id {
    SBI_PMU_HW_CPU_CYCLES = 0,
    SBI_PMU_HW_INSTRUCTIONS = 1,
    SBI_PMU_HW_CACHE_REFERENCES = 2,
    SBI_PMU_HW_CACHE_MISSES = 3,
    SBI_PMU_HW_BRANCH_INSTRUCTIONS = 4,
    SBI_PMU_HW_BRANCH_MISSES = 5,
    SBI_PMU_HW_BUS_CYCLES = 6,
    SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
    SBI_PMU_HW_STALLED_CYCLES_BACKEND = 8,
    SBI_PMU_HW_REF_CPU_CYCLES = 9,
    SBI_PMU_HW_MAX, /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
If event_idx.type == 0x1 then it is HARDWARE CACHE event. For HARDWARE CACHE event, the event_idx.info is optional and can be passed zero whereas the event_idx.code is encoded as follows:
event_idx.code[11:3] = cache_id
event_idx.code[2:1] = op_id
event_idx.code[0:0] = result_id
enum sbi_pmu_hw_cache_id {
    SBI_PMU_HW_CACHE_L1D = 0,
    SBI_PMU_HW_CACHE_L1I = 1,
    SBI_PMU_HW_CACHE_LL = 2,
    SBI_PMU_HW_CACHE_DTLB = 3,
    SBI_PMU_HW_CACHE_ITLB = 4,
    SBI_PMU_HW_CACHE_BPU = 5,
    SBI_PMU_HW_CACHE_NODE = 6,
    SBI_PMU_HW_CACHE_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_id {
    SBI_PMU_HW_CACHE_OP_READ = 0,
    SBI_PMU_HW_CACHE_OP_WRITE = 1,
    SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
    SBI_PMU_HW_CACHE_OP_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_result_id {
    SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
    SBI_PMU_HW_CACHE_RESULT_MISS = 1,
    SBI_PMU_HW_CACHE_RESULT_MAX, /* non-ABI */
};
(NOTE: Same as <linux_source>/include/uapi/linux/perf_event.h)
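As a worked example of the HARDWARE CACHE encoding, the sketch below packs cache_id, op_id and result_id into event_idx.code and then into a full event_idx with info left at zero. The macro and function names are hypothetical; the bit positions are the ones given above.

```c
#include <stdint.h>

#define SBI_PMU_EVENT_TYPE_HW_CACHE 0x1u   /* hypothetical name for type 0x1 */

/* code[11:3] = cache_id, code[2:1] = op_id, code[0] = result_id */
static uint16_t pmu_cache_code(unsigned cache_id, unsigned op_id,
                               unsigned result_id)
{
    return (uint16_t)(((cache_id & 0x1ff) << 3) |
                      ((op_id & 0x3) << 1) |
                      (result_id & 0x1));
}

/* Full event_idx for a HARDWARE CACHE event with event_idx.info == 0. */
static uint64_t pmu_cache_event_idx(unsigned cache_id, unsigned op_id,
                                    unsigned result_id)
{
    return ((uint64_t)SBI_PMU_EVENT_TYPE_HW_CACHE << 12) |
           pmu_cache_code(cache_id, op_id, result_id);
}
```

For instance, a DTLB (3) write (1) miss (1) gives code 0x1b and event_idx 0x101b, while an L1D (0) read (0) miss (1) gives code 0x1 and event_idx 0x1001.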
If event_idx.type == 0x2 then it is HARDWARE RAW event. For HARDWARE RAW event, both event_idx.info and event_idx.code are platform dependent.
If event_idx.type == 0xf then it is SOFTWARE event. For SOFTWARE event, event_idx.info is SBI implementation specific and event_idx.code can be one of the following:
enum sbi_pmu_sw_id {
    SBI_PMU_SW_MISALIGNED_LOAD = 0,
    SBI_PMU_SW_MISALIGNED_STORE = 1,
    SBI_PMU_SW_ILLEGAL_INSN = 2,
    SBI_PMU_SW_LOCAL_SET_TIMER = 3,
    SBI_PMU_SW_LOCAL_IPI = 4,
    SBI_PMU_SW_LOCAL_FENCE_I = 5,
    SBI_PMU_SW_LOCAL_SFENCE_VMA = 6,
    SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID = 7,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA = 8,
    SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA = 10,
    SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
    SBI_PMU_SW_MAX, /* non-ABI */
};
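Putting the four event types together, a generic pack/unpack pair for event_idx might look like the sketch below. The struct and function names are hypothetical and XLEN == 64 is assumed; only the field boundaries (info in [XLEN-1:16], type in [15:12], code in [11:0]) come from the proposal.

```c
#include <stdint.h>

struct pmu_event {
    uint64_t info;    /* event_idx[XLEN-1:16] */
    unsigned type;    /* event_idx[15:12]: 0x0, 0x1, 0x2 or 0xf */
    unsigned code;    /* event_idx[11:0] */
};

static uint64_t pmu_event_pack(struct pmu_event e)
{
    return (e.info << 16) |
           ((uint64_t)(e.type & 0xf) << 12) |
           (uint64_t)(e.code & 0xfff);
}

static struct pmu_event pmu_event_unpack(uint64_t event_idx)
{
    struct pmu_event e;

    e.info = event_idx >> 16;
    e.type = (unsigned)((event_idx >> 12) & 0xf);
    e.code = (unsigned)(event_idx & 0xfff);
    return e;
}
```

As an example, the SOFTWARE event SBI_PMU_SW_ILLEGAL_INSN (code 2, type 0xf, info 0) packs to 0xf002.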
|
|