Re: Proposal v2: SBI PMU Extension


Anup Patel
 

-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 08 July 2020 15:06
To: Anup Patel <Anup.Patel@...>
Cc: Atish Patra <Atish.Patra@...>; andrew@...; tech-
unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU
Extension

On Wed, Jul 8, 2020 at 4:45 PM Anup Patel <Anup.Patel@...> wrote:



-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 08 July 2020 12:21
To: Atish Patra <Atish.Patra@...>
Cc: Anup Patel <Anup.Patel@...>; andrew@...; tech-
unixplatformspec@...; gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI PMU
Extension

On Wed, Jul 8, 2020 at 2:17 PM Atish Patra <atish.patra@...>
wrote:

On Wed, 2020-07-08 at 03:04 +0000, Anup Patel wrote:
Hi Atish,

-----Original Message-----
From: Atish Patra <Atish.Patra@...>
Sent: 08 July 2020 00:44
To: zong.li@...; Anup Patel <Anup.Patel@...>
Cc: andrew@...; tech-unixplatformspec@...;
gfavor@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2: SBI
PMU Extension

On Tue, 2020-07-07 at 11:05 +0800, Zong Li wrote:
On Tue, Jul 7, 2020 at 12:21 AM Anup Patel
<anup.patel@...>
wrote:

-----Original Message-----
From: Zong Li <zong.li@...>
Sent: 06 July 2020 13:59
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Andrew
Waterman <andrew@...>; Greg Favor
<gfavor@...>
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal v2:
SBI PMU
Extension

On Mon, Jul 6, 2020 at 12:35 AM Anup Patel <
anup.patel@...>
wrote:
Hi All,

We don't have a dedicated RISC-V PMU extension but we do have
HARDWARE performance counters such as CYCLE CSR, INSTRET CSR,
and HPMCOUNTER CSRs. A RISC-V implementation can support
monitoring various HARDWARE events using a limited number of
HPMCOUNTER CSRs.

In addition to HARDWARE performance counters, an SBI
implementation (e.g. OpenSBI, Xvisor, KVM, etc.) can
provide SOFTWARE counters for events such as number of
RFENCEs, number of IPIs, number of misaligned
load/store instructions, number of illegal instructions, etc.

We propose an SBI PMU extension which tries to cover
CYCLE CSR, INSTRET CSR, HPMCOUNTER CSRs and SOFTWARE
counters provided by the SBI implementation.
To define SBI PMU extension, we first define
counter_idx which is a logical number assigned to a
counter and event_idx which is an encoded
Is there more detail about counter_idx? I was wondering:

1. What is the ordering of logical numbers for HW and SW
counters? I think that the logical numbers are assigned by
OpenSBI.
Like mentioned here, counter_idx is a logical index for
all available counters (i.e. HARDWARE and SOFTWARE). The
SBI implementation (i.e. OpenSBI, Xvisor RISC-V, or KVM
RISC-V) can assign counter_idx to HARDWARE and SOFTWARE
counters in any order it likes.

2. How to know the logical number of counter_idx of each
HW and SW counters from s-mode? I guess that we need to
know the logical numbers of all counters before we
invoke a SBI call.
The SBI_PMU_COUNTER_DESCRIBE call mentioned below will
tell us whether given counter_idx maps to a HARDWARE
counter or SOFTWARE counter based on CSR_Number info
returned by SBI_PMU_COUNTER_DESCRIBE call.
OK, I assume the logical numbers of counter_idx are sequential
and start from zero here, so during initialization of
s-mode software, we could get the total number 'N' of
counters by SBI_PMU_NUM_COUNTERS first, then loop N times
to identify the capability of each counter.
Does this align with your ideas?
That's my understanding as well. Assigning continuous
counter_idx may put a restriction on M-mode implementation.
How about assigning some
There is no restriction on M-mode runtime firmware in assigning
counter_idx to various HARDWARE and SOFTWARE counters. In fact,
counter_idx being logical index helps M-mode software to
implement a registration mechanism.

ranges for software vs hardware counters. Maybe split the
hardware into different ranges as well based on event_idx.type.
I had done that initially but it will only increase SBI calls
because we will need separate SBI calls to determine number of
HARDWARE and SOFTWARE counters.
I was suggesting to have fixed ranges for both event types.

Also, this makes things difficult if a RISC-V implementation has a
non-standard implementation-specific CSR as a HARDWARE counter.
But I agree that it gets tricky with non-standard implementation
specific counters.

This also allows supervisor to know what type of the counter
it is looking at without parsing the data written by the
describe call.
There is no real advantage of knowing the type of counter from
counter_idx over CSR_Number returned by the
SBI_PMU_COUNTER_DESCRIBE call, because the
SBI_PMU_COUNTER_DESCRIBE call will be called only once at
boot-time for each counter, and S-mode software can mark
counters as HARDWARE/SOFTWARE at boot-time based on the
CSR_Number returned by the SBI_PMU_COUNTER_DESCRIBE call.
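
As a rough illustration of the boot-time marking described above, the
sketch below classifies counters using the CSR_Number threshold from the
proposal (CSR_Number <= 0xfff is a RISC-V CSR, anything larger is an SBI
implementation counter). It assumes the CSR_Number values were already
collected, one per counter_idx, through SBI_PMU_COUNTER_DESCRIBE; the type
and function names are hypothetical and not part of the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if CSR_Number names a RISC-V CSR (HARDWARE). */
static inline bool pmu_counter_is_hw(uint16_t csr_number)
{
    return csr_number <= 0xfff;
}

/* Hypothetical summary of one boot-time pass over all counters. */
struct pmu_counts {
    size_t hw;
    size_t sw;
};

/*
 * csr_numbers[i] is assumed to hold the CSR_Number reported for
 * counter_idx == i. is_hw[i] is filled in so later code never has to
 * repeat the describe call just to learn the counter type.
 */
static struct pmu_counts pmu_classify_all(const uint16_t *csr_numbers,
                                          size_t num_counters, bool *is_hw)
{
    struct pmu_counts counts = { 0, 0 };

    assert(csr_numbers != NULL && is_hw != NULL);
    for (size_t i = 0; i < num_counters; i++) {
        is_hw[i] = pmu_counter_is_hw(csr_numbers[i]);
        if (is_hw[i])
            counts.hw++;
        else
            counts.sw++;
    }
    return counts;
}
```

Note that nothing here relies on counter_idx values being grouped by type;
the per-counter flag array is what carries the HARDWARE/SOFTWARE split.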
My concern is that it may increase the booting time.
For example, my current x86 desktop has 1679 counters. If a RISC-V
desktop has those many counters (hopefully one day!! :)), there
will be ~2k SBI calls and memory reads just to get perf working. I
guess there will be even more counters in servers.

Moreover, supervisor OS may choose to configure only a few basic
perf counters at boot time and defer configuring everything else until
later depending on the usecase. Having a continuous logical counter_idx
may prevent that kind of optimization. Correct?
Based on the optimization you mentioned, it is good to me if we
have an SBI call to get the number of HW and SW counters respectively.
If s-mode OS can know the separate numbers, then s-mode OS can
lazily assign and query counters no matter if the counter_idx is
continuous or not. If counter_idx starts with HW counters, the
starting counter_idx of the SW counters is the number of HW counters.
Like mentioned in previous reply, any optimization possible using
fixed ranges for counter_idx can also be done using logical counter_idx.

The biggest problem with fixed ranges for counter_idx is that it will
be difficult to describe HARDWARE counters which map to an
implementation specific CSR.


I would suggest that SBI_PMU_NUM_COUNTER can take a parameter to
return the total number of all counters, the number of SW counters
only and the number of HW counters only.
This is only required if we go for fixed ranges counter_idx numbering.
The key is we need to know the range of HW counters and SW counters in
counter_idx.
Even if we use continuous logical counter_idx, we still need to know the
HW counters and SW counters respectively for lazily getting the capability
of a counter. For example, we just get the capability of basic counters at
initialization, such as cycle and instret, and then, we want to monitor a
software event at some moment, so we try to get the capability of counters
again by invoking SBI_PMU_COUNTER_DESCRIBE. At this moment, if we
know what the first counter_idx of all SW counters is, then we could ignore
the remaining counter_idx of HW counters.

We don't need to know the number of HW and SW counters respectively at
the beginning unless we are going to get the capability of all counters during
the initial phase, because we will know the number of them after that.
The only difference in SBI PMU HARDWARE and SOFTWARE counters is
how the counter value is read. For SBI PMU HARDWARE counter, the
value is read from some RISC-V CSR whereas for SBI PMU SOFTWARE
counter the value is read from some memory location. The S-mode
software can do lazy programming of memory location for SBI PMU
SOFTWARE counters using SBI_PMU_COUNTER_SET_PHYS_ADDR call.

Apart from SBI_PMU_COUNTER_SET_PHYS_ADDR call, all other SBI call
sequence is exactly same for both SBI PMU HARDWARE and SOFTWARE
counters.

I am still not convinced why we need fixed ranges counter_idx to
distinguish HARDWARE and SOFTWARE counters.

Regards,
Anup


Regards,
Anup



number representing the HARDWARE/SOFTWARE event to be
monitored.

The SBI PMU event_idx is an XLEN-bit wide number encoded as
follows:
event_idx[XLEN-1:16] = info
event_idx[15:12] = type
event_idx[11:0] = code
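
The bit layout above can be sketched as a few pack/unpack helpers. This is
only an illustration: it assumes XLEN == 64, and the macro and function
names are hypothetical (the proposal defines the field positions and the
type values 0x0, 0x1, 0x2 and 0xf, not these identifiers).

```c
#include <assert.h>
#include <stdint.h>

/* Type values taken from the proposal text; macro names are hypothetical. */
#define SBI_PMU_EVENT_TYPE_HW        0x0u
#define SBI_PMU_EVENT_TYPE_HW_CACHE  0x1u
#define SBI_PMU_EVENT_TYPE_HW_RAW    0x2u
#define SBI_PMU_EVENT_TYPE_SW        0xfu

/* Pack info (bits [63:16]), type (bits [15:12]) and code (bits [11:0]). */
static inline uint64_t sbi_pmu_event_idx(uint64_t info, unsigned type,
                                         unsigned code)
{
    assert(type <= 0xf);
    assert(code <= 0xfff);
    assert(info <= (UINT64_MAX >> 16));
    return (info << 16) | ((uint64_t)type << 12) | (uint64_t)code;
}

/* Extract the individual fields again. */
static inline uint64_t sbi_pmu_event_info(uint64_t event_idx)
{
    return event_idx >> 16;
}

static inline unsigned sbi_pmu_event_type(uint64_t event_idx)
{
    return (unsigned)((event_idx >> 12) & 0xf);
}

static inline unsigned sbi_pmu_event_code(uint64_t event_idx)
{
    return (unsigned)(event_idx & 0xfff);
}
```

For example, a SOFTWARE event (type 0xf) with code 4 and info 0 packs to
0xf004 under this layout.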

If event_idx.type == 0x0 then it is HARDWARE event.
For HARDWARE event, the event_idx.info is optional and
can be passed zero whereas the event_idx.code can be
one of the following
values:
enum sbi_pmu_hw_id {
SBI_PMU_HW_CPU_CYCLES = 0,
SBI_PMU_HW_INSTRUCTIONS = 1,
SBI_PMU_HW_CACHE_REFERENCES = 2,
SBI_PMU_HW_CACHE_MISSES = 3,
SBI_PMU_HW_BRANCH_INSTRUCTIONS = 4,
SBI_PMU_HW_BRANCH_MISSES = 5,
SBI_PMU_HW_BUS_CYCLES = 6,
SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 7,
SBI_PMU_HW_STALLED_CYCLES_BACKEND = 8,
SBI_PMU_HW_REF_CPU_CYCLES = 9,
SBI_PMU_HW_MAX, /* non-ABI */
};
(NOTE: Same as
<linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 0x1 then it is HARDWARE CACHE event.
For HARDWARE CACHE event, the event_idx.info is optional and
can be passed zero whereas the event_idx.code is encoded as
follows:
event_idx.code[11:3] = cache_id
event_idx.code[2:1] = op_id
event_idx.code[0:0] = result_id

enum sbi_pmu_hw_cache_id {
SBI_PMU_HW_CACHE_L1D = 0,
SBI_PMU_HW_CACHE_L1I = 1,
SBI_PMU_HW_CACHE_LL = 2,
SBI_PMU_HW_CACHE_DTLB = 3,
SBI_PMU_HW_CACHE_ITLB = 4,
SBI_PMU_HW_CACHE_BPU = 5,
SBI_PMU_HW_CACHE_NODE = 6,
SBI_PMU_HW_CACHE_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_id {
SBI_PMU_HW_CACHE_OP_READ = 0,
SBI_PMU_HW_CACHE_OP_WRITE = 1,
SBI_PMU_HW_CACHE_OP_PREFETCH = 2,
SBI_PMU_HW_CACHE_OP_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_result_id {
SBI_PMU_HW_CACHE_RESULT_ACCESS = 0,
SBI_PMU_HW_CACHE_RESULT_MISS = 1,
SBI_PMU_HW_CACHE_RESULT_MAX, /* non-ABI */
};
(NOTE: Same as
<linux_source>/include/uapi/linux/perf_event.h)
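
A minimal sketch of how the 12-bit code of a HARDWARE CACHE event could be
composed from cache_id, op_id and result_id, following the field positions
given above. The helper names are hypothetical; the numeric ids in the
usage note are the enum values listed in this proposal.

```c
#include <assert.h>

/* code[11:3] = cache_id, code[2:1] = op_id, code[0:0] = result_id */
static inline unsigned sbi_pmu_hw_cache_code(unsigned cache_id,
                                             unsigned op_id,
                                             unsigned result_id)
{
    assert(cache_id <= 0x1ff);  /* 9 bits */
    assert(op_id <= 0x3);       /* 2 bits */
    assert(result_id <= 0x1);   /* 1 bit  */
    return (cache_id << 3) | (op_id << 1) | result_id;
}

/* Field extractors for the reverse direction. */
static inline unsigned sbi_pmu_hw_cache_id_of(unsigned code)
{
    return (code >> 3) & 0x1ff;
}

static inline unsigned sbi_pmu_hw_cache_op_of(unsigned code)
{
    return (code >> 1) & 0x3;
}

static inline unsigned sbi_pmu_hw_cache_result_of(unsigned code)
{
    return code & 0x1;
}
```

With SBI_PMU_HW_CACHE_DTLB = 3, SBI_PMU_HW_CACHE_OP_READ = 0 and
SBI_PMU_HW_CACHE_RESULT_MISS = 1, the code is 0x19; combined with
event_idx.type == 0x1 and info == 0 that gives event_idx == 0x1019.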

If event_idx.type == 0x2 then it is HARDWARE RAW event.
For HARDWARE RAW event, both event_idx.info and
event_idx.code are platform dependent.

If event_idx.type == 0xf then it is SOFTWARE event.
For SOFTWARE event, event_idx.info is SBI
implementation specific and event_idx.code can be one
of the following:
enum sbi_pmu_sw_id {
SBI_PMU_SW_MISALIGNED_LOAD = 0,
SBI_PMU_SW_MISALIGNED_STORE = 1,
SBI_PMU_SW_ILLEGAL_INSN = 2,
SBI_PMU_SW_LOCAL_SET_TIMER = 3,
SBI_PMU_SW_LOCAL_IPI = 4,
SBI_PMU_SW_LOCAL_FENCE_I = 5,
SBI_PMU_SW_LOCAL_SFENCE_VMA = 6,
SBI_PMU_SW_LOCAL_SFENCE_VMA_ASID = 7,
SBI_PMU_SW_LOCAL_HFENCE_GVMA = 8,
SBI_PMU_SW_LOCAL_HFENCE_GVMA_VMID = 9,
SBI_PMU_SW_LOCAL_HFENCE_VVMA = 10,
SBI_PMU_SW_LOCAL_HFENCE_VVMA_ASID = 11,
SBI_PMU_SW_MAX, /* non-ABI */
};

In future, more events can be defined without breaking
ABI compatibility of SBI calls.

Using definition of counter_idx and event_idx, we can
potentially have the following SBI calls:

1. SBI_PMU_NUM_COUNTERS
This call will return the number of COUNTERs
Is it for the SW counters and we get the number of HW
counters by DT?
Or does it return the number of HW and SW counters both?
If so, how to distinguish the number of HW and SW?
This call returns total number of counters (i.e. HARDWARE
and SOFTWARE both)

The other question is that the number of SW counters is
defined by the core of OpenSBI or platform-dependent?
The number of SW counters is defined by the SBI implementation (i.e.
OpenSBI, Xvisor RISC-V, and KVM RISC-V). Most likely SW counters
will not include any platform-dependent SW counters
although this is a design choice of the SBI implementation.
OK, I got it. It would be enough, thanks.

2. SBI_PMU_COUNTER_DESCRIBE
   This call takes two parameters: 1) counter_idx 2) physical address
   It will write the description of the SBI PMU counter at the specified
   physical address. The details of the SBI PMU counter written at the
   specified physical address are as follows:
   1. Name (64 bytes)
   2. CSR_Number (2 bytes)
      (CSR_Number <= 0xfff means counter is a RISC-V CSR)
      (CSR_Number > 0xfff means counter is an SBI implementation counter)
      (E.g. CSR_Number == 0xC03 implies HPMCOUNTER3 CSR)
   3. CSR_Width (2 bytes)
      (Number of CSR bits implemented in HW)
   4. Event_Count (2 bytes)
      (Number of events in Event_List array)
   5. Event_List (2 * Event_Count bytes)
      (This is an array of 16bit values where each 16bit value is the
      supported event_idx.type and event_idx.code combination)
What is the size we should allocate for this physical
address? In my understanding, we need to allocate the
pages in s-mode first, then pass the address of the
pages as the second parameter, but we don't know the
Event_Count before we allocate the space for it, so it
might cross the boundary if Event_Count is very big.
Theoretically, Event_Count cannot be more than 65535.

I think we should have an SBI_PMU_NUM_EVENTS call which will
return the number of events supported by a given counter_idx.
This will help S-mode software to determine the amount of
memory to allocate for SBI_PMU_COUNTER_DESCRIBE.
Sounds good to me.
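
To make the sizing question concrete, here is a sketch of how S-mode
software might compute the buffer size from an event count and then parse
the description. It assumes the five fields are laid out back to back
without padding and that the 16bit fields are little-endian; the proposal
does not state either, so both are assumptions, and all names below are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PMU_DESC_NAME_LEN   64
/* Name + CSR_Number + CSR_Width + Event_Count (assumed unpadded). */
#define PMU_DESC_FIXED_LEN  (PMU_DESC_NAME_LEN + 2 + 2 + 2)

struct pmu_counter_desc {
    char name[PMU_DESC_NAME_LEN + 1]; /* always NUL-terminated here */
    uint16_t csr_number;
    uint16_t csr_width;
    uint16_t event_count;
    const uint8_t *event_list;        /* points into the raw buffer */
};

/* Bytes needed for a description carrying event_count events. */
static size_t pmu_desc_size(uint16_t event_count)
{
    return PMU_DESC_FIXED_LEN + 2u * (size_t)event_count;
}

/* Read a 16bit value, assuming little-endian byte order. */
static uint16_t pmu_rd16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is too short for its contents. */
static int pmu_desc_parse(const uint8_t *buf, size_t len,
                          struct pmu_counter_desc *out)
{
    if (len < PMU_DESC_FIXED_LEN)
        return -1;
    memcpy(out->name, buf, PMU_DESC_NAME_LEN);
    out->name[PMU_DESC_NAME_LEN] = '\0';
    out->csr_number = pmu_rd16le(buf + PMU_DESC_NAME_LEN);
    out->csr_width = pmu_rd16le(buf + PMU_DESC_NAME_LEN + 2);
    out->event_count = pmu_rd16le(buf + PMU_DESC_NAME_LEN + 4);
    if (len < pmu_desc_size(out->event_count))
        return -1;
    out->event_list = buf + PMU_DESC_FIXED_LEN;
    return 0;
}

/* i-th Event_List entry (event_idx.type and event_idx.code combination). */
static uint16_t pmu_desc_event(const struct pmu_counter_desc *d, uint16_t i)
{
    assert(i < d->event_count);
    return pmu_rd16le(d->event_list + 2u * (size_t)i);
}
```

Under these assumptions the worst case (Event_Count == 65535) needs
70 + 2 * 65535 = 131140 bytes, which is why knowing the event count before
allocating is useful.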

3. SBI_PMU_COUNTER_SET_PHYS_ADDR
   This call takes two parameters: 1) counter_idx 2) physical address
   It will set the physical address of the memory location where the SBI
   implementation will write the 64bit SOFTWARE counter. This SBI call
   is only for counters not mapped to any CSR (i.e. only for counters
   with CSR_Number > 0xfff).
4. SBI_PMU_COUNTER_START
   This call takes two parameters: 1) counter_idx 2) event_idx
   It will inform the SBI implementation to configure and start/enable
   the specified counter on the calling HART to monitor a specific event.
   This SBI call will fail for counters which are not present or when
   the specified event_idx is not supported by the counter.
5. SBI_PMU_COUNTER_STOP
   This call takes one parameter: 1) counter_idx
   It will inform the SBI implementation to stop/disable the specified
   counter on the calling HART. This SBI call will fail for counters
   which are not present.

From above, the RISC-V PMU driver will use most of the
SBI calls at boot time. Only SBI_PMU_COUNTER_START needs to
be used once before using the counter.
The counter is read by reading the CSR (for CSR_Number <= 0xfff)
OR by reading the memory location (for CSR_Number > 0xfff).
The counter overflow handling will have to be done in
software by the Linux kernel.
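
One common way to handle overflow in software is to accumulate deltas
between successive reads, masked to the implemented counter width. The
sketch below shows that idea using CSR_Width from the counter description.
It assumes the counter wraps at most once between two reads, and the
function names are hypothetical; it is not taken from the Linux kernel or
from the proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the csr_width implemented low bits of a counter. */
static inline uint64_t pmu_counter_mask(unsigned csr_width)
{
    assert(csr_width >= 1 && csr_width <= 64);
    return csr_width == 64 ? UINT64_MAX
                           : ((UINT64_C(1) << csr_width) - 1);
}

/*
 * Events counted between two raw reads of the same counter.
 * Unsigned subtraction wraps modulo 2^64, and masking reduces it
 * modulo 2^csr_width, so a single wrap-around is handled correctly.
 */
static inline uint64_t pmu_counter_delta(uint64_t prev, uint64_t now,
                                         unsigned csr_width)
{
    return (now - prev) & pmu_counter_mask(csr_width);
}
```

The same delta computation works whether the raw value came from a CSR
read or from the memory location used by a SOFTWARE counter (where the
width would be 64).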

Using the SBI PMU extension, the M-mode runtime firmware (or
Hypervisors) can provide a standardized view of
HARDWARE/SOFTWARE counters and events to S-mode (or VS-mode)
software.

The M-mode runtime firmware (OpenSBI) will need to
know the following platform dependent information:
1. Possible event_idx values allowed (or supported) by a HARDWARE
   counter (i.e. HPMCOUNTER)
2. Mapping of event_idx for HARDWARE event to HPMEVENT CSR value
3. Mapping of event_idx for HARDWARE CACHE event to HPMEVENT CSR value
4. Mapping of event_idx for HARDWARE RAW event to HPMEVENT CSR value
5. Additional platform-specific programming required by any event_idx

All platform dependent information mentioned above
can be obtained by M-mode runtime firmware (OpenSBI)
from platform specific code. The DT/ACPI can also be used to
describe 1), 2), 3), and 4) mentioned above but 5) will always
require platform specific code.
I would update the next version of DT file to describe
the points from
1) to 4). Thanks.
As you mentioned before, it would be hard to sync the
platform specific code with the DT of real use.
I prefer to get 1), 2), 3) and 4) from DT first on each
platform, and use platform specific code if DT is
unavailable. (generic platform use DT certainly), then
we could maximally reduce the inconsistency.
It should be the platform's choice how it wants to describe
HARDWARE events and HARDWARE counters. The OpenSBI generic
platform will tend to use DT based parsing of HARDWARE
events and HARDWARE counters but other platforms can do things
differently.

The S-mode software (i.e. Linux) should not get HARDWARE
events and HARDWARE counters from DT because DT describes
HARDWARE and DT will not include SOFTWARE events and
SOFTWARE counters.
Also, SOFTWARE events and SOFTWARE counters will change
for given platform as OpenSBI continues to improve so it
will be hard to keep the DT in sync.

The best thing for S-mode software would be to depend on
one method of discovering all counters and supported
events which is the SBI_PMU_COUNTER_DESCRIBE call. In
other words, there is no need for a platform driver for the Linux
RISC-V PMU driver; instead it can depend only on
sbi_probe_extension() to detect the SBI PMU extension.
OK, makes sense.

Regards,
Anup



--
Regards,
Atish
Regards,
Anup
--
Regards,
Atish

