Re: P extension instruction opcode encoding allocation

Allen Baum
 

Ken Dockser wrote a document on instruction encoding guidelines, but it does not give the actual values of the major/minor opcodes or the sub-minor (functX) fields.
It is attached.
 

On Tue, Aug 11, 2020 at 7:33 AM mark <markhimelstein@...> wrote:
Krste has said (correct me if I am wrong) that the unpriv SC owns the opcode space. I know there is a lot of overlap in the SC members but I suggest we get that SC officially in the loop.

I have CC'ed them here. 

Unpriv SC, do you have a preferred process for proposing new instruction opcode space, format, etc.? If not, I suggest you establish one.

Mark


On Tue, Aug 11, 2020 at 2:30 AM Chuanhua Chang <chchang@...> wrote:

The P extension instructions need opcode encoding space officially allocated, either in the OP opcode space or under another major opcode (such as a reserved opcode).

What is the best way to decide on this and officially allocate encoding space for the P extension instructions?

 

Thanks.

 

Chuanhua


Re: P extension instruction opcode encoding allocation

mark
 

Krste has said (correct me if I am wrong) that the unpriv SC owns the opcode space. I know there is a lot of overlap in the SC members but I suggest we get that SC officially in the loop.

I have CC'ed them here. 

Unpriv SC, do you have a preferred process for proposing new instruction opcode space, format, etc.? If not, I suggest you establish one.

Mark


On Tue, Aug 11, 2020 at 2:30 AM Chuanhua Chang <chchang@...> wrote:

The P extension instructions need opcode encoding space officially allocated, either in the OP opcode space or under another major opcode (such as a reserved opcode).

What is the best way to decide on this and officially allocate encoding space for the P extension instructions?

 

Thanks.

 

Chuanhua


csrrc/csrrs with mip, sip and uip

Simon Davidmann Imperas
 

We posted this on https://groups.google.com/a/groups.riscv.org/g/isa-dev/
but had no response in 2 weeks - so maybe this is a better place:
Looking forward to a response.
Simon

The Privileged Architecture specification describes special behavior for mip.SEIP as follows:
 
When mip is read with a CSR instruction, the value of the SEIP bit returned in the rd destination register is the logical-OR of the software-writable bit and the interrupt signal from the interrupt controller. However, the value used in the read-modify-write sequence of a CSRRS or CSRRC instruction contains only the software-writable SEIP bit, ignoring the interrupt value from the external interrupt controller.
 
The SEIP field behavior is designed to allow a higher privilege layer to mimic external interrupts cleanly, without losing any real external interrupts. The behavior of the CSR instructions is slightly modified from regular CSR accesses as a result.
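
As a minimal sketch of one literal reading of the paragraph quoted above, the following Python model tracks only the SEIP bit. The class and attribute names (SeipModel, sw_seip, ext_seip) are invented for illustration and do not come from the specification. The value returned in rd by csrrs/csrrc is an assumption here (it is formed like a plain read); whether that is the intended behavior is part of what this message asks.

```python
SEIP = 1 << 9  # SEIP bit position in mip


class SeipModel:
    """Toy model of mip.SEIP only; all names are illustrative."""

    def __init__(self):
        self.sw_seip = 0   # software-writable SEIP bit
        self.ext_seip = 0  # signal from the external interrupt controller

    def csrr(self):
        # Plain read: rd sees the OR of the software bit and the external signal.
        return SEIP if (self.sw_seip | self.ext_seip) else 0

    def _rmw(self, mask, set_bits):
        rd = self.csrr()                   # assumption: rd formed like a plain read
        old = SEIP if self.sw_seip else 0  # RMW input: software-writable bit only
        new = (old | mask) if set_bits else (old & ~mask)
        self.sw_seip = 1 if (new & SEIP) else 0
        return rd

    def csrrs(self, mask):
        return self._rmw(mask, True)

    def csrrc(self, mask):
        return self._rmw(mask, False)
```

Under this reading, a csrrs with a zero mask while the external signal is asserted returns SEIP in rd but leaves the software-writable bit clear, so the external interrupt is not latched into software state.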
 
I think this description needs improvement, because the intent is not fully clear for SEIP, or other bits. In particular:
1. What about other set-pending bits that are writable by software? For example, if the N extension is implemented, how do mip.UEIP and sip.UEIP behave?
2. For which bits does any externally-asserted interrupt contribute to the result seen in rd for csrrc or csrrs? For example, would the external value of mip.SEIP contribute to rd in this case, or is just the software-writable bit value seen?
 
As a general case, imagine that:
1. A system using the N extension is being used;
2. All interrupts are delegated to their lowest possible privilege level using mideleg and sideleg;
3. All interrupts are disabled;
4. Interrupts MEI, SEI, UEI, MTI, STI and UTI are all asserted externally (so csrr t1, mip returns 0xbb0);
5. No software pending bits are set for these interrupts.
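
The 0xbb0 value in the set up above can be checked from the mip bit positions. This is a small sketch; the bit numbers are the standard mip layout with the N extension, while the helper name mip_value is made up for the example.

```python
# Bit positions of the interrupt-pending bits in mip (with the N extension).
MIP_BITS = {
    "USIP": 0, "SSIP": 1, "MSIP": 3,
    "UTIP": 4, "STIP": 5, "MTIP": 7,
    "UEIP": 8, "SEIP": 9, "MEIP": 11,
}


def mip_value(asserted):
    """OR together the mip bits for the named pending interrupts."""
    value = 0
    for name in asserted:
        value |= 1 << MIP_BITS[name]
    return value


# MEI, SEI, UEI, MTI, STI and UTI asserted externally.
asserted = ["MEIP", "SEIP", "UEIP", "MTIP", "STIP", "UTIP"]
print(hex(mip_value(asserted)))  # 0xbb0
```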
 
Given the above set up, what value is observed in t1 in each of these cases:
 
li      t0, 0
csrrc   t1, mip, t0    // t1 = ???
csrrc   t1, sip, t0    // t1 = ???
csrrc   t1, uip, t0    // t1 = ???
 
(Note that no CSR state is changed by these instructions - only the result in t1 is of interest.)
 
And given the same set up, which (if any) software-writable bits are set by these instructions:
 
li      t0, 0
csrrs   t1, mip, t0
csrrs   t1, sip, t0
 
(In other words, what externally-asserted interrupt signal values are propagated to software-writable bits?)
 
Thanks.
 


P extension instruction opcode encoding allocation

Chuanhua Chang
 

The P extension instructions need opcode encoding space officially allocated, either in the OP opcode space or under another major opcode (such as a reserved opcode).

What is the best way to decide on this and officially allocate encoding space for the P extension instructions?

 

Thanks.

 

Chuanhua


Re: Small tweak to Privileged spec regarding PMP management?

Greg Favor
 

One could argue that the current spec and the sentence in question (with or without the suggested modification) are clear in calling out when an sfence.vma is not required.  But I agree that adding a short non-normative note would avoid any chance of misunderstanding.

Greg


On Mon, Aug 10, 2020 at 12:00 AM Allen Baum <allen.baum@...> wrote:
Do you want to add more detail about the page-based virtual memory being disabled case? 
    (that some implementations may require sfence.vma, depending on whether they do XXX with their TLB)?
That would be non-normative, but will alert designers about this corner case.

On Sun, Aug 9, 2020 at 11:45 PM Greg Favor <gfavor@...> wrote:
In section 3.6.2 of the Privileged spec discussing changing PMP settings, it currently says:
"If page-based virtual memory is not implemented, or when it is disabled, memory accesses check the PMP settings synchronously, so no fence is needed."

I would like to suggest removing "or when it is disabled" and just say:
"If page-based virtual memory is not implemented, memory accesses check the PMP settings synchronously, so no fence is needed."

The motivation is that high-performance implementations that support page-based virtual memory have TLBs and want to use them to handle all fetch/load/store memory accesses as they go down load/store execution pipelines during all modes of execution - including while in M-mode.  In the case of M-mode, they would effectively just be caching PMA/PMP permission/access control info (as well as identity address mappings).

For designs that implement page-based virtual memory and use their TLBs as described (which is generally true in high-performance designs), not requiring that M-mode software do an sfence.vma after a series of PMP CSR writes means that these CSR writes cannot simply be implemented as CSR writes; instead, each PMP CSR write also needs to perform a heavyweight sfence.vma operation.  This is both heavily redundant (across a series of PMP writes) and unnatural for an aggressive out-of-order RISC design, in which an sfence.vma operation really is a very strong fencing operation as well as a TLB invalidation operation.  (Put differently, a key point of RISC architecture is to simplify hardware in ways that software can easily and efficiently support.)

Given that M-mode software runs a lot of implementation-specific code (including code related to PMA and PMP management), this spec tweak allows some implementations to simplify their hardware design and include an sfence.vma in their M-mode PMP CSR writing code (while other implementations can choose not to include an sfence.vma in their M-mode code).  But also note that all designs need to at least selectively do an sfence.vma (per section 3.6.2), so this essentially means that the M-mode code would simply always do an sfence.vma after a series of PMP writes.

Lastly, note that this change is backward compatible in that software that does do an sfence.vma after PMP changes will run on "old" designs that support page-based virtual memory yet access PMAs and PMPs inline with load/store execution while in M-mode.

Any objections to this simple accommodation for high-performance CPU designs?

Greg


Re: Small tweak to Privileged spec regarding PMP management?

Allen Baum
 

Do you want to add more detail about the page-based virtual memory being disabled case? 
    (that some implementations may require sfence.vma, depending on whether they do XXX with their TLB)?
That would be non-normative, but will alert designers about this corner case.

On Sun, Aug 9, 2020 at 11:45 PM Greg Favor <gfavor@...> wrote:
In section 3.6.2 of the Privileged spec discussing changing PMP settings, it currently says:
"If page-based virtual memory is not implemented, or when it is disabled, memory accesses check the PMP settings synchronously, so no fence is needed."

I would like to suggest removing "or when it is disabled" and just say:
"If page-based virtual memory is not implemented, memory accesses check the PMP settings synchronously, so no fence is needed."

The motivation is that high-performance implementations that support page-based virtual memory have TLBs and want to use them to handle all fetch/load/store memory accesses as they go down load/store execution pipelines during all modes of execution - including while in M-mode.  In the case of M-mode, they would effectively just be caching PMA/PMP permission/access control info (as well as identity address mappings).

For designs that implement page-based virtual memory and use their TLBs as described (which is generally true in high-performance designs), not requiring that M-mode software do an sfence.vma after a series of PMP CSR writes means that these CSR writes cannot simply be implemented as CSR writes; instead, each PMP CSR write also needs to perform a heavyweight sfence.vma operation.  This is both heavily redundant (across a series of PMP writes) and unnatural for an aggressive out-of-order RISC design, in which an sfence.vma operation really is a very strong fencing operation as well as a TLB invalidation operation.  (Put differently, a key point of RISC architecture is to simplify hardware in ways that software can easily and efficiently support.)

Given that M-mode software runs a lot of implementation-specific code (including code related to PMA and PMP management), this spec tweak allows some implementations to simplify their hardware design and include an sfence.vma in their M-mode PMP CSR writing code (while other implementations can choose not to include an sfence.vma in their M-mode code).  But also note that all designs need to at least selectively do an sfence.vma (per section 3.6.2), so this essentially means that the M-mode code would simply always do an sfence.vma after a series of PMP writes.

Lastly, note that this change is backward compatible in that software that does do an sfence.vma after PMP changes will run on "old" designs that support page-based virtual memory yet access PMAs and PMPs inline with load/store execution while in M-mode.

Any objections to this simple accommodation for high-performance CPU designs?

Greg


Small tweak to Privileged spec regarding PMP management?

Greg Favor
 

In section 3.6.2 of the Privileged spec discussing changing PMP settings, it currently says:
"If page-based virtual memory is not implemented, or when it is disabled, memory accesses check the PMP settings synchronously, so no fence is needed."

I would like to suggest removing "or when it is disabled" and just say:
"If page-based virtual memory is not implemented, memory accesses check the PMP settings synchronously, so no fence is needed."

The motivation is that high-performance implementations that support page-based virtual memory have TLBs and want to use them to handle all fetch/load/store memory accesses as they go down load/store execution pipelines during all modes of execution - including while in M-mode.  In the case of M-mode, they would effectively just be caching PMA/PMP permission/access control info (as well as identity address mappings).

For designs that implement page-based virtual memory and use their TLBs as described (which is generally true in high-performance designs), not requiring that M-mode software do an sfence.vma after a series of PMP CSR writes means that these CSR writes cannot simply be implemented as CSR writes; instead, each PMP CSR write also needs to perform a heavyweight sfence.vma operation.  This is both heavily redundant (across a series of PMP writes) and unnatural for an aggressive out-of-order RISC design, in which an sfence.vma operation really is a very strong fencing operation as well as a TLB invalidation operation.  (Put differently, a key point of RISC architecture is to simplify hardware in ways that software can easily and efficiently support.)

Given that M-mode software runs a lot of implementation-specific code (including code related to PMA and PMP management), this spec tweak allows some implementations to simplify their hardware design and include an sfence.vma in their M-mode PMP CSR writing code (while other implementations can choose not to include an sfence.vma in their M-mode code).  But also note that all designs need to at least selectively do an sfence.vma (per section 3.6.2), so this essentially means that the M-mode code would simply always do an sfence.vma after a series of PMP writes.
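
To illustrate the redundancy argument, here is a toy Python model, not a description of any real implementation. It assumes a hart whose TLB caches the PMP decision per region, and compares a design where every PMP write implies a flush with one where software issues a single sfence.vma after a batch of writes. ToyHart, reprogram and the fence_on_pmp_write flag are hypothetical names.

```python
class ToyHart:
    """Toy hart whose TLB caches PMP permissions; purely illustrative."""

    def __init__(self, fence_on_pmp_write):
        self.fence_on_pmp_write = fence_on_pmp_write
        self.pmp = {}      # region id -> access allowed (bool)
        self.tlb = {}      # region id -> cached copy of the PMP decision
        self.flushes = 0   # number of sfence.vma-like operations performed

    def sfence_vma(self):
        self.tlb.clear()
        self.flushes += 1

    def write_pmp(self, region, allowed):
        self.pmp[region] = allowed
        if self.fence_on_pmp_write:
            self.sfence_vma()  # hardware-implied fence on every PMP write

    def access(self, region):
        if region not in self.tlb:
            self.tlb[region] = self.pmp.get(region, False)
        return self.tlb[region]


def reprogram(hart, updates, explicit_fence):
    """Apply a series of PMP writes, optionally followed by one fence."""
    for region, allowed in updates.items():
        hart.write_pmp(region, allowed)
    if explicit_fence:
        hart.sfence_vma()
```

In this model, revoking four regions costs four flushes when the hardware fences on each write, and one flush when software fences once at the end. Skipping the fence on the second kind of design leaves stale permissions in the TLB.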

Lastly, note that this change is backward compatible in that software that does do an sfence.vma after PMP changes will run on "old" designs that support page-based virtual memory yet access PMAs and PMPs inline with load/store execution while in M-mode.

Any objections to this simple accommodation for high-performance CPU designs?

Greg


Proposed WG: RISC V needs CMOs, and hence a CMO Working Group

Andy Glew Si5
 

RISC V needs CMOs, and hence a CMO Working Group





All successful computer instruction sets have Cache Management Operations (CMOs).

Several RISC-V systems have already defined implementation-specific CMO instructions. It is desirable to have standard CMO instructions to facilitate portable software.

CMOs do things like flushing dirty data and invalidating clean data for use cases that include non-coherent DMA I/O, security (e.g. Spectre), power management (flush to battery backed-up DRAM), persistence (flush to NVRAM), and more.

CMOs cut across several problem domains. It is desirable to have a consistent approach, rather than different idiosyncratic instructions for different problem domains. RISC-V therefore needs a CMO working group that will coordinate with any working groups in those overlapping domains.



Administrivia

2020/8/5: Email proposing this will soon be sent to the RISC-V Technical Steering Committee and other mailing lists, seeking approval of the formation of such a CMO working group.

Here linked is a wiki version of the WG proposal RISC V needs CMOs, and hence a CMO Working Group. Also a CMOs WG Draft Proposed Charter - although probably too long.

Assuming the CMO WG is approved:

Please indicate if you are interested by replying to this email (to me, Andy Glew). To facilitate scheduling of meetings, please indicate your timezone.

A risc.org mailing list should be set up soon.

We have already set up https://github.com/riscv/riscv-CMOs, and will arrange permissions for working group members as soon as possible.

Here linked is a CMOs WG Draft Proposed Charter.

Proposals:

  • At least one CMO proposal has been developed in some detail. It is linked to from https://github.com/riscv/riscv-CMOs, and may soon be moved to this official place.
  • We welcome: Other proposals, and/or examples of implementation specific CMO extensions already implemented




I look forward to meeting other folks interested in CMOs!


--- Sorry: Typos (Speech-Os?) Writing Errors <= Speech Recognition <= Computeritis


Re: A proposal to enhance RISC-V HPM (Hardware Performance Monitor)

Jonathan Behrens <behrensj@...>
 

I'd also strongly argue for there only being a single configuration for both virtualized and non-virtualized systems. The fewer different cases that software has to handle, the better for everyone. This is especially true for smaller projects like research/teaching operating systems, which don't have the resources of something like the Linux kernel.

There is a long history of paravirtualization (virtualization where the guest kernel knows it is in a VM), but that is generally only done in cases where the original interface was poorly designed, or for exposing features that don't make sense on actual hardware. I don't think it should be used here.

Jonathan


On Wed, Aug 5, 2020 at 5:23 AM Anup Patel via lists.riscv.org <anup.patel=wdc.com@...> wrote:

Hi Alan,

 

My statement “Allowing S-mode to write HPMCOUNTER CSR is good but won’t benefit much.” is because:

  1. Linux PMU updates the counter value infrequently, only in the start() callback
  2. The SBI_PMU_COUNTER_START/STOP calls will be used to turn counting on and off in the start() and stop() callbacks respectively
  3. Due to points 1 and 2, we can easily set the HPMCOUNTER value in the SBI implementation via the SBI_PMU_COUNTER_START call.

 

Regards,

Anup

 

From: tech-privileged@... <tech-privileged@...> On Behalf Of alankao
Sent: 05 August 2020 14:11
To: tech-privileged@...
Subject: Re: [RISC-V] [tech-privileged] A proposal to enhance RISC-V HPM (Hardware Performance Monitor)

 

Hi Anup,

 

> The “bypass-sbi” DT property will break QEMU virt machine

 

No, it won’t.  Why should QEMU virt machine’s PMU follow this flag?  The platform can totally choose not to support the attribute of its PMU.

 

 

Of course, I know the SBI route works.  If you don’t see the benefit, on what basis did you say the feature is good? 

Your words were “Allowing S-mode to write HPMCOUNTER CSR is good but won’t benefit much.” So the issue here is if it will benefit much or marginally.

Writing from S-mode can skip the whole M-mode part every time the kernel wants to write CSRs.  Isn't that obvious?

 

Also, you didn't comment on my final part yet.

 

Thanks,

Alan


Re: A proposal to enhance RISC-V HPM (Hardware Performance Monitor)

Anup Patel
 

Hi Alan,

 

My statement “Allowing S-mode to write HPMCOUNTER CSR is good but won’t benefit much.” is because:

  1. Linux PMU updates the counter value infrequently, only in the start() callback
  2. The SBI_PMU_COUNTER_START/STOP calls will be used to turn counting on and off in the start() and stop() callbacks respectively
  3. Due to points 1 and 2, we can easily set the HPMCOUNTER value in the SBI implementation via the SBI_PMU_COUNTER_START call.
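
The three points above can be sketched as a toy Python model that counts S-mode to M-mode transitions. It assumes, as argued elsewhere in this thread, that turning counting on always needs an SBI call. SBI_PMU_COUNTER_START is the call named above; the Python function signature, the initial_value parameter and the ToyFirmware class are hypothetical.

```python
class ToyFirmware:
    """Stands in for the M-mode SBI implementation; counts SBI calls."""

    def __init__(self):
        self.ecalls = 0       # number of S-mode -> M-mode transitions
        self.counters = {}    # counter index -> current value
        self.running = set()  # counters that are currently counting

    def sbi_pmu_counter_start(self, idx, initial_value=None):
        self.ecalls += 1
        if initial_value is not None:
            self.counters[idx] = initial_value  # value written in M-mode
        self.running.add(idx)


def start_via_sbi_only(fw, idx, value):
    # The start value travels with the SBI call that enables counting.
    fw.sbi_pmu_counter_start(idx, initial_value=value)


def start_with_smode_write(fw, idx, value):
    fw.counters[idx] = value       # models a direct S-mode CSR write
    fw.sbi_pmu_counter_start(idx)  # still needed to turn counting on
```

Under these assumptions both paths make exactly one SBI call per start(), which is the sense in which a direct S-mode write would not save a trap at that point.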

 

Regards,

Anup

 

From: tech-privileged@... <tech-privileged@...> On Behalf Of alankao
Sent: 05 August 2020 14:11
To: tech-privileged@...
Subject: Re: [RISC-V] [tech-privileged] A proposal to enhance RISC-V HPM (Hardware Performance Monitor)

 

Hi Anup,

 

> The “bypass-sbi” DT property will break QEMU virt machine

 

No, it won’t.  Why should QEMU virt machine’s PMU follow this flag?  The platform can totally choose not to support the attribute of its PMU.

 

 

Of course, I know the SBI route works.  If you don’t see the benefit, on what basis did you say the feature is good? 

Your words were “Allowing S-mode to write HPMCOUNTER CSR is good but won’t benefit much.” So the issue here is if it will benefit much or marginally.

Writing from S-mode can skip the whole M-mode part every time the kernel wants to write CSRs.  Isn't that obvious?

 

Also, you didn't comment on my final part yet.

 

Thanks,

Alan


Re: A proposal to enhance RISC-V HPM (Hardware Performance Monitor)

alankao
 

Hi Anup,

 

> The “bypass-sbi” DT property will break QEMU virt machine

 

No, it won’t.  Why should QEMU virt machine’s PMU follow this flag?  The platform can totally choose not to support the attribute of its PMU.

 

 

Of course, I know the SBI route works.  If you don’t see the benefit, on what basis did you say the feature is good? 

Your words were “Allowing S-mode to write HPMCOUNTER CSR is good but won’t benefit much.” So the issue here is if it will benefit much or marginally.

Writing from S-mode can skip the whole M-mode part every time the kernel wants to write CSRs.  Isn't that obvious?

 

Also, you didn't comment on my final part yet.

 

Thanks,

Alan


Re: A proposal to enhance RISC-V HPM (Hardware Performance Monitor)

Anup Patel
 

The “bypass-sbi” DT property will break QEMU virt machine for KVM because the same QEMU virt machine is used with both TCG and KVM acceleration. This is yet another work-around for doing things differently for HS-mode and VS-mode in the Linux PMU driver, because the kernel has no way of knowing which mode it is running in (HS-mode or VS-mode).

 

The SBI_PMU_COUNTER_START() call can’t be bypassed because the bits which turn counting on or off are accessible to M-mode only. This SBI_PMU_COUNTER_START() call also takes the counter start value provided by the Linux PMU framework. We can easily write the counter start value in the SBI implementation (M-mode runtime firmware) instead of writing it in S-mode. I still don’t see the benefit of directly writing the counter value in S-mode.

 

Regards,

Anup

 

From: tech-privileged@... <tech-privileged@...> On Behalf Of alankao
Sent: 05 August 2020 13:16
To: tech-privileged@...
Subject: Re: [RISC-V] [tech-privileged] A proposal to enhance RISC-V HPM (Hardware Performance Monitor)

 

> I think I am repeating myself here but still don’t see any benefit of allowing HPMCOUNTER CSR write access to S-mode. On the contrary, it will make context switching expensive for hypervisors.
I totally am aware of that.  That's why I don't think the same route should be taken in a guest OS.  But since the kernel should not be able to have that knowledge, that is not possible.  I am sorry that I mentioned a condition that confused everyone who is interested in the H extension. 

Then, please allow me to refine my statement in my previous comment:

> (I asked) Is there any way for a kernel to detect if it is in S-mode or VS-mode?

The answer is no, and it should never be allowed to.

> (I derived) If so, then just don't go that route when writing HPM CSRs.  

But, I would like to claim that this is still possible.  As follows,

> (I stated) Just like what we did with CONFIG_FPU, I suggest we can set CONFIG_RESTRICT_HPM_REG_ACCESS or something true by default, and detect if there is some attribute like "bypass-sbi" in a PMU node.  With static key feature and the followups, this runtime check is not expensive.

Let me elaborate more. The CONFIG_FPU is true by default.  During system booting, the ISA string in hart nodes in the DTB is checked.  If "fd" is present, then a global variable has_fpu is set to true; otherwise, false.  Later, during signal handling or context switch, has_fpu is checked.

Why can't we follow the same path? The CONFIG_RESTRICT_HPM_REG_ACCESS is true by default.  During system booting, the "bypass-sbi" attribute in PMU nodes in the DTB is checked.  If present, then a global variable mhpm_writable is set to true; otherwise, false.  Later, in PMU life cycles, the variable is checked.  With that flag, the way to update the desired registers can be decided.  Since the key is in the DTB, you can decide that the PMU you are using contains no extra attributes like "bypass-sbi".


Re: A proposal to enhance RISC-V HPM (Hardware Performance Monitor)

alankao
 

> I think I am repeating myself here but still don’t see any benefit of allowing HPMCOUNTER CSR write access to S-mode. On the contrary, it will make context switching expensive for hypervisors.
I totally am aware of that.  That's why I don't think the same route should be taken in a guest OS.  But since the kernel should not be able to have that knowledge, that is not possible.  I am sorry that I mentioned a condition that confused everyone who is interested in the H extension. 

Then, please allow me to refine my statement in my previous comment:

> (I asked) Is there any way for a kernel to detect if it is in S-mode or VS-mode?

The answer is no, and it should never be allowed to.

> (I derived) If so, then just don't go that route when writing HPM CSRs.  

But, I would like to claim that this is still possible.  As follows,

> (I stated) Just like what we did with CONFIG_FPU, I suggest we can set CONFIG_RESTRICT_HPM_REG_ACCESS or something true by default, and detect if there is some attribute like "bypass-sbi" in a PMU node.  With static key feature and the followups, this runtime check is not expensive.

Let me elaborate more. The CONFIG_FPU is true by default.  During system booting, the ISA string in hart nodes in the DTB is checked.  If "fd" is present, then a global variable has_fpu is set to true; otherwise, false.  Later, during signal handling or context switch, has_fpu is checked.

Why can't we follow the same path? The CONFIG_RESTRICT_HPM_REG_ACCESS is true by default.  During system booting, the "bypass-sbi" attribute in PMU nodes in the DTB is checked.  If present, then a global variable mhpm_writable is set to true; otherwise, false.  Later, in PMU life cycles, the variable is checked.  With that flag, the way to update the desired registers can be decided.  Since the key is in the DTB, you can decide that the PMU you are using contains no extra attributes like "bypass-sbi".
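
The boot-time check described above might look roughly like the following, written as a Python sketch rather than kernel C. The device tree node is modelled as a plain dict; "bypass-sbi" and mhpm_writable come from this message, while probe_pmu, ToyPmuDriver, the log format and the "riscv,pmu" compatible string in the usage lines are hypothetical.

```python
def probe_pmu(pmu_node):
    """Return True if the PMU node carries the "bypass-sbi" attribute."""
    return "bypass-sbi" in pmu_node


class ToyPmuDriver:
    """Chooses how counter values are written, based on a boot-time flag."""

    def __init__(self, pmu_node):
        # Checked once during boot, like has_fpu for the "fd" ISA string.
        self.mhpm_writable = probe_pmu(pmu_node)
        self.log = []

    def set_counter(self, idx, value):
        if self.mhpm_writable:
            self.log.append(("csr_write", idx, value))  # direct S-mode write
        else:
            self.log.append(("sbi_call", idx, value))   # go through M-mode


# Usage: two hypothetical PMU nodes, with and without the attribute.
direct = ToyPmuDriver({"compatible": "riscv,pmu", "bypass-sbi": True})
via_sbi = ToyPmuDriver({"compatible": "riscv,pmu"})
direct.set_counter(3, 0)
via_sbi.set_counter(3, 0)
```

A platform that should never take the direct path, such as a virtual machine, would simply omit the attribute from its PMU node.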


Re: A proposal to enhance RISC-V HPM (Hardware Performance Monitor)

Anup Patel
 

Yes, there is no way for kernel to know whether it is running in HS-mode or VS-mode. We would like to keep it that way.

 

Regards,

Anup

 

From: tech-privileged@... <tech-privileged@...> On Behalf Of Allen Baum
Sent: 05 August 2020 11:19
To: alankao <alankao@...>
Cc: tech-privileged@...
Subject: Re: [RISC-V] [tech-privileged] A proposal to enhance RISC-V HPM (Hardware Performance Monitor)

 

I may be dense (I won't take a poll on that though), but if a kernel can detect that it is in S-mode vs. VS-mode, that sounds like a buggy virtualization scheme. The kernel should (appear to) be in whatever mode it thinks it is, even if it is really in VU mode (with appropriate hypervisor work, and  performance penalties, etc).

 

On Tue, Aug 4, 2020 at 9:52 PM alankao <alankao@...> wrote:

Hi Anup,

> Linux PMU driver framework only updates counter value in “add()” or “start()” callback. That’s why allowing S-mode to write HPMCOUNTER CSRs won’t provide much benefit.

It doesn't matter where the kernel does the update.  What matters is how often kernel does add() and start().  Anyway, it may take a while but we will do the experiment mentioned in previous thread to give real evidence here.

> We should avoid a kernel feature which needs to be explicitly enabled by users and distros keeping it disabled by default. The “#ifdef” based feature checking should be replaced by runtime feature checking based on device tree OR something else.

Is there any way for a kernel to detect if it is in S-mode or VS-mode?  If so, then just don't go that route when writing HPM CSRs.  Just like what we did with CONFIG_FPU, I suggest we can set CONFIG_RESTRICT_HPM_REG_ACCESS or something true by default, and detect if there is some attribute like "bypass-sbi" in a PMU node.  With static key feature and the followups, this runtime check is not expensive.


Re: A proposal to enhance RISC-V HPM (Hardware Performance Monitor)

Anup Patel
 

Hi Alan,

 

I think I am repeating myself here but still don’t see any benefit of allowing HPMCOUNTER CSR write access to S-mode. On the contrary, it will make context switching expensive for hypervisors.

 

The SBI PMU extension is designed such that the kernel need not be aware of HS-mode or VS-mode. For the Guest kernel (VS-mode), SBI is provided by the Hypervisor, so the Hypervisor will act as mediator for the Guest kernel. For the Host kernel (HS-mode), the SBI provider is M-mode runtime firmware (OpenSBI). The Linux PMU driver should not care which mode it is running in (HS-mode or VS-mode). The same rationale applies here: we use a single Linux RISC-V kernel image for both Guest and Host.

 

Regards,

Anup

 

From: tech-privileged@... <tech-privileged@...> On Behalf Of alankao
Sent: 05 August 2020 10:23
To: tech-privileged@...
Subject: Re: [RISC-V] [tech-privileged] A proposal to enhance RISC-V HPM (Hardware Performance Monitor)

 

Hi Anup,

> Linux PMU driver framework only updates counter value in “add()” or “start()” callback. That’s why allowing S-mode to write HPMCOUNTER CSRs won’t provide much benefit.

It doesn't matter where the kernel does the update.  What matters is how often kernel does add() and start().  Anyway, it may take a while but we will do the experiment mentioned in previous thread to give real evidence here.

> We should avoid a kernel feature which needs to be explicitly enabled by users and distros keeping it disabled by default. The “#ifdef” based feature checking should be replaced by runtime feature checking based on device tree OR something else.

Is there any way for a kernel to detect if it is in S-mode or VS-mode?  If so, then just don't go that route when writing HPM CSRs.  Just like what we did with CONFIG_FPU, I suggest we can set CONFIG_RESTRICT_HPM_REG_ACCESS or something true by default, and detect if there is some attribute like "bypass-sbi" in a PMU node.  With static key feature and the followups, this runtime check is not expensive.


Re: A proposal to enhance RISC-V HPM (Hardware Performance Monitor)

Allen Baum
 

I may be dense (I won't take a poll on that though), but if a kernel can detect that it is in S-mode vs. VS-mode, that sounds like a buggy virtualization scheme. The kernel should (appear to) be in whatever mode it thinks it is, even if it is really in VU mode (with appropriate hypervisor work, and  performance penalties, etc).

On Tue, Aug 4, 2020 at 9:52 PM alankao <alankao@...> wrote:
Hi Anup,

> Linux PMU driver framework only updates counter value in “add()” or “start()” callback. That’s why allowing S-mode to write HPMCOUNTER CSRs won’t provide much benefit.

It doesn't matter where the kernel does the update.  What matters is how often kernel does add() and start().  Anyway, it may take a while but we will do the experiment mentioned in previous thread to give real evidence here.

> We should avoid a kernel feature which needs to be explicitly enabled by users and distros keeping it disabled by default. The “#ifdef” based feature checking should be replaced by runtime feature checking based on device tree OR something else.

Is there any way for a kernel to detect if it is in S-mode or VS-mode?  If so, then just don't go that route when writing HPM CSRs.  Just like what we did with CONFIG_FPU, I suggest we can set CONFIG_RESTRICT_HPM_REG_ACCESS or something true by default, and detect if there is some attribute like "bypass-sbi" in a PMU node.  With static key feature and the followups, this runtime check is not expensive.


Re: A proposal to enhance RISC-V HPM (Hardware Performance Monitor)

alankao
 

Hi Anup,

> Linux PMU driver framework only updates counter value in “add()” or “start()” callback. That’s why allowing S-mode to write HPMCOUNTER CSRs won’t provide much benefit.

It doesn't matter where the kernel does the update.  What matters is how often kernel does add() and start().  Anyway, it may take a while but we will do the experiment mentioned in previous thread to give real evidence here.

> We should avoid a kernel feature which needs to be explicitly enabled by users and distros keeping it disabled by default. The “#ifdef” based feature checking should be replaced by runtime feature checking based on device tree OR something else.

Is there any way for a kernel to detect if it is in S-mode or VS-mode?  If so, then just don't go that route when writing HPM CSRs.  Just like what we did with CONFIG_FPU, I suggest we can set CONFIG_RESTRICT_HPM_REG_ACCESS or something true by default, and detect if there is some attribute like "bypass-sbi" in a PMU node.  With static key feature and the followups, this runtime check is not expensive.


Re: A proposal to enhance RISC-V HPM (Hardware Performance Monitor)

Anup Patel
 

Hi Alan,

 

I never said HPM overflow interrupt is not important. The MHPMOVERFLOW CSR proposed by Greg is perfectly fine.

 

I think you missed my point regarding the H-extension. If S-mode is allowed to write HPMCOUNTER CSRs directly, then for the H-extension we will need additional VSHPMCOUNTER CSRs so that the hypervisor can context-switch them. We can avoid a lot of these CSRs by keeping HPMCOUNTER CSRs read-only for S-mode. The initialization/restoration of the HPMCOUNTER value can be done through the SBI_PMU_COUNTER_START call, and this integrates well with the Linux PMU driver framework too. The Linux PMU driver framework only updates the counter value in the “add()” or “start()” callback. That’s why allowing S-mode to write HPMCOUNTER CSRs won’t provide much benefit.

 

Regarding a single Linux RISC-V image for all platforms, this is a requirement from various distros and the Linux RISC-V maintainers. We should avoid a kernel feature that has to be explicitly enabled by users while distros keep it disabled by default. The “#ifdef” based feature checking should be replaced by runtime feature checking based on the device tree OR something else.

 

Regards,

Anup

 

From: tech-privileged@... <tech-privileged@...> On Behalf Of alankao
Sent: 05 August 2020 07:47
To: tech-privileged@...
Subject: Re: [RISC-V] [tech-privileged] A proposal to enhance RISC-V HPM (Hardware Performance Monitor)

 

Hi Anup,

> The most desired feature from a PMU is counting events in right-context (or right-mode). This is not clearly defined in RISC-V spec right now. Greg’s proposal already address this in a clean way by defining required bits in MHPMEVENT CSRs.  Other important feature expected from a PMU is reading counters without traps, this is already available because HPMCOUNTER CSRs are “User-Read-Only”

 
Claiming some features as "most desired" is too subjective.  I agree that mode-specific counting is important, but for performance monitoring, the HPM interrupt is also essential.  Otherwise, sampling like `perf record` just doesn't work.

> Regarding HPMCOUNTER writes from S-mode, the Linux PMU drivers (across architectures) only writes PMU counter at time of configuring the counter. We anyway have SBI call to configure a RISC-V PMU counter because MHPMEVENT CSR is for M-mode only so it is better to initialize/write the MHPMCOUNTER CSR in M-mode at time of configuring the counter. Allowing S-mode to write HPMCOUNTER CSR is good but won’t benefit much. On the contrary, RISC-V hypervisors might end up save/restore more CSRs if HPMCOUNTER CSR is writeable from S-mode.

You are understating the case when you say "only writes counter at configuration time."  That makes it sound as if the kernel seldom writes them.  In fact, the counters and other registers need to be written every time the corresponding process is context-switched in and every time an HPM interrupt is handled.  Could you elaborate on what basis you say that writing counters from S-mode is good, and on what basis you judge that it won't benefit much?  

However, since RISC-V does not yet have any of the features under discussion, I doubt that anyone has any quantitative data.  My proposal is to take our existing solution (AndeStar V5 extension and the perf-event port on Linux 4.17) as the testbed.  By default, we have *mcounterwen*, which is effectively equivalent to bit[28] in Greg's proposal, to enable S-mode to write HPM CSRs. This is the treatment group.  Then we apply a patch that turns every existing csr_write to an HPM CSR (including the counters) into an SBI call; this is the control group.  I expect the wall-clock time of a sampling run in the treatment group to be more than marginally shorter than in the control group.
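
To illustrate why sampling rewrites counters so often, here is a small sketch of the usual reload technique: the counter is programmed to 2^64 - period so that it wraps (and raises the overflow interrupt) after exactly `period` events, and the handler re-arms it each time.  This is a plain-C simulation, not our driver code; sample_reload_value and simulate_samples are hypothetical names, and a 64-bit up-counter with an overflow interrupt on wrap is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Value to program so that a 64-bit up-counter wraps after `period` events. */
uint64_t sample_reload_value(uint64_t period)
{
    assert(period != 0);
    return (uint64_t)0 - period;  /* unsigned wrap-around: 2^64 - period */
}

/* Simulate `events` counted events.  Returns the number of overflow
 * interrupts (samples) and stores the number of counter writes. */
uint64_t simulate_samples(uint64_t period, uint64_t events, uint64_t *writes)
{
    uint64_t counter = sample_reload_value(period);
    uint64_t samples = 0;

    *writes = 1;  /* initial programming in add()/start() */
    for (uint64_t i = 0; i < events; i++) {
        counter++;
        if (counter == 0) {  /* wrapped: overflow interrupt fires */
            samples++;
            counter = sample_reload_value(period);
            (*writes)++;     /* the handler rewrites the counter */
        }
    }
    return samples;
}
```

In this model every sample costs one counter write, so with a period of 1000 and 10000 events there are 10 samples and 11 writes.  Whether each of those writes is a csr_write or an SBI call is exactly the difference between the two groups.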

Meanwhile, I agree with your concern about the H extension.  That's why I emphasized that this feature3 is useful for an M-S-U configuration and questionable for an M-H-S-U one.

> The code snippet mentioned below requires “#ifdef” which means we have to build Linux RISC-V image differently for doing CSR writes this way. This approach is not going to fly for distros because distros can’t release single Linux RISC-V image for all RISC-V hardware if we have such “#ifdef”.

Each distro maintains its own priorities among hundreds of thousands of kernel features, not to mention the many nameless "distributions" that different teams release as their BSPs, which do the same thing.  The diversity of features is the reason that so many distributions rise and fall, compete and cooperate.  Therefore, what we should debate is not what distros that support RISC-V should do about this possible divergence: I am totally fine with this CONFIG_RESTRICT_MREG_ACESS being off by default!  Big ones like Fedora and Debian aim at desktop or server, and that's good.  What we should really debate here is the feature itself: whether it is useful enough for some, not all, possible RISC-V machines to make people's lives easier.  

For the record, distributions can simply release a single image that disables this feature by default.  That's their choice, because they expect a sizable share of installations to run as a guest OS, where the feature should not be enabled or there will be a lot more work in the hypervisor.  The image can still run on any RISC-V machine, whether or not it supports bit[28]/mcounterwen.  You are not making a valid point here.

Best,
Alan


Re: P extension fixed-point saturation flag CSR

Chuanhua Chang
 

Thanks. I will update the P extension specification to use the vxsat CSR.

Best Regards,
Chuanhua


Re: P extension fixed-point saturation flag CSR

Krste Asanovic
 

Yes, they should share the flag.  Code that has an outer check for saturation, to decide whether re-scaling is needed, won’t want to care whether the saturation was caused by a scalar or a vector routine.
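
A rough sketch of that usage pattern, simulated in plain C: one sticky flag (standing in for the shared vxsat CSR) is set by both a scalar and a "vector" saturating add, and the outer code checks it once.  The names sat_flag, sat_add16, vec_sat_add16 and needs_rescale are hypothetical, and the loop merely stands in for real vector instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated sticky saturation flag, standing in for the shared vxsat CSR. */
bool sat_flag;

/* Scalar saturating add: clamps to the int16_t range and sets the flag. */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) { sat_flag = true; return INT16_MAX; }
    if (sum < INT16_MIN) { sat_flag = true; return INT16_MIN; }
    return (int16_t)sum;
}

/* Stand-in for a vector routine: element-wise saturating add that
 * reports saturation through the same flag. */
void vec_sat_add16(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = sat_add16(a[i], b[i]);
}

/* Outer check: clear the flag, run a mix of scalar and vector work,
 * then test the flag once to decide whether re-scaling is needed. */
bool needs_rescale(int16_t sa, int16_t sb,
                   int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    sat_flag = false;
    (void)sat_add16(sa, sb);
    vec_sat_add16(dst, a, b, n);
    return sat_flag;  /* true if either routine saturated */
}
```

With separate flags, the outer code would have to read and combine two CSRs; with a shared one, a single read after the whole computation suffices.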

Krste

On Aug 4, 2020, at 7:54 PM, Andrew Waterman <andrew@...> wrote:

Analogous to the F and V extensions sharing fflags/frm, I’d think that P and V would share vxsat/vxrm.

The reasoning is slightly different from the floating-point CSRs; in that case, sharing the registers helps vectorized code to efficiently implement C99 semantics for dynamic rounding mode and exception flags. That concern doesn’t apply to fixed point (at least not today). But the other advantages still apply, namely, smaller context and fewer CSRs.

On Tue, Aug 4, 2020 at 7:36 PM Chuanhua Chang <chchang@...> wrote:
The P extension specification defines a fixed-point saturation flag CSR. We need to allocate it officially in the user standard read/write address range. Should we use the same CSR for the V and P extensions? The current vector extension spec defines a "vxsat" CSR that can be used for both P and V extensions.

Best Regards,
Chuanhua


