Proposal : Hart Suspend Extension for IDLE
liush@...
Hi ALL,
Since the specification currently has no extension that the cpuidle module can use, I propose adding a new extension, as Anup also mentioned on the Linux mailing list. The new extension is used to move a hart or cluster into a lower power state, which indicates that there is no current work for it until a new event wakes it up. Here I propose two solutions; discussion is welcome:
Solution 1: Similar to the CPU_SUSPEND interface defined in the PSCI specification, create a new extension in the SBI specification: the HART SUSPEND Extension, Extension ID 0x485350 (HSP).
Solution 2: Given that the idle state is one of the power states of a hart, add a new function to the HSM (Hart State Management) Extension, shown as follows:
Function Name       | Function ID | Extension ID
sbi_hart_start      | 0           | 0x48534D
sbi_hart_stop       | 1           | 0x48534D
sbi_hart_get_status | 2           | 0x48534D
sbi_hart_suspend    | 3           | 0x48534
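As a rough illustration of the numbering in the two solutions, here is a minimal C sketch. The extension IDs and function IDs are taken from the proposal; the macro and enum names and the helper `sbi_ext_name` are hypothetical and are not part of any SBI specification. The helper only shows that each extension ID packs a three-letter ASCII mnemonic.

```c
#include <stdint.h>

/* Extension IDs as quoted in the proposal; each one packs three ASCII bytes. */
#define SBI_EXT_HSM 0x48534Du /* "HSM", existing Hart State Management */
#define SBI_EXT_HSP 0x485350u /* "HSP", proposed in solution 1 */

/* Function IDs for solution 2; only SBI_HSM_HART_SUSPEND is the proposed addition. */
enum sbi_hsm_fid {
    SBI_HSM_HART_START = 0,
    SBI_HSM_HART_STOP = 1,
    SBI_HSM_HART_GET_STATUS = 2,
    SBI_HSM_HART_SUSPEND = 3
};

/* Hypothetical helper: unpack a three-byte extension ID into its mnemonic. */
static void sbi_ext_name(uint32_t eid, char out[4])
{
    out[0] = (char)((eid >> 16) & 0xFF);
    out[1] = (char)((eid >> 8) & 0xFF);
    out[2] = (char)(eid & 0xFF);
    out[3] = '\0';
}
```

Calling `sbi_ext_name(SBI_EXT_HSP, buf)` fills `buf` with the string HSP, which is where the proposed ID in solution 1 comes from.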
|
andrew@...
The WFI instruction can serve this purpose. Furthermore, the M-mode architecture has a mechanism (mstatus.TW) to trap WFI instructions, and the hypervisor extension proposes something analogous. Between those two, I'm not sure a new SBI call is necessary: what would the SBI call do that WFI + TW cannot do?
Note also that there's a proposed HINT instruction for short-duration idling. This is what I expect the cpu_idle loop to use, at least for the first several thousand iterations: https://lists.riscv.org/g/tech-unprivileged/topic/pause_hint_instruction/76890707
On Thu, Sep 24, 2020 at 10:48 PM <liush@...> wrote:
|
liush@...
As far as I know, both "WFI" and "PAUSE" are instructions that only move the hart into a low-power state without powering it off. For the deeper low-power states, however, there is no extension to handle the state requests. For example, suppose the idle state is divided into 3 levels (C0/C1/C2), where only C0 corresponds to the WFI state. Some hardware operations are required to reach C1/C2, such as shutting down the bus, clock, and power supply. As in other architectures, these operations should be performed by the runtime firmware.
To this end, a new extension in the SBI specification is necessary to handle C1/C2 level idle state requests.
|
Hi Andrew,
Generally there are two categories of CPU idle power modes: 1) state preserving 2) state non-preserving. A state preserving CPU idle power mode will preserve internal micro-architectural state (registers, caches, and other state machines) whereas a state non-preserving CPU idle power mode will not preserve some (or all) internal micro-architectural state. The WFI and HINT instructions put CPU in a state preserving idle power mode whereas SBI HSM HART STOP call will put CPU in a state non-preserving idle power mode.
The power savings in state non-preserving idle power modes are higher than in state preserving idle power modes, but it takes more time to wake up/resume from a state non-preserving idle power mode. This means turning off power to a CPU (i.e. SBI HSM HART STOP) will lead to maximum power savings but will take more time to wake up/resume, whereas executing WFI/HINT on a CPU will give some power savings but will take much less time to wake up/resume.
We can easily envision the following options for S-mode software to change CPU power mode:
1) WFI/HINT instruction
2) SBI HSM HART SUSPEND call
3) SBI HSM HART STOP call
Of the options above, both 2) and 3) will put the CPU in a state non-preserving idle power mode, and M-mode firmware will have a platform specific way to achieve both 2) and 3). There is no defined SUSPEND state across architectures and CPU implementations, so 2) will always put the CPU in a platform specific CPU idle power mode. Also, supporting 2) and/or 3) is not mandatory for any RISC-V platform because M-mode firmware (or a hypervisor) can always functionally emulate 2) and 3) using a software state machine.
Generally, the Intel/x86 CPU C-states for CPU idle power management are used as reference when defining platform specific CPU idle power modes for option 2) above.
The Linux kernel has a very mature CPU idle framework. We just need to define option 2), using both x86 and ARM64 as references, so that RISC-V platforms/vendors have a way to integrate their platform specific CPU idle power modes.
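The trade-off described above (deeper modes save more power but take longer to resume) can be sketched in C as a simple selection rule. This is only an illustration and is not the Linux cpuidle governor: the state names, latency and residency figures, and the function `pick_idle_state` are all hypothetical, and the state-preserving flags follow the description in this message.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per idle power mode, ordered shallow to deep. All numbers are made up. */
struct idle_state {
    const char *name;
    int state_preserving;      /* 1 if micro-architectural state is retained */
    uint32_t exit_latency_us;  /* time needed to wake up / resume */
    uint32_t min_residency_us; /* shortest idle period for which the mode pays off */
};

static const struct idle_state idle_states[] = {
    { "wfi",          1,    1,     1 },
    { "hart-suspend", 0,  500,  2000 },
    { "hart-stop",    0, 5000, 50000 },
};

/* Return the index of the deepest mode that fits both the predicted idle
 * period and the wake-up latency the caller can tolerate. */
static size_t pick_idle_state(uint32_t predicted_idle_us, uint32_t latency_limit_us)
{
    size_t best = 0; /* the shallowest mode (WFI) is always allowed */
    for (size_t i = 1; i < sizeof(idle_states) / sizeof(idle_states[0]); i++) {
        if (idle_states[i].min_residency_us <= predicted_idle_us &&
            idle_states[i].exit_latency_us <= latency_limit_us)
            best = i;
    }
    return best;
}
```

With these made-up figures, a short predicted idle period stays in WFI, while a long one with a relaxed latency limit selects the deepest mode.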
Regards, Anup
From: tech-unixplatformspec@... <tech-unixplatformspec@...>
On Behalf Of Andrew Waterman
Sent: 25 September 2020 11:53 To: liush@... Cc: tech-unixplatformspec@... Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal : Hart Suspend Extension for IDLE
|
|
andrew@...
Thanks for the additional context, Anup. This wasn't clear to me from the original post, but I agree an SBI call is appropriate for deeper idle states than WFI.
On Sun, Sep 27, 2020 at 2:13 AM Anup Patel <Anup.Patel@...> wrote:
|
|
Greg Favor
On Sun, Sep 27, 2020 at 2:13 AM Anup Patel <anup.patel@...> wrote:
If I remember right, some power management frameworks (e.g. ACPI) define a series of "C-states" that include both state preserving and non-preserving sleep states. For the former category, there may be more than just the "shallow" WFI state, i.e. SBI HART SUSPEND doesn't necessarily have to be a non-state-preserving power state.
Taking a quick look at the power states defined in ARM SBSA, one has:
- Run
- Idle_standby: state-preserving
- Idle_retention: state-preserving
- Sleep: non-state-preserving
- Off: non-state-preserving
In short, the SBI HART SUSPEND parameter would specify different requested "suspend" power states - some that may preserve state, and some that may not.
Greg
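One way to picture such a parameter is a single integer whose top bit says whether the requested state preserves hart state. The C sketch below is an assumption made purely for illustration: the bit layout, the macro names, and the two helper functions are hypothetical, and only the state names Idle_retention and Sleep come from the SBSA list above.

```c
#include <stdint.h>

/* Hypothetical layout: bit 31 set means hart state is NOT preserved. */
#define SUSPEND_NON_RETENTIVE  0x80000000u

/* Two example values named after the SBSA states listed above. */
#define SUSPEND_IDLE_RETENTION 0x00000001u
#define SUSPEND_SLEEP          (SUSPEND_NON_RETENTIVE | 0x00000001u)

/* Returns 1 if the requested suspend state preserves hart state. */
static int suspend_preserves_state(uint32_t suspend_type)
{
    return (suspend_type & SUSPEND_NON_RETENTIVE) == 0;
}

/* Returns the platform specific level with the retention bit stripped. */
static uint32_t suspend_level(uint32_t suspend_type)
{
    return suspend_type & ~SUSPEND_NON_RETENTIVE;
}
```

Keeping the preserving/non-preserving distinction in one bit would let the caller know whether it must save its own context before the call, without knowing anything else about the platform specific level.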
|
|
Hi Greg,
Yes, you are correct. The SBI HART SUSPEND does not have to be state non-preserving. My previous description of SBI HART SUSPEND was a bit oversimplified.
The ACPI “C-states” are Intel/x86 C-states. The C0-state is similar to WFI/HINT on RISC-V and for other C-states we can have the SBI HART SUSPEND call.
Regarding high-level states defined in ARM SBSA, I think “Idle_retention” and “Sleep” will fall under SBI HART SUSPEND call for RISC-V.
A parameter specifying the exact suspend state will certainly be required for the SBI HART SUSPEND call.
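Now we have two things to be defined here:
1. SBI HSM HART SUSPEND call (with a parameter to specify platform specific CPU idle power mode)
2. Discovering supported SBI HART SUSPEND modes for given platform (Device Tree ?? ACPI ?? SBI Calls ??)
Suggestions ?? Any volunteers to propose #1 and #2 above ?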
Regards, Anup
From: Greg Favor <gfavor@...>
Sent: 29 September 2020 06:03 To: Anup Patel <Anup.Patel@...> Cc: Andrew Waterman <andrew@...>; liush@...; tech-unixplatformspec@... Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal : Hart Suspend Extension for IDLE
On Sun, Sep 27, 2020 at 2:13 AM Anup Patel <anup.patel@...> wrote:
|
|
Greg Favor
On Tue, Sep 29, 2020 at 1:29 AM Anup Patel <Anup.Patel@...> wrote:
Mirroring (very roughly) ARM SBSA and the x86 C-states supported nowadays, I would maybe suggest the following power states between "Run" (aka C0) and "Off" (aka C6). Only the two "Sleep" states matter to this SBI call, and they are both state-preserving:
- Run "C0"
- Idle (i.e. WFI) "C1"
- Sleep "C3 sub-state"
- Deep Sleep "C3 sub-state"
- Off "C6"
Many implementations may support only one Sleep state. Low-power designs may support both. (The option of additional "custom" parameter values should also be supported in the SBI call.) Among these Sleep states, things like switching to a minimum operational voltage and frequency, switching to a retention voltage, flushing caches, etc. will come into play (as well as shutting off all clocking to the core). But the actual meaning of these states in a system would be implementation-specific.
How do ARMv8 and x86 handle this? Blindly I would imagine we would do the same. I'm also guessing the answer is "both DT and ACPI" since some systems use DT and some use ACPI (unless the platform spec standardizes on just one of these).
Greg
|
atishp@...
On Tue, 2020-09-29 at 08:29 +0000, Anup Patel wrote:
> Hi Greg,
For #2, I think device tree/ACPI would be a better choice. It's already well defined for ARM. Obviously, it needs to be adopted for RISC-V.
https://elixir.bootlin.com/linux/v5.9-rc7/source/Documentation/devicetree/bindings/arm/idle-states.yaml
> Suggestions ??
--
Regards, Atish
|
liu shaohua <liush@...>
Hi Anup,
I have gained a lot from your information. Thank you very much.
> 1. SBI HSM HART SUSPEND call (with a parameter to specify platform specific CPU idle power mode)
In addition to "power_state", should we consider adding an address parameter, "entrypoint"? It could be used to specify the address where code execution resumes on wake-up. I think whether or not the address parameter is provided may affect the software flow after wake-up. If no address is provided, the runtime firmware may need to restore the state for all modes. If the address is provided, then on wake-up from the idle state the runtime firmware only restores the M-mode state, and the OS restores the S-mode state.
> 2. Discovering supported SBI HART SUSPEND modes for given platform (Device Tree ?? ACPI ?? SBI Calls ??)
Do we need to add this additional configuration? As I understand it, ARM gets the Cx state parameters from the DTS to fill the internal idle structure. If the Cx state parameters are not configured in the DTS, there is no platform-specific idle implementation, and the Linux driver will not enter deeper idle states.
liu shaohua
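The wake-up flow with an "entrypoint" parameter can be modelled on a host machine with a small C sketch. Everything here is hypothetical: `model_hart_suspend` only simulates the control flow and is not an SBI call, and the extra `context` argument is an assumption borrowed from the style of PSCI CPU_SUSPEND, which passes a context value through to the resume address.

```c
#include <stdint.h>

/* Resume routine supplied by the OS; it receives an opaque context value. */
typedef void (*resume_fn)(uintptr_t context);

static uintptr_t last_context; /* context seen by the OS resume routine */
static int s_mode_restored;    /* set once the OS has restored S-mode state */

/* Hypothetical OS-side resume code: restores S-mode state from the context. */
static void os_resume(uintptr_t context)
{
    last_context = context;
    s_mode_restored = 1;
}

/* Host-side model of the suspend flow, not real firmware.
 * Returns 0 if execution continues after the call (state was preserved),
 * or 1 if the firmware handed control to the entrypoint instead. */
static int model_hart_suspend(int state_preserved, resume_fn entrypoint,
                              uintptr_t context)
{
    /* ... the hart would idle here until a wake-up event arrives ... */
    if (state_preserved)
        return 0; /* nothing was lost, so the caller simply carries on */

    /* State was lost: firmware restores only M-mode state, then jumps
     * to the OS-provided entrypoint, which restores S-mode state. */
    entrypoint(context);
    return 1;
}
```

In this model the split of work matches the proposal: the firmware side never touches S-mode state, and the OS resume routine is only reached for suspend states that lose state.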
|
Hi Liu,
Regarding #1, let’s have a detailed draft proposal for SBI HSM HART SUSPEND call. You can initially include the “entrypoint” and “context” parameters for SUSPEND call and we will see what everyone in this list thinks about the parameters. Does this sound okay ?
Regarding #2, I suggest we go with Cx state parsing from device tree initially (and eventually add it to ACPI as well). In fact, we can share device tree parsing code between Linux RISC-V and Linux ARM/ARM64.
Regards, Anup
From: tech-unixplatformspec@... <tech-unixplatformspec@...>
On Behalf Of liu shaohua
Sent: 30 September 2020 10:52 To: tech-unixplatformspec@... Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal : Hart Suspend Extension for IDLE
|
|