Proposal : Hart Suspend Extension for IDLE


liush@...
 

Hi ALL,
Since no extension in the current specification can serve the cpuidle module, I propose adding a new one, as Anup also mentioned on the Linux mailing list. The new extension would move a hart or cluster into a lower power state, indicating that it has no work to do until a new event wakes it up.

Here I propose two solutions; discussion is welcome.
Solution 1:
Similar to the CPU_SUSPEND interface defined in the PSCI specification, create a new extension in the SBI specification:
HART SUSPEND Extension, Extension ID: 0x485350 (HSP)

Solution 2:
Given that the idle state is one of the hart's power states, add a new function to the
HSM (Hart State Management) Extension, as follows:

Function Name        | Function ID | Extension ID
sbi_hart_start       | 0           | 0x48534D
sbi_hart_stop        | 1           | 0x48534D
sbi_hart_get_status  | 2           | 0x48534D
sbi_hart_suspend     | 3           | 0x48534D
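
The two extension IDs in this proposal look like the extension's short name packed as ASCII bytes ("HSM" and "HSP"). A minimal Python sketch of that reading follows; the helper name `sbi_eid_from_name` and the dictionary are hypothetical and exist only for illustration.

```python
def sbi_eid_from_name(name: str) -> int:
    """Pack an ASCII short name into an integer, most significant byte first."""
    return int.from_bytes(name.encode("ascii"), byteorder="big")


# Function IDs as listed in the table; sbi_hart_suspend is the proposed addition.
HSM_FUNCTION_IDS = {
    "sbi_hart_start": 0,
    "sbi_hart_stop": 1,
    "sbi_hart_get_status": 2,
    "sbi_hart_suspend": 3,
}

if __name__ == "__main__":
    print(hex(sbi_eid_from_name("HSM")))  # existing HSM extension
    print(hex(sbi_eid_from_name("HSP")))  # ID suggested for solution 1
```

Under solution 2 all four functions share the HSM extension ID and differ only in the function ID.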


andrew@...
 

The WFI instruction can serve this purpose.  Furthermore, the M-mode architecture has a mechanism (mstatus.TW) to trap WFI instructions, and the hypervisor extension proposes something analogous.  Between those two, I'm not sure a new SBI call is necessary: what would the SBI call do that WFI + TW cannot do?

Note also that there's a proposed HINT instruction for short-duration idling.  This is what I expect the cpu_idle loop to use, at least for the first several thousand iterations: https://lists.riscv.org/g/tech-unprivileged/topic/pause_hint_instruction/76890707


On Thu, Sep 24, 2020 at 10:48 PM <liush@...> wrote:


liush@...
 

As far as I know, "WFI" and "PAUSE" are instructions that only move a hart into a low-power state without powering it off. For the deeper low-power states, however, there is no extension to handle the requests. For example, suppose the idle state is divided into three levels (C0/C1/C2), where only C0 corresponds to the WFI state. Reaching C1/C2 requires hardware operations such as shutting down the bus, clocks, and power supply. Following other architectures, these operations should be performed by the runtime firmware.
To this end, a new extension in the SBI specification is necessary to handle C1/C2-level idle requests.


Anup Patel
 

Hi Andrew,

 

Generally, there are two categories of CPU idle power modes: 1) state preserving and 2) state non-preserving. A state preserving CPU idle power mode preserves internal micro-architectural state (registers, caches, and other state machines), whereas a state non-preserving CPU idle power mode does not preserve some (or all) of that state. The WFI and HINT instructions put the CPU in a state preserving idle power mode, whereas the SBI HSM HART STOP call puts the CPU in a state non-preserving idle power mode.

The power savings in state non-preserving idle power modes are higher than in state preserving ones, but waking up/resuming from a state non-preserving mode takes more time. This means turning off power to a CPU (i.e. SBI HSM HART STOP) yields maximum power savings but the longest wake-up/resume time, whereas executing WFI/HINT on a CPU gives some power savings with a much shorter wake-up/resume time.
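
The trade-off between power savings and wake-up time can be sketched as a selection rule: pick the deepest mode whose break-even idle time and exit latency both fit. This is only a rough Python illustration; the `IdleState` type, the `pick_idle_state` helper, and every number below are assumptions, not values from any platform.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IdleState:
    name: str
    exit_latency_us: int   # time needed to resume after a wake-up event
    min_residency_us: int  # shortest idle period for which entering pays off
    preserves_state: bool


# Illustrative numbers only, ordered from shallowest to deepest.
STATES: List[IdleState] = [
    IdleState("WFI", 1, 1, True),
    IdleState("HART_SUSPEND", 100, 1000, False),
    IdleState("HART_STOP", 5000, 50000, False),
]


def pick_idle_state(states: List[IdleState], predicted_idle_us: int,
                    latency_limit_us: int) -> IdleState:
    """Return the deepest state that fits; fall back to the shallowest one."""
    chosen = states[0]
    for state in states:  # relies on the shallow-to-deep ordering
        if (state.min_residency_us <= predicted_idle_us
                and state.exit_latency_us <= latency_limit_us):
            chosen = state
    return chosen
```

A short predicted idle period or a tight latency limit keeps the hart in WFI, while longer idle periods justify the deeper, slower-to-resume modes.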

 

We can easily envision the following options for S-mode software to change the CPU power mode:

  1. WFI/HINT
  2. SBI HART SUSPEND (with a parameter to specify the exact platform-specific CPU idle power mode)
  3. SBI HART STOP

From the above, both 2) and 3) will put the CPU in a state non-preserving idle power mode, and M-mode firmware will have a platform-specific way to achieve both. There is no SUSPEND state defined across architectures and CPU implementations, so 2) will always put the CPU in a platform-specific CPU idle power mode. Also, supporting 2) and/or 3) in hardware is not mandatory for any RISC-V platform, because M-mode firmware (or a hypervisor) can always functionally emulate 2) and 3) using a software state machine.
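
The "software state machine" mentioned above could look roughly like the following Python sketch. It only tracks per-hart state; a real firmware would additionally park the hart (for example in a WFI loop) while it is not in the started state. The class and state names are hypothetical.

```python
from enum import Enum


class HartState(Enum):
    STARTED = "started"
    SUSPENDED = "suspended"
    STOPPED = "stopped"


class EmulatedHart:
    """Tracks the state of one hart as seen by S-mode software."""

    def __init__(self) -> None:
        self.state = HartState.STARTED

    def _require(self, expected: HartState) -> None:
        if self.state is not expected:
            raise RuntimeError(f"invalid transition from {self.state.value}")

    def suspend(self) -> None:
        self._require(HartState.STARTED)
        self.state = HartState.SUSPENDED

    def stop(self) -> None:
        self._require(HartState.STARTED)
        self.state = HartState.STOPPED

    def start(self) -> None:
        self._require(HartState.STOPPED)
        self.state = HartState.STARTED

    def wake_event(self) -> bool:
        """An interrupt resumes a suspended hart but not a stopped one."""
        if self.state is HartState.SUSPENDED:
            self.state = HartState.STARTED
            return True
        return False
```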

 

Generally, the Intel/x86 CPU C-states for CPU idle power management are used as a reference when defining platform-specific CPU idle power modes for option 2) above.

The Linux kernel has a very mature CPU idle framework. We just need to define option 2), taking both x86 and ARM64 as references, so that RISC-V platforms/vendors have a way to integrate their platform-specific CPU idle power modes.

 

Regards,

Anup

 

From: tech-unixplatformspec@... <tech-unixplatformspec@...> On Behalf Of Andrew Waterman
Sent: 25 September 2020 11:53
To: liush@...
Cc: tech-unixplatformspec@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal : Hart Suspend Extension for IDLE

 



andrew@...
 

Thanks for the additional context, Anup.  This wasn't clear to me from the original post, but I agree an SBI call is appropriate for deeper idle states than WFI.


On Sun, Sep 27, 2020 at 2:13 AM Anup Patel <Anup.Patel@...> wrote:



Greg Favor
 

On Sun, Sep 27, 2020 at 2:13 AM Anup Patel <anup.patel@...> wrote:

> We can easily envision following options for S-mode software to change CPU power mode:
>
>   1. WFI/HINT
>   2. SBI HART SUSPEND (with a parameter to specify exact platform specific CPU idle power mode)
>   3. SBI HART STOP
>
> From above, both 2) and 3) will put CPU in a state non-preserving idle power mode and M-mode firmware will have a platform specific way to achieve both 2) and 3).


If I remember right, some power management frameworks (e.g. ACPI) define a series of "C-states" that include both state preserving and non-preserving sleep states.  For the former category, there may be more than just the "shallow" WFI state, i.e. SBI HART SUSPEND doesn't necessarily have to be a non-state-preserving power state.

Taking a quick look at the power states defined in ARM SBSA, one has:
- Run
- Idle_standby      state-preserving
- Idle_retention    state-preserving
- Sleep             non-state-preserving
- Off               non-state-preserving

In short, the SBI HART SUSPEND parameter would specify different requested "suspend" power states - some that may preserve state, and some that may not.
 
Greg




Anup Patel
 

Hi Greg,

 

Yes, you are correct. The SBI HART SUSPEND call does not have to be state non-preserving. My previous description of SBI HART SUSPEND was a bit oversimplified.

The ACPI “C-states” are Intel/x86 C-states. The C0-state is similar to WFI/HINT on RISC-V, and for other C-states we can have the SBI HART SUSPEND call.

Regarding the high-level states defined in ARM SBSA, I think “Idle_retention” and “Sleep” will fall under the SBI HART SUSPEND call for RISC-V.

A parameter specifying the exact suspend state will certainly be required for the SBI HART SUSPEND call.

 

Now we have two things to be defined here:

  1. SBI HSM HART SUSPEND call (with a parameter to specify platform specific CPU idle power mode)
  2. Discovering supported SBI HART SUSPEND modes for given platform (Device Tree ?? ACPI ?? SBI Calls ??)

 

Suggestions ??

Any volunteers to propose #1 and #2 above ?

 

Regards,

Anup

 

From: Greg Favor <gfavor@...>
Sent: 29 September 2020 06:03
To: Anup Patel <Anup.Patel@...>
Cc: Andrew Waterman <andrew@...>; liush@...; tech-unixplatformspec@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal : Hart Suspend Extension for IDLE

 


 


Greg Favor
 

On Tue, Sep 29, 2020 at 1:29 AM Anup Patel <Anup.Patel@...> wrote:

> Now we have two things to be defined here:
>
>   1. SBI HSM HART SUSPEND call (with a parameter to specify platform specific CPU idle power mode)

Mirroring (very roughly) ARM SBSA and x86 C-states supported nowadays, I would maybe suggest the following power states between "Run" (aka C0) and "Off" (aka C6).  Only the two "Sleep" states matter to this SBI call, and they are both state-preserving:

- Run               "C0"
- Idle (i.e. WFI)   "C1"
- Sleep             "C3 sub-state"
- Deep Sleep        "C3 sub-state"
- Off               "C6"

Many implementations may support only one Sleep state.  Low-power designs may support both.  (Option for additional "custom" parameter values should also be supported in the SBI call.)

Among these Sleep states, things like switching to a min operational voltage and frequency, switching to a retention voltage, flushing caches, etc. will come into play (as well as shutting off all clocking to the core).  But the actual meaning in a system for these states would be implementation-specific.
 
>   2. Discovering supported SBI HART SUSPEND modes for given platform (Device Tree ?? ACPI ?? SBI Calls ??)

How do ARMv8 and x86 handle this?  Without having checked, I would imagine we would do the same.  I'm also guessing the answer is "both DT and ACPI" since some systems use DT and some use ACPI (unless the platform spec standardizes on just one of these).

Greg


atishp@...
 

On Tue, 2020-09-29 at 08:29 +0000, Anup Patel wrote:
> Now we have two things to be defined here:
>   1. SBI HSM HART SUSPEND call (with a parameter to specify platform
>      specific CPU idle power mode)
>   2. Discovering supported SBI HART SUSPEND modes for given platform
>      (Device Tree ?? ACPI ?? SBI Calls ??)

For #2, I think device tree/ACPI would be a better choice. It's already
well defined for ARM. Obviously, it needs to be adopted for RISC-V.

https://elixir.bootlin.com/linux/v5.9-rc7/source/Documentation/devicetree/bindings/arm/idle-states.yaml


--
Regards,
Atish


liu shaohua <liush@...>
 

Hi Anup,
I have gained a lot from your information. Thank you very much.

> 1. SBI HSM HART SUSPEND call (with a parameter to specify platform specific CPU idle power mode)
In addition to "power_state", should we consider adding an address parameter, “entrypoint”? It could specify the address where code execution resumes when the hart is woken up. I think whether or not the address parameter is provided may affect the software flow after wake-up. If it is not provided, the runtime firmware may need to restore the state of all modes. If it is provided, then on wake-up from the idle state the runtime firmware restores only the M-mode state, and the OS restores the S-mode state.
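
The two wake-up flows described above can be written out as a small decision function. This Python sketch is only a model of the idea; `plan_resume`, its arguments, and the returned field names are hypothetical.

```python
from typing import Dict, List, Optional, Union


def plan_resume(entrypoint: Optional[int],
                return_addr: int) -> Dict[str, Union[int, List[str]]]:
    """Describe who restores which state and where execution continues."""
    if entrypoint is None:
        # No entrypoint given: firmware restores everything and returns
        # to the instruction following the SBI call.
        return {
            "resume_pc": return_addr,
            "firmware_restores": ["M-mode", "S-mode"],
            "os_restores": [],
        }
    # Entrypoint given: firmware restores only its own state and jumps to
    # the OS-provided address, where the OS restores its S-mode state.
    return {
        "resume_pc": entrypoint,
        "firmware_restores": ["M-mode"],
        "os_restores": ["S-mode"],
    }
```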
> 2. Discovering supported SBI HART SUSPEND modes for given platform (Device Tree ?? ACPI ?? SBI Calls ??)
Do we need to add this additional configuration?
I learned that ARM reads the Cx state parameters from the DTS to fill its internal idle structure. If the Cx state parameters are not configured in the DTS, there is no platform-specific idle implementation, and the Linux driver will not enter deeper idle states.
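
The behaviour described above, where idle states come from the device tree and only the default idle remains when none are listed, can be sketched in Python. The latency and residency property names follow the ARM idle-states binding; `suspend-param` is a placeholder property name (the thread does not define one), and the dictionary stands in for already-parsed device tree nodes.

```python
from typing import Dict, List


def build_idle_table(dt_idle_states: Dict[str, Dict[str, int]]) -> List[dict]:
    """Build an idle-state table; WFI is always present as the shallowest state."""
    table = [{
        "name": "WFI",
        "exit_latency_us": 0,
        "min_residency_us": 0,
        "suspend_param": None,
    }]
    for name, props in dt_idle_states.items():
        table.append({
            "name": name,
            "exit_latency_us": props["exit-latency-us"],
            "min_residency_us": props["min-residency-us"],
            "suspend_param": props["suspend-param"],  # placeholder property name
        })
    # Shallowest first, so a governor can walk the table in order.
    table.sort(key=lambda state: state["min_residency_us"])
    return table
```

With an empty dictionary the table contains only WFI, which matches the case where no Cx state is described in the DTS.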


Regards,

liu shaohua



Anup Patel
 

Hi Liu,

 

Regarding #1, let’s have a detailed draft proposal for SBI HSM HART SUSPEND call. You can initially include the “entrypoint” and “context” parameters for SUSPEND call and we will see what everyone in this list thinks about the parameters. Does this sound okay ?

 

Regarding #2, I suggest we go with Cx state parsing from device tree initially (and eventually add it to ACPI as well). In fact, we can share device tree parsing code between Linux RISC-V and Linux ARM/ARM64.

 

Regards,

Anup

 

From: tech-unixplatformspec@... <tech-unixplatformspec@...> On Behalf Of liu shaohua
Sent: 30 September 2020 10:52
To: tech-unixplatformspec@...
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal : Hart Suspend Extension for IDLE

 
