Re: Proposal for accelerating nested virtualization on RISC-V

Anup Patel
Hi Jonathan,
Not all cases for CSR accesses have been thought through and written out yet (this is an initial draft). Regarding WARL CSRs with hardwired bits, the HW will always read/write fixed values for the hardwired bits in memory.
I totally agree with you on the two-times overhead of the illegal instruction trap. The illegal instruction trap is not delegated to S-mode because M-mode (i.e. OpenSBI) emulates S-mode accesses to TIME, HTIMEDELTA, CYCLE, INSTRET, and other COUNTER CSRs. I am not sure we can get rid of illegal instruction trap handling in OpenSBI entirely, because quite a lot of HW out there doesn't implement the TIME CSR and other CSRs that are accessed from S-mode. Currently OpenSBI emulates the TIME CSR for HS-mode, U-mode, VS-mode and VU-mode. The nested virtualization acceleration will certainly help.
I forgot to mention that an implementation can choose not to implement nested virtualization acceleration and hardwire the HNESTED CSR to zero.
Regards,
Anup
From: Jonathan Behrens <behrensj@...>
Sent: 17 March 2020 18:58
To: Anup Patel <Anup.Patel@...>
Cc: tech-privileged@...
Subject: Re: [RISC-V] [tech-privileged] Proposal for accelerating nested virtualization on RISC-V
Your description of un-accelerated nested virtualization seems workable to me. I'm less sure of the proposal to avoid trapping on h<xyz> and vs<xyz> accesses. Aren't you going to run into issues with any WARL CSR that has hardwired bits?
I'd like to point out another performance pitfall with trap-and-emulate that I've mentioned before but might not be obvious from reading your proposal: the illegal instruction traps triggered by the guest trying to use hypervisor CSRs or run hypervisor instructions will not trap directly to HS-mode. Rather, they will be routed to M-mode and then forwarded to HS-mode, which has about two times higher overhead (forwarding a trap is at least as expensive as emulating most instructions). This is also quite avoidable by adding a bit that lets M-mode delegate traps from legal but privileged instructions executed in U/VS/VU modes.
A clarification is required in RISC-V H-Extension spec regarding scope
of HSTATUS.VTVM bit. Currently as-per the spec, all virtual memory
management instructions (both SFENCEs and HFENCEs) will trap to
HS-mode when HSTATUS.VTVM == 1 and V == 1. Rather, only SFENCEs are
required to be trapped to HS-mode when HSTATUS.VTVM == 1 and V == 1
because HFENCEs are only defined for HS-mode (i.e. V==0).
To better describe nested virtualization, we define following dummy
privilege modes:
Host HS-mode => HW HS-mode
  Host hypervisor kernel will run in this mode
  Software in this mode will actually run in HW HS-mode
Host U-mode => HW U-mode
  Host hypervisor user-space will run in this mode
  Software in this mode will actually run in HW U-mode
Guest HS-mode => HW VS-mode
  Guest hypervisor kernel will run in this mode
  Software in this mode will actually run in HW VS-mode
Guest U-mode => HW VU-mode
  Guest hypervisor user-space will run in this mode
  Software in this mode will actually run in HW VU-mode
Guest VS-mode => HW VS-mode
  Software in this mode will actually run in HW VS-mode
Guest VU-mode => HW VU-mode
  Software in this mode will actually run in HW VU-mode
A high-level software approach for nested virtualization in RISC-V
can be as follows:
1. The Host HS-mode (Host hypervisor) will enable HSTATUS.VTSR to
emulate SRET instruction for Guest. This emulation will involve
a CSR world-switch when switching from Guest HS/U-mode to/from
Guest VS/VU-mode.
2. Virtual interrupts will be injected to Guest VS/VU-mode after
doing the CSR world-switch (in point 1 above) from Guest HS/U-mode
to Guest VS/VU-mode.
3. All accesses to "h<xyz>" and "vs<xyz>" from Guest will trap to
Host HS-mode (Host hypervisor) where:
a) These CSRs will be emulated for Guest HS-mode
b) For Guest U-mode and Guest VS/VU-mode, the trap will
be forwarded to Guest HS-mode
4. The Host HS-mode (Host hypervisor) will manage two Stage2 page
tables:
a) Regular Stage2 page table for Guest HS/U-mode
b) Shadow Stage2 page table for Guest VS/VU-mode. Of course,
Host HS-mode (host hypervisor) will have to do software walk
of Guest HS-mode HGATP page table when populating mappings in
Shadow Stage2 page table and it will have mappings which are
combined effect of Guest HS-mode HGATP page table and Regular
Stage2 page table.
5. All HFENCEs will trap to Host HS-mode where the Host HS-mode
(Host hypervisor) will:
a) Trap-n-emulate HFENCE.VVMA and HFENCE.GVMA for Guest HS-mode
b) Redirect HFENCE.VVMA and HFENCE.GVMA traps from Guest VS-mode
to Guest HS-mode irrespective of Guest HS-mode HSTATUS.VTVM
6. All HLV/HSV instructions from Guest HS/U-mode and Guest VS/VU-mode
will trap to Host HS-mode (Host hypervisor) where:
a) HLV/HSV instruction from Guest HS/U-mode will be emulated
by Host HS-mode (Host hypervisor)
b) HLV/HSV instruction from Guest VS/VU-mode will be forwarded
to Guest HS-mode by Host HS-mode (Host hypervisor)
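The emulate-or-forward decisions in points 3, 5 and 6 above can be summarised as a small routing function. This is only a sketch: the enum and function names are hypothetical, and forwarding HFENCEs from Guest U-mode and Guest VU-mode is an assumption, since point 5 only mentions Guest HS-mode and Guest VS-mode.

```c
#include <assert.h>

/* Dummy privilege modes of the guest, as defined earlier in the proposal. */
enum guest_mode { GUEST_HS, GUEST_U, GUEST_VS, GUEST_VU };

/* Classes of trapping operations covered by points 3, 5 and 6. */
enum trapped_op { OP_H_OR_VS_CSR, OP_HFENCE, OP_HLV_HSV };

/* What the Host HS-mode hypervisor does with the trap. */
enum host_action { EMULATE_IN_HOST, FORWARD_TO_GUEST_HS };

static enum host_action route_guest_trap(enum guest_mode mode,
                                         enum trapped_op op)
{
    switch (op) {
    case OP_H_OR_VS_CSR: /* point 3: emulate only for Guest HS-mode */
    case OP_HFENCE:      /* point 5: same split (U/VU forwarding assumed) */
        return mode == GUEST_HS ? EMULATE_IN_HOST : FORWARD_TO_GUEST_HS;
    case OP_HLV_HSV:     /* point 6: emulate for Guest HS/U-mode */
        return (mode == GUEST_HS || mode == GUEST_U)
                   ? EMULATE_IN_HOST
                   : FORWARD_TO_GUEST_HS;
    }
    assert(0 && "unknown trapped_op");
    return FORWARD_TO_GUEST_HS;
}
```

A real host hypervisor would call something like this from its illegal-instruction handler after decoding the trapping instruction.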
Please suggest if any case is not considered in above high-level
software approach for nested virtualization.
Based on the above high-level software approach, we propose a way to
accelerate nested virtualization performance by reducing "h<xyz>" and
"vs<xyz>" CSR access traps from VS-mode to HS-mode (point 3 above).
As per our proposal, we convert "h<xyz>" and "vs<xyz>" CSR accesses
from VS-mode into memory accesses relative to a nested context base
(or <nested_context_base>).
The enable bit (or <nested_enable>) for above described CSR accesses
conversion and the <nested_context_base> can be specified via new
HNESTED CSR.
<nested_enable> = HNESTED[0]
<nested_context_base> = HNESTED[XLEN-1:1] << (log2 (XLEN / 8))
Note: <nested_context_base> address is always machine word aligned
Note: <nested_enable> = 0 means "h<xyz>" and "vs<xyz>" trap to HS-mode
without any CSR accesses conversion
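As an illustration of the field layout above, the following sketch decodes a raw HNESTED value. The struct and function names are hypothetical. The proposal does not say what happens to base-address bits shifted beyond XLEN, so the sketch simply lets them fall off the top of a 64-bit value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of the proposed HNESTED CSR. */
struct hnested_fields {
    bool     enable;        /* <nested_enable> = HNESTED[0] */
    uint64_t context_base;  /* <nested_context_base> */
};

/* xlen is 32 or 64; the raw CSR value sits in the low xlen bits. */
static struct hnested_fields hnested_decode(uint64_t hnested, unsigned xlen)
{
    assert(xlen == 32 || xlen == 64);
    unsigned shift = (xlen == 64) ? 3 : 2;  /* log2(XLEN / 8) */
    uint64_t mask = (xlen == 64) ? UINT64_MAX : (uint64_t)UINT32_MAX;
    struct hnested_fields f;

    hnested &= mask;
    f.enable = (hnested & 1u) != 0;
    /* Upper field shifted left, so the base is machine-word aligned. */
    f.context_base = (hnested >> 1) << shift;
    return f;
}
```

For example, on RV64 a raw value of 0x1001 decodes to an enabled context at base 0x4000.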
Various "h<xyz>" and "vs<xyz>" CSRs are accessed at <csr_nested_offset>
relative to <nested_context_base> based on their CSR number as follows:
CSR number 0x2xx
<csr_nested_offset> = 0x0000 + ((CSR_number & 0xff) * (XLEN / 8))
CSR number 0x6xx
<csr_nested_offset> = 0x1000 + ((CSR_number & 0xff) * (XLEN / 8))
CSR number 0xAxx
<csr_nested_offset> = 0x2000 + ((CSR_number & 0xff) * (XLEN / 8))
CSR number 0xExx
<csr_nested_offset> = 0x3000 + ((CSR_number & 0xff) * (XLEN / 8))
The VS-mode accesses to some of the "h<xyz>" CSRs cannot be converted
into memory accesses due to nature of these CSRs. These CSRs include
HGEIP and HGEIE CSRs (any other CSRs ??).
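The offset rules above can be sketched as a single lookup. The function name and the -1 "not converted" return value are assumptions made for illustration; the CSR numbers used for HGEIE (0x607) and HGEIP (0xE12) are those of the hypervisor extension.

```c
#include <assert.h>
#include <stdint.h>

#define CSR_HGEIE 0x607u
#define CSR_HGEIP 0xE12u

/*
 * Byte offset of a CSR relative to <nested_context_base>, or -1 when the
 * access is not converted to a memory access and must trap instead.
 */
static int64_t csr_nested_offset(uint16_t csr, unsigned xlen)
{
    int64_t region;

    assert(xlen == 32 || xlen == 64);
    assert(csr < 0x1000u);  /* CSR numbers are 12 bits wide */

    if (csr == CSR_HGEIE || csr == CSR_HGEIP)
        return -1;          /* excluded from conversion by the proposal */

    switch (csr >> 8) {
    case 0x2: region = 0x0000; break;
    case 0x6: region = 0x1000; break;
    case 0xA: region = 0x2000; break;
    case 0xE: region = 0x3000; break;
    default:  return -1;    /* not an "h<xyz>" or "vs<xyz>" range */
    }
    return region + (int64_t)(csr & 0xffu) * (int64_t)(xlen / 8);
}
```

Because the largest in-region offset is 0xff * 8 = 0x7f8, the four 0x1000-byte regions never overlap, even on RV64.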
Accesses to the HNESTED CSR (described above) from VS-mode are also
converted to memory accesses when <nested_enable> = 1 because the
HNESTED CSR can be safely emulated using nested acceleration.
Best Regards,
Anup Patel
|
|
Handling faults on new HLV/HSV instructions in Hypervisor Extension draft 0.6
When one of the new HLV/HSV instructions faults, what virtualization and privilege modes are recorded in mstatus.mpp/mpv, or in sstatus.spp/spv and hstatus.spvp? Are they based on the actual modes from within which the instruction executes (i.e. on HS or U, and V=0), or on the effective modes used by the instruction as it executes (i.e. on spvp and V=1).
Assuming, for example, the trap is taken into HS-mode:
- If the actual modes apply, then hstatus.spvp remains unchanged and sstatus.spp/spv are set to reflect the actual privilege mode and V=0. The hypervisor would then presumably figure out from htinst what caused this trap? (In certain cases would the hypervisor need to save the original 'spp/spv' values before doing any HLV/HSV instructions so that it can restore them afterwards?)
- If the effective modes apply, then sstatus.spp and hstatus.spvp are set to the effective privilege mode of the HLV/HSV instruction (as specified by spvp) and sstatus.spv is set to reflect V=1. The hypervisor would then figure out in some way (such as from htinst?) that this was a re-entry into the hypervisor due to its own actions? (Typically all three of these fields would end up not changing in their values. But in certain cases would the hypervisor need to save the original 'spp/spv' values before doing any HLV/HSV instructions so that it can restore them afterwards?)
In any case, which is the intended behavior (which should probably then be clarified in the spec)?
Thanks, Greg
|
|
Re: Handling faults on new HLV/HSV instructions in Hypervisor Extension draft 0.6
Greg Favor wrote:
> When one of the new HLV/HSV instructions faults, what virtualization and
> privilege modes are recorded in mstatus.mpp/mpv, or in sstatus.spp/spv and
> hstatus.spvp? Are they based on the actual modes from within which the
> instruction executes (i.e. on HS or U, and V=0), or on the effective modes
> used by the instruction as it executes (i.e. on spvp and V=1).
The actual virtualization and privilege modes, same as always.
Consider the analogy with memory accesses made in M mode when
mstatus.MPRV = 1. The document says that such memory accesses occur
"as though the current privilege mode were set to MPP". If such a
memory access causes a trap, mstatus.MPP gets set to 3, the actual mode
at the time of the trap, not the "as-though" mode. As far as I know,
there's never been a question about this for MPRV.
Likewise, HLV and HSV are defined as performing memory accesses "as
though V = 1". Sounds the same to me.
I also think tables 5.6 and 5.7 in section 5.7.2, "Trap Entry", are
reasonably unambiguous on this point. Since HLV and HSV aren't said to
actually change the current virtualization or privilege modes, I feel
it's evident they don't affect what's written to SPV and SPP on a trap.
If instead the "effective modes applied", as you put it, then note
that SRET would no longer be sufficient to resume from a trap caused by
HLV/HSV. (Nor would MRET, if the trap is taken in M mode.)
FWIW, there's another clue hidden in this comment in section 5.2.1,
"Hypervisor Status Register (hstatus)":
    For memory faults, GVA is redundant with field SPV (the two bits
    are set the same) except when the explicit memory access of an HLV,
    HLVX, or HSV instruction causes a fault. In that case, SPV = 0 but
    GVA = 1.
Note, it says SPV gets set to 0, not 1.
> Assuming, for example, the trap is taken into HS-mode:
>
> - If the actual modes apply, then hstatus.spvp remains unchanged and
> sstatus.spp/spv are set to reflect the actual privilege mode and V=0. The
> hypervisor would then presumably figure out from htinst what caused this
> trap?
Yes. Bit GVA in hstatus might also be helpful.
> (In certain cases would the hypervisor need to save the original
> 'spp/spv' values before doing any HLV/HSV instructions so that it can
> restore them afterwards?)
It is generally the case, whenever nested traps might be taken in
HS mode, that the hypervisor may need to save sstatus and hstatus
before the nested trap could occur, and restore them afterward. That's
no different than when an operating system might trigger a nested
S-mode-handled trap (like a page fault) by a memory access executed
in S mode: the OS may need to save and restore sstatus around such
possibilities. The specific situation determines exactly what must be
saved and restored.
- John Hauser
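The trap-entry behaviour described in this reply can be modelled in a few lines. This is a simplified sketch rather than spec text: the struct and function names are hypothetical, only the fields discussed in the thread are modelled, and the rule that SPVP is updated only when V=1 at the time of the trap is taken from the hypervisor extension's description of hstatus.

```c
#include <assert.h>
#include <stdbool.h>

/* Privilege encodings as used in SPP/SPVP: 0 = U, 1 = S. */
enum { PRIV_U = 0, PRIV_S = 1 };

/* Only the status fields discussed in this thread. */
struct hs_trap_state {
    unsigned spp;   /* previous privilege mode        */
    bool     spv;   /* previous virtualization mode   */
    unsigned spvp;  /* previous virtual privilege     */
    bool     gva;   /* stval holds a guest virtual address */
};

/*
 * Record a trap taken into HS-mode. actual_priv and actual_v are the modes
 * the hart was really in, not the "as though" modes of an HLV/HSV access.
 */
static void record_hs_trap(struct hs_trap_state *st, unsigned actual_priv,
                           bool actual_v, bool addr_is_guest_virtual)
{
    assert(actual_priv == PRIV_U || actual_priv == PRIV_S);
    st->spp = actual_priv;
    st->spv = actual_v;
    st->gva = addr_is_guest_virtual;
    if (actual_v)
        st->spvp = actual_priv;  /* left unchanged when V=0 */
}
```

For an HLV fault raised while running in HS-mode, the call would be `record_hs_trap(&st, PRIV_S, false, true)`, giving SPV = 0 and GVA = 1 with SPVP untouched, which matches the quoted hstatus comment.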
|
|
Re: Handling faults on new HLV/HSV instructions in Hypervisor Extension draft 0.6
Jonathan Behrens <behrensj@...>
Having SPP/SPV hold the real values makes the most sense to me. The strategy I'd expect hypervisors to use would be to set a bit before issuing any HLV or HSV instructions and clear it after. Then in their page fault handler they'd check if it is set in order to "blame" that fault on the guest and take appropriate action instead of resuming normal execution.
Jonathan
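A minimal sketch of the flag-based strategy described above, assuming a single hart and no nesting of guest accesses. All names here are hypothetical; a real hypervisor would keep the flag per hart and wrap the actual HLV/HSV instruction between the begin/end calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Set while the hypervisor is executing an HLV/HSV on behalf of a guest. */
static bool in_guest_access;

enum fault_blame { BLAME_HYPERVISOR, BLAME_GUEST };

static void guest_access_begin(void)
{
    assert(!in_guest_access);  /* this sketch does not support nesting */
    in_guest_access = true;
}

static void guest_access_end(void)
{
    assert(in_guest_access);
    in_guest_access = false;
}

/* Called from the HS-mode page fault handler to decide whom to blame. */
static enum fault_blame classify_page_fault(void)
{
    return in_guest_access ? BLAME_GUEST : BLAME_HYPERVISOR;
}
```

When the handler sees `BLAME_GUEST`, it would take the guest-directed action (for example injecting a fault into the guest) instead of resuming normal hypervisor execution.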
|
|
Re: Handling faults on new HLV/HSV instructions in Hypervisor Extension draft 0.6
John, thanks for the full responses. I had suspected the former. But as can sometimes be the case, we were looking at certain parts of the spec and weren't looking at the tables in section 5.7.2.
Given that the general style of the arch spec is to not do heavy cross-referencing, I won't suggest that. And ultimately it was our own fault in not broadly searching for and noticing those tables while in the heat of the moment of answering a question about "hstatus" section 5.2.1 that was raised by one of our designers.
Thanks, Greg
|
|
Re: Handling faults on new HLV/HSV instructions in Hypervisor Extension draft 0.6

Anup Patel
-----Original Message-----
From: tech-privileged@... <tech-privileged@...> On Behalf Of John Hauser
Sent: 14 April 2020 08:17
To: tech-privileged@...
Subject: Re: [RISC-V] [tech-privileged] Handling faults on new HLV/HSV instructions in Hypervisor Extension draft 0.6
> Greg Favor wrote:
> > When one of the new HLV/HSV instructions faults, what virtualization and privilege modes are recorded in mstatus.mpp/mpv, or in sstatus.spp/spv and hstatus.spvp? Are they based on the actual modes from within which the instruction executes (i.e. on HS or U, and V=0), or on the effective modes used by the instruction as it executes (i.e. on spvp and V=1).
> The actual virtualization and privilege modes, same as always.
> Consider the analogy with memory accesses made in M mode when mstatus.MPRV = 1. The document says that such memory accesses occur "as though the current privilege mode were set to MPP". If such a memory access causes a trap, mstatus.MPP gets set to 3, the actual mode at the time of the trap, not the "as-though" mode. As far as I know, there's never been a question about this for MPRV.
> Likewise, HLV and HSV are defined as performing memory accesses "as though V = 1". Sounds the same to me.
> I also think tables 5.6 and 5.7 in section 5.7.2, "Trap Entry", are reasonably unambiguous on this point. Since HLV and HSV aren't said to actually change the current virtualization or privilege modes, I feel it's evident they don't affect what's written to SPV and SPP on a trap.
> If instead the "effective modes applied", as you put it, then note that SRET would no longer be sufficient to resume from a trap caused by HLV/HSV. (Nor would MRET, if the trap is taken in M mode.)
This is our understanding as well. SRET usage will certainly break for hypervisors if SSTATUS.SPP and HSTATUS.SPV don't reflect the mode in which the trap happened.
> FWIW, there's another clue hidden in this comment in section 5.2.1, "Hypervisor Status Register (hstatus)":
>     For memory faults, GVA is redundant with field SPV (the two bits are set the same) except when the explicit memory access of an HLV, HLVX, or HSV instruction causes a fault. In that case, SPV = 0 but GVA = 1.
> Note, it says SPV gets set to 0, not 1.
> > Assuming, for example, the trap is taken into HS-mode:
> > - If the actual modes apply, then hstatus.spvp remains unchanged and sstatus.spp/spv are set to reflect the actual privilege mode and V=0. The hypervisor would then presumably figure out from htinst what caused this trap?
> Yes. Bit GVA in hstatus might also be helpful.
> > (In certain cases would the hypervisor need to save the original 'spp/spv' values before doing any HLV/HSV instructions so that it can restore them afterwards?)
> It is generally the case, whenever nested traps might be taken in HS mode, that the hypervisor may need to save sstatus and hstatus before the nested trap could occur, and restore them afterward. That's no different than when an operating system might trigger a nested S-mode-handled trap (like a page fault) by a memory access executed in S mode: the OS may need to save and restore sstatus around such possibilities. The specific situation determines exactly what must be saved and restored.
Yes, both Xvisor RISC-V and KVM RISC-V save SSTATUS and HSTATUS in the low-level trap entry path.
Regards,
Anup
|
|
32-bit accesses to mtime/mtimecmp under RV64
The mtime and mtimecmp registers are defined as 64-bit memory-mapped registers. The priv spec says that - in RV32 - mtimecmp can be written as a pair of 32-bit registers. Since this was made specific to RV32, is there an intended implication in the spec that in RV64 the system must support atomic 64-bit accesses to these registers? Or is it allowable for only non-atomic 64-bit accesses to be supported (i.e. a 64-bit access by a CPU is performed as two 32-bit accesses out in the SoC where mtime/mtimecmp are located)?
Put differently, must RV64 software not assume that a 64-bit load/store will atomically read/write the register? (Note: ARMv8 explicitly says software must not make such an atomicity assumption for accesses to memory-mapped 64-bit registers.)
Greg
|
|
Re: 32-bit accesses to mtime/mtimecmp under RV64
On Fri, Apr 17, 2020 at 7:00 PM Greg Favor <gfavor@...> wrote:
> The mtime and mtimecmp registers are defined as 64-bit memory-mapped registers. The priv spec says that - in RV32 - mtimecmp can be written as a pair of 32-bit registers. Since this was made specific to RV32, is there an intended implication in the spec that in RV64 the system must support atomic 64-bit accesses to these registers? Or is it allowable for only non-atomic 64-bit accesses to be supported (i.e. a 64-bit access by a CPU is performed as two 32-bit accesses out in the SoC where mtime/mtimecmp are located)?
The spec strongly implies by omission that 64-bit accesses are atomic for RV64, in that it gives an unusually detailed RV32-specific code example to cope with non-atomicity, but mentions nothing of the sort for RV64. I will add a sentence that makes this implication explicit.
> Put differently, must RV64 software not assume that a 64-bit load/store will atomically read/write the register? (Note: ARMv8 explicitly says software must not make such an atomicity assumption for accesses to memory-mapped 64-bit registers.)
In general, this depends on the peripheral and the platform. We aren't trying to preclude interfacing with legacy devices and buses, so of course some 64-bit accesses to some devices will either become non-atomic or signal some sort of error. But it's really quite useful to be able to assume that 64-bit accesses are atomic when interfacing with more modern peripherals that use 64-bit addresses, so we definitely do not want to preclude that, either.
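For readers without the spec at hand, the RV32-style sequence referred to above can be sketched in C as follows. The write order (low word to all ones, then high word, then low word) mirrors the privileged spec's RV32 example, which avoids a spuriously small intermediate compare value. The read loop is the analogous high-low-high idiom and is included as an assumption about how 32-bit-only software would read mtime. The function names and the word layout (index 0 = low word) are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Write a 64-bit mtimecmp value using only 32-bit stores.
 * regs[0] is the low word and regs[1] the high word (assumed layout).
 */
static void mtimecmp_write32(volatile uint32_t *regs, uint64_t value)
{
    assert(regs != 0);
    regs[0] = UINT32_MAX;               /* low word no smaller than old value */
    regs[1] = (uint32_t)(value >> 32);  /* pair now no smaller than new value */
    regs[0] = (uint32_t)value;          /* final value */
}

/*
 * Read a 64-bit mtime value using only 32-bit loads, retrying if the low
 * word carried into the high word between the two reads.
 */
static uint64_t mtime_read32(const volatile uint32_t *regs)
{
    uint32_t hi, lo;

    assert(regs != 0);
    do {
        hi = regs[1];
        lo = regs[0];
    } while (regs[1] != hi);
    return ((uint64_t)hi << 32) | lo;
}
```

On a platform where 64-bit accesses to these registers are atomic, RV64 software can replace both helpers with a single 64-bit load or store.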
|
|
Re: 32-bit accesses to mtime/mtimecmp under RV64
On Fri, Apr 17, 2020 at 7:31 PM Andrew Waterman < andrew@...> wrote:
In general, this depends on the peripheral and the platform. We aren't trying to preclude interfacing with legacy devices and buses, so of course some 64-bit accesses to some devices will either become non-atomic or signal some sort of error. But it's really quite useful to be able to assume that 64-bit accesses are atomic when interfacing with more modern peripherals that use 64-bit addresses, so we definitely do not want to preclude that, either.
Asking this slightly differently (I think) to clarify....
With respect to mtime/mtimecmp, does an RV64 processor place constraints on the platform, or can the platform place constraints on the RV64 processor? If the former, the implication is that the platform must provide a way for the RV64 processors to access the registers atomically with a 64b load or store. If the latter, the implication is that the platform can require the RV64 processor to access the registers non-atomically with 32b loads or stores, a la RV32.
Cheers, David
|
|
Re: 32-bit accesses to mtime/mtimecmp under RV64
The second half of my answer was addressing the more general matter. For mtime and mtimecmp specifically, the spec is now clear:
|
|
Re: 32-bit accesses to mtime/mtimecmp under RV64
On Mon, Apr 20, 2020 at 2:38 PM Andrew Waterman < andrew@...> wrote:
The second half of my answer was addressing the more general matter. For mtime and mtimecmp specifically, the spec is now clear:
So the only constraint is that when a 64b naturally-aligned access is made to mtime/mtimecmp, the access must be completed atomically if the platform allows 64b naturally-aligned accesses to those registers? A platform is still allowed to signal an error on such accesses and to force an RV64 processor to access those registers with 32b loads and stores, right?
Cheers, David
|
|
Re: 32-bit accesses to mtime/mtimecmp under RV64
On Mon, Apr 20, 2020 at 2:38 PM Andrew Waterman < andrew@...> wrote:
On Fri, Apr 17, 2020 at 7:31 PM Andrew Waterman < andrew@...> wrote:
On Fri, Apr 17, 2020 at 7:00 PM Greg Favor < gfavor@...> wrote: The mtime and mtimecmp registers are defined as 64-bit memory-mapped registers. The priv spec says that - in RV32 - mtimecmp can be written as a pair of 32-bit registers. Since this was made specific to RV32, is there an intended implication in the spec that in RV64 the system must support atomic 64-bit accesses to these registers? Or is it allowable for only non-atomic 64-bit accesses to be supported (i.e. a 64-bit access by a CPU is performed as two 32-bit accesses out in the SoC where mtime/mtimecmp are located)?
The spec strongly implies by omission that 64-bit accesses are atomic for RV64, in that it gives an unusually detailed RV32-specific code example to cope with non-atomicity, but mentions nothing of the sort for RV64. I will add the additional sentence that makes this implication explicit.
Put differently, must RV64 software not assume that a 64-bit load/store will atomically read/write the register? (Note: ARMv8 explicitly says software must not make such an atomicity assumption for accesses to memory-mapped 64-bit registers.)
In general, this depends on the peripheral and the platform. We aren't trying to preclude interfacing with legacy devices and buses, so of course some 64-bit accesses to some devices will either become non-atomic or signal some sort of error. But it's really quite useful to be able to assume that 64-bit accesses are atomic when interfacing with more modern peripherals that use 64-bit addresses, so we definitely do not want to preclude that, either.
Asking this slightly differently (I think) to clarify....
With respect to mtime/mtimecmp, does an RV64 processor place constraints on the platform, or can the platform place constraints on the RV64 processor? If the former, the implication is that the platform must provide a way for the RV64 processors to access the registers atomically with a 64b load or store. If the latter, the implication is that the platform can require the RV64 processor to access the registers non-atomically with 32b loads or stores, a la RV32.
The second half of my answer was addressing the more general matter. For mtime and mtimecmp specifically, the spec is now clear:
So the only constraint is that when a 64b naturally-aligned access is made to mtime/mtimecmp, the access must be completed atomically if the platform allows 64b naturally-aligned accesses to those registers? A platform is still allowed to signal an error on such accesses and to force an RV64 processor to access those registers with 32b loads and stores, right?
I think your interpretation of that sentence is accurate. FWIW, the insufficiently described Linux platform does assume such accesses are legal (more precisely, the various SBI implementations make that assumption).
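To make the RV32 case discussed above concrete, here is a minimal Python sketch of why the order of the two 32-bit stores to mtimecmp matters. It is a toy model, not the spec's code: the `Timer` class, the function names, and the example values are all assumptions made for illustration, and mtime is held constant for simplicity. The three-store order (all-ones to the low half, then the new high half, then the new low half) is the shape of the RV32 sequence the thread refers to, as far as I understand it.

```python
MASK32 = 0xFFFFFFFF


class Timer:
    """Toy timer whose 64-bit mtimecmp can only be written as two 32-bit halves."""

    def __init__(self, mtime, mtimecmp):
        self.mtime = mtime
        self.cmp_lo = mtimecmp & MASK32
        self.cmp_hi = (mtimecmp >> 32) & MASK32

    @property
    def mtimecmp(self):
        return (self.cmp_hi << 32) | self.cmp_lo

    def pending(self):
        # Interrupt condition modelled here: mtime >= mtimecmp.
        return self.mtime >= self.mtimecmp


def write_naive(timer, value):
    """Store the low half, then the high half.
    Returns True if the interrupt condition held after either store."""
    saw_pending = False
    timer.cmp_lo = value & MASK32
    saw_pending |= timer.pending()
    timer.cmp_hi = (value >> 32) & MASK32
    saw_pending |= timer.pending()
    return saw_pending


def write_safe(timer, value):
    """Three-store order: the intermediate values are never smaller than
    both the old and the new comparand."""
    saw_pending = False
    timer.cmp_lo = MASK32                    # no smaller than the old value
    saw_pending |= timer.pending()
    timer.cmp_hi = (value >> 32) & MASK32    # no smaller than the new value
    saw_pending |= timer.pending()
    timer.cmp_lo = value & MASK32            # the new value
    saw_pending |= timer.pending()
    return saw_pending


# Example values (assumed): both the old and new comparands lie in the future.
MTIME, OLD, NEW = 0x1_8000_0000, 0x1_FFFF_FFFF, 0x2_0000_0000
print(write_naive(Timer(MTIME, OLD), NEW))  # True: spurious interrupt window
print(write_safe(Timer(MTIME, OLD), NEW))   # False
```

With a single atomic 64-bit store, as discussed for RV64, neither intermediate state exists, which is why the atomicity question matters to software.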
Re: 32-bit accesses to mtime/mtimecmp under RV64

Mark Hill
To widen the question slightly further: are there any plans to provide atomic load/store pair operations (128 bits for RV64, 64 bits for RV32)?
Re: 32-bit accesses to mtime/mtimecmp under RV64

Allen Baum
More mtimecmp questions:
- The spec says that an interrupt is posted when the mtime register contains a value greater than or equal to the value in the mtimecmp register, but doesn't specify that the comparison is *unsigned* greater than or equal.
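A small Python sketch of why the signedness matters. The helper names are made up for illustration, and the all-ones comparand is an assumed example of software parking the timer far in the future. Under an unsigned comparison that value never fires; under a signed comparison it is -1 and fires immediately.

```python
MASK64 = (1 << 64) - 1


def as_signed64(x):
    """Reinterpret a 64-bit pattern as a two's-complement signed value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def pending_unsigned(mtime, mtimecmp):
    return (mtime & MASK64) >= (mtimecmp & MASK64)


def pending_signed(mtime, mtimecmp):
    return as_signed64(mtime) >= as_signed64(mtimecmp)


mtime = 1000
mtimecmp = MASK64  # all ones: "effectively never" if the compare is unsigned
print(pending_unsigned(mtime, mtimecmp))  # False
print(pending_signed(mtime, mtimecmp))    # True: -1 <= 1000
```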
Re: 32-bit accesses to mtime/mtimecmp under RV64
To widen your question even further Mark (no pun intended), do we need 256 bits for RV128?
Yes, RV128 is a bit speculative, but it does at least rate being in the book, so best to have all the consequences of the request here on the table.
Also, I'm curious what you intend to use the bigger ones for?
The only answer here I know of is emulating CAS with the ticket/epoch/whatever counter next to the actual data element to solve CAS A-B-A problems (which, handily, LR/SC naturally avoids anyway).
Is that the one you're after? (Asking because if there's another reason beyond that one, I'm interested in hearing about it).
Derek Williams
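For readers unfamiliar with the A-B-A case mentioned here, a minimal Python sketch follows. It is a single-threaded toy: `Cell`, `pack`, `versioned_update`, and the addresses are all hypothetical, and the "other thread" is simply executed inline between the read and the CAS. It shows a single-width CAS missing an A -> B -> A change, and a double-width CAS (pointer plus counter in one 2W-bit unit) catching it.

```python
W = 64                      # word width assumed for the pointer field
MASK_W = (1 << W) - 1


class Cell:
    """Toy memory location with a compare-and-swap over its whole contents."""

    def __init__(self, value):
        self.value = value

    def cas(self, expected, new):
        if self.value == expected:
            self.value = new
            return True
        return False


def pack(ptr, version):
    """Place a W-bit version counter next to a W-bit pointer (2W bits total)."""
    return ((version & MASK_W) << W) | (ptr & MASK_W)


def versioned_update(cell, new_ptr):
    """Swap in new_ptr and bump the version with one double-width CAS."""
    while True:
        old = cell.value
        if cell.cas(old, pack(new_ptr, (old >> W) + 1)):
            return


def aba_demo():
    A, B, C = 0x1000, 0x2000, 0x3000

    # Single-width CAS: the A -> B -> A change goes unnoticed.
    plain = Cell(A)
    seen = plain.value
    plain.cas(A, B)
    plain.cas(B, A)
    plain_ok = plain.cas(seen, C)

    # Double-width CAS: the version moved from 0 to 2, so the stale CAS fails.
    tagged = Cell(pack(A, 0))
    seen = tagged.value
    versioned_update(tagged, B)
    versioned_update(tagged, A)
    tagged_ok = tagged.cas(seen, pack(C, 1))

    return plain_ok, tagged_ok


print(aba_demo())  # (True, False)
```

An LR/SC pair would fail in the first case too, since the reservation is lost on the intervening stores, which is the point made in the parenthetical above.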
Re: 32-bit accesses to mtime/mtimecmp under RV64

Mark Hill
Another possible use case is access-sensitive devices, for example a FIFO of 128-bit records with multiple RV64 harts reading from the FIFO.
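A sketch of the hazard in this use case, in Python. The device model is entirely hypothetical: it assumes a pop-on-read FIFO where reading the low 64 bits peeks at the head record and reading the high 64 bits pops it. With two harts whose 64-bit reads interleave, one hart assembles a record that was never in the FIFO; a single 128-bit atomic read avoids this.

```python
from collections import deque

MASK64 = (1 << 64) - 1


class RecordFifo:
    """Hypothetical pop-on-read device holding 128-bit records."""

    def __init__(self, records):
        self.queue = deque(records)

    def read_lo64(self):
        # Low half of the head record; does not pop.
        return self.queue[0] & MASK64

    def read_hi64(self):
        # High half of the head record; completes the read and pops it.
        return self.queue.popleft() >> 64

    def read128(self):
        # One single-copy-atomic 128-bit read of the head record.
        return self.queue.popleft()


def interleaved_split_reads(fifo):
    """Two harts each read one record as lo64 then hi64, with their
    accesses interleaved. Returns the record each hart assembled."""
    h0_lo = fifo.read_lo64()
    h1_lo = fifo.read_lo64()
    h0_hi = fifo.read_hi64()
    h1_hi = fifo.read_hi64()
    return (h0_hi << 64) | h0_lo, (h1_hi << 64) | h1_lo


RECORDS = [(0xAAAA << 64) | 0x1111, (0xBBBB << 64) | 0x2222]

hart0, hart1 = interleaved_split_reads(RecordFifo(RECORDS))
print(hart1 in RECORDS)  # False: hart 1 got record 0's low half and record 1's high half
```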
Re: 32-bit accesses to mtime/mtimecmp under RV64
Interesting. If you had a H/W FIFO, seems like it would be easier to make it work with single-copy atomic loads or stores to read from or write to the FIFO rather than bothering with the tedium of LR/SC pairs?
Yes, you can have multiple HARTs going after the "device" concurrently, but the single-copy atomicity of the loads or stores would seem to keep those accesses to the device separated rather than LR/SC which is more to do an atomic RMW of memory.
I'm having trouble seeing how LR/SC would fit there?
Also, I'll assume you really do intend to use the "double-wide" LR/SC for the CAS emulation?
Aside from whatever this FIFO example might turn out to be?
Derek
Re: 32-bit accesses to mtime/mtimecmp under RV64
I assumed that Dr Mark Hill was talking about 256-bit atomic loads and stores to access the FIFO, not LR/SC.
Also, double-width CAS (and other double-width atomics) is used not just for A-B-A problems, but also for things like atomically inserting into circular lists (e.g. where the list itself has pointers to both the first and the last elements of a singly linked circle).
In general, if your word or address width is W, for atomic read-modify-writes you need:
- W+V bits for A-B-A problems, where V is whatever number of bits you need for versions or epochs
- 2W bits for list heads
Of course, 2W subsumes W+V, so we often don't make the distinction.
The other big user of extra-width atomic RMWs is page tables, e.g. 32-bit virtual addresses with 40-bit physical addresses (stored in 64-bit PTEs).
Non-read-modify-write atomic loads and stores of nearly any width (W, 2W, 4W) are useful for active memory devices like FIFOs.
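To illustrate the page-table point, here is a minimal Python sketch of a 64-bit PTE being updated with two 32-bit stores. The PTE layout (bit 0 valid, bits 12 and up holding the page frame address) is a hypothetical one chosen for the example, not any real RISC-V or other format, and the function names are made up.

```python
MASK32 = 0xFFFFFFFF
PAGE_SHIFT = 12
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


def make_pte(phys_addr, valid=True):
    """Hypothetical 64-bit PTE: bit 0 is the valid bit, bits 12 and up
    hold the page-aligned physical address."""
    return (phys_addr & ~OFFSET_MASK) | int(valid)


def pte_phys_addr(pte):
    return pte & ~OFFSET_MASK


def torn_view(old_pte, new_pte):
    """PTE value a concurrent page-table walker could observe after the
    low 32-bit store has landed but before the high 32-bit store."""
    return (old_pte & ~MASK32) | (new_pte & MASK32)


old = make_pte(0x12_3456_7000)   # 40-bit physical addresses (assumed)
new = make_pte(0xAB_CDEF_0000)
torn = torn_view(old, new)

# The torn entry is still marked valid but points at a frame that is
# neither the old mapping nor the new one.
print(hex(pte_phys_addr(torn)))  # 0x12cdef0000
```

A 2W-wide atomic store (or RMW) of the PTE removes this intermediate state, which is the case for extra-width atomics made above.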
Re: 32-bit accesses to mtime/mtimecmp under RV64
Ah..... yeah, ok. That's "atomic" (single-copy atomicity) vs. "atomic" (LR/SC pair).
My bad. Apologies Mark (assuming Andy is right and you meant load and store instructions that are single-copy-atomic) for the needless side trip into LR/SC.
Andy, thanks for the interesting point about wider LR/SC. I have some questions, but I won't bother everyone else with that... I'll just get you tomorrow.
Derek
----- Original message ----- From: "Andy Glew Si5" <andy.glew@...> Sent by: tech-privileged@... To: striker@..., mark.hill@... Cc: Andrew Waterman <andrew@...>, dkruckemyer@..., gfavor@..., tech-privileged@... Subject: [EXTERNAL] Re: [RISC-V] [tech-privileged] 32-bit accesses to mtime/mtimecmp under RV64 Date: Wed, Apr 22, 2020 9:08 PM
I assumed that Dr Mark Hill was talking about 256 bit atomic loads and stores to access the FIFO, not LR/SC.
Also, double width CAS (and other double width atomics) is used not just for A-B-A problems, but also for things like atomically inserting into circular lists (e.g. where the list head itself holds pointers to both the first and the last elements of a singly linked circle).
In general, if your word or address width is W:
For atomic read modify writes:
You need W+V bits for A-B-A problems, where V is whatever number of bits you need for versions or epochs
2W bits for list heads
Of course, 2W subsumes W+V, so we often don't make the distinction
The other big user of extra width atomic RMWs is page tables, e.g. 32-bit virtual addresses with 40 bit physical addresses (stored in 64-bit PTEs).
Non-read modify write, atomic loads and stores of nearly any width – W, 2W, 4W - are useful for active memory devices like FIFOs.
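To make the W+V case above concrete, here is a minimal sketch, assuming W = 32 and V = 32 so that the value and its version counter fit in one 64-bit (2W) word and a single ordinary C11 compare-and-swap covers both. The names versioned_t, pack, value_of, version_of and versioned_cas are made up for illustration and do not come from any RISC-V specification or library.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* A W-bit value (here a 32-bit index) packed next to a V-bit version
 * counter, so that one 2W-bit (64-bit) compare-and-swap covers both. */
typedef _Atomic uint64_t versioned_t;

static inline uint64_t pack(uint32_t value, uint32_t version) {
    return ((uint64_t)version << 32) | value;
}

static inline uint32_t value_of(uint64_t packed) {
    return (uint32_t)packed;
}

static inline uint32_t version_of(uint64_t packed) {
    return (uint32_t)(packed >> 32);
}

/* Install new_value only if neither the value nor the version has changed
 * since `seen` was loaded. Every successful update bumps the version, so a
 * value that went A -> B -> A no longer matches a stale snapshot. */
static bool versioned_cas(versioned_t *slot, uint64_t seen, uint32_t new_value) {
    assert(slot);
    uint64_t desired = pack(new_value, version_of(seen) + 1u);
    return atomic_compare_exchange_strong(slot, &seen, desired);
}
```

A caller loads the slot once, computes a new value, and retries if versioned_cas returns false. The version wraps after 2^32 updates, which is the usual trade-off when V is smaller than one would like.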
From: tech-privileged@... <tech-privileged@...> On Behalf Of striker@... Sent: Wednesday, April 22, 2020 18:42 To: mark.hill@... Cc: andrew@...; dkruckemyer@...; gfavor@...; tech-privileged@... Subject: Re: [RISC-V] [tech-privileged] 32-bit accesses to mtime/mtimecmp under RV64
Interesting. If you had a H/W FIFO, seems like it would be easier to make it work with single-copy atomic loads or stores to read from or write to the FIFO rather than bothering with the tedium of LR/SC pairs?
Yes, you can have multiple HARTs going after the "device" concurrently, but the single-copy atomicity of the loads or stores would seem to keep those accesses to the device separated rather than LR/SC which is more to do an atomic RMW of memory.
I'm having trouble seeing how LR/SC would fit there?
Also, I'll assume you really do intend to use the "double-wide" LR/SC for the CAS emulation?
Aside from whatever this FIFO example might turn out to be?
----- Original message ----- From: Mark Hill <mark.hill@...> To: "striker@..." <striker@...> Cc: "andrew@..." <andrew@...>, "dkruckemyer@..." <dkruckemyer@...>, "gfavor@..." <gfavor@...>, "tech-privileged@..." <tech-privileged@...> Subject: [EXTERNAL] RE: [RISC-V] [tech-privileged] 32-bit accesses to mtime/mtimecmp under RV64 Date: Wed, Apr 22, 2020 4:36 AM
Another possible use case is access-sensitive devices, for example a FIFO of 128-bit records with multiple RV64 harts reading from the FIFO.
From: tech-privileged@... [mailto:tech-privileged@...] On Behalf Of striker@... Sent: 22 April 2020 06:21 To: Mark Hill <mark.hill@...> Cc: andrew@...; dkruckemyer@...; gfavor@...; tech-privileged@... Subject: Re: [RISC-V] [tech-privileged] 32-bit accesses to mtime/mtimecmp under RV64
To widen your question even further Mark (no pun intended), do we need 256 bits for RV128?
Yes, RV128 is a bit speculative, but it does at least rate being in the book, so best to have all the consequences of the request here on the table.
Also, I'm curious what you intend to use the bigger ones for?
The only answer here I know of is emulating CAS with the ticket/epoch/whatever counter next to the actual data element to solve CAS A-B-A problems (which, handily, LR/SC naturally avoids anyway).
Is that the one you're after? (Asking because if there's another reason beyond that one, I'm interested in hearing about it).
----- Original message ----- From: "Dr Mark Hill" <mark.hill@...> Sent by: tech-privileged@... To: Andrew Waterman <andrew@...>, David Kruckemyer <dkruckemyer@...> Cc: Greg Favor <gfavor@...>, "tech-privileged@..." <tech-privileged@...> Subject: [EXTERNAL] Re: [RISC-V] [tech-privileged] 32-bit accesses to mtime/mtimecmp under RV64 Date: Tue, Apr 21, 2020 1:55 AM
To widen the question slightly further, are there any plans to provide atomic load/store pair operations (128-bits for RV64, 64-bits for RV32)?
From: tech-privileged@... [mailto:tech-privileged@...] On Behalf Of Andrew Waterman Sent: 20 April 2020 23:49 To: David Kruckemyer <dkruckemyer@...> Cc: Greg Favor <gfavor@...>; tech-privileged@... Subject: Re: [RISC-V] [tech-privileged] 32-bit accesses to mtime/mtimecmp under RV64
On Mon, Apr 20, 2020 at 2:38 PM Andrew Waterman <andrew@...> wrote:
On Fri, Apr 17, 2020 at 7:31 PM Andrew Waterman <andrew@...> wrote:
On Fri, Apr 17, 2020 at 7:00 PM Greg Favor <gfavor@...> wrote:
The mtime and mtimecmp registers are defined as 64-bit memory-mapped registers. The priv spec says that - in RV32 - mtimecmp can be written as a pair of 32-bit registers. Since this was made specific to RV32, is there an intended implication in the spec that in RV64 the system must support atomic 64-bit accesses to these registers? Or is it allowable for only non-atomic 64-bit accesses to be supported (i.e. a 64-bit access by a CPU is performed as two 32-bit accesses out in the SoC where mtime/mtimecmp are located)?
The spec strongly implies by omission that 64-bit accesses are atomic for RV64, in that it gives an unusually detailed RV32-specific code example to cope with non-atomicity, but mentions nothing of the sort for RV64. I will add the additional sentence that makes this implication explicit.
Put differently, must RV64 software not assume that a 64-bit load/store will atomically read/write the register? (Note: ARMv8 explicitly says software must not make such an atomicity assumption for accesses to memory-mapped 64-bit registers.)
In general, this depends on the peripheral and the platform. We aren't trying to preclude interfacing with legacy devices and buses, so of course some 64-bit accesses to some devices will either become non-atomic or signal some sort of error. But it's really quite useful to be able to assume that 64-bit accesses are atomic when interfacing with more modern peripherals that use 64-bit addresses, so we definitely do not want to preclude that, either.
Asking this slightly differently (I think) to clarify....
With respect to mtime/mtimecmp, does an RV64 processor place constraints on the platform, or can the platform place constraints on the RV64 processor? If the former, the implication is that the platform must provide a way for the RV64 processors to access the registers atomically with a 64b load or store. If the latter, the implication is that the platform can require the RV64 processor to access the registers non-atomically with 32b loads or stores, a la RV32.
The second half of my answer was addressing the more general matter. For mtime and mtimecmp specifically, the spec is now clear:
So the only constraint is that when a 64b naturally-aligned access is made to mtime/mtimecmp, the access must be completed atomically if the platform allows 64b naturally-aligned accesses to those registers? A platform is still allowed to signal an error on such accesses and to force an RV64 processor to access those registers with 32b loads and stores, right?
I think your interpretation of that sentence is accurate. FWIW, the insufficiently described Linux platform does assume such accesses are legal (more precisely, the various SBI implementations make that assumption).
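For the fallback case discussed in this thread, where a platform only allows 32b accesses to these registers, the following is a rough C sketch of the RV32-style sequences: writing mtimecmp as two halves in the order the privileged spec's RV32 example uses, and reading mtime with a high/low/high retry loop. The function names are hypothetical, the registers are assumed to be laid out with the low word first, and the caller is assumed to supply the platform-specific register address.

```c
#include <assert.h>
#include <stdint.h>

/* Write a 64-bit comparand as two 32-bit stores (index 0 = low word,
 * index 1 = high word). The ordering keeps every intermediate comparand
 * from being smaller than the old or new value, so no spurious timer
 * interrupt is raised part-way through the update. */
static void write_mtimecmp_32(volatile uint32_t *mtimecmp, uint64_t value) {
    assert(mtimecmp);
    mtimecmp[0] = UINT32_MAX;               /* no smaller than old value */
    mtimecmp[1] = (uint32_t)(value >> 32);  /* no smaller than new value */
    mtimecmp[0] = (uint32_t)value;          /* new value */
}

/* Read a 64-bit counter with 32-bit loads. If the high half changed while
 * the low half was being read, the low half may have wrapped, so retry. */
static uint64_t read_mtime_32(const volatile uint32_t *mtime) {
    assert(mtime);
    uint32_t hi, lo;
    do {
        hi = mtime[1];
        lo = mtime[0];
    } while (mtime[1] != hi);
    return ((uint64_t)hi << 32) | lo;
}
```

On a platform that does permit naturally-aligned 64b accesses, a single 64-bit load or store replaces both helpers; these are only needed where the platform forces the 32b path.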