
Re: Proposal for accelerating nested virtualization on RISC-V

Anup Patel
 

Hi Jonathan,

 

Not all cases for CSR accesses have been thought through and written out yet (this is an initial draft). Regarding WARL CSRs with hardwired bits, the HW will always read/write the fixed values of the hardwired bits in memory.

 

I totally agree with you on the 2x overhead of the illegal instruction trap. The illegal instruction trap is not delegated to S-mode because M-mode (i.e. OpenSBI) emulates S-mode accesses to the TIME, HTIMEDELTA, CYCLE, INSTRET, and other counter CSRs. I am not sure we can get rid of illegal instruction trap handling in OpenSBI entirely, because quite a lot of HW out there does not implement the TIME CSR and other CSRs that are accessed from S-mode. Currently OpenSBI emulates the TIME CSR for HS-mode, U-mode, VS-mode and VU-mode. The nested virtualization acceleration will certainly help.

 

I forgot to mention that an implementation can choose not to implement nested virtualization acceleration and hardwire the HNESTED CSR to zero.

 

Regards,

Anup

 

 

From: Jonathan Behrens <behrensj@...>
Sent: 17 March 2020 18:58
To: Anup Patel <Anup.Patel@...>
Cc: tech-privileged@...
Subject: Re: [RISC-V] [tech-privileged] Proposal for accelerating nested virtualization on RISC-V

 

Your description of un-accelerated nested virtualization seems workable to me. I'm less sure of the proposal to avoid trapping on h<xyz> and vs<xyz> accesses. Aren't you going to run into issues with any WARL CSR that has hardwired bits?

 

I'd like to point out another performance pitfall with trap-and-emulate that I've mentioned before but might not be obvious from reading your proposal: the illegal instruction traps triggered by the guest trying to use hypervisor CSRs or run hypervisor instructions will not trap directly to HS-mode. Rather they will be routed to M-mode and then get forwarded to HS-mode, which has about two times higher overhead (forwarding a trap is at least as expensive as emulating most instructions). It is also quite avoidable by adding a bit to let M-mode delegate traps from legal but privileged instructions executed in U/VS/VU modes.

 

Jonathan

 

On Tue, Mar 17, 2020 at 6:40 AM Anup Patel via Lists.Riscv.Org <anup.patel=wdc.com@...> wrote:

A clarification is required in the RISC-V H-Extension spec regarding
the scope of the HSTATUS.VTVM bit. Currently, as per the spec, all
virtual memory management instructions (both SFENCEs and HFENCEs) will
trap to HS-mode when HSTATUS.VTVM == 1 and V == 1. Rather, only SFENCEs
need to be trapped to HS-mode when HSTATUS.VTVM == 1 and V == 1
because HFENCEs are only defined for HS-mode (i.e. V == 0).

To better describe nested virtualization, we define the following
dummy privilege modes:
Host HS-mode => HW HS-mode
  Host hypervisor kernel will run in this mode
  Software in this mode will actually run in HW HS-mode
Host U-mode => HW U-mode
  Host hypervisor user-space will run in this mode
  Software in this mode will actually run in HW U-mode
Guest HS-mode => HW VS-mode
  Guest hypervisor kernel will run in this mode
  Software in this mode will actually run in HW VS-mode
Guest U-mode => HW VU-mode
  Guest hypervisor user-space will run in this mode
  Software in this mode will actually run in HW VU-mode
Guest VS-mode => HW VS-mode
  Software in this mode will actually run in HW VS-mode
Guest VU-mode => HW VU-mode
  Software in this mode will actually run in HW VU-mode

A high-level software approach for nested virtualization in RISC-V
can be as follows:
1. The Host HS-mode (Host hypervisor) will enable HSTATUS.VTSR to
   emulate the SRET instruction for the Guest. This emulation will
   involve a CSR world-switch when switching between Guest HS/U-mode
   and Guest VS/VU-mode.
2. Virtual interrupts will be injected into Guest VS/VU-mode after
   doing the CSR world-switch (see point 1 above) from Guest HS/U-mode
   to Guest VS/VU-mode.
3. All accesses to "h<xyz>" and "vs<xyz>" from the Guest will trap to
   Host HS-mode (Host hypervisor) where:
   a) These CSRs will be emulated for Guest HS-mode
   b) For Guest U-mode and Guest VS/VU-mode, the trap will
      be forwarded to Guest HS-mode
4. The Host HS-mode (Host hypervisor) will manage two Stage2 page
   tables:
   a) Regular Stage2 page table for Guest HS/U-mode
   b) Shadow Stage2 page table for Guest VS/VU-mode. Of course,
      Host HS-mode (Host hypervisor) will have to do a software walk
      of the Guest HS-mode HGATP page table when populating mappings
      in the Shadow Stage2 page table, whose mappings are the
      combined effect of the Guest HS-mode HGATP page table and the
      Regular Stage2 page table.
5. All HFENCEs will trap to Host HS-mode where the Host HS-mode
   (Host hypervisor) will:
   a) Trap-n-emulate HFENCE.VVMA and HFENCE.GVMA for Guest HS-mode
   b) Redirect HFENCE.VVMA and HFENCE.GVMA traps from Guest VS-mode
      to Guest HS-mode irrespective of Guest HS-mode HSTATUS.VTVM
6. All HLV/HSV instructions from Guest HS/U-mode and Guest VS/VU-mode
   will trap to Host HS-mode (Host hypervisor) where:
   a) HLV/HSV instruction from Guest HS/U-mode will be emulated
      by Host HS-mode (Host hypervisor)
   b) HLV/HSV instruction from Guest VS/VU-mode will be forwarded
      to Guest HS-mode by Host HS-mode (Host hypervisor)
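
As an illustration only, the emulate-or-forward decisions in points 3, 5
and 6 above can be sketched as a small lookup. The names GuestMode,
TrapKind and host_action are hypothetical and not part of the proposal.
Cases the list does not mention (HFENCEs from Guest U-mode or Guest
VU-mode) are deliberately left undecided and return None.

```python
from enum import Enum, auto


class GuestMode(Enum):
    GUEST_HS = auto()
    GUEST_U = auto()
    GUEST_VS = auto()
    GUEST_VU = auto()


class TrapKind(Enum):
    HYP_CSR = auto()   # access to an "h<xyz>" or "vs<xyz>" CSR (point 3)
    HFENCE = auto()    # HFENCE.VVMA / HFENCE.GVMA (point 5)
    HLV_HSV = auto()   # HLV/HSV instruction (point 6)


EMULATE = "emulate in Host HS-mode"
FORWARD = "forward to Guest HS-mode"


def host_action(mode, kind):
    """Return what the Host hypervisor does with a trap from `mode`."""
    if kind is TrapKind.HFENCE and mode in (GuestMode.GUEST_U, GuestMode.GUEST_VU):
        return None  # not covered by the list above
    if kind is TrapKind.HLV_HSV:
        # Point 6: Guest HS/U-mode is emulated, Guest VS/VU-mode is forwarded.
        emulated = (GuestMode.GUEST_HS, GuestMode.GUEST_U)
        return EMULATE if mode in emulated else FORWARD
    # Points 3 and 5: only Guest HS-mode is emulated, the rest is forwarded.
    return EMULATE if mode is GuestMode.GUEST_HS else FORWARD
```

For example, host_action(GuestMode.GUEST_U, TrapKind.HLV_HSV) yields the
emulate action, while the same instruction from GuestMode.GUEST_VS is
forwarded.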

Please point out any case that is not considered in the above
high-level software approach for nested virtualization.

Based on the above high-level software approach, we propose a way to
accelerate nested virtualization by reducing "h<xyz>" and "vs<xyz>"
CSR access traps from VS-mode to HS-mode (point 3 above).

As per our proposal, "h<xyz>" and "vs<xyz>" CSR accesses from VS-mode
are converted into memory accesses relative to a nested context base
(or <nested_context_base>).

The enable bit (or <nested_enable>) for the CSR access conversion
described above and the <nested_context_base> can be specified via a
new HNESTED CSR.

<nested_enable> = HNESTED[0]
<nested_context_base> = HNESTED[XLEN-1:1] << (log2 (XLEN / 8))

Note: <nested_context_base> address is always machine word aligned
Note: <nested_enable> = 0 means "h<xyz>" and "vs<xyz>" trap to HS-mode
without any CSR accesses conversion
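
A minimal sketch of how the two HNESTED fields could be decoded,
assuming the base field occupies all bits above bit 0. The function name
decode_hnested is hypothetical. The proposal does not say what happens
to base bits shifted beyond XLEN, so this sketch does not truncate the
result.

```python
def decode_hnested(hnested, xlen=64):
    """Split an HNESTED value into (nested_enable, nested_context_base)."""
    if xlen not in (32, 64):
        raise ValueError("this sketch only handles XLEN of 32 or 64")
    # log2(XLEN / 8): 2 for RV32, 3 for RV64
    word_shift = (xlen // 8).bit_length() - 1
    nested_enable = hnested & 1
    # Drop the enable bit, then scale to a machine-word-aligned address.
    nested_context_base = (hnested >> 1) << word_shift
    return nested_enable, nested_context_base
```

With XLEN = 64, an HNESTED value of (0x1000 << 1) | 1 decodes to
<nested_enable> = 1 and <nested_context_base> = 0x8000, which is 8-byte
aligned as the first note above requires.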

Various "h<xyz>" and "vs<xyz>" CSRs are accessed at <csr_nested_offset>
relative to <nested_context_base> based on their CSR number as follows:

CSR number 0x2xx
<csr_nested_offset> = 0x0000 + ((CSR_number & 0xff) * (XLEN / 8))
CSR number 0x6xx
<csr_nested_offset> = 0x1000 + ((CSR_number & 0xff) * (XLEN / 8))
CSR number 0xAxx
<csr_nested_offset> = 0x2000 + ((CSR_number & 0xff) * (XLEN / 8))
CSR number 0xExx
<csr_nested_offset> = 0x3000 + ((CSR_number & 0xff) * (XLEN / 8))
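
The offset rules above can be written out as a short function. This is
a sketch only: the name csr_nested_offset mirrors the placeholder used
above, and rejecting CSR numbers outside the four listed ranges is an
assumption, since the proposal does not say how other CSRs are handled
here.

```python
# Region base for each CSR number range listed above, keyed by bits [11:8].
REGION_BASE = {0x2: 0x0000, 0x6: 0x1000, 0xA: 0x2000, 0xE: 0x3000}


def csr_nested_offset(csr_number, xlen=64):
    """Byte offset of a CSR relative to <nested_context_base>."""
    region = (csr_number >> 8) & 0xF
    if region not in REGION_BASE:
        raise ValueError(f"CSR {csr_number:#x} is not in a converted range")
    # One machine word (XLEN / 8 bytes) per CSR within its region.
    return REGION_BASE[region] + (csr_number & 0xFF) * (xlen // 8)
```

Using the hypervisor extension CSR numbers hstatus = 0x600 and
hgatp = 0x680 with XLEN = 64, the offsets are 0x1000 and 0x1400. The
largest offset within a region is 0xff * 8 = 0x7f8, so the 0x1000-byte
regions do not overlap.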

VS-mode accesses to some of the "h<xyz>" CSRs cannot be converted
into memory accesses due to the nature of these CSRs. These CSRs
include the HGEIP and HGEIE CSRs (any other CSRs ??).

Accesses to the HNESTED CSR (described above) from VS-mode are also
converted to memory accesses when <nested_enable> = 1 because the
HNESTED CSR can be safely emulated using nested acceleration.

Best Regards,
Anup Patel


Handling faults on new HLV/HSV instructions in Hypervisor Extension draft 0.6

Greg Favor
 

When one of the new HLV/HSV instructions faults, what virtualization and privilege modes are recorded in mstatus.mpp/mpv, or in sstatus.spp/spv and hstatus.spvp?  Are they based on the actual modes from within which the instruction executes (i.e. on HS or U, and V=0), or on the effective modes used by the instruction as it executes (i.e. on spvp and V=1)?

Assuming, for example, the trap is taken into HS-mode:

- If the actual modes apply, then hstatus.spvp remains unchanged and sstatus.spp/spv are set to reflect the actual privilege mode and V=0.  The hypervisor would then presumably figure out from htinst what caused this trap?   (In certain cases would the hypervisor need to save the original 'spp/spv' values before doing any HLV/HSV instructions so that it can restore them afterwards?)

- If the effective modes apply, then sstatus.spp and hstatus.spvp are set to the effective privilege mode of the HLV/HSV instruction (as specified by spvp) and sstatus.spv is set to reflect V=1.  The hypervisor would then figure out in some way (such as from htinst?) that this was a re-entry into the hypervisor due to its own actions?  (Typically all three of these fields would end up not changing in their values.  But in certain cases would the hypervisor need to save the original 'spp/spv' values before doing any HLV/HSV instructions so that it can restore them afterwards?)

In any case, which is the intended behavior (which should probably then be clarified in the spec)?

Thanks,
Greg


Re: Handling faults on new HLV/HSV instructions in Hypervisor Extension draft 0.6

John Hauser
 

Greg Favor wrote:
> When one of the new HLV/HSV instructions faults, what virtualization and
> privilege modes are recorded in mstatus.mpp/mpv, or in sstatus.spp/spv and
> hstatus.spvp? Are they based on the actual modes from within which the
> instruction executes (i.e. on HS or U, and V=0), or on the effective modes
> used by the instruction as it executes (i.e. on spvp and V=1).

The actual virtualization and privilege modes, same as always.

Consider the analogy with memory accesses made in M mode when
mstatus.MPRV = 1. The document says that such memory accesses occur
"as though the current privilege mode were set to MPP". If such a
memory access causes a trap, mstatus.MPP gets set to 3, the actual mode
at the time of the trap, not the "as-though" mode. As far as I know,
there's never been a question about this for MPRV.

Likewise, HLV and HSV are defined as performing memory accesses "as
though V = 1". Sounds the same to me.

I also think tables 5.6 and 5.7 in section 5.7.2, "Trap Entry", are
reasonably unambiguous on this point. Since HLV and HSV aren't said to
actually change the current virtualization or privilege modes, I feel
it's evident they don't affect what's written to SPV and SPP on a trap.

If instead the "effective modes applied", as you put it, then note
that SRET would no longer be sufficient to resume from a trap caused by
HLV/HSV. (Nor would MRET, if the trap is taken in M mode.)

FWIW, there's another clue hidden in this comment in section 5.2.1,
"Hypervisor Status Register (hstatus)":

    For memory faults, GVA is redundant with field SPV (the two bits
    are set the same) except when the explicit memory access of an HLV,
    HLVX, or HSV instruction causes a fault. In that case, SPV = 0 but
    GVA = 1.

Note, it says SPV gets set to 0, not 1.

> Assuming, for example, the trap is taken into HS-mode:
>
> - If the actual modes apply, then hstatus.spvp remains unchanged and
> sstatus.spp/spv are set to reflect the actual privilege mode and V=0. The
> hypervisor would then presumably figure out from htinst what caused this
> trap?

Yes. Bit GVA in hstatus might also be helpful.

> (In certain cases would the hypervisor need to save the original
> 'spp/spv' values before doing any HLV/HSV instructions so that it can
> restore them afterwards?)

It is generally the case, whenever nested traps might be taken in
HS mode, that the hypervisor may need to save sstatus and hstatus
before the nested trap could occur, and restore them afterward. That's
no different than when an operating system might trigger a nested
S-mode-handled trap (like a page fault) by a memory access executed
in S mode: the OS may need to save and restore sstatus around such
possibilities. The specific situation determines exactly what must be
saved and restored.

- John Hauser
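
To make the answer above concrete, here is a toy model of what a memory
fault taken into HS-mode records. It is a sketch built only from the
statements in this message (SPP/SPV hold the actual modes, and GVA = 1
with SPV = 0 for an HLV/HLVX/HSV access). The names TrapRecord and
hs_memory_fault_entry are hypothetical, and all other trap-entry state
is ignored.

```python
from dataclasses import dataclass


@dataclass
class TrapRecord:
    spp: int  # previous privilege mode: 0 = U, 1 = S
    spv: int  # previous virtualization mode V
    gva: int  # 1 if the faulting address is a guest virtual address


def hs_memory_fault_entry(actual_priv, actual_v, via_hlv_hsv):
    """Model a memory fault trapping to HS-mode.

    SPP and SPV always record the actual modes at the time of the trap.
    An HLV/HLVX/HSV access executes with V = 0 but uses a guest virtual
    address, which is the one case where GVA differs from SPV.
    """
    gva = 1 if (actual_v == 1 or via_hlv_hsv) else 0
    return TrapRecord(spp=actual_priv, spv=actual_v, gva=gva)
```

A fault on an HLV executed in HS-mode gives spp = 1, spv = 0, gva = 1,
whereas an ordinary guest fault from VS-mode gives spp = 1, spv = 1,
gva = 1.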


Re: Handling faults on new HLV/HSV instructions in Hypervisor Extension draft 0.6

Jonathan Behrens <behrensj@...>
 

Having SPP/SPV hold the real values makes the most sense to me. The strategy I'd expect hypervisors to use would be to set a bit before issuing any HLV or HSV instructions and clear it after. Then in their page fault handler they'd check if it is set in order to "blame" that fault on the guest and take appropriate action instead of resuming normal execution.

Jonathan
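
A rough sketch of the flag strategy described above, modelled in Python
with an exception standing in for the hardware page-fault trap. All
names here (PageFault, HypervisorModel, in_guest_access) are
hypothetical, and a real hypervisor would keep the flag in per-hart
state rather than on an object.

```python
class PageFault(Exception):
    """Stands in for a page-fault trap taken in HS-mode."""


class HypervisorModel:
    def __init__(self):
        self.in_guest_access = False  # set around HLV/HSV instructions
        self.log = []

    def page_fault_handler(self):
        # Blame the guest only if the fault happened during an HLV/HSV.
        if self.in_guest_access:
            self.log.append("blame guest")
        else:
            self.log.append("hypervisor fault")

    def run(self, op):
        """Run an operation, dispatching any fault to the handler."""
        try:
            return op()
        except PageFault:
            self.page_fault_handler()
            return None

    def guest_access(self, hlv_or_hsv):
        """Set the flag, perform the access, and always clear the flag."""
        self.in_guest_access = True
        try:
            return self.run(hlv_or_hsv)
        finally:
            self.in_guest_access = False
```

The flag is cleared in a finally clause so that a fault during the
access cannot leave it set for later, unrelated hypervisor faults.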






Re: Handling faults on new HLV/HSV instructions in Hypervisor Extension draft 0.6

Greg Favor
 

John, thanks for the full responses.  I had suspected the former.  But as can sometimes be the case, we were looking at certain parts of the spec and weren't looking at the tables in section 5.7.2.

Given that the general style of the arch spec is to not do heavy cross-referencing, I won't suggest that.  And ultimately it was our own fault in not broadly searching for and noticing those tables while in the heat of the moment of answering a question about "hstatus" section 5.2.1 that was raised by one of our designers.

Thanks,
Greg






Re: Handling faults on new HLV/HSV instructions in Hypervisor Extension draft 0.6

Anup Patel
 

-----Original Message-----
From: tech-privileged@... <tech-privileged@...> On
Behalf Of John Hauser
Sent: 14 April 2020 08:17
To: tech-privileged@...
Subject: Re: [RISC-V] [tech-privileged] Handling faults on new HLV/HSV
instructions in Hypervisor Extension draft 0.6

> Greg Favor wrote:
> > When one of the new HLV/HSV instructions faults, what virtualization
> > and privilege modes are recorded in mstatus.mpp/mpv, or in
> > sstatus.spp/spv and hstatus.spvp? Are they based on the actual modes
> > from within which the instruction executes (i.e. on HS or U, and V=0),
> > or on the effective modes used by the instruction as it executes (i.e. on
> > spvp and V=1).
>
> The actual virtualization and privilege modes, same as always.
>
> Consider the analogy with memory accesses made in M mode when
> mstatus.MPRV = 1. The document says that such memory accesses occur "as
> though the current privilege mode were set to MPP". If such a memory
> access causes a trap, mstatus.MPP gets set to 3, the actual mode at the time
> of the trap, not the "as-though" mode. As far as I know, there's never been
> a question about this for MPRV.
>
> Likewise, HLV and HSV are defined as performing memory accesses "as
> though V = 1". Sounds the same to me.
>
> I also think tables 5.6 and 5.7 in section 5.7.2, "Trap Entry", are reasonably
> unambiguous on this point. Since HLV and HSV aren't said to actually change
> the current virtualization or privilege modes, I feel it's evident they don't
> affect what's written to SPV and SPP on a trap.
>
> If instead the "effective modes applied", as you put it, then note that SRET
> would no longer be sufficient to resume from a trap caused by HLV/HSV.
> (Nor would MRET, if the trap is taken in M mode.)

This is our understanding as well.

SRET usage will certainly break for hypervisors if SSTATUS.SPP and
HSTATUS.SPV don't reflect the mode in which the trap happened.

> FWIW, there's another clue hidden in this comment in section 5.2.1,
> "Hypervisor Status Register (hstatus)":
>
>     For memory faults, GVA is redundant with field SPV (the two bits
>     are set the same) except when the explicit memory access of an HLV,
>     HLVX, or HSV instruction causes a fault. In that case, SPV = 0 but
>     GVA = 1.
>
> Note, it says SPV gets set to 0, not 1.
>
> > Assuming, for example, the trap is taken into HS-mode:
> >
> > - If the actual modes apply, then hstatus.spvp remains unchanged and
> > sstatus.spp/spv are set to reflect the actual privilege mode and V=0.
> > The hypervisor would then presumably figure out from htinst what
> > caused this trap?
>
> Yes. Bit GVA in hstatus might also be helpful.
>
> > (In certain cases would the hypervisor need to save the original
> > 'spp/spv' values before doing any HLV/HSV instructions so that it can
> > restore them afterwards?)
>
> It is generally the case, whenever nested traps might be taken in HS mode,
> that the hypervisor may need to save sstatus and hstatus before the nested
> trap could occur, and restore them afterward. That's no different than when
> an operating system might trigger a nested S-mode-handled trap (like a page
> fault) by a memory access executed in S mode: the OS may need to save and
> restore sstatus around such possibilities. The specific situation determines
> exactly what must be saved and restored.

Yes, both Xvisor RISC-V and KVM RISC-V save SSTATUS and HSTATUS
in the low-level trap entry path.

Regards,
Anup


32-bit accesses to mtime/mtimecmp under RV64

Greg Favor
 

The mtime and mtimecmp registers are defined as 64-bit memory-mapped registers.  The priv spec says that - in RV32 - mtimecmp can be written as a pair of 32-bit registers.  Since this was made specific to RV32, is there an intended implication in the spec that in RV64 the system must support atomic 64-bit accesses to these registers?  Or is it allowable for only non-atomic 64-bit accesses to be supported (i.e. a 64-bit access by a CPU is performed as two 32-bit accesses out in the SoC where mtime/mtimecmp are located)?

Put differently, must RV64 software not assume that a 64-bit load/store will atomically read/write the register?  (Note: ARMv8 explicitly says software must not make such an atomicity assumption for accesses to memory-mapped 64-bit registers.)

Greg
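
For context on the RV32 case mentioned above, the privileged spec's
example writes mtimecmp with three 32-bit stores: all ones to the low
word, then the new high word, then the new low word. The following is a
small simulation of that ordering, not real MMIO code; the function name
rv32_mtimecmp_write_states is hypothetical, and the sequence is
reproduced here from memory of the spec's example rather than quoted
from it.

```python
MASK32 = 0xFFFFFFFF


def rv32_mtimecmp_write_states(old, new):
    """Return the 64-bit mtimecmp value after each of the three stores."""
    lo, hi = old & MASK32, (old >> 32) & MASK32
    states = []
    lo = MASK32                  # store -1 to the low word
    states.append((hi << 32) | lo)
    hi = (new >> 32) & MASK32    # store the new high word
    states.append((hi << 32) | lo)
    lo = new & MASK32            # store the new low word
    states.append((hi << 32) | lo)
    return states
```

In this model no intermediate value is smaller than both the old and the
new comparand, so the comparator never sees a spuriously early deadline.
Writing the low word first would not have that property: going from
0x1_FFFF_FFF0 to 0x2_0000_0010 would briefly expose 0x1_0000_0010.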


Re: 32-bit accesses to mtime/mtimecmp under RV64

andrew@...
 



On Fri, Apr 17, 2020 at 7:00 PM Greg Favor <gfavor@...> wrote:
> The mtime and mtimecmp registers are defined as 64-bit memory-mapped registers.  The priv spec says that - in RV32 - mtimecmp can be written as a pair of 32-bit registers.  Since this was made specific to RV32, is there an intended implication in the spec that in RV64 the system must support atomic 64-bit accesses to these registers?  Or is it allowable for only non-atomic 64-bit accesses to be supported (i.e. a 64-bit access by a CPU is performed as two 32-bit accesses out in the SoC where mtime/mtimecmp are located)?

The spec strongly implies by omission that 64-bit accesses are atomic for RV64, in that it gives an unusually detailed RV32-specific code example to cope with non-atomicity, but mentions nothing of the sort for RV64.  I will add the additional sentence that makes this implication explicit.

> Put differently, must RV64 software not assume that a 64-bit load/store will atomically read/write the register?  (Note: ARMv8 explicitly says software must not make such an atomicity assumption for accesses to memory-mapped 64-bit registers.)

In general, this depends on the peripheral and the platform.  We aren't trying to preclude interfacing with legacy devices and buses, so of course some 64-bit accesses to some devices will either become non-atomic or signal some sort of error.  But it's really quite useful to be able to assume that 64-bit accesses are atomic when interfacing with more modern peripherals that use 64-bit addresses, so we definitely do not want to preclude that, either.


Re: 32-bit accesses to mtime/mtimecmp under RV64

David Kruckemyer
 



Asking this slightly differently (I think) to clarify....

With respect to mtime/mtimecmp, does an RV64 processor place constraints on the platform, or can the platform place constraints on the RV64 processor? If the former, the implication is that the platform must provide a way for the RV64 processors to access the registers atomically with a 64b load or store. If the latter, the implication is that the platform can require the RV64 processor to access the registers non-atomically with 32b loads or stores, a la RV32.

Cheers,
David

 




Re: 32-bit accesses to mtime/mtimecmp under RV64

andrew@...
 



The second half of my answer was addressing the more general matter. For mtime and mtimecmp specifically, the spec is now clear: 




Re: 32-bit accesses to mtime/mtimecmp under RV64

David Kruckemyer
 



So the only constraint is that when a 64b naturally-aligned access is made to mtime/mtimecmp, the access must be completed atomically if the platform allows 64b naturally-aligned accesses to those registers? A platform is still allowed to signal an error on such accesses and to force an RV64 processor to access those registers with 32b loads and stores, right?

Cheers,
David

 

Cheers,
David

 


Greg


Re: 32-bit accesses to mtime/mtimecmp under RV64

andrew@...
 



On Mon, Apr 20, 2020 at 3:28 PM David Kruckemyer <dkruckemyer@...> wrote:


On Mon, Apr 20, 2020 at 2:38 PM Andrew Waterman <andrew@...> wrote:


On Mon, Apr 20, 2020 at 11:32 AM David Kruckemyer <dkruckemyer@...> wrote:


On Fri, Apr 17, 2020 at 7:31 PM Andrew Waterman <andrew@...> wrote:


On Fri, Apr 17, 2020 at 7:00 PM Greg Favor <gfavor@...> wrote:
The mtime and mtimecmp registers are defined as 64-bit memory-mapped registers.  The priv spec says that - in RV32 - mtimecmp can be written as a pair of 32-bit registers.  Since this was made specific to RV32, is there an intended implication in the spec that in RV64 the system must support atomic 64-bit accesses to these registers?  Or is it allowable for only non-atomic 64-bit accesses to be supported (i.e. a 64-bit access by a CPU is performed as two 32-bit accesses out in the SoC where mtime/mtimecmp are located)?

The spec strongly implies by omission that 64-bit accesses are atomic for RV64, in that it gives an unusually detailed RV32-specific code example to cope with non-atomicity, but mentions nothing of the sort for RV64.  I will add the additional sentence that makes this implication explicit.


Put differently, must RV64 software not assume that a 64-bit load/store will atomically read/write the register.  (Note: ARMv8 explicitly says software must not make such an atomicity assumption for accesses to memory-mapped 64-bit registers.)

In general, this depends on the peripheral and the platform.  We aren't trying to preclude interfacing with legacy devices and buses, so of course some 64-bit accesses to some devices will either become non-atomic or signal some sort of error.  But it's really quite useful to be able to assume that 64-bit accesses are atomic when interfacing with more modern peripherals that use 64-bit addresses, so we definitely do not want to preclude that, either.

Asking this slightly differently (I think) to clarify....

With respect to mtime/mtimecmp, does an RV64 processor place constraints on the platform, or can the platform place constraints on the RV64 processor? If the former, the implication is that the platform must provide a way for the RV64 processors to access the registers atomically with a 64b load or store. If the latter, the implication is that the platform can require the RV64 processor to access the registers non-atomically with 32b loads or stores, a la RV32.

The second half of my answer was addressing the more general matter. For mtime and mtimecmp specifically, the spec is now clear: 


So the only constraint is that when a 64b naturally-aligned access is made to mtime/mtimecmp, the access must be completed atomically if the platform allows 64b naturally-aligned accesses to those registers? A platform is still allowed to signal an error on such accesses and to force an RV64 processor to access those registers with 32b loads and stores, right?

I think your interpretation of that sentence is accurate. FWIW, the insufficiently described Linux platform does assume such accesses are legal (more precisely, the various SBI implementations make that assumption).
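
For readers wondering what being forced onto 32b accesses would look like in software, here is a minimal C sketch. It assumes a little-endian layout with the low word at the lower address; `timer_reg32_t`, `read_timer_32`, `write_mtimecmp_32` and `timer32_demo` are hypothetical names, not taken from any SBI implementation. The store ordering follows the RV32 mtimecmp sequence given in the priv spec (low half to all ones, then the high half, then the low half); the hi/lo/hi read loop is a common idiom and is not something this thread prescribes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of a 64-bit timer register as two 32-bit halves
 * (assumption: little-endian, low word at the lower address). */
typedef struct {
    volatile uint32_t lo;
    volatile uint32_t hi;
} timer_reg32_t;

/* Read a 64-bit counter with 32-bit loads. The high half is re-read until
 * it is stable, so a carry from lo into hi between the two loads cannot
 * produce a torn value. */
static uint64_t read_timer_32(const timer_reg32_t *reg)
{
    uint32_t hi, lo, hi_again;
    do {
        hi = reg->hi;
        lo = reg->lo;
        hi_again = reg->hi;
    } while (hi != hi_again);
    return ((uint64_t)hi << 32) | lo;
}

/* Write a 64-bit compare value with 32-bit stores. Setting the low half to
 * all ones first keeps every intermediate value no smaller than the old or
 * the new value, so no spurious early timer interrupt is generated. */
static void write_mtimecmp_32(timer_reg32_t *reg, uint64_t value)
{
    reg->lo = UINT32_MAX;               /* no smaller than old value */
    reg->hi = (uint32_t)(value >> 32);  /* no smaller than new value */
    reg->lo = (uint32_t)value;          /* new value */
}

/* Small self-check against a plain in-memory stand-in for the register. */
static void timer32_demo(void)
{
    timer_reg32_t reg = { 0u, 0u };
    write_mtimecmp_32(&reg, 0x0000000100000002ULL);
    assert(read_timer_32(&reg) == 0x0000000100000002ULL);
}
```

On a platform that does allow naturally-aligned 64b accesses to these registers, both helpers would collapse to a single 64-bit load or store.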


Cheers,
David

 

Cheers,
David

 


Greg


Re: 32-bit accesses to mtime/mtimecmp under RV64

Mark Hill
 

 

To widen the question slightly further, are there any plans to provide atomic load/store pair operations (128 bits for RV64, 64 bits for RV32)?

 

 

From: tech-privileged@... [mailto:tech-privileged@...] On Behalf Of Andrew Waterman
Sent: 20 April 2020 23:49
To: David Kruckemyer <dkruckemyer@...>
Cc: Greg Favor <gfavor@...>; tech-privileged@...
Subject: Re: [RISC-V] [tech-privileged] 32-bit accesses to mtime/mtimecmp under RV64

 

 

 



Re: 32-bit accesses to mtime/mtimecmp under RV64

Allen Baum
 

more mtimecmp questions:
 - the spec says that an interrupt is posted when the mtime register contains a value greater than or equal to the value in the mtimecmp register, but doesn't specify that it is *unsigned* greater than or equal.
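
To illustrate why the signed/unsigned distinction matters, here is a small C sketch comparing the two readings of "greater than or equal". The function names are hypothetical, and the all-ones mtimecmp value is only an example of a setting that software often uses to push the next timer interrupt far into the future; nothing here is quoted from the spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Timer condition under an unsigned comparison. */
static bool timer_pending_unsigned(uint64_t mtime, uint64_t mtimecmp)
{
    return mtime >= mtimecmp;
}

/* The same condition if the registers were compared as signed values.
 * Note: converting an out-of-range uint64_t to int64_t is
 * implementation-defined in C; mainstream compilers wrap (two's complement). */
static bool timer_pending_signed(uint64_t mtime, uint64_t mtimecmp)
{
    return (int64_t)mtime >= (int64_t)mtimecmp;
}

static void timer_compare_demo(void)
{
    /* Unsigned: all ones is the largest value, so nothing is pending. */
    assert(!timer_pending_unsigned(1000u, UINT64_MAX));
    /* Signed: all ones reads as -1, so the same state looks expired. */
    assert(timer_pending_signed(1000u, UINT64_MAX));
}
```

The two readings disagree whenever exactly one of the registers has its top bit set, which is why the comparison type is worth stating explicitly.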

On Mon, Apr 20, 2020 at 3:48 PM Andrew Waterman <andrew@...> wrote:




Re: 32-bit accesses to mtime/mtimecmp under RV64

striker@...
 

 
To widen your question even further, Mark (no pun intended), do we need 256 bits for RV128?
 
Yes, RV128 is a bit speculative, but it does at least rate being in the book, so best to have all the consequences of the request here on the table. 
 
Also, I'm curious what you intend to use the bigger ones for?
 
The only answer here I know of is emulating CAS with the ticket/epoch/whatever counter next to the actual data element to solve CAS A-B-A problems (which, handily, LR/SC naturally avoids anyway).
 
Is that the one you're after? (Asking because if there's another reason beyond that one, I'm interested in hearing about it). 
 
Derek Williams
 
 

----- Original message -----
From: "Dr Mark Hill" <mark.hill@...>
Sent by: tech-privileged@...
To: Andrew Waterman <andrew@...>, David Kruckemyer <dkruckemyer@...>
Cc: Greg Favor <gfavor@...>, "tech-privileged@..." <tech-privileged@...>
Subject: [EXTERNAL] Re: [RISC-V] [tech-privileged] 32-bit accesses to mtime/mtimecmp under RV64
Date: Tue, Apr 21, 2020 1:55 AM
 

 

To widen the question slightly further are there any plans to provide atomic load/store pair operations (128-bits for RV64, 64-bits for RV32)?

 

 



Re: 32-bit accesses to mtime/mtimecmp under RV64

andrew@...
 



On Tue, Apr 21, 2020 at 12:24 AM Allen Baum <allen.baum@...> wrote:
more mtimecmp questions:
 - the spec says that an interrupt is posted when the mtime register contains a value greater than or equal to the value in the mtimecmp register, but doesn't specify that it is *unsigned* greater than or equal.

 



Re: 32-bit accesses to mtime/mtimecmp under RV64

Mark Hill
 

Another possible use case is access-sensitive devices, for example a FIFO of 128-bit records with multiple RV64 harts reading from the FIFO.

 

From: tech-privileged@... [mailto:tech-privileged@...] On Behalf Of striker@...
Sent: 22 April 2020 06:21
To: Mark Hill <mark.hill@...>
Cc: andrew@...; dkruckemyer@...; gfavor@...; tech-privileged@...
Subject: Re: [RISC-V] [tech-privileged] 32-bit accesses to mtime/mtimecmp under RV64

 

 



Re: 32-bit accesses to mtime/mtimecmp under RV64

striker@...
 

 
Interesting. If you had a H/W FIFO, it seems like it would be easier to make it work with single-copy atomic loads or stores to read from or write to the FIFO, rather than bothering with the tedium of LR/SC pairs?
 
Yes, you can have multiple HARTs going after the "device" concurrently, but the single-copy atomicity of the loads or stores would seem to keep those accesses to the device separated, whereas LR/SC is more for doing an atomic RMW of memory.
 
I'm having trouble seeing how LR/SC would fit there? 
 
Also, I'll assume you really do intend to use the "double-wide" LR/SC for the CAS emulation?
 
Aside from whatever this FIFO example might turn out to be? 
 
Derek 
 

----- Original message -----
From: Mark Hill <mark.hill@...>
To: "striker@..." <striker@...>
Cc: "andrew@..." <andrew@...>, "dkruckemyer@..." <dkruckemyer@...>, "gfavor@..." <gfavor@...>, "tech-privileged@..." <tech-privileged@...>
Subject: [EXTERNAL] RE: [RISC-V] [tech-privileged] 32-bit accesses to mtime/mtimecmp under RV64
Date: Wed, Apr 22, 2020 4:36 AM
 

Another possible use case is access sensitive devices, for example a FIFO of 128-bit records with multiple RV64 harts reading from the FIFO.

 



Re: 32-bit accesses to mtime/mtimecmp under RV64

Andy Glew Si5
 

I assumed that Dr Mark Hill was talking about 256-bit atomic loads and stores to access the FIFO, not LR/SC.

 

Also, double-width CAS (and other double-width atomics) is used not just for A-B-A problems, but also for things like atomically inserting into circular lists (e.g. where the list head itself holds pointers to both the first and the last elements of a singly linked circle).

 

In general, if your word or address width is W:

For atomic read-modify-writes:

 - You need W+V bits for A-B-A problems, where V is whatever number of bits you need for versions or epochs.
 - You need 2W bits for list heads.

Of course, 2W subsumes W+V, so we often don't make the distinction.

The other big user of extra-width atomic RMWs is page tables, e.g. 32-bit virtual addresses with 40-bit physical addresses (stored in 64-bit PTEs).

Atomic loads and stores that are not read-modify-writes are useful at nearly any width (W, 2W, 4W) for active memory devices like FIFOs.
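
As a rough illustration of the W+V case for A-B-A, here is a C11 sketch that packs a 32-bit value (W = 32, standing in for an index or pointer) and a 32-bit version (V = 32) into one 64-bit word, so that an ordinary 64-bit compare-and-swap covers both fields. All names (`pack`, `value_of`, `version_of`, `versioned_update`, `aba_demo`) are hypothetical, and the field widths are chosen only so the example runs with standard `<stdatomic.h>`; the 2W-for-full-width-pointers case is exactly the one that would need the wider atomics discussed in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack a 32-bit value and a 32-bit version counter into one 64-bit word. */
static uint64_t pack(uint32_t value, uint32_t version)
{
    return ((uint64_t)version << 32) | value;
}

static uint32_t value_of(uint64_t word)   { return (uint32_t)word; }
static uint32_t version_of(uint64_t word) { return (uint32_t)(word >> 32); }

/* Install new_value only if the caller's snapshot (value AND version) is
 * still current. Every successful update bumps the version, so a value that
 * went A -> B -> A no longer matches an old snapshot of A. */
static bool versioned_update(_Atomic uint64_t *slot, uint64_t snapshot,
                             uint32_t new_value)
{
    uint64_t desired = pack(new_value, version_of(snapshot) + 1u);
    return atomic_compare_exchange_strong(slot, &snapshot, desired);
}

static void aba_demo(void)
{
    _Atomic uint64_t slot;
    atomic_init(&slot, pack(0xAu, 0u));

    uint64_t stale = atomic_load(&slot);   /* observer sees A, version 0 */

    bool ok = versioned_update(&slot, atomic_load(&slot), 0xBu);  /* A -> B */
    assert(ok);
    ok = versioned_update(&slot, atomic_load(&slot), 0xAu);       /* B -> A */
    assert(ok);

    /* The value alone looks unchanged, which is the A-B-A hazard. */
    assert(value_of(atomic_load(&slot)) == value_of(stale));

    /* The version differs, so the stale snapshot is rejected. */
    ok = versioned_update(&slot, stale, 0xCu);
    assert(!ok);
    assert(version_of(atomic_load(&slot)) == 2u);
}
```

A plain CAS on the 32-bit value alone would have accepted the stale snapshot in the last step; the version field is what turns that into a failure.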

 

 

From: tech-privileged@... <tech-privileged@...> On Behalf Of striker@...
Sent: Wednesday, April 22, 2020 18:42
To: mark.hill@...
Cc: andrew@...; dkruckemyer@...; gfavor@...; tech-privileged@...
Subject: Re: [RISC-V] [tech-privileged] 32-bit accesses to mtime/mtimecmp under RV64

 

 

Interesting. If you had a H/W FIFO, seems like it would be easier to make it work with  single-copy atomic loads or stores to read from or write to the FIFO rather than bothering with the tedium of LR/SC pairs?

 

Yes, you can have multiple HARTs going after the "device" concurrently, but the single-copy atomicity of the loads or stores would seem to keep those accesses to the device separated rather than LR/SC which is more to do an atomic RMW of memory. 

 

I'm having trouble seeing how LR/SC would fit there? 

 

Also, I'll assume you really do intend to use the "double-wide" LR/SC for the CAS emulation?

 

Aside from whatever this FIFO example might turn out to be? 

 

Derek 

 

----- Original message -----
From: Mark Hill <mark.hill@...>
To: "striker@..." <striker@...>
Cc: "andrew@..." <andrew@...>, "dkruckemyer@..." <dkruckemyer@...>, "gfavor@..." <gfavor@...>, "tech-privileged@..." <tech-privileged@...>
Subject: [EXTERNAL] RE: [RISC-V] [tech-privileged] 32-bit accesses to mtime/mtimecmp under RV64
Date: Wed, Apr 22, 2020 4:36 AM
 

Another possible use case is access-sensitive devices, for example a FIFO of 128-bit records with multiple RV64 harts reading from the FIFO.

 

From: tech-privileged@... [mailto:tech-privileged@...] On Behalf Of striker@...
Sent: 22 April 2020 06:21
To: Mark Hill <mark.hill@...>
Cc: andrew@...; dkruckemyer@...; gfavor@...; tech-privileged@...
Subject: Re: [RISC-V] [tech-privileged] 32-bit accesses to mtime/mtimecmp under RV64

 

 

To widen your question even further Mark (no pun intended), do we need 256 bits for RV128?

 

Yes, RV128 is a bit speculative, but it does at least rate being in the book, so best to have all the consequences of the request here on the table. 

 

Also, I'm curious what you intend to use the bigger ones for?

 

The only answer here I know of is emulating CAS with the ticket/epoch/whatever counter next to the actual data element to solve CAS A-B-A problems (which, handily, LR/SC naturally avoids anyway).

 

Is that the one you're after? (Asking because if there's another reason beyond that one, I'm interested in hearing about it). 

 

Derek Williams

 

 

----- Original message -----
From: "Dr Mark Hill" <mark.hill@...>
Sent by: tech-privileged@...
To: Andrew Waterman <andrew@...>, David Kruckemyer <dkruckemyer@...>
Cc: Greg Favor <gfavor@...>, "tech-privileged@..." <tech-privileged@...>
Subject: [EXTERNAL] Re: [RISC-V] [tech-privileged] 32-bit accesses to mtime/mtimecmp under RV64
Date: Tue, Apr 21, 2020 1:55 AM
 

 

To widen the question slightly further are there any plans to provide atomic load/store pair operations (128-bits for RV64, 64-bits for RV32)?

 

 

From: tech-privileged@... [mailto:tech-privileged@...] On Behalf Of Andrew Waterman
Sent: 20 April 2020 23:49
To: David Kruckemyer <dkruckemyer@...>
Cc: Greg Favor <gfavor@...>; tech-privileged@...
Subject: Re: [RISC-V] [tech-privileged] 32-bit accesses to mtime/mtimecmp under RV64

 

 

 

On Mon, Apr 20, 2020 at 3:28 PM David Kruckemyer <dkruckemyer@...> wrote:

 

 

On Mon, Apr 20, 2020 at 2:38 PM Andrew Waterman <andrew@...> wrote:

 

 

On Mon, Apr 20, 2020 at 11:32 AM David Kruckemyer <dkruckemyer@...> wrote:

 

 

On Fri, Apr 17, 2020 at 7:31 PM Andrew Waterman <andrew@...> wrote:

 

 

On Fri, Apr 17, 2020 at 7:00 PM Greg Favor <gfavor@...> wrote:

The mtime and mtimecmp registers are defined as 64-bit memory-mapped registers.  The priv spec says that - in RV32 - mtimecmp can be written as a pair of 32-bit registers.  Since this was made specific to RV32, is there an intended implication in the spec that in RV64 the system must support atomic 64-bit accesses to these registers?  Or is it allowable for only non-atomic 64-bit accesses to be supported (i.e. a 64-bit access by a CPU is performed as two 32-bit accesses out in the SoC where mtime/mtimecmp are located)?

 

The spec strongly implies by omission that 64-bit accesses are atomic for RV64, in that it gives an unusually detailed RV32-specific code example to cope with non-atomicity, but mentions nothing of the sort for RV64.  I will add the additional sentence that makes this implication explicit.
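
For reference, a C sketch of the three-store ordering that the RV32 example in the priv spec relies on, as I understand it. The function name mtimecmp_write32 and the pointer argument are hypothetical; a real platform supplies its own mtimecmp address, and the low word is assumed to sit at the lower address.

```c
#include <assert.h>
#include <stdint.h>

/* Write a 64-bit comparand using only 32-bit stores.
 * cmp[0] is the low word, cmp[1] the high word (assumed layout). */
static void mtimecmp_write32(volatile uint32_t *cmp, uint64_t value) {
    assert(cmp != 0);
    cmp[0] = UINT32_MAX;               /* comparand is now no smaller than the old value */
    cmp[1] = (uint32_t)(value >> 32);  /* comparand is now no smaller than the new value */
    cmp[0] = (uint32_t)value;          /* comparand is now exactly the new value */
}
```

The point of the ordering is that the intermediate comparand never drops below both the old and the new value, so no spurious timer interrupt is raised while the write is half done. With an atomic 64-bit store on RV64, none of this is needed.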

 

 

Put differently, must RV64 software not assume that a 64-bit load/store will atomically read/write the register?  (Note: ARMv8 explicitly says software must not make such an atomicity assumption for accesses to memory-mapped 64-bit registers.)

 

In general, this depends on the peripheral and the platform.  We aren't trying to preclude interfacing with legacy devices and buses, so of course some 64-bit accesses to some devices will either become non-atomic or signal some sort of error.  But it's really quite useful to be able to assume that 64-bit accesses are atomic when interfacing with more modern peripherals that use 64-bit addresses, so we definitely do not want to preclude that, either.

 

Asking this slightly differently (I think) to clarify....

 

With respect to mtime/mtimecmp, does an RV64 processor place constraints on the platform, or can the platform place constraints on the RV64 processor? If the former, the implication is that the platform must provide a way for the RV64 processors to access the registers atomically with a 64b load or store. If the latter, the implication is that the platform can require the RV64 processor to access the registers non-atomically with 32b loads or stores, a la RV32.

 

The second half of my answer was addressing the more general matter. For mtime and mtimecmp specifically, the spec is now clear: 

 

 

So the only constraint is that when a 64b naturally-aligned access is made to mtime/mtimecmp, the access must be completed atomically if the platform allows 64b naturally-aligned accesses to those registers? A platform is still allowed to signal an error on such accesses and to force an RV64 processor to access those registers with 32b loads and stores, right?

 

I think your interpretation of that sentence is accurate. FWIW, the insufficiently described Linux platform does assume such accesses are legal (more precisely, the various SBI implementations make that assumption).
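
If a platform did force 32b accesses, software would have to guard against a torn read of mtime. A hedged sketch of the usual high/low/high retry loop follows; mtime_read32 and its pointer argument are hypothetical names, and the low word is assumed to sit at the lower address.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 64-bit counter using only 32-bit loads.
 * mtime[0] is the low word, mtime[1] the high word (assumed layout). */
static uint64_t mtime_read32(const volatile uint32_t *mtime) {
    assert(mtime != 0);
    uint32_t hi, lo;
    do {
        hi = mtime[1];
        lo = mtime[0];
        /* If the low word carried into the high word between the two
         * loads, the high word has changed and the pair is discarded. */
    } while (mtime[1] != hi);
    return ((uint64_t)hi << 32) | lo;
}
```

A single atomic 64-bit load makes the loop unnecessary, which is why the atomicity guarantee matters to RV64 software.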

 

 

Cheers,

David

 

 

 

Cheers,

David

 

 

 

 

Greg

 

 

 

 

 


Re: 32-bit accesses to mtime/mtimecmp under RV64

striker@...
 

 
Ah... yeah, OK. That "atomic" (single-copy atomicity) vs. the "atomic" (LR/SC pair).
 
My bad. Apologies, Mark (assuming Andy is right and you meant load and store instructions that are single-copy atomic), for the needless side trip into LR/SC.
 
Andy, thanks for the interesting point about wider LR/SC. I have some questions, but I won't bother everyone else with that... I'll just get you tomorrow. 

Derek 
 
 

----- Original message -----
From: "Andy Glew Si5" <andy.glew@...>
Sent by: tech-privileged@...
To: striker@..., mark.hill@...
Cc: Andrew Waterman <andrew@...>, dkruckemyer@..., gfavor@..., tech-privileged@...
Subject: [EXTERNAL] Re: [RISC-V] [tech-privileged] 32-bit accesses to mtime/mtimecmp under RV64
Date: Wed, Apr 22, 2020 9:08 PM
 
