On Sun, Mar 8, 2020 at 1:42 AM Krste Asanovic <krste@...> wrote:
It doesn't look like all these issues were addressed either on the
mailing list or on the GitHub tracker. In general, it is much better to
split feedback into individual topics. I'm responding all-in-one below,
but if you want to continue any of these threads, please add them to the
GitHub tracker as independent issues.
Several of these issues might be addressed with anticipated longer
64-bit vector instructions (ILEN=64).
On Fri, 22 Nov 2019 16:17:33 +0100, Dr. Adrián Cristal <adrian.cristal@...> said:
| Dear all,
| We have been involved in RTL design of an accelerator featuring the 0.7.1 standard; in parallel, we were running some vectorized HPC kernels on
| a simulator platform which mimics the accelerator. We have come across the following:
| 1. With respect to the newly proposed spill instructions, the cost could be high when we need to spill only a few elements of a large vector.
| Instead we propose the following: spill v1, rs1, [rs2], recoverspill v2, rs2 and unspill rs1.
| The semantics are the following: spill v1, rs1, [rs2] stores up to rs2 elements of v1 at address rs1. There is no guarantee that the
| values are actually stored, but there is a guarantee that if they are not stored they will be completely recovered by recoverspill v2, rs2
| (otherwise recoverspill v2, rs2 will read the values from memory, if they were at some point written to memory). The unspill rs1 disables the
| ability to recoverspill at address rs1. In an OoO processor, spill can delay freeing the vector physical
| register, and recoverspill can assign it to the logical register again. The cost is then much lower, but the register can still be saved to
| memory if the processor needs more physical registers. In-order implementations will save the registers to memory.
On a context switch, the underlying physical registers that hold spill
values need to be saved/restored to memory, as well as the associated
rs1 values. This means we need extra instructions to iterate through
the physical registers and their associated rs1 values, and to save
them as part of the context switch. Not impossible, but it is more
complex than just adding the 3 proposed instructions. Alternatively, you
could just force all tracked registers to be spilled to memory, but then
you run into the delayed memory page faults, etc., brought up by Krste.
Instead, at the system level you could have a tightly-coupled memory
(TCM) as an addressable scratchpad where vector registers get written
during a spill? (This TCM could be used for any purpose.)
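
To make the trade-off concrete, here is a rough Python sketch of the
proposed semantics as I read them. It is a toy model only, not anything
from the spec: the class name SpillModel, the dict standing in for
memory, and the evict/context_switch methods are all made up for
illustration, and it assumes the operand reading "vector, address,
element count" from the first sentence of the proposal.

```python
class SpillModel:
    """Toy model of the proposed spill / recoverspill / unspill semantics."""

    def __init__(self):
        self.memory = {}   # addr -> elements that really were written to memory
        self.tracked = {}  # addr -> elements still held in a physical register

    def spill(self, values, addr, count):
        # Lazy spill: keep the first `count` elements in a tracked register
        # instead of writing them to memory right away.
        self.tracked[addr] = list(values[:count])

    def evict(self, addr):
        # The core wants the physical register back, so the write to memory
        # happens now. This is where a delayed page fault could show up.
        self.memory[addr] = self.tracked.pop(addr)

    def recoverspill(self, addr):
        # Prefer the tracked register; otherwise read what reached memory.
        if addr in self.tracked:
            return list(self.tracked[addr])
        return list(self.memory[addr])

    def unspill(self, addr):
        # Stop tracking: the physical register can be freed without a store.
        self.tracked.pop(addr, None)

    def context_switch(self):
        # Simplest policy: force every tracked register out to memory.
        for addr in list(self.tracked):
            self.evict(addr)


model = SpillModel()
model.spill([10, 20, 30, 40], addr=0x100, count=2)
fast = model.recoverspill(0x100)      # served from the tracked register
model.context_switch()                # forces the store to memory
slow = model.recoverspill(0x100)      # served from memory
```

Note that context_switch here takes the "force everything to memory"
route. Saving the tracked registers together with their rs1 values
instead would need the extra iteration instructions mentioned above.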
| 4- If a register is defined with a VL, then the values after VL (between VL+1 and VLMAX) could be set to “undefined” or 0; this would
| considerably benefit implementations with longer vectors in an OoO processor. The current text states that the previous values should be preserved.
| 5- Proposal to add some semantic information to the vectors. In particular it is important to know the VL and the SEW at the time the vector was
| created, so that when the context is saved we can have a special load/store operation that uses only the minimal storage needed in
| memory (plus the length and SEW). This also lets the system know that after the VL, we do not have to preserve any element.
Except that all values after VL do have to be stored on a context
switch. Just because a VL was used to modify head elements of a
vector, it doesn't mean the tail elements can be ignored (under the
current tail-undisturbed model).
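
A small Python sketch of the difference, for illustration only. The
function names and the "zero" policy label are hypothetical, and the
byte counts ignore LMUL and the extra VL/SEW metadata that item 5 would
add:

```python
def vector_write(old, new_head, vl, tail_policy="undisturbed"):
    """Return the destination register after writing `vl` head elements."""
    if tail_policy == "undisturbed":
        tail = list(old[vl:])            # current model: old tail survives
    else:
        tail = [0] * (len(old) - vl)     # alternative from item 4: tail zeroed
    return list(new_head[:vl]) + tail


def context_save_bytes(vlen_bits, vl, sew_bits, tail_policy="undisturbed"):
    """Bytes that must be saved for one vector register on a context switch."""
    if tail_policy == "undisturbed":
        return vlen_bits // 8            # tail may hold live data: save it all
    return vl * sew_bits // 8            # only the head elements matter


old = [9, 9, 9, 9, 9, 9, 9, 9]
kept = vector_write(old, [1, 2, 3], vl=3)                         # tail kept
zeroed = vector_write(old, [1, 2, 3], vl=3, tail_policy="zero")   # tail cleared
full = context_save_bytes(16384, vl=4, sew_bits=64)               # whole register
head_only = context_save_bytes(16384, vl=4, sew_bits=64, tail_policy="zero")
```

With tail-undisturbed, the elements past VL in `kept` are still the old
values, so a context save cannot skip them. Only under a tail-zeroing
(or tail-undefined) policy would the shorter save be safe.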