Re: Effective element width encoding in vector load/stores


David Horner
 


On 2020-04-16 11:02 p.m., Krste Asanovic wrote:
There are two separate issues noted with the proposal to fixed-size
vector load/stores.  One is the additional vsetvli instructions
needed, and the second is the additional widening instructions
required.  We've discussed adding more widening instructions to help
with the latter.  I have a proposal below to help with the former in a
way that improves FP also, and which also provides a solution to the
indexed vector index size wart we've had for a while.

This proposal still only supports packed load/stores, as opposed to
unpacked load/stores with sign/zero extension.  However, the
problematic instruction overhead of many additional vsetvli
instructions when simply removing fixed-size load/stores is avoided by
repurposing the width field to encode the "effective" element width
(EEW) for the current vector load/store instruction.

Using the width field, EEW is encoded as one of {8,16,32,SEW}.  This
now determines *both* the register element size and the memory element
size, where previously it only set the memory element size and
sign/zero extended this into the SEW-width register element.

What about a SEW scaling factor instead: 1/4, 1/2, 1 and 2? This allows a much expanded dynamic range and addresses most scaling concerns.

It allows a load of the 2 * SEW source for vwop.wv, and a store of all widened results.

It also allows source loads for 4 * and 2 * widening to the current SEW, and even for 8 * widening to 2 * SEW, which, as noted above, can be both the source and the destination for the widening instructions.
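A minimal sketch of how the four scale factors could map onto element widths. The encoding order in SCALE_FACTORS, the helper name scaled_eew, and the 8-bit minimum element width are assumptions for illustration only; nothing in the thread fixes them.

```python
from fractions import Fraction

# Hypothetical mapping of the 2-bit width field to a SEW scale factor.
# The order of the encodings is an assumption, not part of the proposal.
SCALE_FACTORS = [Fraction(1, 4), Fraction(1, 2), Fraction(1), Fraction(2)]

def scaled_eew(sew_bits, width_field):
    """Return the effective element width in bits, or None when the
    scaled width would fall below the assumed 8-bit minimum."""
    eew = SCALE_FACTORS[width_field] * sew_bits
    if eew.denominator != 1 or eew < 8:
        return None
    return int(eew)

# Tabulate the EEW reachable from each SEW under the four scale factors.
for sew in (8, 16, 32, 64):
    print(sew, [scaled_eew(sew, field) for field in range(4)])
```

With SEW=32 all four factors give a usable width (8, 16, 32, 64), whereas SEW=8 leaves only the 1 and 2 factors usable. An upper bound such as ELEN is not modelled here.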


Effective LMUL (ELMUL) is now calculated as ELMUL = (EEW/SEW)*LMUL to
keep SEW/LMUL constant. If this results in a bad LMUL value, an
illegal instruction exception is raised.

With SEW scaling this becomes ELMUL = EEW*LMUL, where EEW is the encoded scale factor.
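The ELMUL rule can be sketched as follows. The set of legal LMUL values (powers of two from 1/8 to 8) and the use of ValueError to stand in for the illegal instruction exception are assumptions; effective_lmul is a hypothetical helper name.

```python
from fractions import Fraction

# Assumption: legal LMUL values are the powers of two from 1/8 to 8.
LEGAL_LMUL = {Fraction(1, 8) * 2**k for k in range(7)}

def effective_lmul(eew, sew, lmul):
    """ELMUL = (EEW/SEW) * LMUL, which keeps EEW/ELMUL equal to SEW/LMUL.
    A result outside the assumed legal set models the illegal
    instruction exception."""
    elmul = Fraction(eew, sew) * Fraction(lmul)
    if elmul not in LEGAL_LMUL:
        raise ValueError("illegal instruction: ELMUL=%s" % elmul)
    return elmul

# 16-bit elements under SEW=32, LMUL=2 occupy half as many registers.
print(effective_lmul(16, 32, 2))
```

Because the ratio is preserved, the number of elements per register group is the same for the EEW access as for the SEW computation that follows it.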

The effective EEW/ELMUL setting is only in effect for the single
instruction and does not change values in the vtype CSR.


yes.

Note this approach also helps floating-point code, whereas
byte/halfword/word load/stores do not.

yes.
I'm using vle32 syntax to mirror the assembler syntax for vsetvli e32 etc.

For SEW scaling I don't have any solid nomenclature suggestions, but it could parallel LMUL: lf4, lf2, l1, l2 (like I said, no good ideas).

I think this also solves our indexed load/store problem.  We use
vtype.SEW to encode the data elements, but use the width-field-encoded
EEW for indices.  One wrinkle is that the largest EEW encoding
now indicates 64b not SEW, i.e., index EEW is {8,16,32,64}.

I don't believe removing SEW from index is problematic for indexed load/stores.

The program will in almost all cases know the precision of its offsets.

Indeed, it is debatable whether a dynamic SEW-sized index has any practical application.

Rather, I see the wrinkle as being that indexed load/stores do not support the scaled element mode present in the others.

Given that the field has been repurposed for the index only, dropping SEW is even less of a problematic wrinkle.



# Load 32b values using 8b offsets:
vsetvli t0, a0, e32,m4
vlx8.v v8, (a1), v7  # Load 32b values into v8-11, using EEW=8,ELMUL=1 vector in v7

# Load 8b values using 64b offsets:
vsetvli t0, a0, e8,fl2
vlx64.v v1, (a1), v8  # EEW=64,ELMUL=4 indices in v8-v11, SEW=8,LMUL=1/2 in v1
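The register groups in the two indexed examples above can be checked with a short sketch. The helper names register_group and index_elmul are hypothetical, and a fractional LMUL is assumed to occupy a single register.

```python
from fractions import Fraction

def register_group(base_reg, lmul):
    """Registers occupied by a group starting at base_reg.
    Assumption: a fractional LMUL still occupies one register."""
    nregs = max(1, int(Fraction(lmul)))
    return list(range(base_reg, base_reg + nregs))

def index_elmul(index_eew, sew, lmul):
    """ELMUL of the index vector: (EEW/SEW) * LMUL."""
    return Fraction(index_eew, sew) * Fraction(lmul)

# (instruction, data reg, index reg, index EEW, SEW, LMUL)
examples = [
    ("vlx8.v v8, (a1), v7", 8, 7, 8, 32, Fraction(4)),
    ("vlx64.v v1, (a1), v8", 1, 8, 64, 8, Fraction(1, 2)),
]

for insn, data_reg, index_reg, eew, sew, lmul in examples:
    elmul = index_elmul(eew, sew, lmul)
    print(insn)
    print("  data registers: ", register_group(data_reg, lmul))
    print("  index ELMUL:    ", elmul)
    print("  index registers:", register_group(index_reg, elmul))
```

For the first example the data group is v8-v11 and the indices fit in v7 alone; for the second the data sits in v1 and the indices span v8-v11, matching the comments in the assembly.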


Krste
