Re: Effective element width encoding in vector load/stores

David Horner

On 2020-04-16 11:02 p.m., Krste Asanovic wrote:
There are two separate issues noted with the proposal to fixed-size
vector load/stores.  One is the additional vsetvli instructions
needed, and the second is the additional widening instructions
required.  We've discussed adding more widening instructions to help
with the latter.  I have a proposal below to help with the former in a
way that improves FP also, and which also provides a solution to the
indexed vector index size wart we've had for a while.

This proposal still only supports packed load/stores, as opposed to
unpacked load/stores with sign/zero extension.  However, the
problematic instruction overhead of many additional vsetvli
instructions when simply removing fixed-size load/stores is avoided by
repurposing the width field to encode the "effective" element width
(EEW) for the current vector load/store instruction.

Using the width field, EEW is encoded as one of {8,16,32,SEW}.  This
now determines *both* the register element size and the memory element
size, where previously it only set the memory element size and
sign/zero extended this into the SEW-width register element.

What about an SEW scaling factor instead: 1/4, 1/2, 1 and 2? This allows a much expanded dynamic range and addresses most scaling concerns.

It allows a 2 * SEW load for the wide source of vwop.wv, and a store of all widened results.

It also allows source loads for 4 * and 2 * widening to the current SEW, and even for 8 * widening to 2 * SEW, which, as noted above, can be both the source and the destination for the widening instructions.
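
A minimal sketch of the scaling idea in Python, to make the ranges concrete. The mapping of the 2-bit width field to factors, the 8-bit minimum element width, and the name scaled_eew are all assumptions for illustration; only the factors 1/4, 1/2, 1 and 2 come from the suggestion above, and any upper bound such as ELEN is not modelled.

```python
from fractions import Fraction

# Hypothetical assignment of the 2-bit width field to SEW scale factors.
# Only the factors themselves are from the suggestion; the bit order is assumed.
SCALE_BY_WIDTH_FIELD = {
    0b00: Fraction(1, 4),
    0b01: Fraction(1, 2),
    0b10: Fraction(1),
    0b11: Fraction(2),
}

def scaled_eew(sew, width_field):
    """Return the effective element width (bits) for the current SEW."""
    eew = sew * SCALE_BY_WIDTH_FIELD[width_field]
    # Assumption: elements narrower than 8 bits are not representable.
    if eew.denominator != 1 or eew < 8:
        raise ValueError(f"no legal EEW for SEW={sew}, width field {width_field}")
    return int(eew)

if __name__ == "__main__":
    for sew in (8, 16, 32, 64):
        widths = []
        for field in sorted(SCALE_BY_WIDTH_FIELD):
            try:
                widths.append(scaled_eew(sew, field))
            except ValueError:
                widths.append(None)  # this scale is unusable at this SEW
        print(sew, widths)
```

With SEW=32 this gives the same {8,16,32,64} set as the fixed encoding, but the set moves with SEW, so SEW=16 can reach 32 and SEW=64 can reach 16.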

Effective LMUL (ELMUL) is now calculated as ELMUL = (EEW/SEW)*LMUL to
keep SEW/LMUL constant. If this results in a bad LMUL value, an
illegal instruction exception is raised.

With SEW scaling this becomes ELMUL = EEW*LMUL, where EEW here denotes the encoded scale factor (1/4, 1/2, 1 or 2) rather than a bit width.
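
The ELMUL rule can be sketched in Python as follows. The set of legal LMUL values (including fractional LMUL down to 1/8) and the name effective_lmul are assumptions for illustration; the formula and the illegal-instruction behaviour are as described above.

```python
from fractions import Fraction

# Assumed set of legal LMUL values, including fractional LMUL.
LEGAL_LMUL = {
    Fraction(1, 8), Fraction(1, 4), Fraction(1, 2),
    Fraction(1), Fraction(2), Fraction(4), Fraction(8),
}

def effective_lmul(eew, sew, lmul):
    """ELMUL = (EEW/SEW)*LMUL, so that EEW/ELMUL equals SEW/LMUL."""
    elmul = Fraction(eew, sew) * Fraction(lmul)
    if elmul not in LEGAL_LMUL:
        # Models the illegal instruction exception for a bad LMUL value.
        raise ValueError(f"illegal instruction: ELMUL={elmul}")
    return elmul

if __name__ == "__main__":
    print(effective_lmul(8, 32, 4))                 # 8b elements under e32,m4
    print(effective_lmul(64, 8, Fraction(1, 2)))    # 64b elements under e8, LMUL=1/2
```

Under the scaling variant the first factor is simply the encoded scale, so the same check applies with Fraction(eew, sew) replaced by that scale.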

The effective EEW/ELMUL setting is only in effect for the single
instruction and does not change values in the vtype CSR.


Note this approach also helps floating-point code, whereas
byte/halfword/word load/stores do not.

I'm using vle32 syntax to mirror the assembler syntax for vsetvli e32 etc.

For SEW scaling I don't have any solid nomenclature suggestions, but it could parallel LMUL: lf4, lf2, l1, l2 (like I said, no good ideas).

I think this also solves our indexed load/store problem.  We use
vtype.SEW to encode the data elements, but use the width-field-encoded
EEW for indices.  One wrinkle is that the largest EEW encoding
now indicates 64b not SEW, i.e., index EEW is {8,16,32,64}.

I don't believe removing SEW from the index encoding is problematic for indexed load/stores.

The program will in almost all cases know the precision of its offsets.

Indeed, it is arguable whether a dynamic, SEW-sized index has any practical application.

Rather, I see the wrinkle as this: indexed load/stores do not support the scaled element mode present in the others.

Given that the field has been repurposed to encode the index width only, dropping SEW is even less of a problem.

# Load 32b values using 8b offsets:
vsetvli t0, a0, e32,m4
vlx8.v v8, (a1), v7  # Load 32b values into v8-v11, using EEW=8,ELMUL=1 vector in v7

# Load 8b values using 64b offsets:
vsetvli t0, a0, e8,fl2
vlx64.v v1, (a1), v8  # EEW=64,ELMUL=4 indices in v8-v11, SEW=8,LMUL=1/2 in v1
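
As a cross-check of the register ranges in the two examples, here is a small Python sketch. It assumes a register group occupies max(1, LMUL) consecutive registers and that its base register must be a multiple of that count; the helper names register_group and indexed_load_groups are hypothetical.

```python
from fractions import Fraction

def register_group(base, lmul):
    """List the vector registers covered by a group starting at v<base>."""
    nregs = max(1, int(Fraction(lmul)))  # fractional LMUL still uses one register
    if base % nregs != 0:
        raise ValueError(f"v{base} is not aligned for a group of {nregs}")
    return [f"v{base + i}" for i in range(nregs)]

def indexed_load_groups(sew, lmul, index_eew, vd, vs2):
    """Return (data registers, index registers) for an indexed load."""
    index_lmul = Fraction(index_eew, sew) * Fraction(lmul)  # ELMUL of the indices
    return register_group(vd, lmul), register_group(vs2, index_lmul)

if __name__ == "__main__":
    # vsetvli e32,m4 ; vlx8.v v8, (a1), v7
    print(indexed_load_groups(32, 4, 8, vd=8, vs2=7))
    # vsetvli e8, LMUL=1/2 ; vlx64.v v1, (a1), v8
    print(indexed_load_groups(8, Fraction(1, 2), 64, vd=1, vs2=8))
```

Both calls reproduce the groups given in the comments: data in v8-v11 with indices in v7, then data in v1 with indices in v8-v11.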

