On 2020-04-16 11:02 p.m., Krste Asanovic wrote:
There are two separate issues noted with the proposal to remove fixed-size vector load/stores. One is the additional vsetvli instructions needed, and the second is the additional widening instructions required. We've discussed adding more widening instructions to help with the latter. I have a proposal below to help with the former in a way that also improves FP, and which also provides a solution to the indexed vector index size wart we've had for a while.
This proposal still only supports packed load/stores, as opposed to unpacked load/stores with sign/zero extension. However, the problematic instruction overhead of many additional vsetvli instructions when simply removing fixed-size load/stores is avoided by repurposing the width field to encode the "effective" element width (EEW) for the current vector load/store instruction.
Using the width field, EEW is encoded as one of {8,16,32,SEW}. This now determines *both* the register element size and the memory element size, where previously it only set the memory element size and sign/zero extended this into the SEW-width register element.
What about a SEW scaling factor instead: 1/4, 1/2, 1 and 2? This allows a much expanded dynamic range and addresses most scaling concerns. It allows a load of 2 * SEW for the vwop.wv source, and a store of all widened results.
It also allows a source load for 4 * widening and 2 * widening to the current SEW, and even 8 * widening to 2 * SEW, which as noted above can be both the source and the destination for the widening instructions.
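As a rough illustration of the scaling-factor idea, the sketch below lists the element width each factor would select for a given SEW. The helper name `scaled_eew` and the 8-bit minimum element width are assumptions made for the example, not part of the proposal.

```python
from fractions import Fraction

# Scaling factors suggested in the reply: 1/4, 1/2, 1 and 2.
SCALES = [Fraction(1, 4), Fraction(1, 2), Fraction(1), Fraction(2)]

# Assumption: no element is narrower than 8 bits.
MIN_EEW = 8


def scaled_eew(sew, scale):
    """Return SEW * scale in bits, or None if the result is below MIN_EEW
    or not a whole number of bits (hypothetical helper)."""
    eew = Fraction(sew) * Fraction(scale)
    if eew.denominator != 1 or eew < MIN_EEW:
        return None
    return int(eew)


if __name__ == "__main__":
    for sew in (8, 16, 32, 64):
        widths = [scaled_eew(sew, s) for s in SCALES]
        print(f"SEW={sew}: {widths}")
```

With SEW=32, for instance, the four factors select 8, 16, 32 and 64-bit elements, whereas the {8,16,32,SEW} encoding cannot name a width above SEW.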
Effective LMUL (ELMUL) is now calculated as ELMUL = (EEW/SEW)*LMUL to keep SEW/LMUL constant. If this results in a bad LMUL value, an illegal instruction exception is raised.
With SEW scaling this becomes ELMUL = EEW*LMUL, where EEW is the scaling factor.
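The ELMUL rule can be sketched in a few lines of Python. The set of legal LMUL values below (1/8 through 8) is an assumption for illustration, and `effective_lmul` is a hypothetical helper; a `ValueError` stands in for the illegal instruction exception.

```python
from fractions import Fraction

# Assumption: legal LMUL values are the powers of two from 1/8 to 8.
LEGAL_LMUL = {Fraction(1, 8), Fraction(1, 4), Fraction(1, 2),
              Fraction(1), Fraction(2), Fraction(4), Fraction(8)}


def effective_lmul(eew, sew, lmul):
    """Compute ELMUL = (EEW/SEW) * LMUL, which keeps EEW/ELMUL equal to SEW/LMUL.

    Raises ValueError (standing in for an illegal instruction exception)
    when the result is not a legal LMUL value.
    """
    elmul = Fraction(eew, sew) * Fraction(lmul)
    if elmul not in LEGAL_LMUL:
        raise ValueError(f"illegal ELMUL {elmul} for EEW={eew}, SEW={sew}, LMUL={lmul}")
    return elmul


if __name__ == "__main__":
    # The two cases from the indexed load examples in this thread.
    print(effective_lmul(8, 32, 4))                # EEW=8, SEW=32, LMUL=4
    print(effective_lmul(64, 8, Fraction(1, 2)))   # EEW=64, SEW=8, LMUL=1/2
```

The two calls correspond to the indexed load examples in the proposal, which give ELMUL=1 and ELMUL=4 respectively.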
The effective EEW/ELMUL setting is only in effect for the single instruction and does not change values in the vtype CSR.
Yes.
Note this approach also helps floating-point code, whereas byte/halfword/word load/stores do not.
Yes.
I'm using vle32 syntax to mirror the assembler syntax for vsetvli e32 etc.
For SEW scaling I don't have any solid nomenclature suggestions, but it could parallel lmul: lf4, lf2, l1, l2 (like I said, no good ideas).
I think this also solves our indexed load/store problem. We use vtype.SEW to encode the data elements, but use the width-field-encoded EEW for indices. One wrinkle is that the largest EEW encoding now indicates 64b not SEW, i.e., index EEW is {8,16,32,64}.
I don't believe removing SEW from the index encoding is problematic for indexed load/stores. The program will in almost all cases know the precision of its offsets. Indeed, it is arguable whether a dynamic SEW has any practical application here.
Rather, I see the wrinkle as being that indexed load/stores do not support the scaled element mode present in the others. Given that the field has been repurposed for the index only, it is even less of a problem that SEW is dropped.
# Load 32b values using 8b offsets:
vsetvli t0, a0, e32,m4
vlx8.v v8, (a1), v7    # Load 32b values into v8-v11, using EEW=8,ELMUL=1 vector in v7
# Load 8b values using 64b offsets:
vsetvli t0, a0, e8,fl2
vlx64.v v1, (a1), v8   # EEW=64,ELMUL=4 indices in v8-v11, SEW=8,LMUL=1/2 in v1,
Krste
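The register groups named in the two indexed examples above can be checked with a small sketch. It assumes that a group with multiplier M >= 1 occupies M consecutive registers starting at a base aligned to M, and that a fractional multiplier occupies a single register; `register_group` is a hypothetical helper, not something defined in the thread.

```python
from fractions import Fraction


def register_group(base, mul):
    """List the vector registers covered by a group starting at v<base>.

    Assumptions: mul >= 1 spans `mul` consecutive registers and the base
    must be a multiple of `mul`; fractional mul uses one register.
    """
    count = max(1, int(Fraction(mul)))
    if base % count != 0:
        raise ValueError(f"v{base} is not aligned for a group of {count} registers")
    return [f"v{r}" for r in range(base, base + count)]


if __name__ == "__main__":
    # First example: data in v8 with LMUL=4, indices in v7 with ELMUL=1.
    print(register_group(8, 4), register_group(7, 1))
    # Second example: data in v1 with LMUL=1/2, indices in v8 with ELMUL=4.
    print(register_group(1, Fraction(1, 2)), register_group(8, 4))
```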