On 2020-04-18 3:14 p.m., Krste Asanovic wrote:
Location of tail/mask-agnostic bits
Discussion reached consensus on placing these in the vtype register,
noting that later extensions could redefine vtype CSR written by
vsetvl as a "window" into a larger group of vector configuration CSRs.
It was noted that there is no space to add more configuration
instructions in existing footprint, and that the existing instructions
fit the base immediate format.
As I had asked that pagan question of opcode space I thought I
should try to address the problem:
I opened issue https://github.com/riscv/riscv-v-spec/issues/423,
which I paste here for your convenience:
minutes of TG meeting suggested that if the encoding runs out in
vsetvli we can introduce another instruction to set other bits.
noting that later extensions could redefine vtype CSR written by
vsetvl as a "window" into a larger group of vector configuration CSRs.
That allows avoiding the register based vsetvl instruction in
common cases.
However, within the encoding space used for vsetvli and vsetvl,
there is no room for a further vsetvl2i with another 11 bit
immediate.
Quoting further from meeting notes:
It was noted that there is no space to add more configuration
instructions in existing footprint,
We don't need another instruction to calculate vl. The currently
proposed additional fields, TAMA (Tail and Mask fill directives) and
EDIV (Element DIVision extension), do not modify vl, as the SEW/LMUL
ratio is maintained. Similarly, a SEW scaling factor that also adjusts
LMUL to maintain the SEW/LMUL ratio does not change vl.
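As a small worked illustration of why the ratio matters, here is a sketch assuming the draft spec's relation VLMAX = LMUL*VLEN/SEW. The function name and the VLEN value of 128 are only assumptions for the example. Since vl is derived from the requested length and VLMAX, an unchanged VLMAX leaves vl unchanged for the same request.

```python
from fractions import Fraction


def vlmax(vlen_bits, sew_bits, lmul):
    """VLMAX = LMUL * VLEN / SEW (lmul may be an int or a Fraction)."""
    return int(Fraction(lmul) * vlen_bits / sew_bits)


VLEN = 128  # assumed vector register width in bits, for illustration only

# Scaling SEW and LMUL by the same factor keeps the SEW/LMUL ratio,
# and therefore VLMAX, the same.
print(vlmax(VLEN, 8, 1))    # 16
print(vlmax(VLEN, 16, 2))   # 16
print(vlmax(VLEN, 32, 4))   # 16
```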
A vmodtype instruction can be encoded in the remaining opcode
space that uses rs1, rs2 and rd as 15 additional bits for setting
additional vtype fields.
Further the opcode space allows for 64 such instructions and
variations of them. Obviously we don't need all 64 of them and we
will want to reserve the opcode space for future needs. However, I
have a proposal for two distinct instruction types.
The first is as outlined above, the vmodtype instruction, further
defined here:
- the 15 bits encoded in rs1, rs2 and rd are subdivided for
  specific purposes to modify the vtype register, that is, its
  controlling fields.
  rd could be reserved for now, to simplify decoding, noting that
  rd=0 already has special meaning.
  rs1 could also be initially reserved, as it only maps to
  immediate fields in the U-type format.
  rs2 already maps into the immediate fields for I-type, so it
  is the obvious choice for initial use.
- any modification must retain the SEW/LMUL ratio or fail, e.g.
  if the resultant LMUL>8.
- any modifications that change SEW and hence LMUL will store
  the resultant sew/lmul bits in vtype (a persistent change).
- other bits that change other existing or new vtype fields
  will store the modification of the affected field's bits in
  vtype.
  It can be expected that the modification will simply
  overwrite the affected fields with the corresponding bits
  from the register fields.
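The points above can be sketched roughly as follows. The bit positions of rd, rs1 and rs2 are the standard RISC-V instruction fields; everything else is an assumption for illustration: the helper names, the use of a power-of-two shift as the SEW scaling factor, integer LMUL values only, and returning None to stand in for "fail" (the proposal does not say how failure is reported).

```python
def operand_fields(insn):
    """Extract the three 5-bit register specifier fields of a 32-bit instruction."""
    rd = (insn >> 7) & 0x1F    # bits [11:7]
    rs1 = (insn >> 15) & 0x1F  # bits [19:15]
    rs2 = (insn >> 20) & 0x1F  # bits [24:20]
    return rd, rs1, rs2


def scale_sew(sew, lmul, shift):
    """Hypothetical SEW scaling: multiply SEW and LMUL by 2**shift.

    Scaling both by the same factor retains the SEW/LMUL ratio.
    Returns the new (sew, lmul), or None if the resultant LMUL > 8.
    """
    new_sew = sew << shift
    new_lmul = lmul << shift
    if new_lmul > 8:
        return None  # assumed failure signal, not defined by the proposal
    return new_sew, new_lmul


# An instruction word with rs2=21, rs1=3, rd=0 (low 7 opcode bits set to 0x57).
word = (21 << 20) | (3 << 15) | (0 << 7) | 0x57
print(operand_fields(word))   # (0, 3, 21)
print(scale_sew(8, 1, 1))     # (16, 2)
print(scale_sew(32, 8, 1))    # None
```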
The second is a "transient" setting, vmodinstr (maybe OK
name?), that could provide up to 15 prefix bits for the next
executed vector instruction.
- The potentially 15 bits are also sourced from rs1, rs2 and rd
  in the instruction.
  As suggested above, rd could be excluded for simpler decoding
  and limited need.
  For RVV32 there is a further reason to limit to 10 bits:
  the proposed use will compete with persistent vtype bits
  "modded" by the above vmodtype instruction.
  Of course, "10 bits should be enough for anyone", to misquote
  a famous misquote ;)
  So let us, reasonably, assume 10 bits in vtype will suffice
  for this purpose for now.
- All 10 bits are set in a corresponding 10 bit field in vtype
  (suggested bits [30:21] for RVV32 and bits [62:52] for RVV64)
- All 10 bits are interpreted in conjunction with the next
  executed vector instruction, effectively increasing its
  instruction length by 10 bits.
- at the completion of that vector instruction the 10 bits are
  cleared.
- To aid the expectation that the store of this prefix into
  vtype can be made virtual and be virtually supported:
  a) vstart is not changed by the vmodinstr instruction.
  b) if vmodinstr sources unexpected bits, the instruction
     raises an illegal instruction exception (no surprise here,
     but it does support virtualization)
  c) if vmodinstr is not immediately followed by an appropriate
     vector instruction
     i) an exception occurs on the vmodinstr and
     ii) the prefix bits in vtype are cleared.
     (this supports a virtual write of the bits to vtype and clearing of them)
  d) if the vector instruction immediately following vmodinstr
     does not support its 10 bits of prefix,
     i) the vector instruction raises an illegal instruction exception and
     ii) the prefix bits in vtype are cleared.
  e) if an interrupt occurs during the execution of the
     prefixed vector instruction, it may
     i) update the epc to point to the preceding vmodinstr and also
     ii) set vstart appropriately for resumption of the prefixed instruction
An implementation is free to physically implement the 10 transient
bits, in which case vmodinstr will update them and the following
executed prefixed vector instruction clears them.
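To make the transient behaviour concrete, here is a toy model of rules a) to d) using the suggested RVV32 field at vtype bits [30:21]. It is only a sketch under stated assumptions: the class and method names are invented, "unexpected bits" is modelled as any value wider than 10 bits, and the model is simply told whether the next instruction is a suitable vector instruction. Interrupt handling (the epc/vstart case) is not modelled.

```python
PREFIX_SHIFT = 21                     # suggested RVV32 placement: bits [30:21]
PREFIX_MASK = 0x3FF << PREFIX_SHIFT   # 10-bit field


class IllegalInstruction(Exception):
    """Stands in for an illegal instruction exception."""


class Hart:
    def __init__(self):
        self.vtype = 0
        self.vstart = 0

    def vmodinstr(self, prefix, next_is_vector):
        # a) vstart is never touched here.
        # b) assumption: "unexpected bits" means anything outside 10 bits.
        if prefix & ~0x3FF:
            raise IllegalInstruction("unexpected prefix bits")
        # c) not followed by a vector instruction: clear the field and trap.
        if not next_is_vector:
            self.vtype &= ~PREFIX_MASK
            raise IllegalInstruction("vmodinstr not followed by vector op")
        self.vtype = (self.vtype & ~PREFIX_MASK) | (prefix << PREFIX_SHIFT)

    def vector_op(self, supports_prefix):
        prefix = (self.vtype & PREFIX_MASK) >> PREFIX_SHIFT
        # The transient bits are cleared whether or not the op succeeds.
        self.vtype &= ~PREFIX_MASK
        # d) the instruction cannot take a prefix: trap.
        if prefix and not supports_prefix:
            raise IllegalInstruction("prefix not supported")
        return prefix


hart = Hart()
hart.vtype = 0x5                          # some persistent low vtype bits
hart.vmodinstr(0x2A5, next_is_vector=True)
print(hex(hart.vector_op(supports_prefix=True)))  # 0x2a5
print(hex(hart.vtype))                            # 0x5
```

Note that the persistent vtype bits outside the 10-bit field are preserved throughout, which is what would let an implementation keep the prefix in separate physical state and only present it as part of vtype.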