Re: RISC-V Vector TG meeting minutes, April 17, 2020
Location of tail/mask-agnostic bits: Discussion reached consensus on placing these in the vtype register, noting that later extensions could redefine vtype CSR written by vsetvl as a "window" into a larger group of vector configuration CSRs. It was noted that there is no space to add more configuration instructions in existing footprint, and that the existing instructions fit the base immediate format.
As I had asked that pagan question of opcode space I thought I should try to address the problem:
I opened issue https://github.com/riscv/riscv-v-spec/issues/423
which I paste here for your convenience:
additional instructions to set vtype fields. #423
The minutes of the TG meeting suggested that if the encoding runs out in vsetvli, we can introduce another instruction to set other bits:
noting that later extensions could redefine vtype CSR written by
vsetvl as a "window" into a larger group of vector configuration CSRs.
That allows avoiding the register based vsetvl instruction in common cases.
However, within the encoding space used for vsetvli and vsetvl, there is no room for a further vsetvl2i with another 11-bit immediate.
Quoting further from meeting notes:
It was noted that there is no space to add more configuration
instructions in existing footprint,
We don't need another instruction to calculate vl. The currently proposed additional fields, TAMA (Tail and Mask fill directives) and EDIV (Element DIVision extension), do not modify vl, as the SEW/LMUL ratio is maintained. Similarly, a SEW scaling factor that also adjusts LMUL to maintain the SEW/LMUL ratio does not change vl.
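To illustrate why vl is unaffected, here is a minimal sketch assuming the usual relation VLMAX = LMUL * VLEN / SEW. The VLEN of 128 and the function name are chosen only for this example and only integer LMUL is modeled.

```python
def vlmax(vlen_bits, sew_bits, lmul):
    """VLMAX = LMUL * VLEN / SEW (integer LMUL only in this sketch)."""
    return lmul * vlen_bits // sew_bits

VLEN = 128  # assumed vector register width in bits, for illustration only

base = vlmax(VLEN, sew_bits=16, lmul=1)
doubled = vlmax(VLEN, sew_bits=32, lmul=2)     # SEW and LMUL both doubled
quadrupled = vlmax(VLEN, sew_bits=64, lmul=4)  # SEW and LMUL both quadrupled

# Same SEW/LMUL ratio in all three cases, so the same VLMAX.
print(base, doubled, quadrupled)
```

Since the ratio is unchanged, any vl that was legal before the modification is still legal afterwards, so no new vl calculation is needed.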
A vmodtype instruction can be encoded in the remaining opcode space that uses rs1, rs2 and rd as 15 additional bits for setting additional vtype fields.
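As a rough sketch of where those 15 bits would come from: in the standard 32-bit formats, rd occupies instruction bits [11:7], rs1 bits [19:15] and rs2 bits [24:20]. The order in which the three fields are packed into one 15-bit payload below is purely an assumption for illustration (rs2 lowest, since it is suggested later in this message for initial use), and the function names are hypothetical.

```python
def field(insn, lo, width=5):
    """Extract a `width`-bit field of a 32-bit instruction starting at bit `lo`."""
    return (insn >> lo) & ((1 << width) - 1)

def vmodtype_payload(insn):
    """Gather the three 5-bit register specifier fields into 15 payload bits.

    The packing order (rs2 lowest, then rs1, then rd) is an assumption.
    """
    rd = field(insn, 7)     # bits [11:7]
    rs1 = field(insn, 15)   # bits [19:15]
    rs2 = field(insn, 20)   # bits [24:20]
    return rs2 | (rs1 << 5) | (rd << 10)

# Example instruction word with only the three specifier fields filled in;
# opcode and funct bits are left as zero because no encoding is defined yet.
insn = (0b10101 << 20) | (0b00011 << 15) | (0b00001 << 7)
print(bin(vmodtype_payload(insn)))
```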
Further, the opcode space allows for 64 such instructions and variations of them. Obviously we don't need all 64 of them, and we will want to reserve the opcode space for future needs. However, I have a proposal for two distinct instruction types.
The first is as outlined above, the vmodtype instruction, further defined here:
the 15 bits encoded in rs1, rs2 and rd are subdivided for specific purposes to modify the vtype register, that is its controlling fields.
rd could be reserved for now, to simplify decoding. Noting rd=0 already has special meaning.
rs1 could also be initially reserved as it only maps to immediate fields in the U-type format.
rs2 already maps into the immediate fields for I-type, so it is the obvious choice for initial use.
any modification must retain the SEW/LMUL ratio or fail, e.g. if the resultant LMUL>8.
any modifications that change SEW and hence LMUL will store the resultant SEW/LMUL bits in vtype (a persistent change)
other bits that change other existing or new vtype fields will store the modification of the affected field's bits in vtype.
It can be expected that the modification will simply overwrite the affected fields with the corresponding bits from the register fields.
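The ratio-preserving rule can be sketched as follows. This is only an illustration of the check, not a proposed implementation: the function name is hypothetical, only integer LMUL is modeled, and a Python exception stands in for whatever failure behaviour the instruction would have.

```python
def rescale_sew(sew, lmul, new_sew, max_lmul=8):
    """Return (new_sew, new_lmul) with SEW/LMUL unchanged, or raise ValueError."""
    scaled = lmul * new_sew
    if scaled % sew != 0:
        # Would need a fractional LMUL, which this sketch does not model.
        raise ValueError("no integer LMUL keeps the SEW/LMUL ratio")
    new_lmul = scaled // sew
    if new_lmul > max_lmul:
        raise ValueError("resultant LMUL > 8")
    return new_sew, new_lmul

print(rescale_sew(16, 2, 32))  # doubling SEW doubles LMUL

try:
    rescale_sew(32, 8, 64)     # would need LMUL=16, so the modification fails
except ValueError as err:
    print("rejected:", err)
```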
The second is a "transient" setting, vmodinstr (maybe OK name?) that could provide up to 15 prefix bits for the next executed vector instruction.
The potentially 15 bits are also sourced from rs1, rs2 and rd in the instruction.
As suggested above, rd could be excluded for simpler decoding and limited need.
For RVV32 there is a further reason to limit to 10 bits:
the proposed use will compete with persistent vtype bits "modded" by the above vmodtype instruction.
Of course, "10 bits should be enough for anyone": to misquote a famous misquote ;)
So let us, reasonably, assume 10 bits in vtype will suffice for this purpose for now.
All 10 bits are set in a corresponding 10 bit field in vtype (suggested bits [30:21] for RVV32 and bits [62:52] for RVV64)
All 10 bits are interpreted in conjunction with the next executed vector instruction, effectively increasing its instruction length by 10 bits.
At the completion of that vector instruction the 10 bits are cleared.
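For the RVV32 case, the suggested field at vtype bits [30:21] could be manipulated as in the sketch below. The helper names are hypothetical and the rest of vtype is treated as an opaque integer.

```python
PREFIX_LO = 21      # lowest bit of the suggested RVV32 field [30:21]
PREFIX_WIDTH = 10
PREFIX_MASK = ((1 << PREFIX_WIDTH) - 1) << PREFIX_LO

def set_prefix(vtype, prefix):
    """Write the 10 prefix bits, leaving all other vtype bits untouched."""
    if prefix >> PREFIX_WIDTH:
        raise ValueError("prefix does not fit in 10 bits")
    return (vtype & ~PREFIX_MASK) | (prefix << PREFIX_LO)

def get_prefix(vtype):
    """Read the 10 prefix bits."""
    return (vtype & PREFIX_MASK) >> PREFIX_LO

def clear_prefix(vtype):
    """Clear the prefix, as happens when the prefixed instruction completes."""
    return vtype & ~PREFIX_MASK

vtype = 0b101                      # arbitrary low vtype bits for the example
vtype = set_prefix(vtype, 0x3FF)   # vmodinstr writes the prefix
vtype = clear_prefix(vtype)        # the prefixed vector instruction completes
print(bin(vtype))
```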
To support the expectation that storing this prefix into vtype can be virtualized:
a) vstart is not changed by the vmodinstr instruction.
b) if vmodinstr sources unexpected bits, the instruction raises an illegal instruction exception (no surprise here, but it does support virtualization)
c) if vmodinstr is not immediately followed by an appropriate vector instruction
i) an exception occurs on the vmodinstr and
ii) the prefix bits in vtype are cleared.
(this supports a virtual write of the bits to vtype and clearing of them)
d) if the vector instruction immediately following vmodinstr does not support its 10 bits of prefix,
i) the vector instruction raises an illegal instruction exception and
ii) the prefix bits in vtype are cleared.
e) if an interrupt occurs during the execution of the prefixed vector instruction, it may
i) update the epc to point to the preceding vmodinstr and also
ii) set vstart appropriately for resumption of the prefixed instruction
An implementation is free to physically implement the 10 transient bits.
In that case, vmodinstr will update them and the following executed prefixed vector instruction clears them.
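The rules for unexpected bits, a missing vector instruction, and an unsupported prefix can be modeled with a small sketch. Everything here is a toy model under stated assumptions: the class and attribute names are hypothetical, a Python exception stands in for the trap, and the `faulting` attribute only records which instruction the exception is attributed to. Interrupt handling and vstart are not modeled.

```python
class PrefixFault(Exception):
    """Stand-in for the trap; `faulting` names the instruction it is reported on."""
    def __init__(self, faulting, reason):
        super().__init__(reason)
        self.faulting = faulting

class PrefixModel:
    def __init__(self):
        self.prefix = 0       # the 10 transient bits
        self.pending = False  # True between vmodinstr and the next instruction

    def vmodinstr(self, bits):
        if bits >> 10:
            raise PrefixFault("vmodinstr", "unexpected prefix bits")
        self.prefix, self.pending = bits, True

    def execute_next(self, is_vector, supports_prefix=True):
        """Run the instruction after vmodinstr; return the prefix it consumed."""
        if not self.pending:
            return 0
        # The prefix bits are cleared in every outcome.
        consumed, self.prefix, self.pending = self.prefix, 0, False
        if not is_vector:
            raise PrefixFault("vmodinstr", "not followed by a vector instruction")
        if not supports_prefix:
            raise PrefixFault("vector", "prefix not supported by this instruction")
        return consumed

m = PrefixModel()
m.vmodinstr(0x155)
print(hex(m.execute_next(is_vector=True)), m.prefix)
```

Because the prefix is cleared on every path, an implementation that never physically stores the bits in vtype would be indistinguishable from this model.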