intro to #421 Fractional vtype field vfill and #418 vlmt...

David Horner

Previous issues I opened on fractional LMUL were exploratory, suggesting various ways to encode and enable the feature.
The latest 4 issues opened on GitHub are specific proposals based on the strawman recommendation Krste made on March 24th.
These are complementary proposals which can be implemented individually or collectively.

#411 CLSTR and clstr: width specifiers for data cluster in each SLEN chunk. (when LMUL<=1/2)  

Defines the user-visible, machine-specific fractional LMUL characteristics.
No new functionality is proposed for the strawman model.
The definition does allow for a relaxing of the SEW size constraints Krste proposed for the strawman model.

#413 cluster/decluster instructions: with LMUL<1 loads/stores provide byte/half/word support. 
This proposal relies on the strawman model, explained using the CLSTR model, to reformat fractional LMUL data to that of LMUL=1.
Used in conjunction with load and store, it effects byte/half/word transfers to/from double.
(I noted this issue in an email response to the minutes posting, but apparently the message was garbled.)

#418 Introduce vlmt (vl multiplicative threshold) / VLMT Vector LiMiT 

This is a concrete proposal that augments the strawman model and the LMUL<4 structures to allow between 1 and 8 physical registers for a register group at each LMUL level. Its major benefits are 1) to allow register groups of 3, 5, 6 and 7, which can reduce register pressure and the resulting spills, and 2) to allow register groups for fractional data. The latter is particularly valuable within the strawman model, as register groups are required for fractional LMUL to allow the same vl as LMUL>1 structures.
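
A rough sketch of the register-pressure argument, not taken from the proposal itself: it compares how many register groups fit in the 32 architectural vector registers when group sizes are restricted to powers of two (1, 2, 4, 8) versus when any size from 1 to 8 is allowed. The function names are hypothetical, and the model deliberately ignores register-number alignment, the use of v0 as a mask, and the vlmt mechanism itself.

```python
NUM_VREGS = 32  # architectural vector registers v0..v31


def pow2_group(regs_needed: int) -> int:
    """Smallest power-of-two group size (1, 2, 4 or 8) that holds regs_needed."""
    if not 1 <= regs_needed <= 8:
        raise ValueError("group size must be between 1 and 8")
    return 1 << (regs_needed - 1).bit_length()


def exact_group(regs_needed: int) -> int:
    """Group size when any count from 1 to 8 is permitted (simplified model)."""
    if not 1 <= regs_needed <= 8:
        raise ValueError("group size must be between 1 and 8")
    return regs_needed


def max_live_groups(group_size: int) -> int:
    """How many groups of this size fit in the register file at once."""
    return NUM_VREGS // group_size


if __name__ == "__main__":
    # Compare both schemes for the sizes that are not powers of two.
    for needed in (3, 5, 6, 7):
        p2 = pow2_group(needed)
        ex = exact_group(needed)
        print(
            f"need {needed}: power-of-two group {p2} "
            f"({max_live_groups(p2)} live groups), "
            f"exact group {ex} ({max_live_groups(ex)} live groups)"
        )
```

Under these simplifying assumptions, data needing 5 registers occupies a group of 8 in the power-of-two scheme, so only 4 such groups can be live at once, against 6 groups of exactly 5.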

#421 Fractional vtype field vfill – Fractional Fill order and Fractional Instruction eLement Location 

This proposal enables more modes of operation over the strawman model.
It reduces to LMUL>=1 when vfill=00 and to the strawman model when vfill=01.

The additional modes (vfill=10 and vfill=11) allow for
     1) doubling the capacity of fractional registers (and register groups if #418 is also enacted)
     2) processing pairs of clustered elements in tandem
     3) sourcing the data for widening operations from a single physical register in the common case LMUL=1/2

I realize everyone is busy with various aspects of the standard and that not everyone subscribes to the riscv-v-spec repository on GitHub, so I wanted to send out this overview prior to tomorrow's meeting.