Thoughts on Git update (8a9fbce): "Added fractional LMUL, including modifying vector data register and vector mask register layouts for SLEN<VLEN implementations."
David Horner
First, some observations on the revised LMUL:
*1 The format for a given SLEN and SEW is the same for all LMUL>=1
*2 LMUL=n is equivalent to LMUL=2 * n with vl <= 1/2 VLMAX at that level, for n=1,2,4.
*3 Doubling SEW halves the number of elements in the same number of register bits, and vice versa.
The first provides the benefit that quad or higher widening with ESEW <= SLEN stays in data lanes
(resolving an ugly characteristic of quad widening).
Combined, these lead to the realization that vl is the determinant of the register group size.
If vsetvli were separately given the number of physical registers from which to calculate vl, LMUL>1 would be eliminated.
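To make points *2 and *3 concrete, here is a minimal Python sketch. It assumes the usual relation VLMAX = LMUL * VLEN / SEW and that elements 0..vl-1 fill the lowest-numbered registers of a group first, as point *2 implies. The names vlmax and regs_for_vl and the value VLEN=128 are illustrative only; they are not part of the spec or of vsetvli.

```python
from fractions import Fraction

def vlmax(vlen, sew, lmul):
    """VLMAX = LMUL * VLEN / SEW (lmul may be fractional, e.g. Fraction(1, 2))."""
    return int(Fraction(lmul) * vlen / sew)

def regs_for_vl(vl, vlen, sew):
    """Hypothetical helper: physical registers needed to hold vl elements of width sew."""
    bits = vl * sew
    return max(1, -(-bits // vlen))  # ceiling division, at least one register

VLEN = 128  # assumed example value

# Point *3: doubling SEW halves the element count for the same register bits.
assert vlmax(VLEN, 16, 1) == 2 * vlmax(VLEN, 32, 1)

# Point *2: VLMAX at LMUL=n is exactly half of VLMAX at LMUL=2n.
for n in (1, 2, 4):
    assert vlmax(VLEN, 32, n) * 2 == vlmax(VLEN, 32, 2 * n)

# vl alone determines how many registers of the group are touched.
print(regs_for_vl(4, VLEN, 32))   # 1
print(regs_for_vl(5, VLEN, 32))   # 2
print(regs_for_vl(16, VLEN, 32))  # 4
```

Under these assumptions the register count falls out of vl, SEW and VLEN alone, which is the sense in which a separate LMUL>1 setting becomes redundant.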
The format for LMUL=1/2
- does not align with LMUL=1, complicating mixed-width instructions.
- is wasteful of space.
- but it does reduce the active portion of registers, which could benefit renaming machines (if they rename at sufficiently low granularity).
Noting that point *2 could be extended to LMUL=1/2, and taking it in conjunction with point *3:
Widening operations to LMUL=1 can equivalently be sourced from LMUL=1 where
source is 1/2 SEW of widened result and
vl is length of widened result.
Rephrased relative to source SEW:
At LMUL=1, widening operations
take a source of SEW-wide elements and length vl,
and create widened result as LMUL=1 with 2*SEW and length of vl.
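The footprint arithmetic behind this can be sketched as follows. This is only an illustration under the assumption that both source and widened result stay within a single register of VLEN bits; widening_shape is a hypothetical helper, not an instruction or a spec function.

```python
def widening_shape(vlen, sew, vl):
    """Bits occupied by the source and by the 2*SEW result of a widening op."""
    wide_sew = 2 * sew
    if vl * wide_sew > vlen:
        raise ValueError("widened result would not fit in a single register")
    return {"src_bits": vl * sew, "dst_bits": vl * wide_sew, "dst_sew": wide_sew}

# Assumed example: VLEN=128, source SEW=16, vl=4.
shape = widening_shape(vlen=128, sew=16, vl=4)
print(shape)  # {'src_bits': 64, 'dst_bits': 128, 'dst_sew': 32}
```

With the same vl on both sides, the source occupies half the bits of the result, which is the same footprint a fractional (1/2) source would have.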
I recommend this uniformity apply through "fractional modes" that allocate 1/2, 1/4, etc. of the physical register bits.
A specific optimization, such as dynamic VLEN, can address the efficiency issue for renaming micro-architectures.
Instead, I recommend "fractional modes" that fill 1/2 of each SLEN before moving on to the next physical register, with one mode using the first half and the other mode the other half.
This is similar to what was proposed in #412, "Fractional vtype field vfill – Fractional Fill order and Fractional Instruction eLement Location".
As that proposal was designed to be added to the previous LMUL modes, I am now working through the details of such an encoding for a revised proposal.
In the interim, however, I thought these considerations might be helpful as is.