issue #393.3 - Towards a simple fractional LMUL design - third iteration


David Horner
 

I am sending out the partial description of the next iteration of the Simple Fractional LMUL design.

It is incomplete because I only recently clarified in my own mind a means to represent the concepts and a nomenclature for the various LMUL>=1 and fractional register groups.

It has taken me much longer than I hoped to construct the diagrams and clarify the wording (and there continue to be additional COVID-19 demands in my home life).

I believe this provides a basis to consider further clarifications and describe alternative approaches.

-------------------------------------------------------------------------- --------------------------------------------------------------------------


Proposal to introduce register groups, SLEN and fractional SLEN into the simple register fractional LMUL model.

What has not changed:
Fractional LMUL will

  • still be denoted as 1/2, 1/4, 1/8.
  • continue to halve the VLEN to match its denotation
  • still have 1/2 conceptually adjacent to LMUL=1

New, but fundamentally the same as for LMUL>1

Fractional groups (striped groups of fractional registers).
A striped group of fractional registers (a fractional group) parallels LMUL>1 registers, in that:

  • the number of fractional registers in the group is a power of 2
  • the group is aligned on a multiple of the group size
  • all fractional registers in the group are of the same bit length.
  • elements are filled from 0 to vl in a round robin, beginning at the lowest register number;
    filling proceeds to the next register after a striping number of bits has been filled.
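
The round-robin fill described in the last bullet can be sketched as follows. This is only an illustration under stated assumptions: the names `place_element`, `stripe_bits`, `base_reg` and `group_size` are hypothetical, the stripe width is assumed to be a multiple of SEW, and no check is made against the physical register length.

```python
def place_element(i, sew, stripe_bits, base_reg, group_size):
    """Return (register number, bit offset in that register) for element i.

    Assumption: stripe_bits is a multiple of sew (hypothetical parameters).
    """
    if stripe_bits % sew != 0:
        raise ValueError("stripe_bits must be a multiple of sew")
    per_stripe = stripe_bits // sew                   # elements per stripe
    stripe, slot = divmod(i, per_stripe)              # stripe index, slot in stripe
    pass_no, reg_index = divmod(stripe, group_size)   # round robin over the group
    return base_reg + reg_index, pass_no * stripe_bits + slot * sew


# Example: SEW=8, 32-bit stripes, a group of 2 registers starting at register 4.
for i in range(10):
    print(i, place_element(i, sew=8, stripe_bits=32, base_reg=4, group_size=2))
```

With these example values, elements 0 to 3 land in register 4, elements 4 to 7 in register 5, and element 8 returns to register 4 at bit offset 32.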

The rest of this proposal talks about what has changed (even if only subtly in some cases).

Some convenient definitions:

Define fractional groups as above (to be abbreviated to n:m groups later).

Define “SEW-instructions” as those in which vs1, vs2 and vd all match SEW from vtype.
To clarify, they are not:

  • widening or narrowing
  • whole register moves
  • mask register only

Introduce register group characterization:
This proposal allows fractional groups to originate at multiple levels, with their width determined by that level.
For example, fractional groups with a physical width of VLEN/8 originate at LMUL=1/8.
A shorthand to identify such groups will make the narrative much more readable.

Consider LMUL>=1 register groups.
They all start in LMUL=1 via a widening operation.
So 1 should be in their designation even though it is superfluous without fractional LMUL.

Consider an n:m format, where VLEN/n is the length of each vector register and m is the number of base-arch registers in the group.
Then we designate

  • LMUL=2 addressable registers as 1:2
  • LMUL=4 addressable registers as 1:4
  • LMUL=8 addressable registers as 1:8 and
  • LMUL=1 addressable registers are 1:1 (for completeness)
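
As a minimal sketch of the n:m notation, the following computes the per-register width and the total group width for a designation. The class name `GroupType`, its method names, and the VLEN value are hypothetical and chosen only for illustration; the alignment check reflects the rule that a group is aligned on a multiple of the group size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupType:
    n: int  # each register in the group is VLEN/n bits wide
    m: int  # number of base-arch registers in the group

    def __str__(self):
        return f"{self.n}:{self.m}"

    def register_bits(self, vlen):
        """Width in bits of one register of this group type."""
        return vlen // self.n

    def group_bits(self, vlen):
        """Total width in bits of the whole group."""
        return self.m * self.register_bits(vlen)

    def is_valid_base(self, reg):
        """Groups are aligned on a multiple of the group size."""
        return reg % self.m == 0


VLEN = 256  # example value only
for g in (GroupType(1, 1), GroupType(1, 2), GroupType(1, 4), GroupType(1, 8)):
    print(g, g.register_bits(VLEN), g.group_bits(VLEN))
```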

In the previously presented simple mappings of fractional LMUL, there was a presumption that a widening operation sourcing LMUL=1/n registers widens to LMUL=2 * (1/n) registers.

This would be represented by a table such as this:

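LMUL       | 1/8   | 1/4   | 1/2   | 1     | 2             | 4              | 8
-----------+-------+-------+-------+-------+---------------+----------------+------------
group type |       |       |       |       |               |                |
1:8        |       |       |       |       |               | x              | a=0,8,16,24
1:4        |       |       |       |       | x             | a=0,4,8,12,... |
1:2        |       |       |       | x     | a=0,2,4,6,... |                |
1:1        |       |       | x     | a=all |               |                |
2:1        |       | x     | a=all |       |               |                |
4:1        | x     | a=all |       |       |               |                |
8:1        | a=all |       |       |       |               |                |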

a = Accessible at LMUL level by SEW instructions
x = Created by widening instructions at LMUL level
(Narrowing instructions also source from this LMUL)

Note: 16:1 is intentionally omitted from the diagram although it works the same.
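
The simple mapping in this first diagram can be sketched as below: SEW instructions at LMUL>=1 address 1:LMUL groups, SEW instructions at LMUL=1/n address n:1 registers, and a widening instruction at a given LMUL creates the group type that is accessible at 2*LMUL. The function names are hypothetical, and the sketch does no range checking (it does not reject LMUL values outside the diagram).

```python
from fractions import Fraction


def accessible_group(lmul):
    """(n, m) group type addressed by SEW instructions at this LMUL (simple model)."""
    lmul = Fraction(lmul)
    if lmul >= 1:
        return (1, int(lmul))       # LMUL=4 -> 1:4
    return (lmul.denominator, 1)    # LMUL=1/4 -> 4:1


def widening_destination(lmul):
    """Group type created by a widening instruction issued at this LMUL."""
    return accessible_group(2 * Fraction(lmul))


for lmul in (Fraction(1, 8), Fraction(1, 4), Fraction(1, 2), 1, 2, 4):
    print(lmul, accessible_group(lmul), widening_destination(lmul))
```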

This proposal acknowledges that such a simplistic approach can be inefficient for many reasonable implementations.
It also acknowledges that some mandatory RVV instructions are comparably inefficient: vgather, slideup/down and others similarly have to operate across lanes.
And further, that striped register support is already present in the base design.

So this proposal introduces fractional groups, beginning with this table:

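LMUL       | 1/16  | 1/8           | 1/4            | 1/2         | 1     | 2             | 4              | 8
-----------+-------+---------------+----------------+-------------+-------+---------------+----------------+------------
group type |       |               |                |             |       |               |                |
1:8        |       |               |                |             |       |               | x              | a=0,8,16,24
1:4        |       |               |                |             |       | x             | a=0,4,8,12,... |
1:2        |       |               |                |             | x     | a=0,2,4,6,... |                |
1:1        |       |               |                |             | a=all |               |                |
16:8       |       |               | x              | a=0,8,16,24 |       |               |                |
16:4       |       | x             | a=0,4,8,12,... |             |       |               |                |
16:2       | x     | a=0,2,4,6,... |                |             |       |               |                |
16:1       | a=all |               |                |             |       |               |                |
8:1        |       | a=odd         |                |             |       |               |                |
4:1        |       |               | a=odd          |             |       |               |                |
2:1        |       |               |                | a=odd       |       |               |                |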

This is the same legend as above and will be assumed for all further diagrams:
a = Accessible at LMUL level by SEW instructions
x = Created by widening instructions at LMUL level
(Narrowing instructions also source from this LMUL)

Note: 8:1, 4:1 and 2:1 were added to the table though technically not required to illustrate fractional groups. More below.

This has two undesirable features, both of which present trade-offs.

LMUL now determines both the level's fractional size and the fractional group's size

  • Registers used for fractional groups are not available for fractional registers (halved at the first level)
  • there was no need to provide addressing to other registers in LMUL>=1 as all registers were the same physical length.

The smallest fractional register size is used as the base for LMUL grouping

  • this is necessary to achieve 8 levels of grouping
  • however, the usefulness of the smallest vector register is generally limited to small element size.

Although it is possible to provide an even wider LMUL or additional fields in vtype to facilitate more states to address these concerns, the approach here will be to enlist the register numbers to provide context information.

First note that at any level the register numbers used by register groups are specific. In LMUL>=2 the only operands available to any operation (including widening and narrowing) were register groups. Widening to 1:8 can only be performed with 1:4 inputs. The converse holds for narrowing. Widening to 16:8 must use 16:4 inputs to parallel that behaviour. Taking both these observations together, the comparable behaviour constraint can be incorporated into the instruction decoding using register addresses.

This allows widening to originate at other levels concurrently, as diagramed here:

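LMUL       | 1/16  | 1/8           | 1/4             | 1/2              | 1     | 2             | ...
-----------+-------+---------------+-----------------+------------------+-------+---------------+-----
group type |       |               |                 |                  |       |               |
1:2        |       |               |                 |                  | x     | a=0,2,4,6,... |
1:1        |       |               |                 |                  | a=all |               |
16:8       |       |               | x               | a=0,8,16,24      |       |               |
16:4       |       | x             | a=0,4,8,12,...  |                  |       |               |
16:2       | x     | a=0,2,4,6,... |                 |                  |       |               |
16:1       | a=all |               |                 |                  |       |               |
8:8        |       |               |                 | x                |       |               |
8:4        |       |               | x               | a=4,12,20,28,... |       |               |
8:2        |       | x             | a=2,6,10,14,... |                  |       |               |
8:1        |       | a=odd         |                 |                  |       |               |
4:4        |       |               |                 | x                |       |               |
4:2        |       |               | x               | a=2,6,10,14,...  |       |               |
4:1        |       |               | a=odd           |                  |       |               |
2:2        |       |               |                 | x                |       |               |
2:1        |       |               |                 | a=odd            |       |               |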

Note: I dropped LMUL=4 and 8 from the illustration only.
Note: 16:8 is addressable (from LMUL=1/2), but 8:8, 4:4 and 2:2 are not addressable from LMUL=1.
They are, however, addressable from widening and narrowing instructions from LMUL=1/2.
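
The idea of letting the register number supply the context can be illustrated with a small decoder for the "a" entries in the fractional LMUL columns of the last diagram. This is my inferred reading of the diagram, not a rule stated in the proposal: at LMUL=1/d, the largest power-of-two alignment of the register number (capped at 16/d) gives m, and n = m*d, so every accessible group at that level spans LMUL*VLEN bits in total. The name `decode_group` and the constant `MAX_N` are hypothetical.

```python
from fractions import Fraction

MAX_N = 16  # assumption: the smallest fractional register in the diagram is VLEN/16


def decode_group(lmul, reg):
    """Return (n, m) for the group that register `reg` names at a fractional LMUL."""
    lmul = Fraction(lmul)
    if lmul.numerator != 1 or lmul.denominator not in (2, 4, 8, 16):
        raise ValueError("sketch covers LMUL = 1/2, 1/4, 1/8 and 1/16 only")
    if not 0 <= reg < 32:
        raise ValueError("register number must be in 0..31")
    max_m = MAX_N // lmul.denominator    # largest group at this level
    m = 1
    while m < max_m and reg % (2 * m) == 0:
        m *= 2                           # register is aligned for a larger group
    n = m * lmul.denominator             # keeps m/n equal to LMUL
    return n, m


for reg in (0, 4, 6, 3):
    print(reg, decode_group(Fraction(1, 2), reg))
```

At LMUL=1/2 this yields 16:8 for register 0, 8:4 for register 4, 4:2 for register 6 and 2:1 for register 3. The sketch deliberately leaves out the "x" (widening and narrowing) entries.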

To be continued ......