I am sending out the partial description of the next iteration
for the Simple Fractional LMUL design.
It is incomplete because I only recently clarified in my own mind
a means to represent the concepts and a nomenclature for the
various LMUL>=1 and fractional register groups.
It has taken me much longer than I hoped to construct the
diagrams and clarify wording (and there continue to be
additional COVID-19 demands in my home life).
I believe this provides a basis to consider further
clarifications and describe alternative approaches.
--------------------------------------------------------------------------
Proposal to introduce register groups, SLEN and
fractional SLEN into the simple register fractional LMUL model.
What has not changed:
Fractional LMUL will
- still be denoted as 1/2, 1/4, 1/8.
- continue to halve the VLEN to match its denotation.
- still place LMUL=1/2 conceptually adjacent to LMUL=1.
New, but fundamentally the same as for LMUL>1
Fractional groups (striped groups of fractional registers).
A striped group of fractional registers (a fractional group)
parallels LMUL>1 registers, in that:
- the number of fractional registers in the group is a power of
2
- the group is aligned on a multiple of the group size
- all fractional registers in the group are of the same bit
length.
- elements are filled from 0 to vl in a round robin beginning at
the lowest register number.
  - filling proceeds to the next register once a striping
  number of bits has been filled.
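The group rules above can be sketched in Python. This is only an illustration under stated assumptions, not part of the proposal: the function names, the `stripe_bits` parameter (standing in for the "striping number of bits", which the text does not fix) and the slot numbering within a register are all hypothetical.

```python
def check_group(base, group_size):
    """Check the two structural rules: power-of-2 size and aligned base."""
    if group_size < 1 or group_size & (group_size - 1):
        raise ValueError("group size must be a power of 2")
    if base % group_size:
        raise ValueError("base register must be a multiple of the group size")


def element_location(i, base, group_size, stripe_bits, sew):
    """Return (register number, element slot) for element index i.

    Assumes stripe_bits is a multiple of sew (hypothetical parameters).
    """
    check_group(base, group_size)
    per_stripe = stripe_bits // sew          # elements that fit in one stripe
    chunk, offset = divmod(i, per_stripe)    # which stripe, and where in it
    reg = base + chunk % group_size          # round robin over the registers
    slot = (chunk // group_size) * per_stripe + offset
    return reg, slot
```

For a group of 2 registers starting at register 4, with an assumed 32-bit stripe and SEW=8, elements 0 to 3 land in register 4, elements 4 to 7 in register 5, and element 8 wraps back to register 4 at slot 4.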
The rest of this proposal talks about what has changed
(even if some subtly).
Some convenient definitions:
Define fractional groups as above (to be abbreviated to
n:m groups later)
Define “SEW-instructions” as those in which vs1,
vs2 and vd all match SEW from vtype.
To clarify, they are not:
- widening or narrowing
- whole register moves
- mask register only
Introduce register group characterization:
This proposal allows fractional groups to originate at multiple
levels, with their width determined by that level.
For example, fractional groups with a physical width of VLEN/8
originate at LMUL=1/8.
A shorthand to identify such groups will make the narrative much
more readable.
Consider LMUL>=1 register groups.
They all start in LMUL=1 via a widening operation.
So 1 should be in their designation even though it is superfluous
without fractional LMUL.
Consider an n:m format, where VLEN/n is the vector length and m is
the number of base-arch registers in the group.
Then we designate
- LMUL=2 addressable registers as 1:2
- LMUL=4 addressable registers as 1:4
- LMUL=8 addressable registers as 1:8 and
- LMUL=1 addressable registers as 1:1 (for completeness)
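As a small sketch of the n:m designation, the Python below computes the per-register and whole-group bit widths from a designation string. The function names and the example VLEN of 256 are assumptions made for illustration only.

```python
def parse_group(group):
    """Split an 'n:m' designation into integers (n, m)."""
    n, m = (int(part) for part in group.split(":"))
    return n, m


def register_bits(group, vlen):
    """Bits in one register of an n:m group, i.e. VLEN/n."""
    n, _ = parse_group(group)
    return vlen // n


def group_bits(group, vlen):
    """Total bits across all m registers of an n:m group."""
    n, m = parse_group(group)
    return (vlen // n) * m


def designation_for_lmul(lmul):
    """Designation for the LMUL>=1 cases listed above."""
    if lmul not in (1, 2, 4, 8):
        raise ValueError("only LMUL = 1, 2, 4, 8 are covered here")
    return f"1:{lmul}"
```

With an assumed VLEN of 256, `group_bits("1:4", 256)` is 1024, while a 16:8 group has 16-bit registers and 128 bits in total.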
In the previously presented simple mappings of fractional LMUL, there
was a presumptive understanding that widening operations sourced
from LMUL=1/n registers widen to LMUL=2 * (1/n) registers.
This would be represented by a table such as this:
LMUL       | 1/8   | 1/4   | 1/2   | 1     | 2              | 4              | 8           |
---------- | ----- | ----- | ----- | ----- | -------------- | -------------- | ----------- |
group type |       |       |       |       |                |                |             |
1:8        |       |       |       |       |                | x              | a=0,8,16,24 |
1:4        |       |       |       |       | x              | a=0,4,8,12 ... |             |
1:2        |       |       |       | x     | a=0,2,4,6, ... |                |             |
1:1        |       |       | x     | a=all |                |                |             |
2:1        |       | x     | a=all |       |                |                |             |
4:1        | x     | a=all |       |       |                |                |             |
8:1        | a=all |       |       |       |                |                |             |
a = Accessible at LMUL level by SEW instructions
x = Created by widening instructions at LMUL level
(Narrowing instructions also source from this LMUL)
Note: 16:1 is intentionally omitted from the diagram although it
works the same.
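The simple mapping in the table above can be reproduced with a few lines of Python. This is a sketch only: the function names are hypothetical, and the register count of 32 is the usual RVV vector register file size rather than something stated in this note.

```python
from fractions import Fraction


def group_type(lmul):
    """Group type accessible by SEW-instructions at this LMUL (the 'a' cells)."""
    lmul = Fraction(lmul)
    if lmul >= 1:
        return f"1:{lmul.numerator}"
    return f"{lmul.denominator}:1"


def accessible_registers(lmul, num_regs=32):
    """Register numbers usable at this LMUL: every LMUL-th one, or all."""
    step = max(1, int(Fraction(lmul)))   # fractional LMUL truncates to 0 -> step 1
    return list(range(0, num_regs, step))


def widened_group_type(lmul):
    """Group type created by a widening instruction at this LMUL (the 'x' cells)."""
    target = 2 * Fraction(lmul)
    if target > 8:
        raise ValueError("no group type above LMUL=8")
    return group_type(target)
```

For example, `widened_group_type("1/8")` gives `"4:1"` and `accessible_registers(8)` gives `[0, 8, 16, 24]`, matching the 4:1 and 1:8 rows.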
This proposal acknowledges that such a simplistic approach can be
inefficient for many reasonable implementations.
It also acknowledges that some mandatory RVV instructions are
comparably inefficient: vgather, slideup/down, and others
similarly have to operate across lanes.
It further acknowledges that striped register support is already
present in the base design.
So this proposal introduces fractional groups, beginning
with this table:
LMUL       | 1/16   | 1/8            | 1/4            | 1/2         | 1     | 2              | 4              | 8           |
---------- | ------ | -------------- | -------------- | ----------- | ----- | -------------- | -------------- | ----------- |
group type |        |                |                |             |       |                |                |             |
1:8        |        |                |                |             |       |                | x              | a=0,8,16,24 |
1:4        |        |                |                |             |       | x              | a=0,4,8,12 ... |             |
1:2        |        |                |                |             | x     | a=0,2,4,6, ... |                |             |
1:1        |        |                |                |             | a=all |                |                |             |
16:8       |        |                | x              | a=0,8,16,24 |       |                |                |             |
16:4       |        | x              | a=0,4,8,12 ... |             |       |                |                |             |
16:2       | x      | a=0,2,4,6, ... |                |             |       |                |                |             |
16:1       | a=all  |                |                |             |       |                |                |             |
8:1        |        | a=odd          |                |             |       |                |                |             |
4:1        |        |                | a=odd          |             |       |                |                |             |
2:1        |        |                |                | a=odd       |       |                |                |             |
LMUL       | **1/16 | 1/8            | 1/4            | 1/2         | 1     | 2              | 4              | 8           |
This is the same legend as above and will be assumed for all
further diagrams:
a = Accessible at LMUL level by SEW instructions
x = Created by widening instructions at LMUL level
(Narrowing instructions also source from this LMUL)
Note: 8:1, 4:1 and 2:1 were added to the table though
technically not required to illustrate fractional groups. More
below.
This has two undesirable features, both of which present
trade-offs:
LMUL now determines both the level's fractional register size and
the fractional group's size.
- Registers used for fractional groups are not available for
fractional registers (halved at the first level).
- There was no need to provide addressing to other registers in
LMUL>=1, as all registers were the same physical length.
The smallest fractional register size is used as the base
for LMUL grouping.
- This is necessary to achieve 8 levels of grouping.
- However, the usefulness of the smallest vector register is
generally limited to small element sizes.
Although it is possible to provide an even wider LMUL or
additional fields in vtype to facilitate more states to address
these concerns, the approach here will be to enlist the register
numbers to provide context information.
First note that at any level the register numbers used by register
groups are specific. In LMUL>=2 the only operands available to
any operation (including widening and narrowing) were register
groups. Widening to 1:8 can only be performed with 1:4 inputs;
the converse holds for narrowing. Widening to 16:8 must use 16:4
inputs to parallel that behaviour. Taking both these observations
together, the comparable behaviour constraint can be incorporated
into the instruction decoding using register addresses.
This allows widening to originate at other levels
concurrently, as diagrammed here:
LMUL       | 1/16   | 1/8            | 1/4              | 1/2               | 1     | 2 ...          |
---------- | ------ | -------------- | ---------------- | ----------------- | ----- | -------------- |
group type |        |                |                  |                   |       |                |
1:2        |        |                |                  |                   | x     | a=0,2,4,6, ... |
1:1        |        |                |                  |                   | a=all |                |
16:8       |        |                | x                | a=0,8,16,24       |       |                |
16:4       |        | x              | a=0,4,8,12 ...   |                   |       |                |
16:2       | x      | a=0,2,4,6, ... |                  |                   |       |                |
16:1       | a=all  |                |                  |                   |       |                |
8:8        |        |                |                  | x                 |       |                |
8:4        |        |                | x                | a=4,12,20,28, ... |       |                |
8:2        |        | x              | a=2,6,10,14, ... |                   |       |                |
8:1        |        | a=odd          |                  |                   |       |                |
4:4        |        |                |                  | x                 |       |                |
4:2        |        |                | x                | a=2,6,10,14, ...  |       |                |
4:1        |        |                | a=odd            |                   |       |                |
2:2        |        |                |                  | x                 |       |                |
2:1        |        |                |                  | a=odd             |       |                |
LMUL       | **1/16 | 1/8            | 1/4              | 1/2               | 1     | 2              |
Note: LMUL=4 and 8 are dropped from the illustration only.
Note: 16:8 is addressable (from LMUL=1/2), but 8:8, 4:4 and 2:2
are not addressable from LMUL=1.
They are however addressable from widening and narrowing
instructions from LMUL=1/2.
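One way to read the "a=" entries in the last table is that, at a fractional LMUL level, the register number alone selects the group type seen by SEW-instructions. The Python below is a sketch of that reading. The rule is inferred from the table rather than stated in the text, and the function name and the register count of 32 are assumptions.

```python
from fractions import Fraction

# Fractional levels covered by the table: 1/2, 1/4, 1/8 and 1/16.
FRACTIONAL_LMULS = {Fraction(1, d) for d in (2, 4, 8, 16)}


def decode_group(reg, lmul):
    """Return (n, m) for the n:m group addressed by register reg at this LMUL."""
    lmul = Fraction(lmul)
    if lmul not in FRACTIONAL_LMULS:
        raise ValueError("sketch covers LMUL = 1/2, 1/4, 1/8, 1/16 only")
    if not 0 <= reg < 32:
        raise ValueError("register number out of range")
    max_m = int(16 * lmul)              # largest group at this level: 8, 4, 2 or 1
    low_bit = reg & -reg                # lowest set bit; 0 when reg == 0
    m = min(low_bit, max_m) if reg else max_m
    n = m * lmul.denominator            # each register is VLEN/n bits wide
    return n, m
```

At LMUL=1/2 this gives 16:8 for register 8, 8:4 for register 12, 4:2 for register 6 and 2:1 for any odd register. At LMUL=1/16 every register decodes as 16:1.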
To be continued ......