issue #393 - Towards a simple fractional LMUL design.
I'm sending a copy of the revised issue #393 to the correct
mailing list.
This was requested at the last TG meeting.
I believe it is consistent with casual discussions of fractional LMUL, and it is intended to formalize a design. A consideration of alternate register overlap to improve usability will follow.
The issue #393 update adds to the Glossary and notes that mask registers and operations are unchanged from the plan of record.
Prior to LMUL, an elaborate mapping of register numbers to elements of various widths under different configuration settings, which allowed for polymorphic operations, was proposed.
LMUL was introduced in a pre-v0.5 draft (Nov 2018) in conjunction with
widening operations and SEW widths.
This issue will look at the simplest implementations of fractional LMUL.
base-arch registers* – the 32 registers addressable when LMUL=1
The simplest extensions to the base retain the
The simplest extension of LMUL to “fractional” is that
the observed effects continue predictably.
For LMUL >=1, VLMAX = LMUL * VLEN/SEW
This table exhaustively shows the effect of this simplest extension when SEW is unchanged throughout:
Fractional registers then have diminished capacity, from 1/2 down to 1/8 of a base-arch register.
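As a rough illustration of "the observed effects continue predictably", the sketch below applies the LMUL >= 1 formula unchanged to fractional LMUL. The function name and the values VLEN=128 and SEW=8 are my own choices for the example and are not taken from the issue.

```python
from fractions import Fraction

def vlmax(vlen: int, sew: int, lmul: Fraction) -> Fraction:
    """VLMAX = LMUL * VLEN / SEW, applied unchanged to fractional LMUL."""
    return lmul * vlen / sew

# VLEN=128 and SEW=8 are illustrative assumptions, not values from the issue.
VLEN, SEW = 128, 8
LMULS = [Fraction(1, 8), Fraction(1, 4), Fraction(1, 2),
         Fraction(1), Fraction(2), Fraction(4), Fraction(8)]

for lmul in LMULS:
    # SEW is held constant, so VLMAX scales linearly with LMUL.
    print(f"LMUL={str(lmul):>3}  VLMAX={vlmax(VLEN, SEW, lmul)}")
```

With SEW fixed, each halving of LMUL halves VLMAX, which is the diminished capacity described above.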
The simplest mapping of fractional LMUL registers is
one-to-one: each maps to one (and only one) of the base-arch registers.
The simplest overlay (analogous to the register group
overlay of consecutive base-arch registers) is with zero
I call this iteration zero of the simplest fractional LMUL designs.
Note: Mask behaviour does not change. Mask operations read and write to a base-arch register. Base-arch register zero remains the default mask register. With this "iteration zero" design, as with LMUL>=1, fractional LMUL “register zero”s are substantially limited in their use.
There are some undesirable characteristics of this design.
My apologies, especially to those who have sent some feedback.
I had thought I had already sent this second iteration (it has been on the GitHub issue since Monday).
A slightly less simple design to partially address the destructive nature of register overlay.
Because the low (zero) elements are aligned in the overlay, the sub-group lies in the active portion of the base-arch register, so the destructive impact is unavoidable. Similarly, an operation that writes to the base-arch register overwrites at least some of the register sub-group.
However, if instead the VLMAX elements of the base-arch register and the register sub-group are aligned, then judicious use of vl can avoid mutually assured destruction. Register names would remain in one-to-one correspondence. However, the register sub-groups would start at 1/2 VLEN, 3/4 VLEN and 7/8 VLEN, depending upon the fractional LMUL.
Consider when LMUL=1 and tail-undisturbed is active and
VLEN is a power of 2.
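The start offsets quoted above follow from aligning the top of the sub-group with the top of the base-arch register. A minimal sketch, in which the function name and VLEN=256 are hypothetical choices for illustration:

```python
from fractions import Fraction

def subgroup_start_bit(vlen: int, lmul: Fraction) -> int:
    """Bit offset where a fractional sub-group begins when its last
    element is aligned with the last element of the base-arch register."""
    if not 0 < lmul <= 1:
        raise ValueError("this sketch only covers 0 < LMUL <= 1")
    start = (1 - lmul) * vlen  # the sub-group occupies the top LMUL * VLEN bits
    if start.denominator != 1:
        raise ValueError("VLEN is not divisible for this LMUL")
    return int(start)

VLEN = 256  # illustrative assumption
for lmul in (Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)):
    bit = subgroup_start_bit(VLEN, lmul)
    print(f"LMUL={lmul}: starts at bit {bit} ({Fraction(bit, VLEN)} of VLEN)")
```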
In the perfect scenario registers will all be used to their maximum with fractional LMUL support.
With appropriate values of SLEN, LMUL>1 can also use the reduced vl to allow consecutive fractional register sub-groups to co-exist.
Nor is the technique restricted to LMUL >= 1. LMUL=1/2 can tail-protect 1/4 and 1/8; and LMUL=1/4 can tail-protect 1/8.
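To make the tail-protection idea concrete, here is a sketch of the largest vl an operation could use without touching a smaller sub-group that shares the same register name. It assumes tail-undisturbed, top-aligned sub-groups as in this second iteration, and SLEN = VLEN so that elements are laid out contiguously. The function name and the values VLEN=256 and SEW=32 are mine, not part of the proposal.

```python
from fractions import Fraction

def safe_vl(vlen: int, sew: int, outer_lmul: Fraction, inner_lmul: Fraction) -> int:
    """Largest vl for an operation at outer_lmul that, under
    tail-undisturbed, leaves the top-aligned inner_lmul sub-group intact.

    Assumes SLEN == VLEN (contiguous element layout); sew is the element
    width of the outer operation.
    """
    if not 0 < inner_lmul < outer_lmul <= 1:
        raise ValueError("requires 0 < inner_lmul < outer_lmul <= 1")
    # The outer operand starts at (1 - outer_lmul) * VLEN and the inner one
    # at (1 - inner_lmul) * VLEN; the gap between them is safe to write.
    gap_bits = (outer_lmul - inner_lmul) * vlen
    return int(gap_bits // sew)

VLEN, SEW = 256, 32  # illustrative assumptions
print(safe_vl(VLEN, SEW, Fraction(1), Fraction(1, 2)))     # LMUL=1 protecting 1/2
print(safe_vl(VLEN, SEW, Fraction(1, 2), Fraction(1, 4)))  # LMUL=1/2 protecting 1/4
print(safe_vl(VLEN, SEW, Fraction(1, 4), Fraction(1, 8)))  # LMUL=1/4 protecting 1/8
```

The cost is visible in the numbers: the closer the two LMUL values, the fewer elements the larger operand can use.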
However, this is still not fully ideal.