issue #393 - Towards a simple fractional LMUL design.
David Horner
I'm sending a copy of the revised issue #393 to the correct mailing list
(link: https://github.com/riscv/riscv-v-spec/issues/393 ).
This was requested at the last TG meeting. I believe it is consistent with casual discussions of fractional LMUL, and it is intended to formalize a design. A consideration of alternate register overlap to improve usability will follow. The issue #393 update adds to the Glossary and notes that mask registers and operations are unchanged from the plan of record.
Towards a simple fractional LMUL design.
Background: Prior to LMUL, an elaborate mapping of register numbers to elements of various widths under different configuration settings was proposed, which allowed for polymorphic operations. LMUL was introduced in a pre-v0.5 draft (Nov 2018) in conjunction with widening operations and SEW widths.
This issue looks at the simplest implementations of fractional LMUL.
Glossary: base-arch registers: the 32 registers addressable when LMUL=1.
Guidance: The simplest extensions to the base retain its fundamental characteristics. The simplest extension of LMUL to “fractional” is one in which the observed effects continue predictably.
For LMUL >= 1, VLMAX = LMUL * VLEN/SEW. This table exhaustively represents the effect of this simplest extension when SEW is unchanged throughout:
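As a minimal sketch of the formula above, the following Python applies VLMAX = LMUL * VLEN/SEW unchanged to the fractional settings. The function name `vlmax` and the example values VLEN=128 and SEW=8 are assumptions chosen for illustration only; they are not taken from the issue.

```python
from fractions import Fraction

# LMUL settings: the integer register groups plus the fractional values
# discussed in this issue (1/2, 1/4, 1/8).
LMULS = [Fraction(8), Fraction(4), Fraction(2), Fraction(1),
         Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)]


def vlmax(vlen: int, sew: int, lmul: Fraction) -> int:
    """VLMAX = LMUL * VLEN / SEW, applied unchanged to fractional LMUL."""
    return int(lmul * vlen // sew)


if __name__ == "__main__":
    # Hypothetical configuration: VLEN=128 bits, SEW=8 bits, SEW held fixed.
    for lmul in LMULS:
        print(f"LMUL={str(lmul):>3}  VLMAX={vlmax(128, 8, lmul)}")
```

With SEW held fixed, each halving of LMUL halves VLMAX, which is the predictable continuation the guidance asks for.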
Fractional registers then have diminished capacity, 1/2 to 1/8 of a base-arch register. The simplest mapping of fractional LMUL registers is each to one (and only one) of the base-arch registers. The simplest overlay (analogous to the register group overlay of consecutive base-arch registers) is with the zero elements overlaying. I call this iteration zero of the simplest fractional LMUL designs.
Note: Mask behaviour does not change. Mask operations read and write a base-arch register. Base-arch register zero remains the default mask register. With this "iteration zero" design, as with LMUL>=1, fractional LMUL “register zero”s are substantially limited in their use.
There are some undesirable characteristics of this design.
David Horner
My apologies, especially to those who have sent some feedback. I had thought I had already sent this second iteration. (It has been on the GitHub issue since Monday.)
A slightly less simple design to partially address the destructive nature of register overlay.
Because the low (zero) elements align in the overlay, the sub-group lies in the active portion of the base-arch register, so the destructive impact is unavoidable. Similarly, an operation that writes to the base-arch register overwrites at least some of the register sub-group.
However, if instead the VLMAX elements of the base-arch register and the register sub-group are aligned, then judicious use of vl can avoid mutually assured destruction. Register names would remain in the one-to-one correlation. However, the register sub-groups would start at 1/2 VLEN, 3/4 VLEN and 7/8 VLEN, depending upon the fractional LMUL.
Consider when LMUL=1, tail-undisturbed is active, and
VLEN is a power of 2.
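A rough sketch of that case, in Python: it computes where each top-aligned sub-group would start and the largest vl an LMUL=1 operation could use without writing into it. The helper names `subgroup_start_bit` and `max_safe_vl` are hypothetical, and the sketch assumes element i of the base-arch register occupies bits i*SEW through (i+1)*SEW - 1 (that is, it ignores any SLEN striping) and that tail elements really are left undisturbed.

```python
def subgroup_start_bit(vlen: int, denom: int) -> int:
    """First bit of an LMUL=1/denom sub-group whose top is aligned with the
    top of the base-arch register (1/2, 3/4 or 7/8 of VLEN)."""
    return vlen - vlen // denom


def max_safe_vl(vlen: int, sew: int, denom: int) -> int:
    """Largest vl for an LMUL=1 operation at the given SEW whose written
    elements all end before the LMUL=1/denom sub-group begins.

    Assumes a flat element layout (no SLEN striping) and tail-undisturbed.
    """
    return subgroup_start_bit(vlen, denom) // sew


if __name__ == "__main__":
    vlen, sew = 128, 32  # hypothetical example values
    for denom in (2, 4, 8):
        print(f"LMUL=1/{denom}: sub-group starts at bit "
              f"{subgroup_start_bit(vlen, denom)}, "
              f"safe vl at LMUL=1 is {max_safe_vl(vlen, sew, denom)}")
```

When SEW is wider than the sub-group itself, the floor division means the last partially overlapping element is given up as well, so the safe vl can be the same for neighbouring fractions.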
In the perfect scenario registers will all be used to their maximum with fractional LMUL support.
With appropriate values of SLEN, LMUL>1 can also use the reduced vl to allow consecutive fractional register sub-groups to co-exist. Nor is the technique restricted to LMUL >= 1: LMUL=1/2 can tail-protect 1/4 and 1/8, and LMUL=1/4 can tail-protect 1/8.
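The nested case can be sketched the same way. The Python below is illustrative only: `protecting_vl` is a hypothetical helper, a denominator of 1 stands for LMUL=1, and it assumes that element i of an LMUL=1/outer sub-group sits i*SEW bits above that sub-group's start, with no SLEN striping. The LMUL>1 case, which depends on SLEN, is not modelled.

```python
def start_bit(vlen: int, denom: int) -> int:
    """Start of a top-aligned LMUL=1/denom sub-group (denom=1 is LMUL=1)."""
    return vlen - vlen // denom


def protecting_vl(vlen: int, sew: int, outer_denom: int, inner_denom: int) -> int:
    """Largest vl for an LMUL=1/outer_denom operation that leaves the smaller
    LMUL=1/inner_denom sub-group in the same base-arch register untouched.

    Assumes a flat element layout and tail-undisturbed behaviour.
    """
    if inner_denom <= outer_denom:
        raise ValueError("the protected sub-group must be the smaller fraction")
    # Bits available to the outer operation before the inner sub-group begins.
    span = start_bit(vlen, inner_denom) - start_bit(vlen, outer_denom)
    return span // sew


if __name__ == "__main__":
    vlen, sew = 256, 16  # hypothetical example values
    for outer, inner in ((1, 2), (2, 4), (2, 8), (4, 8)):
        print(f"LMUL=1/{outer} protecting LMUL=1/{inner}: "
              f"vl <= {protecting_vl(vlen, sew, outer, inner)}")
```

Protecting the 1/4 sub-group also protects the 1/8 sub-group, since the latter lies entirely inside the former under this alignment.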
However, this is still not fully ideal.