Re: On Vector Register Layout

David Horner

On 2020-06-12 7:05 a.m., Krste Asanovic wrote:
The interesting cases are mixed-width operations, which are prevalent
in low-precision multiply-accumulate kernels that dominate many
existing and emerging compute areas, but there are plenty of other
kernels that operate on mixed-width data items. Classic SIMD ISAs
handle mixed-width operations in one of five ways (would be glad to
add other known options to this list):

I will take a stab at an even and odd layout for widening.

5) Even/odd widening.
The registers are divided into even:odd pairs.
Two versions of the widening ops are defined, one for even and one for odd.
The full widened result is the result of the operation performed on the even (or odd) halves of the pairs.
The downsides of this approach are:
  a) the need for two instructions.
  b) only 1/2 of the input register bandwidth is used.
The widening operation is in lane.
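
To make option 5 concrete, here is a minimal Python sketch of my reading of it. It assumes a widening multiply as the operation and models a register as a list of SEW-wide elements; the function names widen_mul_even and widen_mul_odd are hypothetical and do not come from any spec.

```python
def widen_mul_even(vs1, vs2):
    """Widening multiply on the even-indexed elements of both sources.

    Result element i is 2*SEW wide and sits where source elements
    2i and 2i+1 were, so no data crosses a lane boundary.
    """
    return [vs1[i] * vs2[i] for i in range(0, len(vs1), 2)]


def widen_mul_odd(vs1, vs2):
    """Same operation on the odd-indexed elements."""
    return [vs1[i] * vs2[i] for i in range(1, len(vs1), 2)]


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]

even = widen_mul_even(a, b)  # consumes a[0], a[2], ... and b[0], b[2], ...
odd = widen_mul_odd(a, b)    # consumes a[1], a[3], ... and b[1], b[3], ...
```

Each call reads every source element but uses only half of them, which is downside (b) above, and covering all elements takes both calls, which is downside (a).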

Note: this approach is similar to the v0.8 LMUL=1 widening if SLEN were SEW wide.
  Logically, v0.8 does both an even (to dest) and an odd (to dest+1) set of instructions.

5B) A variation of this is possible for RVV: an even/odd widening op mode.
       vs1 provides the odd elements and vs2 provides the even elements and vd has a double width result.
    This approach has a number of advantages.
    a) When vs1 = vs2 then a single input vector provides both arguments: single read port, reduced energy cost.
    b) note that vd can also be either vs1 or vs2.
    c) as a result, vd can be used as a temp for a slideup1/slidedown1 of either input to emulate even or odd pair ops.
            (this could be fused or to allow even/odd
    d) as with the base even:odd scheme, operations are in lane, and with the v0.9 model register groups of up to 8 physical registers can participate.
    e) with v0.9 the ordinal masking interoperates unchanged.
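
A minimal Python sketch of 5B follows, again assuming a widening multiply and list-modelled registers. The names widen_mul_evenodd, slideup1 and slidedown1 are hypothetical, and the slides here simply fill the vacated element with 0, which is a simplification of how the real slide instructions treat that element.

```python
def widen_mul_evenodd(vs1, vs2):
    """vd[i] = vs2[2i] * vs1[2i+1]: vs2 supplies even elements, vs1 odd ones."""
    return [vs2[i] * vs1[i + 1] for i in range(0, len(vs2), 2)]


def slideup1(v, fill=0):
    """Move every element up one position (element i goes to i+1)."""
    return [fill] + v[:-1]


def slidedown1(v, fill=0):
    """Move every element down one position (element i+1 goes to i)."""
    return v[1:] + [fill]


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]

# Point (a): vs1 == vs2, so one source register feeds both operands.
pairwise = widen_mul_evenodd(a, a)

# Point (c): a slide moves even elements of a into odd positions,
# emulating the even*even op of option 5.
even = widen_mul_evenodd(slideup1(a), b)

# Likewise, sliding b down emulates the odd*odd op.
odd = widen_mul_evenodd(a, slidedown1(b))
```

In hardware the slid copy would live in vd, per point (b), so no extra register is needed for the temporary.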

Note: under v0.9, existing instructions provide supporting operations, e.g. for SEW>8 a load with a 1/2 unit stride can simulate an interleaved load.

I wanted to provide this option before the meeting because it clearly demonstrates another plausible approach to HPC independent of an SLEN parameter.

The presumption of SLEN, even when subsumed in VLEN=SLEN, is not necessary for a base model.

Assuming an SLEN<=VLEN model when stipulating VLEN=SLEN is like mandating a rational (a/b) number set and then stipulating that the denominator (b) is 1.
Better to mandate integers, a conceptually simpler number set, and introduce rationals (or reals) if and when needed.
