Re: On Vector Register Layout

Bill Huffman

Hi David,

If I try to compare this to the current proposal, it seems to me there
are two major differences.

** A layout difference in the wide registers, where elements alternate
between two registers instead of filling first one and then the other.

** Two instructions to accomplish what one accomplishes today.
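To make the first difference concrete, here is a minimal sketch of where each widened result lands in a two-register destination group (vd, vd+1). It assumes a single source register (LMUL=1) of SEW-bit elements and a 2*SEW-bit destination; the function names and the VLEN/SEW values are illustrative only.

```python
# Hypothetical model: a widening op reads one source register (LMUL=1) of
# SEW-bit elements and writes a two-register destination group (vd, vd+1)
# of 2*SEW-bit elements. Each function returns (register offset, slot).


def contiguous_placement(i, vlen, sew):
    """Current-proposal layout: fill vd completely, then vd+1."""
    per_reg = vlen // (2 * sew)  # wide elements that fit in one register
    return (i // per_reg, i % per_reg)


def even_odd_placement(i):
    """Even/odd layout: even-indexed results in vd, odd-indexed in vd+1."""
    return (i % 2, i // 2)


if __name__ == "__main__":
    VLEN, SEW = 128, 16  # example parameters, chosen for illustration
    for i in range(VLEN // SEW):  # one wide result per source element
        print(i, contiguous_placement(i, VLEN, SEW), even_odd_placement(i))
```

With VLEN=128 and SEW=16, the contiguous layout puts results 0-3 in vd and 4-7 in vd+1, while the even/odd layout alternates: 0 in vd, 1 in vd+1, 2 in vd, and so on.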

I've used even/odd arrangements a lot over the years and would certainly
consider them where they offer an advantage. But I'm not seeing the advantage here.

5B seems to require twice as many register specifiers.


On 6/24/20 7:05 PM, David Horner wrote:

On 2020-06-12 7:05 a.m., Krste Asanovic wrote:
The interesting cases are mixed-width operations, which are prevalent
in low-precision multiply-accumulate kernels that dominate many
existing and emerging compute areas, but there are plenty of other
kernels that operate on mixed-width data items.  Classic SIMD ISAs
handle mixed-width operations in one of five ways (would be glad to
add other known options to this list):
I will take a stab at an even/odd layout for widening.

5) Two versions of the widening ops are defined, one for even and one for odd.
The registers are divided into even:odd pairs.
The full widened result is the result of the operation performed on the
even (or odd) halves of the pairs.
The downsides of this approach are:
  a) the need for two instructions.
  b) only half of the input register bandwidth is used.
The widening operation is in-lane.
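A minimal sketch of approach 5, assuming unsigned SEW=8 elements and using widening multiply as the example op. The names `wmul_even` and `wmul_odd` are hypothetical, not real RVV mnemonics. It shows both downsides (two ops, each reading only half the source elements) and the in-lane property (each wide result occupies exactly the bits of its source pair).

```python
# Hypothetical model of approach 5: separate even and odd widening ops.
# Registers are Python lists of unsigned SEW-bit integers; multiply stands
# in for any widening op. The op names are illustrative, not real RVV mnemonics.
SEW = 8


def wmul_even(vs2, vs1):
    """Widen-multiply source elements 0, 2, 4, ... into one wide register."""
    return [vs2[j] * vs1[j] for j in range(0, len(vs2), 2)]


def wmul_odd(vs2, vs1):
    """Widen-multiply source elements 1, 3, 5, ... into one wide register."""
    return [vs2[j] * vs1[j] for j in range(1, len(vs2), 2)]


def wide_bits(k, sew):
    """Bit range occupied by 2*SEW-bit result element k."""
    return (k * 2 * sew, (k + 1) * 2 * sew)


def pair_bits(k, sew):
    """Bit range occupied by source elements 2k and 2k+1 together."""
    return (2 * k * sew, (2 * k + 2) * sew)


if __name__ == "__main__":
    vs2 = [1, 2, 3, 4, 5, 6, 7, 8]
    vs1 = [10, 20, 30, 40, 50, 60, 70, 80]
    print(wmul_even(vs2, vs1))  # [10, 90, 250, 490]
    print(wmul_odd(vs2, vs1))   # [40, 160, 360, 640]
    # In-lane: each result sits in exactly the bits its source pair occupied.
    print(all(wide_bits(k, SEW) == pair_bits(k, SEW) for k in range(len(vs2) // 2)))
```

Together the two ops produce every product, but each one touches only half of each source register.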

Note: this approach is similar to the v0.8 LMUL=1 widening if SLEN were
SEW wide.
  Logically, v0.8 performs both the even operation (to dest) and the odd
operation (to dest+1).

5B) A variation of this is possible for RVV: an even/odd widening op mode.
       vs1 provides the odd elements, vs2 provides the even elements,
and vd holds the double-width result.
    This approach has a number of advantages:
    a) When vs1 = vs2, a single input vector provides both
arguments: a single read port and reduced energy cost.
    b) Note that vd can also be either vs1 or vs2.
    c) As a result, vd can be used as a temp for a slideup1/down1 of either
input to emulate even or odd pair ops.
            (this could be fused or to allow even/odd
    d) As with the base even:odd approach, operations are in-lane, and with
the v0.9 model register groups of up to 8 physical registers can participate.
    e) With v0.9, ordinal masking interoperates unchanged.
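A minimal sketch of one reading of 5B: vd[k] = op(vs2[2k], vs1[2k+1]), with widening multiply standing in for the op. The names `wmul_eo`, `even_pairs` and `odd_pairs` are hypothetical; `slide1up`/`slide1down` are loosely modelled on RVV's `vslide1up.vx`/`vslide1down.vx` with a zero scalar. It illustrates (a), a single input when vs1 = vs2, and (c), emulating even or odd pair ops by sliding one input by 1 into a temporary.

```python
# Hypothetical model of 5B: one op takes even elements from vs2 and odd
# elements from vs1 and writes a double-width result. Multiply is only an
# example op; none of these names are real RVV mnemonics.


def wmul_eo(vs2, vs1):
    """vd[k] = vs2[2k] * vs1[2k+1] (one reading of the 5B description)."""
    return [vs2[2 * k] * vs1[2 * k + 1] for k in range(len(vs2) // 2)]


def slide1up(v, x=0):
    """Like vslide1up.vx: element i gets v[i-1]; element 0 gets x."""
    return [x] + v[:-1]


def slide1down(v, x=0):
    """Like vslide1down.vx: element i gets v[i+1]; the last element gets x."""
    return v[1:] + [x]


def even_pairs(vs2, vs1):
    # tmp plays the role of vd in (c): tmp[2k+1] = vs1[2k],
    # so wmul_eo yields vs2[2k] * vs1[2k].
    tmp = slide1up(vs1)
    return wmul_eo(vs2, tmp)


def odd_pairs(vs2, vs1):
    # tmp[2k] = vs2[2k+1], so wmul_eo yields vs2[2k+1] * vs1[2k+1].
    tmp = slide1down(vs2)
    return wmul_eo(tmp, vs1)


if __name__ == "__main__":
    v = [1, 2, 3, 4, 5, 6, 7, 8]
    w = [10, 20, 30, 40, 50, 60, 70, 80]
    print(wmul_eo(v, v))     # (a) single input: [2, 12, 30, 56]
    print(even_pairs(v, w))  # [10, 90, 250, 490]
    print(odd_pairs(v, w))   # [40, 160, 360, 640]
```

Under this reading, the even and odd pair results of approach 5 are each recovered with one slide plus one 5B op.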

Note: under v0.9, existing instructions provide supporting operations;
e.g., for SEW>8, a strided load with half the unit stride (SEW/16 bytes) can simulate an interleaved load.

I wanted to provide this option before the meeting because it clearly
demonstrates another plausible approach to HPC, independent of SLEN.

The presumption of SLEN, even when subsumed as VLEN=SLEN, is not
necessary for a base model.

Assuming an SLEN<=VLEN model while stipulating VLEN=SLEN is like mandating
a rational (a/b) number set and then stipulating that the denominator (b)
is 1.
It is better to mandate the integers, a conceptually simpler number set, and
introduce rationals (or reals) if and when needed.
