Re: On Vector Register Layout


David Horner
 

These decisions are not made independently.

E.g., removing expanding loads led to fractional register mode.

I believe there are other considerations that affect a definitive decision.

1) The even/odd approach (which I expect Krste will have available soon) would also benefit from a specified "interleave" register structure.
        Specifically, for widening operations, designating even-even, odd-odd, and even-odd variants allows full register utilization.
                These variants need not be specified in the opcode, just as multiplier/fractional-multiplier is a vtype parameter.

2) As a more general case of fractional-mul, a fill factor in conjunction with the original integral lmul
        allows multiple physical registers to participate (rather than restricting to a single physical register).

3) Element interleave is another major structure that is only partially addressed by the segmented memory operation.
        This functionality dovetails with both points above.
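
To make the "full register utilization" remark in point 1 concrete, here is a minimal sketch of generic even/odd widening, in the style of the multiply-even/multiply-odd instructions found in some existing SIMD ISAs. It is not the specific proposal under discussion: the function names, the 8-bit SEW, and the use of Python lists as registers are all assumptions made for illustration.

```python
SEW = 8  # assumed source element width in bits (illustrative only)

def widen_mul_even(a, b):
    """Multiply the even-indexed element pairs; each product needs 2*SEW bits."""
    return [a[i] * b[i] for i in range(0, len(a), 2)]

def widen_mul_odd(a, b):
    """Multiply the odd-indexed element pairs; each product needs 2*SEW bits."""
    return [a[i] * b[i] for i in range(1, len(a), 2)]

def register_bits(elements, width):
    """Bits occupied by `elements` when each element is `width` bits wide."""
    return len(elements) * width

# Four unsigned 8-bit elements per source register (a 32-bit register).
a = [200, 3, 17, 255]
b = [100, 5, 2, 255]

even = widen_mul_even(a, b)  # products of elements 0 and 2
odd = widen_mul_odd(a, b)    # products of elements 1 and 3

# Each result holds half as many elements at twice the width, so it fills
# exactly one register of the same length as a source register.
same_size = register_bits(even, 2 * SEW) == register_bits(a, SEW)
```

In this model neither the sources nor the destinations leave part of a register unused, and each wide result sits directly over the two narrow element positions it came from.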


Each of these three approaches can be added on top of a base model that assumes only
    a) integral lmul and
    b) in-register data held in in-memory order (i.e. non-segmented register mapping; v0.9 calls this SLEN=VLEN)

As Krste has outlined here, there are multiple legitimate approaches to widening ops.
I agree that some are not reasonable candidates to propose as base, nor even to ensure convenient future inclusion.

However, as I suggested previously, ensuring support for more than one is important to meet RISC-V's goal of being a base for extensions, and RVV's goal of supporting a wide variety of physical hardware and micro-architectures from IoT to HPC.

I have been reviewing past versions of RVV as I can find them.
The github riscv-v-spec goes back to Jul 27, 2018.
That version predates register groups (lmul), which had a profound effect on the perception of structure.
Originally introduced with vertical alignment of the source/destination of widening/narrowing ops,
lmul also benefits minimal systems by effectively increasing VLEN.

Further, it continues to provide an alignment benefit even under v0.9 which abandons the strict vertical structure of v0.8.

Similarly, lmul and fractional-lmul appear to have been missing when the even/odd approach was discussed.
And perhaps value points have shifted since then.

I still have not located any extensive discussions,
e.g. on rejecting register-pairs or pack/unpack, which are also closely related to register structure,
or on the decision to remove sign/unsigned-extending loads.

If anyone can direct me to these specific discussions around widening/narrowing approaches I would be quite grateful.



On 2020-06-12 7:05 a.m., Krste Asanovic wrote:

TL;DR: I'm leaning towards mandating SLEN=VLEN layout, at least for
application processor profiles.


Regarding register layout, I thought it would be good to lay out the
landscape and comparison with other SIMD ISAs before diving into a
proposal for RVV.


I think it's useful to distinguish "bitsliced" operations from
"bitcrossing" operations.

It's also useful to define a separate term for physical datapath width
"DPW".  In sensible designs, VLEN is an integer power-of-2 multiple of
DPW.  If

Bitsliced operations on elements of size EEW operate entirely within
an EEW region of DPW.

Bitcrossing operations traverse more than (source/dest) EEW bits of
DPW.
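
The distinction can be sketched in a simplified model. The code below assumes a single EEW for sources and destination and ignores DPW entirely, so it only captures the "stays in its own element position" part of the definition; the helper names are hypothetical and not taken from any specification.

```python
def is_bitsliced(sources_of, num_elements):
    """Return True if every destination element i reads only source element i.

    `sources_of(i)` lists the source element indices that destination
    element i depends on. With one EEW throughout, element i always occupies
    bits [i*EEW, (i+1)*EEW), so "reads only index i" means the operation
    never leaves its own EEW region.
    """
    return all(set(sources_of(i)) <= {i} for i in range(num_elements))

def vadd_sources(i):
    # Element-wise add: result i depends only on operand elements i.
    return [i]

def slide1down_sources(i):
    # Slide down by one: result i is taken from source element i + 1.
    return [i + 1]

def reduction_sources(i, n=8):
    # Sum reduction: result element 0 depends on every source element.
    return list(range(n)) if i == 0 else []

N = 8
classification = {
    "vadd": is_bitsliced(vadd_sources, N),
    "slide1down": is_bitsliced(slide1down_sources, N),
    "reduction": is_bitsliced(reduction_sources, N),
}
```

Under this model the element-wise add is bitsliced, while the slide and the reduction are bitcrossing, which matches how those operations are classified in this message.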

In all sane general-purpose SIMD designs, memory operations can move
vectors that are naturally aligned to element boundaries, not only to
VLEN boundaries, so all memory operations are bitcrossing operations
assuming DPW > smallest EEW and require at least a memory rotate if
not a full crossbar between memory ports and register file ports.
(Some specialized SIMD designs might retain a VLEN-alignment
constraint, but they're not of interest here).
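
As a rough illustration of why an element-aligned load needs at least a rotate, the sketch below models a memory port that only delivers VLEN-aligned lines. The 16-byte VLEN, the `load_unaligned` name, and the byte-array memory are assumptions made for this example, not a description of any particular implementation.

```python
VLEN_BYTES = 16  # assumed vector register length in bytes (illustrative only)

def load_unaligned(memory, addr, vlen_bytes=VLEN_BYTES):
    """Load vlen_bytes starting at addr using only aligned line reads."""
    base = addr - addr % vlen_bytes   # start of the aligned line holding addr
    shift = addr - base               # byte offset of addr within that line
    line0 = memory[base:base + vlen_bytes]
    line1 = memory[base + vlen_bytes:base + 2 * vlen_bytes]

    # Per byte lane, pick whichever aligned line holds a requested byte:
    # lanes at or above `shift` come from line0, lower lanes from line1.
    merged = bytes(
        line0[lane] if lane >= shift else line1[lane]
        for lane in range(vlen_bytes)
    )

    # Rotate so that the byte at `addr` lands in register byte 0.
    return merged[shift:] + merged[:shift]

memory = bytes(range(64))
reg = load_unaligned(memory, 5)         # bytes 5..20 of memory
aligned = load_unaligned(memory, 16)    # shift is 0, so no rotation occurs
```

Every register byte can end up fed from any memory byte lane depending on `addr`, which is the bitcrossing behaviour the paragraph above describes.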

There are specialized register permute instructions that are
bitcrossing instructions, such as our slide, vrgather, and compress
instructions (reductions also).  All SIMD ISAs add some variants of
these.

Many simple vector arithmetic operations are bitsliced.

The interesting cases are mixed-width operations, which are prevalent
in low-precision multiply-accumulate kernels that dominate many
existing and emerging compute areas, but there are plenty of other
kernels that operate on mixed-width data items.  Classic SIMD ISAs
handle mixed-width operations in one of five ways (would be glad to
add other known options to this list):

0) Single-width elements.
1) Specialized registers.
2) Pack/unpack.
3) Register pairs.
4) EDIV-style.

With RVV we are trying to support mixed-width operations without
adding specialized registers, or splitting an element across
architectural registers, or requiring implicit or explicit bitcrossing
beyond min(DPW,SLEN) on ALU operands (bitcrossing for memory
load/stores cannot be avoided).  We're also trying to support vector
units with current implementation targets ranging from VLEN=32 to
VLEN=16384.


