Re: On Vector Register Layout
David Horner
These decisions are not made independently.
E.g., removing expanding loads led to fractional register mode.
I believe there are other considerations that affect a definitive decision.
1) The even/odd approach (which I expect Krste will have available soon) would also benefit from a specified "interleave" register structure.
Specifically, for widening operations, designating even-even, odd-odd and even-odd variants allows full register utilization.
These variants need not be specified in the opcode; they could be a vtype parameter, just as the multiplier/fractional multiplier is.
2) As a more general case of fractional lmul, a fill factor used in conjunction with the original integral lmul
allows multiple physical registers to participate (rather than restricting participation to a single physical register).
3) Element interleave is another major structure that is only partially addressed by the segmented memory operations.
This functionality dovetails with both above points. 1 above.
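As a rough illustration of point 1, here is a minimal sketch of even/odd variants for a widening multiply. It is only one reading of the idea: the function name `widening_mul` and the `variant` strings are hypothetical, and Python lists stand in for vector registers. Each variant picks the even or odd elements of each source, so a double-width product lands in the bit positions of the source pair it came from and the destination needs only one register.

```python
def widening_mul(vs2, vs1, variant="even-even"):
    """Hypothetical even/odd widening multiply on two lists of narrow elements.

    Returns len(vs2) // 2 wide products. Product i is built from source
    pair i, so it can occupy the bit positions of narrow elements 2*i and 2*i+1.
    """
    if len(vs2) != len(vs1) or len(vs2) % 2:
        raise ValueError("sources must have the same even length")
    start = {"even": 0, "odd": 1}
    sel2, sel1 = variant.split("-")   # e.g. "even-odd" -> even of vs2, odd of vs1
    a = vs2[start[sel2]::2]           # every second element of the first source
    b = vs1[start[sel1]::2]           # every second element of the second source
    return [x * y for x, y in zip(a, b)]
```

With `vs2 = [1, 2, 3, 4]` and `vs1 = [10, 20, 30, 40]`, the even-even variant gives `[10, 90]` and the odd-odd variant gives `[40, 160]`, so the two together cover every element-wise product while each result fits in a single destination register.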
Each of these three approaches can be added on top of a base model that assumes only
a) integral lmul and
b) in-memory-order in-register data (i.e. non-segmented register mapping; in v0.9 this is called SLEN=VLEN)
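The base model in a) and b) can be sketched as a simple index mapping. This is an illustrative assumption about how the in-memory-order (SLEN=VLEN) layout with integral lmul places elements, not text from the spec; the function name `locate_element` and its parameters are hypothetical.

```python
def locate_element(i, vlen, sew, lmul, base_reg=0):
    """Map element index i to (register number, bit offset) in a register group.

    Assumes in-memory-order layout (SLEN=VLEN) and an integral lmul:
    elements are packed contiguously, filling one register before the next.
    """
    per_reg = vlen // sew      # elements held by one register
    vlmax = lmul * per_reg     # elements held by the whole group
    if not 0 <= i < vlmax:
        raise IndexError(f"element {i} is outside a group of {vlmax} elements")
    return base_reg + i // per_reg, (i % per_reg) * sew
```

For example, with VLEN=128, SEW=32, lmul=2 and a group starting at v8, `locate_element(5, 128, 32, 2, base_reg=8)` returns `(9, 32)`: the second 32-bit slot of v9.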
As Krste has outlined here, there are multiple legitimate approaches to widening ops.
I agree that some are not reasonable candidates to propose as base, nor even to ensure convenient future inclusion.
However, as I suggested previously, ensuring support for more than one is important to meet RISC-V's goal of being a base for extensions, and RVV's goal of supporting a wide variety of physical hardware and micro-architectures from IoT to HPC.
I have been reviewing past versions of RVV as I can find them.
The github riscv-v-spec goes back to Jul 27, 2018.
It predates register groups (lmul), which had a profound effect on the perception of structure.
Introduced originally for vertical alignment of the source/destination of widening/narrowing ops,
lmul also enhances minimal systems by effectively increasing VLEN.
Further, it continues to provide an alignment benefit even under v0.9, which abandons the strict vertical structure of v0.8.
Similarly, lmul and fractional lmul appear to have been missing when the discussions on the even/odd approach occurred.
And perhaps value points have shifted since then.
I still have not located any extensive discussions,
e.g. on rejecting register pairs or pack/unpack (which are also closely related to register structure)
or on the decision to remove sign- and zero-extending loads.
If anyone can direct me to these specific discussions around widening/narrowing approaches I would be quite grateful.
On 2020-06-12 7:05 a.m., Krste Asanovic wrote:
TL;DR: I'm leaning towards mandating SLEN=VLEN layout, at least for application processor profiles.
Regarding register layout, I thought it would be good to lay out the landscape and comparison with other SIMD ISAs before diving into a proposal for RVV.
I think it's useful to distinguish "bitsliced" operations from "bitcrossing" operations. It's also useful to define a separate term for physical datapath width, "DPW". In sensible designs, VLEN is an integer power-of-2 multiple of DPW. Bitsliced operations on elements of size EEW operate entirely within an EEW region of DPW. Bitcrossing operations traverse more than (source/dest) EEW bits of DPW.
In all sane general-purpose SIMD designs, memory operations can move vectors that are naturally aligned to element boundaries, not only to VLEN boundaries, so all memory operations are bitcrossing operations (assuming DPW > smallest EEW) and require at least a memory rotate, if not a full crossbar, between memory ports and register file ports. (Some specialized SIMD designs might retain a VLEN-alignment constraint, but they're not of interest here.)
There are specialized register permute instructions that are bitcrossing instructions, such as our slide, vrgather, and compress instructions (reductions also). All SIMD ISAs add some variants of these.
Many simple vector arithmetic operations are bitsliced. The interesting cases are mixed-width operations, which are prevalent in the low-precision multiply-accumulate kernels that dominate many existing and emerging compute areas, but there are plenty of other kernels that operate on mixed-width data items.
Classic SIMD ISAs handle mixed-width operations in one of five ways (would be glad to add other known options to this list):
0) Single-width elements.
1) Specialized registers.
2) Pack/unpack.
3) Register pairs.
4) EDIV-style.
With RVV we are trying to support mixed-width operations without adding specialized registers, or splitting an element across architectural registers, or requiring implicit or explicit bitcrossing beyond min(DPW,SLEN) on ALU operands (bitcrossing for memory load/stores cannot be avoided). We're also trying to support vector units with current implementation targets ranging from VLEN=32 to VLEN=16384.
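One possible way to read the bitsliced/bitcrossing distinction in the quoted message is sketched below. It is a deliberately simplified model and an assumption, not a definition from the message: elements are taken to be packed contiguously, and an operation is called bitsliced when the source element lies inside the destination element's bit range (or the reverse). The helper names are hypothetical.

```python
def element_span(index, eew):
    """Bit range [lo, hi) of element `index` when elements are packed contiguously."""
    return index * eew, (index + 1) * eew


def is_bitsliced(src_index, src_eew, dst_index, dst_eew):
    """True if the data stays inside the wider of the two EEW regions.

    Simplified model: the source bit range must contain, or be contained in,
    the destination bit range. Anything else has to cross datapath bits.
    """
    s_lo, s_hi = element_span(src_index, src_eew)
    d_lo, d_hi = element_span(dst_index, dst_eew)
    return (d_lo <= s_lo and s_hi <= d_hi) or (s_lo <= d_lo and d_hi <= s_hi)
```

Under this model a same-width operation on element 3 is bitsliced (`is_bitsliced(3, 8, 3, 8)` is true), an 8-to-16-bit widening of element 3 with both operands in in-memory order is bitcrossing (`is_bitsliced(3, 8, 3, 16)` is false), and an even/odd style widening that takes narrow element 6 or 7 to wide element 3 stays bitsliced.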