[riscv/riscv-v-spec] For V1.0 - Make unsigned scalar integer in widening instructions 2 * SEW (#427) (and signed)
David Horner
This was on GitHub; as not everyone subscribes and it will be considered at the TG, I include it on this list. First Krste’s synopsis, then my (modified) GitHub reply, then my thoughts for the TG, and lastly the original post for reference.

kasanovic commented:
Concretely, this proposal is to change the widening integer vector-scalar operations to treat their x register input (rs1) as 2*SEW not SEW, first for unsigned, then possibly also for signed. I think the biggest benefit would be in the widening multiply instructions, but that also implies extra hardware (a larger multiplier array) over what is needed for the current case. I can see there is also some benefit to short computations into a wider accumulator with a large initial value, but also see that some cases will need more scalar instructions. There is non-zero hardware cost even just for adds to handle the new case, with possibly some software overhead in other cases (e.g., when using an XLEN-wide scalar load to bring packed 8b elements into one register, then feeding these one at a time into a widening operation - the current spec provides the mask). Also, this would be a large change at a late stage in the proposal process. I will bring this up again in the next TG meeting.

David-Horner commented:
Thank you very much for the synopsis.
The x input is naturally limited by XLEN. Once that is exceeded, zero extension (or sign extension) takes over (thus maximum XLEN precision).
“the biggest benefit would be in the widening multiply instructions”
As Krste alludes, this form of the multiply is not otherwise available. For widening ops (including multiply), the PoR is that EEW has to be a supported SEW width.
“feeding these one at a time into a widening operation - the current spec provides the mask”
The original proposal discusses this: a single andi is sufficient for a byte.
“this would be a large change at a late stage in the proposal process.”
Agreed. However, it cannot be retrofitted. The benefits are there; the timing is not excellent.
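
To make the difference concrete, here is a minimal Python sketch of the two ways the x register could be consumed by an unsigned widening vector-scalar op. It is only a model, not spec text: the function names and the wide_scalar flag are hypothetical, it assumes XLEN >= SEW, and wrapping the multiply result to 2*SEW bits is my assumption, since the thread does not spell that out.

```python
def scalar_operand(rs1, sew, xlen=64, wide_scalar=False):
    """Return the unsigned scalar the model takes from x register rs1.

    wide_scalar=False: keep the low SEW bits (the current behaviour).
    wide_scalar=True:  keep the low min(2*SEW, XLEN) bits (the proposal).
    """
    bits = min(2 * sew, xlen) if wide_scalar else sew
    return rs1 & ((1 << bits) - 1)


def vwaddu_vx(vs2, rs1, sew, **kw):
    """Unsigned widening add: SEW-bit element plus scalar, kept to 2*SEW bits."""
    elem_mask = (1 << sew) - 1
    wide_mask = (1 << (2 * sew)) - 1
    s = scalar_operand(rs1, sew, **kw)
    return [((e & elem_mask) + s) & wide_mask for e in vs2]


def vwmulu_vx(vs2, rs1, sew, **kw):
    """Unsigned widening multiply; wrapping to 2*SEW bits is an assumption."""
    elem_mask = (1 << sew) - 1
    wide_mask = (1 << (2 * sew)) - 1
    s = scalar_operand(rs1, sew, **kw)
    return [((e & elem_mask) * s) & wide_mask for e in vs2]
```

With SEW=8 and rs1=300, the first form sees 44 (the low byte) while the second sees the full 300, so a scalar larger than SEW can reach the 16-bit result without changing SEW. At SEW=XLEN both forms see the same value.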
Further Comments.
The “feeding these one at a time into a widening operation” application highlights the versatility of this format: 16-bit values are directly masked, but 9- and 10-bit values (packed three to an RV32 register) and 11- to 15-bit values are also available for shifting into a 16-bit op.
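
As a rough illustration of the packed case, the Python sketch below unpacks three 10-bit fields from a 32-bit word (a shift and a mask per field, i.e. srli plus andi on the scalar side) and shows what survives when the scalar is consumed at 8 bits versus 16 bits. The helper names and the sample values are hypothetical.

```python
def unpack_fields(word, field_bits, count):
    """Return `count` fields of `field_bits` bits each, lowest field first."""
    mask = (1 << field_bits) - 1
    return [(word >> (i * field_bits)) & mask for i in range(count)]


def as_scalar(value, scalar_bits):
    """Keep only the low `scalar_bits` bits, as the vector unit would."""
    return value & ((1 << scalar_bits) - 1)


# Three 10-bit fields packed into one 32-bit register (30 bits used).
word = 1000 | (513 << 10) | (7 << 20)
fields = unpack_fields(word, 10, 3)

seen_at_sew = [as_scalar(f, 8) for f in fields]      # scalar limited to SEW=8
seen_at_2sew = [as_scalar(f, 16) for f in fields]    # scalar taken as 2*SEW=16
```

Here seen_at_2sew keeps every field intact, while seen_at_sew loses the upper bits of the first two. Masks of up to 11 bits fit in andi's 12-bit signed immediate, so the per-field cost stays at one shift and one andi.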
On reflection, I now advocate for the signed variants: the same 10 instruction variants, but signed rather than “u”. As I mentioned, these are potentially more valuable (especially because the signed adds are more heavily used), but also because offsetting bias values are more naturally expressed as negative values. And finally, because my current opinion is that the scalar overhead was weighted too heavily.

The reduction of x to the current SEW to allow PoR emulation is less relevant than what use is made of the extended functionality. Compiler optimizations can compensate when only SEW-width values are required. And when the compiler chooses shift sign extension over code restructuring (for example, negating the intermediate value), it would be as a result of trade-offs (e.g. on those machines where sll;sra are fused).

My opinion is: had this formulation been proposed and considered earlier in the vector widening instruction set evolution, it would now be included in the base. It has substantive value at low cost.
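
The negative-bias point can be sketched with a small Python model of a signed widening add. As before, this is a hypothetical model rather than spec text: vwadd_vx here is just a Python function, the wide_scalar flag is my own naming, and XLEN >= SEW is assumed.

```python
def sext(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def vwadd_vx(vs2, rs1, sew, xlen=64, wide_scalar=False):
    """Signed widening add of a scalar to each SEW-bit element.

    wide_scalar=False: rs1 is read as a signed SEW-bit value (current).
    wide_scalar=True:  rs1 is read as a signed min(2*SEW, XLEN)-bit value.
    """
    scalar_bits = min(2 * sew, xlen) if wide_scalar else sew
    scalar = sext(rs1, scalar_bits)
    return [sext(sext(e, sew) + scalar, 2 * sew) for e in vs2]
```

With SEW=8, a bias of -1000 cannot be expressed in the SEW-wide form (its low byte reads as +24), whereas the 2*SEW form applies it directly to the 16-bit result.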
Original proposal:

David-Horner commented:
This suggestion weighs the benefits of increased scalar range of 2 * SEW unsigned rs1 (X register input) with

First, the increased scalar range.
a) Biasing, preparing and tailoring wide accumulator values. Not all accumulations will start from zero, or SEW size values.
b) Extended operating range for foundation operation.
c) Such enhanced operations are available in code flow without changing SEW.
Note: once SEW reaches XLEN there is no benefit for this enhancement.

Second, the hardship to condition X register values for 2 * SEW. In other areas, it appears the general feeling is that trading a few RV base operations for enhanced RVV functionality is a good tradeoff.

Third, how disruptive to current micro-architecture designs is this change.

Overall I see this as a win with little hardship. However, I definitely need to have hardware gurus’ input.

What about 2 * SEW for signed rs1 input?
The trade-offs for scalar are more substantial at SEW=8 and 16. RV64 has addw to sign extend at SEW = 32. Thus sign extension in RVV widening is more valuable to avoid the sll;sra combination.
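
For reference, the scalar-side sign-extension sequences mentioned above can be modelled in Python as follows. This is a sketch under the assumption of RV64 (XLEN=64); the function names are hypothetical, and addw_zero models `addw rd, rs, x0`, which sign-extends the low 32 bits in one instruction, where narrower widths need the shift pair.

```python
XLEN = 64


def sll_sra(x, sew, xlen=XLEN):
    """Sign-extend the low `sew` bits of x with a shift pair (slli, then srai)."""
    shift = xlen - sew
    shifted = (x << shift) & ((1 << xlen) - 1)   # slli: bits above XLEN fall off
    if shifted >> (xlen - 1):                    # reinterpret register as signed
        shifted -= 1 << xlen
    return shifted >> shift                      # srai: Python >> is arithmetic


def addw_zero(x):
    """Model RV64 `addw rd, rs, x0`: sign-extend the low 32 bits of x."""
    low = x & 0xFFFFFFFF
    return low - (1 << 32) if low >> 31 else low
```

At SEW=32 the single addw gives the same result as the two-instruction shift pair; at SEW=8 and 16 only the shift pair is available in the base ISA, which is where the scalar cost shows up.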