Re: Issue categorization - #460


David Horner
 

I think I understand how I confused the situation.

Issue #458 introduced the idea of using rd and rs1 values to encode more bits for vsetvli.
I proposed that this become the only vsetvli format.
Krste countered that the current format could be expanded later if needed to adopt the new format as long as a field encoding was otherwise unused.
I agreed that this was technically possible, but I did not raise a concern that it could have negative consequences.
In the meantime, I opened #460, which in addition to the rd and rs1 encoding, avoided using a bit within vtype to allow for vl calculation based on lmul of 3,5,6 or 7.
In my mind, #460 raised all the concerns and considerations present in #458.
Further, it provided additional support for the rd/rs1 format by using the novel encoding in a unique way.
As a result I closed #458 to have all the relevant discussion tracked on #460.
It could, however, easily have been inferred that I closed #458 because the "escape mechanism" was perfect.
The closing comments in #458, however, explicitly recommend that the concern be revisited as v1.0 approaches.

As we approach v1.0 we should evaluate whether we will inevitably exceed 10 bits of encoding using only immediate bits, and if so reconsider:
a) whether the recovered 6 bits of rs1 and rd encoding will be sufficient for the lifetime of ILEN=32
b) whether we want to support two distinct and competing encodings (for perhaps overlapping settings)
c) whether the expanded format will in effect supersede the original encoding, and thus result in the dead weight of a low-use format to be supported in perpetuity.

Note: if we defer until we have only one bit available, we will use that one bit for selecting a mode that we could have chosen from the beginning without the loss of that bit. This will also weigh into the considerations above.
If the "resolve for v1.0" label had been available then I likely would have suggested it for #458 and definitely for #460.


The intro in #460 also implies the need to give early consideration to this format:

A1.
There are limited immediate bits in the vsetvli instruction.
Early use of bits will become entrenched in the design. Misuse cannot be corrected later.
Extensions to RVV will undoubtedly wish a single mechanism to both set vtype and establish the appropriate vl. Extensions such as complex and quaternion numbers affect vl calculation, and can leverage all of integer, fixed-point and especially float data formats for add/subtract, multiply, divide (reciprocals) and share conjugate and norm.
There may be many other such data-types.
It is fully possible that the proliferation of modes and datatypes will exhaust the currently remaining 3 bits. See**

A2.
SEW and LMUL are essential opcode modifiers. However, together they use 8 [incorrect: it is 6] of the 11 available immediate bits in vsetvl, even though a dense encoding is used. This is undesirable. Finalizing this encoding will entrench other bits in the instruction making them unavailable for future use via innovative encoding.

A3.
An alternate encoding specifically for LMUL is here presented. (Whereas SEW could be similarly encoded, LMUL is proposed as it appears the most constrained. See***)

On 2020-06-30 11:12 a.m., Krste Asanovic wrote:
> For 1.0, we are just trying to fix vsew, vlmul, vma, and vta (and also vill in vtype, but that’s out of vsetvli immediate range).
>
> I think it’s clear that vma and vta are not going to change very often in many code sequences,

If this is indeed true, then this makes the fields candidates for vtype fields that are only set by vsetvl (those in range [XLEN-2:11]).

> and agnostic provides significant PPA benefit for renamed register machines, especially with long vectors.

I agree they likely have merit; I advocated for their inclusion in vtype, and in vsetvli.

> I can’t see what you are trying to propose that would affect the 1.0 spec?

I am proposing that we seriously consider the consequences of providing a vsetvli instruction that has so limited an immediate field.
There are alternatives; #458 and #460 are two that increase functionality (complete lmul range) and immediate bit encoding (by up to 6 bits).
Using vlmul = "100" for vsetvl opcode decoding rather than the immediate sign bit [bit 31] is another low-cost approach that recovers a bit.
And of course there are other alternatives.

> Are you saying that vsew, vlmul, vma, vta should not be in the vsetvli immediate space?

As reasoned above, vma and vta are candidates to be removed.

Conversely, vsew and vlmul are prime candidates for inclusion in the vsetvli immediate space because:
     they are essential to the "set vl" function,
     they are common modifiers to base operations (as in the expected 64-bit opcode space),
     they are often used in conjunction with one another, and
     many code examples show sew/lmul variation within typical loops.

This is another aspect that needs to form part of the reasoning about the sufficiency of vsetvli immediate space:
Pressure on the immediate form of the instruction would be drastically diminished if
     only those fields that definitively provide an appreciable benefit to code efficiency are included.
     (in particular, if the field can be hoisted from the loop it is not a good candidate).

To me, the combination of removing vma and vta from the immediate and
     using lmul="100" for vsetvl encoding
     removes sufficient pressure that
     an immediate bit could be used to expand lmul to 3,5,6 and 7 and
     still provide for judicious inclusion of warranted future immediates
     for years, without invoking the rd/rs1 encoding.

However, switching to rd/rs1 encoding does provide a substantial margin for error and neatly addresses the lmul=3,5,6 and 7 concern.
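
The bit accounting behind the argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of either proposal: the 11 immediate bits and the 6 bits for sew plus lmul come from this thread, while the one-bit widths for vma and vta, the treatment of the recovered bit 31 as one extra immediate bit, and all names in the code are assumptions of the sketch.

```python
# Rough bit budget for the vsetvli immediate, based on counts quoted in this thread.
# Assumptions of this sketch (not stated in the thread): vma and vta take one bit
# each, and recovering bit 31 simply adds one bit to the usable immediate.

IMM_BITS = 11  # immediate bits available to vsetvli today (from the thread)


def free_bits(total, fields):
    """Return how many immediate bits remain after allocating the given fields."""
    used = sum(fields.values())
    if used > total:
        raise ValueError(f"fields need {used} bits but only {total} are available")
    return total - used


# Current layout: sew and lmul together use 6 bits, plus vma and vta.
current = {"vsew+vlmul": 6, "vma": 1, "vta": 1}

# Suggested layout: vma and vta leave the immediate, bit 31 is recovered by
# decoding vsetvl from lmul="100", and one bit is spent on lmul = 3, 5, 6, 7.
suggested = {"vsew+vlmul": 6, "lmul_3_5_6_7": 1}

current_free = free_bits(IMM_BITS, current)
suggested_free = free_bits(IMM_BITS + 1, suggested)

print("free bits, current layout:  ", current_free)
print("free bits, suggested layout:", suggested_free)
```

Under these assumptions the current layout leaves 3 free bits, matching the "currently remaining 3 bits" quoted from #460, and the suggested layout leaves 5 even after one bit is spent on the extra lmul values.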



Krste
