On 2020-07-23 2:42 a.m., krste@... wrote:
Not a hill for me to die on, but I believe vsetvli x0,x0 is sufficiently important that even this aspect should be fully vetted.
| On Wed, Jul 22, 2020 at 11:19 PM David Horner <ds2horner@...> wrote:
On Wed, 22 Jul 2020 23:37:02 -0700, Andrew Waterman <andrew@...> said:
Other vsetvl[i] instructions are essentially different beasts than this variant.
The precedent is not particularly instructive nor persuasive.
Had we anticipated that the potentially dominant instruction for updating vtype fields would be vl-invariant,
we would have leaned towards error identification that leaves vl alone.
Deviating from that course would be needlessly painful
Is this pain predominantly due to existing behaviour entrenched in existing designs, verification, the tool chain, and documentation?
A) Documentation will, in any case, now have to change for this revised behaviour.
Although updating documentation can be a pain, the incremental cost of either decision on #3 appears minimal to me...
B) I opine that current, correctly formulated software behaves thusly:
The current software tool chain is indifferent to whether vl is zeroed or retained.
If software checks vill after vsetvl[i] and finds vill set, it ignores vl.
If software does not check vill, it waits for an exception to be triggered.
Exception handling code does not check for zero vl, but currently assumes zero.
So it ensures that any restart sequence includes a vsetvl instruction to establish a correct vl.
Rather, in most cases, it just reports the failure.
The current debug code, similarly, assumes vl will be zero for the purpose of directing its support.
But if it reports vl, it shows the value read from the vl CSR, even when vill is set.
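To make point B concrete, here is a minimal Python sketch under loudly stated assumptions: a hypothetical hart with VLEN=128, supported SEW of 8 to 64, integer LMUL only, and vsetvli x0,x0 modeled as using the current vl as the AVL. The names VState, vsetvli, vsetvli_x0_x0, reconfigure and run are illustrative, not from any spec or toolchain. It shows that software which checks vill gets the same answer under either policy; only a debugger reading vl sees a difference, and the restart sequence's full vsetvli re-establishes vl in both cases.

```python
from dataclasses import dataclass

# Illustrative assumptions only, not taken from the spec text:
VLEN = 128                       # vector register width in bits
SUPPORTED_SEW = {8, 16, 32, 64}  # element widths this hypothetical hart supports


@dataclass
class VState:
    """Hypothetical, simplified vector CSR state: vl, vtype (SEW, integer LMUL), vill."""
    vl: int = 0
    sew: int = 8
    lmul: int = 1
    vill: bool = False


def vlmax(sew: int, lmul: int) -> int:
    return VLEN * lmul // sew


def vsetvli(state: VState, avl: int, sew: int, lmul: int) -> None:
    """Ordinary form with an AVL register; this sketch assumes a supported vtype."""
    assert sew in SUPPORTED_SEW
    state.vill, state.sew, state.lmul = False, sew, lmul
    state.vl = min(avl, vlmax(sew, lmul))


def vsetvli_x0_x0(state: VState, sew: int, lmul: int, policy: str) -> None:
    """vsetvli x0,x0: change vtype, using the current vl as the AVL.

    `policy` selects what happens to vl when the new vtype is unsupported:
    "zero" (vl cleared, as currently drafted) or "undisturbed" (option #3).
    The model does not clear the other vtype fields when vill is set.
    """
    if sew not in SUPPORTED_SEW:
        state.vill = True
        if policy == "zero":
            state.vl = 0
        return  # "undisturbed": vl keeps its previous value
    state.vill, state.sew, state.lmul = False, sew, lmul
    state.vl = min(state.vl, vlmax(sew, lmul))


def reconfigure(state: VState, sew: int, lmul: int, policy: str):
    """Correctly formulated software: check vill and never consult vl if it is set."""
    vsetvli_x0_x0(state, sew, lmul, policy)
    return None if state.vill else state.vl


def run(policy: str):
    s = VState(vl=16, sew=8, lmul=1)
    result = reconfigure(s, 128, 1, policy)  # SEW=128 is unsupported here, so vill is set
    vl_seen_by_debugger = s.vl               # a debugger simply reads the vl CSR
    vsetvli(s, 16, 8, 1)                     # restart sequence re-establishes vl
    return result, vl_seen_by_debugger, s.vl


for policy in ("zero", "undisturbed"):
    print(policy, run(policy))
```

Running it prints `zero (None, 0, 16)` and `undisturbed (None, 16, 16)`: the software-visible result and the post-recovery vl agree, and only the debugger's view differs.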
If this accurately reflects the current state of software (shaped as it is by the current vl=0 when vill is set),
then allowing vl to remain unchanged when vill is set has minimal impact.
Apart from the aforementioned documentation update,
the current tool chain is not disrupted,
but future enhancements can leverage the additional information
(simplifying and augmenting debugging, allowing comprehensive error-recovery determination, and simplifying the recovery code sequences).
C) A similar opinion for verification:
Current verification always checks for zero vl when vill is set.
I suspect current verification and validation is minimal and cross dependency checks are few.
Further, vsetvli error setting behaviour is simple and independent of complex system states.
i.e. the changes needed to support an unchanged vl are few and localized.
As a result, changing the body of existing code should be manageable.
D) Existing designs:
It goes without saying that committed hardware is least flexible.
However, it is precisely because entrenchment can so easily occur that each version carries a substantial warning:
"Once the draft label is removed, version 0.x is intended to be stable enough to begin developing toolchains, functional simulators, and initial implementations..."
We haven't removed that label yet.
For simple in-order implementations, the change from clearing vl coincident with setting vill to leaving vl alone is less than trivial; it removes some circuitry.
For aggressive OoO designs, on the other hand, resetting vl on a vill-triggered flush is much simpler than
obtaining the correct vl and ensuring it is present in the CSR and
in any other place resumption code might expect it.
In the general case, additional rollback/recovery to earlier checkpoints may be needed to do so.
So I expect it is not just entrenchment but a potentially real cost for aggressive OoO designs.
I don't believe massively parallel spatial and temporal designs are as adversely affected.
Krste, Andrew and others can speak to that.
and not especially beneficial.
I interleaved above the situations in which I believe retaining vl is beneficial.
As implied by my examples, it is not just for x0,x0 that the previous vl could be beneficial.
But the x0,x0 case does benefit more.
Having vl undisturbed when the program specifically asked for an undisturbed vl, by using this explicit formulation, reduces confusion for programmers.
It is the no-surprises promise.
The other forms arguably were asking for a new vl value, so getting zero when the machine says "I cannot do what you are asking" is not a surprise.
However, it would be just as unsurprising for the machine to leave the vl value alone if it cannot give you a valid new value.
Either alternative, undisturbed or zeroed (or, in the case of ma and ta, all ones), is acceptable and intuitive to programmers.
They are common outcomes for well behaved instructions.
Programmers are less accommodating of "indeterminate" results. RV and RVV have done well to avoid such.
Notably, in the privileged spec, when two distinct and competing results are possible, both have been allowed.
Cf. allowing misa to be either zero or populated; ditto for mtval.
I believe it is equally possible to allow both zero or undisturbed vl when vill is set.
Simplicity, where desired, and cost/overhead tradeoffs that depend upon the Uarch are precisely why both are allowed.
But, one might argue, RVV is a non-privileged spec!
Such indeterminate state is repugnant for user state.
The counterargument is that RVV already has Uarch-visible state; especially egregious are the vstart settings.
Not only is its behaviour implementation-defined, but user mode can also set vstart to cause unpredictable results.
The vl CSR, by contrast, has been reined in substantially.
It is not directly writable.
The resultant values from vsetvl[i] in all its variants are well defined/discoverable/predictable.
Allowing both undisturbed and zeroed vl is no more challenging for userland to comprehend than the machine-dependent ma/ta behaviour.
My vote continues to be #3, leave vl undisturbed.
I would agree to making vl undisturbed whenever vill is set.
I would begrudgingly agree to allowing both zeroed and undisturbed vl, if this is the consistent behaviour on a machine whenever vill is set.
I would reluctantly concede to allowing both to occur depending upon whether x0,x0 or other vsetvl[i] formulations are at play.
I do not support allowing both to occur in the same hart, one or the other dependent upon some internal Uarch state.
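The acceptable and unacceptable outcomes listed above can be phrased as a check on observed behaviour. The sketch below is hypothetical: the trace format, the form labels "x0,x0" and "other", and the function name classify are illustrative assumptions, not part of any spec or test suite. "always undisturbed" or "always zero" corresponds to a machine-consistent choice, "per-form" to the x0,x0-versus-other split, and "mixed within a form" to the unsupported case where the outcome depends on internal Uarch state. Events where vl was already zero are skipped, since zeroed and undisturbed are indistinguishable there.

```python
def classify(events):
    """Classify how a hart treats vl when a vsetvl[i] sets vill.

    `events` is an iterable of (form, vl_before, vl_after) tuples, where form is
    "x0,x0" or "other" (hypothetical trace format, for illustration only).
    """
    seen = {}  # form -> set of observed outcomes
    for form, before, after in events:
        if before == 0:
            continue  # zeroed and undisturbed look identical when vl was already 0
        if after == before:
            outcome = "undisturbed"
        elif after == 0:
            outcome = "zero"
        else:
            return "invalid"  # neither option permits any other vl
        seen.setdefault(form, set()).add(outcome)

    if not seen:
        return "undetermined"
    if any(len(outcomes) > 1 for outcomes in seen.values()):
        return "mixed within a form"  # depends on internal Uarch state
    per_form = {next(iter(outcomes)) for outcomes in seen.values()}
    if len(per_form) > 1:
        return "per-form"  # e.g. x0,x0 undisturbed, other forms zeroed
    return "always " + per_form.pop()


print(classify([("x0,x0", 5, 5), ("other", 7, 0)]))
```

For that sample trace it prints `per-form`.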