I think it's an implementation choice whether vl is trimmed to 3 or 6 (or theoretically other values). I don't know a reason why the implementation couldn't always trim vl to the same value that vstart would have been set
to if the exception were being taken. Does anyone know such a reason? It seems simplest to me always to trim vl to the value vstart would have been set to.
I meant element 9. If vma=1, then inactive elements can be undisturbed or set to 1's. Element 'a' couldn't have been loaded in the case described because it was in a line with a fault. In general, I think our discussions
would have allowed element 'a' to be written if there were some other reason for trimming vl.
On 10/16/20 9:59 AM, Roger Espasa wrote:
Bill, you said element 9, but did you mean the element labeled "a", which is the 11th element in the vector? (I agree with that.)
However, I would NOT agree that a masked out element has been written, even if past the failing point.
Here's where the "implementation" cost comes in (at least in our implementation; others, of course, may have more clever ways of doing this):
-> If you pick "vl=3", then the vstart and vltrim calculations can be made one and the same
-> If you pick "vl=6" then the vstart and vltrim calculations are not exactly equal and vltrim needs a LZC on the mask for the elements within the line followed by an adder. At SEW=8b, there can be lots of elements within a line...
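The two options above can be sketched in Python. This is only an illustration under assumptions: the picture is not reproduced here, so the mask layout in the usage note is invented to reproduce the 3 and 6 from the thread. The names `trim_candidates`, `line_base`, and `elems_per_line` are hypothetical, and the loop stands in for the LZC hardware.

```python
def trim_candidates(line_base, mask, elems_per_line):
    """Return (vl_at_line_start, vl_at_first_active) for a faulting line.

    line_base      -- index of the first element in the faulting line (assumed input)
    mask           -- list of 0/1 mask bits, one per element (1 = active)
    elems_per_line -- number of elements that fit in one line (assumed input)

    Assumes at least one element in the line is active; otherwise the line
    would not have been accessed and could not have faulted.
    """
    line_mask = mask[line_base:line_base + elems_per_line]

    # "LZC" step: count the inactive elements at the start of the line.
    leading_inactive = 0
    for bit in line_mask:
        if bit:
            break
        leading_inactive += 1

    # "Adder" step: offset the line base by that count.
    return line_base, line_base + leading_inactive
```

With an assumed mask of `[1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]`, `line_base=3`, and `elems_per_line=8`, the function returns `(3, 6)`: the first value needs no mask inspection, while the second needs the count and the add.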
On Fri, Oct 16, 2020 at 6:31 PM Bill Huffman <huffman@...
The way the discussion has been going, I think either would be permissible. Not only that, but it would have been permissible for element 9 already to have been overwritten with 1's (if vma allows it).
I think bringing this up is good as we need to be sure what precisely we mean by the v*ff instructions.
On 10/16/20 8:57 AM, Roger Espasa wrote:
Here's a question for the group: I did it as a picture... hopefully it will go through the mailing list:
On Fri, Oct 16, 2020 at 4:56 PM David Horner <ds2horner@...
On 2020-10-16 10:30 a.m.,
>>>>>> On Fri, 16 Oct 2020 07:48:00 -0400, "David Horner" <ds2horner@...> said:
> | First I am very happy that "arbitrary decisions by the
> | micro-architecture" allow reduction of vl to any [non-zero] value.
> | Even if such appear "random".
> | A check for vl=0 on platforms that allow it is eminently doable, low
> | overhead for many use cases AND guarantees forward progress under
> | SOFTWARE control.
> If we allowed implementation to return vl=0, how does software
> guarantee forward progress?
The forward progress is to advance to another task.
In the case of machine mode it can potentially "resolve" the cause of
the vl=0 return and re-execute the loop (without the overhead of the trap).
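A rough Python sketch of the software-side check described above, assuming a hypothetical platform on which a fault-only-first load may return vl=0. `process_all`, `ff_load`, and `on_zero` are illustrative names, not part of any specification; `ff_load` stands in for the v*ff instruction and `on_zero` for whatever the software does to resolve the cause or to hand off to another task.

```python
def process_all(n, ff_load, on_zero):
    """Strip-mine n elements using a fault-only-first style load.

    ff_load(start, requested) -- returns how many elements were actually
                                 loaded; may be 0 under the rule assumed here.
    on_zero(start)            -- called when vl comes back as 0; returns True
                                 if the cause was resolved and a retry makes sense.

    Returns the number of elements processed.
    """
    done = 0
    while done < n:
        vl = ff_load(done, n - done)
        if vl == 0:
            # Register-state check in place of a trap.
            if not on_zero(done):
                return done  # give up; the caller moves on to another task
            continue         # cause resolved; re-execute this iteration
        done += vl
    return done
```

Note that nothing in this sketch bounds the number of retries; whether the loop terminates depends entirely on what `on_zero` and the implementation do, which is the forward-progress question being asked.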
> | I see it as no different [in fundamental principle] than other cases
> | such as RVI integer divide by zero behaviour that does not trap but can
> | be readily checked for.
> | Also RVI integer overflow that if you want to check for it is at most a
> | few instructions including the branch.
> I don't see how these examples relate to returning vl=0 on some
> microarchitectural event. The examples here have results that depend
> only on architectural values, so can be deterministically handled.
The similarity is the avoidance of trap handling when it is sufficient
to check register state instead.
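For readers less familiar with the integer cases being compared, here is a small Python model, assuming XLEN=64. `rv_div` and `add_overflows` are hypothetical helper names. The divide-by-zero result (all bits set) follows the base ISA definition, and the overflow test mirrors the usual short compare-and-branch sequence for signed addition.

```python
XLEN = 64                      # assumed register width
MASK = (1 << XLEN) - 1
INT_MIN = -(1 << (XLEN - 1))


def to_signed(x):
    """Wrap a Python int to a signed XLEN-bit value."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x


def rv_div(a, b):
    """Signed divide without a trap: the result itself signals the special cases."""
    if b == 0:
        return -1              # all bits set; software can test the divisor instead
    if a == INT_MIN and b == -1:
        return a               # signed overflow case returns the dividend
    q = abs(a) // abs(b)       # round toward zero
    return -q if (a < 0) != (b < 0) else q


def add_overflows(a, b):
    """Signed add overflow check: compare the sign of b with (sum < a)."""
    s = to_signed(a + b)
    return (b < 0) != (s < a)
```

Both checks depend only on architectural register values, which is the distinction drawn in the quoted objection.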
> vl=0 is more related to load-reserved/store-conditional failure, where
> we need to add implementation constraints to guarantee forward
> progress.
Ok. I can see providing guidance as to when vl=0 is allowed, but not to
exclude it outright.