Since you get to pick where vl is trimmed, you would probably choose the
vl=3 case here to simplify the implementation.
Krste
On Fri, 16 Oct 2020 18:59:55 +0200, Roger Espasa <roger.espasa@...> said:
| Bill, you said element 9, but did you mean the element labeled "a", which is the 11th element in the vector? (I agree with that.)
| However, I would NOT agree that a masked-out element has been written, even if it is past the failing point.
| roger.
| On Fri, Oct 16, 2020 at 6:57 PM Roger Espasa <roger.espasa@...> wrote:
| Here's where the "implementation" cost comes in (at least in our implementation; others, of course, may have more clever ways of doing this):
| - If you pick "vl=3", then the vstart and vltrim calculations can be made one and the same.
| - If you pick "vl=6", then the vstart and vltrim calculations are not exactly equal, and vltrim needs a leading-zero count (LZC) on the mask for the elements within the line,
|   followed by an adder. At SEW=8b, there can be lots of elements within a line...
| roger.
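A minimal Python model of the two trimming options Roger describes. The picture is not preserved, so this assumes a hypothetical layout: elements 0-2 active, the faulting line starting at element 3, elements 3-5 masked off, and element 6 the first active element in that line. Mask bit i stands for element i here, so the software zero count is a trailing-zero count; with the opposite bit order in a datapath it becomes the LZC Roger mentions. The function names and the elems_per_line parameter are illustrative only.

```python
from __future__ import annotations

from typing import Optional


def vl_trim_line_start(line_start: int) -> int:
    """'vl=3' option: trim vl to the first element of the faulting line.

    Per the thread, this lets the vstart and vl-trim calculations be shared.
    """
    return line_start


def vl_trim_first_active(mask: int, line_start: int, elems_per_line: int) -> Optional[int]:
    """'vl=6' option: trim vl to the first *active* element of the faulting line.

    Counts the masked-off elements at the start of the line (the zero count)
    and adds that count to line_start (the adder). Returns None when no
    element of the line is active.
    """
    # Mask bits for the elements inside the faulting line; bit 0 = line_start.
    window = (mask >> line_start) & ((1 << elems_per_line) - 1)
    if window == 0:
        return None
    # Isolate the lowest set bit and take its index: a trailing-zero count.
    skipped = (window & -window).bit_length() - 1
    return line_start + skipped


# Hypothetical mask: elements 0-2 and 6-10 active, elements 3-5 masked off.
mask = 0b11111000111
print(vl_trim_line_start(3))             # 3
print(vl_trim_first_active(mask, 3, 8))  # 6
```

Both results are permissible if vl may be reduced to any non-zero value; the second needs the zero count plus the adder, and the width of that count grows with the number of elements per line.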
| On Fri, Oct 16, 2020 at 6:31 PM Bill Huffman <huffman@...> wrote:
| The way the discussion has been going, I think either would be permissible. Not only that, but it would have been permissible for element 9 already
| to have been overwritten with 1's (if vma allows it).
| I think bringing this up is good, as we need to be sure precisely what we mean by the v*ff instructions.
| Bill
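A small sketch of the mask-agnostic (vma) policy Bill refers to, modelling only body elements below vl: an active element receives the loaded value, while a masked-off element must keep its old value under mask-undisturbed, or may either keep it or become all 1s under mask-agnostic. The function name and the list-of-sets representation are hypothetical, and whether a masked-off element past the fault point counts as "written" is exactly what Roger disputes; the sketch does not settle that.

```python
from __future__ import annotations


def permitted_dest(old: list[int], loaded: list[int], mask: int,
                   vl: int, sew: int, vma: bool) -> list[set[int]]:
    """Return, for each body element i < vl, the set of values it may hold.

    Active elements get the loaded value. Masked-off elements keep their old
    value (mask-undisturbed) or, under mask-agnostic, may instead be all 1s.
    """
    all_ones = (1 << sew) - 1
    result = []
    for i in range(vl):
        if (mask >> i) & 1:
            result.append({loaded[i]})
        elif vma:
            result.append({old[i], all_ones})
        else:
            result.append({old[i]})
    return result


# SEW=8, element 1 masked off: under vma it may stay 2 or become 255.
print(permitted_dest([1, 2, 3], [10, 20, 30], 0b101, 3, 8, vma=True))
```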
| On 10/16/20 8:57 AM, Roger Espasa wrote:
| Here's a question for the group: I did it as a picture... hopefully it will get through the mailing list:
| [attached picture: image.png]
| On Fri, Oct 16, 2020 at 4:56 PM David Horner <ds2horner@...> wrote:
| On 2020-10-16 10:30 a.m., krste@... wrote:
||
||||||| On Fri, 16 Oct 2020 07:48:00 -0400, "David Horner" <ds2horner@...> said:
|| | First, I am very happy that "arbitrary decisions by the
|| | micro-architecture" allow reduction of vl to any [non-zero] value.
||
|| | Even if such appear "random".
|| [...]
|| | A check for vl=0 on platforms that allow it is eminently doable, is low
|| | overhead for many use cases, AND guarantees forward progress under
|| | SOFTWARE control.
||
|| If we allowed implementation to return vl=0, how does software
|| guarantee forward progress?
| The forward progress is to advance to another task.
| In the case of machine mode, it can potentially "resolve" the cause of
| the vl=0 return and re-execute the loop (without the overhead of the trap).
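A sketch of the software check David describes, with a hypothetical load_ff(start, avl) callable standing in for a v*ff instruction that, under his proposal, might return vl=0. On vl=0 the loop hands back its progress so the caller can switch to another task (or, in machine mode, resolve the cause) and resume later. This models the proposal being debated, not behaviour the group has agreed to allow; flaky_load_ff is likewise invented for illustration.

```python
from __future__ import annotations

from typing import Callable


def stripmine(n: int, load_ff: Callable[[int, int], int], start: int = 0) -> tuple[int, bool]:
    """Process elements start..n-1 in chunks sized by a fault-only-first load.

    load_ff(start, avl) is a hypothetical stand-in for a v*ff instruction; it
    returns the vl it actually processed. Returns (elements_done, finished).
    """
    done = start
    while done < n:
        vl = load_ff(done, n - done)
        if vl == 0:
            # Software-visible "no progress": hand control back so the caller
            # can run another task, or resolve the cause, then resume later.
            return done, False
        done += vl
    return done, True


def flaky_load_ff(start: int, avl: int) -> int:
    # Pretend the hardware returns vl=0 when it reaches element 8.
    return 0 if start == 8 else min(avl, 4)


print(stripmine(10, flaky_load_ff))              # (8, False)
print(stripmine(10, lambda s, a: min(a, 4), 8))  # (10, True)
```

Resuming from the returned position is what makes progress a software responsibility in this scheme, which is the point Krste questions next.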
||
|| | I see it as no different [in fundamental principle] from other cases,
|| | such as RVI integer divide-by-zero behaviour, which does not trap but can
|| | be readily checked for.
|| | Likewise RVI integer overflow: if you want to check for it, the check is at
|| | most a few instructions, including the branch.
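For reference, the RVI behaviour David cites, modelled in Python for XLEN=64: division by zero does not trap (DIVU returns all ones, DIV returns -1), the signed-overflow case returns the dividend, and signed-add overflow is detected with the add/slti/slt/bne sequence the RISC-V unprivileged spec gives as an example. The helper names are illustrative.

```python
XLEN = 64
MASK = (1 << XLEN) - 1
INT_MIN = -(1 << (XLEN - 1))


def to_signed(x: int) -> int:
    """Interpret the low XLEN bits of x as a two's-complement integer."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x


def divu(a: int, b: int) -> int:
    a, b = a & MASK, b & MASK
    if b == 0:
        return MASK          # divide by zero: all ones, no trap
    return a // b


def div(a: int, b: int) -> int:
    a, b = to_signed(a), to_signed(b)
    if b == 0:
        return -1            # divide by zero: -1, no trap
    if a == INT_MIN and b == -1:
        return a             # signed overflow: quotient is the dividend
    q = abs(a) // abs(b)     # RISC-V division rounds toward zero
    return q if (a < 0) == (b < 0) else -q


def add_overflows(t1: int, t2: int) -> bool:
    """Model of: add t0,t1,t2; slti t3,t2,0; slt t4,t0,t1; bne t3,t4,overflow."""
    t1, t2 = to_signed(t1), to_signed(t2)
    t0 = to_signed(t1 + t2)  # add wraps modulo 2**XLEN
    t3 = t2 < 0              # slti t3, t2, 0
    t4 = t0 < t1             # slt  t4, t0, t1
    return t3 != t4          # bne  t3, t4, overflow


print(divu(7, 0) == MASK, div(7, 0), div(-7, 2))         # True -1 -3
print(add_overflows(INT_MIN, -1), add_overflows(5, -3))  # True False
```

In each case the result depends only on architectural register values, so the check is deterministic; that is the distinction Krste draws below.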
||
|| I don't see how these examples relate to returning vl=0 on some
|| microarchitectural event. The examples here have results that depend
|| only on architectural values, so can be deterministically handled.
| The similarity is the avoidance of trap handling when it is sufficient
| to check register state instead.
||
|| vl=0 is more related to load-reserved/store-conditional failure, where
|| we need to add implementation constraints to guarantee forward
|| progress.
| Ok. I can see providing guidance as to when vl=0 is allowed, but not
| excluding it outright.
|| Krste
|
[DELETED ATTACHMENT image.png, PNG image]