Re: [RISC-V] [tech-cmo] Fault-on-first should be allowed to return randomly on non-faults (also, running SIMT code on vector ISA)

Roger Espasa

We're all in agreement that if the spec says "pick where you stop", we'd all choose to trim to VL=3. I was under the impression this was not yet closed (in light of the "stop at cache misses" discussion), but I sense everyone else is already in the "pick where you stop" camp.

Speaking of which, did we ever close on whether vleff could trim even when there was no fault (i.e., just because there's a cache miss for example)?
If the answer is "yes, you can arbitrarily stop on any element other than element 0", can someone show a while loop and how the compiler would then use vleff? I don't see how compilers would use it, other than by enclosing the vleff loop in a second loop that makes sure "the index variable has reached the limit" (i.e., for i<n, that vleff has run enough times for i to reach n).
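A minimal C sketch of the kind of while loop asked about above, under assumptions that are not from the thread: `vle8ff_model`, `strlen_ff`, `VLMAX` and `trim_at` are hypothetical names, and the load is modeled in plain C rather than with real RVV intrinsics. Bytes past `mapped` "fault", a fault on element 0 would trap, and `trim_at` stands in for an arbitrary non-fault trim. In this model the loop only ever advances by the vl each load returns, so an arbitrary trim costs extra iterations but no second enclosing loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLMAX 16 /* hypothetical vector length, in 8-bit elements */

/* Hypothetical model of vle8ff.v. Bytes base[0 .. mapped-1] are readable;
   any element past that "faults". A fault on element 0 would trap (modeled
   by assert); a fault on a later element trims vl. trim_at (0 = off) stands
   in for an arbitrary non-fault trim, e.g. on a cache miss; it never trims
   below 1 element. */
static size_t vle8ff_model(uint8_t *dst, const uint8_t *base, size_t mapped,
                           size_t off, size_t avl, size_t trim_at)
{
    size_t vl = avl < VLMAX ? avl : VLMAX;
    assert(off < mapped);            /* element 0 must not fault */
    if (off + vl > mapped)
        vl = mapped - off;           /* fault on a later element: trim */
    if (trim_at >= 1 && trim_at < vl)
        vl = trim_at;                /* arbitrary trim, at least 1 element */
    for (size_t i = 0; i < vl; i++)
        dst[i] = base[off + i];
    return vl;
}

/* strlen-style while loop: it advances only by the vl actually returned,
   so wherever the load stops, the next iteration resumes from there. */
static size_t strlen_ff(const char *s, size_t mapped, size_t trim_at)
{
    const uint8_t *p = (const uint8_t *)s;
    uint8_t buf[VLMAX];
    size_t len = 0;
    for (;;) {
        size_t vl = vle8ff_model(buf, p, mapped, len, VLMAX, trim_at);
        for (size_t i = 0; i < vl; i++)  /* stands in for vmseq.vi + vfirst.m */
            if (buf[i] == 0)
                return len + i;
        len += vl;                       /* advance by the returned vl */
    }
}
```

For example, with `static const char msg[] = "fault-only-first loads";` (22 characters), `strlen_ff(msg, sizeof msg, 0)` and `strlen_ff(msg, sizeof msg, 3)` both return 22; the trimmed run just takes more iterations. A counted `i < n` loop has the same shape, passing `n - i` as the AVL instead of `VLMAX`.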


On Fri, Oct 16, 2020 at 7:33 PM <krste@...> wrote:

As you get to pick where vl is trimmed, you would probably choose the
vl=3 case here to simplify implementation.


>>>>> On Fri, 16 Oct 2020 18:59:55 +0200, Roger Espasa <roger.espasa@...> said:

| Bill, you said element 9, but did you mean the element labeled "a", which is the 11th element in the vector? (I agree with that.)
| However, I would NOT agree that a masked-out element has been written, even if it is past the failing point.

| roger.

| On Fri, Oct 16, 2020 at 6:57 PM Roger Espasa <roger.espasa@...> wrote:

|     Here's where the "implementation" cost comes in (at least in our implementation; others, of course, may have cleverer ways of doing this):

|     - If you pick "vl=3", then the vstart and vltrim calculations can be made one and the same.
|     - If you pick "vl=6", then the vstart and vltrim calculations are not exactly equal, and vltrim needs a leading-zero count (LZC) on the mask bits for the elements within the line,
|       followed by an adder. At SEW=8b, there can be lots of elements within a line...
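To make the difference between the two trim points concrete, here is a minimal C sketch under assumptions that are not from the thread (the original picture is not preserved): bit i of `line_mask` is the mask bit of element `line_first + i`, the helper names are hypothetical, and the line is assumed to be 64 bytes, so at SEW=8b it holds 64 elements and fits a `uint64_t` mask.

```c
#include <assert.h>
#include <stdint.h>

/* Number of inactive elements before the first active one in the line,
   assuming bit i of line_mask belongs to element line_first + i.
   Hardware would do this with one LZC (or trailing-zero count, depending
   on bit order); the loop is just a portable stand-in. */
static unsigned count_leading_inactive(uint64_t line_mask)
{
    unsigned n = 0;
    while (n < 64 && ((line_mask >> n) & 1u) == 0)
        n++;
    return n;
}

/* "vl=3" style: trim at the first element of the faulting line, the same
   index that would be reported in vstart on a trap. No extra logic. */
static unsigned vltrim_at_line(unsigned line_first)
{
    return line_first;
}

/* "vl=6" style: trim at the first *active* element of the faulting line,
   which needs the zero count followed by an adder. */
static unsigned vltrim_at_active(unsigned line_first, uint64_t line_mask)
{
    assert(line_mask != 0); /* the line faulted, so some element in it is active */
    return line_first + count_leading_inactive(line_mask);
}
```

For example, if the faulting line started at element 3 and elements 3 to 5 were masked off (`line_mask = 0x8`), `vltrim_at_line(3)` gives 3 while `vltrim_at_active(3, 0x8)` gives 6.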

|     roger.

|     On Fri, Oct 16, 2020 at 6:31 PM Bill Huffman <huffman@...> wrote:

|         The way the discussion has been going, I think either would be permissible. Not only that, but it would also have been permissible for element 9
|         to have already been overwritten with 1s (if vma allows it).
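Bill's remark about vma refers to the mask-agnostic policy in the vector spec: with vma=1, inactive destination elements may either keep their old values or be overwritten with all 1s. A minimal C sketch of that choice, using a hypothetical `write_masked_dest` helper in which the hardware's freedom is modeled by the `fill_ones` flag:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of how a masked load updates its destination.
   Active elements receive the loaded value. Inactive (masked-off) elements
   stay undisturbed when vma=0; when vma=1 (mask-agnostic) the hardware may
   leave them undisturbed or write all 1s, chosen here by fill_ones. */
static void write_masked_dest(uint8_t *vd, const uint8_t *loaded,
                              const bool *mask, size_t vl,
                              bool vma, bool fill_ones)
{
    for (size_t i = 0; i < vl; i++) {
        if (mask[i])
            vd[i] = loaded[i];
        else if (vma && fill_ones)
            vd[i] = 0xFF;   /* agnostic: all 1s is a legal result */
        /* otherwise undisturbed: vd[i] keeps its old value */
    }
}
```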

|         I think bringing this up is good as we need to be sure what precisely we mean by the v*ff instructions.

|               Bill

|         On 10/16/20 8:57 AM, Roger Espasa wrote:


|             Here's a question for the group: I did it as a picture; hopefully it will go through the mailing list:

|             [Attached image: image.png (not preserved in this archive)]

|             On Fri, Oct 16, 2020 at 4:56 PM David Horner <ds2horner@...> wrote:

|                 On 2020-10-16 10:30 a.m., krste@... wrote:
|| >>>>> On Fri, 16 Oct 2020 07:48:00 -0400, "David Horner" <ds2horner@...> said:
|| | First, I am very happy that "arbitrary decisions by the
|| | micro-architecture" allow reduction of vl to any [non-zero] value,
|| | even if such reductions appear "random".
|| [...]
|| | A check for vl=0 on platforms that allow it is eminently doable, has low
|| | overhead for many use cases, AND guarantees forward progress under
|| | SOFTWARE control.
|| If we allowed an implementation to return vl=0, how does software
|| guarantee forward progress?

|                 Forward progress comes from advancing to another task.

|                 In the case of machine mode it can potentially "resolve" the cause of
|                 the vl=0 return and re-execute the loop (without the overhead of the trap).
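David's proposal can be sketched as a strip-mined loop in plain C. Everything here is hypothetical: `ff_load_fn` models a fault-only-first load that may return vl = 0 (which is exactly the behaviour under debate), the two loaders are illustrative extremes, and `scalar_fallback` stands in for whatever software does on vl = 0, such as a scalar access that is allowed to trap or a yield to another task.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 8 /* hypothetical vector length, in 32-bit elements */

/* Hypothetical fault-only-first load that may return vl = 0. */
typedef size_t (*ff_load_fn)(int32_t *dst, const int32_t *src, size_t avl);

/* Illustrative extremes: always load everything, or always return 0. */
static size_t ff_load_full(int32_t *dst, const int32_t *src, size_t avl)
{
    memcpy(dst, src, avl * sizeof *src);
    return avl;
}

static size_t ff_load_zero(int32_t *dst, const int32_t *src, size_t avl)
{
    (void)dst; (void)src; (void)avl;
    return 0;
}

/* Hypothetical fallback: a scalar access that may trap, or a yield.
   Here it simply processes one element, so it always makes progress. */
static size_t scalar_fallback(int32_t *dst, const int32_t *src)
{
    dst[0] = src[0];
    return 1;
}

static int64_t sum_with_vl0_check(const int32_t *src, size_t n, ff_load_fn ff_load)
{
    int32_t buf[CHUNK];
    int64_t sum = 0;
    size_t i = 0;
    while (i < n) {
        size_t avl = n - i < CHUNK ? n - i : CHUNK;
        size_t vl = ff_load(buf, src + i, avl);
        if (vl == 0)                            /* the software check */
            vl = scalar_fallback(buf, src + i); /* guarantees progress */
        for (size_t k = 0; k < vl; k++)
            sum += buf[k];
        i += vl;
    }
    return sum;
}
```

The added cost is one compare and branch per iteration. Krste's concern is visible in the sketch as well: forward progress rests entirely on the fallback path consuming at least one element.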

|| | I see it as no different [in fundamental principle] from other cases,
|| | such as RVI integer divide-by-zero behaviour, which does not trap but can
|| | be readily checked for.
|| | Likewise, checking for RVI integer overflow, if you want to, takes at most a
|| | few instructions, including the branch.
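For reference, the RVI behaviours David mentions can be modeled in C. The RISC-V M extension defines division by zero to return a quotient with all bits set and a remainder equal to the dividend, and signed overflow (most-negative value divided by -1) to return the dividend as the quotient and 0 as the remainder. The add-overflow test mirrors the `add`/`slti`/`slt`/`bne` sequence suggested in the unprivileged spec. The function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* RV32 DIV: never traps; special cases give fixed, checkable results. */
static int32_t rv_div(int32_t a, int32_t b)
{
    if (b == 0)
        return -1;                      /* all bits set */
    if (a == INT32_MIN && b == -1)
        return INT32_MIN;               /* overflow: quotient = dividend */
    return a / b;                       /* C truncates toward zero, as RISC-V does */
}

/* RV32 REM: remainder takes the sign of the dividend, as in C. */
static int32_t rv_rem(int32_t a, int32_t b)
{
    if (b == 0)
        return a;                       /* remainder = dividend */
    if (a == INT32_MIN && b == -1)
        return 0;
    return a % b;
}

/* Signed add overflow check: overflow iff (b < 0) != (sum < a),
   where sum is the wrapped result, as ADD produces. */
static bool rv_add_overflows(int32_t a, int32_t b, int32_t *sum)
{
    /* wrap via unsigned arithmetic; the cast back wraps on GCC and Clang */
    *sum = (int32_t)((uint32_t)a + (uint32_t)b);
    return (b < 0) != (*sum < a);
}
```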
|| I don't see how these examples relate to returning vl=0 on some
|| microarchitectural event.  The examples here have results that depend
|| only on architectural values, so can be deterministically handled.
|                 The similarity is the avoidance of trap handling when it is sufficient
|                 to check register state instead.
|| vl=0 is more related to load-reserved/store-conditional failure, where
|| we need to add implementation constraints to guarantee forward
|| progress.
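The LR/SC analogy has a close counterpart in portable C: `atomic_compare_exchange_weak` is allowed to fail spuriously, and on RISC-V with the A extension compilers typically lower it to an LR/SC sequence. Software tolerates the spurious failure with a retry loop, while the ISA's rules for constrained LR/SC loops are what guarantee that such a loop eventually succeeds. A minimal sketch (the function name is illustrative):

```c
#include <stdatomic.h>

/* Atomic fetch-and-add built from a weak CAS. A spurious failure (like an
   SC failing on some microarchitectural event) just causes a retry. */
static int fetch_add_via_cas(atomic_int *obj, int delta)
{
    int old = atomic_load(obj);
    while (!atomic_compare_exchange_weak(obj, &old, old + delta))
        ; /* on failure, old is refreshed with the current value */
    return old;
}
```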

|                 Ok. I can see providing guidance as to when vl=0 is allowed, but not
|                 excluding it outright.

|| Krste
