Re: [RISC-V] [tech-cmo] Fault-on-first should be allowed to return randomly on non-faults (also, running SIMT code on vector ISA)


Krste Asanovic
 

As you get to pick where vl is trimmed, you would probably choose the
vl=3 case here to simplify implementation.

Krste

On Fri, 16 Oct 2020 18:59:55 +0200, Roger Espasa <roger.espasa@semidynamics.com> said:
| Bill, you said element 9, but did you mean the element labeled "a", which is the 11th element in the vector? (I agree with that.)
| However, I would NOT agree that a masked-out element has been written, even if it is past the failing point.

| roger.

| On Fri, Oct 16, 2020 at 6:57 PM Roger Espasa <roger.espasa@semidynamics.com> wrote:

| Here's where the "implementation" cost comes in (at least in our implementation; others, of course, may have more clever ways of doing this)

| - If you pick "vl=3", then the vstart and vltrim calculations can be made one and the same.
| - If you pick "vl=6", then the vstart and vltrim calculations are not exactly equal, and vltrim needs an LZC on the mask for the elements within the line,
|   followed by an adder. At SEW=8b, there can be lots of elements within a line...

| roger.
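
The picture is not preserved in the archive, so the following is only a sketch of one reading of the two choices above: "vl=3" trims at the first element of the faulting line, while "vl=6" trims at the first active (unmasked) element inside that line. The mask layout (bit i set means element i is active), the function names, and the parameters `line_start` and `line_elems` are all assumptions made for illustration, not anything taken from an actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model: bit i of `mask` is 1 when element i is active.
 * `line_start` is the index of the first element that falls in the
 * faulting line; `line_elems` is how many elements fall in that line. */

/* "vl=3" style: trim at the line boundary. No mask inspection needed,
 * so the same value can serve both the vstart and the vl-trim paths. */
static unsigned vl_trim_at_line(unsigned line_start)
{
    return line_start;
}

/* "vl=6" style: trim at the first active element within the line.
 * This needs a zero count over the mask bits of the line, then an add. */
static unsigned vl_trim_at_first_active(uint64_t mask, unsigned line_start,
                                        unsigned line_elems)
{
    assert(line_start + line_elems <= 64); /* sketch handles 64 elements */
    unsigned skipped = 0;
    while (skipped < line_elems &&
           ((mask >> (line_start + skipped)) & 1u) == 0) {
        skipped++; /* inactive element: it cannot be the faulting one */
    }
    /* If every element in the line is inactive, this returns the index
     * just past the line, i.e. no active element touched this line. */
    return line_start + skipped;
}
```

With elements 3, 4 and 5 masked off and the faulting line starting at element 3, the first function gives 3 and the second gives 6. The loop in the second function stands in for the zero-count-plus-adder hardware, and its cost grows with the number of elements per line, which is largest at SEW=8b.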

| On Fri, Oct 16, 2020 at 6:31 PM Bill Huffman <huffman@cadence.com> wrote:

| The way the discussion has been going, I think either would be permissible.  Not only that, but it would have been permissible for element 9 already
| to have been overwritten with 1's (if vma allows it).

| I think bringing this up is good as we need to be sure what precisely we mean by the v*ff instructions.

|       Bill

| On 10/16/20 8:57 AM, Roger Espasa wrote:

| Here's a question for the group: I did it as a picture... hopefully it will go through the mailing list:

| image.png

| On Fri, Oct 16, 2020 at 4:56 PM David Horner <ds2horner@gmail.com> wrote:

| On 2020-10-16 10:30 a.m., krste@berkeley.edu wrote:
||
|| On Fri, 16 Oct 2020 07:48:00 -0400, "David Horner" <ds2horner@gmail.com> said:
|| | First I am very happy that "arbitrary decisions by the
|| | micro-architecture" allow reduction of vl to any [non-zero] value.
||
|| | Even if such appear "random".
|| [...]
|| | A check for vl=0 on platforms that allow it is eminently doable, low
|| | overhead for many use cases  AND guarantees forward progress under
|| | SOFTWARE control.
||
|| If we allowed implementation to return vl=0, how does software
|| guarantee forward progress?

| The forward progress is to advance to another task.

| In the case of machine mode it can potentially "resolve" the cause of
| the vl=0 return and re-execute the loop (without the overhead of the trap).
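
To make the exchange above concrete, here is a minimal software model of a strip-mined loop that checks for vl=0. It is a sketch only: `ff_load_fn` is a hypothetical stand-in for a fault-on-first load that is allowed to return zero elements, and `max_zero_retries` and `toy_load` are invented for the illustration. It shows both sides of the argument: the check itself is cheap, but the loop only terminates with all data processed if the loader eventually returns a non-zero vl.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fault-on-first load: delivers up to `avl` bytes and
 * returns how many were actually delivered (the new vl). In this
 * model the return value is allowed to be 0. */
typedef size_t (*ff_load_fn)(uint8_t *dst, const uint8_t *src,
                             size_t avl, void *ctx);

/* Copies up to n bytes. Gives up once vl == 0 has been seen more than
 * `max_zero_retries` times in a row, returning the count copied so far
 * so the caller can switch to another task or resolve the cause. */
size_t copy_with_vl0_check(uint8_t *dst, const uint8_t *src, size_t n,
                           ff_load_fn load, void *ctx,
                           unsigned max_zero_retries)
{
    size_t done = 0;
    unsigned zeros = 0;
    while (done < n) {
        size_t vl = load(dst + done, src + done, n - done, ctx);
        if (vl == 0) {
            if (++zeros > max_zero_retries)
                break;          /* no progress: hand control back */
            continue;           /* retry the same strip */
        }
        zeros = 0;
        done += vl;             /* normal strip-mining advance */
    }
    return done;
}

/* Toy loader for experimentation: returns 0 on every other call,
 * otherwise delivers at most 4 bytes. `ctx` points to a call counter. */
size_t toy_load(uint8_t *dst, const uint8_t *src, size_t avl, void *ctx)
{
    unsigned *calls = ctx;
    if ((*calls)++ % 2 == 0)
        return 0;
    size_t vl = avl < 4 ? avl : 4;
    memcpy(dst, src, vl);
    return vl;
}
```

With `toy_load` and one permitted retry the copy completes; with zero permitted retries it stops immediately. Without some bound on how often the loader may return 0, nothing in the loop itself guarantees completion.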

||
|| | I see it as no different [in fundamental principle] than other cases
|| | such as RVI integer divide by zero behaviour that does not trap but can
|| | be  readily checked for.
|| | Also RVI integer overflow, which, if you want to check for it, takes at
|| | most a few instructions including the branch.
||
|| I don't see how these examples relate to returning vl=0 on some
|| microarchitectural event.  The examples here have results that depend
|| only on architectural values, so can be deterministically handled.
| The similarity is the avoidance of trap handling, when it is sufficient
| to check register state instead.
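
As an illustration of the "check state instead of trapping" pattern discussed here, the following C sketch shows explicit checks for divide-by-zero and signed-add overflow. The function names are made up for the example. The values stored on the failure paths mirror what the RISC-V integer divide instruction returns (all ones for division by zero, the dividend for the overflow case); in C those divisions are undefined, so the check has to come first. Both checks depend only on the operand values, which is the point Krste makes about deterministic handling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a / b is well defined and stores it in *q.
 * On failure, stores the value a RISC-V DIV would produce. */
bool checked_div(int32_t a, int32_t b, int32_t *q)
{
    if (b == 0) {
        *q = -1;            /* divide by zero: all bits set */
        return false;
    }
    if (a == INT32_MIN && b == -1) {
        *q = INT32_MIN;     /* signed overflow: result is the dividend */
        return false;
    }
    *q = a / b;
    return true;
}

/* Returns true if a + b did not overflow; *sum gets the wrapped result. */
bool checked_add(int32_t a, int32_t b, int32_t *sum)
{
    uint32_t s = (uint32_t)a + (uint32_t)b;  /* unsigned wrap is defined */
    /* Conversion back is implementation-defined before C23; common
     * compilers use two's-complement wraparound. */
    *sum = (int32_t)s;
    /* Overflow iff both operands differ in sign from the result. */
    return ((((uint32_t)a ^ s) & ((uint32_t)b ^ s)) >> 31) == 0;
}
```
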
||
|| vl=0 is more related to load-reserved/store-conditional failure, where
|| we need to add implementation constraints to guarantee forward
|| progress.
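
For readers less familiar with the load-reserved/store-conditional analogy, a rough C11 sketch of the usual retry loop follows. It uses `atomic_compare_exchange_weak`, which is allowed to fail spuriously in much the same way a store-conditional can; the function name `fetch_increment` is just for illustration. The loop is only guaranteed to finish because the platform promises that an attempt will eventually succeed, which is the kind of implementation constraint being referred to.

```c
#include <stdatomic.h>

/* Atomically increments *p and returns the previous value.
 * The weak compare-exchange may fail even when *p == old, so the
 * code must loop; termination relies on the implementation's
 * forward-progress guarantee, not on anything in this function. */
int fetch_increment(atomic_int *p)
{
    int old = atomic_load(p);
    while (!atomic_compare_exchange_weak(p, &old, old + 1)) {
        /* On failure `old` has been reloaded with the current value;
         * just try again. */
    }
    return old;
}
```

A loop that retries on vl=0 has the same shape: the software side is trivial, and everything rests on what the hardware guarantees about eventual success.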

| Ok. I can see providing guidance as to when vl=0 is allowed, but not to
| exclude it outright.

|| Krste

|
| [DELETED ATTACHMENT image.png, PNG image]
