Re: [RISC-V] [tech-cmo] Fault-on-first should be allowed to return randomly on non-faults (also, running SIMT code on vector ISA)

Krste Asanovic

Here's the strlen example from the spec:

.balign 4
.global strlen
# size_t strlen(const char *str)
# a0 holds *str

strlen:
mv a3, a0 # Save start
loop:
vsetvli a1, x0, e8,m8, ta,ma # Vector of bytes of maximum length
vle8ff.v v8, (a3) # Load bytes
csrr a1, vl # Get bytes read
vmseq.vi v0, v8, 0 # Set v0[i] where v8[i] = 0
vfirst.m a2, v0 # Find first set bit
add a3, a3, a1 # Bump pointer
bltz a2, loop # Not found?

add a0, a0, a1 # Sum start + bump
add a3, a3, a2 # Add index
sub a0, a3, a0 # Subtract start address+bump

ret


This exits when vfirst.m returns a non-negative value (i.e., something triggered
the exit condition). The vfirst.m instruction can early-out once the exit is
found (though vle8ff/vmseq will still have to run to completion).
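
To make the point about trimming concrete, here is a rough scalar model of the loop above in C. It is only a sketch: VLMAX, the trim_fn policies, the `limit` pointer standing in for an unmapped page, and the model_* names are all assumptions made up for illustration, not anything from the spec. The loop returns the same length whether the load trims vl on a fault, at an arbitrary point, or not at all, as long as at least one element is returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLMAX 16 /* assumed number of byte elements per vector group */

/* Hypothetical trim policy: maps the untrimmed vl to any value in [1, vl]. */
typedef size_t (*trim_fn)(size_t vl);

static size_t trim_none(size_t vl) { return vl; }            /* never trims early */
static size_t trim_half(size_t vl) { return (vl + 1) / 2; }  /* stops halfway */
static size_t trim_to_one(size_t vl) { (void)vl; return 1; } /* one element only */

/* Scalar model of vle8ff.v. `limit` is one past the last readable byte and
 * stands in for an unmapped page. Returns the (possibly trimmed) vl. */
static size_t model_vle8ff(uint8_t *v8, const uint8_t *src,
                           const uint8_t *limit, trim_fn trim)
{
    size_t vl = VLMAX;
    assert(src < limit);             /* element 0 must load, or the real op traps */
    if ((size_t)(limit - src) < vl)
        vl = (size_t)(limit - src);  /* trimmed because a later element would fault */
    vl = trim(vl);                   /* trimmed again for no architectural reason */
    assert(vl >= 1);
    for (size_t i = 0; i < vl; i++)
        v8[i] = src[i];
    return vl;
}

/* Follows the same steps as the assembly: load, compare with 0, find first,
 * bump pointer. Register names are reused as variable names. */
static size_t model_strlen(const char *str, const char *limit, trim_fn trim)
{
    const uint8_t *start = (const uint8_t *)str;
    const uint8_t *a3 = start;
    uint8_t v8[VLMAX];

    for (;;) {
        size_t a1 = model_vle8ff(v8, a3, (const uint8_t *)limit, trim);
        long a2 = -1;                        /* vmseq.vi + vfirst.m */
        for (size_t i = 0; i < a1; i++) {
            if (v8[i] == 0) { a2 = (long)i; break; }
        }
        if (a2 >= 0)                         /* exit is decided by data, not by vl */
            return (size_t)(a3 - start) + (size_t)a2;
        a3 += a1;                            /* add a3, a3, a1 */
    }
}
```

Calling model_strlen on the same buffer with trim_none, trim_half and trim_to_one gives the same answer; only the number of iterations changes.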


On Fri, 16 Oct 2020 20:04:15 +0200, Roger Espasa <> said:
| So all the vleff use cases end up then using a vmpopc of some sort to determine the exit condition and never use the trimmed VL ? (other than, of course, to
| control within the while how many elements should be operated upon). Do the compiler folks on the list agree that's the only use of vleff?

| roger.

| On Fri, Oct 16, 2020 at 7:53 PM Bill Huffman <> wrote:

| I don't think the cases where there was no fault look any different to software than the fault cases.  Either can happen anywhere and the while loop may
| continue.  The while loop isn't ended by a trimmed vl, it's ended by data it sees.

|       Bill

| On 10/16/20 10:47 AM, Roger Espasa wrote:


| We're all in agreement that if the spec says "pick where you stop" we'd all pick to trim to VL=3. I was under the impression this was not yet closed
| (in light of the "stop at cache misses" discussion), but I sense everyone else is already in the "pick where you stop" camp.

| Speaking of which, did we ever close on whether vleff could trim even when there was no fault (i.e., just because there's a cache miss for example)?
| If the answer is "yes, you can arbitrarily stop on any element other than element 0", can someone show a while loop and how the compiler would then
| use vleff? I'm not seeing how they would use it, other than enclose the vleff loop in a second loop to make sure that "the index variable has reached
| the limit" (i.e., i<n, make sure that vleff has run enough times so that i has reached n).

| roger.
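
For the counted case (i<n) asked about above, one possible shape is sketched below as a scalar C model. The names find_byte, model_vsetvli, model_vle8ff and the trim policies are hypothetical, and VLMAX is an assumed value. The sketch suggests that the ordinary strip-mine loop is enough: each pass requests min(n - i, VLMAX) elements and advances i by whatever vl comes back, so no second enclosing loop is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLMAX 8 /* assumed number of byte elements per vector group */

/* Hypothetical trim policy: maps the requested vl to any value in [1, vl]. */
typedef size_t (*trim_fn)(size_t vl);

static size_t trim_none(size_t vl) { return vl; }
static size_t trim_to_one(size_t vl) { (void)vl; return 1; }

/* Model of vsetvli: vl = min(AVL, VLMAX). */
static size_t model_vsetvli(size_t avl) { return avl < VLMAX ? avl : VLMAX; }

/* Model of vle8ff.v with no faults: loads `vl` bytes, then reports a
 * possibly smaller vl chosen by the policy. */
static size_t model_vle8ff(uint8_t *v, const uint8_t *src, size_t vl, trim_fn trim)
{
    size_t got = trim(vl);
    assert(got >= 1 && got <= vl);   /* vl=0 is not allowed in this model */
    for (size_t k = 0; k < got; k++)
        v[k] = src[k];
    return got;
}

/* Index of the first element equal to `key` in a[0..n), or n if absent.
 * Scalar equivalent: while (i < n && a[i] != key) i++; */
static size_t find_byte(const uint8_t *a, size_t n, uint8_t key, trim_fn trim)
{
    uint8_t v[VLMAX];
    size_t i = 0;

    while (i < n) {
        size_t vl = model_vsetvli(n - i);          /* bounded by remaining count */
        vl = model_vle8ff(v, a + i, vl, trim);     /* may come back shorter */
        for (size_t k = 0; k < vl; k++) {          /* compare + vfirst.m */
            if (v[k] == key)
                return i + k;
        }
        i += vl;                                   /* advance by elements actually seen */
    }
    return n;
}
```

With trim_to_one the loop degenerates to one element per pass but still returns the same index as with trim_none.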

| On Fri, Oct 16, 2020 at 7:33 PM <> wrote:

| As you get to pick where vl is trimmed, you would probably choose the
| vl=3 case here to simplify implementation.

| Krste

|||||| On Fri, 16 Oct 2020 18:59:55 +0200, Roger Espasa <> said:

| | Bill you said element 9, but did you mean element labeled "a" which is the 11th element in the vector? (I agree with that). 
| | However, I would NOT agree that a masked out element has been written, even if past the failing point.

| | roger.

| | On Fri, Oct 16, 2020 at 6:57 PM Roger Espasa <> wrote:

| |     Here's where the "implementation" cost comes in (at least in our implementation; others, of course, may have more clever ways of doing this)

| |     - If you pick "vl=3", then the vstart and vltrim calculations can be made one and the same.
| |     - If you pick "vl=6", then the vstart and vltrim calculations are not exactly equal, and vltrim needs an LZC on the mask for the elements within the line,
| |       followed by an adder. At SEW=8b, there can be lots of elements within a line...

| |     roger.

| |     On Fri, Oct 16, 2020 at 6:31 PM Bill Huffman <> wrote:

| |         The way the discussion has been going, I think either would be permissible.  Not only that, but it would have been permissible for element 9 already
| |         to have been overwritten with 1's (if vma allows it).

| |         I think bringing this up is good as we need to be sure what precisely we mean by the v*ff instructions.

| |               Bill

| |         On 10/16/20 8:57 AM, Roger Espasa wrote:

| |             Here's a question for the group: I did it as a picture... hopefully it will go through the mailing list:

| |             image.png

| |             On Fri, Oct 16, 2020 at 4:56 PM David Horner <> wrote:

| |                 On 2020-10-16 10:30 a.m., wrote:
| ||
| ||||||| On Fri, 16 Oct 2020 07:48:00 -0400, "David Horner" <> said:
| || | First I am very happy that "arbitrary decisions by the
| || | micro-architecture" allow reduction of vl to any [non-zero] value.
| ||
| || | Even if such appear "random".
| || [...]
| || | A check for vl=0 on platforms that allow it is eminently doable, low
| || | overhead for many use cases  AND guarantees forward progress under
| || | SOFTWARE control.
| ||
| || If we allowed implementation to return vl=0, how does software
| || guarantee forward progress?

| |                 The forward progress is to advance to another task.

| |                 In the case of machine mode it can potentially "resolve" the cause of
| |                 the vl=0 return and re-execute the loop (without the overhead of the trap).

| ||
| || | I see it as no different [in fundamental principle] than other cases
| || | such as RVI integer divide by zero behaviour that does not trap but can
| || | be  readily checked for.
| || | Also RVI integer overflow that if you want to check for it is at most a
| || | few instructions including the branch.
| ||
| || I don't see how these examples relate to returning vl=0 on some
| || microarchitectural event.  The examples here have results that depend
| || only on architectural values, so can be deterministically handled.
| |                 The similarity is the avoidance of trap handling, when it is sufficient
| |                 to check register state instead.
| ||
| || vl=0 is more related to load-reserved/store-conditional failure, where
| || we need to add implementation constraints to guarantee forward
| || progress.

| |                 Ok. I can see providing guidance as to when vl=0 is allowed, but not to
| |                 exclude it outright.
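
As a sketch of what a software check for vl=0 could look like, here is a scalar C model under loudly stated assumptions: the load is allowed to report vl=0 (which is exactly what is being debated), MAX_ZERO_RETRIES is a made-up bound, and the policy functions are hypothetical stand-ins for microarchitectural behaviour. After a bounded number of zero-length returns the loop falls back to a plain scalar load of one element, which either makes progress or takes the trap. This is one possible reading of "forward progress under software control", similar in spirit to bounding LR/SC retries; it is not something the spec defines.

```c
#include <stddef.h>
#include <stdint.h>

#define VLMAX 16            /* assumed number of byte elements per vector group */
#define MAX_ZERO_RETRIES 4  /* hypothetical software retry bound */

/* Hypothetical policy: the vl a fault-only-first load reports, anywhere in
 * [0, vl]. Returning 0 is the contested case. */
typedef size_t (*vl_policy_fn)(size_t vl);

static size_t policy_full(size_t vl) { return vl; }
static size_t policy_always_zero(size_t vl) { (void)vl; return 0; }

/* strlen with an explicit vl==0 check. *scalar_steps counts how often the
 * scalar fallback had to be used. */
static size_t checked_strlen(const char *str, vl_policy_fn policy,
                             unsigned *scalar_steps)
{
    const uint8_t *start = (const uint8_t *)str;
    const uint8_t *p = start;
    unsigned zero_runs = 0;

    *scalar_steps = 0;
    for (;;) {
        size_t vl = policy(VLMAX);          /* model of vle8ff + csrr vl */
        if (vl == 0) {
            if (++zero_runs <= MAX_ZERO_RETRIES)
                continue;                   /* retry the vector load */
            zero_runs = 0;
            (*scalar_steps)++;              /* fallback: handle one byte with a scalar load */
            if (*p == 0)
                return (size_t)(p - start);
            p++;
            continue;
        }
        zero_runs = 0;
        for (size_t i = 0; i < vl; i++) {   /* vmseq.vi + vfirst.m */
            if (p[i] == 0)
                return (size_t)(p - start) + i;
        }
        p += vl;
    }
}
```

With policy_always_zero the function still terminates, at the cost of one scalar step per byte plus the wasted retries, which shows why some bound on vl=0 returns would be needed if they were allowed.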

| || Krste

| | [DELETED ATTACHMENT image.png, PNG image]
