Re: vector strided stores when rs1=x0

Nick Knight

I understand now. I'm on board iff the memory consistency model experts assent.

On Mon, Nov 9, 2020 at 11:41 AM Krste Asanovic <krste@...> wrote:
There’s a comment about this in spec already.

But note that this would be in a case where you're relying on having multiple accesses in a non-deterministic order to one memory location, which is probably fraught for other reasons.


On Nov 9, 2020, at 11:38 AM, Nick Knight <nick.knight@...> wrote:

Sorry, slightly off topic, but what was the rationale for 

When `rs2!=x0` and the value of `x[rs2]=0`, the implementation must perform one memory access for each active element (but these accesses will not be ordered).

I guess I'm thinking about the possibility of a toolchain relaxing `li, x1, 0; inst x1` into `inst x0`.  

On Mon, Nov 9, 2020 at 10:09 AM Krste Asanovic <krste@...> wrote:
I made an error copied from my meeting notes - this should be when rs2=x0 (i.e., the stride value),


On Nov 9, 2020, at 9:12 AM, Krste Asanovic via <> wrote:

On Nov 9, 2020, at 8:57 AM, Guy Lemieux <guy.lemieux@...> wrote:

I think this is a bad idea for both loads and stores. If the intent is a single load or single store, then there should be another way to do it.

Using vector loads/stores with stride=0 is one way to read/write a vector from/to a memory-mapped FIFO.

(I think we also discussed a way to do ordered writes for such cases earlier, which is necessary for FIFO-based communication; I don't recall whether this was discussed around strides. If there is a special way to declare ordered writes, then I'm only concerned with using a FIFO with that mode.)

These are all supported with ordered scatters/gathers to/from a single address.

We wanted to remove ordering requirements from all other vector load/store types.



On Mon, Nov 9, 2020 at 8:38 AM Krste Asanovic <krste@...> wrote:

Also on github as issue #595

In our earlier TG discussion in 9/18 meeting, we were in favor of
allowing vector strided load instructions with rs1=x0 to perform fewer
memory accesses than the number of active elements.  This allows
higher-performing splats of a scalar memory value into a vector.

In writing this up, I inadvertently made this true for stores too.
But on review, I can't see a reason to not also allow strided stores
(which are now unordered), to also perform fewer memory operations (in
effect, picking a random active element to write back).  The behavior
is indistinguishable from a possible legal execution of prior scheme,
and has potential niche use of storing element value to memory when it
is known all elements have same value.

I suppose we could also reserve the encoding with strided stores of
rs1=x0, but this would add some asymmetry.  Software could then get a
similar effect by settng vl=1 before the store.


Join to automatically receive all group messages.