This sounds right to me as well. No use making a special case for strided stores with rs2=x0.
On 11/9/20 12:04 PM, Nick Knight wrote:
I understand now. I'm on board iff the memory consistency model experts assent.
On Mon, Nov 9, 2020 at 11:41 AM Krste Asanovic <krste@...
There’s a comment about this in spec already.
But note that this would be in a case where you're relying on having multiple accesses in a non-deterministic order to one memory location, which is probably fraught for other reasons.
Sorry, slightly off topic, but what was the rationale for
When `rs2!=x0` and the value of `x[rs2]=0`, the implementation must perform one memory access for each active element (but these accesses will not be ordered).
I guess I'm thinking about the possibility of a toolchain relaxing `li x1, 0; inst x1` into `inst x0`.
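To make the distinction in the quoted rule concrete, here is a minimal Python sketch of the access-count requirement as described in this thread. The function name `required_accesses` and its parameters are hypothetical, and the lower bound of one access for the `rs2=x0` case is an assumption (the thread only says "fewer" accesses are allowed).

```python
def required_accesses(rs2: str, x_rs2: int, n_active: int) -> int:
    """Toy model: minimum number of memory accesses for a strided load.

    rs2      -- name of the stride register (e.g. "x0", "x1")
    x_rs2    -- value held in that register
    n_active -- number of active (unmasked) elements
    """
    if n_active == 0:
        return 0
    if rs2 == "x0":
        # Stride register is x0: the implementation may perform fewer
        # accesses than active elements (assumed lower bound: one).
        return 1
    # Any other register, even one holding 0: one access per active
    # element is required (though the accesses are not ordered).
    return n_active


# A toolchain that rewrites `li x1, 0; inst x1` into `inst x0`
# changes the number of accesses that must be performed.
before = required_accesses("x1", 0, 8)
after = required_accesses("x0", 0, 8)
```

In this model, `before` and `after` differ even though the stride value is 0 in both cases, which is why such a relaxation would not be a transparent rewrite.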
On Mon, Nov 9, 2020 at 10:09 AM Krste Asanovic <krste@...
I made an error, copied from my meeting notes: this should be when rs2=x0 (i.e., the stride value), not rs1=x0.
I think this is a bad idea for both loads and stores. If the intent is a single load or single store, then there should be another way to do it.
Using vector loads/stores with stride=0 is one way to read/write a vector from/to a memory-mapped FIFO.
(I think we also discussed a way to do ordered writes for such cases earlier, which is necessary for FIFO-based communication; I don't recall whether this was discussed around strides. If there is a special way to declare ordered writes, then
I'm only concerned with using a FIFO with that mode.)
These are all supported with ordered scatters/gathers to/from a single address.
We wanted to remove ordering requirements from all other vector load/store types.
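A rough Python sketch of why the FIFO case needs the ordered gather rather than a stride-0 load with `rs2=x0`. The `MmioFifo` class and both helper functions are hypothetical models for illustration, not actual instruction semantics.

```python
from collections import deque


class MmioFifo:
    """Toy memory-mapped FIFO: every read pops the next entry."""

    def __init__(self, items):
        self._queue = deque(items)
        self.reads = 0

    def read(self):
        self.reads += 1
        return self._queue.popleft()


def ordered_gather_single_address(fifo, vl):
    # Ordered indexed load with all offsets equal: one access per
    # element, performed in element order.
    return [fifo.read() for _ in range(vl)]


def coalesced_stride_x0_load(fifo, vl):
    # Stride-0 load with rs2=x0: the implementation is allowed to do a
    # single access and replicate the value (a splat).
    value = fifo.read()
    return [value] * vl


fifo_a = MmioFifo([10, 20, 30, 40])
gathered = ordered_gather_single_address(fifo_a, 4)

fifo_b = MmioFifo([10, 20, 30, 40])
splatted = coalesced_stride_x0_load(fifo_b, 4)
```

Under these assumptions, the ordered gather drains four FIFO entries in order, while the coalesced load pops only one entry and replicates it.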
On Mon, Nov 9, 2020 at 8:38 AM Krste Asanovic <krste@...
Also on github as issue #595
In our earlier TG discussion at the 9/18 meeting, we were in favor of
allowing vector strided load instructions with rs1=x0 to perform fewer
memory accesses than the number of active elements. This allows
higher-performing splats of a scalar memory value into a vector.
In writing this up, I inadvertently made this true for stores too.
But on review, I can't see a reason not to also allow strided stores
(which are now unordered) to perform fewer memory operations (in
effect, picking a random active element to write back). The behavior
is indistinguishable from a possible legal execution of the prior scheme,
and has the potential niche use of storing an element value to memory when it
is known that all elements have the same value.
I suppose we could also reserve the encoding with strided stores of
rs1=x0, but this would add some asymmetry. Software could then get a
similar effect by setting vl=1 before the store.
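The indistinguishability argument above can be checked with a small Python sketch. It is a toy model under stated assumptions: only the final value at the single target address is considered, as seen by the storing hart, and other observers (the memory consistency model question raised in this thread) are ignored. The function names are hypothetical.

```python
from itertools import permutations


def unordered_store_outcomes(values, mask):
    """Final memory values possible when every active element stores to
    the same address and the stores may happen in any order."""
    active = [v for v, m in zip(values, mask) if m]
    # The last store in each ordering determines the final value.
    return {order[-1] for order in permutations(active) if order}


def single_store_outcomes(values, mask):
    """Final memory values possible when the implementation writes back
    just one arbitrarily chosen active element."""
    return {v for v, m in zip(values, mask) if m}


values = [1, 2, 3, 4]
mask = [1, 0, 1, 1]  # element 1 is inactive
same = unordered_store_outcomes(values, mask) == single_store_outcomes(values, mask)
```

In this model the two outcome sets coincide, so writing one arbitrary active element yields a final value that the one-store-per-element scheme could also have produced.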