
Krste Asanovic
Also on GitHub as issue #595.
In our earlier TG discussion at the 9/18 meeting, we were in favor of allowing vector strided load instructions with rs1=x0 to perform fewer memory accesses than the number of active elements. This allows higher-performing splats of a scalar memory value into a vector.
In writing this up, I inadvertently made this true for stores too. But on review, I can't see a reason not to also allow strided stores (which are now unordered) to perform fewer memory operations (in effect, picking a random active element to write back). The behavior is indistinguishable from a possible legal execution of the prior scheme, and has a potential niche use: storing an element value to memory when it is known that all elements have the same value.
https://github.com/riscv/riscv-v-spec/commit/398d453e3592efbac77cc8f6658009759901185a#diff-ea57dd7a8daf0aa62f553688c1970c8e6608945d25597f8661c5ea6670fb509c
I suppose we could also reserve the encoding for strided stores with rs1=x0, but this would add some asymmetry. Software could then get a similar effect by setting vl=1 before the store.
Krste
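The "indistinguishable from a possible legal execution" argument can be illustrated with a small model. This is only a sketch under stated assumptions: it considers ordinary (idempotent) memory, looks only at the final value left at the address, and the function names are hypothetical, not taken from the spec.

```python
from itertools import permutations

def all_access_outcomes(active_values):
    """Final values reachable when every active element is written to the
    same address in some unspecified order (the prior scheme)."""
    # The last write wins, so any active element can be the survivor.
    return {order[-1] for order in permutations(active_values)}

def single_access_outcomes(active_values):
    """Final values reachable when the implementation performs a single
    write of an arbitrarily chosen active element (the relaxed scheme)."""
    return set(active_values)

values = [10, 20, 30]  # hypothetical active element values
# Every outcome of the relaxed scheme is also an outcome of the prior one.
assert single_access_outcomes(values) <= all_access_outcomes(values)
```

When all active elements hold the same value, both sets collapse to that one value, which matches the niche use mentioned above.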
I think this is a bad idea for both loads and stores. If the intent is a single load or single store, then there should be another way to do it.
Using vector loads/stores with stride=0 is one way to read/write a vector from/to a memory-mapped FIFO. (I think we also discussed a way to do ordered writes for such cases earlier, which is necessary for FIFO-based communication; I don't recall whether this was discussed around strides. If there is a special way to declare ordered writes, then I'm only concerned with using a FIFO with that mode.)

Krste Asanovic
> I think this is a bad idea for both loads and stores. If the intent is a single load or single store, then there should be another way to do it.
> Using vector loads/stores with stride=0 is one way to read/write a vector from/to a memory-mapped FIFO.
> (I think we also discussed a way to do ordered writes for such cases earlier, which is necessary for FIFO-based communication; I don't recall whether this was discussed around strides. If there is a special way to declare ordered writes, then I'm only concerned with using a FIFO with that mode.)
These are all supported with ordered scatters/gathers to/from a single address.
We wanted to remove ordering requirements from all other vector load/store types.
Krste
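As a rough illustration of the FIFO case, the sketch below models an ordered scatter whose offsets are all zero, so every active element goes to one address in element order. The `Fifo` class, `ordered_scatter` helper, and `FIFO_ADDR` value are hypothetical and only model the behavior described; they are not spec pseudocode.

```python
class Fifo:
    """Hypothetical memory-mapped FIFO: every write enqueues one item."""
    def __init__(self):
        self.items = []

    def write(self, value):
        self.items.append(value)

def ordered_scatter(devices, base, offsets, values, mask):
    """Model of an ordered indexed store: one access per active element,
    performed in element order."""
    for offset, value, active in zip(offsets, values, mask):
        if active:
            devices[base + offset].write(value)

FIFO_ADDR = 0x1000  # hypothetical device address
fifo = Fifo()
# All offsets are zero, so every active element targets the same address.
ordered_scatter({FIFO_ADDR: fifo}, FIFO_ADDR,
                [0, 0, 0, 0], [1, 2, 3, 4], [True, False, True, True])
print(fifo.items)  # [1, 3, 4]
```

The masked-off element is skipped and the remaining writes arrive in element order, which is the property a FIFO consumer would depend on.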

Krste Asanovic
I made an error copying from my meeting notes: this should be when rs2=x0 (i.e., the stride value), not rs1=x0.
Krste

Nick Knight
Sorry, slightly off topic, but what was the rationale for the following?
> When `rs2!=x0` and the value of `x[rs2]=0`, the implementation must perform one memory access for each active element (but these accesses will not be ordered).
I guess I'm thinking about the possibility of a toolchain relaxing `li x1, 0; inst x1` into `inst x0`.
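To make the concern concrete, here is a minimal sketch of how the set of permitted access counts differs between the two encodings. It assumes that "fewer" still means at least one access when any element is active; the function name is hypothetical.

```python
def allowed_access_counts(num_active, stride_reg_is_x0):
    """Access counts an implementation may perform for a zero-stride
    vector load/store, under the rule quoted above."""
    if num_active == 0:
        return {0}
    if stride_reg_is_x0:
        # rs2=x0: may perform fewer accesses than active elements
        # (assumed here to be at least one).
        return set(range(1, num_active + 1))
    # rs2!=x0 with x[rs2]==0: exactly one access per active element.
    return {num_active}

with_li = allowed_access_counts(4, stride_reg_is_x0=False)  # li x1, 0; inst x1
with_x0 = allowed_access_counts(4, stride_reg_is_x0=True)   # inst x0
# The rewrite widens the permitted behaviors rather than preserving them.
assert with_li < with_x0
```

Under this model, rewriting the first form into the second is only safe if the program does not depend on the exact number of accesses.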

Krste Asanovic
There's a comment about this in the spec already.
But note that this would be a case where you're relying on having multiple accesses in a non-deterministic order to one memory location, which is probably fraught for other reasons.

Nick Knight
I understand now. I'm on board iff the memory consistency model experts assent.
This sounds right to me as well. No use making a special case for strided stores with rs2=x0.
Bill