Vector TG minutes for 2020/12/18 meeting
in terms of overlap with that case — that case normally selects maximally sized AVL. the implied goals there are to make best use of vector register capacity and throughput. i’m suggesting a case wh
By Guy Lemieux · #569

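As a concrete illustration of the "maximally sized AVL" behaviour #569 refers to, here is a minimal stripmined loop in C. The helper name is invented for the example, and the intrinsic names follow the current v1.0 RVV intrinsics spec, which postdates this thread; each vsetvl request grants up to VLMAX elements per iteration.

    #include <riscv_vector.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Stripmined add: every iteration asks for as many elements as the
       hardware will grant (up to VLMAX), making full use of the vector
       register capacity. */
    void vec_add(int32_t *dst, const int32_t *a, const int32_t *b, size_t n) {
        while (n > 0) {
            size_t vl = __riscv_vsetvl_e32m1(n);          /* vl = min(n, VLMAX) */
            vint32m1_t va = __riscv_vle32_v_i32m1(a, vl);
            vint32m1_t vb = __riscv_vle32_v_i32m1(b, vl);
            __riscv_vse32_v_i32m1(dst, __riscv_vadd_vv_i32m1(va, vb, vl), vl);
            a += vl; b += vl; dst += vl; n -= vl;
        }
    }
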
Vector TG minutes for 2020/12/18 meeting
I agree with you. I had suggested the mapping of 00000 to an implementation-defined value (chosen by the CPU architect). For some architectures, this may be 16, for others it may be 32, or even 2. The
By Guy Lemieux · #567

Vector TG minutes for 2020/12/18 meeting
for vsetivli, with the uimm=00000 encoding, rather than setting vl to 32, how about setting it to some other meaning? one option is to set vl=VLMAX. i have some concerns about software using this safely (eg
By Guy Lemieux · #547

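For reference, the vl=VLMAX behaviour mentioned as one option in #547 already exists in the scalar-register form of the instruction (vsetvli with rs1=x0 and rd!=x0); the question in the thread is whether the uimm=00000 encoding of vsetivli should mean the same thing. A minimal C sketch using the v1.0 intrinsic that maps to that existing form (names postdate this thread):

    #include <riscv_vector.h>
    #include <stddef.h>

    /* Returns VLMAX for e32/m1; typically compiles to
       "vsetvli rd, x0, e32, m1, ta, ma" with rd != x0. */
    size_t vlmax_e32m1(void) {
        return __riscv_vsetvlmax_e32m1();
    }
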
vector strided stores when rs1=x0
I think this is a bad idea for both loads and stores. If the intent is a single load or single store, then there should be another way to do it. Using vector loads/stores with stride=0 is one way to r
By Guy Lemieux · #506

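A small sketch of the "stride=0" pattern #506 mentions as a workaround: a strided load with a zero byte-stride re-reads one memory location into every active element, i.e. a broadcast from memory. The helper name is invented for the example; intrinsic names follow the v1.0 spec.

    #include <riscv_vector.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Broadcast the scalar at *p into all active elements; typically
       compiles to a vlse32.v with a zero stride. */
    vint32m1_t bcast_from_mem(const int32_t *p, size_t vl) {
        return __riscv_vlse32_v_i32m1(p, 0, vl);
    }
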
[RISC-V] [tech] [RISC-V] [tech-*] STRATEGIC FEATURE COEXISTANCE was:([tech-fast-int] usefulness of PUSHINT/POPINT from [tech-code-size])
Thanks Tim, I think that sums it up nicely. I just wanted to put a pointer out to the original post that I made on isa-dev regarding opcode sharing / management: https://groups.google.com/a/groups.ris
By Guy Lemieux · #493

V extension groups analogue to the standard groups
I think a common embedded and FPGA scenario will be F on the scalar side but no F on the vector side. Adding F to V is nontrivial in area, particularly for FPGAs that lack FPUs, yet an integer-only V
By Guy Lemieux · #367

Vector Task Group minutes 2020/5/15
Alex, Keep in mind: a) I amended my proposal to reduce the code bloat identified by Nick b) the effect of the bloat is almost entirely about text segment size, not power or instruction bandwidth, beca
By Guy Lemieux · #182

Vector Task Group minutes 2020/5/15 - precise layout does not matter
I propose 2 data layouts: memory layout, and internal register group layout. I am not going to specify which internal register group layout to operate upon, because I haven't read the 0.9 spec and don
By Guy Lemieux · #177

Vector Task Group minutes 2020/5/15
Nick, thanks for that code snippet, it's really insightful. I have a few comments: a) this is for LMUL=8, the worst-case (most code bloat) b) this would be automatically generated by a compiler, so vi
By Guy Lemieux · #174

Vector Task Group minutes 2020/5/15
The precise data layout pattern does not matter. What matters is that a single distribution pattern is agreed upon to avoid fragmenting the software ecosystem. With my additional restriction, the load
By Guy Lemieux · #172

Vector Task Group minutes 2020/5/15
As a follow-up, the main goal of LMUL>1 is to get better storage efficiency out of the register file, allowing for slightly higher compute unit utilization. The memory system should not require LMUL>1
By Guy Lemieux · #169

Vector Task Group minutes 2020/5/15
I support this scheme, but I would further add a restriction on loads/stores to only support LMUL=1 (no register groups). Instead, any data stored in a register group with LMUL!=1 must first be “cast”
By Guy Lemieux · #168

[RISC-V][tech-vector-ext] Intrinsics for vector programming in C.
sorry I wasn’t clear. the data types, eg vint64m1_t, are for data elements. all of the example API definitions also use these same element data types as arguments, but they should be using a data type
By Guy Lemieux · #139

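For context on the type question raised in #139: in the intrinsics API as eventually ratified, a type such as vint64m1_t denotes a whole vector value (one LMUL=1 register's worth of int64 elements), and these vector types appear both as operand and result types. A minimal example with the v1.0 names, which postdate this thread:

    #include <riscv_vector.h>
    #include <stddef.h>

    /* Both operands and the result are whole-vector values of type
       vint64m1_t; the element type (int64) and LMUL (m1) are carried
       in the type name. */
    vint64m1_t add_i64(vint64m1_t a, vint64m1_t b, size_t vl) {
        return __riscv_vadd_vv_i64m1(a, b, vl);
    }
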
[RISC-V][tech-vector-ext] Intrinsics for vector programming in C.
well done! can you perhaps explain how the vector operands are to be used/allocated? ie, it appears you wish to use the type system to name vector registers, but there is no limit on these so there mu
By Guy Lemieux · #135

64-bit instruction encoding wish list
my response below is now off-topic, and covers the more flexible reductions wanted by Nagendra. i discourage any further followups here (instead, please search for another recent series of posts by Na
By Guy Lemieux · #51

Vector Indexed Loads - Partial Return?
Since this is happening to an already-slow memory access, I would guess performing checks in software would be the better way to go. There is no current talk about aborting a long-running indexed load
By Guy Lemieux · #42

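One way to read "performing checks in software" in #42, sketched below purely as an illustration (not something specified in the thread): build a mask of in-range indices first, so the indexed load never accesses an out-of-range address. The helper name is invented; intrinsic names follow the v1.0 spec.

    #include <riscv_vector.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Load from base + byte_off[i] only where the byte offset is below
       limit_bytes; masked-off elements are simply not accessed. */
    vint32m1_t checked_gather(const int32_t *base, vuint32m1_t byte_off,
                              uint32_t limit_bytes, size_t vl) {
        vbool32_t ok = __riscv_vmsltu_vx_u32m1_b32(byte_off, limit_bytes, vl);
        return __riscv_vluxei32_v_i32m1_m(ok, base, byte_off, vl);
    }
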
A couple of questions about the vector spec
possibly vslide1up after every reduction, producing a vector of reductions (possibly in backwards order, unless you rearrange your outer loop order). I'm not suggesting that you use vrgather to conver
By Guy Lemieux · #38

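A sketch of the "vslide1up after every reduction" idea in #38: after each outer-loop reduction, slide its scalar result into element 0 of a collecting vector, pushing earlier results up one slot, which is why the collected vector ends up in reverse iteration order. Helper name is invented; v1.0 intrinsic names.

    #include <riscv_vector.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Push one reduction result into the collection vector: the newest
       value lands in element 0 and earlier results shift up by one. */
    vint32m1_t push_reduction(vint32m1_t collected, int32_t result, size_t vl) {
        return __riscv_vslide1up_vx_i32m1(collected, result, vl);
    }
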
A couple of questions about the vector spec
1. A vector register is deliberately used as the destination of reductions. If the destination is a scalar register, then tight coupling between the vector and scalar units would be necessary, and con
By Guy Lemieux · #35

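A small sketch of the point in #35: the reduction writes element 0 of a vector register (vredsum.vs), and if the scalar unit needs the value, a single vmv.x.s at the end pays the vector/scalar synchronization cost once rather than on every reduction. Helper name is invented; v1.0 intrinsic names.

    #include <riscv_vector.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Sum the active elements of v; the reduction result stays in a vector
       register and only the final vmv.x.s crosses to the scalar side. */
    int32_t vec_sum_i32(vint32m1_t v, size_t vl) {
        vint32m1_t zero = __riscv_vmv_s_x_i32m1(0, vl);                 /* accumulator = 0 */
        vint32m1_t red  = __riscv_vredsum_vs_i32m1_i32m1(v, zero, vl);  /* red[0] = sum    */
        return __riscv_vmv_x_s_i32m1_i32(red);
    }
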
[tech-vector-ext] Some proposals
On a context switch, the underlying physical registers that hold spill values need to be saved/restored to memory, as well as the associated rs1 values. This means we need extra instructions to iterat
By Guy Lemieux · #32

Slidedown overlapping of dest and source registers
I'm curious why you chose to be symmetrical (no need), and why you decided on incrementing for slideup and decrementing for slidedown (I would do the opposite). By incrementing for vslidedown, and decrementing
By Guy Lemieux · #15