Calling Convention for Vector?
戎杰杰
Hi,
Does anyone know of any additional ABI information (such as a calling convention)
designed for the vector registers?
--Jojo
|
|
andrew@...
There is a brief sketch of the Linux vector calling convention here:
Note this is the convention for normal C ABI calls; a separate convention will be adopted for vector millicode calls.
On Mon, Dec 23, 2019 at 2:12 PM 戎杰杰 <jiejie.rjj@...> wrote:
|
|
戎杰杰
Hi,
Thanks for the pointer.
It is clear and simple, but is there really no convention for passing vector arguments and returning vector values from functions?
Also, based on our long experience with CPU design experiments, there should be some
callee-saved vector registers for performance across complicated function calls, right? :)
Are there any considerations or details behind excluding things like vector arguments?
--Jojo
On Dec 24, 2019, at 4:46 AM +0800, Andrew Waterman <andrew@...> wrote:
|
|
Earl Killian
Vectors are passed in memory and returned in memory. Vectors are arbitrary length, whereas the vector registers are fixed length, and can only be used to temporarily hold a portion of a memory vector. Thus it doesn’t make sense to pass or return things in vector registers, or to have the registers saved or restored as part of the calling convention.
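A minimal sketch of the model described above, in which a function receives whole vectors in memory and a vector register only ever holds one chunk at a time. Plain Python lists stand in for memory; the name `vector_add` and the parameter `vlmax` (how many elements one vector register holds) are assumptions made for illustration, not part of any ABI.

```python
def vector_add(a, b, vlmax=4):
    """Add two equal-length vectors that live in memory.

    vlmax is a hypothetical register capacity, in elements.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    out = [0] * len(a)
    i = 0
    while i < len(a):
        # Choose this iteration's vector length, as a vsetvl-style
        # instruction would: at most vlmax, at most what remains.
        vl = min(vlmax, len(a) - i)
        va = a[i:i + vl]  # "load" one register's worth of a
        vb = b[i:i + vl]  # "load" one register's worth of b
        out[i:i + vl] = [x + y for x, y in zip(va, vb)]  # compute, "store"
        i += vl
    return out
```

The caller passes only the memory vectors and their length; nothing about the chunking is visible across the call boundary.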
On Dec 25, 2019, at 20:17, 戎杰杰 <jiejie.rjj@...> wrote:
|
|
On Thu, Dec 26, 2019 at 2:01 PM Earl Killian <earl.killian@...> wrote:
Some code will not want args in vector regs, so that we don't have to save/restore them around calls. Some code will want args in vector regs, so that they can have subroutines that operate on vectors. If you have already loaded part of a vector into a vector register, it is silly to send it back to memory just so you can call a function that reads it back in. It is better to leave it in a register to reduce memory bandwidth.
So we need two calling conventions. Or alternatively, one calling convention with optional vector support that can be enabled only when needed. If you look at ARM SVE, you will see that this is what they have done.
I think this is more complicated for rvv though as we have LMUL up to 8, which means we need 16 registers worst case for two arguments, which will have to be v8-v15 or v16-v23 or v24-v31 because of alignment issues. Plus we need v0 for an optional mask so we can't use v1-v7 for arguments. And vlen will have to be an implicit argument.
Someone will have to spend time doing experiments to see how well this works in practice to make sure it is reasonable. And we will need a reasonable compiler first before we can do experiments, which we don't really have yet, and may not have for a while. Not to mention hardware to test on. I think it will be a while before we can formally specify a vector calling convention.
Jim
|
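The register arithmetic in the message above can be checked with a small sketch. It assumes 32 vector registers, that a register group must start at a register number that is a multiple of LMUL, and that the group containing v0 is kept free for the mask; the function names are hypothetical.

```python
NUM_VREGS = 32  # v0..v31


def argument_group_bases(lmul):
    """Base registers of the aligned groups usable for arguments.

    Assumes a group must start at a multiple of lmul, and excludes
    only the group that contains v0 (reserved here for the mask).
    """
    if lmul not in (1, 2, 4, 8):
        raise ValueError("lmul must be 1, 2, 4, or 8")
    return [base for base in range(0, NUM_VREGS, lmul) if base != 0]


def registers_needed(num_args, lmul):
    """Registers consumed when every argument occupies a full group."""
    return num_args * lmul
```

With `lmul=8` this gives base registers 8, 16, and 24, and two arguments consume 16 registers, matching the worst case described in the message.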
|
戎杰杰
Hi,
We also ran into some of the problems you mention.
Since some code will want args in vector regs, we studied the SVE
vector register layout and configured our RISC-V vector register layout as follows:
| Register | ABI name | Description                      | Saver  |
| v0-7     | v0-7     | Temporaries                      | Caller |
| v8-15    | v8-15    | Function arguments/return values | Caller |
| v16-23   | v16-23   | Function arguments               | Caller |
| v24-31   | v24-31   | Saved registers                  | Callee |
This configuration addresses issues such as the v0 mask register,
and it lets us use 16 registers for two arguments at LMUL=8.
We can write up a draft to improve the calling convention with arguments in vector registers :)
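As a rough illustration, the proposed layout can be written down as a lookup table and used to check the LMUL=8 case. The names `PROPOSED_LAYOUT`, `describe`, and `assign_argument_groups` are hypothetical, and the rule that each argument takes one LMUL-aligned group is an assumption of this sketch, not something the proposal states.

```python
# (registers, description, saver), transcribed from the table above.
PROPOSED_LAYOUT = [
    (range(0, 8), "Temporaries", "Caller"),
    (range(8, 16), "Function arguments/return values", "Caller"),
    (range(16, 24), "Function arguments", "Caller"),
    (range(24, 32), "Saved register", "Callee"),
]


def describe(reg):
    """Return (description, saver) for vector register number reg."""
    for regs, description, saver in PROPOSED_LAYOUT:
        if reg in regs:
            return description, saver
    raise ValueError("no such vector register: v%d" % reg)


def assign_argument_groups(num_args, lmul):
    """Assign each argument one lmul-aligned group from the argument rows."""
    if lmul not in (1, 2, 4, 8):
        raise ValueError("lmul must be 1, 2, 4, or 8")
    bases = [
        reg
        for regs, description, _ in PROPOSED_LAYOUT
        if description.startswith("Function arguments")
        for reg in regs
        if reg % lmul == 0
    ]
    if num_args > len(bases):
        raise ValueError("not enough argument registers")
    return bases[:num_args]
```

Under these assumptions two LMUL=8 arguments land in the groups starting at v8 and v16, and a third LMUL=8 argument does not fit.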
--Jojo
On Dec 28, 2019, at 12:12 AM +0800, Jim Wilson <jimw@...> wrote:
On Thu, Dec 26, 2019 at 2:01 PM Earl Killian <earl.killian@...> wrote:
|
|
andrew@...
Providing callee-saved vector registers in the regular C calling convention might actually degrade performance, as most vector computation is done in leaf functions or in strip-mine loops that don't call functions. Functions that want to use all the vector registers will have to spill some callee-saved registers, even if the callee-saved registers aren't providing much benefit.
By contrast, the vector millicode calling convention (for routines like element-wise transcendentals) would likely benefit from an alternate calling convention that has some callee-saved vector registers.
On Mon, Jan 13, 2020 at 12:35 AM 戎杰杰 <jiejie.rjj@...> wrote:
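A toy model of the spill cost for a function that wants every vector register. It takes the callee-saved set v24-v31 from the layout proposed earlier in the thread and treats the register width `vlen_bits` as a free parameter; both function names are hypothetical, and the model counts only the bytes saved on entry, not the matching restores.

```python
def callee_saved_spills(regs_used, callee_saved):
    """Callee-saved registers a function must save before using them."""
    return sorted(set(regs_used) & set(callee_saved))


def spill_bytes(regs_used, callee_saved, vlen_bits):
    """Bytes written to the stack to save those registers on entry.

    vlen_bits is an assumed register width; the same amount is read
    back again when the registers are restored on exit.
    """
    return len(callee_saved_spills(regs_used, callee_saved)) * vlen_bits // 8
```

For example, a leaf function that uses all of v0-v31 with v24-v31 callee-saved must save eight registers, while one that stays within v0-v7 saves none.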
|
|
Andy Glew Si5
Oh, heck [*]:
Callee-saved registers of any form can have bad performance where there is a potential partial register issue, e.g. on an out-of-order machine with register renaming (although even some simple non-out-of-order microarchitectures benefit from register renaming).
RISC-V vectors have partial register issues due to masks and vector length.
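A minimal sketch of why masks and vector length make a write partial. It assumes an "undisturbed" policy in which masked-off elements and elements at or beyond `vl` keep their previous value; the function name `masked_add` and the list-based register model are illustrative only.

```python
def masked_add(old_dest, a, b, mask, vl):
    """Model a masked vector add that writes only the active elements.

    Assumes masked-off elements and tail elements (index >= vl) keep
    the old destination value, so the result depends on old_dest.
    """
    result = list(old_dest)  # start from the previous register contents
    for i in range(vl):
        if mask[i]:
            result[i] = a[i] + b[i]
    return result
```

Because the result is a merge of new and old elements, the old destination value is an input to the operation. A renamed machine therefore has to wait for it or merge it into the new physical register.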
(Note *: I sent something like this email to Andrew, since I was chicken to talk to the list. Embarrassingly, justifying my cowardice, I flipped a bit between callee and caller saved registers in that original email. It's callee save that has partial register issues. Andrew reminded me about vector masks as a cause of partial register issues, which I should've known about if my brain had been working right, and told me about vector length as a cause of partial register issues in RISC-V, which I should've realized but admittedly have not worked on a vector length architecture in many years.)
From: tech-vector-ext@... <tech-vector-ext@...> On Behalf Of Andrew Waterman
On Mon, Jan 13, 2020 at 12:35 AM 戎杰杰 <jiejie.rjj@...> wrote:
|
|