Re: Calling Convention for Vector ?

Andy Glew Si5

Oh, heck [*]:


Callee-saved registers of any form can perform badly wherever there is a potential partial-register issue, for example on an out-of-order machine with register renaming. Even some simple in-order microarchitectures benefit from register renaming.


RISC-V vectors have partial register issues due to masks and vector length.
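
A toy Python model may make the partial-register point concrete. It assumes that masked-off elements, and elements at or beyond the vector length, keep their old destination values (an "undisturbed" policy, which is an assumption of this sketch rather than a statement about any particular spec version). The names `vector_write`, `old_dest`, `result`, `mask` and `vl` are hypothetical.

```python
def vector_write(old_dest, result, mask, vl):
    """Toy model of a masked, length-limited vector register write.

    Assumption: inactive elements (mask bit clear, or index >= vl)
    keep their previous value, so the new register contents depend
    on the old destination as well as on the computed result.
    """
    new_dest = list(old_dest)  # the old value is an input: the partial-register dependency
    for i in range(min(vl, len(old_dest))):
        if mask[i]:
            new_dest[i] = result[i]
    return new_dest


old = [10, 11, 12, 13]
res = [0, 1, 2, 3]
mask = [1, 0, 1, 1]
out = vector_write(old, res, mask, vl=3)
print(out)  # [0, 11, 2, 13]
```

Because the output is a merge of `old_dest` and `result`, a renamed destination register cannot simply be treated as freshly written; the previous value has to be carried along.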


(Note *: I sent something like this email to Andrew, since I was chicken to talk to the list. Embarrassingly, justifying my cowardice, I flipped a bit between callee and caller saved registers in that original email. It's callee save that has partial register issues. Andrew reminded me about vector masks as a cause of partial register issues, which I should've known about if my brain had been working right, and told me about vector length as a cause of partial register issues in RISC-V, which I should've realized but admittedly have not worked on a vector length architecture in many years.)


From: tech-vector-ext@... <tech-vector-ext@...> On Behalf Of Andrew Waterman
Sent: Monday, January 13, 2020 14:18
To: 戎杰杰 <jiejie.rjj@...>
Cc: Earl Killian <earl.killian@...>; Jim Wilson <jimw@...>; tech-vector-ext@...; mingjie@...
Subject: Re: [RISC-V] [tech-vector-ext] Calling Convention for Vector ?


Providing callee-saved vector registers in the regular C calling convention might actually degrade performance, as most vector computation is done in leaf functions or in strip-mine loops that don't call functions.  Functions that want to use all the vector registers will have to spill some callee-saved registers, even if the callee-saved registers aren't providing much benefit.
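
As a rough illustration of the spill cost, here is a small Python sketch. The split used (v24-v31 callee-saved) is only a hypothetical example taken from the proposal quoted further down the thread, and `callee_saved_spills` is a made-up helper, not part of any ABI document.

```python
def callee_saved_spills(used_regs, callee_saved):
    """Return the callee-saved registers a function must save and restore."""
    return sorted(set(used_regs) & set(callee_saved))


ALL_VREGS = range(32)

# Hypothetical split for illustration only: v24-v31 are callee-saved.
PROPOSED_CALLEE_SAVED = range(24, 32)

# A leaf kernel that wants every vector register.
leaf_kernel_regs = ALL_VREGS

spills_with = callee_saved_spills(leaf_kernel_regs, PROPOSED_CALLEE_SAVED)
spills_without = callee_saved_spills(leaf_kernel_regs, [])

print(len(spills_with), len(spills_without))  # 8 0
```

Each entry in `spills_with` stands for a full vector register store on entry and a load on exit, paid by a leaf function that gains nothing from the registers being preserved.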


By contrast, the vector millicode calling convention (for routines like element-wise transcendentals) would likely benefit from an alternate calling convention that has some callee-saved vector registers.


On Mon, Jan 13, 2020 at 12:35 AM 戎杰杰 <jiejie.rjj@...> wrote:



 We have run into the same problems you mention.


 Since some code will want arguments in vector registers, we studied the SVE vector register layout and configured our RISC-V vector register layout as follows:


 | v0-7   | v0-7   | Temporaries                      | Caller |
 | v8-15  | v8-15  | Function arguments/return values | Caller |
 | v16-23 | v16-23 | Function arguments               | Caller |
 | v24-31 | v24-31 | Saved register                   | Callee |


 This layout accommodates constraints such as the v0 mask register, and it lets us use 16 registers for two arguments at LMUL=8.

 We can put together a draft to improve the calling convention with arguments in vector registers :)
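
 The proposed layout can be written down as a small lookup table. This is only a sketch of the table above in Python; `PROPOSED_LAYOUT`, `lookup` and `argument_registers` are hypothetical names, and nothing here is an agreed convention.

```python
# Each entry: (register numbers, role, who saves), copied from the table above.
PROPOSED_LAYOUT = [
    (range(0, 8), "Temporaries", "Caller"),
    (range(8, 16), "Function arguments/return values", "Caller"),
    (range(16, 24), "Function arguments", "Caller"),
    (range(24, 32), "Saved register", "Callee"),
]


def lookup(reg):
    """Return (role, saver) for vector register number `reg`."""
    for regs, role, saver in PROPOSED_LAYOUT:
        if reg in regs:
            return role, saver
    raise ValueError(f"no such vector register: v{reg}")


def argument_registers():
    """List every register number whose role is to carry arguments."""
    return [
        r
        for regs, role, _ in PROPOSED_LAYOUT
        if role.startswith("Function arguments")
        for r in regs
    ]


print(lookup(0))                   # ('Temporaries', 'Caller')
print(len(argument_registers()))   # 16
```

 The 16 argument registers, v8 through v23, are what allows two arguments of eight registers each.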




On 2019-12-28 at 12:12 AM (+0800), Jim Wilson <jimw@...> wrote:

On Thu, Dec 26, 2019 at 2:01 PM Earl Killian <earl.killian@...> wrote:

Vectors are passed in memory and returned in memory. Vectors are arbitrary length, whereas the vector registers are fixed length, and can only be used to temporarily hold a portion of a memory vector. Thus it doesn’t make sense to pass or return things in vector registers, or to have the registers saved or restored as part of the calling convention.
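
The idea that a register only ever holds a portion of a memory vector is the usual strip-mining pattern. A minimal Python sketch follows, where `vlmax` is a hypothetical parameter standing in for the number of elements one vector register can hold, and list slices stand in for vector loads.

```python
def strip_mine_add(a, b, vlmax):
    """Add two arbitrary-length vectors in chunks of at most `vlmax` elements."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    out = []
    i = 0
    while i < len(a):
        vl = min(vlmax, len(a) - i)  # elements handled in this pass
        va = a[i:i + vl]             # stands in for a vector load
        vb = b[i:i + vl]             # stands in for a vector load
        out.extend(x + y for x, y in zip(va, vb))  # add, then "store"
        i += vl
    return out


print(strip_mine_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], vlmax=2))
# [11, 22, 33, 44, 55]
```

The caller and callee agree only on memory addresses and a length; the register contents are a private detail of the loop.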

Some code will not want args in vector regs, so that we don't have to
save/restore them around calls. Some code will want args in vector
regs, so that they can have subroutines that operate on vectors. If
you have already loaded part of a vector into a vector register, it is
silly to send it back to memory just so you can call a function that
reads it back in. It is better to leave it in a register to reduce
memory bandwidth. So we need two calling conventions. Or
alternatively, one calling convention with optional vector support
that can be enabled only when needed. If you look at ARM SVE, you
will see that this is what they have done.

I think this is more complicated for RVV though, as we have LMUL up to
8, which means we need 16 registers worst case for two arguments,
which will have to be v8-v15 or v16-v23 or v24-v31 because of alignment
issues. Plus we need v0 for an optional mask, so we can't use v1-v7
for arguments. And vlen will have to be an implicit argument.
Someone will have to spend time doing experiments to see how well this
works in practice to make sure it is reasonable. And we will need a
reasonable compiler first before we can do experiments, which we don't
really have yet, and may not have for a while. Not to mention
hardware to test on. I think it will be a while before we can
formally specify a vector calling convention.
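
The register-group arithmetic above can be sketched in Python. The sketch assumes a register group must start at a register number that is a multiple of LMUL (the "alignment issues" mentioned above) and that any group containing v0 is unavailable because v0 may hold a mask. `argument_groups` is a hypothetical helper name.

```python
NUM_VREGS = 32


def argument_groups(lmul, mask_reg=0):
    """List the aligned register groups usable for arguments at a given LMUL.

    Assumptions: a group's base register number is a multiple of `lmul`,
    and the group containing `mask_reg` is excluded.
    """
    groups = []
    for base in range(0, NUM_VREGS, lmul):
        group = list(range(base, base + lmul))
        if mask_reg in group:
            continue  # v0 is reserved for the optional mask
        groups.append(group)
    return groups


lmul8 = argument_groups(8)
print([f"v{g[0]}-v{g[-1]}" for g in lmul8])
# ['v8-v15', 'v16-v23', 'v24-v31']
```

At LMUL=8 only three groups remain, and two arguments consume 16 of the 24 registers in them, which is why the choice of argument registers is so constrained.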

