Slidedown overlapping of dest and source registers
Thang Tran
The slideup instruction has this restriction:
The destination vector register group for vslideup cannot overlap the source vector register group or the mask register; otherwise, an illegal instruction exception is raised. The slidedown instruction has a different restriction: the destination vector register group cannot overlap the mask register if LMUL>1; otherwise, an illegal instruction exception is raised. Allowing the source and destination registers to overlap assumes a particular implementation, which is inflexible. I think the slidedown instruction should have the same restriction as slideup: the source and destination registers must not overlap.
Thanks, Thang
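To make the two restrictions concrete, here is a minimal Python model of what each instruction computes when the destination and source do not overlap. It is a sketch under simplifying assumptions: unmasked operation, vl equal to VLMAX, no tail policy, and hypothetical function names. It follows the spec's rule that vslidedown writes 0 when i + OFFSET runs past VLMAX.

```python
# Reference (out-of-place) semantics of the two slide instructions, as a
# simplified model: unmasked, vl == VLMAX, and vd never aliases vs2.

def vslideup_ref(vd, vs2, offset):
    """vd[i] = vs2[i - offset] for offset <= i < vl; vd[i] for i < offset is unchanged."""
    vl = len(vs2)
    out = list(vd)  # elements below offset keep their old destination values
    for i in range(offset, vl):
        out[i] = vs2[i - offset]
    return out


def vslidedown_ref(vs2, offset):
    """vd[i] = vs2[i + offset], or 0 when i + offset runs past the end (VLMAX)."""
    vl = len(vs2)
    return [vs2[i + offset] if i + offset < vl else 0 for i in range(vl)]


src = [10, 11, 12, 13, 14, 15, 16, 17]
old_dst = [0] * 8
print(vslideup_ref(old_dst, src, 2))   # [0, 0, 10, 11, 12, 13, 14, 15]
print(vslidedown_ref(src, 2))          # [12, 13, 14, 15, 16, 17, 0, 0]
```

Because this model always reads from an untouched `vs2`, the iteration order never matters here; the overlap question only arises once `vd` and `vs2` are the same register.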
andrew@...
It's important that the slidedown instruction can overwrite its source operand. Debuggers will use this feature to populate a vector register in-place without clobbering other architectural state.
On Tue, Jan 28, 2020 at 10:59 AM Thang Tran <thang@...> wrote:
Thang Tran
Hi Andrew, I do not understand your statement. Why is it important? How is it different from slideup?
With slideup, the destination cannot be allowed to overwrite the source operand, because a write to the destination register could reach the source register before that source element is read.
The slidedown instruction should be treated the same way, because my implementation would write to the source register before the source operand is read. Allowing the source and destination registers to overlap assumes a particular slidedown implementation, which is not good for other implementers.
Thanks, Thang
From: Andrew Waterman [mailto:andrew@...]
Sent: Tuesday, January 28, 2020 11:23 AM
To: Thang Tran <thang@...>
Cc: Krste Asanovic <krste@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] Slidedown overlapping of dest and source registers
Guy Lemieux
Hi Thang,
I think Andrew is suggesting that the vslideup restriction is there to allow some flexibility with implementations. However, one of vslideup/vslidedown needs to allow the same source/destination register (group), because the debugger is going to use this feature to inject new data without clobbering other vector registers.

I believe most implementations iterating over a vector will be incrementing the element index. This allows vslidedown to safely clobber earlier elements: higher-index values are being read out while lower-index values are being written, so the lower-index values will already have been read and are in transit in the pipeline. If your vector implementation decrements the element index, then you couldn't allow src/dst overlap with vslidedown, but you could allow it with vslideup. Hence, there is an implicit assumption here about implementations (i.e., counting up is preferred, or else you have to buffer the whole vector register group).

I'm not sure how the debugger would be using this feature, but if I had to guess, I think the debugger would actually be using vslide1down (not vslidedown) to inject data into a vector. So, perhaps the overlapping src/dst requirement should only apply to vslide1down?

Also, as an alternative, there are various vmv instructions that the debugger could use, which move one element at a time and do allow overlapping src/dst. I don't think debugger performance is crucial.

Guy

On Tue, Jan 28, 2020 at 12:42 PM Thang Tran <thang@...> wrote:
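The ordering argument above can be checked with a small model. This sketch assumes a hypothetical one-element-at-a-time implementation that writes each destination element immediately after reading the single source element it needs, with vd and vs2 being the same register. It is unmasked, uses vl equal to VLMAX, and does not describe any particular hardware. The `increasing` flag selects the element iteration order.

```python
# Simulate an in-place slide (vd == vs2) where each destination element is
# written right after its one source element is read. `increasing` chooses
# whether elements are processed in ascending or descending index order.

def slidedown_in_place(reg, offset, increasing):
    vl = len(reg)
    order = range(vl) if increasing else range(vl - 1, -1, -1)
    for i in order:
        j = i + offset
        reg[i] = reg[j] if j < vl else 0  # read source element j, then overwrite element i
    return reg


def slideup_in_place(reg, offset, increasing):
    vl = len(reg)
    order = range(offset, vl) if increasing else range(vl - 1, offset - 1, -1)
    for i in order:
        reg[i] = reg[i - offset]  # read source element i - offset, then overwrite element i
    return reg


src = list(range(10, 18))
print(slidedown_in_place(list(src), 2, increasing=True))   # [12, 13, 14, 15, 16, 17, 0, 0] correct
print(slidedown_in_place(list(src), 2, increasing=False))  # [0, 0, 0, 0, 0, 0, 0, 0] clobbered
# For in-place vslideup, elements below the offset keep the register's own old values.
print(slideup_in_place(list(src), 2, increasing=False))    # [10, 11, 10, 11, 12, 13, 14, 15] correct
print(slideup_in_place(list(src), 2, increasing=True))     # [10, 11, 10, 11, 10, 11, 10, 11] clobbered
```

In this model, pairing ascending order with vslidedown and descending order with vslideup makes both in-place cases safe, while the opposite pairing corrupts both. Buffering the whole source group before writing, the other option mentioned above, would make either order safe at the cost of extra storage.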
Thang Tran
Thanks, Guy, for the explanation, but my implementation increments the element index for slideup and decrements it for slidedown (a symmetrical implementation, which is the simplest from my point of view).
I have no issue with the destination and source registers overlapping for vslide1down and vslide1up; as you suggested, those can be used for debugging.
Thanks, Thang
-----Original Message-----
From: Guy Lemieux [mailto:glemieux@...]
Sent: Tuesday, January 28, 2020 1:40 PM
To: Thang Tran <thang@...>
Cc: Andrew Waterman <andrew@...>; Krste Asanovic <krste@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] Slidedown overlapping of dest and source registers
Guy Lemieux
I'm curious why you chose to be symmetrical (there's no need), and why you decided on incrementing for vslideup and decrementing for vslidedown (I would do the opposite). Incrementing for vslidedown and decrementing for vslideup eliminates the race condition in both directions and allows overlapping src/dst for both. However, by supporting both incrementing and decrementing, you are adding extra hardware that isn't strictly necessary.
Guy
andrew@...
On Tue, Jan 28, 2020 at 1:40 PM Guy Lemieux <glemieux@...> wrote:
Oops, yes, I meant vslide1down. Using vslide1down isn't about performance; it's the only way I know of for the debugger to construct a vector without additional storage. The alternative would have been to add an instruction to insert an element into an arbitrary element position, which for various reasons was deemed less preferable.
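Andrew's point about constructing a vector without extra storage can be illustrated with a minimal Python model of an in-place vslide1down. Assumptions: unmasked operation, vl equal to the number of elements, and the scalar operand already sitting in an x register (not modeled). The routine `debugger_write_vector` is hypothetical and only shows the idea of pushing one element per step; it is not the RISC-V debug specification's procedure.

```python
# Model of vslide1down.vx vd, vs2, rs1 with vd == vs2 (in-place), unmasked,
# vl == VLMAX: vd[i] = vs2[i+1] for i < vl-1, and vd[vl-1] = x[rs1].

def vslide1down_in_place(reg, scalar):
    vl = len(reg)
    for i in range(vl - 1):      # ascending order: reg[i+1] is read before it is overwritten
        reg[i] = reg[i + 1]
    reg[vl - 1] = scalar         # the scalar enters at the top element
    return reg


def debugger_write_vector(reg, values):
    """Hypothetical debugger routine: fill `reg` with `values` using only
    in-place vslide1down steps, one scalar at a time, with no scratch vector."""
    assert len(values) == len(reg)
    for v in values:             # push element 0 first; it ends up at index 0
        vslide1down_in_place(reg, v)
    return reg


reg = [0xDEAD] * 4               # arbitrary stale contents
print(debugger_write_vector(reg, [1, 2, 3, 4]))  # [1, 2, 3, 4]
```

After vl steps every stale element has been shifted out, and only the target register was ever written, which is why in-place overlap matters for vslide1down specifically.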
|
|