Re: Smaller embedded version of the Vector extension


Andrew Waterman
 

It’s actually not fundamental to the ISA design that VLEN >= ELEN. An implementation with VLEN=32 could support SEW=64 whenever LMUL >= 2. This approach starts to pose code-generation headaches, but it is at least theoretically viable.

As compared to cutting the number of registers in half, the above approach has the advantage of offering more vector registers when longer elements are not needed, even though the total storage cost is the same.
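
As a rough illustration of that comparison, the sketch below contrasts the two layouts. The helper names `vlmax` and `summarize` are made up for this example; the only thing taken from the spec is the relation VLMAX = LMUL * VLEN / SEW.

```python
from fractions import Fraction

def vlmax(vlen, sew, lmul):
    """Elements per register group: VLMAX = LMUL * VLEN / SEW."""
    return Fraction(lmul) * vlen / sew

def summarize(nregs, vlen):
    """Return (total storage bits, register groups usable for SEW=64)."""
    # Smallest LMUL for which one group holds at least one 64-bit element.
    lmul64 = next(l for l in (1, 2, 4, 8) if vlmax(vlen, 64, l) >= 1)
    return nregs * vlen, nregs // lmul64

# Both layouts cost 1024 bits and give 16 groups for 64-bit elements,
# but the first still exposes 32 registers when SEW <= 32.
print(summarize(32, 32))  # 32 registers, VLEN=32
print(summarize(16, 64))  # 16 registers, VLEN=64
```

With VLEN=32, SEW=64 only fits once LMUL reaches 2, which is where the code-generation awkwardness mentioned above comes from.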

On Wed, Jun 2, 2021 at 8:21 AM Tariq Kurd via lists.riscv.org <tariq.kurd=huawei.com@...> wrote:
















OK, so it seems that to run our software (which Tony Cole referred to) we need VLEN>=64 for our embedded application.



Is there any scope for reducing the number of V registers? Could RV32E_Vmin have 16 X and V registers?



I know it doesn’t affect the number of F registers, which is tackled by having Zfinx instead to save area – but it seems that we need another solution for the vectors.



 



Then we can match ARM MVE for area – 8x128-bit compared to 16x64-bit



 



Tariq



 



From: tech-vector-ext@... <tech-vector-ext@...> On Behalf Of Bruce Hoult
Sent: 02 June 2021 13:34
To: Tony Cole <tony.cole@...>
Cc: Tariq Kurd <tariq.kurd@...>; tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension



 







I am not a fan of the vslide instructions. It seems they expose the size of the vector registers in a very unfortunate way. In particular they break down if VLEN=1. Most code would be better off storing and loading with an offset.







 







I think I saw somewhere they are largely intended for debuggers.







 







On Thu, Jun 3, 2021 at 12:15 AM Tony Cole <tony.cole@...> wrote:











So, on a machine with 32x 32-bit vector registers, the widening and narrowing instructions can use 64-bit elements (for destination and source respectively), but none of the other instructions can, correct?



 



Note: I use many instructions while processing 64-bit “wide” and “quad” elements, e.g. vrgather_vx_i64m8, vslide1down_vx_i64m4, vslidedown_vx_i64m8, vredsum_vs_i64m8, etc.



 



Therefore, this code would not work on a 32x 32-bit vector register machine.



 



 



Tony



 



 



From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Bruce Hoult
Sent: 02 June 2021 12:18
To: Tony Cole <tony.cole@...>
Cc: Tariq Kurd <tariq.kurd@...>; tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension



 







Note that the effective LMUL is limited to 8, the same as the actual LMUL, so if you've set e32m4 (32 bit elements with LMUL=4) then you can only widen to 64 bit results, not 128 bit.
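
A minimal sketch of that limit, assuming integer LMUL values only and ignoring any separate ELEN restriction an implementation may impose (the function name `widened_operand` is hypothetical):

```python
MAX_EMUL = 8  # largest register group size, per the note above

def widened_operand(sew, lmul):
    """Return (EEW, EMUL) for a widening destination, or None if EMUL > 8."""
    eew, emul = 2 * sew, 2 * lmul
    if emul > MAX_EMUL:
        return None  # no legal register group of this size
    return eew, emul

# e32m4 widens once to 64-bit elements in an 8-register group ...
first = widened_operand(32, 4)
# ... but a second widening to 128 bits would need EMUL=16.
second = widened_operand(*first)
print(first, second)
```

Here `first` is `(64, 8)` and `second` is `None`, matching the e32m4 case described above.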







 







On Wed, Jun 2, 2021 at 11:15 PM Bruce Hoult <bruce@...> wrote:











Yes. The Standard Element Width (SEW) would be limited to 32 bits, but the widening multiplies and accumulates produce the same number of wider results using multiple registers (higher effective LMUL).







 







See section 5.2. Vector Operands







 







Each vector operand has an effective element width (EEW) and an effective LMUL (EMUL) that is used to determine the size and location of all the elements within a vector register group. By default, for most operands of most instructions, EEW=SEW and EMUL=LMUL.








Some vector instructions have source and destination vector operands with the same number of elements but different widths, so that EEW and EMUL differ from SEW and LMUL respectively but EEW/EMUL = SEW/LMUL. For example, most widening arithmetic instructions have a source group with EEW=SEW and EMUL=LMUL but destination group with EEW=2*SEW and EMUL=2*LMUL. Narrowing instructions have a source operand that has EEW=2*SEW and EMUL=2*LMUL but destination where EEW=SEW and EMUL=LMUL.
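
The invariant in the paragraph above can be checked numerically. This is only an illustrative sketch: the helper names are invented here, and VLEN=32 with e32m4 is chosen to match the configuration discussed in this thread.

```python
from fractions import Fraction

def elements(vlen, eew, emul):
    """Number of elements held by a register group: EMUL * VLEN / EEW."""
    return Fraction(emul) * vlen / eew

def widening_shapes(sew, lmul):
    """(EEW, EMUL) for the source and destination of a widening operation."""
    return {"source": (sew, lmul), "destination": (2 * sew, 2 * lmul)}

vlen = 32
for role, (eew, emul) in widening_shapes(sew=32, lmul=4).items():
    ratio = Fraction(eew) / Fraction(emul)
    # Both operands share the same EEW/EMUL ratio and element count.
    print(role, "EEW/EMUL =", ratio, "elements =", elements(vlen, eew, emul))
```

Doubling both EEW and EMUL leaves the ratio and the element count unchanged, which is why the destination simply occupies twice as many registers.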





Vector operands or results may occupy one or more vector registers depending on EMUL, but are always specified using the lowest-numbered vector register in the group. Using other than the lowest-numbered vector register to specify a vector register group is a reserved encoding.
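
As a sketch of the register-group rule, assuming groups are aligned so that the lowest-numbered register is a multiple of EMUL (the function name `group_registers` is hypothetical, and fractional EMUL is left out):

```python
NUM_VREGS = 32  # v0..v31 in the standard extension

def group_registers(base, emul):
    """Return the register numbers in the group named by `base`,
    or None if `base` is not a legal group name (reserved encoding)."""
    if emul not in (1, 2, 4, 8):
        raise ValueError("this sketch handles integer EMUL only")
    if not 0 <= base < NUM_VREGS:
        raise ValueError("no such vector register")
    if base % emul != 0:
        return None  # not the lowest-numbered register of a group
    return list(range(base, base + emul))

print(group_registers(8, 4))   # v8 names the group v8..v11
print(group_registers(10, 4))  # v10 sits inside that group, so it is reserved
```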







 







 







 







On Wed, Jun 2, 2021 at 11:11 PM Tony Cole <tony.cole@...> wrote:











Having 32x 32 bit registers with LMUL=4, giving 8x 128 bits - does this allow for 64-bit elements?



I don't think it does, but it’s not clear in the spec.



 



I use 64-bit elements for “wide” and “quad” accumulators.



 



 



From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Bruce Hoult
Sent: 02 June 2021 11:19
To: Tariq Kurd <tariq.kurd@...>
Cc: tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension



 







There is nothing to prevent implementing 32x 32 bit registers on a 32 bit CPU. The application processor spec has quite recently (a few months ago) specified a 128 bit minimum register size, but I don't think there's any good reason for this, especially in embedded.







 







With that configuration, LMUL=4 gives 8x 128 bits, the same as MVE.
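
The arithmetic behind that statement, as a small sketch (the helper name `grouped_view` is made up for illustration):

```python
def grouped_view(nregs, vlen, lmul):
    """Return (number of register groups, bits per group) for a given LMUL."""
    return nregs // lmul, vlen * lmul

# 32 registers of 32 bits each, viewed at each integer LMUL setting.
for lmul in (1, 2, 4, 8):
    groups, bits = grouped_view(32, 32, lmul)
    print(f"LMUL={lmul}: {groups} x {bits} bits")
```

At LMUL=4 this gives 8 groups of 128 bits, the same shape as the MVE register file, while total storage stays at 1024 bits for every setting.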







 







If floating point is desired then Zfinx is available, sharing int & fp scalar registers instead of fp and vector registers.







 







Of course profiles (or just custom chips for custom applications) can define subsets of instructions.







 







On Wed, Jun 2, 2021 at 10:05 PM Tariq Kurd via lists.riscv.org <tariq.kurd=huawei.com@...> wrote:











Hi everyone,



 



Are there any plans for a cut-down configuration of the vector extension suitable for embedded cores? It seems that the 32x128-bit register file is suitable for application-class cores but it is very large for embedded cores, especially if the F registers also need to be implemented (which I think is the case, unless a Zfinx version is specified).



 



ARM MVE only has 8x128-bit registers for FP and Vector, so it is much more suitable for embedded applications.



https://en.wikichip.org/wiki/arm/helium



 



What’s the approach here? Should embedded applications implement the P-extension instead?



 



Tariq



 



Tariq Kurd
Processor Design | RISC-V Cores, Bristol
E-mail: Tariq.Kurd@...
Company: Huawei technologies R&D (UK) Ltd | Address: 290 Park Avenue, Aztec West, Almondsbury, Bristol, Avon, BS32 4TR, UK



 


