Re: Smaller embedded version of the Vector extension
We do allow the supported SEW to vary with LMUL, so an implementation can
support single-width operations on SEW=64. See section 4.5.
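For concreteness, the number of elements in one register group follows the spec's formula VLMAX = LMUL × VLEN / SEW. The sketch below assumes a hypothetical VLEN=32 machine like the one discussed in this thread. It shows that SEW=64 only gives a whole element per group once LMUL ≥ 2. Treating a fractional VLMAX as "not useful" is an illustrative assumption, not spec wording, and the function names are hypothetical.

```python
from fractions import Fraction
from typing import Union


def vlmax(vlen: int, sew: int, lmul: Union[int, Fraction]) -> Fraction:
    """Elements per register group: VLMAX = LMUL * VLEN / SEW."""
    return Fraction(lmul) * vlen / sew


def holds_whole_element(vlen: int, sew: int, lmul: Union[int, Fraction]) -> bool:
    # Illustrative assumption: a (SEW, LMUL) setting is only useful if one
    # register group can hold at least one complete element.
    return vlmax(vlen, sew, lmul) >= 1


VLEN = 32  # hypothetical embedded configuration: 32-bit vector registers
for lmul in (1, 2, 4, 8):
    print(f"SEW=64, LMUL={lmul}: VLMAX={vlmax(VLEN, 64, lmul)}, "
          f"whole element: {holds_whole_element(VLEN, 64, lmul)}")
```

On this configuration, the e64m8 setting used by intrinsics such as vredsum_vs_i64m8 would give VLMAX=4. Whether a given implementation actually supports SEW=64 at these settings is up to the implementation, as the reply above says.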
On Wed, 2 Jun 2021 12:14:33 +0000, "Tony Cole via lists.riscv.org" <email@example.com> said:
| So, (on a 32x 32-bit vector register machine) the widening and narrowing
| instructions can use 64-bit elements (for destination and source
| respectively), but none of the other instructions can, correct?
| Note: I use many instructions while processing 64-bit “wide” and “quad”
| elements, e.g. vrgather_vx_i64m8, vslide1down_vx_i64m4, vslidedown_vx_i64m8,
| vredsum_vs_i64m8, etc.
| Therefore, this code would not work on a 32x 32-bit vector register machine.
| From: firstname.lastname@example.org [mailto:email@example.com]
| On Behalf Of Bruce Hoult
| Sent: 02 June 2021 12:18
| To: Tony Cole <firstname.lastname@example.org>
| Cc: Tariq Kurd <email@example.com>; firstname.lastname@example.org;
| Shaofei (B) <email@example.com>
| Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector
| extension
| Note that the effective LMUL is limited to 8, the same as the actual LMUL, so
| if you've set e32m4 (32-bit elements with LMUL=4) then you can only widen to
| 64-bit results, not 128-bit.
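The limit above is simple arithmetic. Widening the element width by some factor also multiplies EMUL by that factor, and EMUL may not exceed 8. The following is a minimal sketch covering integer LMUL only. The function name is hypothetical.

```python
from typing import Optional

MAX_EMUL = 8  # EMUL, like LMUL, is capped at 8


def widened_emul(lmul: int, widen_factor: int) -> Optional[int]:
    """Destination EMUL after widening by widen_factor, or None if it exceeds 8."""
    emul = lmul * widen_factor
    return emul if emul <= MAX_EMUL else None


# e32m4: 32-bit elements with LMUL=4
print(widened_emul(4, 2))  # 32 -> 64 bits needs EMUL=8: allowed
print(widened_emul(4, 4))  # 32 -> 128 bits would need EMUL=16: not allowed
```

Starting from e32m2 instead, a 4x widening would land exactly on EMUL=8.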
| On Wed, Jun 2, 2021 at 11:15 PM Bruce Hoult <firstname.lastname@example.org> wrote:
| Yes. The Selected Element Width (SEW) would be limited to 32 bits, but the
| widening multiplies and accumulates produce the same number of wider
| results using multiple registers (a higher effective LMUL).
| See section 5.2, "Vector Operands":
| Each vector operand has an effective element width (EEW) and an effective
| LMUL (EMUL) that is used to determine the size and location of all the
| elements within a vector register group. By default, for most operands of
| most instructions, EEW=SEW and EMUL=LMUL.
| Some vector instructions have source and destination vector operands with
| the same number of elements but different widths, so that EEW and EMUL
| differ from SEW and LMUL respectively but EEW/EMUL = SEW/LMUL. For
| example, most widening arithmetic instructions have a source group with
| EEW=SEW and EMUL=LMUL but destination group with EEW=2*SEW and EMUL=
| 2*LMUL. Narrowing instructions have a source operand that has EEW=2*SEW
| and EMUL=2*LMUL but destination where EEW=SEW and EMUL=LMUL.
| Vector operands or results may occupy one or more vector registers
| depending on EMUL, but are always specified using the lowest-numbered
| vector register in the group. Using other than the lowest-numbered vector
| register to specify a vector register group is a reserved encoding.
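The following sketch lists the registers an operand covers, starting from its lowest-numbered register. It relies on the alignment rule the spec states for integer LMUL: a group's base register must be a multiple of EMUL, and any other base is a reserved encoding. It assumes 32 architectural registers. The function name is hypothetical.

```python
from typing import List


def register_group(base: int, emul: int, num_regs: int = 32) -> List[int]:
    """List the registers in a group named by its lowest-numbered register.

    Raises ValueError for a reserved encoding, i.e. a base register that is
    not a multiple of EMUL (so it cannot be the lowest register of a group).
    """
    if emul not in (1, 2, 4, 8):
        raise ValueError(f"integer EMUL must be 1, 2, 4 or 8, got {emul}")
    if base < 0 or base % emul != 0 or base + emul > num_regs:
        raise ValueError(f"v{base} does not name an EMUL={emul} group (reserved)")
    return list(range(base, base + emul))


print(register_group(8, 8))   # destination of a widening op from e32m4: v8..v15
print(register_group(4, 4))   # an e32m4 source group: v4..v7
try:
    register_group(5, 4)      # v5 is not the lowest register of any EMUL=4 group
except ValueError as err:
    print(err)
```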
| On Wed, Jun 2, 2021 at 11:11 PM Tony Cole <email@example.com> wrote:
| Having 32x 32-bit registers with LMUL=4, giving 8x 128 bits, does
| this allow for 64-bit elements?
| I don't think it does, but it’s not clear in the spec.
| I use 64-bit elements for “wide” and “quad” accumulators.
| From: firstname.lastname@example.org [mailto:email@example.com] On Behalf Of Bruce Hoult
| Sent: 02 June 2021 11:19
| To: Tariq Kurd <firstname.lastname@example.org>
| Cc: email@example.com; Shaofei (B) <
| Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of
| the Vector extension
| There is nothing to prevent implementing 32x 32-bit registers on a 32-bit
| CPU. The application processor spec has quite recently (within the last few
| months) specified a 128-bit minimum register size, but I don't think there's
| any good reason for this, especially in embedded.
| With that configuration, LMUL=4 gives 8x 128 bits, the same as MVE.
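The storage comparison works out as follows, assuming 32 registers with VLEN=32 and the MVE figure of 8x 128-bit registers given in the thread. This counts only vector register state, not how MVE shares it with FP.

```python
NUM_REGS = 32   # architectural vector registers v0..v31
VLEN = 32       # bits per vector register in the embedded configuration discussed
LMUL = 4

groups = NUM_REGS // LMUL        # register groups available at LMUL=4
group_bits = LMUL * VLEN         # bits per register group
total_bits = NUM_REGS * VLEN     # total vector register state

# ARM MVE as described in the thread: 8 x 128-bit registers
MVE_REGS, MVE_BITS = 8, 128
assert (groups, group_bits) == (MVE_REGS, MVE_BITS)
assert total_bits == MVE_REGS * MVE_BITS

# For comparison: 32 registers at the 128-bit application-class minimum
APP_VLEN = 128
print(f"embedded: {groups} x {group_bits} bits = {total_bits} bits; "
      f"application minimum: {NUM_REGS * APP_VLEN} bits")
```

The 32-bit configuration therefore needs a quarter of the vector state of the 128-bit minimum.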
| If floating point is desired, Zfinx is available, sharing the integer and FP
| scalar registers rather than the FP and vector registers.
| Of course profiles (or just custom chips for custom applications) can
| define subsets of instructions.
| On Wed, Jun 2, 2021 at 10:05 PM Tariq Kurd via lists.riscv.org
| <firstname.lastname@example.org> wrote:
| Hi everyone,
| Are there any plans for a cut-down configuration of the vector
| extension suitable for embedded cores? It seems that the
| 32x128-bit register file is suitable for application class cores
| but it is very large for embedded cores, especially if the F
| registers also need to be implemented (which I think is the case,
| unless a Zfinx version is specified).
| ARM MVE only has 8x128-bit registers for FP and Vector, so it is much
| more suitable for embedded applications.
| What’s the approach here? Should embedded applications implement
| the P-extension instead?
| Tariq Kurd
| Processor Design | RISC-V Cores, Bristol
| E-mail: Tariq.Kurd@Huawei.com
| Company: Huawei Technologies R&D (UK) Ltd | Address: 290 Park
| Avenue, Aztec West, Almondsbury, Bristol, Avon, BS32 4TR, UK