Re: Smaller embedded version of the Vector extension
We do allow supported SEW to vary with LMUL, so an implementation can
support single-width operations on SEW=64. See section 4.5.

Krste

On Wed, 2 Jun 2021 12:14:33 +0000, "Tony Cole via lists.riscv.org" <tony.cole=huawei.com@...> said:

| So, (on a 32x 32-bit vector register machine) the widening and narrowing
| instructions can use 64-bit elements (for destination and source
| respectively), but not any of the other instructions, correct?
|
| Note: I use many instructions while processing 64-bit “wide” and “quad”
| elements, e.g. vrgather_vx_i64m8, vslide1down_vx_i64m4, vslidedown_vx_i64m8,
| vredsum_vs_i64m8, etc.
|
| Therefore, this code would not work on a 32x 32-bit vector register machine.
|
| Tony
|
| From: tech-vector-ext@... [mailto:tech-vector-ext@...]
| On Behalf Of Bruce Hoult
| Sent: 02 June 2021 12:18
| To: Tony Cole <tony.cole@...>
| Cc: Tariq Kurd <tariq.kurd@...>; tech-vector-ext@...;
| Shaofei (B) <shaofei1@...>
| Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector
| extension
|
| Note that the effective LMUL is limited to 8, the same as the actual LMUL, so
| if you've set e32m4 (32 bit elements with LMUL=4) then you can only widen to
| 64 bit results, not 128 bit.
|
| On Wed, Jun 2, 2021 at 11:15 PM Bruce Hoult <bruce@...> wrote:
|
| Yes. The Selected Element Width (SEW) would be limited to 32 bits, but the
| widening multiplies and accumulates produce the same number of wider
| results using multiple registers (higher effective LMUL).
|
| See section 5.2. Vector Operands
|
| Each vector operand has an effective element width (EEW) and an effective
| LMUL (EMUL) that is used to determine the size and location of all the
| elements within a vector register group. By default, for most operands of
| most instructions, EEW=SEW and EMUL=LMUL.
|
| Some vector instructions have source and destination vector operands with
| the same number of elements but different widths, so that EEW and EMUL
| differ from SEW and LMUL respectively but EEW/EMUL = SEW/LMUL.
| For example, most widening arithmetic instructions have a source group with
| EEW=SEW and EMUL=LMUL but destination group with EEW=2*SEW and
| EMUL=2*LMUL. Narrowing instructions have a source operand that has EEW=2*SEW
| and EMUL=2*LMUL but destination where EEW=SEW and EMUL=LMUL.
|
| Vector operands or results may occupy one or more vector registers
| depending on EMUL, but are always specified using the lowest-numbered
| vector register in the group. Using other than the lowest-numbered vector
| register to specify a vector register group is a reserved encoding.
|
| On Wed, Jun 2, 2021 at 11:11 PM Tony Cole <tony.cole@...> wrote:
|
| Having 32x 32 bit registers with LMUL=4, giving 8x 128 bits - does
| this allow for 64-bit elements?
|
| I don't think it does, but it’s not clear in the spec.
|
| I use 64-bit elements for “wide” and “quad” accumulators.
|
| From: tech-vector-ext@... [mailto:tech-vector-ext@...]
| On Behalf Of Bruce Hoult
| Sent: 02 June 2021 11:19
| To: Tariq Kurd <tariq.kurd@...>
| Cc: tech-vector-ext@...; Shaofei (B) <shaofei1@...>
| Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of
| the Vector extension
|
| There is nothing to prevent implementing 32x 32 bit registers on a 32
| bit CPU. The application processor spec has quite recently (a few months)
| specified a 128 bit minimum register size but I don't think there's any
| good reason for this, especially in embedded.
|
| With that configuration, LMUL=4 gives 8x 128 bits, the same as MVE.
|
| If floating point is desired then Zfinx is available, sharing int & fp
| scalar registers instead of fp and vector registers.
|
| Of course profiles (or just custom chips for custom applications) can
| define subsets of instructions.
|
| On Wed, Jun 2, 2021 at 10:05 PM Tariq Kurd via lists.riscv.org
| <tariq.kurd=huawei.com@...> wrote:
|
| Hi everyone,
|
| Are there any plans for a cut-down configuration of the vector
| extension suitable for embedded cores?
| It seems that the 32x128-bit register file is suitable for application
| class cores but it is very large for embedded cores, especially if the F
| registers also need to be implemented (which I think is the case,
| unless a Zfinx version is specified).
|
| ARM MVE only has 8x128-bit registers for FP and Vector, so it is much
| more suitable for embedded applications.
|
| https://en.wikichip.org/wiki/arm/helium
|
| What’s the approach here? Should embedded applications implement
| the P-extension instead?
|
| Tariq
|
| Tariq Kurd
| Processor Design, RISC-V Cores, Bristol
| E-mail: Tariq.Kurd@...
| Company: Huawei Technologies R&D (UK) Ltd
| Address: 290 Park Avenue, Aztec West, Almondsbury, Bristol, Avon, BS32 4TR, UK
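
For anyone following the arithmetic in this thread, here is a minimal sketch in plain C (no vector intrinsics) of the rules quoted from section 5.2: a widening destination uses EEW=2*SEW and EMUL=2*LMUL, EMUL is limited to 8, and a register group holds VLEN*EMUL bits. The struct and function names (operand, group_bits, widen_dest, emul_ok, elems) are hypothetical, made up for illustration only, and the sketch assumes integer LMUL (no fractional LMUL). It does not model which SEW values a given implementation supports at each LMUL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper type: effective settings of one vector operand. */
struct operand {
    unsigned eew;   /* effective element width, in bits */
    unsigned emul;  /* effective LMUL (integer values only in this sketch) */
};

/* Bits in one register group: VLEN * LMUL. */
static unsigned group_bits(unsigned vlen, unsigned lmul)
{
    return vlen * lmul;
}

/* Destination operand of a widening op: EEW = 2*SEW, EMUL = 2*LMUL. */
static struct operand widen_dest(unsigned sew, unsigned lmul)
{
    assert(sew > 0 && lmul > 0);
    struct operand d = { 2 * sew, 2 * lmul };
    return d;
}

/* The effective LMUL is limited to 8, the same as the actual LMUL. */
static bool emul_ok(struct operand op)
{
    return op.emul <= 8;
}

/* Number of elements held by a register group: VLEN * EMUL / EEW. */
static unsigned elems(unsigned vlen, struct operand op)
{
    return vlen * op.emul / op.eew;
}
```

With VLEN=32, group_bits(32, 4) gives the 128-bit groups mentioned above, and widen_dest(32, 4) yields EEW=64 with EMUL=8, which passes emul_ok and holds the same number of elements as the e32m4 source. Widening again from there would need EMUL=16, which emul_ok rejects, matching the "64 bit results, not 128 bit" remark.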