Re: Smaller embedded version of the Vector extension


Krste Asanovic
 

If there were no cost, then supporting VLEN=64 on the general apps
processor profile would be a good thing to do. But not allowing
standard software to assume VLEN>=128 has a non-trivial impact on
bigger cores, and the expectation is that the vast majority of apps
cores will want VLEN>=128.

As Zalman points out, the main advantage is removing stripmining code
when it is known vectors will fit, and translating existing code is
one important such use case though not the only one. Removing
stripmining reduces static and dynamic code size and increases
performance. While LMUL>1 allows more cases to be handled without
stripmining, it also reduces available arch registers.
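
A minimal sketch of the trade-off above, in plain C rather than RVV intrinsics. The helper names (vlmax, reg_groups, add_stripmined, add_fixed4) are hypothetical, the inner scalar loop stands in for a single vector add, and vl is modelled simply as min(remaining, VLMAX), which is one behaviour the spec permits for vsetvli. It assumes the 32 architectural vector registers of the V extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Elements per register group: VLMAX = VLEN * LMUL / SEW (integer LMUL only). */
static size_t vlmax(size_t vlen_bits, size_t sew_bits, size_t lmul) {
    assert(sew_bits > 0 && lmul > 0);
    return vlen_bits * lmul / sew_bits;
}

/* Register groups left once the 32 vector registers are grouped by LMUL. */
static size_t reg_groups(size_t lmul) {
    return 32 / lmul;
}

/* VLEN-agnostic add over 32-bit elements: one loop iteration per strip.
 * Returns the number of strips executed. */
static size_t add_stripmined(int32_t *dst, const int32_t *a, const int32_t *b,
                             size_t n, size_t vlen_bits, size_t lmul) {
    size_t max = vlmax(vlen_bits, 32, lmul);
    size_t strips = 0;
    while (n > 0) {
        size_t vl = n < max ? n : max;      /* simplified model of vsetvli */
        for (size_t i = 0; i < vl; i++)     /* stands in for one vector add */
            dst[i] = a[i] + b[i];
        dst += vl; a += vl; b += vl; n -= vl;
        strips++;
    }
    return strips;
}

/* Fixed-size add of four 32-bit elements: no loop bookkeeping at all,
 * but only valid when one register group holds at least 128 bits. */
static void add_fixed4(int32_t dst[4], const int32_t a[4], const int32_t b[4]) {
    for (size_t i = 0; i < 4; i++)
        dst[i] = a[i] + b[i];
}
```

With four 32-bit elements, VLEN=128 and LMUL=1, the stripmined loop runs exactly once, which is why code allowed to assume VLEN>=128 can drop the loop. At VLEN=64 the same data takes two strips, or one strip at LMUL=2 at the cost of having 16 register groups instead of 32.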

Anyone can of course still build a compatible apps processor with
VLEN=64, but this would fail to run some of the code written for the
VLEN>=128 case. And almost anything goes in the embedded space.
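
One way software could cope with such a core is to pick a code path at run time. The sketch below is an assumption for illustration, not something the spec or any profile defines: the enum, the function names and the idea of shipping three builds of a kernel are all hypothetical. The only architectural fact used is that the vlenb CSR holds VLEN/8 in bytes; how a program obtains that value is platform-specific and left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical code paths a library might ship for one kernel. */
enum kernel_path {
    PATH_SCALAR,        /* no usable vector unit */
    PATH_VLEN_AGNOSTIC, /* stripmined, works for any VLEN */
    PATH_FIXED_128      /* no stripmining, assumes VLEN >= 128 */
};

/* vlenb is the value read from the vlenb CSR, i.e. VLEN in bytes. */
static size_t vlen_bits_from_vlenb(size_t vlenb) {
    return vlenb * 8;
}

/* Pick the fastest path the hardware can run correctly.
 * Callers pass vlen_bits = 0 when there is no vector unit. */
static enum kernel_path choose_path(bool has_vector, size_t vlen_bits,
                                    bool have_agnostic_build) {
    assert(has_vector || vlen_bits == 0);
    if (has_vector && vlen_bits >= 128)
        return PATH_FIXED_128;
    if (has_vector && have_agnostic_build)
        return PATH_VLEN_AGNOSTIC;
    /* VLEN too small for the only vector build: same as having no vectors. */
    return PATH_SCALAR;
}
```

On a VLEN=64 core, a library that ships only the fixed 128-bit build ends up on the scalar path, while one that also ships a VLEN-agnostic build can still use the vector unit.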

Krste

On Thu, 3 Jun 2021 13:35:03 -0700, Zalman Stern <zalman@google.com> said:
| "...if written correctly" is precisely the point. If VLEN is specified as >=128, code that targets 128-bits explicitly by
| setting VL to an appropriate constant for a large swath *is* correct. This allows one to do basically what NEON/SSE do today as
| a baseline for performance.

| Whether this is worthwhile or not may be debated, but insisting that everything should be completely vector length agnostic or
| it is broken is missing the point. Ideally there would be a lot more quantitative data on this, but I'm not going to tilt at
| that windmill right now. The worst case for the overhead of hardware vector length independence occurs at the smallest sizes as
| well.

| In general it's pretty dubious that the same set of fully lowered instruction bits can efficiently cover everything from the
| bottom of the embedded space to HPC. Ideally we'd be moving to more sophisticated lowering -- e.g. dynamic and multi-stage
| compilation -- rather than forcing the issue in the ISA design.

| Another way to go would be to split 32-bit and 64-bit implementations such that the VLEN >= 64 for 32-bit implementations and
| VLEN >= 128 for 64-bit ones. (Application code is rarely going to target 32-bit these days. Minimal embedded implementations
| are probably 32-bit.) Though truth be told, code likely needs a scalar fallback anyway unless the V extension is required.
| (Which it almost certainly won't be if we're talking embedded space.) As such, VLEN not being large enough for the expectations
| code was compiled to is the same as not having the vector unit.

| -Z-

| On Thu, Jun 3, 2021 at 9:33 AM Tony Cole via lists.riscv.org <tony.cole=huawei.com@lists.riscv.org> wrote:

| Software should still work with VLEN>=64 if written correctly, as it should be VLEN agnostic.
| Maybe it should be a recommendation that VLEN>=128, with a minimum of 64 for app processors?

| Lower performance is an implementation cost/benefit decision.

| Tony

| -----Original Message-----
| From: tech-vector-ext@lists.riscv.org [mailto:tech-vector-ext@lists.riscv.org] On Behalf Of Krste Asanovic
| Sent: 03 June 2021 17:24
| To: Guy Lemieux <guy.lemieux@gmail.com>
| Cc: Andrew Waterman <andrew@sifive.com>; Tariq Kurd <tariq.kurd@huawei.com>; Shaofei (B) <shaofei1@hisilicon.com>;
| tech-vector-ext@lists.riscv.org
| Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

|| On Jun 3, 2021, at 9:16 AM, Guy Lemieux <guy.lemieux@gmail.com> wrote:
||
|| What is the advantage to RVV requiring VLEN >= 128?
||
|| I think this should be changed to VLEN >= 64 because:
||
|| 1) VLEN = 64 is more likely for small implementations; creating a
|| mandatory expectation to improve software portability

| This is the requirement for app processors, which are not generally small cores.
| Most competing SIMD extensions are at least 128b per vector register.

||
|| 2) two implementations, each with VLEN >= 64, do not expose anything
|| new to software that is not already exposed by VLEN >= 128
||
|| 3) allowing VLEN = 32 would expose something new to software (register
|| file data layout when SEW=64)
||
|| 4) are there any disadvantages to VLEN >= 64 (versus the current VLEN
|| >= 128)? (I can't see any)

| Lower performance on codes that work well on other app architectures.

| Krste

||
|| Guy
||
||
|| On Wed, Jun 2, 2021 at 11:13 AM <krste@berkeley.edu> wrote:
|||
|||
||| The VLEN>=128 constraint is only for the application processor "V"
||| extension for the app profile - not for embedded vectors which can
||| have VLEN=32.
|||
||| From spec Introduction:
||| '
||| The term base vector extension is used informally to describe the standard set of vector ISA components that will be
||| required for the single-letter "V" extension, which is intended for use in standard server and application-processor
||| platform profiles. The set of mandatory instructions and supported element widths will vary with the base ISA (RV32I,
||| RV64I) as described below.
|||
||| Other profiles, including embedded profiles, may choose to mandate only subsets of these extensions. The exact set of
||| mandatory supported instructions for an implementation to be compliant with a given profile will only be determined when
||| each profile spec is ratified. For convenience in defining subset profiles, vector instruction subsets are given ISA string
||| names beginning with the "Zv" prefix.
||| '
|||
||| There is a set of Zve* names for the embedded subsets (see github issue
||| #550).
|||
||| A minimal embedded implementation using RV32E+Zfinx+vectors would be
||| the same state size as ARM MVE.
|||
||| The P extension does not have floating-point, but it makes sense as an
||| alternative for short integer/fixed-point SIMD.
|||
||| The software fragmentation issue is that some library routines that
||| expose VLEN might not be portable between app cores and embedded
||| cores, but these are different software ecosystems (e.g. ABI/calling
||| convention might be different) and only a few kinds of routine rely
||| on VLEN.
|||
||| For app cores that can afford VLEN>=128, the advantage is the removal
||| of stripmining code in cases that operate on fixed-size vectors.
|||
||| Krste
|||
|||
|||
||| On Wed, 2 Jun 2021 05:10:32 -0700, "Guy Lemieux" <guy.lemieux@gmail.com> said:
|||
||| | Allowing VLEN<128 would allow for smaller vector register files,
||| | but it would also result in a profile that is not
||| | forward-compatible with the V spec. This would produce another fracture in the software ecosystem.
|||
||| | To avoid such a fracture, there are two choices:
||| | (1) go with P instead
||| | (2) relax the V spec to allow smaller implementations
|||
||| | So the key question for this group is whether to relax the minimum
||| | VLEN to 32 or 64?
|||
||| | note: a possible justification for keeping 128 might be to
||| | recommend (1) instead. I don’t know anything about P, but it seems
||| | like it could be speced in a way that is competitive/comparable with Helium.
|||
||| | Guy
|||
||| | PS — I have started to design an “RVV-lite” profile which would be
||| | more amenable to embedded implementations. However, I have adopted
||| | a stance that it must remain forward compatible with the full V
||| | spec, so I have not considered VLEN below 128. I am happy to share
||| | my work on this and involve other contributors — email me if you would like to see a copy.
|||
||| | On Wed, Jun 2, 2021 at 3:15 AM Andrew Waterman <andrew@sifive.com> wrote:
|||
||| |     The uppercase-V V extension is meant to cater to apps processors, where
||| |     the VLEN >= 128 constraint is not inappropriate and is sometimes
||| |     beneficial.  But there's nothing fundamental about the ISA design that
||| |     prohibits VLEN < 128.  A minimal configuration is VLEN=ELEN=32, giving the
||| |     same total amount of state as MVE.  (And if you set LMUL=4, then you even
||| |     get the same shape: 8 registers of 128 bits apiece.)
|||
||| |     Such a thing wouldn't be called V, but perhaps something like Zvmin.
||| |     Other than agreeing on a feature set and assigning it a name, the
||| |     architecting is already done.
|||
||| |     (If you search the spec for Zfinx, you'll see that a Zfinx variant is
||| |     planned, but only barely sketched out.)
|||
||| |     On Wed, Jun 2, 2021 at 3:04 AM Tariq Kurd via lists.riscv.org
||| |     <tariq.kurd=huawei.com@lists.riscv.org> wrote:
|||
||| |         Hi everyone,
|||
||| |
|||
||| |         Are there any plans for a cut-down configuration of the vector
||| |         extension suitable for embedded cores? It seems that the 32x128-bit
||| |         register file is suitable for application class cores but it is very
||| |         large for embedded cores, especially if the F registers also need to
||| |         be implemented (which I think is the case, unless a Zfinx version is
||| |         specified).
|||
||| |
|||
||| |         ARM MVE only has 8x128-bit registers for FP and Vector, so it is much
||| |         more suitable for embedded applications.
|||
||| |         https://en.wikichip.org/wiki/arm/helium
|||
||| |
|||
||| |         What’s the approach here? Should embedded applications implement the
||| |         P-extension instead?
|||
||| |
|||
||| |         Tariq
|||
||| |
|||
||| |         Tariq Kurd
||| |         Processor Design, RISC-V Cores, Bristol
||| |         E-mail: Tariq.Kurd@Huawei.com
||| |         Company: Huawei technologies R&D (UK) Ltd
||| |         Address: 290 Park Avenue, Aztec West, Almondsbury, Bristol, Avon, BS32 4TR, UK
