
Re: Smaller embedded version of the Vector extension

Zalman Stern
 

If the minimum VLEN is at least 128-bits, one can translate NEON/SSE intrinsics directly without having to have every vector instruction dominated by a loop over the vector length.

-Z-
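
As a rough illustration of the point about loops (a sketch, not taken from the thread): the helper names `vlmax` and `stripmine_passes` below are hypothetical, and the model assumes an integer LMUL and that each pass handles min(remaining, VLMAX) elements, which simplifies the actual `vsetvli` rules. It counts how many passes a fixed 128-bit vector of four 32-bit lanes would need for a given VLEN.

```c
#include <assert.h>

/* Elements that fit in one register group: VLMAX = LMUL * VLEN / SEW.
   Only integer LMUL (1, 2, 4, 8) is modelled here; that is an assumption
   of this sketch. */
unsigned vlmax(unsigned vlen_bits, unsigned sew_bits, unsigned lmul)
{
    assert(sew_bits > 0 && lmul > 0);
    return lmul * vlen_bits / sew_bits;
}

/* Passes a stripmine loop would need to cover avl elements, assuming
   each pass handles min(remaining, VLMAX) elements. */
unsigned stripmine_passes(unsigned avl, unsigned vlen_bits,
                          unsigned sew_bits, unsigned lmul)
{
    unsigned per_pass = vlmax(vlen_bits, sew_bits, lmul);
    assert(per_pass > 0);
    return (avl + per_pass - 1) / per_pass; /* ceiling division */
}
```

Under these assumptions, `stripmine_passes(4, 128, 32, 1)` is 1, so no loop is needed, while `stripmine_passes(4, 64, 32, 1)` is 2 unless LMUL is raised to 2.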


On Thu, Jun 3, 2021 at 9:38 AM Guy Lemieux <guy.lemieux@...> wrote:
Krste, to be clear,The issue



On Thu, Jun 3, 2021 at 9:24 AM Krste Asanovic <krste@...> wrote:
> > On Jun 3, 2021, at 9:16 AM, Guy Lemieux <guy.lemieux@...> wrote:
> >
> > What is the advantage to RVV requiring VLEN >= 128?
> >
> > I think this should be changed to VLEN >= 64 because:
> >
> > 1) VLEN = 64 is more likely for small implementations; creating a
> > mandatory expectation to improve software portability
>
> This is the requirement for app processors, which are not generally small cores.
> Most competing SIMD extensions are at least 128b per vector register.


The RVV spec should be inclusive, rather than exclusive. Setting VLEN
>= 128 is a higher threshold that makes it less inclusive.


> > 4) are there any disadvantages to VLEN >= 64 (versus the current VLEN
> > >= 128)? (I can't see any)
>
> Lower performance on codes that work well on other app architectures.

Sorry I wasn't clear. Of course, an implementation with VLEN=64 would
likely be slower than one with VLEN=128.

To clarify: are there any disadvantages to allowing VLEN=64 in the
spec as a minimum threshold?

Software should be agnostic of VLEN, but the truth is programmers will
squeeze out every last bit where they can and they will latch on to
this minimum value when doing things like re-using LSBs of pointers,
setting minimum chunk sizes, etc. Hence, asking them to expect VLEN=64
as a minimum would be better (more inclusive).

I can't see how this would hurt performance.

Guy






Re: Smaller embedded version of the Vector extension

Guy Lemieux
 

Krste, to be clear,The issue



On Thu, Jun 3, 2021 at 9:24 AM Krste Asanovic <krste@berkeley.edu> wrote:
> > On Jun 3, 2021, at 9:16 AM, Guy Lemieux <guy.lemieux@gmail.com> wrote:
> >
> > What is the advantage to RVV requiring VLEN >= 128?
> >
> > I think this should be changed to VLEN >= 64 because:
> >
> > 1) VLEN = 64 is more likely for small implementations; creating a
> > mandatory expectation to improve software portability
>
> This is the requirement for app processors, which are not generally small cores.
> Most competing SIMD extensions are at least 128b per vector register.

The RVV spec should be inclusive, rather than exclusive. Setting VLEN
>= 128 is a higher threshold that makes it less inclusive.

> > 4) are there any disadvantages to VLEN >= 64 (versus the current VLEN
> > >= 128)? (I can't see any)
>
> Lower performance on codes that work well on other app architectures.

Sorry I wasn't clear. Of course, an implementation with VLEN=64 would
likely be slower than one with VLEN=128.

To clarify: are there any disadvantages to allowing VLEN=64 in the
spec as a minimum threshold?

Software should be agnostic of VLEN, but the truth is programmers will
squeeze out every last bit where they can and they will latch on to
this minimum value when doing things like re-using LSBs of pointers,
setting minimum chunk sizes, etc. Hence, asking them to expect VLEN=64
as a minimum would be better (more inclusive).

I can't see how this would hurt performance.

Guy


Re: Smaller embedded version of the Vector extension

Tony Cole
 

Software should still work with VLEN>=64 if written correctly, as it should be VLEN agnostic.
Maybe it should be a recommendation that VLEN>=128, with a minimum of 64 for app processors?

Lower performance is an implementation cost/benefit decision.

Tony

-----Original Message-----
From: tech-vector-ext@lists.riscv.org [mailto:tech-vector-ext@lists.riscv.org] On Behalf Of Krste Asanovic
Sent: 03 June 2021 17:24
To: Guy Lemieux <guy.lemieux@gmail.com>
Cc: Andrew Waterman <andrew@sifive.com>; Tariq Kurd <tariq.kurd@huawei.com>; Shaofei (B) <shaofei1@hisilicon.com>; tech-vector-ext@lists.riscv.org
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension



On Jun 3, 2021, at 9:16 AM, Guy Lemieux <guy.lemieux@gmail.com> wrote:

> What is the advantage to RVV requiring VLEN >= 128?
>
> I think this should be changed to VLEN >= 64 because:
>
> 1) VLEN = 64 is more likely for small implementations; creating a
> mandatory expectation to improve software portability

This is the requirement for app processors, which are not generally small cores.
Most competing SIMD extensions are at least 128b per vector register.

> 2) two implementations, each with VLEN >= 64, do not expose anything
> new to software that is not already exposed by VLEN >= 128
>
> 3) allowing VLEN = 32 would expose something new to software (register
> file data layout when SEW=64)
>
> 4) are there any disadvantages to VLEN >= 64 (versus the current VLEN
> >= 128)? (I can't see any)

Lower performance on codes that work well on other app architectures.

Krste

> Guy


On Wed, Jun 2, 2021 at 11:13 AM <krste@berkeley.edu> wrote:


The VLEN>=128 constraint is only for the application processor "V"
extension for the app profile - not for embedded vectors which can
have VLEN=32.

From spec Introduction:
'
The term base vector extension is used informally to describe the standard set of vector ISA components that will be required for the single-letter "V" extension, which is intended for use in standard server and application-processor platform profiles. The set of mandatory instructions and supported element widths will vary with the base ISA (RV32I, RV64I) as described below.

Other profiles, including embedded profiles, may choose to mandate only subsets of these extensions. The exact set of mandatory supported instructions for an implementation to be compliant with a given profile will only be determined when each profile spec is ratified. For convenience in defining subset profiles, vector instruction subsets are given ISA string names beginning with the "Zv" prefix.
'

There is a set of Zve* names for the embedded subsets (see GitHub issue
#550).

A minimal embedded implementation using RV32E+Zfinx+vectors would have
the same state size as ARM MVE.

The P extension does not have floating-point, but for short
integer/fixed-point SIMD it makes sense as an alternative.

The software fragmentation issue is that some library routines that
expose VLEN might not be portable between app cores and embedded
cores, but these are different software ecosystems (e.g. ABI/calling
convention might be different) and only a few kinds of routine rely
on VLEN.

For app cores that can afford VLEN>=128, the advantage is the removal
of stripmining code in cases that operate on fixed-size vectors.

Krste



On Wed, 2 Jun 2021 05:10:32 -0700, "Guy Lemieux" <guy.lemieux@gmail.com> said:
| Allowing VLEN<128 would allow for smaller vector register files,
| but it would also result in a profile that is not
| forward-compatible with the V spec. This would produce another fracture in the software ecosystem.

| To avoid such a fracture, there are two choices:
| (1) go with P instead
| (2) relax the V spec to allow smaller implementations

| So the key question for this group is whether to relax the minimum
| VLEN to 32 or 64?

| note: a possible justification for keeping 128 might be to
| recommend (1) instead. I don’t know anything about P, but it seems
| like it could be speced in a way that is competitive/comparable with Helium.

| Guy

| PS — I have started to design an “RVV-lite” profile which would be
| more amenable to embedded implementations. However, I have adopted
| a stance that it must remain forward compatible with the full V
| spec, so I have not considered VLEN below 128. I am happy to share
| my work on this and involve other contributors — email me if you would like to see a copy.

| On Wed, Jun 2, 2021 at 3:15 AM Andrew Waterman <andrew@sifive.com> wrote:

| The uppercase-V V extension is meant to cater to apps processors, where
| the VLEN >= 128 constraint is not inappropriate and is sometimes
| beneficial. But there's nothing fundamental about the ISA design that
| prohibits VLEN < 128. A minimal configuration is VLEN=ELEN=32, giving the
| same total amount of state as MVE. (And if you set LMUL=4, then you even
| get the same shape: 8 registers of 128 bits apiece.)

| Such a thing wouldn't be called V, but perhaps something like Zvmin.
| Other than agreeing on a feature set and assigning it a name, the
| architecting is already done.

| (If you search the spec for Zfinx, you'll see that a Zfinx variant is
| planned, but only barely sketched out.)

| On Wed, Jun 2, 2021 at 3:04 AM Tariq Kurd via lists.riscv.org <tariq.kurd=
| huawei.com@lists.riscv.org> wrote:

| Hi everyone,

|

| Are there any plans for a cut-down configuration of the vector
| extension suitable for embedded cores? It seems that the 32x128-bit
| register file is suitable for application class cores but it is very
| large for embedded cores, especially if the F registers also need to
| be implemented (which I think is the case, unless a Zfinx version is
| specified).

|

| ARM MVE only has 8x128-bit registers for FP and Vector, so it is much
| more suitable for embedded applications.

| https://en.wikichip.org/wiki/arm/helium

|

| What’s the approach here? Should embedded applications implement the
| P-extension instead?

|

| Tariq

|

| Tariq Kurd
| Processor Design, RISC-V Cores, Bristol
| E-mail: Tariq.Kurd@Huawei.com
| Company: Huawei technologies R&D (UK) Ltd
| Address: 290 Park Avenue, Aztec West, Almondsbury, Bristol, Avon, BS32
| 4TR, UK
| http://www.huawei.com

| This e-mail and its attachments contain confidential information from
| HUAWEI, which is intended only for the person or entity whose address
| is listed above. Any use of the information contained herein in any way
| (including, but not limited to, total or partial disclosure,
| reproduction, or dissemination) by persons other than the intended
| recipient(s) is prohibited. If you receive this e-mail in error, please
| notify the sender by phone or email immediately and delete it!


Re: Smaller embedded version of the Vector extension

Krste Asanovic
 

On Jun 3, 2021, at 9:16 AM, Guy Lemieux <guy.lemieux@gmail.com> wrote:

> What is the advantage to RVV requiring VLEN >= 128?
>
> I think this should be changed to VLEN >= 64 because:
>
> 1) VLEN = 64 is more likely for small implementations; creating a
> mandatory expectation to improve software portability

This is the requirement for app processors, which are not generally small cores.
Most competing SIMD extensions are at least 128b per vector register.

> 2) two implementations, each with VLEN >= 64, do not expose anything
> new to software that is not already exposed by VLEN >= 128
>
> 3) allowing VLEN = 32 would expose something new to software (register
> file data layout when SEW=64)
>
> 4) are there any disadvantages to VLEN >= 64 (versus the current VLEN
> >= 128)? (I can't see any)

Lower performance on codes that work well on other app architectures.

Krste

> Guy


On Wed, Jun 2, 2021 at 11:13 AM <krste@berkeley.edu> wrote:


The VLEN>=128 constraint is only for the application processor "V"
extension for the app profile - not for embedded vectors which can
have VLEN=32.

From spec Introduction:
'
The term base vector extension is used informally to describe the standard set of vector ISA components that will be required for the single-letter "V" extension, which is intended for use in standard server and application-processor platform profiles. The set of mandatory instructions and supported element widths will vary with the base ISA (RV32I, RV64I) as described below.

Other profiles, including embedded profiles, may choose to mandate only subsets of these extensions. The exact set of mandatory supported instructions for an implementation to be compliant with a given profile will only be determined when each profile spec is ratified. For convenience in defining subset profiles, vector instruction subsets are given ISA string names beginning with the "Zv" prefix.
'

There is a set of Zve* names for the embedded subsets (see GitHub issue
#550).

A minimal embedded implementation using RV32E+Zfinx+vectors would have
the same state size as ARM MVE.

The P extension does not have floating-point, but for short
integer/fixed-point SIMD it makes sense as an alternative.

The software fragmentation issue is that some library routines that
expose VLEN might not be portable between app cores and embedded
cores, but these are different software ecosystems (e.g. ABI/calling
convention might be different) and only a few kinds of routine rely on
VLEN.

For app cores that can afford VLEN>=128, the advantage is the removal
of stripmining code in cases that operate on fixed-size vectors.

Krste



On Wed, 2 Jun 2021 05:10:32 -0700, "Guy Lemieux" <guy.lemieux@gmail.com> said:
| Allowing VLEN<128 would allow for smaller vector register files, but it would
| also result in a profile that is not forward-compatible with the V spec. This
| would produce another fracture in the software ecosystem.

| To avoid such a fracture, there are two choices:
| (1) go with P instead
| (2) relax the V spec to allow smaller implementations

| So the key question for this group is whether to relax the minimum VLEN to 32
| or 64?

| note: a possible justification for keeping 128 might be to recommend (1)
| instead. I don’t know anything about P, but it seems like it could be speced
| in a way that is competitive/comparable with Helium.

| Guy

| PS — I have started to design an “RVV-lite” profile which would be more
| amenable to embedded implementations. However, I have adopted a stance that it
| must remain forward compatible with the full V spec, so I have not considered
| VLEN below 128. I am happy to share my work on this and involve other
| contributors — email me if you would like to see a copy.

| On Wed, Jun 2, 2021 at 3:15 AM Andrew Waterman <andrew@sifive.com> wrote:

| The uppercase-V V extension is meant to cater to apps processors, where
| the VLEN >= 128 constraint is not inappropriate and is sometimes
| beneficial. But there's nothing fundamental about the ISA design that
| prohibits VLEN < 128. A minimal configuration is VLEN=ELEN=32, giving the
| same total amount of state as MVE. (And if you set LMUL=4, then you even
| get the same shape: 8 registers of 128 bits apiece.)

| Such a thing wouldn't be called V, but perhaps something like Zvmin.
| Other than agreeing on a feature set and assigning it a name, the
| architecting is already done.

| (If you search the spec for Zfinx, you'll see that a Zfinx variant is
| planned, but only barely sketched out.)

| On Wed, Jun 2, 2021 at 3:04 AM Tariq Kurd via lists.riscv.org <tariq.kurd=
| huawei.com@lists.riscv.org> wrote:

| Hi everyone,

|

| Are there any plans for a cut-down configuration of the vector
| extension suitable for embedded cores? It seems that the 32x128-bit
| register file is suitable for application class cores but it is very
| large for embedded cores, especially if the F registers also need to
| be implemented (which I think is the case, unless a Zfinx version is
| specified).

|

| ARM MVE only has 8x128-bit registers for FP and Vector, so it is much
| more suitable for embedded applications.

| https://en.wikichip.org/wiki/arm/helium

|

| What’s the approach here? Should embedded applications implement the
| P-extension instead?

|

| Tariq



Re: Smaller embedded version of the Vector extension

Guy Lemieux
 

What is the advantage to RVV requiring VLEN >= 128?

I think this should be changed to VLEN >= 64 because:

1) VLEN = 64 is more likely for small implementations; creating a
mandatory expectation to improve software portability

2) two implementations, each with VLEN >= 64, do not expose anything
new to software that is not already exposed by VLEN >= 128

3) allowing VLEN =32 would expose something new to software (register
file data layout when SEW=64)

4) are there any disadvantages to VLEN >= 64 (versus the current VLEN
>= 128)? (I can't see any)

Guy
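
Point 3 in the list above can be made concrete with a little arithmetic. This is a sketch only: `regs_per_element` is a hypothetical helper, and whether any spec or profile permits an element wider than a register is not something this thread settles. It shows that a 64-bit element needs two registers when VLEN=32, a layout that never arises once VLEN >= 64.

```c
#include <assert.h>

/* Vector registers needed to hold a single SEW-bit element,
   i.e. ceil(SEW / VLEN). Hypothetical helper for illustration. */
unsigned regs_per_element(unsigned sew_bits, unsigned vlen_bits)
{
    assert(vlen_bits > 0);
    return (sew_bits + vlen_bits - 1) / vlen_bits;
}
```

For SEW=64 this gives 2 at VLEN=32, and 1 at VLEN=64 or VLEN=128.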


On Wed, Jun 2, 2021 at 11:13 AM <krste@berkeley.edu> wrote:


The VLEN>=128 constraint is only for the application processor "V"
extension for the app profile - not for embedded vectors which can
have VLEN=32.

From spec Introduction:
'
The term base vector extension is used informally to describe the standard set of vector ISA components that will be required for the single-letter "V" extension, which is intended for use in standard server and application-processor platform profiles. The set of mandatory instructions and supported element widths will vary with the base ISA (RV32I, RV64I) as described below.

Other profiles, including embedded profiles, may choose to mandate only subsets of these extensions. The exact set of mandatory supported instructions for an implementation to be compliant with a given profile will only be determined when each profile spec is ratified. For convenience in defining subset profiles, vector instruction subsets are given ISA string names beginning with the "Zv" prefix.
'

There is a set of Zve* names for the embedded subsets (see GitHub issue
#550).

A minimal embedded implementation using RV32E+Zfinx+vectors would have
the same state size as ARM MVE.

The P extension does not have floating-point, but for short
integer/fixed-point SIMD it makes sense as an alternative.

The software fragmentation issue is that some library routines that
expose VLEN might not be portable between app cores and embedded
cores, but these are different software ecosystems (e.g. ABI/calling
convention might be different) and only a few kinds of routine rely on
VLEN.

For app cores that can afford VLEN>=128, the advantage is the removal
of stripmining code in cases that operate on fixed-size vectors.

Krste



On Wed, 2 Jun 2021 05:10:32 -0700, "Guy Lemieux" <guy.lemieux@gmail.com> said:
| Allowing VLEN<128 would allow for smaller vector register files, but it would
| also result in a profile that is not forward-compatible with the V spec. This
| would produce another fracture in the software ecosystem.

| To avoid such a fracture, there are two choices:
| (1) go with P instead
| (2) relax the V spec to allow smaller implementations

| So the key question for this group is whether to relax the minimum VLEN to 32
| or 64?

| note: a possible justification for keeping 128 might be to recommend (1)
| instead. I don’t know anything about P, but it seems like it could be speced
| in a way that is competitive/comparable with Helium.

| Guy

| PS — I have started to design an “RVV-lite” profile which would be more
| amenable to embedded implementations. However, I have adopted a stance that it
| must remain forward compatible with the full V spec, so I have not considered
| VLEN below 128. I am happy to share my work on this and involve other
| contributors — email me if you would like to see a copy.

| On Wed, Jun 2, 2021 at 3:15 AM Andrew Waterman <andrew@sifive.com> wrote:

| The uppercase-V V extension is meant to cater to apps processors, where
| the VLEN >= 128 constraint is not inappropriate and is sometimes
| beneficial. But there's nothing fundamental about the ISA design that
| prohibits VLEN < 128. A minimal configuration is VLEN=ELEN=32, giving the
| same total amount of state as MVE. (And if you set LMUL=4, then you even
| get the same shape: 8 registers of 128 bits apiece.)

| Such a thing wouldn't be called V, but perhaps something like Zvmin.
| Other than agreeing on a feature set and assigning it a name, the
| architecting is already done.

| (If you search the spec for Zfinx, you'll see that a Zfinx variant is
| planned, but only barely sketched out.)

| On Wed, Jun 2, 2021 at 3:04 AM Tariq Kurd via lists.riscv.org <tariq.kurd=
| huawei.com@lists.riscv.org> wrote:

| Hi everyone,

|

| Are there any plans for a cut-down configuration of the vector
| extension suitable for embedded cores? It seems that the 32x128-bit
| register file is suitable for application class cores but it is very
| large for embedded cores, especially if the F registers also need to
| be implemented (which I think is the case, unless a Zfinx version is
| specified).

|

| ARM MVE only has 8x128-bit registers for FP and Vector, so it is much
| more suitable for embedded applications.

| https://en.wikichip.org/wiki/arm/helium

|

| What’s the approach here? Should embedded applications implement the
| P-extension instead?

|

| Tariq



Re: Smaller embedded version of the Vector extension

Krste Asanovic
 

see github issue #550
Krste

On Jun 3, 2021, at 2:02 AM, Shaofei (B) <shaofei1@...> wrote:

Hi, Krste:

 Does the RISC-V V TG have a plan to support a low-cost vector extension in the RVMxx profile?

 Best Regards
 Shaofei
 2021.6.3

-----Original Message-----
From: krste@... [mailto:krste@...]
Sent: June 3, 2021 2:13
To: Guy Lemieux <guy.lemieux@...>
Cc: Andrew Waterman <andrew@...>; Tariq Kurd <tariq.kurd@...>; Shaofei (B) <shaofei1@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension


The VLEN>=128 constraint is only for the application processor "V"
extension for the app profile - not for embedded vectors which can have VLEN=32.

From spec Introduction:
'
The term base vector extension is used informally to describe the standard set of vector ISA components that will be required for the single-letter "V" extension, which is intended for use in standard server and application-processor platform profiles. The set of mandatory instructions and supported element widths will vary with the base ISA (RV32I, RV64I) as described below.

Other profiles, including embedded profiles, may choose to mandate only subsets of these extensions. The exact set of mandatory supported instructions for an implementation to be compliant with a given profile will only be determined when each profile spec is ratified. For convenience in defining subset profiles, vector instruction subsets are given ISA string names beginning with the "Zv" prefix.
'

There is a set of Zve* names for the embedded subsets (see GitHub issue #550).

A minimal embedded implementation using RV32E+Zfinx+vectors would have the same state size as ARM MVE.

The P extension does not have floating-point, but for short integer/fixed-point SIMD it makes sense as an alternative.

The software fragmentation issue is that some library routines that expose VLEN might not be portable between app cores and embedded cores, but these are different software ecosystems (e.g. ABI/calling convention might be different) and only a few kinds of routine rely on VLEN.

For app cores that can afford VLEN>=128, the advantage is the removal of stripmining code in cases that operate on fixed-size vectors.

Krste



On Wed, 2 Jun 2021 05:10:32 -0700, "Guy Lemieux" <guy.lemieux@...> said:

| Allowing VLEN<128 would allow for smaller vector register files, but
| it would also result in a profile that is not forward-compatible with
| the V spec. This would produce another fracture in the software ecosystem.

| To avoid such a fracture, there are two choices:
| (1) go with P instead
| (2) relax the V spec to allow smaller implementations

| So the key question for this group is whether to relax the minimum
| VLEN to 32 or 64?

| note: a possible justification for keeping 128 might be to recommend
| (1) instead. I don’t know anything about P, but it seems like it could
| be speced in a way that is competitive/comparable with Helium.

| Guy

| PS — I have started to design an “RVV-lite” profile which would be
| more amenable to embedded implementations. However, I have adopted a
| stance that it must remain forward compatible with the full V spec, so
| I have not considered VLEN below 128. I am happy to share my work on
| this and involve other contributors — email me if you would like to see a copy.

| On Wed, Jun 2, 2021 at 3:15 AM Andrew Waterman <andrew@...> wrote:

|     The uppercase-V V extension is meant to cater to apps processors, where
|     the VLEN >= 128 constraint is not inappropriate and is sometimes
|     beneficial.  But there's nothing fundamental about the ISA design that
|     prohibits VLEN < 128.  A minimal configuration is VLEN=ELEN=32, giving the
|     same total amount of state as MVE.  (And if you set LMUL=4, then you even
|     get the same shape: 8 registers of 128 bits apiece.)

|     Such a thing wouldn't be called V, but perhaps something like Zvmin. 
|     Other than agreeing on a feature set and assigning it a name, the
|     architecting is already done.

|     (If you search the spec for Zfinx, you'll see that a Zfinx variant is
|     planned, but only barely sketched out.)

|     On Wed, Jun 2, 2021 at 3:04 AM Tariq Kurd via lists.riscv.org <tariq.kurd=
|     huawei.com@...> wrote:

|         Hi everyone,

|          

|         Are there any plans for a cut-down configuration of the vector
|         extension suitable for embedded cores? It seems that the 32x128-bit
|         register file is suitable for application class cores but it is very
|         large for embedded cores, especially if the F registers also need to
|         be implemented (which I think is the case, unless a Zfinx version is
|         specified).

|          

|         ARM MVE only has 8x128-bit registers for FP and Vector, so it is much
|         more suitable for embedded applications.

|         https://en.wikichip.org/wiki/arm/helium

|          

|         What’s the approach here? Should embedded applications implement the
|         P-extension instead?

|          

|         Tariq








Re: Smaller embedded version of the Vector extension

Tariq Kurd
 

This is a good question.
So if the RVM22 profile requires VLEN=32, ELEN=64, LMUL=8 then the vector registers will have the same amount of state as ARM MVE.
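
The state comparison can be checked with simple arithmetic. The following is a sketch under stated assumptions: the helper names are hypothetical, the register count of 32 and the MVE figure of 8x128 bits come from the messages quoted below, and only integer LMUL values are modelled.

```c
#include <assert.h>

enum { NUM_VREGS = 32 }; /* number of RVV vector registers */

/* Total architectural vector register state, in bits. */
unsigned vector_state_bits(unsigned vlen_bits)
{
    return NUM_VREGS * vlen_bits;
}

/* Number of register groups available at an integer LMUL. */
unsigned group_count(unsigned lmul)
{
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    return NUM_VREGS / lmul;
}

/* Width of one register group, in bits. */
unsigned group_bits(unsigned vlen_bits, unsigned lmul)
{
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    return vlen_bits * lmul;
}
```

With VLEN=32 the total is 1024 bits, matching 8x128 bits for MVE. LMUL=4 yields 8 groups of 128 bits (the MVE shape), while LMUL=8 yields 4 groups of 256 bits; the total state is the same either way.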

Tariq

-----Original Message-----
From: Shaofei (B)
Sent: 03 June 2021 10:03
To: krste@berkeley.edu; Guy Lemieux <guy.lemieux@gmail.com>; Shaofei (B) <shaofei1@hisilicon.com>
Cc: Andrew Waterman <andrew@sifive.com>; Tariq Kurd <tariq.kurd@huawei.com>; tech-vector-ext@lists.riscv.org
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

Hi, Krste:

Does the RISC-V V TG have a plan to support a low-cost vector extension in the RVMxx profile?

Best Regards
Shaofei
2021.6.3

-----Original Message-----
From: krste@berkeley.edu [mailto:krste@berkeley.edu]
Sent: June 3, 2021 2:13
To: Guy Lemieux <guy.lemieux@gmail.com>
Cc: Andrew Waterman <andrew@sifive.com>; Tariq Kurd <tariq.kurd@huawei.com>; Shaofei (B) <shaofei1@hisilicon.com>; tech-vector-ext@lists.riscv.org
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension


The VLEN>=128 constraint is only for the application processor "V"
extension for the app profile - not for embedded vectors which can have VLEN=32.

From spec Introduction:
'
The term base vector extension is used informally to describe the standard set of vector ISA components that will be required for the single-letter "V" extension, which is intended for use in standard server and application-processor platform profiles. The set of mandatory instructions and supported element widths will vary with the base ISA (RV32I, RV64I) as described below.

Other profiles, including embedded profiles, may choose to mandate only subsets of these extensions. The exact set of mandatory supported instructions for an implementation to be compliant with a given profile will only be determined when each profile spec is ratified. For convenience in defining subset profiles, vector instruction subsets are given ISA string names beginning with the "Zv" prefix.
'

There is a set of Zve* names for the embedded subsets (see GitHub issue #550).

A minimal embedded implementation using RV32E+Zfinx+vectors would have the same state size as ARM MVE.

The P extension does not have floating-point, but for short integer/fixed-point SIMD it makes sense as an alternative.

The software fragmentation issue is that some library routines that expose VLEN might not be portable between app cores and embedded cores, but these are different software ecosystems (e.g. ABI/calling convention might be different) and only a few kinds of routine rely on VLEN.

For app cores that can afford VLEN>=128, the advantage is the removal of stripmining code in cases that operate on fixed-size vectors.

Krste



On Wed, 2 Jun 2021 05:10:32 -0700, "Guy Lemieux" <guy.lemieux@gmail.com> said:
| Allowing VLEN<128 would allow for smaller vector register files, but
| it would also result in a profile that is not forward-compatible with
| the V spec. This would produce another fracture in the software ecosystem.

| To avoid such a fracture, there are two choices:
| (1) go with P instead
| (2) relax the V spec to allow smaller implementations

| So the key question for this group is whether to relax the minimum
| VLEN to 32 or 64?

| note: a possible justification for keeping 128 might be to recommend
| (1) instead. I don’t know anything about P, but it seems like it could
| be speced in a way that is competitive/comparable with Helium.

| Guy

| PS — I have started to design an “RVV-lite” profile which would be
| more amenable to embedded implementations. However, I have adopted a
| stance that it must remain forward compatible with the full V spec, so
| I have not considered VLEN below 128. I am happy to share my work on
| this and involve other contributors — email me if you would like to see a copy.

| On Wed, Jun 2, 2021 at 3:15 AM Andrew Waterman <andrew@sifive.com> wrote:

| The uppercase-V V extension is meant to cater to apps processors, where
| the VLEN >= 128 constraint is not inappropriate and is sometimes
| beneficial.  But there's nothing fundamental about the ISA design that
| prohibits VLEN < 128.  A minimal configuration is VLEN=ELEN=32, giving the
| same total amount of state as MVE.  (And if you set LMUL=4, then you even
| get the same shape: 8 registers of 128 bits apiece.)

| Such a thing wouldn't be called V, but perhaps something like Zvmin. 
| Other than agreeing on a feature set and assigning it a name, the
| architecting is already done.

| (If you search the spec for Zfinx, you'll see that a Zfinx variant is
| planned, but only barely sketched out.)

| On Wed, Jun 2, 2021 at 3:04 AM Tariq Kurd via lists.riscv.org
| <tariq.kurd=huawei.com@lists.riscv.org> wrote:

| Hi everyone,

|  

| Are there any plans for a cut-down configuration of the vector
| extension suitable for embedded cores? It seems that the 32x128-bit
| register file is suitable for application class cores but it is very
| large for embedded cores, especially if the F registers also need to
| be implemented (which I think is the case, unless a Zfinx version is
| specified).

|  

| ARM MVE only has 8x128-bit registers for FP and Vector, so it is much
| more suitable for embedded applications.

| https://en.wikichip.org/wiki/arm/helium

|  

| What’s the approach here? Should embedded applications implement the
| P-extension instead?

|  

| Tariq

|  

| Tariq Kurd
| Processor Design, RISC-V Cores, Bristol
| E-mail: Tariq.Kurd@Huawei.com
| Company: Huawei technologies R&D (UK) Ltd
| Address: 290 Park Avenue, Aztec West, Almondsbury, Bristol, Avon, BS32 4TR, UK

|  



Re: 答复: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

Shaofei (B)
 

Hi, Krste:

Does the RISC-V V TG have a plan to support a low-cost vector extension in the RVMxx profile?

Best Regards
Shaofei
2021.6.3



Re: Smaller embedded version of the Vector extension

Nick Knight
 

Hi Tony,

All of the vector permutation instructions can be simulated using the memory system. For example, vslide can be simulated by storing the vector register and loading it at an offset; vrgather can be simulated by an indexed store followed by a unit-stride load (or unit-stride store and indexed load); etc. Whether or not this is more efficient depends on details of the microarchitecture and particular workload.
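As a rough illustration of the idea, the following Python sketch models a vector register as a list and "memory" as a scratch list. The function names are hypothetical, and the zero padding is an assumption chosen to mirror how a slide-down reads zero past the end of the source; it is a behavioral model, not RVV code.

```python
def vslidedown_via_memory(vreg, offset):
    """Store the register to a scratch buffer, then load it back at an offset."""
    vlmax = len(vreg)
    offset = min(offset, vlmax)             # sliding by >= VLMAX yields all zeros
    scratch = list(vreg) + [0] * vlmax      # unit-stride store, zero padding after it
    return scratch[offset:offset + vlmax]   # unit-stride load at an element offset


def vrgather_via_memory(vreg, indices):
    """Unit-stride store followed by an indexed load; out-of-range indices read 0."""
    scratch = list(vreg)                    # unit-stride store
    return [scratch[i] if 0 <= i < len(scratch) else 0 for i in indices]


slid = vslidedown_via_memory([10, 20, 30, 40], 1)                 # [20, 30, 40, 0]
gathered = vrgather_via_memory([10, 20, 30, 40], [3, 0, 0, 7])    # [40, 10, 10, 0]
```

Whether the round trip through memory beats a register-to-register permute depends on the microarchitecture and workload, as noted above.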

Best,
Nick Knight



Re: Smaller embedded version of the Vector extension

Tony Cole
 

Hi Bruce,

 

Do you mean vrgather instead of vslide?

 

I use vrgather_vx_* and vslidedown to perform a vector element rotate (and other things), see:

 

        https://github.com/riscv/riscv-v-spec/issues/671#issuecomment-837035001

 

-        I use vrgather_vx_i64m8( vec, 0, vl ) to splat the scalar in element 0 of vec to all elements in the result, I just want it in the top element but there isn’t a better instruction for that.

 

I think you are referring to: vrgather_vv_*  ??
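For readers unfamiliar with these instructions, here is a small Python model of a splat via vrgather.vx combined with vslidedown to rotate a vector by one element. This is only one plausible way to combine them, assuming vl = VLMAX and a masked merge for the top element; it is not necessarily what the linked issue does, and the Python function names are illustrative.

```python
def vrgather_vx(vs2, index):
    """vrgather.vx model: every result element is vs2[index], or 0 if index >= VLMAX."""
    vlmax = len(vs2)
    value = vs2[index] if index < vlmax else 0
    return [value] * vlmax


def vslidedown_vx(vs2, offset):
    """vslidedown.vx model: result[i] = vs2[i + offset], or 0 past VLMAX."""
    vlmax = len(vs2)
    return [vs2[i + offset] if i + offset < vlmax else 0 for i in range(vlmax)]


def rotate_down_by_one(vec):
    """Move every element down one place; element 0 wraps around to the top."""
    splat = vrgather_vx(vec, 0)      # element 0 copied to all positions
    slid = vslidedown_vx(vec, 1)     # top position is left as 0
    top = len(vec) - 1
    # Masked merge: top element from the splat, everything else from the slide.
    return [splat[i] if i == top else slid[i] for i in range(len(vec))]


rotated = rotate_down_by_one([1, 2, 3, 4])   # [2, 3, 4, 1]
```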

 

 

Tony

 

 

From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Tony Cole via lists.riscv.org
Sent: 02 June 2021 18:13
To: Bruce Hoult <bruce@...>
Cc: Tariq Kurd <tariq.kurd@...>; tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

 

Hi Bruce,

 

“I am not a fan of the vslide instructions. It seems they expose the size of the vector registers in a very unfortunate way. In particular they break down if VLEN=1. Most code would be better off storing and loading with an offset.”

 

I don't see what you mean, please can you elaborate with examples of why/how it exposes the size of the vector register in a very unfortunate way and breaking down if VLEN=1 (do you mean LMUL=1??).

 

The vslide instruction speeds up my code a lot as it reduces reloading (mostly the same) data over and over again.

 

 

Tony

 

 

From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Bruce Hoult
Sent: 02 June 2021 13:34
To: Tony Cole <tony.cole@...>
Cc: Tariq Kurd <tariq.kurd@...>; tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

 

I am not a fan of the vslide instructions. It seems they expose the size of the vector registers in a very unfortunate way. In particular they break down if VLEN=1. Most code would be better off storing and loading with an offset.

 

I think I saw somewhere they are largely intended for debuggers.

 

On Thu, Jun 3, 2021 at 12:15 AM Tony Cole <tony.cole@...> wrote:

So, (on a 32x 32-bit vector register machine) the widening and narrowing instructions can use 64-bit elements (for destination and source respectively), but not any of the other instructions, correct?

 

Note: I use many instructions while processing 64-bit “wide” and “quad” elements, e.g. vrgather_vx_i64m8, vslide1down_vx_i64m4, vslidedown_vx_i64m8, vredsum_vs_i64m8, etc.

 

Therefore, this code would not work on a 32x 32-bit vector register machine.

 

 

Tony

 

 

From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Bruce Hoult
Sent: 02 June 2021 12:18
To: Tony Cole <tony.cole@...>
Cc: Tariq Kurd <tariq.kurd@...>; tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

 

Note that the effective LMUL is limited to 8, the same as the actual LMUL, so if you've set e32m4 (32 bit elements with LMUL=4) then you can only widen to 64 bit results, not 128 bit. 

 

On Wed, Jun 2, 2021 at 11:15 PM Bruce Hoult <bruce@...> wrote:

Yes. The Selected Element Width (SEW) would be limited to 32 bits, but the widening multiplies and accumulates produce the same number of wider results using multiple registers (higher effective LMUL).

 

See section 5.2. Vector Operands

 

Each vector operand has an effective element width (EEW) and an effective LMUL (EMUL) that is used to determine the size and location of all the elements within a vector register group. By default, for most operands of most instructions, EEW=SEW and EMUL=LMUL.


Some vector instructions have source and destination vector operands with the same number of elements but different widths, so that EEW and EMUL differ from SEW and LMUL respectively but EEW/EMUL = SEW/LMUL. For example, most widening arithmetic instructions have a source group with EEW=SEW and EMUL=LMUL but destination group with EEW=2*SEW and EMUL=2*LMUL. Narrowing instructions have a source operand that has EEW=2*SEW and EMUL=2*LMUL but destination where EEW=SEW and EMUL=LMUL.

Vector operands or results may occupy one or more vector registers depending on EMUL, but are always specified using the lowest-numbered vector register in the group. Using other than the lowest-numbered vector register to specify a vector register group is a reserved encoding.
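The EEW/EMUL rule quoted above, together with the EMUL limit of 8 mentioned earlier in this thread, can be checked with a few lines of arithmetic. The sketch below is illustrative only: `widen` is a made-up helper, and the 4x widening case is hypothetical, used just to show where the limit bites.

```python
from fractions import Fraction

MAX_EMUL = 8  # a register group spans at most 8 vector registers


def widen(sew, lmul, factor=2):
    """Return (EEW, EMUL) for a destination `factor` times wider, or None if EMUL > 8."""
    eew = factor * sew
    emul = factor * Fraction(lmul)   # Fraction also covers fractional LMUL such as 1/2
    if emul > MAX_EMUL:
        return None
    # Same element count on both sides: EEW/EMUL must equal SEW/LMUL.
    assert Fraction(eew) / emul == Fraction(sew) / Fraction(lmul)
    return eew, emul


assert widen(32, 4) == (64, 8)           # e32m4 widens to 64-bit results with EMUL=8
assert widen(32, 4, factor=4) is None    # 128-bit results would need EMUL=16
assert widen(16, Fraction(1, 2)) == (32, 1)
```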

 

 

 

On Wed, Jun 2, 2021 at 11:11 PM Tony Cole <tony.cole@...> wrote:

Having 32x 32 bit registers with LMUL=4, giving 8x 128 bits - does this allow for 64-bit elements?

I don't think it does, but it’s not clear in the spec.

 

I use 64-bit elements for “wide” and “quad” accumulators.

 

 

From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Bruce Hoult
Sent: 02 June 2021 11:19
To: Tariq Kurd <tariq.kurd@...>
Cc: tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

 

There is nothing to prevent implementing 32x 32 bit registers on a 32 bit CPU. The application processor spec has quite recently (a few months) specified a 128 bit minimum register size but I don't think there's any good reason for this, especially in embedded.

 

With that configuration, LMUL=4 gives 8x 128 bits, the same as MVE.
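The state-size comparison is simple arithmetic; this short Python sketch spells it out. The helper names are illustrative, and the 8x128-bit figure for MVE is the one given elsewhere in this thread.

```python
def register_file_bits(num_regs, vlen_bits):
    """Total architectural vector state in bits."""
    return num_regs * vlen_bits


def grouped_shape(num_regs, vlen_bits, lmul):
    """(number of register groups, bits per group) at an integer LMUL."""
    return num_regs // lmul, vlen_bits * lmul


rvv_min_bits = register_file_bits(32, 32)     # 1024 bits with VLEN=32
mve_bits = register_file_bits(8, 128)         # 1024 bits for 8x128-bit registers
rvv_app_bits = register_file_bits(32, 128)    # 4096 bits with VLEN=128
shape_at_lmul4 = grouped_shape(32, 32, 4)     # (8, 128): 8 groups of 128 bits
```

So a VLEN=32 implementation carries a quarter of the state of the VLEN=128 minimum for app processors.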

 

If floating point is desired then Zfinx is available, sharing int & fp scalar registers instead of fp and vector registers.

 

Of course profiles (or just custom chips for custom applications) can define subsets of instructions.

 

On Wed, Jun 2, 2021 at 10:05 PM Tariq Kurd via lists.riscv.org <tariq.kurd=huawei.com@...> wrote:

Hi everyone,

 

Are there any plans for a cut-down configuration of the vector extension suitable for embedded cores? It seems that the 32x128-bit register file is suitable for application class cores but it is very large for embedded cores, especially if the F registers also need to be implemented (which I think is the case, unless a Zfinx version is specified).

 

ARM MVE only has 8x128-bit registers for FP and Vector, so it is much more suitable for embedded applications.

https://en.wikichip.org/wiki/arm/helium

 

What’s the approach here? Should embedded applications implement the P-extension instead?

 

Tariq

 

Tariq Kurd

Processor Design, RISC-V Cores, Bristol

E-mail: Tariq.Kurd@...

Company: Huawei technologies R&D (UK) Ltd

Address: 290 Park Avenue, Aztec West, Almondsbury, Bristol, Avon, BS32 4TR, UK

 


 


Re: Smaller embedded version of the Vector extension

Krste Asanovic
 

On Wed, 2 Jun 2021 11:19:36 -0700, Mark Himelstein <markhimelstein@riscv.org> said:
| could an extension just change state like the number of vector registers?
|

I don't understand this question - please elaborate.

Krste


Re: Smaller embedded version of the Vector extension

Tony Cole
 

Thanks, I must have missed this bit:

"4.5. Mapping with LMUL > 1 and ELEN > VLEN
If vector registers are grouped to support larger SEW, with ELEN > VLEN, the vector registers in the group are concatenated
to form a single array of bytes, with the lowest-numbered register in the group holding the lowest-addressed bytes from the
memory layout."
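A small Python model may help picture the quoted rule. It assumes little-endian memory (the usual RISC-V convention) and uses a hypothetical helper, `split_element`, to show how one 64-bit element would land in a two-register group when VLEN=32.

```python
def split_element(value, sew_bits, vlen_bits, base_reg):
    """Map one SEW-bit element onto a register group when SEW > VLEN."""
    # Memory layout of the element, lowest-addressed byte first (little-endian assumed).
    data = value.to_bytes(sew_bits // 8, "little")
    reg_bytes = vlen_bits // 8
    # The lowest-numbered register takes the lowest-addressed bytes.
    return {
        base_reg + i: int.from_bytes(data[i * reg_bytes:(i + 1) * reg_bytes], "little")
        for i in range(len(data) // reg_bytes)
    }


layout = split_element(0x1122334455667788, sew_bits=64, vlen_bits=32, base_reg=2)
# Under these assumptions v2 holds the low half 0x55667788 and v3 the high half 0x11223344.
```

Under that reading, the least-significant bits sit in the lower-numbered (even) register of the pair rather than being implementation-defined.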

-----Original Message-----
From: krste@berkeley.edu [mailto:krste@berkeley.edu]
Sent: 02 June 2021 19:17
To: Tony Cole <tony.cole@huawei.com>
Cc: Bruce Hoult <bruce@hoult.org>; Tariq Kurd <tariq.kurd@huawei.com>; tech-vector-ext@lists.riscv.org; Shaofei (B) <shaofei1@hisilicon.com>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension


We do allow supported SEW to vary with LMUL, so an implementation can support single-width operations on SEW=64. See section 4.5.

Krste



Re: Smaller embedded version of the Vector extension

Krste Asanovic
 

Section 4.5,

Krste

On Wed, 2 Jun 2021 08:41:52 -0700, "Guy Lemieux" <guy.lemieux@gmail.com> said:
| On Wed, Jun 2, 2021 at 8:38 AM Andrew Waterman <andrew@sifive.com> wrote:
| It’s actually not fundamental to the ISA design that VLEN >= ELEN. An
| implementation with VLEN=32 could support SEW=64 whenever LMUL >= 2. 

| I think the concern here is lack of a clearly defined data layout pattern for
| such cases.

| eg, should the LSBs be in the odd or even register half, or should it be
| implementation-defined?

| Guy
|


Re: Smaller embedded version of the Vector extension

mark
 

could an extension just change state like the number of vector registers?

On Wed, Jun 2, 2021 at 11:13 AM Krste Asanovic <krste@...> wrote:

The VLEN>=128 constraint is only for the application processor "V"
extension for the app profile - not for embedded vectors which can
have VLEN=32.

From spec Introduction:
'
The term base vector extension is used informally to describe the standard set of vector ISA components that will be required for the single-letter "V" extension, which is intended for use in standard server and application-processor platform profiles. The set of mandatory instructions and supported element widths will vary with the base ISA (RV32I, RV64I) as described below.

Other profiles, including embedded profiles, may choose to mandate only subsets of these extensions. The exact set of mandatory supported instructions for an implementation to be compliant with a given profile will only be determined when each profile spec is ratified. For convenience in defining subset profiles, vector instruction subsets are given ISA string names beginning with the "Zv" prefix.
'

There is a set of Zve* names for the embedded subsets (see GitHub issue
#550).

A minimal embedded implementation using RV32E+Zfinx+vectors would have
the same state size as ARM MVE.

The P extension does not have floating-point, but it makes sense as an
alternative for short integer/fixed-point SIMD.

The software fragmentation issue is that some library routines that
expose VLEN might not be portable between app cores and embedded
cores, but these are different software ecosystems (e.g. ABI/calling
convention might be different) and only a few kinds of routine rely on
VLEN.

For app cores that can afford VLEN>=128, the advantage is the removal
of stripmining code in cases that operate on fixed-size vectors.

Krste



>>>>> On Wed, 2 Jun 2021 05:10:32 -0700, "Guy Lemieux" <guy.lemieux@...> said:

| Allowing VLEN<128 would allow for smaller vector register files, but it would
| also result in a profile that is not forward-compatible with the V spec. This
| would produce another fracture in the software ecosystem.

| To avoid such a fracture, there are two choices:
| (1) go with P instead
| (2) relax the V spec to allow smaller implementations

| So the key question for this group is whether to relax the minimum VLEN to 32
| or 64?

| note: a possible justification for keeping 128 might be to recommend (1)
| instead. I don’t know anything about P, but it seems like it could be speced
| in a way that is competitive/comparable with Helium.

| Guy

| PS — I have started to design an “RVV-lite” profile which would be more
| amenable to embedded implementations. However, I have adopted a stance that it
| must remain forward compatible with the full V spec, so I have not considered
| VLEN below 128. I am happy to share my work on this and involve other
| contributors — email me if you would like to see a copy.

| On Wed, Jun 2, 2021 at 3:15 AM Andrew Waterman <andrew@...> wrote:

|     The uppercase-V V extension is meant to cater to apps processors, where
|     the VLEN >= 128 constraint is not inappropriate and is sometimes
|     beneficial.  But there's nothing fundamental about the ISA design that
|     prohibits VLEN < 128.  A minimal configuration is VLEN=ELEN=32, giving the
|     same total amount of state as MVE.  (And if you set LMUL=4, then you even
|     get the same shape: 8 registers of 128 bits apiece.)

|     Such a thing wouldn't be called V, but perhaps something like Zvmin. 
|     Other than agreeing on a feature set and assigning it a name, the
|     architecting is already done.

|     (If you search the spec for Zfinx, you'll see that a Zfinx variant is
|     planned, but only barely sketched out.)

|     On Wed, Jun 2, 2021 at 3:04 AM Tariq Kurd via lists.riscv.org <tariq.kurd=
|     huawei.com@...> wrote:

|         Hi everyone,

|          

|         Are there any plans for a cut-down configuration of the vector
|         extension suitable for embedded cores? It seems that the 32x128-bit
|         register file is suitable for application class cores but it is very
|         large for embedded cores, especially if the F registers also need to
|         be implemented (which I think is the case, unless a Zfinx version is
|         specified).

|          

|         ARM MVE only has 8x128-bit registers for FP and Vector, so it is much
|         more suitable for embedded applications.

|         https://en.wikichip.org/wiki/arm/helium

|          

|         What’s the approach here? Should embedded applications implement the
|         P-extension instead?

|          

|         Tariq

|          







Re: Smaller embedded version of the Vector extension

Krste Asanovic
 

We do allow supported SEW to vary with LMUL, so an implementation can
support single-width operations on SEW=64. See Section 4.5.

Krste

On Wed, 2 Jun 2021 12:14:33 +0000, "Tony Cole via lists.riscv.org" <tony.cole=huawei.com@lists.riscv.org> said:
| So, (on a 32x 32-bit vector register machine) the widening and narrowing
| instructions can use 64-bit elements (for destination and source
| respectively), but not any of the other instructions, correct?

| Note: I use many instructions while processing 64-bit “wide” and “quad”
| elements, e.g. vrgather_vx_i64m8, vslide1down_vx_i64m4, vslidedown_vx_i64m8,
| vredsum_vs_i64m8, etc.

| Therefore, this code would not work on a 32x 32-bit vector register machine.

| Tony

| From: tech-vector-ext@lists.riscv.org [mailto:tech-vector-ext@lists.riscv.org]
| On Behalf Of Bruce Hoult
| Sent: 02 June 2021 12:18
| To: Tony Cole <tony.cole@huawei.com>
| Cc: Tariq Kurd <tariq.kurd@huawei.com>; tech-vector-ext@lists.riscv.org;
| Shaofei (B) <shaofei1@hisilicon.com>
| Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector
| extension

| Note that the effective LMUL is limited to 8, the same as the actual LMUL, so
| if you've set e32m4 (32 bit elements with LMUL=4) then you can only widen to
| 64 bit results, not 128 bit.

| On Wed, Jun 2, 2021 at 11:15 PM Bruce Hoult <bruce@hoult.org> wrote:

| Yes. The Selected Element Width (SEW) would be limited to 32 bits, but the
| widening multiplies and accumulates produce the same number of wider
| results using multiple registers (higher effective LMUL).

| See section 5.2. Vector Operands

| Each vector operand has an effective element width (EEW) and an effective
| LMUL (EMUL) that is used to determine the size and location of all the
| elements within a vector register group. By default, for most operands of
| most instructions, EEW=SEW and EMUL=LMUL.

| Some vector instructions have source and destination vector operands with
| the same number of elements but different widths, so that EEW and EMUL
| differ from SEW and LMUL respectively but EEW/EMUL = SEW/LMUL. For
| example, most widening arithmetic instructions have a source group with
| EEW=SEW and EMUL=LMUL but destination group with EEW=2*SEW and EMUL=
| 2*LMUL. Narrowing instructions have a source operand that has EEW=2*SEW
| and EMUL=2*LMUL but destination where EEW=SEW and EMUL=LMUL.

| Vector operands or results may occupy one or more vector registers
| depending on EMUL, but are always specified using the lowest-numbered
| vector register in the group. Using other than the lowest-numbered vector
| register to specify a vector register group is a reserved encoding.

| On Wed, Jun 2, 2021 at 11:11 PM Tony Cole <tony.cole@huawei.com> wrote:

| Having 32x 32 bit registers with LMUL=4, giving 8x 128 bits - does
| this allow for 64-bit elements?

| I don't think it does, but it’s not clear in the spec.

| I use 64-bit elements for “wide” and “quad” accumulators.

| From: tech-vector-ext@lists.riscv.org [mailto:
| tech-vector-ext@lists.riscv.org] On Behalf Of Bruce Hoult
| Sent: 02 June 2021 11:19
| To: Tariq Kurd <tariq.kurd@huawei.com>
| Cc: tech-vector-ext@lists.riscv.org; Shaofei (B) <
| shaofei1@hisilicon.com>
| Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of
| the Vector extension

| There is nothing to prevent implementing 32x 32 bit registers on a 32
| bit CPU. The application processor spec has quite recently (a few
| months) specified a 128 bit minimum register size but I don't think
| there's any good reason for this, especially in embedded.

| With that configuration, LMUL=4 gives 8x 128 bits, the same as MVE.

| If floating point is desired then Zfinx is available, sharing int & fp
| scalar registers instead of fp and vector registers.

| Of course profiles (or just custom chips for custom applications) can
| define subsets of instructions.

| On Wed, Jun 2, 2021 at 10:05 PM Tariq Kurd via lists.riscv.org
| <tariq.kurd=huawei.com@lists.riscv.org> wrote:

| Hi everyone,

| Are there any plans for a cut-down configuration of the vector
| extension suitable for embedded cores? It seems that the
| 32x128-bit register file is suitable for application class cores
| but it is very large for embedded cores, especially if the F
| registers also need to be implemented (which I think is the case,
| unless a Zfinx version is specified).

| ARM MVE only has 8x128-bit registers for FP and Vector, so it is much
| more suitable for embedded applications.

| https://en.wikichip.org/wiki/arm/helium

| What’s the approach here? Should embedded applications implement
| the P-extension instead?

| Tariq



Re: Smaller embedded version of the Vector extension

Krste Asanovic
 

The VLEN>=128 constraint is only for the application processor "V"
extension for the app profile - not for embedded vectors which can
have VLEN=32.

From spec Introduction:
'
The term base vector extension is used informally to describe the standard set of vector ISA components that will be required for the single-letter "V" extension, which is intended for use in standard server and application-processor platform profiles. The set of mandatory instructions and supported element widths will vary with the base ISA (RV32I, RV64I) as described below.

Other profiles, including embedded profiles, may choose to mandate only subsets of these extensions. The exact set of mandatory supported instructions for an implementation to be compliant with a given profile will only be determined when each profile spec is ratified. For convenience in defining subset profiles, vector instruction subsets are given ISA string names beginning with the "Zv" prefix.
'

There is a set of Zve* names for the embedded subsets (see GitHub issue
#550).

A minimal embedded implementation using RV32E+Zfinx+vectors would have
the same state size as ARM MVE.
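
As a rough check of the state-size claim, here is a minimal arithmetic sketch. It is not from the thread: the helper name vector_state_bits is made up for illustration, and the only inputs are the register counts and widths quoted in the discussion (8 x 128-bit for MVE, 32 registers of VLEN bits for RVV).

```python
# Illustrative arithmetic only; vector_state_bits is a hypothetical helper.

def vector_state_bits(num_regs: int, vlen_bits: int) -> int:
    """Total architectural vector register state, in bits."""
    return num_regs * vlen_bits

mve_bits = vector_state_bits(8, 128)       # ARM MVE: 8 x 128-bit registers
rvv_min_bits = vector_state_bits(32, 32)   # RVV with VLEN=32: 32 x 32-bit
rvv_app_bits = vector_state_bits(32, 128)  # "V" app-profile minimum: 32 x 128-bit

print(mve_bits, rvv_min_bits, rvv_app_bits)
```

Under these assumptions the VLEN=32 configuration and MVE hold the same number of bits, while the VLEN=128 minimum for app processors is four times larger. Scalar and FP state are deliberately left out of this count.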

The P extension does not have floating-point, but it makes sense as an
alternative for short integer/fixed-point SIMD.

The software fragmentation issue is that some library routines that
expose VLEN might not be portable between app cores and embedded
cores, but these are different software ecosystems (e.g. ABI/calling
convention might be different) and only a few kinds of routine rely on
VLEN.

For app cores that can afford VLEN>=128, the advantage is the removal
of stripmining code in cases that operate on fixed-size vectors.
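
To illustrate the stripmining point, the sketch below counts how many loop passes a fixed-size vector needs at different VLEN values. It is a simplified model, not spec text: it assumes vl = min(AVL, VLMAX) on every pass and ignores the extra freedom the real vsetvli rules give when AVL lies between VLMAX and 2*VLMAX. The function names are hypothetical.

```python
# Simplified model of stripmining; names are illustrative, not from the spec.

def vlmax(vlen_bits: int, sew_bits: int, lmul: int = 1) -> int:
    """Maximum elements per vector register group."""
    return (vlen_bits * lmul) // sew_bits

def stripmine_passes(n_elems: int, vlen_bits: int, sew_bits: int, lmul: int = 1) -> int:
    """Count loop iterations needed to process n_elems elements."""
    step = vlmax(vlen_bits, sew_bits, lmul)
    if step == 0:
        raise ValueError("SEW does not fit in the register group")
    passes = 0
    remaining = n_elems
    while remaining > 0:
        vl = min(remaining, step)  # assumed vsetvli behaviour in this model
        remaining -= vl
        passes += 1
    return passes

# A fixed-size vector of four 32-bit elements (128 bits), LMUL=1.
for vlen in (32, 64, 128):
    print(vlen, stripmine_passes(4, vlen, 32))
```

In this model, code that can assume VLEN>=128 always finishes a 4 x 32-bit operation in one pass at LMUL=1, so the loop can be dropped. With a smaller VLEN the loop is needed unless a larger LMUL is used, which costs register names.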

Krste



On Wed, 2 Jun 2021 05:10:32 -0700, "Guy Lemieux" <guy.lemieux@gmail.com> said:
| Allowing VLEN<128 would allow for smaller vector register files, but it would
| also result in a profile that is not forward-compatible with the V spec. This
| would produce another fracture in the software ecosystem.

| To avoid such a fracture, there are two choices:
| (1) go with P instead
| (2) relax the V spec to allow smaller implementations

| So the key question for this group is whether to relax the minimum VLEN to 32
| or 64?

| note: a possible justification for keeping 128 might be to recommend (1)
| instead. I don’t know anything about P, but it seems like it could be speced
| in a way that is competitive/comparable with Helium.

| Guy

| PS — I have started to design an “RVV-lite” profile which would be more
| amenable to embedded implementations. However, I have adopted a stance that it
| must remain forward compatible with the full V spec, so I have not considered
| VLEN below 128. I am happy to share my work on this and involve other
| contributors — email me if you would like to see a copy.

| On Wed, Jun 2, 2021 at 3:15 AM Andrew Waterman <andrew@sifive.com> wrote:

| The uppercase-V V extension is meant to cater to apps processors, where
| the VLEN >= 128 constraint is not inappropriate and is sometimes
| beneficial.  But there's nothing fundamental about the ISA design that
| prohibits VLEN < 128.  A minimal configuration is VLEN=ELEN=32, giving the
| same total amount of state as MVE.  (And if you set LMUL=4, then you even
| get the same shape: 8 registers of 128 bits apiece.)

| Such a thing wouldn't be called V, but perhaps something like Zvmin. 
| Other than agreeing on a feature set and assigning it a name, the
| architecting is already done.

| (If you search the spec for Zfinx, you'll see that a Zfinx variant is
| planned, but only barely sketched out.)

| On Wed, Jun 2, 2021 at 3:04 AM Tariq Kurd via lists.riscv.org <tariq.kurd=
| huawei.com@lists.riscv.org> wrote:

| Hi everyone,

|  

| Are there any plans for a cut-down configuration of the vector
| extension suitable for embedded cores? It seems that the 32x128-bit
| register file is suitable for application class cores but it is very
| large for embedded cores, especially if the F registers also need to
| be implemented (which I think is the case, unless a Zfinx version is
| specified).

|  

| ARM MVE only has 8x128-bit registers for FP and Vector, so it is much
| more suitable for embedded applications.

| https://en.wikichip.org/wiki/arm/helium

|  

| What’s the approach here? Should embedded applications implement the
| P-extension instead?

|  

| Tariq

|  



Re: Smaller embedded version of the Vector extension

Thang Tran
 

It seems that restricting the minimum LMUL to 2 would halve the number of vector registers, and LMUL=4 would give 8 vector registers.

Thang

 

From: tech-vector-ext@... <tech-vector-ext@...> On Behalf Of Tariq Kurd via lists.riscv.org
Sent: Wednesday, June 2, 2021 8:21 AM
To: Bruce Hoult <bruce@...>; Tony Cole <tony.cole@...>
Cc: tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

 

OK, so it seems that to run our software (which Tony Cole referred to) we need VLEN>=64 for our embedded application.

Is there any scope for reducing the number of V registers? Could RV32E_Vmin have 16 X and V registers?

I know it doesn’t affect the number of F registers, which is tackled by having Zfinx instead to save area – but it seems that we need another solution for the vectors.

 

Then we can match ARM MVE for area – 8x128-bit compared to 16x64-bit

 

Tariq

 

From: tech-vector-ext@... <tech-vector-ext@...> On Behalf Of Bruce Hoult
Sent: 02 June 2021 13:34
To: Tony Cole <tony.cole@...>
Cc: Tariq Kurd <tariq.kurd@...>; tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

 

I am not a fan of the vslide instructions. It seems they expose the size of the vector registers in a very unfortunate way. In particular they break down if VLEN=1. Most code would be better off storing and loading with an offset.

 

I think I saw somewhere they are largely intended for debuggers.

 

On Thu, Jun 3, 2021 at 12:15 AM Tony Cole <tony.cole@...> wrote:

So, (on a 32x 32-bit vector register machine) the widening and narrowing instructions can use 64-bit elements (for destination and source respectively), but not any of the other instructions, correct?

 

Note: I use many instructions while processing 64-bit “wide” and “quad” elements, e.g. vrgather_vx_i64m8, vslide1down_vx_i64m4, vslidedown_vx_i64m8, vredsum_vs_i64m8, etc.

 

Therefore, this code would not work on a 32x 32-bit vector register machine.

 

 

Tony

 

 

From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Bruce Hoult
Sent: 02 June 2021 12:18
To: Tony Cole <tony.cole@...>
Cc: Tariq Kurd <tariq.kurd@...>; tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

 

Note that the effective LMUL is limited to 8, the same as the actual LMUL, so if you've set e32m4 (32 bit elements with LMUL=4) then you can only widen to 64 bit results, not 128 bit. 

 

On Wed, Jun 2, 2021 at 11:15 PM Bruce Hoult <bruce@...> wrote:

Yes. The Selected Element Width (SEW) would be limited to 32 bits, but the widening multiplies and accumulates produce the same number of wider results using multiple registers (higher effective LMUL).

 

See section 5.2. Vector Operands

 

Each vector operand has an effective element width (EEW) and an effective LMUL (EMUL) that is used to determine the size and location of all the elements within a vector register group. By default, for most operands of most instructions, EEW=SEW and EMUL=LMUL.


Some vector instructions have source and destination vector operands with the same number of elements but different widths, so that EEW and EMUL differ from SEW and LMUL respectively but EEW/EMUL = SEW/LMUL. For example, most widening arithmetic instructions have a source group with EEW=SEW and EMUL=LMUL but destination group with EEW=2*SEW and EMUL=2*LMUL. Narrowing instructions have a source operand that has EEW=2*SEW and EMUL=2*LMUL but destination where EEW=SEW and EMUL=LMUL.

Vector operands or results may occupy one or more vector registers depending on EMUL, but are always specified using the lowest-numbered vector register in the group. Using other than the lowest-numbered vector register to specify a vector register group is a reserved encoding.
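
The EEW/EMUL rule quoted here, combined with the limit of 8 on effective LMUL mentioned elsewhere in this thread, can be sketched as follows. This is a hedged illustration rather than spec text: widening_dest and MAX_EMUL are names invented for the example, and the model checks only the EMUL limit, not which element widths a given implementation supports.

```python
from fractions import Fraction

MAX_EMUL = 8  # assumption taken from the thread: effective LMUL is limited to 8

def widening_dest(sew: int, lmul):
    """Return (EEW, EMUL) for a widening destination, or None if EMUL > 8."""
    eew = 2 * sew
    emul = 2 * Fraction(lmul)
    if emul > MAX_EMUL:
        return None
    # The ratio is preserved: EEW/EMUL == SEW/LMUL.
    assert Fraction(eew) / emul == Fraction(sew) / Fraction(lmul)
    return eew, emul

print(widening_dest(32, 4))               # e32m4 widens to 64-bit results in 8 registers
print(widening_dest(64, 8))               # a further widening would need EMUL=16
print(widening_dest(8, Fraction(1, 2)))   # fractional LMUL works the same way
```

This matches the e32m4 example in the thread: one widening step to 64-bit results fits, a second step to 128-bit does not.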

 

 

 

On Wed, Jun 2, 2021 at 11:11 PM Tony Cole <tony.cole@...> wrote:

Having 32x 32 bit registers with LMUL=4, giving 8x 128 bits - does this allow for 64-bit elements?

I don't think it does, but it’s not clear in the spec.

 

I use 64-bit elements for “wide” and “quad” accumulators.

 

 

From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Bruce Hoult
Sent: 02 June 2021 11:19
To: Tariq Kurd <tariq.kurd@...>
Cc: tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

 

There is nothing to prevent implementing 32x 32 bit registers on a 32 bit CPU. The application processor spec has quite recently (a few months) specified a 128 bit minimum register size but I don't think there's any good reason for this, especially in embedded.

 

With that configuration, LMUL=4 gives 8x 128 bits, the same as MVE.

 

If floating point is desired then Zfinx is available, sharing int & fp scalar registers instead of fp and vector registers.

 

Of course profiles (or just custom chips for custom applications) can define subsets of instructions.

 

On Wed, Jun 2, 2021 at 10:05 PM Tariq Kurd via lists.riscv.org <tariq.kurd=huawei.com@...> wrote:

Hi everyone,

 

Are there any plans for a cut-down configuration of the vector extension suitable for embedded cores? It seems that the 32x128-bit register file is suitable for application class cores but it is very large for embedded cores, especially if the F registers also need to be implemented (which I think is the case, unless a Zfinx version is specified).

 

ARM MVE only has 8x128-bit registers for FP and Vector, so it is much more suitable for embedded applications.

https://en.wikichip.org/wiki/arm/helium

 

What’s the approach here? Should embedded applications implement the P-extension instead?

 

Tariq

 


 


Re: Smaller embedded version of the Vector extension

Tony Cole
 

Hi Bruce,

 

“I am not a fan of the vslide instructions. It seems they expose the size of the vector registers in a very unfortunate way. In particular they break down if VLEN=1. Most code would be better off storing and loading with an offset.”

 

I don't see what you mean. Please can you elaborate, with examples, on why/how it exposes the size of the vector register in a very unfortunate way and breaks down if VLEN=1 (do you mean LMUL=1?).

 

The vslide instructions speed up my code a lot, as they reduce reloading (mostly the same) data over and over again.

 

 

Tony

 

 

From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Bruce Hoult
Sent: 02 June 2021 13:34
To: Tony Cole <tony.cole@...>
Cc: Tariq Kurd <tariq.kurd@...>; tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

 

I am not a fan of the vslide instructions. It seems they expose the size of the vector registers in a very unfortunate way. In particular they break down if VLEN=1. Most code would be better off storing and loading with an offset.

 

I think I saw somewhere they are largely intended for debuggers.

 

On Thu, Jun 3, 2021 at 12:15 AM Tony Cole <tony.cole@...> wrote:

So, (on a 32x 32-bit vector register machine) the widening and narrowing instructions can use 64-bit elements (for destination and source respectively), but not any of the other instructions, correct?

 

Note: I use many instructions while processing 64-bit “wide” and “quad” elements, e.g. vrgather_vx_i64m8, vslide1down_vx_i64m4, vslidedown_vx_i64m8, vredsum_vs_i64m8, etc.

 

Therefore, this code would not work on a 32x 32-bit vector register machine.

 

 

Tony

 

 

From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Bruce Hoult
Sent: 02 June 2021 12:18
To: Tony Cole <tony.cole@...>
Cc: Tariq Kurd <tariq.kurd@...>; tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

 

Note that the effective LMUL is limited to 8, the same as the actual LMUL, so if you've set e32m4 (32 bit elements with LMUL=4) then you can only widen to 64 bit results, not 128 bit. 

 

On Wed, Jun 2, 2021 at 11:15 PM Bruce Hoult <bruce@...> wrote:

Yes. The Selected Element Width (SEW) would be limited to 32 bits, but the widening multiplies and accumulates produce the same number of wider results using multiple registers (higher effective LMUL).

 

See section 5.2. Vector Operands

 

Each vector operand has an effective element width (EEW) and an effective LMUL (EMUL) that is used to determine the size and location of all the elements within a vector register group. By default, for most operands of most instructions, EEW=SEW and EMUL=LMUL.


Some vector instructions have source and destination vector operands with the same number of elements but different widths, so that EEW and EMUL differ from SEW and LMUL respectively but EEW/EMUL = SEW/LMUL. For example, most widening arithmetic instructions have a source group with EEW=SEW and EMUL=LMUL but destination group with EEW=2*SEW and EMUL=2*LMUL. Narrowing instructions have a source operand that has EEW=2*SEW and EMUL=2*LMUL but destination where EEW=SEW and EMUL=LMUL.

Vector operands or results may occupy one or more vector registers depending on EMUL, but are always specified using the lowest-numbered vector register in the group. Using other than the lowest-numbered vector register to specify a vector register group is a reserved encoding.

 

 

 

On Wed, Jun 2, 2021 at 11:11 PM Tony Cole <tony.cole@...> wrote:

Having 32x 32 bit registers with LMUL=4, giving 8x 128 bits - does this allow for 64-bit elements?

I don't think it does, but it’s not clear in the spec.

 

I use 64-bit elements for “wide” and “quad” accumulators.

 

 

From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Bruce Hoult
Sent: 02 June 2021 11:19
To: Tariq Kurd <tariq.kurd@...>
Cc: tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

 

There is nothing to prevent implementing 32x 32 bit registers on a 32 bit CPU. The application processor spec has quite recently (a few months) specified a 128 bit minimum register size but I don't think there's any good reason for this, especially in embedded.

 

With that configuration, LMUL=4 gives 8x 128 bits, the same as MVE.

 

If floating point is desired then Zfinx is available, sharing int & fp scalar registers instead of fp and vector registers.

 

Of course profiles (or just custom chips for custom applications) can define subsets of instructions.

 

On Wed, Jun 2, 2021 at 10:05 PM Tariq Kurd via lists.riscv.org <tariq.kurd=huawei.com@...> wrote:

Hi everyone,

 

Are there any plans for a cut-down configuration of the vector extension suitable for embedded cores? It seems that the 32x128-bit register file is suitable for application class cores but it is very large for embedded cores, especially if the F registers also need to be implemented (which I think is the case, unless a Zfinx version is specified).

 

ARM MVE only has 8x128-bit registers for FP and Vector, so it is much more suitable for embedded applications.

https://en.wikichip.org/wiki/arm/helium

 

What’s the approach here? Should embedded applications implement the P-extension instead?

 

Tariq

 


 


Re: Smaller embedded version of the Vector extension

Guy Lemieux
 



On Wed, Jun 2, 2021 at 8:38 AM Andrew Waterman <andrew@...> wrote:
It’s actually not fundamental to the ISA design that VLEN >= ELEN. An implementation with VLEN=32 could support SEW=64 whenever LMUL >= 2. 

I think the concern here is lack of a clearly defined data layout pattern for such cases.

eg, should the LSBs be in the odd or even register half, or should it be implementation-defined?

Guy


Re: Smaller embedded version of the Vector extension

Guy Lemieux
 

For widening and narrowing instructions to work, the V spec depends upon changing SEW (to EEW) and LMUL (to EMUL),  such that EEW/EMUL ==  SEW/LMUL. That is, to change the element size (widen or narrow) to EEW, one must also change the EMUL setting accordingly.

In my RVV-lite proposal, I recommend a simplification where the only settings permitted are SEW/LMUL = 8/1, 16/2, 32/4, and 64/8, thereby creating 32 named registers of bytes, 16 of halfwords, 8 of words, and 4 of dwords. This allows the widening and narrowing to work, and it ensures that VLMAX is the same for all element sizes. The primary negative side effect is that fewer named registers are available for the larger sizes, but this seems an acceptable simplification of both hardware and software.
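
The four permitted settings can be tabulated with a short sketch. This is only an illustration of the arithmetic in the paragraph above, assuming the usual relation VLMAX = LMUL * VLEN / SEW and 32 architectural vector registers; rvv_lite_table is a hypothetical name, and VLEN=128 is an arbitrary example value.

```python
NUM_VREGS = 32  # architectural vector registers in RVV

def rvv_lite_table(vlen_bits: int):
    """Rows of (SEW, LMUL, named register groups, VLMAX) with SEW/LMUL fixed at 8."""
    rows = []
    for sew, lmul in [(8, 1), (16, 2), (32, 4), (64, 8)]:
        groups = NUM_VREGS // lmul           # register names usable at this LMUL
        vlmax = (vlen_bits * lmul) // sew    # elements per register group
        rows.append((sew, lmul, groups, vlmax))
    return rows

for row in rvv_lite_table(vlen_bits=128):
    print(row)
```

Because SEW/LMUL is always 8, VLMAX reduces to VLEN/8 for every element size, while the number of usable register names drops from 32 to 4 as the elements get wider.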

In other words, if you want to further reduce the number of named registers below the 32 specified by V, then you will have to consider the impact on the narrowing/widening instructions. For example, you could fix SEW/LMUL at 16, eg SEW/LMUL = 8/0.5 which under-utilizes vector data storage by 50% if you are operating on bytes. Or, you could remove widening/narrowing instructions entirely. Or, you could introduce new widening/narrowing instructions that do not use EEW and/or EMUL (eg, they fix EMUL==LMUL, and deal with the shortening of VLMAX somehow).

Guy


On Wed, Jun 2, 2021 at 8:21 AM Tariq Kurd via lists.riscv.org <tariq.kurd=huawei.com@...> wrote:

OK, so it seems that to run our software (which Tony Cole referred to) we need VLEN>=64 for our embedded application.

Is there any scope for reducing the number of V registers? Could RV32E_Vmin have 16 X and V registers?

I know it doesn’t affect the number of F registers, which is tackled by having Zfinx instead to save area – but it seems that we need another solution for the vectors.

 

Then we can match ARM MVE for area – 8x128-bit compared to 16x64-bit
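
For reference, a quick sketch of the raw storage involved in the configurations mentioned in this thread. It counts register bits only; actual area also depends on ports and datapath, which this does not model. The function name `regfile_bits` is illustrative.

```python
def regfile_bits(num_regs, vlen):
    """Total vector register storage in bits."""
    return num_regs * vlen

configs = {
    "32 x 128-bit (app-processor minimum)": (32, 128),
    "8 x 128-bit (ARM MVE)": (8, 128),
    "16 x 64-bit": (16, 64),
    "32 x 32-bit": (32, 32),
}

for name, (num_regs, vlen) in configs.items():
    print(f"{name}: {regfile_bits(num_regs, vlen)} bits")
```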

 

Tariq

 

From: tech-vector-ext@... <tech-vector-ext@...> On Behalf Of Bruce Hoult
Sent: 02 June 2021 13:34
To: Tony Cole <tony.cole@...>
Cc: Tariq Kurd <tariq.kurd@...>; tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

 

I am not a fan of the vslide instructions. It seems they expose the size of the vector registers in a very unfortunate way. In particular they break down if VLEN=1. Most code would be better off storing and loading with an offset.

 

I think I saw somewhere they are largely intended for debuggers.

 

On Thu, Jun 3, 2021 at 12:15 AM Tony Cole <tony.cole@...> wrote:

So, (on a 32x 32-bit vector register machine) the widening and narrowing instructions can use 64-bit elements (for destination and source respectively), but none of the other instructions can, correct?

 

Note: I use many instructions while processing 64-bit “wide” and “quad” elements, e.g. vrgather_vx_i64m8, vslide1down_vx_i64m4, vslidedown_vx_i64m8, vredsum_vs_i64m8, etc.

 

Therefore, this code would not work on a 32x 32-bit vector register machine.

 

 

Tony

 

 

From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Bruce Hoult
Sent: 02 June 2021 12:18
To: Tony Cole <tony.cole@...>
Cc: Tariq Kurd <tariq.kurd@...>; tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

 

Note that the effective LMUL is limited to 8, the same as the actual LMUL, so if you've set e32m4 (32 bit elements with LMUL=4) then you can only widen to 64 bit results, not 128 bit. 

 

On Wed, Jun 2, 2021 at 11:15 PM Bruce Hoult <bruce@...> wrote:

Yes. The Standard Element Width (SEW) would be limited to 32 bits, but the widening multiplies and accumulates produce the same number of wider results using multiple registers (higher effective LMUL)

 

See section 5.2. Vector Operands

 

Each vector operand has an effective element width (EEW) and an effective LMUL (EMUL) that is used to determine the size and location of all the elements within a vector register group. By default, for most operands of most instructions, EEW=SEW and EMUL=LMUL.


Some vector instructions have source and destination vector operands with the same number of elements but different widths, so that EEW and EMUL differ from SEW and LMUL respectively but EEW/EMUL = SEW/LMUL. For example, most widening arithmetic instructions have a source group with EEW=SEW and EMUL=LMUL but destination group with EEW=2*SEW and EMUL=2*LMUL. Narrowing instructions have a source operand that has EEW=2*SEW and EMUL=2*LMUL but destination where EEW=SEW and EMUL=LMUL.

Vector operands or results may occupy one or more vector registers depending on EMUL, but are always specified using the lowest-numbered vector register in the group. Using other than the lowest-numbered vector register to specify a vector register group is a reserved encoding.
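
The widening rule quoted above, together with the EMUL limit of 8 noted earlier in this thread, can be sketched as follows. This is only an illustration of the arithmetic; the function name `widen` is made up, and ELEN limits are not modelled.

```python
from fractions import Fraction

MAX_EMUL = 8  # EMUL is capped at 8, as noted earlier in this thread

def widen(sew, lmul):
    """Return (EEW, EMUL) of a widening destination: EEW=2*SEW, EMUL=2*LMUL."""
    eew, emul = 2 * sew, 2 * Fraction(lmul)
    if emul > MAX_EMUL:
        raise ValueError(f"EMUL={emul} exceeds {MAX_EMUL}")
    # The element-width to group-size ratio is preserved by construction.
    assert Fraction(eew) / emul == Fraction(sew) / Fraction(lmul)
    return eew, emul

print(widen(32, 4))      # one widening step from e32m4
try:
    widen(64, 8)         # a further step would need EMUL=16
except ValueError as err:
    print("not allowed:", err)
```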

 

 

 

On Wed, Jun 2, 2021 at 11:11 PM Tony Cole <tony.cole@...> wrote:

Having 32x 32 bit registers with LMUL=4, giving 8x 128 bits - does this allow for 64-bit elements?

I don't think it does, but it’s not clear in the spec.

 

I use 64-bit elements for “wide” and “quad” accumulators.

 

 

From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Bruce Hoult
Sent: 02 June 2021 11:19
To: Tariq Kurd <tariq.kurd@...>
Cc: tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

 

There is nothing to prevent implementing 32x 32 bit registers on a 32 bit CPU. The application processor spec has quite recently (a few months) specified a 128 bit minimum register size but I don't think there's any good reason for this, especially in embedded.

 

With that configuration, LMUL=4 gives 8x 128 bits, the same as MVE.

 

If floating point is desired then Zfinx is available, sharing int & fp scalar registers instead of fp and vector registers.

 

Of course profiles (or just custom chips for custom applications) can define subsets of instructions.

 

On Wed, Jun 2, 2021 at 10:05 PM Tariq Kurd via lists.riscv.org <tariq.kurd=huawei.com@...> wrote:

Hi everyone,

 

Are there any plans for a cut-down configuration of the vector extension suitable for embedded cores? It seems that the 32x128-bit register file is suitable for application class cores but it is very large for embedded cores, especially if the F registers also need to be implemented (which I think is the case, unless a Zfinx version is specified).

 

ARM MVE only has 8x128-bit registers for FP and Vector, so it is much more suitable for embedded applications.

https://en.wikichip.org/wiki/arm/helium

 

What’s the approach here? Should embedded applications implement the P-extension instead?

 

Tariq

 

Tariq Kurd

Processor Design | RISC-V Cores, Bristol

E-mail: Tariq.Kurd@...

Company: Huawei technologies R&D (UK) Ltd | Address: 290 Park Avenue, Aztec West, Almondsbury, Bristol, Avon, BS32 4TR, UK

 

http://www.huawei.com

This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure,reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it !


 
