
Re: Configuring qemu for Vector Extension

Bruce Hoult
 

Do you support RVV 0.7.1 as well as tracking 1.0?

RVV 0.7.1 is the only version available in real mass-produced hardware at the moment, and probably for the next 12 to 18 months at a guess. I myself own a board with RVV 0.7.1 support (Allwinner Nezha), and Sipeed are promising a cheap board with it soon (maybe Pine64 also, but we haven't heard any recent confirmation of the plans they announced at the start of the year).

0.7.1 is incompatible with 1.0 in a couple of important ways -- mostly subelement operations were replaced by fractional LMUL, loads and stores became pure bit transfers with any necessary sign or zero extension done register-to-register afterwards, and changes to policy options for tail and masked off elements -- but it's still a very nice, very practical length-agnostic Vector instruction set, up there with SVE 1/2 and RVV 1.0.

It's going to be some time before most people have access to either SVE or RVV 1.0.

On Tue, Sep 14, 2021 at 5:52 PM Wei Wu (吴伟) <lazyparser@...> wrote:
Hi Mick,

As Jim said, you may need the right toolchain and right qemu for the
version you want, which is not an easy task.

BTW, the PLCT Lab is working on setting up an all-in-one developer
environment for unratified extensions, including Vector.

Currently QEMU and GNU Toolchain are available. Feel free to try it out:

We cherry-pick and rebase the B/K/P/V patches and merge them into one branch.

QEMU:
https://github.com/plctlab/plct-qemu/tree/new-machine-dev

gcc:
https://github.com/pz9115/riscv-gcc/tree/riscv-gcc-experimenal-v

binutils:
https://github.com/pz9115/riscv-binutils-gdb/tree/riscv-binutils-experimental-v

It is still under development, and I hope that the PLCT Lab might be
able to provide several online QEMU VMs for public access and
experiment in 2 weeks.

On Tue, Sep 14, 2021 at 12:06 AM Jim Wilson <jimw@...> wrote:
>
> On Sun, Sep 12, 2021 at 5:21 PM Mick Thomas Lim <mickthomaslim@...> wrote:
>>
>> A "Hello World" program compiled with riscv64-unknown-linux-gnu-gcc does work.
>> But we aren't seeing expected behavior when running the simple rvv_vadd.c program described here:
>> https://lists.riscv.org/g/tech-vector-ext/message/364
>
>
> There are many thousands of different incompatible draft versions of the vector spec.  If you don't have exactly matching versions of the compiler and qemu and libraries, it isn't going to work.  Unfortunately, it will continue to be difficult to work with the vector spec until they stop changing it in incompatible ways.
>
> The current vector work incidentally is in clang not gcc.  The gcc support may not be compatible with anything else as it hasn't been properly updated.
>
>> This is the qemu run command we're using for buildroot:
>> qemu-system-riscv64 -cpu rv64,x-v=true,vlen=256,elen=64,vext_spec=v0.7.1    -M virt -nographic    -bios output/images/fw_jump.elf    -kernel output/images/Image    -append "root=/dev/vda ro"    -drive file=output/images/rootfs.ext2,format=raw,id=hd0    -device virtio-blk-device,drive=hd0    -netdev user,id=net0 -device virtio-net-device,netdev=net0
>
>
> The v0.7.1 draft has been obsolete for about 2 years now.  That won't be useful.  Unless maybe you have Alibaba compilers as this is what Alibaba implemented in their SoCs.  Otherwise, you are better off with a v0.9x or v1.0 qemu.  I would expect to find patches for that on the qemu mailing list.  But another person pointed to a branch in a SiFive github tree that may work for you.
>
> Jim
>
>



--
Best wishes,
Wei Wu (吴伟)






Specify byte index/offset for Strided/Indexed instructions. Minor document improvement RISC-V "V" Vector Extension Version 1.0-rc1-20210608

Tony Cole
 

It might be worth updating sections:

7.5 Vector Strided Instructions
7.6 Vector Indexed Instructions

with the address calculations, to specify that the stride offsets and indices are byte sized rather than element sized.

I know this is specified in section 7.2 (Vector Load/Store Addressing Modes), but it is useful to have it where the instructions are specified as well (as I wasted time finding it!)
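To illustrate the point about byte-sized strides and indices, here is a minimal sketch of the per-element address calculation. The function names are hypothetical and this is only a model of the addressing arithmetic, not text from the spec.

```python
def strided_addresses(base, stride_bytes, vl):
    """Byte address of element i for a strided access: base + i * stride.

    The stride is a byte count, so contiguous 32-bit elements need
    stride_bytes=4, not 1.
    """
    return [base + i * stride_bytes for i in range(vl)]


def indexed_addresses(base, byte_offsets):
    """Byte address of element i for an indexed access: base + offset[i].

    Each index is a byte offset, not an element number.
    """
    return [base + off for off in byte_offsets]


# Four contiguous 32-bit elements starting at 0x1000.
print([hex(a) for a in strided_addresses(0x1000, 4, 4)])
# Gathering 32-bit elements 0, 2, 1, 3 means byte offsets 0, 8, 4, 12.
print([hex(a) for a in indexed_addresses(0x1000, [0, 8, 4, 12])])
```

Code that passes element numbers where byte offsets are expected will read the wrong locations, which is why repeating the byte-sized wording next to the instruction definitions seems worthwhile.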


Clarification of Fractional LMUL requirements, and the storage/derivation of ELEN/SEWLMUL1MAX values

Krste Asanovic
 

Thanks for the suggestions.

I tried to clean up and clarify this section:

https://github.com/riscv/riscv-v-spec/commit/3cc98373f954df996c2d7973ef0fc38bc866f620

Krste

On Wed, 8 Sep 2021 15:25:39 -0700, "Gregory Kielian via lists.riscv.org" <gkielian=google.com@lists.riscv.org> said:
| Hi,
| Re-reading section 3.3.2 in the documentation (link), I would like to propose adding the goal, constraints, and steps for implementing fractional LMUL.

| I think adding these would really help clarify both the VFLMUL idea and implementation. I've been having extensive discussions around this,
| re-reading this section a bunch, and thinking it would probably be good to add additional lines to the vspec.adoc to clarify the idea.

| Sharing my tentative understanding below (and some questions on ELEN and SEWLMUL1MAX), derived mainly from looking at the spike lmul checks and
| 3.3.2, curious as well if this captures the main intent of the fractional-lmul or there are aspects which are missing or equations require
| adjustment:

| • Goal clarification:

| □ Fractional LMUL allows the result of widening operations to be definitively contained within a single vector register.

| □ The advantage this provides seems (at least) two-fold

| ☆ Any register is usable for widening with fractional LMULs (as opposed to integer LMUL, where only register numbers evenly divisible
| by the LMUL can be used, e.g. v0, v8, v16, v24 for LMUL = 8).

| ☆ Related to the above, fewer registers are locked down by the application of widening, reducing register availability bottlenecks and the
| needed number of stores/loads to and from memory.

| □ In order to ensure that the result of widening operations can be contained in a single register, there are certain constraints (see
| below)
| • Constraints:

| □ SEW <= ELEN*VFLMUL

| ☆ Example 1: ELEN = e32, VFLMUL = ⅛

| ○ SEW <= ELEN*VFLMUL = 4, so VFLMUL = ⅛ is illegal for ELEN = e32

| ☆ Example 2: ELEN = e32, VFLMUL = ¼

| ○ SEW <= ELEN*VFLMUL = 8, therefore SEW must be e8

| ☆ Example 3: ELEN = e32, VFLMUL = ½

| ○ SEW <= ELEN*VFLMUL = 16, therefore SEW must be either e8 or e16

| □ Note: For architectures where ELEN > SEWLMUL1MAX, one would go through the same exercise as above but with s/ELEN/SEWLMUL1MAX.

| • Where to store/how to derive ELEN and/or SEWLMUL1MAX:

| □ ELEN/SEWLMUL1MAX are not stored in CSRs; ELEN may be derived from the extension:

| ☆ Example: ELEN = e32 for ZVE32x

| □ SEWLMUL1MAX storage/derivation questions (this particular one is unclear to me):

| ☆ If ELEN > SEWLMUL1MAX, how would one derive SEWLMUL1MAX from the ELEN?

| ☆ Where (e.g. any CSR) would the SEWLMUL1MAX be stored?

| ☆ Would this be derived from knowing the specific extension and perhaps the Vlenb and held in a special architecture specific
| register?

| Suggested edits for discussion: 

| • Adding SEW equation, possibly in mathematical notation, to clarify the policy

| • Adding some examples to clarify the policy

| • Adding goal/intent and advantages of using fractional-lmul vs lmul and vice versa

| Would be happy to contribute pull requests after confirming whether this understanding is correct, and clarifying questions about the
| SEWLMUL1MAX/ELEN derivation/storage.

| All the best,

| Gregory

|


Re: Configuring qemu for Vector Extension

Wei Wu (吴伟)
 

Hi Mick,

As Jim said, you may need the right toolchain and right qemu for the
version you want, which is not an easy task.

BTW, the PLCT Lab is working on setting up an all-in-one developer
environment for unratified extensions, including Vector.

Currently QEMU and GNU Toolchain are available. Feel free to try it out:

We cherry-pick and rebase the B/K/P/V patches and merge them into one branch.

QEMU:
https://github.com/plctlab/plct-qemu/tree/new-machine-dev

gcc:
https://github.com/pz9115/riscv-gcc/tree/riscv-gcc-experimenal-v

binutils:
https://github.com/pz9115/riscv-binutils-gdb/tree/riscv-binutils-experimental-v

It is still under development, and I hope that the PLCT Lab might be
able to provide several online QEMU VMs for public access and
experiment in 2 weeks.

On Tue, Sep 14, 2021 at 12:06 AM Jim Wilson <jimw@sifive.com> wrote:

On Sun, Sep 12, 2021 at 5:21 PM Mick Thomas Lim <mickthomaslim@gmail.com> wrote:

A "Hello World" program compiled with riscv64-unknown-linux-gnu-gcc does work.
But we aren't seeing expected behavior when running the simple rvv_vadd.c program described here:
https://lists.riscv.org/g/tech-vector-ext/message/364

There are many thousands of different incompatible draft versions of the vector spec. If you don't have exactly matching versions of the compiler and qemu and libraries, it isn't going to work. Unfortunately, it will continue to be difficult to work with the vector spec until they stop changing it in incompatible ways.

The current vector work incidentally is in clang not gcc. The gcc support may not be compatible with anything else as it hasn't been properly updated.

This is the qemu run command we're using for buildroot:
qemu-system-riscv64 -cpu rv64,x-v=true,vlen=256,elen=64,vext_spec=v0.7.1 -M virt -nographic -bios output/images/fw_jump.elf -kernel output/images/Image -append "root=/dev/vda ro" -drive file=output/images/rootfs.ext2,format=raw,id=hd0 -device virtio-blk-device,drive=hd0 -netdev user,id=net0 -device virtio-net-device,netdev=net0

The v0.7.1 draft has been obsolete for about 2 years now. That won't be useful. Unless maybe you have Alibaba compilers as this is what Alibaba implemented in their SoCs. Otherwise, you are better off with a v0.9x or v1.0 qemu. I would expect to find patches for that on the qemu mailing list. But another person pointed to a branch in a SiFive github tree that may work for you.

Jim

--
Best wishes,
Wei Wu (吴伟)


Re: [RISC-V] [tech-vector-ext] Configuring qemu for Vector Extension

刘志伟
 

Hi Mick,

The vector 0.7.1 version has been implemented in the T-Head Xuantie c910v CPU and the Allwinner D1 SoC. If that's what you want, or if you want to use the current upstream QEMU, I can give you some advice.

You can download the toolchain binary from the link:

You can compile your program with -march=rv64gcv -mabi=lp64dv.

The programming intrinsics are very similar to those in upstream gcc; please see the attachment.

Thanks,
Zhiwei

------------------------------------------------------------------
From: Mick Thomas Lim <mickthomaslim@...>
Sent: Monday, September 13, 2021 08:24
To: tech-vector-ext <tech-vector-ext@...>
Subject: [RISC-V] [tech-vector-ext] Configuring qemu for Vector Extension

Does a known-good guide exist for building riscv64 qemu to be able to run Vector instructions?

From the main qemu repo, we are able to build the riscv64-softmmu target and run the 64-bit Buildroot Image, as described here:

A "Hello World" program compiled with riscv64-unknown-linux-gnu-gcc does work.
But we aren't seeing expected behavior when running the simple rvv_vadd.c program described here:

For example: vl = vsetvl_e32m2 (n) seems to return 0.

This is the qemu run command we're using for buildroot:
qemu-system-riscv64 -cpu rv64,x-v=true,vlen=256,elen=64,vext_spec=v0.7.1    -M virt -nographic    -bios output/images/fw_jump.elf    -kernel output/images/Image    -append "root=/dev/vda ro"    -drive file=output/images/rootfs.ext2,format=raw,id=hd0    -device virtio-blk-device,drive=hd0    -netdev user,id=net0 -device virtio-net-device,netdev=net0

Assistance would be much appreciated!

Sincerely,
Mick


Re: Configuring qemu for Vector Extension

Jim Wilson
 

On Sun, Sep 12, 2021 at 5:21 PM Mick Thomas Lim <mickthomaslim@...> wrote:
A "Hello World" program compiled with riscv64-unknown-linux-gnu-gcc does work.
But we aren't seeing expected behavior when running the simple rvv_vadd.c program described here:

There are many thousands of different incompatible draft versions of the vector spec.  If you don't have exactly matching versions of the compiler and qemu and libraries, it isn't going to work.  Unfortunately, it will continue to be difficult to work with the vector spec until they stop changing it in incompatible ways. 

The current vector work incidentally is in clang not gcc.  The gcc support may not be compatible with anything else as it hasn't been properly updated.

This is the qemu run command we're using for buildroot:
qemu-system-riscv64 -cpu rv64,x-v=true,vlen=256,elen=64,vext_spec=v0.7.1    -M virt -nographic    -bios output/images/fw_jump.elf    -kernel output/images/Image    -append "root=/dev/vda ro"    -drive file=output/images/rootfs.ext2,format=raw,id=hd0    -device virtio-blk-device,drive=hd0    -netdev user,id=net0 -device virtio-net-device,netdev=net0

The v0.7.1 draft has been obsolete for about 2 years now.  That won't be useful.  Unless maybe you have Alibaba compilers as this is what Alibaba implemented in their SoCs.  Otherwise, you are better off with a v0.9x or v1.0 qemu.  I would expect to find patches for that on the qemu mailing list.  But another person pointed to a branch in a SiFive github tree that may work for you.

Jim


Re: Configuring qemu for Vector Extension

Tony Cole
 

Hi Mick,

I use the RISC-V Vector QEMU branch from SiFive (for 32-bit; I don’t know about 64-bit support, though):

https://github.com/sifive/qemu/tree/rvv-1.0-upstream-v7-fix

Also, there may be a later version.

Follow the building instructions in the README.rst.

Hope this helps.

Tony

From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Mick Thomas Lim
Sent: 13 September 2021 01:21
To: tech-vector-ext@...
Subject: [RISC-V] [tech-vector-ext] Configuring qemu for Vector Extension

 

Does a known-good guide exist for building riscv64 qemu to be able to run Vector instructions?

From the main qemu repo, we are able to build the riscv64-softmmu target and run the 64-bit Buildroot Image, as described here:

A "Hello World" program compiled with riscv64-unknown-linux-gnu-gcc does work.
But we aren't seeing expected behavior when running the simple rvv_vadd.c program described here:

For example: vl = vsetvl_e32m2 (n) seems to return 0.

This is the qemu run command we're using for buildroot:
qemu-system-riscv64 -cpu rv64,x-v=true,vlen=256,elen=64,vext_spec=v0.7.1    -M virt -nographic    -bios output/images/fw_jump.elf    -kernel output/images/Image    -append "root=/dev/vda ro"    -drive file=output/images/rootfs.ext2,format=raw,id=hd0    -device virtio-blk-device,drive=hd0    -netdev user,id=net0 -device virtio-net-device,netdev=net0

Assistance would be much appreciated!

Sincerely,
Mick


Configuring qemu for Vector Extension

Mick Thomas Lim
 

Does a known-good guide exist for building riscv64 qemu to be able to run Vector instructions?

From the main qemu repo, we are able to build the riscv64-softmmu target and run the 64-bit Buildroot Image, as described here:

A "Hello World" program compiled with riscv64-unknown-linux-gnu-gcc does work.
But we aren't seeing expected behavior when running the simple rvv_vadd.c program described here:

For example: vl = vsetvl_e32m2 (n) seems to return 0.

This is the qemu run command we're using for buildroot:
qemu-system-riscv64 -cpu rv64,x-v=true,vlen=256,elen=64,vext_spec=v0.7.1    -M virt -nographic    -bios output/images/fw_jump.elf    -kernel output/images/Image    -append "root=/dev/vda ro"    -drive file=output/images/rootfs.ext2,format=raw,id=hd0    -device virtio-blk-device,drive=hd0    -netdev user,id=net0 -device virtio-net-device,netdev=net0

Assistance would be much appreciated!

Sincerely,
Mick
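
For reference on the vsetvl symptom reported above, here is a rough model of what one would normally expect from that call with the vlen=256 setting in the qemu command. It assumes the simplest permitted policy, vl = min(AVL, VLMAX), and assumes (as I understand the 1.0 draft) that a rejected vtype sets vill and forces vl to 0. The function and parameter names are hypothetical.

```python
def vlmax(vlen_bits, sew_bits, lmul):
    """Maximum element count of one register group, for integer LMUL."""
    return (vlen_bits // sew_bits) * lmul


def model_vsetvl(avl, vlen_bits, sew_bits, lmul, vtype_supported=True):
    """Simplified vsetvl: returns the vl the hardware would grant.

    vtype_supported=False models the case where the emulator does not
    accept the requested vtype encoding (vill set), giving vl = 0.
    """
    if not vtype_supported:
        return 0
    return min(avl, vlmax(vlen_bits, sew_bits, lmul))


# vsetvl_e32m2 with VLEN=256: VLMAX = 256 / 32 * 2 = 16 elements.
print(model_vsetvl(100, 256, 32, 2))
print(model_vsetvl(5, 256, 32, 2))
# A vtype the emulator does not accept yields 0 regardless of n.
print(model_vsetvl(100, 256, 32, 2, vtype_supported=False))
```

Under this model a result of 0 for non-zero n suggests the vtype was rejected, which would be consistent with a compiler and an emulator that target different draft encodings.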


Clarification of Fractional LMUL requirements, and the storage/derivation of ELEN/SEWLMUL1MAX values

Gregory Kielian
 

Hi,


Re-reading section 3.3.2 in the documentation (link), I would like to propose adding the goal, constraints, and steps for implementing fractional LMUL.


I think adding these would really help clarify both the VFLMUL idea and implementation. I've been having extensive discussions around this, re-reading this section a bunch, and thinking it would probably be good to add additional lines to the vspec.adoc to clarify the idea.


Sharing my tentative understanding below (and some questions on ELEN and SEWLMUL1MAX), derived mainly from looking at the spike lmul checks and 3.3.2, curious as well if this captures the main intent of the fractional-lmul or there are aspects which are missing or equations require adjustment:


  • Goal clarification:

    • Fractional LMUL allows the result of widening operations to be definitively contained within a single vector register.

    • The advantage this provides seems (at least) two-fold

      • Any register is usable for widening with fractional LMULs (as opposed to integer LMUL, where only register numbers evenly divisible by the LMUL can be used, e.g. v0, v8, v16, v24 for LMUL = 8).

      • Related to the above, fewer registers are locked down by the application of widening, reducing register availability bottlenecks and the needed number of stores/loads to and from memory.

    • In order to ensure that the result of widening operations can be contained in a single register, there are certain constraints (see below)
  • Constraints:

    • SEW <= ELEN*VFLMUL

      • Example 1: ELEN = e32, VFLMUL = ⅛

        • SEW <= ELEN*VFLMUL = 4, so VFLMUL = ⅛ is illegal for ELEN = e32

      • Example 2: ELEN = e32, VFLMUL = ¼

        • SEW <= ELEN*VFLMUL = 8, therefore SEW must be e8

      • Example 3: ELEN = e32, VFLMUL = ½

        • SEW <= ELEN*VFLMUL = 16, therefore SEW must be either e8 or e16

    • Note: For architectures where ELEN > SEWLMUL1MAX, one would go through the same exercise as above but with s/ELEN/SEWLMUL1MAX.

  • Where to store/how to derive ELEN and/or SEWLMUL1MAX:

    • ELEN/SEWLMUL1MAX are not stored in CSRs; ELEN may be derived from the extension:

      • Example: ELEN = e32 for ZVE32x

    • SEWLMUL1MAX storage/derivation questions (this particular one is unclear to me):

      • If ELEN > SEWLMUL1MAX, how would one derive SEWLMUL1MAX from the ELEN?

      • Where (e.g. any CSR) would the SEWLMUL1MAX be stored?

      • Would this be derived from knowing the specific extension and perhaps the Vlenb and held in a special architecture specific register?
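
The constraint and the three examples above can be checked mechanically. This is a minimal sketch of the inequality SEW <= ELEN*VFLMUL exactly as stated in this message, not of the normative spec text; the function name and the candidate SEW list are assumptions.

```python
from fractions import Fraction


def legal_sews(elen, vflmul, candidate_sews=(8, 16, 32, 64)):
    """Return the SEW values satisfying SEW <= ELEN * VFLMUL."""
    limit = elen * vflmul
    return [sew for sew in candidate_sews if sew <= limit]


# The three examples from the list above, all with ELEN = 32.
for vflmul in (Fraction(1, 8), Fraction(1, 4), Fraction(1, 2)):
    print(vflmul, legal_sews(32, vflmul))
```

An empty list corresponds to an illegal setting, as in Example 1. For an implementation with ELEN > SEWLMUL1MAX, the same function could be called with SEWLMUL1MAX in place of `elen`.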


Suggested edits for discussion: 

  • Adding SEW equation, possibly in mathematical notation, to clarify the policy

  • Adding some examples to clarify the policy

  • Adding goal/intent and advantages of using fractional-lmul vs lmul and vice versa


Would be happy to contribute pull requests after confirming whether this understanding is correct, and clarifying questions about the SEWLMUL1MAX/ELEN derivation/storage.


All the best,

Gregory


Multiple accesses required to the same location for strided memory accesses

Bill Huffman
 

I see that section 7.5 of the vector spec currently says:

> When rs2=x0, then an implementation is allowed, but not required, to perform fewer memory operations than the number of active elements, and may perform different numbers of memory operations across different dynamic executions of the same static instruction.
>
> Note: Compilers must be aware to not use the x0 form for rs2 when the immediate stride is 0 if the intent is to require all memory accesses are performed.
>
> When rs2!=x0 and the value of x[rs2]=0, the implementation must perform one memory access for each active element (but these accesses will not be ordered).
>
> Note: When repeating ordered vector accesses to the same memory address are required, then an ordered indexed operation can be used.

I’m not sure from reading this whether strided accesses that overlap are required to read the memory location multiple times.  The first three paragraphs sound like they are.  The fourth paragraph (the note) sounds like they are not: if one wants multiple accesses of the same memory location, one should use an ordered indexed operation (with constant index).

I thought we had said that the ordered indexed operations were the only ones that were constrained to access memory as many times as the naïvely interpreted instruction said.  That seems to mean the first three paragraphs should be changed.

It would seem quite unfortunate to require strided memory operations to be special cased for zero stride (but not x0).  If so, we also need to say what happens for positive and negative strides with absolute value less than the element size being accessed, or, for segmented accesses, less than the multiple segment size.

If strided, segmented loads where the stride is one segment are required to do multiple accesses, that would be even more unfortunate as it would keep them from being used efficiently for stencil operations.

      Bill
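
To make the overlap cases in the message above concrete, here is a small sketch that lists the byte range each element touches and flags when neighbouring elements share bytes. It only models address arithmetic for non-segmented accesses; the function names are hypothetical, and it says nothing about how many memory operations an implementation must perform.

```python
def element_ranges(base, stride_bytes, eew_bytes, vl):
    """Half-open byte range [start, end) touched by each element."""
    return [(base + i * stride_bytes, base + i * stride_bytes + eew_bytes)
            for i in range(vl)]


def has_overlap(stride_bytes, eew_bytes, vl):
    """True if any two elements touch a common byte.

    With a constant stride, elements overlap exactly when the absolute
    stride is smaller than the element size (and there are 2+ elements).
    """
    return vl > 1 and abs(stride_bytes) < eew_bytes


# Zero stride: every 4-byte element reads the same location.
print(element_ranges(0x100, 0, 4, 3), has_overlap(0, 4, 3))
# Stride of 2 bytes with 4-byte elements: neighbours share two bytes.
print(element_ranges(0x100, 2, 4, 3), has_overlap(2, 4, 3))
# Unit-element stride: no overlap.
print(element_ranges(0x100, 4, 4, 3), has_overlap(4, 4, 3))
```

Zero stride is just the extreme case of |stride| < element size, which is why special-casing only stride 0 leaves the small positive and negative strides undefined.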


Re: Zve should be a strict subset of V, use new option to relax VLEN

ghost
 

* I wonder if there could be a table for what Zve* and V include in instructions with both YES and NO on appropriate lines of the table for what each extension must include. The statements, understandably, include what instructions are supported. But I have trouble with the process-of-elimination to see what's not included as well as with the comparison of Zve to V.
FYI, I'm working with Elisa Sawyer and others on a style and content guide for extension proposals, mostly based on the excellent bitmanip v1.0.0-rc1 draft. That draft includes a table like the one you're suggesting, and I'm advocating putting that in the guide as a recommendation.

--

L Peter Deutsch <ghost@major2nd.com> :: Aladdin Enterprises :: Healdsburg, CA

Was your vote really counted? http://www.verifiedvoting.org


Re: Zve should be a strict subset of V, use new option to relax VLEN

Bill Huffman
 

Hi Krste,

The descriptions of Zvl* look good. A couple of comments:

* The V description says vector length greater than or equal to 128. Should it instead refer to Zvl128b?

* I wonder if there could be a table for what Zve* and V include in instructions with both YES and NO on appropriate lines of the table for what each extension must include. The statements, understandably, include what instructions are supported. But I have trouble with the process-of-elimination to see what's not included as well as with the comparison of Zve to V.

Bill

-----Original Message-----
From: tech-vector-ext@lists.riscv.org <tech-vector-ext@lists.riscv.org> On Behalf Of Krste Asanovic
Sent: Thursday, July 15, 2021 3:03 AM
To: Bill Huffman <huffman@cadence.com>
Cc: Guy Lemieux <guy.lemieux@gmail.com>; tech-vector-ext@lists.riscv.org
Subject: Re: [RISC-V] [tech-vector-ext] Zve should be a strict subset of V, use new option to relax VLEN

EXTERNAL MAIL



I added vector length extensions to spec.

Please review,

Krste

On Mon, 12 Jul 2021 14:31:07 +0000, "Bill Huffman" <huffman@cadence.com> said:
| Hello Guy,
| It definitely would be good for Zve to be a strict subset of V. I
| think that means the same thing as that any binary that runs on Zve will run correctly on V. But I’m not seeing how any code that runs on a Zve compliant core with VLEN < 128 will fail to run on V. Do you have an example? Is there a further relaxation that I’m not thinking about besides VLEN < 128?

| Separately, I don’t think we can add an option that restricts. All code that will run without an option should run with the option. But I think V may be able to be a superset without that.

| Bill

| From: tech-vector-ext@lists.riscv.org
| <tech-vector-ext@lists.riscv.org> On Behalf Of Guy Lemieux
| Sent: Monday, July 12, 2021 7:29 AM
| To: tech-vector-ext@lists.riscv.org
| Subject: [RISC-V] [tech-vector-ext] Zve should be a strict subset of
| V, use new option to relax VLEN

| EXTERNAL MAIL

| Hi,

| The way 18.1 and 18.2 currently read in the V spec is a bit confusing.

| It defines Zve as "Vector extensions for Embedded Processors", and V as a "Vector Extension for Application Processor".

| 1) Processors vs Processor?

| 2) It appears the Zve extension relaxes VLEN rules which are not supported by V. This appears to be the only change that prevents Zve from being a strict subset of V.

| 3) I think Zve should be a strict subset of V. The relaxation of VLEN
| should be an option that can be added to Zve or V. Some AP may wish to remain code-compatible with Zve, and this will make things more clear. This will clarify code generation and aid compatibility. Perhaps call this option Zvlen?

| 4) If there are other differences that I've missed in (3), they should be similarly separated.

| Thank you,

| Guy

|


Re: Zve should be a strict subset of V, use new option to relax VLEN

Krste Asanovic
 

I added vector length extensions to spec.

Please review,

Krste

On Mon, 12 Jul 2021 14:31:07 +0000, "Bill Huffman" <huffman@cadence.com> said:
| Hello Guy,
| It definitely would be good for Zve to be a strict subset of V. I think that means the same thing as that any binary that runs on Zve will run correctly on V. But I’m not seeing how any code that
| runs on a Zve compliant core with VLEN < 128 will fail to run on V. Do you have an example? Is there a further relaxation that I’m not thinking about besides VLEN < 128?

| Separately, I don’t think we can add an option that restricts. All code that will run without an option should run with the option. But I think V may be able to be a superset without that.

| Bill

| From: tech-vector-ext@lists.riscv.org <tech-vector-ext@lists.riscv.org> On Behalf Of Guy Lemieux
| Sent: Monday, July 12, 2021 7:29 AM
| To: tech-vector-ext@lists.riscv.org
| Subject: [RISC-V] [tech-vector-ext] Zve should be a strict subset of V, use new option to relax VLEN

| EXTERNAL MAIL

| Hi,

| The way 18.1 and 18.2 currently read in the V spec is a bit confusing.

| It defines Zve as "Vector extensions for Embedded Processors", and V as a "Vector Extension for Application Processor".

| 1) Processors vs Processor?

| 2) It appears the Zve extension relaxes VLEN rules which are not supported by V. This appears to be the only change that prevents Zve from being a strict subset of V.

| 3) I think Zve should be a strict subset of V. The relaxation of VLEN should be an option that can be added to Zve or V. Some AP may wish to remain code-compatible with Zve, and this will make
| things more clear. This will clarify code generation and aid compatibility. Perhaps call this option Zvlen?

| 4) If there are other differences that I've missed in (3), they should be similarly separated.

| Thank you,

| Guy

|


Vector TG Meeting Minutes 2021/07/09

Krste Asanovic
 

Date: 2021/07/09
Task Group: Vector Extension
Chair: Krste Asanovic
Vice-Chair: Roger Espasa
Number of Attendees: ~12
Current issues on github: https://github.com/riscv/riscv-v-spec

We had a short meeting to uncover any remaining issues with the spec
before public review. There were no new issues raised.

We discussed the mechanics of the public review process. The spec will have
final edits made to close off remaining clarifications, then be sent
out for public review.


Re: Zve should be a strict subset of V, use new option to relax VLEN

Bill Huffman
 

Hello Guy,

It definitely would be good for Zve to be a strict subset of V.  I think that means the same thing as that any binary that runs on Zve will run correctly on V.  But I’m not seeing how any code that runs on a Zve compliant core with VLEN < 128 will fail to run on V.  Do you have an example?  Is there a further relaxation that I’m not thinking about besides VLEN < 128?

Separately, I don’t think we can add an option that restricts.  All code that will run without an option should run with the option.  But I think V may be able to be a superset without that.

      Bill

 

From: tech-vector-ext@... <tech-vector-ext@...> On Behalf Of Guy Lemieux
Sent: Monday, July 12, 2021 7:29 AM
To: tech-vector-ext@...
Subject: [RISC-V] [tech-vector-ext] Zve should be a strict subset of V, use new option to relax VLEN

 

EXTERNAL MAIL

Hi,

The way 18.1 and 18.2 currently read in the V spec is a bit confusing.

It defines Zve as "Vector extensions for Embedded Processors", and V as a "Vector Extension for Application Processor".

1) Processors vs Processor?

2) It appears the Zve extension relaxes VLEN rules which are not supported by V. This appears to be the only change that prevents Zve from being a strict subset of V.

3) I think Zve should be a strict subset of V.  The relaxation of VLEN should be an option that can be added to Zve or V. Some AP may wish to remain code-compatible with Zve, and this will make things more clear. This will clarify code generation and aid compatibility. Perhaps call this option Zvlen?

4) If there are other differences that I've missed in (3), they should be similarly separated.

Thank you,

Guy


Zve should be a strict subset of V, use new option to relax VLEN

Guy Lemieux
 

Hi,

The way 18.1 and 18.2 currently read in the V spec is a bit confusing.

It defines Zve as "Vector extensions for Embedded Processors", and V as a "Vector Extension for Application Processor".

1) Processors vs Processor?

2) It appears the Zve extension relaxes VLEN rules which are not supported by V. This appears to be the only change that prevents Zve from being a strict subset of V.

3) I think Zve should be a strict subset of V.  The relaxation of VLEN should be an option that can be added to Zve or V. Some AP may wish to remain code-compatible with Zve, and this will make things more clear. This will clarify code generation and aid compatibility. Perhaps call this option Zvlen?

4) If there are other differences that I've missed in (3), they should be similarly separated.

Thank you,
Guy


Re: Vector TG Meeting tomorrow

David Horner
 

I apparently missed the meeting that I thought was at noon eastern.

There are of course the remaining open issues for v1.0.

I gather what was discussed was if we could reasonably move to public review without finalizing all of them.

To the extent that addendums could be added, such as the table of "reserved" equivalent instructions, the usage section and the prior art section I agree that these could be added in parallel with the review.
The other items I mentioned... imprecise intent wording... and the explanation of the encoding decisions.... both of these I think warrant a delay to have them in place for public consumption.

On Fri, Jul 9, 2021, 11:10 mark, <markhimelstein@...> wrote:
Just to qualify, I think we are talking about RVA22 (application target) and not RVM22 (microcontroller target).

On Fri, Jul 9, 2021 at 8:07 AM Jan Wassenberg via lists.riscv.org <janwas=google.com@...> wrote:
Mentioned by Krste in the meeting: processor profile already requires VLEN >= 128.

On Fri, Jul 9, 2021 at 2:51 PM Jan Wassenberg via lists.riscv.org <janwas=google.com@...> wrote:
A topic to discuss: lower bound on VLEN.

The upper bound is helpful but even VL-agnostic code sometimes wants at least 128 bits.
Example: N parallel instances of AES (16 bytes each), or N 128-bit results from 64x64 normal or carryless multiplication.

We can get this already (assuming SEW_LMUL1MAX = 64) by setting LMUL=2, but it seems like a poor tradeoff that
software should halve the number of registers/groups, just so that hardware could theoretically have single-element vectors.

Can we mandate VLEN >= 2*SEW_LMUL1MAX, perhaps in a profile? That would help software :)

BTW, are we intending to have the same binaries work on different implementations? It seems the only way to discover SEW_LMUL1MAX
is to try various SEW/LMUL and check for vill. Because LMUL is baked into the intrinsic function name,
software that wants portable binaries would have to recompile all vector code for LMUL=1,2,4,8, and then
pick the first one that works.

That's very burdensome; a profile guaranteeing SEW_LMUL1MAX = 64, or at least LMUL2MAX = 64, would also help a lot.
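
The register-count trade-off described above can be put in numbers. This is a rough sketch, assuming 32 architectural vector registers and integer LMUL values of 1, 2, 4, and 8; the function name is hypothetical.

```python
def min_lmul_for_bits(vlen_bits, needed_bits=128):
    """Smallest integer LMUL whose register group holds needed_bits.

    Returns None if even LMUL=8 is too small.
    """
    for lmul in (1, 2, 4, 8):
        if vlen_bits * lmul >= needed_bits:
            return lmul
    return None


# How many 128-bit-capable register groups remain for each VLEN.
for vlen in (32, 64, 128, 256):
    lmul = min_lmul_for_bits(vlen)
    print(f"VLEN={vlen}: LMUL={lmul}, register groups={32 // lmul}")
```

With VLEN = 64, software needs LMUL = 2 and is left with 16 register groups, while a guaranteed VLEN >= 128 keeps all 32 registers available at LMUL = 1.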

On Fri, Jul 9, 2021 at 6:58 AM Krste Asanovic <krste@...> wrote:
We’ll meet tomorrow to see if there are any remaining concerns before going into public review.
Krste







Re: Vector TG Meeting tomorrow

mark
 

Just to qualify, I think we are talking about RVA22 (application target) and not RVM22 (microcontroller target).

On Fri, Jul 9, 2021 at 8:07 AM Jan Wassenberg via lists.riscv.org <janwas=google.com@...> wrote:
Mentioned by Krste in the meeting: processor profile already requires VLEN >= 128.

On Fri, Jul 9, 2021 at 2:51 PM Jan Wassenberg via lists.riscv.org <janwas=google.com@...> wrote:
A topic to discuss: lower bound on VLEN.

The upper bound is helpful but even VL-agnostic code sometimes wants at least 128 bits.
Example: N parallel instances of AES (16 bytes each), or N 128-bit results from 64x64 normal or carryless multiplication.

We can get this already (assuming SEW_LMUL1MAX = 64) by setting LMUL=2, but it seems like a poor tradeoff that
software should halve the number of registers/groups, just so that hardware could theoretically have single-element vectors.

Can we mandate VLEN >= 2*SEW_LMUL1MAX, perhaps in a profile? That would help software :)

BTW, are we intending to have the same binaries work on different implementations? It seems the only way to discover SEW_LMUL1MAX
is to try various SEW/LMUL and check for vill. Because LMUL is baked into the intrinsic function name,
software that wants portable binaries would have to recompile all vector code for LMUL=1,2,4,8, and then
pick the first one that works.

That's very burdensome; a profile guaranteeing SEW_LMUL1MAX = 64 or at least LMUL2MAX = 64 would also help a lot.

On Fri, Jul 9, 2021 at 6:58 AM Krste Asanovic <krste@...> wrote:
We’ll meet tomorrow to see if there are any remaining concerns before going into public review.
Krste







Re: Vector TG Meeting tomorrow

Jan Wassenberg
 

Mentioned by Krste in the meeting: processor profile already requires VLEN >= 128.


On Fri, Jul 9, 2021 at 2:51 PM Jan Wassenberg via lists.riscv.org <janwas=google.com@...> wrote:
A topic to discuss: lower bound on VLEN.

The upper bound is helpful but even VL-agnostic code sometimes wants at least 128 bits.
Example: N parallel instances of AES (16 bytes each), or N 128-bit results from 64x64 normal or carryless multiplication.

We can get this already (assuming SEW_LMUL1MAX = 64) by setting LMUL=2, but it seems like a poor tradeoff that
software should halve the number of registers/groups, just so that hardware could theoretically have single-element vectors.

Can we mandate VLEN >= 2*SEW_LMUL1MAX, perhaps in a profile? That would help software :)

BTW, are we intending to have the same binaries work on different implementations? It seems the only way to discover SEW_LMUL1MAX
is to try various SEW/LMUL and check for vill. Because LMUL is baked into the intrinsic function name,
software that wants portable binaries would have to recompile all vector code for LMUL=1,2,4,8, and then
pick the first one that works.

That's very burdensome; a profile guaranteeing SEW_LMUL1MAX = 64 or at least LMUL2MAX = 64 would also help a lot.

On Fri, Jul 9, 2021 at 6:58 AM Krste Asanovic <krste@...> wrote:
We’ll meet tomorrow to see if there are any remaining concerns before going into public review.
Krste







Re: Vector TG Meeting tomorrow

Jan Wassenberg
 

A topic to discuss: lower bound on VLEN.

The upper bound is helpful but even VL-agnostic code sometimes wants at least 128 bits.
Example: N parallel instances of AES (16 bytes each), or N 128-bit results from 64x64 normal or carryless multiplication.

We can get this already (assuming SEW_LMUL1MAX = 64) by setting LMUL=2, but it seems like a poor tradeoff that
software should halve the number of registers/groups, just so that hardware could theoretically have single-element vectors.
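
(The tradeoff above can be put in numbers. The following is only a minimal sketch in plain Python, not vector code: it assumes integer LMUL settings of 1, 2, 4 and 8 and the 32 architectural vector registers, and the helper names min_lmul_for_width and groups_available are hypothetical, chosen for illustration.)

```python
# Sketch of the register-group arithmetic, under stated assumptions:
# integer LMUL in {1, 2, 4, 8} and 32 architectural vector registers.
LMULS = (1, 2, 4, 8)
NUM_VREGS = 32


def min_lmul_for_width(vlen_bits, required_bits=128):
    """Return the smallest integer LMUL whose register group holds at
    least required_bits, or None if even LMUL=8 is too narrow."""
    for lmul in LMULS:
        if lmul * vlen_bits >= required_bits:
            return lmul
    return None


def groups_available(lmul):
    """Number of register groups left when every operand uses this LMUL."""
    return NUM_VREGS // lmul


if __name__ == "__main__":
    for vlen in (64, 128, 256):
        lmul = min_lmul_for_width(vlen)
        print(f"VLEN={vlen}: LMUL={lmul}, groups={groups_available(lmul)}")
```

(With VLEN = 64 the sketch needs LMUL=2 to reach 128 bits and is left with 16 register groups; with VLEN >= 128 it stays at LMUL=1 and keeps all 32.)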

Can we mandate VLEN >= 2*SEW_LMUL1MAX, perhaps in a profile? That would help software :)

BTW, are we intending to have the same binaries work on different implementations? It seems the only way to discover SEW_LMUL1MAX
is to try various SEW/LMUL and check for vill. Because LMUL is baked into the intrinsic function name,
software that wants portable binaries would have to recompile all vector code for LMUL=1,2,4,8, and then
pick the first one that works.
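
(A rough sketch of the "try each LMUL and pick the first one that works" dispatch described above, modelled in plain Python. The probe callable is a hypothetical stand-in for setting vtype and checking vill on real hardware, and make_toy_probe encodes an assumed rule, purely for illustration, that a SEW is accepted when SEW <= SEW_LMUL1MAX * LMUL. Neither is taken from the spec.)

```python
# Sketch of runtime LMUL selection. The probe stands in for "request this
# SEW/LMUL and see whether vill is set"; it is hypothetical, not a real API.
LMULS = (1, 2, 4, 8)


def first_working_lmul(sew, probe):
    """Return the first LMUL in 1, 2, 4, 8 that the probe accepts for
    this SEW, or None if none of them is accepted."""
    for lmul in LMULS:
        if probe(sew, lmul):
            return lmul
    return None


def make_toy_probe(sew_lmul1max):
    """Build a toy probe. Assumption for illustration only: a SEW is
    usable when it is no wider than sew_lmul1max * lmul."""
    def probe(sew, lmul):
        return sew <= sew_lmul1max * lmul
    return probe


if __name__ == "__main__":
    for limit in (64, 32, 8):
        lmul = first_working_lmul(64, make_toy_probe(limit))
        print(f"SEW_LMUL1MAX={limit}: first working LMUL for SEW=64 is {lmul}")
```

(Under this toy rule, a binary would have to carry one compiled variant of each vector kernel per LMUL and dispatch to the variant selected here, which is the burden the message describes.)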

That's very burdensome; a profile guaranteeing SEW_LMUL1MAX = 64 or at least LMUL2MAX = 64 would also help a lot.

On Fri, Jul 9, 2021 at 6:58 AM Krste Asanovic <krste@...> wrote:
We’ll meet tomorrow to see if there are any remaining concerns before going into public review.
Krste





