Re: GNU toolchain with RVV intrinsic support

David Horner

Thank you very much for this advancement.
I have two concerns; my comments are inline in the quoted message below.

On 2020-08-21 9:34 a.m., Kito Cheng wrote:
I am pleased to announce that SiFive's RVV-intrinsic-enabled GCC is now open source.

We put the sources on riscv's github, and the RVV intrinsics have been integrated in the riscv-gnu-toolchain, so you can build the RVV intrinsic enabled GNU toolchain as usual.

 $ git clone git@...:riscv/riscv-gnu-toolchain.git -b rvv-intrinsic
 $ <path-to-riscv-gnu-toolchain>/configure --with-arch=rv64gcv_zfh --prefix=<INSTALL-PATH>
 $ make newlib build-qemu
 $ cat rvv_vadd.c
> #include <riscv_vector.h>
> #include <stdio.h>
> void vec_add_rvv
Shouldn't this be vec_add32_rvv? It is not a generalized vector add.
> (int *a, int *b, int *c, size_t n) {
>   size_t vl;
>   vint32m2_t va, vb, vc;
>   for (;vl = vsetvl_e32m2 (n);n -= vl) {
>     vb = vle32_v_i32m2 (b);
>     vc = vle32_v_i32m2 (c);
>     va = vadd_vv_i32m2 (vb, vc);
>     vse32_v_i32m2 (a, va);
>     a += vl;
The vector pointer should be advanced by vl * 32.
(I originally thought that vl = vsetvl might have done the scaling by 32 and that n was in bytes,
but I have now convinced myself that the problem is likely the pointer advance,
and that VLEN is at least 256, so the loop makes only one pass for the test case below.)
>     b += vl;
>     c += vl;
>   }
> }
> int x[10] = {1,2,3,4,5,6,7,8,9,0};
> int y[10] = {0,9,8,7,6,5,4,3,2,1};
> int z[10];
> int main()
> {
>   int i;
>   vec_add_rvv(z, x, y, 10);

>   for (i=0; i<10; i++)
>     printf ("%d ", z[i]);
>   printf("\n");
>   return 0;
> }
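
For reference when weighing the pointer-advance question, here is a minimal scalar sketch of the strip-mined loop in plain C. It does not use the RVV intrinsics: model_vsetvl and vec_add32_model are hypothetical names, and returning min(n, vlmax) is a simplifying assumption about what vsetvl_e32m2 returns. Note that in C, `a += vl` on an `int *` advances by vl elements, that is, vl * sizeof(int) bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for vsetvl_e32m2: assumes vl = min(n, vlmax). */
size_t model_vsetvl(size_t n, size_t vlmax) {
    assert(vlmax > 0);
    return n < vlmax ? n : vlmax;
}

/* Scalar model of the quoted loop. Returns the number of loop passes. */
size_t vec_add32_model(int *a, const int *b, const int *c,
                       size_t n, size_t vlmax) {
    size_t passes = 0;
    size_t vl;
    for (; (vl = model_vsetvl(n, vlmax)) != 0; n -= vl) {
        /* One "vector" operation over vl 32-bit elements. */
        for (size_t i = 0; i < vl; i++)
            a[i] = b[i] + c[i];
        /* Pointer arithmetic is in elements: each pointer moves
           forward by vl * sizeof(int) bytes. */
        a += vl;
        b += vl;
        c += vl;
        passes++;
    }
    return passes;
}
```

With the x and y arrays from the quoted test, n = 10 and vlmax = 16, this model finishes in one pass; with vlmax = 4 it takes three passes and produces the same sums.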

 $ riscv64-unknown-elf-gcc rvv_vadd.c -O2
 $ qemu-riscv64 -cpu rv64,x-v=true,vlen=256,elen=64,vext_spec=v1.0 a.out
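
As a rough check on the one-pass observation, the maximum vector length for a given configuration can be computed as VLMAX = LMUL * VLEN / SEW. The sketch below assumes integer LMUL only; rvv_vlmax is a hypothetical helper, not part of the toolchain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: elements per vector register group,
   VLMAX = LMUL * VLEN / SEW, for integer LMUL only. */
size_t rvv_vlmax(size_t vlen_bits, size_t sew_bits, size_t lmul) {
    assert(sew_bits > 0);
    return vlen_bits / sew_bits * lmul;
}
```

For vlen=256 as in the QEMU command, with SEW=32 and LMUL=2 (the e32m2 setting), this gives 16 elements, which is more than the 10 elements in the test arrays.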

It has been verified with our internal test suite and several internal projects; however, this project is still a work in progress, and we intend to keep improving it. Feedback and bug reports are welcome, as are contributions and pull requests.

Current status:
- Implements ~95% of the RVV intrinsic functions listed in the intrinsic spec (
- FP16 supported for both vector and scalar.
  - fp16 uses __fp16 temporarily; this might change in the future.
- Fractional LMUL is not implemented yet.
- RV32 is not well supported for scalar-vector operations with SEW=64.
- Function calls with vector types are not well supported yet; arguments are passed and returned in memory in the current implementation.
- *NO* auto vectorization support.

