Re: [RISC-V][tech-vector-ext] Intrinsics for vector programming in C.

Chih-Mao Chen


We (Andes) have also been working on RVV intrinsics with our lead customers for a while, and we appreciate that you published this RFC to spur discussion about vector intrinsics.

After reviewing the RFC, we found it's very similar to our approach. You have some good ideas which we don’t have, but we would also like to present some key benefits of our approach for discussion. Hopefully, together we can converge on a better RVV intrinsic standard:

  • Our vector types also encode SEW and LMUL into the names, but they have a slightly shorter form, e.g. vi32m1_t, vf64m1_t, which is more consistent with the naming of intrinsic functions.

  • We also encode MLEN into vector mask types, but they are named vmaskN_t instead of vboolN_t. We also provide some user-friendly typedefs so users can derive the mask type from the vector type, e.g. (both types below have MLEN = SEW/LMUL = 8):

    • typedef vmask8_t vmask_i8m1_t;
    • typedef vmask8_t vmask_i16m2_t;
  • Our intrinsic functions are overloaded over different vtype whenever possible, which roughly corresponds to the _Generic interface in the RFC.

    • An instruction executed under a mask is overloaded under the same name, taking a mask argument and an optional maskedoff argument after the operands. If the maskedoff argument is not given, masked-off elements have undefined values.

    • Both implicit and explicit VL passing are supported. For the latter case, intrinsics accept an optional avl argument, and VL is not changed if avl is not provided.

    • We try to overload intrinsics with similar semantics whenever possible. For example, vmin and vminu are both exposed as vmin: the u suffix denotes signedness, which matters at the instruction level but is already encoded in the input types.

    • Based on the previous points, an intrinsic function has the following forms:

      • vop(vs1, vs2)
      • vop(vs1, vs2, mask)
      • vop(vs1, vs2, mask, maskedoff)
      • vop(vs1, vs2, avl)
      • vop(vs1, vs2, mask, avl)
      • vop(vs1, vs2, mask, maskedoff, avl)

  • Our intrinsic for vsetvl takes three arguments (avl, sew and lmul) instead of providing a separate function for each vtype. This keeps the flexibility of choosing vtype from a runtime value, while compilers can still lower a vsetvl call with constant sew and lmul arguments to a vsetvli instruction.
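As an illustration only (this is not the Andes header), the six call forms and the vmin/vminu merge described above could be modeled in plain C11 roughly as follows. Everything prefixed toy_ or TOY_, the fixed four-element vector structs, the vu32m1_t name, and the choice of 0 for inactive elements are assumptions made for the sketch. A real compiler would more likely provide the overloads as builtins than as macros.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_VLMAX 4  /* assumption: fixed element count per toy vector */
typedef struct { int32_t  e[TOY_VLMAX]; } vi32m1_t;   /* type name from this post */
typedef struct { uint32_t e[TOY_VLMAX]; } vu32m1_t;   /* assumed unsigned name */
typedef struct { uint8_t  b[TOY_VLMAX]; } vmask32_t;  /* one flag per element */

static size_t toy_vl = TOY_VLMAX;  /* stand-in for the vl register */

/* Explicit-VL forms call this first; implicit forms leave toy_vl alone. */
static void toy_set_vl(size_t avl) { toy_vl = avl < TOY_VLMAX ? avl : TOY_VLMAX; }

/* Shared worker: a NULL pointer means "argument not given". */
static vi32m1_t toy_add(vi32m1_t a, vi32m1_t b,
                        const vmask32_t *m, const vi32m1_t *off) {
    vi32m1_t r = {{0}};  /* toy choice: undefined and tail elements read as 0 */
    for (size_t i = 0; i < toy_vl; i++) {
        if (!m || m->b[i]) r.e[i] = a.e[i] + b.e[i];
        else if (off)      r.e[i] = off->e[i];
    }
    return r;
}

/* One thin function per call form. */
static vi32m1_t add_vv(vi32m1_t a, vi32m1_t b) { return toy_add(a, b, NULL, NULL); }
static vi32m1_t add_m(vi32m1_t a, vi32m1_t b, vmask32_t m) { return toy_add(a, b, &m, NULL); }
static vi32m1_t add_mo(vi32m1_t a, vi32m1_t b, vmask32_t m, vi32m1_t o) { return toy_add(a, b, &m, &o); }
static vi32m1_t add_l(vi32m1_t a, vi32m1_t b, size_t avl) { toy_set_vl(avl); return add_vv(a, b); }
static vi32m1_t add_ml(vi32m1_t a, vi32m1_t b, vmask32_t m, size_t avl) { toy_set_vl(avl); return add_m(a, b, m); }
static vi32m1_t add_mol(vi32m1_t a, vi32m1_t b, vmask32_t m, vi32m1_t o, size_t avl) { toy_set_vl(avl); return add_mo(a, b, m, o); }

/* Select by argument count, then by the type of the last argument. */
#define TOY_PICK(_1, _2, _3, _4, _5, NAME, ...) NAME
#define vadd(...) TOY_PICK(__VA_ARGS__, VADD5, VADD4, VADD3, VADD2, 0)(__VA_ARGS__)
#define VADD2(a, b)          add_vv(a, b)
#define VADD3(a, b, c)       _Generic((c), vmask32_t: add_m, default: add_l)(a, b, c)
#define VADD4(a, b, c, d)    _Generic((d), vi32m1_t: add_mo, default: add_ml)(a, b, c, d)
#define VADD5(a, b, c, d, e) add_mol(a, b, c, d, e)

/* Signedness comes from the operand type, so one name covers vmin and vminu. */
static vi32m1_t min_s(vi32m1_t a, vi32m1_t b) {   /* would map to vmin.vv */
    vi32m1_t r = {{0}};
    for (size_t i = 0; i < toy_vl; i++) r.e[i] = a.e[i] < b.e[i] ? a.e[i] : b.e[i];
    return r;
}
static vu32m1_t min_u(vu32m1_t a, vu32m1_t b) {   /* would map to vminu.vv */
    vu32m1_t r = {{0}};
    for (size_t i = 0; i < toy_vl; i++) r.e[i] = a.e[i] < b.e[i] ? a.e[i] : b.e[i];
    return r;
}
#define vmin(a, b) _Generic((a), vi32m1_t: min_s, vu32m1_t: min_u)(a, b)
```

In this model, vadd(a, b, 2) sets the toy VL to 2 before adding, and a following vadd(a, b) reuses that VL, which mirrors the "VL is not changed if avl is not provided" rule. A three-argument call is resolved by whether the last argument is a mask or an integer.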

Now, onto the RFC itself:

  • It seems that many of the exceptions in naming are caused by the fact that functions are encoded by their return type, and some intrinsics can produce the same output type under different vtype settings. If each function were instead encoded by the vtype in effect when the instruction is executed, the naming scheme could be simplified. It would also be more consistent for an intrinsic's type suffix to match that of the corresponding vsetvl (which is not the case for widening intrinsics in the RFC).

  • The non-overloaded vmadc_{vv,vx,vvm,vxm} functions in the RFC also need to encode the input type, e.g. vmadc_vvm_i8m1_v8. Again, this exception would not be needed if intrinsics were encoded by the current vtype instead of the return type.

  • vwmul_{vv,vx}_u* intrinsics should be renamed to vwmulu to match the instruction mnemonic.

  • In the C11 generic version, vmv_v_x and vfmv_v_f cannot be overloaded based on the scalar type alone, since the same scalar could be splatted into vector types with different LMUL.

  • The RFC says that configuration setting intrinsics should return an opaque type _VL_T. This is weird as the user must obtain a _VL_T value from vsetvl, which when used by explicit VL setting intrinsics will be set by a vsetvl again (unless removed by a compiler pass). As a low-level interface, it should just return a size_t and leave the VL abstraction to higher-level programming models.

  • The RFC talks about types and functions for segment load/stores, but such functions are missing from the intrinsics list.

  • vcopy and vsplat are mentioned but are not in the function list. Also, is it necessary to provide the vcopy intrinsics when users can just assign vectors with = operator?

  • Both vmacc and vmadd instructions are merged into a single vma intrinsic in the RFC. However, they are still independently listed in the function list. Also, should vnmsac and vnmsub also be merged?

  • Both vzero and vundefined show prototypes with the non-existent vfloat8m*_t type.

  • The sample code shows load/store by dereferencing pointers to vector types. If those operations are supported, they should be documented in the RFC.

  • The name of the intrinsic header file (the sample code uses riscv_header.h) also needs to be documented.

  • Finally, the RFC says that it is based on the v0.8 version of the RVV specification, but it seems that some changes in v0.9 have slipped through, e.g. the removal of widening/narrowing load/stores and the introduction of vfslide1up. It would be easier to track and reference if the RFC were pinned to one specific version.
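To illustrate the size_t suggestion together with the three-argument vsetvl described earlier, here is a small scalar C model of a strip-mined loop. It is a sketch under stated assumptions, not real intrinsic code: toy_vsetvl, toy_add_i32, toy_vl and the 128-bit TOY_VLEN are hypothetical names and values, and the model simply picks vl = min(avl, VLMAX) with VLMAX = VLEN * LMUL / SEW.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_VLEN 128u  /* assumption: vector register width in bits */

static size_t toy_vl;  /* stand-in for the vl register */

/* Hypothetical model of vsetvl(avl, sew, lmul).
   sew is in bits (8, 16, 32 or 64) and lmul is 1, 2, 4 or 8; both may be
   runtime values. The new vl comes back as a plain size_t. */
static size_t toy_vsetvl(size_t avl, unsigned sew, unsigned lmul) {
    size_t vlmax = (size_t)TOY_VLEN * lmul / sew;
    toy_vl = avl < vlmax ? avl : vlmax;
    return toy_vl;
}

/* Strip-mined y[i] += x[i] over n 32-bit elements; returns the strip count. */
static size_t toy_add_i32(int32_t *y, const int32_t *x, size_t n, unsigned lmul) {
    size_t strips = 0;
    while (n > 0) {
        size_t vl = toy_vsetvl(n, 32, lmul);  /* lmul chosen at run time */
        for (size_t i = 0; i < vl; i++)       /* stands in for load/add/store */
            y[i] += x[i];
        x += vl;  /* a size_t vl feeds pointer arithmetic directly */
        y += vl;
        n -= vl;
        strips++;
    }
    return strips;
}
```

Because the returned value is an ordinary size_t, it can be used for pointer and counter updates without unwrapping an opaque _VL_T. With these assumed parameters, ten 32-bit elements take three strips at LMUL=1 and two at LMUL=2.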

On Fri, May 8, 2020 at 12:15 AM, Kai Wang wrote:
> Hi,
>
> We, EPI, SiPearl, and SiFive, have come out with a RFC for vector
> intrinsics. Although there are still some issues under discussion, we think
> it is time to publish the document to collect more feedback from the
> community. You could access the documents from the github repository[0].
>
> In this RFC[1], we defined the type system, programming interface and
> naming rules for vector intrinsics in the C language.
>
> Currently, there are a few issues[2] under discussion.
>
> 1. With or without vl argument in the intrinsic interface.
> 2. C operators for scalable vector types.
> 3. Vector types for segment load/store.
> 4. Fractional LMUL representation.
>
> We need your opinions and feedback about these issues. Besides these issues,
> welcome any feedback about vector intrinsic design.
>
> [0]
> [1]
> [2]
