Re: Vector TG minutes for 2020/12/18 meeting


Zalman Stern
 

Does it get easier if the specification is just the immediate value plus one?

I really don't understand how this encoding is particularly great for immediates as many of the values are likely very rarely or even never used and it seems like one can't get long enough values even for existing SIMD hardware in some data types. Compare to e.g.:
    (first_bit ? 3 : 1) << rest_of_the_bits
or:
    map[] = { 1, 3, 5, 8 }; // Or maybe something else for 5 and 8
    map[first_two_bits] << rest_of_the_bits;

I.e. get a lot of powers of two, multiples of three-vecs for graphics, maybe something else.

-Z-


On Mon, Dec 21, 2020 at 10:47 AM Guy Lemieux <guy.lemieux@...> wrote:
for vsetivli, with the uimm=00000 encoding, rather than setting vl to 32, how setting it to some other meaning?

one option is to set vl=VLMAX. i have some concerns about software using this safely (eg, if VLMAX turns out to be much larger than software anticipated, then it would fail; correcting this requires more instructions than just using the regular vsetvl/vsetvli would have used). 

another option is to allow an implementation-defined vl to be chosen by hardware; this could be anywhere between 1 and VLMAX. for example, implementations may just choose vl=32, or they may choose something else. it allows the CPU architect to devise a scheme that best fits the implementation. this may consider factors like the effective width of the execution engine, the pipeline depth (to reduce likelihood of stalls from dependent instructions), or that the vector register file is actually a multi-level memory hierarchy where some smaller values may operate with greater efficiency (lower power), or matching VL to the optimal memory system burst length. perhaps some guidance by the spec could be given here for the default scheme, eg whether the implementation optimizes for best performance or power (while still allowing implementations to modify this default via an implementation-defined CSR). software using a few extra cycles to check the returned vl against AVL should not a big problem (the simplest solution being vsetvli followed by vsetivli)

g


On Fri, Dec 18, 2020 at 6:13 PM Krste Asanovic <krste@...> wrote:

# vsetivli

A new variant of vsetvl was proposed providing an immediate as the AVL
in rs1[4:0].  The immediate encoding is the same as for CSR immediate
instructions. The instruction would have bit 31:30 = 11 and bits 29:20
would be encoded same as vsetvli.

This would be used when AVL was statically known, and known to fit
inside vector register group.  Compared with existing PoR, it removes
need to load immediate into a spare scalar register before executing
vsetvli, and is useful for handling scalar values in vector register
(vl=1) and other cases where short fixed-sized vectors are the
datatype (e.g., graphics).

There was discussion on whether uimm=00000 should represent 32 or be
reserved.  32 is more useful, but adds a little complexity to
hardware.

There was also discussion on whether instruction should set vill if
selected AVL is not supported, or whether should clip vl to VLMAX as
with other instructions, or if behavior should be reserved.  Group
generally favored writing vill to expose software errors.

Join tech-vector-ext@lists.riscv.org to automatically receive all group messages.