In trying to achieve SEW-level interleave by augmenting the instruction set (including casting),
I have a few observations.
- arithmetic operators need to function at a given SEW, and there is no in-memory form requirement for them.
- exploitation of the in-memory form of SEW * n elements consists substantially (if not completely)
of SEW-level bitwise operations (and/or/xor/move/load), shifts, and
SEW-level masking (not SEW * n masking).
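To make the second observation concrete, here is a minimal Python sketch. It assumes the little-endian in-memory layout RISC-V uses, in which element 0 occupies the lowest-addressed bytes; `pack_le` and `unpack_le` are hypothetical helper names. It checks that and/or/xor on a SEW * n element give the same bits as the same operation applied to each of its n SEW-level elements.

```python
import operator


def pack_le(elems, sew):
    """Pack SEW-bit elements into one SEW*n-bit integer in little-endian
    (in-memory) order: element 0 occupies the least-significant SEW bits."""
    lane = (1 << sew) - 1
    value = 0
    for i, e in enumerate(elems):
        value |= (e & lane) << (i * sew)
    return value


def unpack_le(value, sew, n):
    """Split a SEW*n-bit integer back into its n SEW-bit elements."""
    lane = (1 << sew) - 1
    return [(value >> (i * sew)) & lane for i in range(n)]


a = [0x11, 0x22, 0x33, 0x44]   # four SEW=8 elements
b = [0xF0, 0x0F, 0xFF, 0x00]

# For SEW=8 the packed value matches the bytes as they sit in memory.
assert pack_le(a, 8) == int.from_bytes(bytes(a), "little")

# Bitwise ops on the SEW*n element equal the same op applied lane by lane at SEW.
for op in (operator.and_, operator.or_, operator.xor):
    wide = op(pack_le(a, 8), pack_le(b, 8))
    assert unpack_le(wide, 8, 4) == [op(x, y) for x, y in zip(a, b)]
```

Shifts are the exception: at SEW * n, bits move across SEW-element boundaries, so a shift is not a purely lane-wise operation.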
I also postulate:
- that use of the in-memory form by an algorithm can be identified and provided only when needed;
- that the need for such instructions will be neither statically nor dynamically frequent in common code;
- that gathering SEW-level elements to build a SEW * n result is not prohibitively expensive for the "only when needed" and infrequent parts of the algorithm.
If these are true, then we can provide augmented forms of the bitwise and shift instructions that:
- source a set of n consecutive SEW-level elements;
- if a second vector source is needed, source either another such set of n consecutive SEW-level elements or a single SEW * n element; and
- store a SEW * n element, with the operator applied to each SEW-level element in turn, under the mask at the SEW level.
The total number of SEW-level elements processed is determined by vl.
Let's say, for now, that vl is required to be a multiple of n.
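The behavior described above can be sketched as a small Python model, under stated assumptions: `augmented_op` is a hypothetical name, registers are plain Python lists, only bitwise operators are modeled (shifts are left out), and masked-off SEW-level lanes are assumed to leave the corresponding bits of the destination undisturbed, which the proposal does not specify. The flag `inmem2` selects whether the second source is a set of n SEW-level elements or one SEW * n element, and vl must be a multiple of n.

```python
import operator
from typing import Callable, List, Sequence


def augmented_op(
    op: Callable[[int, int], int],
    vs1: Sequence[int],
    vs2: Sequence[int],
    mask: Sequence[bool],
    vd_old: Sequence[int],
    sew: int,
    n: int,
    vl: int,
    inmem2: bool,
) -> List[int]:
    """Hypothetical model of an augmented bitwise instruction.

    vs1    : SEW-level source elements (at least vl of them).
    vs2    : SEW-level elements if inmem2 is False, else SEW*n-level elements.
    mask   : one mask bit per SEW-level element (at least vl of them).
    vd_old : previous SEW*n destination elements (at least vl // n of them).
    Returns the vl // n SEW*n destination elements.
    """
    if vl % n != 0:
        raise ValueError("vl must be a multiple of n")
    lane = (1 << sew) - 1
    vd = list(vd_old[: vl // n])
    for j in range(vl // n):              # one SEW*n destination per group of n
        result = vd[j]
        for k in range(n):                # apply op to each SEW-level element in turn
            i = j * n + k
            if not mask[i]:
                continue                  # assumption: masked-off lanes stay undisturbed
            a = vs1[i] & lane
            if inmem2:
                b = (vs2[j] >> (k * sew)) & lane   # k-th SEW slice of the wide source
            else:
                b = vs2[i] & lane                  # matching element of the SEW-level set
            shift = k * sew
            result = (result & ~(lane << shift)) | ((op(a, b) & lane) << shift)
        vd[j] = result
    return vd


vs1 = [0x11, 0x22, 0x33, 0x44]
print(hex(augmented_op(operator.xor, vs1, [0xFF, 0x00, 0xFF, 0x00],
                       [True] * 4, [0], sew=8, n=4, vl=4, inmem2=False)[0]))
# prints 0x44cc22ee
```

Passing the wide second source `[0x00FF00FF]` with `inmem2=True` yields the same destination, since its SEW-level slices are the same four bytes.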
The two needed parameters are:
- n, the aggregation level for the target (let's call it inmemn).
2 bits would appear necessary for XLEN=64,
with derived values 1 (standard operation), 2, 4 and 8 (allowing byte to doubleword).
However, 3 bits would additionally allow factors of 16 through 128, which might be useful for encryption.
- a single-bit indicator of whether the second source is at level SEW or SEW * n (let's call it inmem2).
If these were incorporated into the bitwise/shift variants, the opcodes would grow by a minimum of 3 bits.
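One possible encoding of the two parameters, as a sketch: inmemn is stored as log2(n), so 2 bits cover n = 1, 2, 4, 8 and 3 bits cover n = 1 through 128, with inmem2 in the next bit up. The field order and the helper names are hypothetical; only the field widths come from the text. With the 2-bit variant the combined modifier is the 3-bit minimum.

```python
def encode_inmemn(n: int, bits: int) -> int:
    """Encode aggregation factor n = 2**k as k in a `bits`-wide field."""
    k = n.bit_length() - 1
    if n <= 0 or (1 << k) != n or k >= (1 << bits):
        raise ValueError(f"n={n} is not encodable in {bits} bits")
    return k


def decode_inmemn(field: int) -> int:
    """Recover n from its log2 encoding."""
    return 1 << field


def pack_modifier(n: int, inmem2: bool, n_bits: int = 2) -> int:
    """Hypothetical layout: inmemn in the low n_bits, inmem2 in the bit above."""
    return (int(inmem2) << n_bits) | encode_inmemn(n, n_bits)


# 2 bits: 1 (standard operation), 2, 4, 8 -> byte up to doubleword for SEW=8.
assert [decode_inmemn(k) for k in range(4)] == [1, 2, 4, 8]
# 3 bits additionally reach 16 through 128.
assert [decode_inmemn(k) for k in range(8)] == [1, 2, 4, 8, 16, 32, 64, 128]
```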
It would appear the vtype opcode compression method should be leveraged again.
These two "parameters", inmem2 and inmemn, could be included in vtype as a persistent modifier.
However, it is entirely conceivable that most of the data massaging can be done at the SEW level, with only the last operation required to place the result in a SEW * n destination.
This would be a good justification for allowing a transient form of the vmodinstr prefix (see issue #423, additional instructions to set vtype fields).
Note that neither of these changes the vl of these instructions. Further, these instructions are expected to execute infrequently (one of the postulates).
Therefore they are candidates for the alternate vmodtype instruction, rather than for further use of vsetvli immediate bits (also issue #423).
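The difference between a persistent vtype modifier and a transient prefix can be sketched as a tiny state model. Everything here (`Modifier`, `VectorState`, and the method names) is hypothetical; the sketch only assumes that a prefix applies to the single following instruction, that the vtype setting persists until changed, and that neither path touches vl.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Modifier:
    inmemn: int = 1        # aggregation factor n; 1 means standard operation
    inmem2: bool = False   # True: second source is a SEW*n element


class VectorState:
    """Hypothetical model of where inmemn/inmem2 could live."""

    def __init__(self, vl: int):
        self.vl = vl                              # never modified below
        self.vtype_mod = Modifier()               # persistent, held in vtype
        self.pending_prefix: Optional[Modifier] = None

    def set_persistent(self, mod: Modifier) -> None:
        """vmodtype-style update: stays in effect until changed again."""
        self.vtype_mod = mod

    def prefix(self, mod: Modifier) -> None:
        """vmodinstr-style transient prefix: applies to the next instruction only."""
        self.pending_prefix = mod

    def issue(self) -> Modifier:
        """Return the modifier for the next instruction, consuming any prefix."""
        if self.pending_prefix is not None:
            mod, self.pending_prefix = self.pending_prefix, None
            return mod
        return self.vtype_mod
```

In this model the data massaging runs at plain SEW with the default modifier, and only the final instruction that writes the SEW * n destination is preceded by something like `prefix(Modifier(inmemn=4))`; the instruction after it sees the vtype setting again.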
On 2020-04-27 3:56 p.m., Bill Huffman wrote:
I expect the SEW-level sourcing of the augmented bitwise/shift instructions could readily use this path via a passthrough to the execution units.
OK. That's also quite do-able.
Physical layout and control issues
yes it was.
I don't expect it would be well accepted by a purchaser that their 4096-bit vector accelerator is 256-bit brain-damaged by what are infrequent but "essential" in-memory mapped transforms.
and then support the SLEN=VLEN extension albeit at reduced
Agreed. That's feasible. It might be set by vsetvl, but unchanged by
The high-end market would look elsewhere than RISC-V given such dumbed-down support.
If we are going to have an industry-bucking, non-standard internal register format but provide in-memory format support, it had better be proficient.
Because the augmented instructions are essential to the performance of the algorithm, there is even less penalty for SLEN=VLEN implementations.
That is, there are no extraneous moves.
The bits in vtype become no-ops, and the prefix becomes a nop that a linker could remove.