Decompress Instruction


Krste Asanovic
 

If decompress is the inverse of compress, then it takes a
packed vector holding the non-zero elements and a bit mask indicating
which destination elements should receive the packed elements after
unpacking.

7 6 5 4 3 2 1 0   # vid

        e d c b a   # packed vector of 5 elements
1 0 0 1 1 1 0 1   # mask vector of 8 elements

e 0 0 d c b 0 a   # result of decompress
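To make the semantics concrete, here is a small Python reference model of this decompress operation (my own sketch for illustration; the function name and fill-value convention are assumptions, not from the spec):

```python
def vdecompress(mask, packed, fill=0):
    """Scatter packed elements to the positions where mask is set.
    Inverse of compress: element i of the result receives the next
    packed element if mask[i] is 1, otherwise the fill value."""
    it = iter(packed)
    return [next(it) if m else fill for m in mask]

# The example above, with element 0 first (the diagram read right to left):
mask   = [1, 0, 1, 1, 1, 0, 0, 1]
packed = ['a', 'b', 'c', 'd', 'e']
print(vdecompress(mask, packed))  # ['a', 0, 'b', 'c', 'd', 0, 0, 'e']
```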

This can be synthesized using viota.m and a masked vrgather:

1 0 0 1 1 1 0 1 # mask vector
4 4 4 3 2 1 1 0 # viota.m
0 0 0 0 0 0 0 0 # zero result register
e 0 0 d c b 0 a # vrgather using viota.m under mask

The code is:

# v0 holds mask
# v1 holds packed data
# v11 holds decompressed data
viota.m v10, v0 # Calc iota from mask in v0
vmv.v.i v11, 0 # Clear destination
vrgather.vv v11, v1, v10, v0.t # Expand into destination
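The three-instruction sequence can be modeled in Python to show it reproduces the decompress result (a sketch of the instruction semantics; the function names are mine):

```python
def viota_m(mask):
    """viota.m: element i gets the count of set mask bits below position i."""
    out, count = [], 0
    for m in mask:
        out.append(count)
        count += m
    return out

def vrgather_masked(dest, src, idx, mask):
    """vrgather.vv vd, vs2, vs1, v0.t: where the mask is set,
    dest[i] = src[idx[i]]; inactive elements keep dest's old value."""
    return [src[j] if m else d for d, j, m in zip(dest, idx, mask)]

mask   = [1, 0, 1, 1, 1, 0, 0, 1]   # element 0 first
packed = ['a', 'b', 'c', 'd', 'e']
iota   = viota_m(mask)               # [0, 1, 1, 2, 3, 4, 4, 4]
dest   = [0] * len(mask)             # vmv.v.i v11, 0
print(vrgather_masked(dest, packed, iota, mask))
# ['a', 0, 'b', 'c', 'd', 0, 0, 'e']
```

Read left to right here (element 0 first), the iota values [0, 1, 1, 2, 3, 4, 4, 4] match the 4 4 4 3 2 1 1 0 row in the diagram above.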

So decompress is quite fast already.

The reason there is a compress instruction is that it cannot be
synthesized from other instructions in the same way. You could
provide a "compress bit mask into packed indices" instruction, then do
a vrgather, but that is not much simpler than just doing the
compress.
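For reference, the vcompress operation itself is simple to state, even though it is hard to build from gathers; a Python model of its semantics (my own sketch) also shows the round trip with decompress:

```python
def vcompress(mask, src):
    """vcompress.vm: pack the elements of src where mask is set,
    in element order, into the low elements of the result."""
    return [x for x, m in zip(src, mask) if m]

mask = [1, 0, 1, 1, 1, 0, 0, 1]
src  = ['a', 'x', 'b', 'c', 'd', 'y', 'z', 'e']
print(vcompress(mask, src))  # ['a', 'b', 'c', 'd', 'e']
```

Decompressing this packed result under the same mask recovers the active elements of src, which is the round trip the original poster wants for sparse activations.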

Krste

On Thu, 03 Sep 2020 00:12:51 -0700, "lidawei14 via lists.riscv.org" <lidawei14=huawei.com@...> said:
| Hi all,
| For common AI workloads such as DNNs, data communications between network layers introduce huge pressure
| on capacity and bandwidth of the memory hierarchy.
| For instance, dynamic large activation or feature map data needs to be buffered and communicated across
| multiple layers, which often appears to be sparse (e.g. ReLU).
| People use bit vectors to "compress" the data buffered and "decompress" for the following layer
| computations.

| Here we can see from the spec that "vcompress" has already been included, how about "vdecompress"?

| Thanks,
| Dawei
|
