Decompress Instruction


lidawei14@...
 

Hi all,

For common AI workloads such as DNNs, data communication between network layers puts heavy pressure on the capacity and bandwidth of the memory hierarchy.
For instance, large dynamic activation or feature-map data, which is often sparse (e.g. after ReLU), needs to be buffered and communicated across multiple layers.
People use bit vectors to "compress" the buffered data and then "decompress" it for the following layer's computations.

The spec already includes "vcompress"; how about adding "vdecompress"?

Thanks,
Dawei


Krste Asanovic
 

If decompress is the inverse of compress, then there will be a
packed vector holding the non-zero elements and a bit mask indicating
which positions should receive those elements after unpacking:

7 6 5 4 3 2 1 0 # vid

      e d c b a # packed vector of 5 elements
1 0 0 1 1 1 0 1 # mask vector of 8 elements

e 0 0 d c b 0 a # result of decompress

This can be synthesized using viota.m and a masked vrgather:

1 0 0 1 1 1 0 1 # mask vector
4 4 4 3 2 1 1 0 # viota.m
0 0 0 0 0 0 0 0 # zero result register
e 0 0 d c b 0 a # vrgather using viota.m under mask

The code is:

# v0 holds mask
# v1 holds packed data
# v11 holds decompressed data
viota.m v10, v0 # Calc iota from mask in v0
vmv.v.i v11, 0 # Clear destination
vrgather.vv v11, v1, v10, v0.t # Expand into destination

So decompress is quite fast already.
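
As a sanity check on the example above, here is a minimal scalar sketch in Python of what the three-instruction sequence computes. The function names are only illustrative, and the model makes two simplifying assumptions: inactive elements keep their old value (a mask-undisturbed policy), and an index beyond the end of the packed list reads as zero, standing in for the out-of-range rule of vrgather. Lists are written with element 0 first, so they read in the opposite order to the diagrams.

```python
def viota_m(mask):
    """Model of viota.m: element i gets the number of set mask bits at indices < i."""
    out, count = [], 0
    for bit in mask:
        out.append(count)
        count += bit
    return out


def vrgather_vv_masked(dest, src, index, mask):
    """Model of masked vrgather.vv: dest[i] = src[index[i]] where mask[i] is set.

    Assumptions of this sketch: inactive elements are left undisturbed, and
    an index past the end of src reads as 0.
    """
    result = list(dest)
    for i, active in enumerate(mask):
        if active:
            result[i] = src[index[i]] if index[i] < len(src) else 0
    return result


def decompress(packed, mask):
    """viota.m, then clear the destination, then masked vrgather.vv."""
    index = viota_m(mask)        # viota.m v10, v0
    dest = [0] * len(mask)       # vmv.v.i v11, 0
    return vrgather_vv_masked(dest, packed, index, mask)  # vrgather.vv ..., v0.t


# Same data as the diagrams, element 0 first.
mask = [1, 0, 1, 1, 1, 0, 0, 1]
packed = ["a", "b", "c", "d", "e"]
iota = viota_m(mask)
result = decompress(packed, mask)
print(iota)    # [0, 1, 1, 2, 3, 4, 4, 4]
print(result)  # ['a', 0, 'b', 'c', 'd', 0, 0, 'e']
```

Read from element 7 down to element 0, the printed lists match the "4 4 4 3 2 1 1 0" and "e 0 0 d c b 0 a" rows above.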

The reason there is a compress instruction is that it cannot be
synthesized from other instructions in the same way. You could
provide a "compress bit mask into packed indices" instruction and then do
a vrgather, but that is not much simpler than just doing the
compress.
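
To make the comparison concrete, here is a small Python sketch of the two routes. Note that packed_indices is a hypothetical helper standing in for the "compress bit mask into packed indices" step; it is not an existing instruction. The model also returns only the packed elements and ignores what happens to the tail of the destination register.

```python
def vcompress(src, mask):
    """Model of compress: pack the elements of src whose mask bit is set, in index order."""
    return [x for x, bit in zip(src, mask) if bit]


def packed_indices(mask):
    """Hypothetical step: turn a bit mask into a packed list of the set positions."""
    return [i for i, bit in enumerate(mask) if bit]


def compress_via_gather(src, mask):
    """Alternative route: build packed indices, then do an unmasked gather."""
    return [src[i] for i in packed_indices(mask)]


# Element 0 first; this is the decompressed vector from the earlier example.
mask = [1, 0, 1, 1, 1, 0, 0, 1]
src = ["a", 0, "b", "c", "d", 0, 0, "e"]
direct = vcompress(src, mask)
indices = packed_indices(mask)
gathered = compress_via_gather(src, mask)
print(direct)    # ['a', 'b', 'c', 'd', 'e']
print(indices)   # [0, 2, 3, 4, 7]
print(gathered)  # ['a', 'b', 'c', 'd', 'e']
```

Both routes give the same packed result, but the index-packing step already has to do the same work as the compress itself, which is the point made above.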

Krste

On Thu, 03 Sep 2020 00:12:51 -0700, "lidawei14 via lists.riscv.org" <lidawei14=huawei.com@...> said:
| Hi all,
| For common AI workloads such as DNNs, data communications between network layers introduce huge pressure
| on capacity and bandwidth of the memory hierarchy.
| For instance, dynamic large activation or feature map data needs to be buffered and communicated across
| multiple layers, which often appears to be sparse (e.g. ReLU).
| People use bit vectors to "compress" the data buffered and "decompress" for the following layer
| computations.

| Here we can see from the spec that "vcompress" has already been included, how about "vdecompress"?

| Thanks,
| Dawei
|


lidawei14@...
 

Thanks Krste, that makes sense, but the sequence is not that straightforward. People usually need "decompress" whenever they use "compress", so maybe we can add a comment about this in the "vcompress" section?