If decompress is the inverse of compress, then there will be a
packed vector holding the non-zero elements and a bit mask indicating
which elements should receive the packed elements after unpacking:
    7 6 5 4 3 2 1 0     # vid
          e d c b a     # packed vector of 5 elements
    1 0 0 1 1 1 0 1     # mask vector of 8 elements
    e 0 0 d c b 0 a     # result of decompress
This can be synthesized using viota.m and a masked vrgather:
    1 0 0 1 1 1 0 1     # mask vector
    4 4 4 3 2 1 1 0     # viota.m
    0 0 0 0 0 0 0 0     # zero result register
    e 0 0 d c b 0 a     # vrgather using viota.m under mask
The code is:
    # v0 holds mask
    # v1 holds packed data
    # v11 holds decompressed data
    viota.m v10, v0                  # Calc iota from mask in v0
    vmv.v.i v11, 0                   # Clear destination
    vrgather.vv v11, v1, v10, v0.t   # Expand into destination
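To check the element-by-element behaviour, here is a minimal scalar sketch in Python of the three-instruction sequence. It models viota.m as an exclusive prefix count of the mask bits, and the masked vrgather.vv as an indexed read that leaves inactive elements at their cleared value (assuming mask-undisturbed behaviour). The function names `viota_m`, `vrgather_vv_masked`, and `decompress` are made up for this illustration, and the length of the packed list stands in for VLMAX. The lists are indexed from element 0, so they read right to left relative to the diagrams.

```python
def viota_m(mask):
    """Model of viota.m: element i gets the number of set mask bits below i."""
    out, count = [], 0
    for bit in mask:
        out.append(count)
        count += bit
    return out


def vrgather_vv_masked(dest, src, index, mask):
    """Model of a masked vrgather.vv (assumes mask-undisturbed):
    active elements read src[index[i]], or 0 if the index is out of range;
    inactive elements keep dest[i]."""
    return [
        (src[index[i]] if index[i] < len(src) else 0) if mask[i] else dest[i]
        for i in range(len(mask))
    ]


def decompress(packed, mask):
    idx = viota_m(mask)                                 # viota.m v10, v0
    dest = [0] * len(mask)                              # vmv.v.i v11, 0
    return vrgather_vv_masked(dest, packed, idx, mask)  # vrgather.vv v11, v1, v10, v0.t


# Element 0 first, so these are the diagram rows read right to left.
mask = [1, 0, 1, 1, 1, 0, 0, 1]
packed = ["a", "b", "c", "d", "e"]
print(viota_m(mask))             # [0, 1, 1, 2, 3, 4, 4, 4]
print(decompress(packed, mask))  # ['a', 0, 'b', 'c', 'd', 0, 0, 'e']
```

Reversing the two printed lists gives the viota.m and result rows of the diagram.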
So decompress is quite fast already.
The reason there is a compress instruction is that it cannot be
synthesized from other instructions in the same way. You could
provide a "compress bit mask into packed indices" instruction, then do
a vrgather, but that is not much simpler than just doing the
compress.
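As a rough illustration of that alternative, the following Python sketch models a hypothetical "compress bit mask into packed indices" step followed by an unmasked gather, and compares it with a direct model of compress. `mask_to_packed_indices` is not a real instruction; it and the other function names are assumptions made up for this sketch. Only the first popcount(mask) result elements are modelled, and lists are indexed from element 0.

```python
def mask_to_packed_indices(mask):
    """Hypothetical instruction: indices of the set mask bits, packed from element 0."""
    return [i for i, bit in enumerate(mask) if bit]


def vrgather_vv(src, index):
    """Model of an unmasked vrgather.vv; out-of-range indices read as 0."""
    return [src[j] if j < len(src) else 0 for j in index]


def vcompress(src, mask):
    """Model of compress: keep elements whose mask bit is set, packed from element 0."""
    return [x for x, bit in zip(src, mask) if bit]


mask = [1, 0, 1, 1, 1, 0, 0, 1]
src = ["a", 0, "b", "c", "d", 0, 0, "e"]

idx = mask_to_packed_indices(mask)
print(idx)                    # [0, 2, 3, 4, 7]
print(vrgather_vv(src, idx))  # ['a', 'b', 'c', 'd', 'e']
print(vcompress(src, mask))   # ['a', 'b', 'c', 'd', 'e']
```

The two-step path produces the same packed result, but the index-packing step is itself essentially a compress of vid under the mask, which is why it offers little saving.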
Krste
On Thu, 03 Sep 2020 00:12:51 -0700, "lidawei14 via lists.riscv.org" <lidawei14=huawei.com@...> said:
| Hi all,
| For common AI workloads such as DNNs, data communications between network layers introduce huge pressure
| on capacity and bandwidth of the memory hierarchy.
| For instance, dynamic large activation or feature map data needs to be buffered and communicated across
| multiple layers, which often appears to be sparse (e.g. ReLU).
| People use bit vectors to "compress" the data buffered and "decompress" for the following layer
| computations.
| Here we can see from the spec that "vcompress" has already been included, how about "vdecompress"?
| Thanks,
| Dawei
|