Decompress Instruction
lidawei14@...
Hi all,
For common AI workloads such as DNNs, data communication between network layers puts heavy pressure on the capacity and bandwidth of the memory hierarchy.
For instance, large, dynamically sized activation or feature-map data must be buffered and communicated across multiple layers, and this data is often sparse (e.g. after ReLU).
People use bit vectors to "compress" the buffered data and "decompress" it for the following layer's computations.
We can see from the spec that "vcompress" has already been included; how about "vdecompress"?
Thanks,
Dawei
On Thu, 03 Sep 2020 00:12:51 -0700, "lidawei14 via lists.riscv.org" <lidawei14=huawei.com@...> said:

If decompress is the inverse of compress, then there will be a
packed vector holding the non-zero elements and a bit mask indicating
which elements should receive the elements after unpacking:
7 6 5 4 3 2 1 0 # vid
e d c b a # packed vector of 5 elements
1 0 0 1 1 1 0 1 # mask vector of 8 elements
e 0 0 d c b 0 a # result of decompress
This can be synthesized using viota.m and a masked vrgather:
1 0 0 1 1 1 0 1 # mask vector
4 4 4 3 2 1 1 0 # viota.m
0 0 0 0 0 0 0 0 # zero result register
e 0 0 d c b 0 a # vrgather using viota.m under mask
The code is:
# v0 holds mask
# v1 holds packed data
# v11 holds decompressed data
viota.m v10, v0 # Calc iota from mask in v0
vmv.v.i v11, 0 # Clear destination
vrgather.vv v11, v1, v10, v0.t # Expand into destination
So decompress is quite fast already.
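The effect of the viota.m + masked vrgather sequence can be modeled in scalar code. Here is a small Python sketch of that model (my own illustration, not from the spec; elements are listed with element 0 first, i.e. the reverse of the right-to-left diagrams above):

```python
def viota(mask):
    # Models viota.m: element i receives the count of set mask bits
    # strictly below position i (an exclusive prefix sum of the mask).
    counts, total = [], 0
    for bit in mask:
        counts.append(total)
        total += bit
    return counts

def decompress(packed, mask):
    # Models the masked vrgather into a zeroed destination: where
    # mask[i] is set, pull packed[viota[i]]; elsewhere leave zero.
    idx = viota(mask)
    return [packed[idx[i]] if mask[i] else 0 for i in range(len(mask))]

# Mask 1 0 0 1 1 1 0 1 from the example, element 0 first:
mask = [1, 0, 1, 1, 1, 0, 0, 1]
packed = ['a', 'b', 'c', 'd', 'e']
print(decompress(packed, mask))  # ['a', 0, 'b', 'c', 'd', 0, 0, 'e']
```

Read right-to-left, the printed result is "e 0 0 d c b 0 a", matching the diagram.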
The reason there is a compress instruction is that it cannot be
synthesized from other instructions in the same way. You could
provide a "compress bit mask into packed indices" instruction, then do
a vrgather, but that is not much simpler than just doing the
compress.
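For comparison, the semantics of vcompress in the same scalar model (again my own sketch, not from the spec):

```python
def compress(src, mask):
    # Models vcompress: pack the elements of src whose mask bit is set,
    # in element order, into the low elements of the result.
    return [src[i] for i in range(len(mask)) if mask[i]]

# Element 0 first; 'x', 'y', 'z' sit at masked-off positions.
src = ['a', 'x', 'b', 'c', 'd', 'y', 'z', 'e']
mask = [1, 0, 1, 1, 1, 0, 0, 1]
print(compress(src, mask))  # ['a', 'b', 'c', 'd', 'e']
```

Note that in scalar form the destination index of each kept element is a running count of the earlier mask bits, i.e. the same serial prefix sum that viota.m computes, but here it must feed the store side rather than the gather side, which is why a dedicated compress instruction is worthwhile.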
Krste