
Vector TG minutes for 2020/12/18 meeting
Perhaps, to make the naming convention for mask operations explicit, we could rename "vle1.v" to "vmle1.v" instead.

By lidawei14@... · #551


Vector Task Group minutes 2020/12/04
In some cases we have widening computations with large LMUL settings, and we will quickly run out of v0-v31 if we also have to keep masks in these registers.

By lidawei14@... · #541


Vector Task Group minutes 2020/12/04
Hi Krste, This mask loading instruction is exactly the one we have been looking forward to. I am a bit confused about the hiccups: why do machines with internal dynamic data striping require hiccups whenever used as a mask? D

By lidawei14@... · #525


Sparse Matrix-Vector Multiply (again) and Bit-Vector Compression
Hi all, If I use EDIV to compute SpMV y = A * x in blocks of size r * c, I might have to load r elements of y and c elements of x. These are shorter than VL = r * c; is there an efficient way to do this by curre

By lidawei14@... · #492


Sparse Matrix-Vector Multiply (again) and Bit-Vector Compression
Hi all, Thank you, Nick, for the reply. I saw that EDIV will not be included in v1.0. Are there outstanding issues to be resolved? Can I have a look at the discussion page on EDIV? Thanks a lot, Dawei

By lidawei14@... · #490


Sparse Matrix-Vector Multiply (again) and Bit-Vector Compression
Hi, Perhaps instead of using a bit vector to encode an entire matrix, we can encode a sub-block. There is a common sparse matrix format called BCSR that blocks the nonzero values of CSR, so that we can

By lidawei14@... · #474
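
For readers unfamiliar with BCSR, the following is a minimal C sketch of a block-CSR matrix-vector product y = A * x with r * c blocks. It is not taken from the thread: the struct layout and the names `bcsr_t` and `bcsr_spmv` are assumptions made for illustration, and each stored block is assumed to be dense and row-major.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical BCSR container; names and layout are illustrative only. */
typedef struct {
    size_t r, c;            /* block dimensions (rows, columns) */
    size_t n_block_rows;    /* number of block rows in A */
    const size_t *row_ptr;  /* n_block_rows + 1 offsets into col_idx */
    const size_t *col_idx;  /* block-column index of each stored block */
    const double *val;      /* r * c values per stored block, row-major */
} bcsr_t;

/* Computes y = A * x, where y has n_block_rows * r entries. */
void bcsr_spmv(const bcsr_t *A, const double *x, double *y)
{
    assert(A->r > 0 && A->c > 0);
    for (size_t bi = 0; bi < A->n_block_rows; bi++) {
        double *yb = y + bi * A->r;            /* r outputs of this block row */
        for (size_t i = 0; i < A->r; i++)
            yb[i] = 0.0;
        for (size_t k = A->row_ptr[bi]; k < A->row_ptr[bi + 1]; k++) {
            const double *blk = A->val + k * A->r * A->c;
            const double *xb = x + A->col_idx[k] * A->c;  /* c inputs */
            for (size_t i = 0; i < A->r; i++)
                for (size_t j = 0; j < A->c; j++)
                    yb[i] += blk[i * A->c + j] * xb[j];
        }
    }
}
```

Each block touches only r elements of y and c elements of x while performing r * c multiply-adds, which is the size mismatch raised in the EDIV question in this thread.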


Decompress Instruction
Thanks Krste, that makes sense, but the logic is not that straightforward. People usually need "decompress" when they are using "compress"; maybe we can add a comment on this in the "vcompress" se

By lidawei14@... · #387
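
To make the compress/decompress pairing concrete, here is a scalar C model of the two operations. It is a sketch of the intended semantics only, not the vector instruction sequence discussed in the thread; the function names `vec_compress` and `vec_decompress` and the byte-per-element mask are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Packs the elements of src whose mask entry is nonzero into the front
   of dst and returns how many elements were packed. */
size_t vec_compress(const int *src, const unsigned char *mask, size_t n, int *dst)
{
    assert(src && mask && dst);
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (mask[i])
            dst[k++] = src[i];
    return k;
}

/* Inverse operation: writes packed[0], packed[1], ... back to the positions
   where mask is nonzero. Positions with a zero mask entry are left unchanged. */
void vec_decompress(const int *packed, const unsigned char *mask, size_t n, int *dst)
{
    assert(packed && mask && dst);
    size_t k = 0;  /* running count of set mask entries seen so far */
    for (size_t i = 0; i < n; i++)
        if (mask[i])
            dst[i] = packed[k++];
}
```

The running count `k` in `vec_decompress` is a prefix count over the mask, which is why decompress does not need to be a separate primitive if such a count and a masked gather are available.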


Decompress Instruction
Hi all, For common AI workloads such as DNNs, data communication between network layers puts huge pressure on the capacity and bandwidth of the memory hierarchy. For instance, dynamic large activati

By lidawei14@... · #385


Duplicate Counting Instruction
Hi, Sorry, the picture was dropped somehow, so here I present it in plain text. For the simple loop example: for (i = 0; i < N; i++) { a[b[i]] = a[c[i]] + 1; } We can use the algorithm: while (i + VLEN

By lidawei14@... · #281
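
The sketch below illustrates why the loop quoted above is hard to vectorize when the index arrays contain duplicates. It is not the algorithm from the post (which is truncated here). It only compares the scalar loop with a naive chunked gather, add, scatter model. The names `update_scalar` and `update_naive_chunked`, the chunk length `vl`, and the bound `MAX_VL` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VL 8  /* hypothetical upper bound on the chunk length */

/* Reference semantics: the scalar loop from the post. */
void update_scalar(int *a, const size_t *b, const size_t *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[b[i]] = a[c[i]] + 1;
}

/* Naive chunked model: gather a whole chunk, add 1, then scatter.
   All reads of a chunk happen before any of its writes, so a later
   element never sees the value written by an earlier one. */
void update_naive_chunked(int *a, const size_t *b, const size_t *c,
                          size_t n, size_t vl)
{
    assert(vl > 0 && vl <= MAX_VL);
    int tmp[MAX_VL];
    for (size_t i = 0; i < n; i += vl) {
        size_t len = (n - i < vl) ? n - i : vl;
        for (size_t j = 0; j < len; j++)
            tmp[j] = a[c[i + j]] + 1;   /* gather and add */
        for (size_t j = 0; j < len; j++)
            a[b[i + j]] = tmp[j];       /* scatter */
    }
}
```

With `b` and `c` both equal to {0, 0, 1, 0} (a histogram-style update) and `a` initially zero, the scalar loop leaves `a[0] == 3`, while the naive model with `vl = 4` leaves `a[0] == 1`, because the three updates to index 0 collapse into one.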


Duplicate Counting Instruction
Hi Krste, I would just like to follow up on Roger's question about hardware implementation. As you said, it can be done with a parallel-prefix-style OR-reduction tree, so can you please explain how we can avoid

By lidawei14@... · #259


Duplicate Counting Instruction
Hi Krste, I read through your code, and thanks for correcting my errors; 'or' is a good idea for multiple duplicates. Here I'd like to explain why I made things a bit more complicated in my code. In yo

By lidawei14@... · #255


Fault-Only-First Indexed Loads Instructions
while (True) { if (unLo > unHi) break; n = ((Int32)block[ptr[unLo]+d]) - med; if (n == 0) { mswap(ptr[unLo], ptr[ltLo]); ltLo++; unLo++; continue; }; if (n > 0) break; unLo++; } The loop above is from

By lidawei14@... · #212


Duplicate Counting Instruction
Hi all, In certain cases, such as histograms, we might have duplicate runtime memory dependences, and the current V extension may fail to vectorize such cases. Therefore, I would like to propose du

By lidawei14@... · #211


Fault-Only-First Indexed Loads Instructions
Hi all, In this thread I would like to discuss fault-only-first indexed load instructions, since we have certain use cases, for example SPEC CPU 2006 401.bzip2 src/blocksort.c:line 712. For faul

By lidawei14@... · #201
