Re: [RISC-V] [tech-vector-ext] Vector TG meeting minutes 2020/4/03
Linjie Yu
Hi all,
I have some applications that use byte/halfword/word vector loads/stores, such as GEMM, direct convolution, and so on. For 3x3 direct convolution, the code without byte/halfword/word vector loads/stores can be:

    int gvl = vsetvli(16, RVV_E32, RVV_M4);
    int32xm4_t out = vmvvi_int32xm4(0, gvl);
    for (unsigned int r = 0; r < 3; ++r) {
        gvl = vsetvli(16, RVV_E8, RVV_M1);
        const uint8xm1_t data = vle_uint8xm1(input_ptrs[r], gvl);
        convolve_row3x1(out, data, conv + r * cols);
    }

    inline void convolve_row3x1(int32xm4_t &out, const uint8xm1_t &row_data,
                                const int16_t *convolution)
    {
        const int16_t mat0 = *(convolution);
        const int16_t mat1 = *(convolution + 1);
        const int16_t mat2 = *(convolution + 2);
        unsigned int gvl = vsetvli(16, RVV_E8, RVV_M1);
        // Widen the 8-bit pixels to 16 bits (unsigned widening add with 0).
        int16xm2_t row = (int16xm2_t)vwadduvx_uint16xm2_uint8xm1(row_data, 0, gvl);
        gvl = vsetvli(8, RVV_E16, RVV_M2);
        // Shifted copies of the row for the second and third kernel taps.
        int16xm2_t row_03 = vslidedownvx_int16xm2(row, 1, gvl);
        int16xm2_t row_47 = vslidedownvx_int16xm2(row, 2, gvl);
        // Widening multiply-accumulate into the 32-bit accumulator.
        out = vwmaccvx_int32xm4_int16xm2(mat0, row, out, gvl);
        out = vwmaccvx_int32xm4_int16xm2(mat1, row_03, out, gvl);
        out = vwmaccvx_int32xm4_int16xm2(mat2, row_47, out, gvl);
    }

The code with byte/halfword/word vector loads/stores can be:

    int gvl = vsetvli(16, RVV_E32, RVV_M4);
    int32xm4_t out = vmvvi_int32xm4(0, gvl);
    for (unsigned int r = 0; r < 3; ++r) {
        gvl = vsetvli(16, RVV_E16, RVV_M2);
        // Byte load directly into 16-bit elements.
        const uint16xm2_t data = vlbv_uint8xm1(input_ptrs[r], gvl);
        convolve_row3x1(out, data, conv + r * cols);
    }

    inline void convolve_row3x1(int32xm4_t &out, const uint16xm2_t &row_data,
                                const int16_t *convolution)
    {
        const int16_t mat0 = *(convolution);
        const int16_t mat1 = *(convolution + 1);
        const int16_t mat2 = *(convolution + 2);
        // The data is already 16 bits wide, so no widening step is needed.
        int16xm2_t row = (int16xm2_t)row_data;
        unsigned int gvl = vsetvli(8, RVV_E16, RVV_M2);
        int16xm2_t row_03 = vslidedownvx_int16xm2(row, 1, gvl);
        int16xm2_t row_47 = vslidedownvx_int16xm2(row, 2, gvl);
        out = vwmaccvx_int32xm4_int16xm2(mat0, row, out, gvl);
        out = vwmaccvx_int32xm4_int16xm2(mat1, row_03, out, gvl);
        out = vwmaccvx_int32xm4_int16xm2(mat2, row_47, out, gvl);
    }

With byte/halfword/word vector loads/stores, the instruction count of this code is reduced by about 15%. As the kernel size grows, the gap shrinks: for a 9x9 kernel it is about 7%. So, in my opinion, these load/store instructions are useful.

Yours,
Damon

-----Original Message-----
From: tech-vector-ext@... <tech-vector-ext@...> On Behalf Of Krste Asanovic
Sent: April 5, 2020 4:43
To: tech-vector-ext@...
Subject: [RISC-V] [tech-vector-ext] Vector TG meeting minutes 2020/4/03

Date: 2020/4/03
Task Group: Vector Extension
Chair: Krste Asanovic
Number of Attendees: ~15
Current issues on github: https://github.com/riscv/riscv-v-spec

Issues discussed: #354/362

The following issues were discussed.

Closing on version v0.9.
A list of proposed changes to form v0.9 was presented. The main dispute was around dropping byte/halfword/word vector load/stores.

#354/362 Drop byte/halfword/word vector load/stores

Most of the meeting time was spent discussing this issue, which was contentious. Participants in favor of retaining these instructions were concerned about the code size and performance impact of dropping them. Proponents of dropping them noted that the main impact was only for integer code (floating-point code does not benefit from these instructions), that performance might be lower using these instructions rather than widening, and that there was a large benefit in reducing memory pipeline complexity. The group was going to consider some examples to be supplied by the members, including some mixed floating-point/integer code. Discussion to continue on mailing list.
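
For anyone checking the two vector versions in the reply above against each other, here is a minimal scalar C++ sketch of what they appear to compute. It is only a reading of the posted code, not a reference implementation. The names convolve_row3x1_scalar, convolve3x3_scalar and kLanes are hypothetical and do not come from the post. The sketch assumes 8 output lanes per pass (from vsetvli(8, RVV_E16, RVV_M2)), zero-extended 8-bit pixels (from the unsigned widening add), and that cols is the row stride of the kernel array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 8 output lanes per pass, as suggested by vsetvli(8, RVV_E16, RVV_M2).
constexpr std::size_t kLanes = 8;

// Scalar model of one kernel row:
//   out[i] += k[0]*row[i] + k[1]*row[i+1] + k[2]*row[i+2]
// Reads kLanes + 2 input bytes from `row`.
void convolve_row3x1_scalar(std::array<int32_t, kLanes> &out,
                            const uint8_t *row,
                            const int16_t *k) {
    assert(row != nullptr && k != nullptr);
    for (std::size_t i = 0; i < kLanes; ++i) {
        for (std::size_t t = 0; t < 3; ++t) {
            // Zero-extend the pixel to 16 bits (models the widening step
            // or the byte load into 16-bit elements).
            const int16_t px = static_cast<int16_t>(row[i + t]);
            // Widening multiply-accumulate into a 32-bit accumulator.
            out[i] += static_cast<int32_t>(k[t]) * static_cast<int32_t>(px);
        }
    }
}

// Scalar model of the outer loop over the three input rows.
// `cols` is assumed to be the row stride of the kernel array `conv`.
std::array<int32_t, kLanes> convolve3x3_scalar(const uint8_t *const rows[3],
                                               const int16_t *conv,
                                               std::size_t cols) {
    std::array<int32_t, kLanes> out{};  // accumulator starts at zero
    for (std::size_t r = 0; r < 3; ++r) {
        convolve_row3x1_scalar(out, rows[r], conv + r * cols);
    }
    return out;
}
```

Both vector versions should produce the same accumulator contents as this model. They differ only in how the 8-bit pixels reach 16-bit elements: an explicit widening instruction in the first version, or the byte load itself in the second.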