Re: Check mask all ones / all zeros

Andrew Waterman
 



On Thu, May 20, 2021 at 12:27 AM Andrew Waterman <andrew@...> wrote:


On Thu, May 20, 2021 at 12:16 AM Krste Asanovic <krste@...> wrote:
Actually, vfirst.m can be implemented with an early out on long temporal vector machines, whereas vpopc.m has to process all bits.

If the common case for the input data is that all bits would be set/clear, then the choice doesn’t really matter, but if it is common to be able to early out (i.e. the test fails), I’d go with vfirst.m.

Yeah, it would've been more precise of me to have compared vpopc.m against Roger's hypothetical new instruction, which also must process all bits.

Er, never mind, I got that wrong again.  Roger's instruction can also early-out with slightly more complexity (if at least one 1 and at least one 0 is detected).
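
The early-out condition described above can be sketched with a scalar C model. This is only an illustration: `classify_mask` and the `examined` counter are hypothetical names, the result encoding (1, -1, 0) is the one floated in the original proposal, and the handling of `vl == 0` is an arbitrary choice made for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the proposed three-result check (not a real instruction).
   Returns 1 if the first vl mask elements are all ones, -1 if they are all
   zeros, and 0 otherwise. *examined reports how many elements were inspected
   before the answer was known. */
int classify_mask(const uint8_t *mask, size_t vl, size_t *examined)
{
    bool seen_one = false;
    bool seen_zero = false;
    size_t i;

    for (i = 0; i < vl; i++) {
        if ((mask[i / 8] >> (i % 8)) & 1u)
            seen_one = true;
        else
            seen_zero = true;
        if (seen_one && seen_zero) {   /* early out: the result must be 0 */
            i++;
            break;
        }
    }
    *examined = i;
    if (seen_one && seen_zero)
        return 0;
    /* With vl == 0 neither flag is set; reporting -1 is an arbitrary choice. */
    return seen_one ? 1 : -1;
}
```

In this model a mixed mask is resolved as soon as both a 1 and a 0 have been seen, while a uniform mask still requires inspecting all vl elements.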



Krste

On May 19, 2021, at 11:30 PM, Roger Ferrer Ibanez <roger.ferrer@...> wrote:

Hi Andrew,

thanks for the prompt and insightful answer. I'll use vpopc.m then.

On 20/5/21 8:25, Andrew Waterman wrote:
PS. You probably already have the current vector length in a GPR, and that quantity is probably the more appropriate thing to compare against than VLMAX.  So you probably don't need to go to the trouble of materializing VLMAX.

Indeed, my question was motivated while looking at some code that operates on whole registers but it can definitely be generalised to any vector length.

Kind regards,

-- 
Roger Ferrer Ibáñez - roger.ferrer@...
Barcelona Supercomputing Center - Centro Nacional de Supercomputación


WARNING / LEGAL TEXT: This message is intended only for the use of the individual or entity to which it is addressed and may contain information which is privileged, confidential, proprietary, or exempt from disclosure under applicable law. If you are not the intended recipient or the person responsible for delivering the message to the intended recipient, you are strictly prohibited from disclosing, distributing, copying, or in any way using this message. If you have received this communication in error, please notify the sender and destroy and delete any copies you may have received.

http://www.bsc.es/disclaimer


Re: Check mask all ones / all zeros

Andrew Waterman
 



On Thu, May 20, 2021 at 12:16 AM Krste Asanovic <krste@...> wrote:
Actually, vfirst.m can be implemented with an early out on long temporal vector machines, whereas vpopc.m has to process all bits.

If the common case for the input data is that all bits would be set/clear, then the choice doesn’t really matter, but if it is common to be able to early out (i.e. the test fails), I’d go with vfirst.m.

Yeah, it would've been more precise of me to have compared vpopc.m against Roger's hypothetical new instruction, which also must process all bits.


Krste

On May 19, 2021, at 11:30 PM, Roger Ferrer Ibanez <roger.ferrer@...> wrote:

Hi Andrew,

thanks for the prompt and insightful answer. I'll use vpopc.m then.

On 20/5/21 8:25, Andrew Waterman wrote:
PS. You probably already have the current vector length in a GPR, and that quantity is probably the more appropriate thing to compare against than VLMAX.  So you probably don't need to go to the trouble of materializing VLMAX.

Indeed, my question was motivated while looking at some code that operates on whole registers but it can definitely be generalised to any vector length.

Kind regards,

-- 
Roger Ferrer Ibáñez - roger.ferrer@...
Barcelona Supercomputing Center - Centro Nacional de Supercomputación




Re: Check mask all ones / all zeros

Krste Asanovic
 

Actually, vfirst.m can be implemented with an early out on long temporal vector machines, whereas vpopc.m has to process all bits.

If the common case for the input data is that all bits would be set/clear, then the choice doesn’t really matter, but if it is common to be able to early out (i.e. the test fails), I’d go with vfirst.m.
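
As a rough illustration of the difference, the sketch below models a temporal machine that handles one 64-bit chunk of the mask per step. The chunk size, the function names, and the step counts are assumptions made for the example; they are not measurements of any implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of vfirst.m on a machine that processes one 64-bit mask chunk per
   step. Returns the number of steps taken; *first receives the index of the
   first set bit, or -1 if no bit is set. */
size_t steps_vfirst(const uint64_t *chunks, size_t nchunks, long *first)
{
    *first = -1;
    for (size_t c = 0; c < nchunks; c++) {
        uint64_t w = chunks[c];
        if (w != 0) {
            long bit = 0;
            while (!(w & 1u)) {        /* locate lowest set bit in the chunk */
                w >>= 1;
                bit++;
            }
            *first = (long)(c * 64) + bit;
            return c + 1;              /* early out: later chunks are skipped */
        }
    }
    return nchunks;
}

/* Model of vpopc.m on the same machine: every chunk must be visited. */
size_t steps_vpopc(const uint64_t *chunks, size_t nchunks, size_t *count)
{
    *count = 0;
    for (size_t c = 0; c < nchunks; c++) {
        uint64_t w = chunks[c];
        while (w != 0) {               /* clear the lowest set bit each pass */
            w &= w - 1;
            (*count)++;
        }
    }
    return nchunks;
}
```

When the mask is entirely zero, both functions visit every chunk, which matches the remark that the choice does not matter if the uniform case is the common one.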

Krste

On May 19, 2021, at 11:30 PM, Roger Ferrer Ibanez <roger.ferrer@...> wrote:

Hi Andrew,

thanks for the prompt and insightful answer. I'll use vpopc.m then.

On 20/5/21 8:25, Andrew Waterman wrote:
PS. You probably already have the current vector length in a GPR, and that quantity is probably the more appropriate thing to compare against than VLMAX.  So you probably don't need to go to the trouble of materializing VLMAX.

Indeed, my question was motivated while looking at some code that operates on whole registers but it can definitely be generalised to any vector length.

Kind regards,

-- 
Roger Ferrer Ibáñez - roger.ferrer@...
Barcelona Supercomputing Center - Centro Nacional de Supercomputación




Re: Check mask all ones / all zeros

Roger Ferrer Ibanez
 

Hi Andrew,

thanks for the prompt and insightful answer. I'll use vpopc.m then.

On 20/5/21 8:25, Andrew Waterman wrote:
PS. You probably already have the current vector length in a GPR, and that quantity is probably the more appropriate thing to compare against than VLMAX.  So you probably don't need to go to the trouble of materializing VLMAX.

Indeed, my question was motivated while looking at some code that operates on whole registers but it can definitely be generalised to any vector length.

Kind regards,

-- 
Roger Ferrer Ibáñez - roger.ferrer@...
Barcelona Supercomputing Center - Centro Nacional de Supercomputación




Re: Check mask all ones / all zeros

Andrew Waterman
 



On Wed, May 19, 2021 at 10:49 PM Roger Ferrer Ibanez <roger.ferrer@...> wrote:

Hi all,

I could not find any instruction that immediately computes this. Apologies if I missed the obvious here.

Two options came to mind:

  • vpopc.m and check whether the result is 0 (all zeros) or VLMAX(SEW, LMUL). I am under the impression that population count is not a fast operation (though I guess it depends on the actual VLEN)
I think this approach is sufficient, actually.

On the machines I've worked on so far, vpopc.m is no slower than vfirst.m.

For machines with very wide spatial vectors, you could imagine vpopc.m being slightly higher latency than vfirst.m (say, one extra clock cycle) because of the depth of the reduction tree.  But this shouldn't be a dominant effect: in a machine like that, surely the data movement latency will be a more prominent factor than the reduction latency, since the latter scales logarithmically.

PS. You probably already have the current vector length in a GPR, and that quantity is probably the more appropriate thing to compare against than VLMAX.  So you probably don't need to go to the trouble of materializing VLMAX.
  • vfirst.m, returns -1 if the mask is all zeros. For all ones we can do vmnot.m first and then vfirst.m. Might not be much faster than vpopc.m but (at expense of vmnot.m) does not need to compute VLMAX(SEW,LMUL).
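
The remark above that reduction latency scales logarithmically can be made concrete with a small calculation. The sketch assumes a balanced binary reduction tree, which is a simplification; `reduction_tree_depth` is a hypothetical helper, not something taken from any implementation.

```c
#include <stddef.h>

/* Number of levels in a balanced binary reduction tree over n inputs,
   i.e. ceil(log2(n)). For n <= 1 no reduction level is needed. */
unsigned reduction_tree_depth(size_t n)
{
    unsigned depth = 0;

    if (n <= 1)
        return 0;
    /* ceil(log2(n)) equals the bit length of n - 1. */
    for (size_t m = n - 1; m > 0; m >>= 1)
        depth++;
    return depth;
}
```

Under this model, doubling the number of mask bits reduced in parallel adds only one tree level, for example 8 levels for 256 bits and 9 levels for 512 bits.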

Perhaps there are other alternatives?

Thoughts on whether it'd make sense to have a specific instruction for these checks? As in one instruction that returns one of three possible results (e.g. 1 for all ones, -1 for all zeros, 0 otherwise) in a GPR.

Thank you very much,

-- 
Roger Ferrer Ibáñez - roger.ferrer@...
Barcelona Supercomputing Center - Centro Nacional de Supercomputación




Check mask all ones / all zeros

Roger Ferrer Ibanez
 

Hi all,

I could not find any instruction that immediately computes this. Apologies if I missed the obvious here.

Two options came to mind:

  • vpopc.m and check whether the result is 0 (all zeros) or VLMAX(SEW, LMUL). I am under the impression that population count is not a fast operation (though I guess it depends on the actual VLEN)
  • vfirst.m, returns -1 if the mask is all zeros. For all ones we can do vmnot.m first and then vfirst.m. Might not be much faster than vpopc.m but (at expense of vmnot.m) does not need to compute VLMAX(SEW,LMUL).
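
The two options can be compared with a scalar C model. This is a sketch of the semantics only, not RVV code: the `model_*` and `all_*` names are hypothetical, the mask is stored one bit per element in a byte array, and the comparison uses the active length vl rather than VLMAX.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit i of the byte array is mask element i. */
bool mask_bit(const uint8_t *mask, size_t i)
{
    return (mask[i / 8] >> (i % 8)) & 1u;
}

/* Models vpopc.m: number of set elements among the first vl. */
size_t model_vpopc(const uint8_t *mask, size_t vl)
{
    size_t count = 0;
    for (size_t i = 0; i < vl; i++)
        count += mask_bit(mask, i);
    return count;
}

/* Models vfirst.m: index of the first set element, or -1 if none.
   invert = true models applying vmnot.m to the mask beforehand. */
long model_vfirst(const uint8_t *mask, size_t vl, bool invert)
{
    for (size_t i = 0; i < vl; i++)
        if (mask_bit(mask, i) != invert)
            return (long)i;
    return -1;
}

/* Option 1: population count compared against 0 or vl. */
bool all_zeros_popc(const uint8_t *mask, size_t vl)
{
    return model_vpopc(mask, vl) == 0;
}

bool all_ones_popc(const uint8_t *mask, size_t vl)
{
    return model_vpopc(mask, vl) == vl;
}

/* Option 2: find-first on the mask, or on its complement. */
bool all_zeros_first(const uint8_t *mask, size_t vl)
{
    return model_vfirst(mask, vl, false) == -1;
}

bool all_ones_first(const uint8_t *mask, size_t vl)
{
    return model_vfirst(mask, vl, true) == -1;
}
```

Both options give the same answers in this model; they differ in which extra step each needs (a length to compare against for option 1, a mask inversion for the all-ones case in option 2).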

Perhaps there are other alternatives?

Thoughts on whether it'd make sense to have a specific instruction for these checks? As in one instruction that returns one of three possible results (e.g. 1 for all ones, -1 for all zeros, 0 otherwise) in a GPR.

Thank you very much,

-- 
Roger Ferrer Ibáñez - roger.ferrer@...
Barcelona Supercomputing Center - Centro Nacional de Supercomputación




Re: LLVM with RVV intrinsic support

David Horner
 

Excellent.
Congratulations, and thank you!!

On Fri, May 14, 2021, 05:21 Kai Wang, <kai.wang@...> wrote:
Hi,

We would like to announce that the RISC-V V-extension v0.10 has been implemented in LLVM and the work has been committed upstream.


Barcelona Supercomputing Center (BSC), Codeplay Software, and SiFive have worked together to implement the RVV C API intrinsics for the V-extension and have implemented the foundation of CodeGen for Vector Length Specific (VLS) and Vector Length Agnostic (VLA) autovectorization for RISC-V. 


What we have committed to LLVM upstream:

* Support for the v0.10 V-extension specification

* Support for the RVV C intrinsics in https://github.com/riscv/rvv-intrinsic-doc/tree/v0.10

* Implement the draft vector calling convention in https://github.com/riscv/riscv-elf-psabi-doc/pull/171


Known issues:

* C intrinsics for Zvlsseg implementation is under discussion:

 - https://lists.llvm.org/pipermail/llvm-dev/2021-March/149518.html

* What type we should use for fp16 is under discussion:

 - https://github.com/riscv/rvv-intrinsic-doc/issues/18#issuecomment-818472454


RISC-V RVV example:

https://github.com/riscv/rvv-intrinsic-doc/blob/master/rvv_saxpy.c


Build command:

clang --target=riscv64-unknown-elf -march=rv64gcv0p10 -menable-experimental-extensions rvv_saxpy.c -o rvv_saxpy.elf



LLVM with RVV intrinsic support

Kai Wang
 

Hi,

We would like to announce that the RISC-V V-extension v0.10 has been implemented in LLVM and the work has been committed upstream.


Barcelona Supercomputing Center (BSC), Codeplay Software, and SiFive have worked together to implement the RVV C API intrinsics for the V-extension and have implemented the foundation of CodeGen for Vector Length Specific (VLS) and Vector Length Agnostic (VLA) autovectorization for RISC-V. 


What we have committed to LLVM upstream:

* Support for the v0.10 V-extension specification

* Support for the RVV C intrinsics in https://github.com/riscv/rvv-intrinsic-doc/tree/v0.10

* Implement the draft vector calling convention in https://github.com/riscv/riscv-elf-psabi-doc/pull/171


Known issues:

* C intrinsics for Zvlsseg implementation is under discussion:

 - https://lists.llvm.org/pipermail/llvm-dev/2021-March/149518.html

* What type we should use for fp16 is under discussion:

 - https://github.com/riscv/rvv-intrinsic-doc/issues/18#issuecomment-818472454


RISC-V RVV example:

https://github.com/riscv/rvv-intrinsic-doc/blob/master/rvv_saxpy.c


Build command:

clang --target=riscv64-unknown-elf -march=rv64gcv0p10 -menable-experimental-extensions rvv_saxpy.c -o rvv_saxpy.elf



Re: vector intrinsics for both RV32/RV64

Jim Wilson
 

On Wed, May 12, 2021 at 10:52 AM Guy Lemieux <guy.lemieux@...> wrote:
I’m starting a project where we want to use vector intrinsics and generate both 64b and 32b code (for RV64 and RV32).
It looks like the best way to do this right now is with GCC, where we were able to find up-to-date intrinsics for the v0.10 spec:

There is a gcc RVV port from SiFive, but it has been dormant for months, and is not being actively maintained at the moment.  You are better off using LLVM instead which is actively being worked on by multiple parties including SiFive.
Jim


Re: vector intrinsics for both RV32/RV64

Craig Topper
 

Hi Guy,

The latest LLVM git repository should have support for all intrinsics except segment load/store. The intrinsics missed the branch window for the LLVM 12 release, but should be in LLVM 13 when it is released in the second half of the year.

The riscv_vector.h header is autogenerated from other files when clang is built so you won’t find the header in the repository.

~Craig

On May 12, 2021, at 10:52 AM, Guy Lemieux <guy.lemieux@...> wrote:

Hi,

I’m starting a project where we want to use vector intrinsics and generate both 64b and 32b code (for RV64 and RV32).

It looks like the best way to do this right now is with GCC, where we were able to find up-to-date intrinsics for the v0.10 spec:


Is there a similar ability with LLVM? Vector support seems to be added, but no up to date intrinsics yet. This is the closest I could find, but it appears to be a bit out of date (vector spec 0.8) and only for RV32:


Sorry if this is an obvious question — I haven’t dug very deeply into this yet, but I thought this group would be able to give me better answers and save me a bit of time.

Thanks for any pointers.

Guy




vector intrinsics for both RV32/RV64

Guy Lemieux
 

Hi,

I’m starting a project where we want to use vector intrinsics and generate both 64b and 32b code (for RV64 and RV32).

It looks like the best way to do this right now is with GCC, where we were able to find up-to-date intrinsics for the v0.10 spec:


Is there a similar ability with LLVM? Vector support seems to be added, but no up to date intrinsics yet. This is the closest I could find, but it appears to be a bit out of date (vector spec 0.8) and only for RV32:


Sorry if this is an obvious question — I haven’t dug very deeply into this yet, but I thought this group would be able to give me better answers and save me a bit of time.

Thanks for any pointers.

Guy



Re: FYI: ARM vs. RISC-V vector extension comparison

Bruce Hoult
 

Yeah. I posted this link to our reddit /r/riscv four weeks ago and made a few comments about it.


It was posted on Hacker News yesterday and both Chris BOOM! Celio and I made a few comments and replies:


I have since at the author's request pointed him at the current spec. 


Rather superficial - all about how hard it is for a person to program in assembly language, rather than how a compiler can take advantage of the encoding.

In one HN comment I pointed out that while having scaled indexed addressing mode for your vectors can be nice, those extra bits in a fixed size opcode do come at a cost.

> On a quick reread, I see a complaint that's entirely due to how
> ARM represents indexed load operations, which has absolutely
> nothing to do with the vector ISA whatsoever.

Not exactly true.

If you can use fancy addressing modes in your vector loads and stores and you have a fixed length 32 bit opcode (as both Aarch64 and RISC-V do[1]) then specifying an index register and how much to shift it by is taking up an extra 7 bits of your opcode (5 for register number, 2 for shift amount) vs an instruction that just specifies a base pointer register.

That means one instruction is taking up the opcode space that could otherwise be used by 128 different instructions instead.

That means either your vector ISA has fewer instructions and capabilities than it otherwise could have, or else it is taking up a lot more of the overall opcode space.
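
The arithmetic behind the "128 different instructions" figure can be written out as a tiny C sketch. The field widths are the ones stated above (5 bits for the index register number, 2 bits for the shift amount); the names are made up for the illustration.

```c
#include <stdint.h>

/* Field widths quoted in the comment above. */
enum {
    INDEX_REG_BITS = 5,   /* selects one of 32 registers */
    SHIFT_AMOUNT_BITS = 2 /* selects one of 4 shift amounts */
};

/* Number of distinct encodings covered by a field of the given width
   (valid for widths below 32). */
uint32_t encodings_for_bits(unsigned bits)
{
    return UINT32_C(1) << bits;
}
```

With these widths, `encodings_for_bits(INDEX_REG_BITS + SHIFT_AMOUNT_BITS)` evaluates to 128, which is the number of encodings a single scaled-indexed instruction occupies relative to a base-pointer-only form.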




On Sat, May 8, 2021 at 9:07 AM Allen Baum <allen.baum@...> wrote:

Rather superficial - all about how hard it is for a person to program in assembly language, rather than how a compiler can take advantage of the encoding.


Re: FYI: ARM vs. RISC-V vector extension comparison

Nick Knight
 

I guess the publicity doesn't hurt, but I do wish the author had considered our developments here (at riscv/riscv-v-spec). His material appears to derive from Patterson-Waterman's 2017 book (and sigarch blog-post), and the architecture has evolved a bit since.


On Fri, May 7, 2021 at 5:06 PM Allen Baum <allen.baum@...> wrote:

Rather superficial - all about how hard it is for a person to program in assembly language, rather than how a compiler can take advantage of the encoding.


FYI: ARM vs. RISC-V vector extension comparison

Allen Baum
 


Rather superficial - all about how hard it is for a person to program in assembly language, rather than how a compiler can take advantage of the encoding.


Re: GCC RISC-V Vector Intrinsic Instructions and #defines missing #defines

Kito Cheng
 

Hi Tony:

Could you create issues on github to track that?
https://github.com/riscv/riscv-gcc

Thanks :)

On Sat, Apr 10, 2021 at 9:14 AM Jim Wilson <jimw@...> wrote:

On Fri, Apr 9, 2021 at 3:40 PM Tony Cole via lists.riscv.org <tony.cole=huawei.com@...> wrote:

I’m still new to RISC-V and the Vector extensions, so forgive me if I’ve missed something, or if the following have been fixed or noted before.

Also, am I sending this to the correct group for GCC RISC-V Vector Intrinsics? If not, who and how should I inform?

I would suggest filing an issue in the riscv/riscv-gnu-toolchain github tree. Put something like vector or rvv in the issue title to make it clear it is a vector related issue. The gcc support is not being actively worked on at the moment. LLVM is the current focus for all vector compiler support. Eventually someone may start working on the gcc vector support again. Meanwhile, bugs filed against the gcc vector support may or may not be fixed.

Jim


Re: GCC RISC-V Vector Intrinsic Instructions and #defines missing #defines

Jim Wilson
 

On Fri, Apr 9, 2021 at 3:40 PM Tony Cole via lists.riscv.org <tony.cole=huawei.com@...> wrote:

I’m still new to RISC-V and the Vector extensions, so forgive me if I’ve missed something, or if the following have been fixed or noted before.

Also, am I sending this to the correct group for GCC RISC-V Vector Intrinsics? If not, who and how should I inform?


I would suggest filing an issue in the riscv/riscv-gnu-toolchain github tree.  Put something like vector or rvv in the issue title to make it clear it is a vector related issue.  The gcc support is not being actively worked on at the moment.  LLVM is the current focus for all vector compiler support.  Eventually someone may start working on the gcc vector support again.  Meanwhile, bugs filed against the gcc vector support may or may not be fixed.

Jim


GCC RISC-V Vector Intrinsic Instructions and #defines missing #defines

Tony Cole
 

Hi all,

 

I’m still new to RISC-V and the Vector extensions, so forgive me if I’ve missed something, or if the following have been fixed or noted before.

 

Also, am I sending this to the correct group for GCC RISC-V Vector Intrinsics? If not, who and how should I inform?

 

 

 

I’m currently using: riscv32-unknown-elf-gcc (GCC) 10.1.0    (…/10.1.0–rvv-intrinsic-patch/bin/riscv32-unknown-elf-gcc --version)

 

 

These (and probably others) don’t exist in the GCC compiler RISCV Vector intrinsics (the m8 versions):

 

        vint32m1_t vwredsum_vs_i16m8_i32m1 (vint32m1_t dst, vint16m8_t vector, vint32m1_t scalar, size_t vl);

        vint64m1_t vwredsum_vs_i32m8_i64m1 (vint64m1_t dst, vint32m8_t vector, vint64m1_t scalar, size_t vl);

 

They are listed in here: https://github.com/riscv/rvv-intrinsic-doc/blob/master/intrinsic_funcs/09_vector_reduction_functions.md

 

 

So, I’ve had to temporarily change to (the m4 versions):

 

        vint32m1_t vwredsum_vs_i16m4_i32m1 (vint32m1_t dst, vint16m4_t vector, vint32m1_t scalar, size_t vl);
        vint64m1_t vwredsum_vs_i32m4_i64m1 (vint64m1_t dst, vint32m4_t vector, vint64m1_t scalar, size_t vl);

 

to get it to compile and work.

 

This may have already been fixed? Please let me know.

 

 

 

Also,

 

I was expecting to find some #defines for the rounding modes in riscv-vector.h, something like:

 

/* Vector Fixed-Point Rounding Mode Register vxrm settings
   Use with vwrite_csr(RVV_VXRM, RVV_VXRM_XXX) */

#define RVV_VXRM_RNU  (0) /* Round-to-nearest-up (add 0.5 LSB) */
#define RVV_VXRM_RNE  (1) /* Round-to-nearest-even */
#define RVV_VXRM_RDN  (2) /* Round-down (truncate) */
#define RVV_VXRM_ROD  (3) /* Round-to-odd (OR bits into LSB, aka "jam") */
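
To show what the four settings mean in practice, here is a scalar C sketch of a rounding right shift that follows the vxrm rounding-increment rules in the vector spec. The macro names repeat the ones suggested above and are not taken from any existing header; `roundoff_unsigned` is a model written for this example and assumes a shift amount below 64.

```c
#include <stdint.h>

#define RVV_VXRM_RNU  (0) /* Round-to-nearest-up */
#define RVV_VXRM_RNE  (1) /* Round-to-nearest-even */
#define RVV_VXRM_RDN  (2) /* Round-down (truncate) */
#define RVV_VXRM_ROD  (3) /* Round-to-odd ("jam") */

/* Shift v right by d bits and add the rounding increment r selected by
   vxrm. Assumes d < 64. */
uint64_t roundoff_unsigned(uint64_t v, unsigned d, unsigned vxrm)
{
    if (d == 0)
        return v;                      /* nothing is shifted out */

    uint64_t shifted = v >> d;
    /* Most significant bit that is shifted out: v[d-1]. */
    unsigned msb_lost = (unsigned)((v >> (d - 1)) & 1u);
    /* Whether any lower shifted-out bit is set: v[d-2:0] != 0. */
    unsigned rest_lost = (v & ((UINT64_C(1) << (d - 1)) - 1)) != 0;
    /* Least significant bit that is kept: v[d]. */
    unsigned lsb_kept = (unsigned)(shifted & 1u);
    unsigned r;

    switch (vxrm) {
    case RVV_VXRM_RNU:
        r = msb_lost;
        break;
    case RVV_VXRM_RNE:
        r = msb_lost & (rest_lost | lsb_kept);
        break;
    case RVV_VXRM_RDN:
        r = 0;
        break;
    default: /* RVV_VXRM_ROD */
        r = !lsb_kept & (msb_lost | rest_lost);
        break;
    }
    return shifted + r;
}
```

For example, shifting 10 right by 2 (that is, 2.5) gives 3 under RNU, 2 under RNE, 2 under RDN, and 3 under ROD in this model.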

 

Tony Cole

CPU Consultant I RISC-V Cores, Bristol

E-mail: Tony.Cole@...

Company: Huawei technologies R&D (UK) Ltd I Address: 290 Park Avenue, Aztec West, Almondsbury, Bristol, Avon, BS32 4SY, UK      

 

http://www.huawei.com

This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure,reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it !

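This e-mail and its attachments contain confidential information from Huawei and are intended only for the individuals or groups listed in the address above. Any use of the information in this e-mail in any form (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by anyone else is prohibited. If you have received this e-mail in error, please notify the sender by phone or e-mail immediately and delete it.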

 

 


Possible RISC-V Vector Instructions missing

Tony Cole
 

Hi Vector Team,

 

I’m new to RISC-V and the Vector extensions, so forgive me if I’ve missed something.

 

 

I have searched the specs, emails and GitHub issues, but not found anything on this:

 

 

While writing some vector code using the vector intrinsics, I noticed some instructions missing that I expected to see:

 

I noticed there is no saturated reverse subtract version of vssub_vx, e.g. vsrsub_vx (or should it be vrssub_vx ?) and so no vsneg_v pseudo instructions

 

But there are the following integer/float reverse subtract instructions:

vrsub_vx

vfrsub_vf

 

and their pseudo instruction counterparts:

vneg_v

vfneg_v

 

For orthogonality there should be saturated versions of the above, but maybe there is not enough encoding space?

Or possibly remove vrsub_vx & vfrsub_vf to gain encoding space ??

 

Note: I wanted to use vsrsub_vx (to do vsneg_v), but instead achieved it by loading a vector with zero and performing vssub_vv.
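
The workaround in the note can be sketched per element in scalar C. The model below uses 8-bit elements purely for illustration; `model_ssub8`, `model_sneg8`, and `model_srsub8` are hypothetical names, and vsrsub_vx / vsneg_v are the instructions this message says do not exist.

```c
#include <stdint.h>

/* One SEW=8 element of a signed saturating subtract (as in vssub):
   a - b, clamped to the int8_t range. */
int8_t model_ssub8(int8_t a, int8_t b)
{
    int diff = (int)a - (int)b;
    if (diff > INT8_MAX)
        return INT8_MAX;
    if (diff < INT8_MIN)
        return INT8_MIN;
    return (int8_t)diff;
}

/* The workaround described above: saturating negate as 0 - x. */
int8_t model_sneg8(int8_t x)
{
    return model_ssub8(0, x);
}

/* What one element of a hypothetical saturating reverse subtract would
   compute: scalar - element. */
int8_t model_srsub8(int8_t element, int8_t scalar)
{
    return model_ssub8(scalar, element);
}
```

The saturating form matters for the most negative value: in this model `model_sneg8(-128)` yields 127, whereas a wrapping negate would return -128 again.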

 

Tony Cole

CPU Consultant I RISC-V Cores, Bristol

E-mail: Tony.Cole@...

Company: Huawei technologies R&D (UK) Ltd I Address: 290 Park Avenue, Aztec West, Almondsbury, Bristol, Avon, BS32 4SY, UK      

 


 


No vector task group meeting tomorrow

Krste Asanovic
 

I haven’t seen any burning issues come by, and am still trying to clean up spec.

So unless someone has agenda items, I’m canceling meeting tomorrow,
Krste


No vector TG meeting this week

Krste Asanovic
 

I’m still working on spec cleanup and I don’t have any major outstanding issues to discuss, so will cancel the TG meeting this week.

Please bring up any burning issues on this mailing list,
Krste
