[RFC] Drafting a formal v1.0 release for RVV C Intrinsic API
eop.chen@...
Hi all,
We (SiFive) are going to draft a formal v1.0 release of the RVV C
intrinsic API. Next week we will provide a roadmap, including time
reserved for comments on what is still open and needs to be cleared
before the release. All existing issues will be settled: the ones that have
converged will be closed, and open ones will be tagged as "resolve for v1.0"
or "resolve after v1.0" so that we can bring them up for discussion in future
meeting(s).
Here are some initial thoughts on items before the release:
- Release a generator script that produces the current C intrinsic API. Other
  languages that seek to implement the intrinsics will be able to leverage it.
- Release a better-formatted PDF version of the RVV C intrinsic API document.
  We hope to expand the RVV user base by providing a better-organized document.
- Schedule timelines for requesting comments on the current items. Maybe
  a monthly meeting? We hope to gather more input and reach consensus.
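Here is a rough illustration of what such a generator enumerates. The sketch lists the RVV integer vector type names, such as vint8mf8_t through vint64m8_t and the vuint variants. It assumes ELEN = 64 and applies the RVV rule that a fractional LMUL must satisfy LMUL >= SEW/ELEN. The function gen_int_type_names and its interface are hypothetical. This is not SiFive's generator script.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of one step of an intrinsic-API generator:
 * enumerate valid integer vector type names, assuming ELEN = 64. */
#define ELEN 64

/* LMUL as the fraction num/den, plus its suffix in type names. */
struct lmul { int num, den; const char *suffix; };

static const struct lmul kLmuls[] = {
    {1, 8, "mf8"}, {1, 4, "mf4"}, {1, 2, "mf2"},
    {1, 1, "m1"},  {2, 1, "m2"},  {4, 1, "m4"}, {8, 1, "m8"},
};
static const int kSews[] = {8, 16, 32, 64};

/* Writes up to `cap` names into `out` and returns the number of valid
 * (SEW, LMUL) combinations found. */
static size_t gen_int_type_names(int is_unsigned, char out[][16], size_t cap) {
    assert(cap == 0 || out != NULL);
    size_t n = 0;
    for (size_t s = 0; s < sizeof kSews / sizeof kSews[0]; ++s) {
        for (size_t l = 0; l < sizeof kLmuls / sizeof kLmuls[0]; ++l) {
            const struct lmul *m = &kLmuls[l];
            /* LMUL >= SEW/ELEN  <=>  num * ELEN >= SEW * den */
            if (m->num * ELEN < kSews[s] * m->den)
                continue;
            if (n < cap)
                snprintf(out[n], 16, "v%sint%d%s_t",
                         is_unsigned ? "u" : "", kSews[s], m->suffix);
            ++n;
        }
    }
    return n;
}
```

Calling gen_int_type_names(0, names, 32) should produce 22 signed types: 7 for SEW=8, 6 for SEW=16, 5 for SEW=32 and 4 for SEW=64. A real generator would then cross these types with operations and suffixes to emit the intrinsic prototypes.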
Our take on the release is to consider the completeness of the current intrinsic
APIs, make minimal fixes, and leave the current implementation "as-is" for
v1.0.
Looking forward to your input; we hope to close this by the end of this year.
This post is also cc'ed to the vector TG, toolchain & runtime TG, graphics TG and HPC TG. Link to the RFC issue under rvv-intrinsic-doc

Regards,
eop Chen
|
|
Re: Notice of Group Archival
Jeff Scheel <jeff@...>
Krste has requested that this group not be archived due to pending work on Zvfh and Zvfhmin extensions. Based on this, we will wait until this work completes to proceed with archival.
Thanks!
-Jeff
--
Jeff Scheel (he/him/his)
Linux Foundation, RISC-V Technical Program Manager
|
|
Notice of Group Archival
Jeff Scheel <jeff@...>
Community members,

The Vector Extension Task Group community has completed its work and is slated to be deactivated and archived on August 15, 2022. If you believe this decision has been made in error and the action should be postponed, please send an email to help@... with an explanation. Future discussions previously addressed by the group should be addressed to:

Thanks,
-Jeff
--
Jeff Scheel (he/him/his)
Linux Foundation, RISC-V Technical Program Manager
|
|
Re: Seeking inputs for evaluating vector ABI design
Kito Cheng
Hi Peter:
> The relevance to the present discussion is that RTCG may require detailed ...

Discovery ABIs/APIs are being discussed elsewhere and might not be part of the psABI; they would be more Linux-specific. SiFive folks and Rivos folks have been discussing the design of a configuration discovery API. It might include a set of new system calls that give user-space programs a mechanism to get more detailed information. That will be made public soon (at least before Linux Plumbers 2022, I guess).

On Wed, Jul 27, 2022 at 12:49 AM L Peter Deutsch <ghost@...> wrote:
|
|
Re: Seeking inputs for evaluating vector ABI design
Kito Cheng
Hi Zalman:
Defining a standardized vector ABI means we can have a common interface and agreement among different compilers and libraries. It doesn't mean we must use it everywhere. We already have several ways of doing dynamic function version selection (e.g. ifunc), and it won't change the existing software ecosystem (i.e. there is NO need to recompile everything with the vector ABI); it's an extension of the standardized ABI. I can imagine that library functions optimized with RVV might not give a performance gain on every hardware platform, but that would be a software optimization issue rather than an ABI issue, I think :)
On Wed, Jul 27, 2022 at 8:13 AM Bruce Hoult <bruce@...> wrote:
|
|
Re: RISCV Vector Compliance Test Suite
Allen Baum
If you're in a hurry, Imperas has also developed a set of vector tests, and they're likely very comprehensive. I don't know which configurations are supported, though.
On Mon, Jul 25, 2022 at 5:54 AM Kito Cheng <kito.cheng@...> wrote:
> FYI: https://github.com/riscv-software-src/riscv-tests/pull/400
|
|
Re: Seeking inputs for evaluating vector ABI design
On Wed, Jul 27, 2022 at 4:21 AM Zalman Stern via lists.riscv.org <zalman=google.com@...> wrote:
First of all, discussion of libc functions such as strcmp is irrelevant to this thread, as they do not have vector register arguments. They pass pointers to arguments in memory and use (and always will use) the standard ABI, not an augmented Vector ABI as Kito is proposing.
Suppose you have a machine with the properties you describe. Suppose also that running some heavy HPC task and some trivial task that uses the vector unit for strcpy() on the same core causes a severe overall performance penalty. Then you might indeed be advised not to do that. Run those lightweight spoiler tasks on different cores, or install a libc that doesn't use the vector unit.

For everyone else with desktop PCs, phones, cloud servers, etc., the vector unit should be used as much as possible! ARM seem to be intending to vectorise every loop in every program. I don't know if or when they will achieve that, or whether RISC-V compilers will do the same. In the meantime, getting memcpy(), memset(), strlen(), strcpy(), strcmp() and all their friends to use the vector unit is low-hanging fruit. It can instantly make a measurable improvement to every program on the machine.

I ran some benchmarks of memcpy() and strcpy() on an Allwinner D1 machine (which has only 128-bit vector registers) 15 months ago (April 2021). In-cache performance was often doubled, and the "which version do I choose?" overhead for small sizes was also greatly reduced.

That machine has some quirks. Of course, it implements RVV draft 0.7.1, but functions such as these are binary-compatible between the two versions. It has only 128-bit vector registers, whereas it looks as if SiFive, for example, are intending a 256-bit minimum. Most vector instructions on the D1 (C906 core) take 3*LMUL cycles, even when the actual vector occupies fewer than LMUL registers.
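The small-size benefit described above comes from strip-mining. A single loop asks the hardware for up to VLMAX elements per iteration, so no separate scalar tail loop or size-based dispatch is needed. The portable C sketch below models that loop.

- model_vsetvl picks min(AVL, VLMAX), which is one legal vsetvli result.
- The memcpy of each chunk stands in for a vle8.v/vse8.v pair.
- All function names and the VLMAX value are assumptions for illustration. This is not an actual libc implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of vsetvli: grant min(avl, vlmax) elements this iteration.
 * (The spec also allows other results when vlmax < avl < 2*vlmax;
 * min() is simply one legal choice.) */
static size_t model_vsetvl(size_t avl, size_t vlmax) {
    return avl < vlmax ? avl : vlmax;
}

/* Strip-mined byte copy (SEW = 8): one loop handles every size, with no
 * separate scalar tail. Returns the number of loop iterations. */
static size_t stripmined_memcpy(void *dst, const void *src, size_t n,
                                size_t vlmax) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t iters = 0;
    assert(vlmax > 0);
    while (n > 0) {
        size_t vl = model_vsetvl(n, vlmax);
        memcpy(d, s, vl);  /* stands in for vle8.v + vse8.v */
        d += vl;
        s += vl;
        n -= vl;
        ++iters;
    }
    return iters;
}
```

Take VLEN = 128 bits (as on the D1), SEW = 8 and LMUL = 8. Then VLMAX is 128 bytes, so a 300-byte copy takes three iterations (128 + 128 + 44) and a 5-byte copy takes one.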
|
|
Re: Seeking inputs for evaluating vector ABI design
ghost
I would like to emphasize Zalman Stern's point about trading off hardware economy against dynamic software optimization. I make it in the context of a larger comment about optimizing compiled code for RISC-V.

RVV is specified to work well across a variety of hardware implementations without requiring different code. But IMO one of the great truths of system design is that "compilation beats interpretation". In this context, execution-time parameterization as defined for RVV is a form of interpretation. Like many kinds of interpretation, it trades space and time overhead for convenience. For run-time code generation (RTCG) to be most effective, the representation *from* which run-time code is generated must be sufficiently high-level: the higher the level, the greater the opportunities to tailor the code to the hardware.

Not having experience with vector-amenable computation, I can't say anything more specific. I would only note the historical tug of war between two camps. On the one hand are compilers that recognize vectorizable constructs in low-level languages like C; on the other are very high-level languages like APL or Halide.

The relevance to the present discussion is that RTCG may require a detailed configuration discovery ABI/API that goes beyond the ABI for functional code. I hope the work of the relevant group(s) will take this into consideration.

--
L Peter Deutsch :: Aladdin Enterprises :: Healdsburg, CA & Burnaby, BC
|
|
Re: Seeking inputs for evaluating vector ABI design
Zalman Stern
A plea to not design the future around vague and ill-considered use cases...

The C string library is generally used for legacy/convenience on small strings. People with real performance on the table use something else. Yes, it still matters. But if we're looking at really using a vector unit for text handling, an interface that is not based on pointers and zero termination is almost certainly required. The opportunity is more to design that API than to shoehorn vectors under libc.

Having a heterogeneous set of cores on an SoC is a given at this point. The small cores likely will not have a vector unit at all. But if one is going to push the vector extension into general-purpose workloads, there will be pressure to have a small implementation on smaller cores and a high-performance one on big cores. Fortunately the instruction set allows scaling the hardware implementation cleanly, but making all that seamless in software is tricky. A design that allows per-thread constraints on which available vector units are acceptable is perhaps the thing to aim for. Ideally this would be somewhat dynamic, and setting and unsetting the constraints would be cheap.

Note that many mainstream operating systems effectively ban this sort of hardware design and relegate big vector units to an accelerator role. The programming model for the accelerator is completely different from that of the general-purpose CPU. This has to change.

With variable-length vectors, there will also inherently be costs in supporting the dynamic size. The stack frame layout will need to support variable-length slots or plan for a large maximum size, etc. Allowing one to constrain the compilation to a specific size is potentially a big win in two cases: when the hardware is known (e.g. firmware), or when doing just-in-time compilation. Specialization, having one or more versions of a routine optimized for known hardware, is also very likely to be a win over support for fully dynamic-size vectors in many cases.
Allowing the calling convention to support fixed-size layout when it is known is important. Providing a means to efficiently dispatch to specialized routines is a good idea as well (e.g. a restricted dynamic linking mechanism that has zero runtime overhead).

-Z-
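A minimal sketch of dispatching to specialized routines follows. A resolver, similar in spirit to a GNU ifunc resolver, picks a routine once from a known VLEN, and callers then keep the returned pointer. The routine names, the 256-bit specialization and the idea of passing VLEN in as a parameter are all assumptions; how VLEN is actually discovered is left to a discovery API. This still costs an indirect call per invocation, so it is not the zero-overhead linking mechanism suggested above.

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by every specialization of a hypothetical routine. */
typedef long (*sum_fn)(const int *x, size_t n);

/* Generic version: correct on any hardware. */
static long sum_generic(const int *x, size_t n) {
    assert(x != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += x[i];
    return total;
}

/* Stand-in for a version specialized for a known 256-bit VLEN: it works
 * in fixed blocks of 8 int lanes (8 x 32 bits with a 32-bit int), the
 * kind of fixed layout a known vector size makes possible. */
static long sum_vlen256(const int *x, size_t n) {
    assert(x != NULL || n == 0);
    long acc[8] = {0};
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        for (size_t j = 0; j < 8; ++j)
            acc[j] += x[i + j];
    long total = 0;
    for (size_t j = 0; j < 8; ++j)
        total += acc[j];
    for (; i < n; ++i)  /* leftover elements */
        total += x[i];
    return total;
}

/* Resolver: intended to run once (e.g. at load time), with the caller
 * keeping the returned pointer so later calls skip the selection. */
static sum_fn resolve_sum(unsigned vlen_bits) {
    return vlen_bits == 256 ? sum_vlen256 : sum_generic;
}
```

Both versions return the same result, so a caller cannot tell which one was selected; only the performance differs.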
On Tue, Jul 26, 2022 at 7:11 AM Kito Cheng <kito.cheng@...> wrote:
> Hi Jan:
|
|
Re: Seeking inputs for evaluating vector ABI design
Kito Cheng
Hi Jan:
> > NOTE: We don't have a complete compiler auto vectorizer ...
> Might this implementation of math functions be helpful? It already supports RVV via intrinsics.

Thanks for your amazing work! I think it is very useful; it saves us the time of re-implementing those functions with RVV :)
|
|
Re: Seeking inputs for evaluating vector ABI design
Jan Wassenberg
Hi Kito,

> NOTE: We don't have a complete compiler auto vectorizer ...

Might this implementation of math functions be helpful? It already supports RVV via intrinsics.
|
|
Seeking inputs for evaluating vector ABI design
Kito Cheng
Hi:
I am Kito from the RISC-V psABI group. We've defined a basic vector ABI, which allows functions to use vector registers internally. That can be used to optimize several libraries such as libc; for example, we can use vector instructions to accelerate memory and string manipulation functions like strcmp or memcpy.

However, we are still missing a complete vector ABI, which includes a vector calling convention and a vector library interface for the RISC-V vector extension. That's a high-priority job for the psABI group this year. One of the major goals of this mail is to find potential benchmarks for evaluating the design of the vector ABI. We also want to make sure no item is missing from the plan, so any feedback is appreciated!

# Vector Calling Convention (Highest priority)

The vector calling convention will include the following items:
- Define a vector calling convention variant that allows programs to pass values of scalable vector types (e.g. vint32m1_t) in vector registers.
- Define a vector calling convention variant that allows programs to pass values of fixed-length vector types (e.g. int32x4_t) in vector registers.
- Vector function signature/mangling

# Vector Libraries Interface

- Interface for math functions, e.g. a vector version of the sin function: define the function name, the function signature, and the behavior for tail and masked-off elements.

# Benchmarks

We would like to collect any benchmarks that contain function calls inside kernel functions. We need them to evaluate the design of the calling convention, for example:
- how many registers are used to pass parameters and return values;
- the allocation of callee-saved and caller-saved registers.

Currently we are considering the following benchmarks to evaluate the design of the calling convention:
- TSVC
- PolyBenchC

NOTE: We don't have a complete compiler auto-vectorizer implementation, especially one able to handle those math functions. We'll therefore rewrite the vectorized versions by hand for evaluation.

Thanks!
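Here is a hypothetical scalar C model of a masked vector sin, to make "behavior for tail and masked-off elements" concrete. The elements fall into three groups:

- Body elements [0, vl) with their mask bit set are computed.
- Masked-off body elements follow the mask policy.
- Tail elements [vl, VLMAX) follow the tail policy.

Each policy is either undisturbed or agnostic. Under the agnostic policy the destination may keep its old value or be overwritten with all 1s; this model always writes all 1s so the difference is visible. The names, the policy enum and toy_sin are assumptions, not a proposed interface. toy_sin is a cubic Taylor approximation standing in for a real sin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy for elements the operation does not compute. */
typedef enum { UNDISTURBED, AGNOSTIC } elem_policy;

/* Stand-in for a real sin kernel: x - x^3/6 (illustrative only). */
static double toy_sin(double x) { return x - x * x * x / 6.0; }

/* Agnostic elements may keep their old value or become all 1s;
 * this model always picks all 1s. */
static void fill_agnostic(double *e) { memset(e, 0xFF, sizeof *e); }

/* Scalar model of a masked vector sin over a register of vlmax elements:
 *   body [0, vl)     : computed if mask[i], otherwise masked-off
 *   tail [vl, vlmax) : never computed                                  */
static void model_vsin_masked(double *vd, const double *vs, const bool *mask,
                              size_t vl, size_t vlmax,
                              elem_policy mask_pol, elem_policy tail_pol) {
    assert(vl <= vlmax);
    for (size_t i = 0; i < vl; ++i) {
        if (mask[i])
            vd[i] = toy_sin(vs[i]);
        else if (mask_pol == AGNOSTIC)
            fill_agnostic(&vd[i]);
        /* UNDISTURBED: vd[i] keeps its previous value */
    }
    for (size_t i = vl; i < vlmax; ++i)
        if (tail_pol == AGNOSTIC)
            fill_agnostic(&vd[i]);
}
```

A vector library interface would need to pin down which of these policies each function guarantees. With the undisturbed policy, callers can rely on the old destination contents surviving.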
|
|
Re: RISCV Vector Compliance Test Suite
Kito Cheng
FYI: https://github.com/riscv-software-src/riscv-tests/pull/400
On Mon, Jul 25, 2022 at 8:49 PM Alexander Podoplelov <alexander.podoplelov@...> wrote:
|
|
Re: RISCV Vector Compliance Test Suite
Alexander Podoplelov
Also, could you please let me know about
RISC-V Vector v1.0 compliance tests?
On 25.07.2022 13:52, Umer Shahid wrote:
|
|
Re: RISCV Vector Compliance Test Suite
Great, thanks for letting me know.

Regards,
Umer
On Mon, Jul 25, 2022 at 1:51 PM Krste Asanovic <krste@...> wrote:
--
Umer Shahid
Member Technical Staff, 10xEngineers
Mobile: +92-334-4072836
|
|
Re: RISCV Vector Compliance Test Suite
Xi Wang has been developing vector compliance tests at RIOS Lab.
Krste
|
|
RISCV Vector Compliance Test Suite
Hello all,
I hope you are fine, safe, and healthy.

I want to know whether there is any test suite or platform that can be used to run RISC-V Vector compliance tests. Our team has started working on RVV version 1.0 compliance testing, but we are unable to find a suitable test suite to generate or run our tests. If any team is working on this, or anybody knows someone who has worked in this domain, please loop that person into this thread.

Regards,
Umer
|
|
Re: Vector element groups
On Fri, 15 Jul 2022 09:10:49 -0700, Earl Killian <earl.killian@...> said:
| While I share some concern about the cited language, as this is a concept, and not a spec, I think the time to require checking
| would be when individual specs implement the concept. I would think it would require some pretty good justification to not have
| an exception.

On further thought, I do think it makes sense to require raising an illegal-instruction exception when vl is not a multiple of the element group size, rather than leaving it reserved. I will be updating the doc with the rationale.

Krste
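The rule proposed above amounts to a one-line check, sketched here as a hypothetical model, not spec text. EGS is the number of elements per element group; a group of four 32-bit elements, for instance, would have EGS = 4. Any vl that is not a multiple of EGS yields an illegal-instruction outcome instead of reserved behavior.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EXEC_OK, ILLEGAL_INSTRUCTION } exec_result;

/* Model of the proposed rule: an element-group instruction traps with an
 * illegal-instruction exception unless vl covers whole element groups. */
static exec_result check_element_group_vl(size_t vl, size_t egs) {
    assert(egs > 0);  /* egs: elements per element group */
    return (vl % egs == 0) ? EXEC_OK : ILLEGAL_INSTRUCTION;
}

/* Number of element groups an instruction would process; only meaningful
 * when check_element_group_vl() returns EXEC_OK. */
static size_t element_group_count(size_t vl, size_t egs) {
    assert(check_element_group_vl(vl, egs) == EXEC_OK);
    return vl / egs;
}
```

Making the partial-group case a defined trap, rather than reserved, means software can rely on getting an exception instead of implementation-specific results.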
|
|
Re: Vector element groups
On Fri, 15 Jul 2022 09:10:49 -0700, Earl Killian <earl.killian@...> said:
| On another topic, I have this vague feeling that it would be best if we had VL and SEW always set for vector instructions, and
| not be implicit in the opcode, but I have not fleshed out this thought. Perhaps someone who has thought about it more would
| like to elucidate the issues?

We already have vector loads and stores with a static EEW in the instruction, which ignore the dynamic SEW. Future 64-bit encodings would also have static EEWs in the instruction. If static encoding space had been available, we would not have had dynamic SEW at all. The current EG proposal does require vl to be set.

Krste
|
|
Re: Vector element groups
Nicolas Brunie
Hi Yann,

I think Ken is referring to the optimization of splitting the SHA-256 state in two and merging rounds. It is described, for example, here: https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sha-extensions.html

Regards,
Nicolas
On Tue, Jul 19, 2022 at 01:47, Yann Loisel <yann.loisel@...> wrote:
|
|