Re: Vector TG minutes for 2020/12/18 meeting
On Tue, 16 Feb 2021 15:12:46 -0800, Guy Lemieux <guy.lemieux@gmail.com> said:
| in terms of overlap with that case — that case normally selects maximally sized AVL. the implied goals there are to make best use of vector register capacity and throughput.
| i’m suggesting a case where a minimally sized AVL is used, as chosen by the architect. this allows a programmer to optimize for minimum latency while still getting good throughput. in some cases, the full VLMAX state may still be used to hold data, but operations are chunked down to minimally sized AVL (eg for latency reasons).

I still don't see how hardware can set a <VLMAX value that will work well for any code in a loop.

Your latency comment seems to imply an external observer sees the individual strips go by (e.g., in a DSP application where data comes in and goes out in chunks), as otherwise only the total time to finish the loop matters. In these situations, I also can't see having the microarchitecture pick the chunk size - usually the I/O latency constraint sets the chunk size, and the goal of vector execution is to execute the chunks as efficiently as possible.

Krste

| i’m not sure of the portability concerns. if an implementation is free to set VLMAX, and software must be written for any possible AVL that is returned, then it appears to me that deliberately returning a smaller implementation-defined AVL should still be portable.
| programming for min-latency isn’t common in HPC, but can be useful in real-time systems.
| g
| On Tue, Feb 16, 2021 at 3:01 PM <krste@berkeley.edu> wrote:
| There's a large overlap here with the (rd!=x0,rs1=x0) case that selects AVL=VLMAX. If migration is intended, then VLMAX should be the same across harts.
| Machines with long temporal vector registers might benefit from using less than VLMAX, but this is highly dependent on specifics of the interaction of the microarchitecture and the scheduled application kernel (otherwise, the long vector registers were a waste of resources). I can't see how to do this portably beyond selecting VLMAX.
| Krste
|
|
Re: Vector TG minutes for 2020/12/18 meeting
Guy Lemieux
in terms of overlap with that case — that case normally selects maximally sized AVL. the implied goals there are to make best use of vector register capacity and throughput.

i’m suggesting a case where a minimally sized AVL is used, as chosen by the architect. this allows a programmer to optimize for minimum latency while still getting good throughput. in some cases, the full VLMAX state may still be used to hold data, but operations are chunked down to minimally sized AVL (eg for latency reasons).

i’m not sure of the portability concerns. if an implementation is free to set VLMAX, and software must be written for any possible AVL that is returned, then it appears to me that deliberately returning a smaller implementation-defined AVL should still be portable.

programming for min-latency isn’t common in HPC, but can be useful in real-time systems.

g
On Tue, Feb 16, 2021 at 3:01 PM <krste@...> wrote: There's a large overlap here with the (rd!=x0,rs1=x0) case that
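[Editor's note: for illustration, a minimal sketch of the chunked strip-mining Guy describes, assuming a programmer/architect-chosen chunk of 8 elements and SEW=32; the register assignments and the trivial copy body are hypothetical, not from the thread.]

    # a0 = element count, a1 = src pointer, a2 = dst pointer
    li t1, 8                          # hypothetical latency-chosen chunk size
loop:
    mv t2, a0
    bltu t2, t1, 1f                   # requested AVL = min(remaining, chunk)
    mv t2, t1
1:
    vsetvli t0, t2, e32, m1, ta, ma   # vl = min(requested AVL, VLMAX)
    vle32.v v8, (a1)                  # load one strip
    vse32.v v8, (a2)                  # store it back out (stand-in for real work)
    sub a0, a0, t0                    # elements left to process
    slli t3, t0, 2                    # bytes consumed this strip (SEW=32)
    add a1, a1, t3
    add a2, a2, t3
    bnez a0, loop

The point of the clamp is that the loop never asks for more than the chunk size even when VLMAX is larger, trading some throughput for shorter per-strip latency.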
|
|
Re: Vector TG minutes for 2020/12/18 meeting
On Tue, 16 Feb 2021 01:48:57 -0800, Guy Lemieux <guy.lemieux@gmail.com> said:
| I agree with you.
| I had suggested the mapping of 00000 to an implementation-defined value (chosen by the CPU architect). For some architectures, this may be 16, for others it may be 32, or even 2.
| The value should be selected as the minimum recommended vector length that can achieve good performance (high FU utilization or good memory bandwidth, or a balance) on the underlying hardware.
| This would greatly simplify software that just wants to get "reasonable" acceleration without writing code to measure performance of the underlying hardware. Such code may select poor values if harts are heterogeneous and a thread migrates. By making this implementation-defined, a value suitable for all harts can be selected by the processor architect.

There's a large overlap here with the (rd!=x0,rs1=x0) case that selects AVL=VLMAX. If migration is intended, then VLMAX should be the same across harts.

Machines with long temporal vector registers might benefit from using less than VLMAX, but this is highly dependent on specifics of the interaction of the microarchitecture and the scheduled application kernel (otherwise, the long vector registers were a waste of resources). I can't see how to do this portably beyond selecting VLMAX.

Krste

| Of course, the implementation-defined value must be fixed across all harts, so thread migration doesn't break software.
| Guy
|
|
Re: Vector TG minutes for 2020/12/18 meeting
Guy Lemieux
I agree with you.

I had suggested the mapping of 00000 to an implementation-defined value (chosen by the CPU architect). For some architectures, this may be 16, for others it may be 32, or even 2.

The value should be selected as the minimum recommended vector length that can achieve good performance (high FU utilization or good memory bandwidth, or a balance) on the underlying hardware.

This would greatly simplify software that just wants to get "reasonable" acceleration without writing code to measure performance of the underlying hardware. Such code may select poor values if harts are heterogeneous and a thread migrates. By making this implementation-defined, a value suitable for all harts can be selected by the processor architect.

Of course, the implementation-defined value must be fixed across all harts, so thread migration doesn't break software.

Guy
On Mon, Feb 15, 2021 at 11:30 PM <krste@...> wrote:
|
|
Re: Vector TG minutes for 2020/12/18 meeting
Replying to old thread to add rationale for current choice.
On Mon, 21 Dec 2020 13:52:07 -0800, Zalman Stern <zalman@google.com> said:
| Does it get easier if the specification is just the immediate value plus one?

No - this costs more gates on the critical path. Mapping 00000 => 32 is simpler in area and delay.

| I really don't understand how this encoding is particularly great for immediates as many of the values are likely very rarely or even never used, and it seems like one can't get long enough values even for existing SIMD hardware in some data types. Compare to e.g.:
| (first_bit ? 3 : 1) << rest_of_the_bits
| or:
| map[] = { 1, 3, 5, 8 }; // Or maybe something else for 5 and 8
| map[first_two_bits] << rest_of_the_bits;
| I.e. get a lot of powers of two, multiples of three-vecs for graphics, maybe something else.

As a counter-example for this particular example, one code I looked at recently related to AR/VR used 9 as one dimension.

The challenge is agreeing on the best mapping from the 32 immediate encodings to the most commonly used AVL values. More creative mappings do consume some incremental logic and path delay (as well as adding some complexity to the software toolchain). While they can provide small gains in some cases, this is offset by small losses in other cases (someone will want AVL=17 somewhere, and it's not clear that, say, AVL=40 is a substantially better use of encoding). There is no huge penalty if the immediate does not fit: at most a li instruction, which might be hoisted out of the loop.

The current v0.10 definition uses the obvious mapping of the immediate. Simplicity is a virtue, and any potential gains are small for AVL > 31, where most implementation costs are amortized over the longer vector and many implementations won't support longer lengths for a given datatype in any case.

Krste

| -Z-
| On Mon, Dec 21, 2020 at 10:47 AM Guy Lemieux <guy.lemieux@gmail.com> wrote:
| for vsetivli, with the uimm=00000 encoding, rather than setting vl to 32, how about setting it to some other meaning?
| one option is to set vl=VLMAX. i have some concerns about software using this safely (eg, if VLMAX turns out to be much larger than software anticipated, then it would fail; correcting this requires more instructions than just using the regular vsetvl/vsetvli would have used).
| another option is to allow an implementation-defined vl to be chosen by hardware; this could be anywhere between 1 and VLMAX. for example, implementations may just choose vl=32, or they may choose something else. it allows the CPU architect to devise a scheme that best fits the implementation. this may consider factors like the effective width of the execution engine, the pipeline depth (to reduce likelihood of stalls from dependent instructions), or that the vector register file is actually a multi-level memory hierarchy where some smaller values may operate with greater efficiency (lower power), or matching VL to the optimal memory system burst length. perhaps some guidance by the spec could be given here for the default scheme, eg whether the implementation optimizes for best performance or power (while still allowing implementations to modify this default via an implementation-defined CSR).
| software using a few extra cycles to check the returned vl against AVL should not be a big problem (the simplest solution being vsetvli followed by vsetivli)
| g
| On Fri, Dec 18, 2020 at 6:13 PM Krste Asanovic <krste@berkeley.edu> wrote:
| # vsetivli
| A new variant of vsetvl was proposed providing an immediate as the AVL in rs1[4:0]. The immediate encoding is the same as for CSR immediate instructions. The instruction would have bits 31:30 = 11 and bits 29:20 would be encoded the same as for vsetvli.
| This would be used when AVL was statically known, and known to fit inside a vector register group. Compared with the existing PoR, it removes the need to load an immediate into a spare scalar register before executing vsetvli, and is useful for handling scalar values in vector registers (vl=1) and other cases where short fixed-sized vectors are the datatype (e.g., graphics).
| There was discussion on whether uimm=00000 should represent 32 or be reserved. 32 is more useful, but adds a little complexity to hardware.
| There was also discussion on whether the instruction should set vill if the selected AVL is not supported, or whether it should clip vl to VLMAX as with other instructions, or if the behavior should be reserved. The group generally favored writing vill to expose software errors.
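[Editor's note: as a concrete illustration of the vsetivli use cases named in the quoted minutes (vl=1 scalar handling, short fixed-size vectors), a hedged sketch using the later published assembly syntax; register choices and vtype settings are illustrative only.]

    vsetivli t0, 1, e64, m1, ta, ma   # vl=1: operate on a scalar value held in a vector register
    vsetivli t0, 4, e32, m1, ta, ma   # vl=4: short fixed-size vector, e.g. a 4-element graphics vector

In both cases the AVL comes directly from the 5-bit immediate, so no spare scalar register has to be loaded before setting vl.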
|
|
Re: New member request for participation info
On Sun, Feb 7, 2021 at 9:48 PM ghost <ghost@...> wrote: been overlooked. Could someone please tell me (1) where to find the

Try github.com/riscv/riscv-v-spec. There is also software stuff like riscv/rvv-intrinsic-doc that is defining compiler intrinsics for the vector spec. Like bitmanip, you can file issues or pull requests, which is probably the best approach. Or you can send email to this list. Or raise issues in the meeting.

You can find the "Tech Groups Calendar" on the wiki, along with a lot of other useful info like specification status.

Jim
|
|
New member request for participation info
ghost
Hello,
I've just joined the RISC-V technical community and the V Extension Task group. I have very substantial experience in careful technical documentation (I wrote the RFCs for gzip and DEFLATE, and was one of the few non-Adobe reviewers for the PostScript and PDF reference manuals). What I'm hoping to contribute to the RISC-V community is primarily documentation review. I know that the V extension(s) are quite close to release for public comment, but I would still like the opportunity to review them in detail -- another pair of eyes can sometimes spot things that have been overlooked. Could someone please tell me (1) where to find the current spec drafts, and (2) the best way to share any observations? Thanks - L Peter Deutsch <ghost@major2nd.com> :: Aladdin Enterprises :: Healdsburg, CA Was your vote really counted? http://www.verifiedvoting.org
|
|
Vector Task Group minutes for 2021/02/05 meeting
Date: 2021/02/05
Task Group: Vector Extension
Chair: Krste Asanovic
Vice-Chair: Roger Espasa
Number of Attendees: ~16
Current issues on github: https://github.com/riscv/riscv-v-spec

# Next Meeting

It was decided to meet again in two weeks (Feb 19) to allow time for everyone to digest and comment on the v0.10 release version. Please send PRs for any small typos and clarifications, and use the mailing list for larger issues.

Issues discussed:

# Assembly syntax

There was a desire to move away from allowing vsetvl to imply "undisturbed" behavior by default, to ensure maximum use of "agnostic" by software. The assembler can issue errors instead of warnings when the ta/tu/ma/mu fields are not explicitly given, with perhaps an option to allow older code to be compiled with a warning.

# Spec formatting

There was some discussion on use of Wavedrom formatting tools. New tools to give diagrams for the register layout will be added. There was also a promise of somewhat faster build times for the doc. There apparently is a central flow for running document generation on commits at riscv.org, and we need to sync up with that process.

# Extend agnostic behavior of mask logical operations

There was a request to extend the tail-agnostic behavior of mask logical instructions to allow the tail to be overwritten with values corresponding to the logical operation (as opposed to agnostic values that currently can only be all 1s or the previous destination value). This is a relaxation of requirements, so it would not affect compatibility of existing implementations. To be discussed.
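[Editor's note: a small sketch of the assembler behaviour being discussed; the exact diagnostics and any legacy-compatibility option name are not specified in the minutes and are assumed here.]

    vsetvli t0, a0, e32, m1, ta, ma   # explicit tail/mask policy: accepted
    vsetvli t0, a0, e32, m1           # policy fields omitted: assembler error (or warning under a legacy option)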
|
|
Next RISC-V Vector Task Group Meeting reminder
We’ll meet tomorrow in usual slot per TG calendar.
The agenda is to review any feedback on the 0.10 spec and then to proceed through any outstanding issues, Krste
|
|
Request for Candidates for Vector Extension Task Group Chair and Vice-Chair
Chuanhua Chang
Hi all,
As part of the (first) annual process of holding elections for chairs of current Task Groups, this is a request for candidates for the chair and vice-chair positions in the Vector Extension task group. This is open to both current chairs and new candidates. All candidates must be nominated by February 15. If you would like to nominate yourself (or someone else), please send an email with the candidate's name, member affiliation, and qualifications to Krste and myself (cc'ed above; we are the chairs of the overseeing Unprivileged Architecture Standing Committee).
Qualifications:
* Must be knowledgeable about vector and SIMD instruction sets
* Experience in vectorizing compilers and compiler technology in general is a plus, but not a strict requirement
* Good understanding of memory ordering is highly desirable
* Must be knowledgeable about the RISC-V ISA, including the privileged ISA specification (but need not be an expert in all aspects of the ISA)
* Experience developing standards or specifications is a plus, but not a strict requirement
* Must be organized; must be able to schedule and run regular meetings and ensure that minutes are recorded
* Must be responsive to emails from the group, from other RISC-V staff and contributors, and from the public during the public review process
* Must be able to lead and work with contributors from diverse backgrounds, with varying levels of experience, and across many different companies and institutions
General Chair & Vice-Chair Duties:
* Driving the specification development process forward according to the group charter
* Managing project status on the status spreadsheet
* Interacting with the community through meetings, mailing list, github, wiki, and Google docs
* Responding to queries within 48 hrs (see the Question & Answer policy)
* Managing & running regular meetings as per the group charter
* Attending weekly tech-chairs meetings
Please also read the Groups policy to understand this role. If you have any questions, contact Krste and myself.
Best regards, Krste and Chuanhua
|
|
Re: Vector TG minutes, 2021/1/29
Jan Wassenberg
Thanks Andrew, Bill, Krste, I've implemented this per our discussion.
On Sat, Jan 30, 2021 at 11:24 PM Andrew Waterman <andrew@...> wrote:
|
|
Re: Vector TG minutes, 2021/1/29
Andrew Waterman
I attempted to write a sequence that addresses that observation, along with a couple other pedantic details, and posted it to the ticket.
On Sat, Jan 30, 2021 at 11:37 AM Bill Huffman <huffman@...> wrote: Krste,
|
|
Re: Vector TG minutes, 2021/1/29
Bill Huffman
Krste,
I think the round float to integer as float sequence needs to use vmfle.vf with the usage of the mask inverted. Otherwise NaN values will use the integer instead of the original NaN. Bill
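[Editor's note: to make the point concrete, a sketch of the five-instruction sequence with Bill's correction applied (mask sense flipped so NaNs and already-large values keep the original operand). This is not necessarily the sequence Andrew posted to the ticket; f0 is assumed to hold the magnitude threshold (e.g. 2^23 for FP32), and flag behaviour on NaN inputs would still need checking.]

    vfcvt.x.f.v v8, v4         # round to integer of same width, honouring frm (original sketch used the rtz variant)
    vfcvt.f.x.v v8, v8         # convert back to float of same width
    vfabs.v     v12, v4        # |x|, magnitude of original value
    vmfle.vf    v0, v12, f0    # mask set where |x| <= threshold; false for NaN and large values
    vmerge.vvm  v4, v4, v8, v0 # mask set: take rounded result; mask clear: keep original (large ints, NaN)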
|
|
Vector TG minutes, 2021/1/29
Date: 2021/01/29
Task Group: Vector Extension
Chair: Krste Asanovic
Co-Chair: Roger Espasa
Number of Attendees: ~16
Current issues on github: https://github.com/riscv/riscv-v-spec

# Meetings

Meetings will continue in the regular Friday time slot as given on the task groups' Google calendar.

# v0.10 release

Version v0.10 (zero-point-ten) is tagged in the repo, incorporating all specification changes agreed to date. A numbered version was requested by toolchain folks to help with their release process. An archived pdf build of v0.10 is available at:
https://github.com/riscv/riscv-v-spec/releases/download/v0.10/riscv-v-spec-0.10.pdf

This version is intended to provide a stable milestone for internal development, but is not ready for public review or ratification. However, the intent is that there are no substantial changes in instruction specifications before reaching frozen v1.0 status for public review, though of course there cannot be any guarantees. The repo version name has been updated for the next stage of editing.

Issues discussed:

#623 Round Float to integer as float

The current ISA spec does not have an instruction that rounds a float to an integer while leaving the result as a float (IEEE RoundToIntegral*). The following five-instruction vector sequence was quickly proposed, but not checked for correctness or completeness:

vfcvt.rtz.x.f.v v8, v4     # Round to integer of same width (could use frm)
vfcvt.f.x.v v8, v8         # Round back to float of same width
vfabs.f.f.v v12, v4        # Get magnitude of original value.
vmfgt.vf v0, v4, f0        # Is large integer already? f0 has threshold.
vmerge.vvm v4, v8, v4, v0  # Leave alone if already an integer

The group was to evaluate a correct equivalent sequence and to determine the relative importance of supporting the operation directly.

#550 Zve* subsets

There was discussion on the exact composition of the Zve* subsets, though mostly there was agreement with decisions in earlier meetings. One question is whether multiplies that produce the upper word of an ELEN*ELEN-bit product (vmulh* and vsmul*) should be mandated on all embedded subsets when ELEN>=64.
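[Editor's note: for readers less familiar with the instructions in question, a brief sketch of what the high-half multiplies compute when ELEN=64; the registers and vtype settings are illustrative, not from the minutes.]

    vsetvli t0, a0, e64, m1, ta, ma
    vmulh.vv  v16, v4, v8    # signed 64x64 multiply: v16 = upper 64 bits of the 128-bit product
    vmulhu.vv v20, v4, v8    # unsigned variant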
|
|
v0.10 release of vector spec
I cut a v0.10 release after adding all the substantial pending updates. There is still a bunch of work to do before public review, but this is a convenient milestone for toolchain developers,
Krste
|
|
Next Vector TG meeting tomorrow, Friday Jan 29
I scheduled next vector TG meeting tomorrow in usual slot with usual zoom link on TG Google calendar.
I hope to push out updated spec sometime before then, Krste
|
|
Re: Restarting vector TG meetings next week
Jeffrey Osier-Mixon <josiermixon@...>
Hi Krste - was this ever scheduled? thanks
On Thu, Jan 21, 2021 at 4:12 PM Krste Asanovic <krste@...> wrote: I was going to restart the vector TG meetings next week (Jan 29), and have a goal of having most pending updates added to the spec a few days before then.
|
|
About vmv.x.s should be vs1 = 0?
yahan@...
I see it on riscv-v-spec commit: 0e8cdeb26bb98de2b1089d79a681af2c5a65e712
vmv.x.s belongs to VWXUNARY0 and OPMVV.
But OPMVV has only vs1, not rs1; see:
So I think `vmv.x.s rd, vs2 # x[rd] = vs2[0] (rs1=0)` should be fixed to `vmv.x.s rd, vs2 # x[rd] = vs2[0] (vs1=0)`, or
should vmv.x.s be encoded as VRXUNARY0 and OPMVX?
see also : https://github.com/riscv/riscv-v-spec/issues/625
|
|
Restarting vector TG meetings next week
I was going to restart the vector TG meetings next week (Jan 29), and have a goal of having most pending updates added to the spec a few days before then.
Krste
|
|
Re: Vector TG minutes for 2020/12/18 meeting
lidawei14@...
Perhaps, for explicit naming conventions of mask operations, we could rename "vle1.v" to "vmle1.v" instead.
|
|