
chapter 7.8. Vector Load/Store Segment Instructions

Alexander Podoplelov
 

Hello!

I have a question about vector segment loads and stores.

In table 14 we have NFIELDS from 1 to 8.

In sections 7.8.1-7.8.3 the format is:

vlseg<nf>e<eew>.v vd, (rs1), vm

vsseg<nf>e<eew>.v vs3, (rs1), vm

From the specification it is not clear to me whether it is possible to have an instruction like vlseg1e8.v vd, (rs1), vm.

This question applies to all vector segment loads and stores.

Right now the assembler does not recognize opcodes for these instructions.

Although there seems to be no point in using vlseg1e8.v vd, (rs1), vm (please correct me if I am wrong), I think the specification should note somewhere whether these opcodes are supported.
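
For readers following along, here is a minimal Python sketch of what a unit-stride segment load does, assuming the usual reading of the v1.0 spec in which the nf field encodes NFIELDS-1, so that nf=0 is the encoding of an ordinary vle<eew>.v. The function name segment_load and the byte-array memory model are made up for illustration and are not part of any tool.

```python
def segment_load(memory, base, nfields, vl, eew_bytes=1):
    """Model a unit-stride segment load: returns one list of elements per field."""
    fields = [[] for _ in range(nfields)]
    for i in range(vl):                      # segment index
        for f in range(nfields):             # field within the segment
            addr = base + (i * nfields + f) * eew_bytes
            value = int.from_bytes(memory[addr:addr + eew_bytes], "little")
            fields[f].append(value)
    return fields


memory = bytes(range(16))
# NFIELDS=3: bytes are de-interleaved into three destination register groups.
print(segment_load(memory, 0, nfields=3, vl=4))
# NFIELDS=1: every "segment" has one field, i.e. a plain unit-stride load.
print(segment_load(memory, 0, nfields=1, vl=4))
```

Under that assumption a vlseg1e8.v would behave exactly like vle8.v, which may be why assemblers offer no separate mnemonic for it.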

Best Regards, Aleksandr Podoplelov


about masked-off bits for instructions vmsbf.m, vmsif.m, vmsof.m #defines

lilei2@...
 

Hi,
I have a question about masked-off bits. 
I am not sure what the behavior of the destination's inactive (masked-off) bits is for the instructions vmsbf.m, vmsif.m, and vmsof.m. Does the "x x x x" mean we can fill these bits with any value, regardless of vtype.vma?
I copied the example code below from section 15.4 of the frozen RVV spec 1.0:
1 1 0 0 0 0 1 1   v0 contents
1 0 0 1 0 1 0 0   v3 contents
                  vmsbf.m v2, v3, v0.t
0 1 x x x x 1 1   v2 contents
 
In addition, do all mask-result instructions, such as the vector integer compare instructions, need to fill the masked-off bits according to the vtype.vma policy?
And is an implementation allowed to support only mask-agnostic and tail-agnostic behavior for mask-result instructions?
Thanks.
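
The quoted example can be reproduced with a small Python model. This is only a sketch: the names vmsbf_masked and bits are invented here, and the "x" entries are placeholders for whatever the mask policy ends up writing, so the code does not answer the question above, it just makes the active-element behavior concrete.

```python
def vmsbf_masked(src, mask):
    """Model vmsbf.m vd, vs2, v0.t; lists are indexed by element number.

    Returns 1/0 for active elements and "x" for masked-off elements.
    """
    result = []
    seen_set_bit = False
    for s, m in zip(src, mask):
        if not m:
            result.append("x")       # inactive: value depends on the mask policy
            continue
        if s:
            seen_set_bit = True      # first active set bit found
        result.append(0 if seen_set_bit else 1)
    return result


def bits(text):
    # The spec diagram lists element 7 first, so reverse into element order.
    return [int(b) for b in reversed(text.split())]


v0 = bits("1 1 0 0 0 0 1 1")
v3 = bits("1 0 0 1 0 1 0 0")
v2 = vmsbf_masked(v3, v0)
print(" ".join(str(b) for b in reversed(v2)))   # 0 1 x x x x 1 1
```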


Re: The Width of vcsr and vstart

Krste Asanovic
 

Thanks for spotting the oversight. 
The spec was updated to indicate these should be treated as XLEN-bit wide registers.

There is no effective difference right now given that upper bits are not currently defined, but there may be some use for >32 bits in vcsr in some distant future. Vstart could also acquire extra exception state in some distant future.
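
As an illustration of why the width makes no practical difference today, here is a small Python sketch of how a debugger might split a raw vcsr read into fields. It assumes the v1.0 layout (vxsat in bit 0, vxrm in bits 2:1); the helper name decode_vcsr is hypothetical.

```python
VXSAT_MASK = 0x1        # vcsr[0]
VXRM_SHIFT = 1          # vcsr[2:1]
VXRM_MASK = 0x3


def decode_vcsr(raw, xlen=64):
    """Split a raw XLEN-bit vcsr read into its defined fields and the reserved rest."""
    raw &= (1 << xlen) - 1
    return {
        "vxsat": raw & VXSAT_MASK,
        "vxrm": (raw >> VXRM_SHIFT) & VXRM_MASK,
        "reserved": raw >> 3,     # currently undefined upper bits
    }


print(decode_vcsr(0b101))             # {'vxsat': 1, 'vxrm': 2, 'reserved': 0}
print(decode_vcsr(0b101, xlen=32))    # same fields with a 32-bit view
```

Since only the low three bits are defined, a 32-bit and an XLEN-bit read decode identically until the upper bits gain a meaning.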

Krste


On Dec 16, 2021, at 1:32 AM, Andrew Waterman <andrew@...> wrote:

For the current V extension, it's correct to treat both vcsr and vstart as 32-bit registers.  I agree the spec should clearly indicate whether or not these registers will always be 32 bits (like fcsr).


On Wed, Dec 15, 2021 at 6:31 PM Tianyi Xia via lists.riscv.org <tianshi.xty=alibaba-inc.com@...> wrote:
Hi all,
Debuggers can use abstract commands to access CSRs. When using abstract commands, debuggers need to specify the bit width of CSRs. The bit widths of vcsr and vstart are not clearly defined in Vector Extension version 1.0. In an RV64 implementation, it is not clear to the debugger whether the bit width of these two CSRs should be regarded as 32 or 64. Maybe we need to specify the bit width of these CSRs in the spec, either XLEN or fixed-length 32-bit.
 The fcsr defined in the RISC-V Unprivileged ISA is a fixed-length 32-bit CSR. The register structure of vcsr is similar to fcsr. So maybe vcsr should also be defined as a fixed-length 32-bit register?

 

Thanks,

 

Tianyi Xia




Re: [RISC-V] [tech-unprivileged] [RISC-V] [tech-vector-ext] FP Trapped exceptions needed for portability

Allen Baum
 

If the trap-on-masked-fflags op isn't executed often, then a 4-instruction sequence
(CSRR FFLAGS, ANDI, BEQZ .+8, ECALL) would do that, so there is a workaround.
If that has a performance impact, then it argues that you may need actual trapping behavior.
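
The logic of that sequence (read the flags, AND with a mask, skip the trap when the result is zero) can be sketched in Python. The fflags bit positions below are those of the F extension; the names FlagTrap and trap_if_flags are invented for this sketch, and the exception merely stands in for the ECALL.

```python
# fflags bit positions as defined by the RISC-V F extension
NX, UF, OF, DZ, NV = 0x01, 0x02, 0x04, 0x08, 0x10


class FlagTrap(Exception):
    """Stands in for the ECALL at the end of the sequence (hypothetical name)."""


def trap_if_flags(fflags, mask):
    """Model of: read fflags, AND with mask, skip the trap if the result is zero."""
    pending = fflags & mask
    if pending:
        raise FlagTrap(f"pending FP flags: {pending:#x}")
    return fflags


trap_if_flags(NX, mask=NV | DZ | OF)            # inexact only: falls through
try:
    trap_if_flags(NX | DZ, mask=NV | DZ | OF)   # divide-by-zero is in the mask
except FlagTrap as exc:
    print(exc)                                  # pending FP flags: 0x8
```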


On Fri, Dec 17, 2021 at 5:05 PM ghost <ghost@...> wrote:
> I’d suggest identifying important use cases for this. I’d also be looking at
> software techniques where the compiler inserts checks to provide the
> necessary support for the use cases first.

Along with this, I'd suggest considering an extension that consists of just
one instruction: trap if (FP flags & mask in instruction) is non-zero.  I'm
not a hardware designer, but it seems to me that this would allow
floating-point computation to run at full speed until a point selected by
the programmer or compiler where a precise trap was needed, and the more
instructions the compiler can place between the FP computation and the
conditional trap, the less likely a pipeline stall.

--

L Peter Deutsch <ghost@...> :: Aladdin Enterprises :: Healdsburg, CA

         Was your vote really counted?  http://www.verifiedvoting.org






Re: [RISC-V] [tech-unprivileged] [RISC-V] [tech-vector-ext] FP Trapped exceptions needed for portability

Krste Asanovic
 

I'll note that ARM appears to detect tininess before rounding, while
x86 does so after rounding.

Also, current ARM compilers don't support exception trapping on
AArch64.

https://developer.arm.com/documentation/dui0808/a/floating-point-support/exception-types-recognized-by-the-arm-floating-point-environment

These decisions would not seem to match an intent by ARM to emulate
x86 FP behavior to ease porting.

Krste

On Fri, 17 Dec 2021 16:56:58 -0800, Zalman Stern <zalman@...> said:
| I’d suggest identifying important use cases for this. I’d also be looking at software techniques where the compiler inserts checks to provide the necessary support for the
| use cases first.

| Probably the number one use case is a software emulator for x86 binaries on RISC V. (Because one has to provide the exact x86 behavior regardless of whether it is a strong
| requirement for significant applications.) This alone could have driven things for ARM. The way to investigate would be to look at how Apple’s emulator works.

| Glancing at the large corpus of code one can search at Google, yeah, there's enough stuff claiming a SIGFPE is going to happen in certain circumstances that floating point
| exceptions can't be written off. But most of it looks like stuff that would far better be handled by having the compiler check a hardware provided flag and raise the
| exception rather than having hardware do everything. (It is mostly stuff that is providing some fairly widely used, non-HPC, mathematical functionality and trying to ensure
| a program crashes when numerical invariants are violated.)

| My first thought was to ask why one would want this at all as I've done a fair bit of signal-processing/HPC-ish work in shipping applications and floating-point exceptions
| are only ever used as a debugging tool. Generally most of my interaction with the feature has been fixing performance and correctness issues when floating-point exceptions
| inadvertently get enabled.

| -Z-

| On Fri, Dec 17, 2021 at 3:46 PM Bruce Hoult <bruce@...> wrote:

| On Sat, Dec 18, 2021 at 9:09 AM Earl Killian <earl.killian@...> wrote:

| The question I have is whether having this in scalar only would be sufficient? If porting an application were to need exception traps, it seems plausible to disable
| compiler vectorization.

| The MIPS patent should have expired by now, so it would solve the problem (except for inexact) on a simple in-order core. Does anyone know if x86 code uses inexact
| traps?

| What's the use-case for trapping on inexact (or even caring about it) in FP? Using doubles as 53 bit integers? I did that myself in accounting software back in the 80s
| and 90s, but it's a bit pointless on a 64 bit machine.
|


Re: [RISC-V] [tech-unprivileged] [RISC-V] [tech-vector-ext] FP Trapped exceptions needed for portability

ghost
 

> I’d suggest identifying important use cases for this. I’d also be looking at
> software techniques where the compiler inserts checks to provide the
> necessary support for the use cases first.

Along with this, I'd suggest considering an extension that consists of just
one instruction: trap if (FP flags & mask in instruction) is non-zero. I'm
not a hardware designer, but it seems to me that this would allow
floating-point computation to run at full speed until a point selected by
the programmer or compiler where a precise trap was needed, and the more
instructions the compiler can place between the FP computation and the
conditional trap, the less likely a pipeline stall.

--

L Peter Deutsch <ghost@...> :: Aladdin Enterprises :: Healdsburg, CA

Was your vote really counted? http://www.verifiedvoting.org


Re: [RISC-V] [tech-unprivileged] [RISC-V] [tech-vector-ext] FP Trapped exceptions needed for portability

Zalman Stern
 

I’d suggest identifying important use cases for this. I’d also be looking at software techniques where the compiler inserts checks to provide the necessary support for the use cases first.

Probably the number one use case is a software emulator for x86 binaries on RISC V. (Because one has to provide the exact x86 behavior regardless of whether it is a strong requirement for significant applications.) This alone could have driven things for ARM. The way to investigate would be to look at how Apple’s emulator works.

Glancing at the large corpus of code one can search at Google, yeah, there's enough stuff claiming a SIGFPE is going to happen in certain circumstances that floating point exceptions can't be written off. But most of it looks like stuff that would far better be handled by having the compiler check a hardware provided flag and raise the exception rather than having hardware do everything. (It is mostly stuff that is providing some fairly widely used, non-HPC, mathematical functionality and trying to ensure a program crashes when numerical invariants are violated.)

My first thought was to ask why one would want this at all as I've done a fair bit of signal-processing/HPC-ish work in shipping applications and floating-point exceptions are only ever used as a debugging tool. Generally most of my interaction with the feature has been fixing performance and correctness issues when floating-point exceptions inadvertently get enabled.

-Z-

On Fri, Dec 17, 2021 at 3:46 PM Bruce Hoult <bruce@...> wrote:
On Sat, Dec 18, 2021 at 9:09 AM Earl Killian <earl.killian@...> wrote:
The question I have is whether having this in scalar only would be sufficient? If porting an application were to need exception traps, it seems plausible to disable compiler vectorization.

The MIPS patent should have expired by now, so it would solve the problem (except for inexact) on a simple in-order core. Does anyone know if x86 code uses inexact traps?

What's the use-case for trapping on inexact (or even caring about it) in FP? Using doubles as 53 bit integers? I did that myself in accounting software back in the 80s and 90s, but it's a bit pointless on a 64 bit machine.


Re: [RISC-V] [tech-unprivileged] [RISC-V] [tech-vector-ext] FP Trapped exceptions needed for portability

Bruce Hoult
 

On Sat, Dec 18, 2021 at 9:09 AM Earl Killian <earl.killian@...> wrote:
The question I have is whether having this in scalar only would be sufficient? If porting an application were to need exception traps, it seems plausible to disable compiler vectorization.

The MIPS patent should have expired by now, so it would solve the problem (except for inexact) on a simple in-order core. Does anyone know if x86 code uses inexact traps?

What's the use-case for trapping on inexact (or even caring about it) in FP? Using doubles as 53 bit integers? I did that myself in accounting software back in the 80s and 90s, but it's a bit pointless on a 64 bit machine.
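
A quick Python illustration of the 53-bit point, offered as a sketch only: Python floats are IEEE doubles, but Python does not expose the inexact flag, so the rounding is observed indirectly by comparing results.

```python
LIMIT = 2 ** 53                    # doubles carry a 53-bit significand

exact = float(LIMIT - 1) + 1.0     # result 2**53 is representable
inexact = float(LIMIT) + 1.0       # 2**53 + 1 is not; rounds back to 2**53

print(exact == LIMIT)              # True
print(inexact == LIMIT + 1)        # False
print(int(inexact) == LIMIT)       # True
```

The second addition is the kind of operation that would raise the inexact flag on hardware, which is what an integer-in-double accounting scheme would want to detect.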


Re: [RISC-V] [tech-unprivileged] [RISC-V] [tech-vector-ext] FP Trapped exceptions needed for portability

Earl Killian
 

The question I have is whether having this in scalar only would be sufficient? If porting an application were to need exception traps, it seems plausible to disable compiler vectorization.

The MIPS patent should have expired by now, so it would solve the problem (except for inexact) on a simple in-order core. Does anyone know if x86 code uses inexact traps?

On Dec 17, 2021, at 12:02, Krste Asanovic <krste@...> wrote:


On Dec 17, 2021, at 11:48 AM, andrew@... wrote:

Defining a standard extension that provides precise traps on FP exceptions seems like a reasonable thing to do, if only to facilitate the use case you mention in a standard way. The strategy would presumably be to add another five bits to the fcsr that indicate which exceptions will raise traps.

Yes, this would be the obvious path to take.

Some use cases, including maybe this one, might prefer FP traps to be horizontal into user mode.

But I’ll also briefly remark that not requiring traps on FP exceptions has been a godsend for implementing high-performance in-order cores, where data-dependent traps would preclude early retirement and deferred execution of these instructions. So there’s good reason never to make such an extension mandatory, even in the RVA profiles.

There is the old MIPS FPU pipeline trick of a conservative early check of exponent ranges to determine that a trap is impossible, which reduces the impact in this case.

But FP trap handling is too non-standard/heavyweight/buggy to get widespread use in portable code (along with other corners of IEEE FP spec), so agree it doesn’t seem to be on path to RVA mandate.

Krste



On Fri, Dec 17, 2021 at 11:31 AM Ken Dockser <kad@...> wrote:
While I understand that it had been decided long ago (relatively speaking) that RISC-V would not support trapping on floating-point exceptions, I am wondering if we need to revisit this.

I have heard that ARM's rationale for adding floating-point exception trap capabilities in ARMv7.8 was not because of an inherent need for new code, but for enabling the efficient porting of X86 code to ARM.

Does anyone out there have any experience with porting X86 code to RISC-V? Has the lack of trapped FP exceptions hindered such porting?
Likewise, is there an interest in proposing a TG to create an extension that adds FP trap capabilities to Scalar and Vector FP.

Thanks,
Ken 


Re: [RISC-V] [tech-unprivileged] [RISC-V] [tech-vector-ext] FP Trapped exceptions needed for portability

Krste Asanovic
 


On Dec 17, 2021, at 11:48 AM, andrew@... wrote:

Defining a standard extension that provides precise traps on FP exceptions seems like a reasonable thing to do, if only to facilitate the use case you mention in a standard way. The strategy would presumably be to add another five bits to the fcsr that indicate which exceptions will raise traps.

Yes, this would be the obvious path to take.

Some use cases, including maybe this one, might prefer FP traps to be horizontal into user mode.

But I’ll also briefly remark that not requiring traps on FP exceptions has been a godsend for implementing high-performance in-order cores, where data-dependent traps would preclude early retirement and deferred execution of these instructions. So there’s good reason never to make such an extension mandatory, even in the RVA profiles.

There is the old MIPS FPU pipeline trick of a conservative early check of exponent ranges to determine that a trap is impossible, which reduces the impact in this case.
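
To make the exponent-range idea concrete, here is a rough Python sketch for a double-precision multiply. It is not the MIPS implementation; the function name and the thresholds are chosen here for illustration, and it only rules out overflow and underflow, not inexact or invalid.

```python
import math

MAX_EXP = 1023      # product magnitude stays below 2**1023, under the largest double
MIN_EXP = -1020     # product magnitude stays at or above 2**-1022, the smallest normal


def mul_cannot_overflow_or_underflow(a, b):
    """Conservative check: True means a*b certainly stays in the normal range."""
    if a == 0.0 or b == 0.0 or not (math.isfinite(a) and math.isfinite(b)):
        return False                 # give up and let the slow path decide
    _, ea = math.frexp(a)            # a = m * 2**ea with 0.5 <= |m| < 1
    _, eb = math.frexp(b)
    # |a*b| lies in [2**(ea+eb-2), 2**(ea+eb)), so bounding ea+eb bounds the product.
    return MIN_EXP <= ea + eb <= MAX_EXP


print(mul_cannot_overflow_or_underflow(1.5, 2.0))         # True
print(mul_cannot_overflow_or_underflow(1e200, 1e200))     # False
print(mul_cannot_overflow_or_underflow(1e-200, 1e-200))   # False
```

A False result does not mean a trap will happen, only that the cheap check could not exclude it, which is what makes the check conservative.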

But FP trap handling is too non-standard/heavyweight/buggy to get widespread use in portable code (along with other corners of IEEE FP spec), so agree it doesn’t seem to be on path to RVA mandate.

Krste



On Fri, Dec 17, 2021 at 11:31 AM Ken Dockser <kad@...> wrote:
While I understand that it had been decided long ago (relatively speaking) that RISC-V would not support trapping on floating-point exceptions, I am wondering if we need to revisit this.

I have heard that ARM's rationale for adding floating-point exception trap capabilities in ARMv7.8 was not because of an inherent need for new code, but for enabling the efficient porting of X86 code to ARM.

Does anyone out there have any experience with porting X86 code to RISC-V? Has the lack of trapped FP exceptions hindered such porting?
Likewise, is there an interest in proposing a TG to create an extension that adds FP trap capabilities to Scalar and Vector FP.

Thanks,
Ken 

Re: [EXT] Re: [RISC-V] [tech-vector-ext] FP Trapped exceptions needed for portability

Jeff Scott
 

Completely agree.  Was very happy RISC-V did not include FPU exceptions.

 

Jeff

 

From: tech-vector-ext@... <tech-vector-ext@...> On Behalf Of Andrew Waterman via lists.riscv.org
Sent: Friday, December 17, 2021 1:48 PM
To: Ken Dockser <kad@...>
Cc: tech-alternate-fp@...; tech-unprivileged@...; tech-vector-ext@...
Subject: [EXT] Re: [RISC-V] [tech-vector-ext] FP Trapped exceptions needed for portability

 

Defining a standard extension that provides precise traps on FP exceptions seems like a reasonable thing to do, if only to facilitate the use case you mention in a standard way. The strategy would presumably be to add another five bits to the fcsr that indicate which exceptions will raise traps.

 

But I’ll also briefly remark that not requiring traps on FP exceptions has been a godsend for implementing high-performance in-order cores, where data-dependent traps would preclude early retirement and deferred execution of these instructions. So there’s good reason never to make such an extension mandatory, even in the RVA profiles.

 

On Fri, Dec 17, 2021 at 11:31 AM Ken Dockser <kad@...> wrote:

While I understand that it had been decided long ago (relatively speaking) that RISC-V would not support trapping on floating-point exceptions, I am wondering if we need to revisit this.

 

I have heard that ARM's rationale for adding floating-point exception trap capabilities in ARMv7.8 was not because of an inherent need for new code, but for enabling the efficient porting of X86 code to ARM.

 

Does anyone out there have any experience with porting X86 code to RISC-V? Has the lack of trapped FP exceptions hindered such porting?

Likewise, is there an interest in proposing a TG to create an extension that adds FP trap capabilities to Scalar and Vector FP.

 

Thanks,

Ken 


Re: FP Trapped exceptions needed for portability

Andrew Waterman
 

Defining a standard extension that provides precise traps on FP exceptions seems like a reasonable thing to do, if only to facilitate the use case you mention in a standard way. The strategy would presumably be to add another five bits to the fcsr that indicate which exceptions will raise traps.

But I’ll also briefly remark that not requiring traps on FP exceptions has been a godsend for implementing high-performance in-order cores, where data-dependent traps would preclude early retirement and deferred execution of these instructions. So there’s good reason never to make such an extension mandatory, even in the RVA profiles.

On Fri, Dec 17, 2021 at 11:31 AM Ken Dockser <kad@...> wrote:
While I understand that it had been decided long ago (relatively speaking) that RISC-V would not support trapping on floating-point exceptions, I am wondering if we need to revisit this.

I have heard that ARM's rationale for adding floating-point exception trap capabilities in ARMv7.8 was not because of an inherent need for new code, but for enabling the efficient porting of X86 code to ARM.

Does anyone out there have any experience with porting X86 code to RISC-V? Has the lack of trapped FP exceptions hindered such porting?
Likewise, is there an interest in proposing a TG to create an extension that adds FP trap capabilities to Scalar and Vector FP.

Thanks,
Ken 

FP Trapped exceptions needed for portability

Ken Dockser
 

While I understand that it had been decided long ago (relatively speaking) that RISC-V would not support trapping on floating-point exceptions, I am wondering if we need to revisit this.

I have heard that ARM's rationale for adding floating-point exception trap capabilities in ARMv7.8 was not because of an inherent need for new code, but for enabling the efficient porting of X86 code to ARM.

Does anyone out there have any experience with porting X86 code to RISC-V? Has the lack of trapped FP exceptions hindered such porting?
Likewise, is there an interest in proposing a TG to create an extension that adds FP trap capabilities to Scalar and Vector FP.

Thanks,
Ken 


Re: The Width of vcsr and vstart

Andrew Waterman
 

For the current V extension, it's correct to treat both vcsr and vstart as 32-bit registers.  I agree the spec should clearly indicate whether or not these registers will always be 32 bits (like fcsr).


On Wed, Dec 15, 2021 at 6:31 PM Tianyi Xia via lists.riscv.org <tianshi.xty=alibaba-inc.com@...> wrote:

Hi all,
Debuggers can use abstract commands to access CSRs. When using abstract commands, debuggers need to specify the bit width of CSRs. The bit widths of vcsr and vstart are not clearly defined in Vector Extension version 1.0. In an RV64 implementation, it is not clear to the debugger whether the bit width of these two CSRs should be regarded as 32 or 64. Maybe we need to specify the bit width of these CSRs in the spec, either XLEN or fixed-length 32-bit.

 The fcsr defined in the RISC-V Unprivileged ISA is a fixed-length 32-bit CSR. The register structure of vcsr is similar to fcsr. So maybe vcsr should also be defined as a fixed-length 32-bit register?

 

Thanks,

 

Tianyi Xia


The Width of vcsr and vstart

Tianyi Xia <tianshi.xty@...>
 

Hi all,
Debuggers can use abstract commands to access CSRs. When using abstract commands, debuggers need to specify the bit width of CSRs. The bit widths of vcsr and vstart are not clearly defined in Vector Extension version 1.0. In an RV64 implementation, it is not clear to the debugger whether the bit width of these two CSRs should be regarded as 32 or 64. Maybe we need to specify the bit width of these CSRs in the spec, either XLEN or fixed-length 32-bit.

 The fcsr defined in the RISC-V Unprivileged ISA is a fixed-length 32-bit CSR. The register structure of vcsr is similar to fcsr. So maybe vcsr should also be defined as a fixed-length 32-bit register?

 

Thanks,

 

Tianyi Xia


Re: Vector Memory Ordering

Krste Asanovic
 

On Mon, 13 Dec 2021 12:09:54 -0800, "Ken Dockser" <kad@...> said:
| Reviving this old thread with a question and a suggestion:
| Question: What is the use case for supporting non-idempotent memory in a RISC-V Vector implementation as a part of a general-purpose
| rich-OS application processor? While I can certainly envision embedded applications that use non-idempotent memory, it seems unlikely
| that non-idempotent memory would be used when running arbitrary application code.

To reduce overhead, some embedded Linux systems allow user-mode code
to access devices directly, e.g., for dpdk networking. If anything,
there is a trend to support more user-level access to
devices/accelerators to reduce overhead, and to provide more isolation
between tasks (as opposed to shared device driver in kernel). Such
devices might have non-idempotent memory regions.

| Suggestion: The current Vector specification's comments about supporting non-idempotent memory can easily mislead one into thinking
| that such support is required in all compliant implementations. We need an explicit clarification in the specification along the
| lines of "Vector extension support for handling non-idempotent memory accesses is not required in implementations that prohibit or
| otherwise prevent (e.g., by trapping) such accesses." While my suggested sentence would likely benefit from some wordsmithing, I
| think that what I am trying to convey is essential in defining what is architecturally required.

We could add some non-normative text as a note to implementers, but
this allowance just follows from the general RISC-V concept that
certain memory address ranges only support certain operations (PMAs).

Calling this out as a special case in the spec could then require
repeating the statement throughout all memory instructions for
consistency, to avoid questions about why some instructions have
optional support for some memory types versus others. We try to
factor out these concepts in the spec.

Krste


| Thanks,
| Ken
|


Re: RVV assembler and simulation

Jim Wilson
 

On Mon, Dec 13, 2021 at 9:34 AM Peter Lieber <peteralieber@...> wrote:
I am working on an experiment and I need to simulate RVV r1.0.  Is spike my best bet for this?

All I want to start with is writing bare metal assembly, and copying some memory buffers between the sim and host.

Is there an assembler available that supports RVV 1.0?

Upstream mainline binutils had rvv 1.0 support added recently.  This isn't in any release yet, so you have to use the top of the development tree.  Upstream LLVM has had rvv support for a while, tracking the evolving rvv draft.  I don't follow llvm so I don't know the current state.  rvv support is certainly there, but I don't know what version they are at.  If they have rvv 1.0 support, which is likely, then it is probably only in the development tree and not in the last release.

Yes, spike has rvv support.  Again, check the rvv version.  There have been a lot of different incompatible draft versions and if you accidentally mix tools that support different drafts it won't work.  I would guess that spike has rvv 1.0 support but don't track it so don't know for sure.

Jim


Re: Vector Memory Ordering

Ken Dockser
 

Reviving this old thread with a question and a suggestion:

Question: What is the use case for supporting non-idempotent memory in a RISC-V Vector implementation as a part of a general-purpose rich-OS application processor? While I can certainly envision embedded applications that use non-idempotent memory, it seems unlikely that non-idempotent memory would be used when running arbitrary application code.

Suggestion: The current Vector specification's comments about supporting non-idempotent memory can easily mislead one into thinking that such support is required in all compliant implementations. We need an explicit  clarification in the specification along the lines of "Vector extension support for handling non-idempotent memory accesses is not required in implementations that prohibit or otherwise prevent (e.g., by trapping) such accesses." While my suggested sentence would likely benefit from some wordsmithing, I think that what I am trying to convey is essential in defining what is architecturally required.

Thanks,
Ken


RVV assembler and simulation

Peter Lieber
 

I am working on an experiment and I need to simulate RVV r1.0.  Is spike my best bet for this?

All I want to start with is writing bare metal assembly, and copying some memory buffers between the sim and host.

Is there an assembler available that supports RVV 1.0?


Re: RISC-V Vector Extension post-public review updates - fault flagging

Bruce Hoult
 

On Thu, Nov 18, 2021 at 5:07 PM David Horner <ds2horner@...> wrote:

But if there were, the vl would need to be truncated to the first in sequence that faulted.

That would potentially require back tracking by the handler from the element load that faulted first, to each of the earlier loads in the list.

Simple implementations could simply execute it sequentially. Or have the trap handler execute the loads sequentially if any of them fault.
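
A sequential model of the idea might look like the following Python sketch. Fault-only-first gather does not exist in the ratified spec, so everything here is hypothetical: the function name ff_gather, the dictionary standing in for mapped memory, and the choice to mirror the existing unit-stride fault-only-first rule (trap on element 0, otherwise truncate vl).

```python
def ff_gather(memory, base, indices, vl):
    """Sequential model of a hypothetical fault-only-first gather.

    memory maps valid addresses to values; a missing address models a fault.
    Returns (loaded values, new vl).
    """
    loaded = []
    for i in range(vl):
        addr = base + indices[i]
        if addr not in memory:
            if i == 0:
                # A fault on element 0 is always taken as a trap.
                raise MemoryError(f"fault on element 0 at {addr:#x}")
            return loaded, i         # truncate vl to the first faulting element
        loaded.append(memory[addr])
    return loaded, vl


memory = {0x1000: 10, 0x1008: 11, 0x2000: 12}
print(ff_gather(memory, 0x1000, [0, 8, 0x5000, 0x1000], vl=4))   # ([10, 11], 2)
```

Processing elements strictly in order is what guarantees that the reported vl matches the iteration a scalar loop would have failed on.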

A substantial effort to essentially be thrown away on the next try to discover the page mappings.

We don't care how slowly malicious code runs.

This was another reason that ff gather was rejected. It does not play well with the parallel load behaviour that is allowed for loads.

It plays just as well as any gather does, in the absence of faults.

Faulting is very much NOT expected behaviour. You're probably about to terminate the program anyway, or drop into the debugger. The main requirement is that the user can see which iteration of their loop would have failed if the code had been left as scalar instructions instead of auto-vectorised.

If FF gather were implemented the designer would probably always trap on any fault.

If the OS determines the virtual addresses are legitimate it could preemptively page-in/allocate the requested addresses.

If any of the addresses are illegal/illegitimate it could certainly mark this application suspect and escalate its management to whatever security features are enabled.

This is a legal option for load sequential first fault but we have at most 2 pages/regions to deal with.

One region, but it could be many page table entries, given sufficiently long vector registers -- up to 17 with 65536 bit VLEN and LMUL=8.
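
The figure of 17 can be checked with a little arithmetic, sketched here in Python under the assumption of 4 KiB base pages and a byte-granular start address (the helper name max_pages_touched is made up for this note).

```python
def max_pages_touched(vlen_bits, lmul, page_bytes=4096):
    """Worst-case number of pages spanned by one contiguous register-group-sized access."""
    nbytes = vlen_bits // 8 * lmul
    # Worst case: the access starts on the last byte of a page.
    return (nbytes - 1 + page_bytes - 1) // page_bytes + 1


print(max_pages_touched(65536, 8))   # 17: 64 KiB of data, 16 full pages plus one for misalignment
print(max_pages_touched(128, 8))     # 2
```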
