
Public review for Zvfh/Zvfhmin

Krste Asanovic
 

We are delighted to announce the start of the public review period for the following proposed standard extensions to the RISC-V ISA:

Zvfh
Zvfhmin

The review period begins today, January 24, 2023 and ends on March 10, 2023 (inclusive).

These extensions are part of the Unprivileged Specification.

These extensions are described in the PDF spec available at:
https://drive.google.com/file/d/1DtyeYOEYQq-TgDGHOp3vG8hod0nZ1Al9/view?usp=share_link

which was generated from the source available in the following GitHub repo:
https://github.com/riscv/riscv-v-spec/blob/d404111e51298b428f5bba6041d6288a6a209664/v-spec.adoc#zvfhmin-vector-extension-for-minimal-half-precision-floating-point-arithmetic

To respond to the public review, please either email comments to the
public isa-dev mailing list or add issues and/or pull requests (PRs)
to the Vector GitHub repo: https://github.com/riscv/riscv-v-spec/. We
welcome all input and appreciate your time and effort in helping us by
reviewing the specification.

During the public review period, corrections, comments, and
suggestions will be gathered for review by the Vector Task Group. Any
minor corrections and/or uncontroversial changes will be incorporated
into the specification. Any remaining issues or proposed changes will
be addressed in the public review summary report. If there are no
issues that require incompatible changes to the public review
specification, the Unprivileged ISA Committee will recommend the
updated specifications be approved and ratified by the RISC-V
Technical Steering Committee and the RISC-V Board of Directors.

Thanks to all the contributors for all their hard work.

Krste Asanović
Chair, Unprivileged ISA Committee


Re: RISC-V V C Intrinsic API v1.0 release meeting reminder (January 05th, 2023)

eop Chen
 

On January 4, 2023, at 11:18 PM, Eop Chen <eop.chen@...> wrote:


Hi all,

A reminder that the next open meeting to discuss the RISC-V V C Intrinsic API v1.0 release will
be held on 2023/01/05 at 6AM (GMT -7) / 11PM (GMT +8).

The agenda can be found on the second page of the meeting slides (link).
Please join the calendar to receive notifications: Google Calendar link, iCal.
We also now have a mailing list hosted by RISC-V International (link).

Regards,

eop Chen



RISC-V V C Intrinsic API v1.0 release meeting reminder (January 05th, 2023)

eop Chen
 


Hi all,

A reminder that the next open meeting to discuss the RISC-V V C Intrinsic API v1.0 release will
be held on 2023/01/05 at 6AM (GMT -7) / 11PM (GMT +8).

The agenda can be found on the second page of the meeting slides (link).
Please join the calendar to receive notifications: Google Calendar link, iCal.
We also now have a mailing list hosted by RISC-V International (link).

Regards,

eop Chen


Re: Overlapping P and V opcodes

mark
 

Also, please note that the unified discovery effort is underway. I am adding Philipp and Aaron to the chain.

Mark

--------
sent from a mobile device. please forgive any typos.

On Dec 28, 2022, at 10:50 AM, Krste Asanovic <krste@...> wrote:


We generally want to avoid using CSRs for runtime discovery, instead
preferring a user-accessible data structure provided by the runtime
system. Under Linux, there is a user API to figure out the current
hart's capabilities. Other execution environments could adopt a
similar approach to provide the same information to user tasks.

At the M-mode level, the misa CSR has a bit for the V extension, and
assuming P moves forward as a single-letter extension, that could also
be allocated in the misa CSR. However, the misa.V bit only indicates
presence of the base V extension, not of any further Zv* extensions
that would overlap P opcodes. Similarly, the misa.P bit would only
indicate the existence of the base P extension, not any further Zp*
extensions.

The proposed base P and ratified base V extensions do not conflict, so
assuming P moves forward as currently proposed, both misa.V and misa.P
could be set in the misa CSR. If a hart has further Zv* extensions
that conflict with P, then misa.P would not be set. Similarly, if a
hart had further Zp* extensions that conflicted with V, then misa.V
would not be set. If a single hart implementation wanted to support
multiple alternative extensions that conflicted in instruction
encoding then the *envcfg registers should be used.

Work continues on more general platform-level discovery mechanisms
that can capture a greater variety of extensions.

Krste



On Fri, 23 Dec 2022 00:15:02 -0800, "Andrew Waterman" <andrew@...> said:
| On Fri, Dec 23, 2022 at 12:06 AM Allen Baum <allen.baum@...>
| wrote:

| I actually interpreted Krste's statement differently. Not
| necessarily correctly.
| It sounded to me like there wasn't enough room in the current vector
| encoding space to fit possible extensions,
| so the thought was to *extend* into the P encoding space, e.g. any unused
| opcodes under the P-extension major opcode.

| Given what I know about the number of P-extension opcodes (from memory),
| the number of unused opcodes might be pretty tiny, though.
| That makes it less likely my interpretation would be correct.

| Yeah, Rich's interpretation was right; see
| https://lists.riscv.org/g/tech-crypto-ext/message/856

| On Thu, Dec 22, 2022 at 8:56 PM Rich Fuhler <rfuhler@...> wrote:

| Thanks much, Andrew. Makes perfect sense.

| -rich

| -------- Original message --------
| From: Andrew Waterman <andrew@...>
| Date: 12/22/22 7:25 PM (GMT-08:00)
| To: Rich Fuhler <rfuhler@...>
| Cc: tech-vector-ext@...
| Subject: Re: [RISC-V] [tech-vector-ext] Overlapping P and V opcodes

| Hey Rich,

| Although I'm not Krste, I'll inject my two cents. I think there are
| two separate issues here:

| - The immediate one. Formally at RVIA, the P extension is a proposal,
| not yet a standard. So, vendors who have implemented the proposal are
| effectively implementing what the spec refers to as a non-conforming
| extension (because it uses standard encoding space despite not being a
| standard). This is a perfectly legitimate thing to do. Assuming Zvk*
| is standardized as proposed, then the usual technique for resolving
| conflicts between a standard extension and a non-conforming extension
| applies: a custom mode bit (specified by the vendor) can be used to
| enable the non-conforming extension. When the custom mode bit enables
| the non-conforming extension, the processor does not implement Zvk*.
| At the opposite polarity, the processor might implement Zvk*.

| - The longer-term one. When P (or more likely Zp*) is standardized,
| RVIA could choose to define analogous mode bits (e.g., as part of the
| *envcfg CSRs). This would make sense to do as part of the
| standardization of Zp*, rather than as a preemptive measure, because
| we don't yet know with certainty which opcodes Zp* will use.

| HTH,
| Andrew

| On Thu, Dec 22, 2022 at 4:36 PM Rich Fuhler <rfuhler@...>
| wrote:

| Hi Krste,

| At the summit, you mentioned that the V opcode space would be
| extended by using P opcodes since an implementation would not have
| both P and V. Andes vector cores do have the P extension and we
| would need the ability to determine in a multitasking scenario,
| which unit was in use by a thread. Is there one CSR that has this
| information in it?

| -rich



Re: Overlapping P and V opcodes

Krste Asanovic
 

We generally want to avoid using CSRs for runtime discovery, instead
preferring a user-accessible data structure provided by the runtime
system. Under Linux, there is a user API to figure out the current
hart's capabilities. Other execution environments could adopt a
similar approach to provide the same information to user tasks.

At the M-mode level, the misa CSR has a bit for the V extension, and
assuming P moves forward as a single-letter extension, that could also
be allocated in the misa CSR. However, the misa.V bit only indicates
presence of the base V extension, not of any further Zv* extensions
that would overlap P opcodes. Similarly, the misa.P bit would only
indicate the existence of the base P extension, not any further Zp*
extensions.

The proposed base P and ratified base V extensions do not conflict, so
assuming P moves forward as currently proposed, both misa.V and misa.P
could be set in the misa CSR. If a hart has further Zv* extensions
that conflict with P, then misa.P would not be set. Similarly, if a
hart had further Zp* extensions that conflicted with V, then misa.V
would not be set. If a single hart implementation wanted to support
multiple alternative extensions that conflicted in instruction
encoding then the *envcfg registers should be used.
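
As a rough illustration of the misa rule described above, the following Python sketch models which single-letter bits a hart might advertise. It assumes the conventional misa layout in which bit 0 corresponds to "A" and bit 25 to "Z" (so V is bit 21 and P would be bit 15). The function names and boolean flags are hypothetical, chosen only for this example; they do not come from any specification or runtime API.

```python
def misa_bit(letter: str) -> int:
    """Bit index of a single-letter extension in misa (A=0 ... Z=25)."""
    return ord(letter.upper()) - ord("A")


def misa_has(misa: int, letter: str) -> bool:
    """True if the given single-letter extension bit is set in misa."""
    return bool((misa >> misa_bit(letter)) & 1)


def advertised_misa(has_v: bool, has_p: bool,
                    zv_conflicts_with_p: bool = False,
                    zp_conflicts_with_v: bool = False) -> int:
    """Hypothetical model of the rule in the text: a base-extension bit is
    advertised only when no further extension from the other family
    occupies conflicting opcodes."""
    misa = 0
    if has_v and not zp_conflicts_with_v:
        misa |= 1 << misa_bit("V")
    if has_p and not zv_conflicts_with_p:
        misa |= 1 << misa_bit("P")
    return misa


# Base V and base P together: both bits may be set.
both = advertised_misa(has_v=True, has_p=True)
# A further Zv* extension that overlaps P opcodes: misa.P is left clear.
v_only = advertised_misa(has_v=True, has_p=True, zv_conflicts_with_p=True)
print(misa_has(both, "V"), misa_has(both, "P"))
print(misa_has(v_only, "V"), misa_has(v_only, "P"))
```

Note that the misa bits alone say nothing about which Zv* or Zp* extensions are present, which is why the text points to a runtime-provided data structure for that information.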

Work continues on more general platform-level discovery mechanisms
that can capture a greater variety of extensions.

Krste



On Fri, 23 Dec 2022 00:15:02 -0800, "Andrew Waterman" <andrew@...> said:
| On Fri, Dec 23, 2022 at 12:06 AM Allen Baum <allen.baum@...>
| wrote:

| I actually interpreted Krste's statement differently. Not
| necessarily correctly.
| It sounded to me like there wasn't enough room in the current vector
| encoding space to fit possible extensions, 
| so the thought was to *extend* into the P encoding space, e.g. any unused
| opcodes under the P-extension major opcode.

| Given what I know about the number of P-extension opcodes (from memory),
| the number of unused opcodes might be pretty tiny, though.
| That makes it less likely my interpretation would be correct.

| Yeah, Rich's interpretation was right; see 
| https://lists.riscv.org/g/tech-crypto-ext/message/856

| On Thu, Dec 22, 2022 at 8:56 PM Rich Fuhler <rfuhler@...> wrote:

| Thanks much, Andrew. Makes perfect sense.

| -rich

| -------- Original message --------
| From: Andrew Waterman <andrew@...>
| Date: 12/22/22 7:25 PM (GMT-08:00)
| To: Rich Fuhler <rfuhler@...>
| Cc: tech-vector-ext@...
| Subject: Re: [RISC-V] [tech-vector-ext] Overlapping P and V opcodes

| Hey Rich,

| Although I'm not Krste, I'll inject my two cents.  I think there are
| two separate issues here:

| - The immediate one.  Formally at RVIA, the P extension is a proposal,
| not yet a standard.  So, vendors who have implemented the proposal are
| effectively implementing what the spec refers to as a non-conforming
| extension (because it uses standard encoding space despite not being a
| standard).  This is a perfectly legitimate thing to do.  Assuming Zvk*
| is standardized as proposed, then the usual technique for resolving
| conflicts between a standard extension and a non-conforming extension
| applies: a custom mode bit (specified by the vendor) can be used to
| enable the non-conforming extension.  When the custom mode bit enables
| the non-conforming extension, the processor does not implement Zvk*. 
| At the opposite polarity, the processor might implement Zvk*.

| - The longer-term one.  When P (or more likely Zp*) is standardized,
| RVIA could choose to define analogous mode bits (e.g., as part of the
| *envcfg CSRs).  This would make sense to do as part of the
| standardization of Zp*, rather than as a preemptive measure, because
| we don't yet know with certainty which opcodes Zp* will use.

| HTH,
| Andrew

| On Thu, Dec 22, 2022 at 4:36 PM Rich Fuhler <rfuhler@...>
| wrote:

| Hi Krste,

| At the summit, you mentioned that the V opcode space would be
| extended by using P opcodes since an implementation would not have
| both P and V. Andes vector cores do have the P extension and we
| would need the ability to determine in a multitasking scenario,
| which unit was in use by a thread. Is there one CSR that has this
| information in it?

| -rich



Re: Overlapping P and V opcodes

Andrew Waterman
 



On Fri, Dec 23, 2022 at 12:06 AM Allen Baum <allen.baum@...> wrote:
I actually interpreted Krste's statement differently. Not necessarily correctly.
It sounded to me like there wasn't enough room in the current vector encoding space to fit possible extensions, 
so the thought was to *extend* into the P encoding space, e.g. any unused opcodes under the P-extension major opcode.

Given what I know about the number of P-extension opcodes (from memory), the number of unused opcodes might be pretty tiny, though.
That makes it less likely my interpretation would be correct.

Yeah, Rich's interpretation was right; see https://lists.riscv.org/g/tech-crypto-ext/message/856


On Thu, Dec 22, 2022 at 8:56 PM Rich Fuhler <rfuhler@...> wrote:
Thanks much, Andrew. Makes perfect sense.



-rich



-------- Original message --------
From: Andrew Waterman <andrew@...>
Date: 12/22/22 7:25 PM (GMT-08:00)
To: Rich Fuhler <rfuhler@...>
Subject: Re: [RISC-V] [tech-vector-ext] Overlapping P and V opcodes

Hey Rich,

Although I'm not Krste, I'll inject my two cents.  I think there are two separate issues here:

- The immediate one.  Formally at RVIA, the P extension is a proposal, not yet a standard.  So, vendors who have implemented the proposal are effectively implementing what the spec refers to as a non-conforming extension (because it uses standard encoding space despite not being a standard).  This is a perfectly legitimate thing to do.  Assuming Zvk* is standardized as proposed, then the usual technique for resolving conflicts between a standard extension and a non-conforming extension applies: a custom mode bit (specified by the vendor) can be used to enable the non-conforming extension.  When the custom mode bit enables the non-conforming extension, the processor does not implement Zvk*.  At the opposite polarity, the processor might implement Zvk*.

- The longer-term one.  When P (or more likely Zp*) is standardized, RVIA could choose to define analogous mode bits (e.g., as part of the *envcfg CSRs).  This would make sense to do as part of the standardization of Zp*, rather than as a preemptive measure, because we don't yet know with certainty which opcodes Zp* will use.

HTH,
Andrew

On Thu, Dec 22, 2022 at 4:36 PM Rich Fuhler <rfuhler@...> wrote:
Hi Krste,

At the summit, you mentioned that the V opcode space would be extended by using P opcodes since an implementation would not have both P and V. Andes vector cores do have the P extension and we would need the ability to determine in a multitasking scenario, which unit was in use by a thread. Is there one CSR that has this information in it?

-rich


Re: Overlapping P and V opcodes

Allen Baum
 

I actually interpreted Krste's statement differently. Not necessarily correctly.
It sounded to me like there wasn't enough room in the current vector encoding space to fit possible extensions, 
so the thought was to *extend* into the P encoding space, e.g. any unused opcodes under the P-extension major opcode.

Given what I know about the number of P-extension opcodes (from memory), the number of unused opcodes might be pretty tiny, though.
That makes it less likely my interpretation would be correct.

On Thu, Dec 22, 2022 at 8:56 PM Rich Fuhler <rfuhler@...> wrote:
Thanks much, Andrew. Makes perfect sense.



-rich



-------- Original message --------
From: Andrew Waterman <andrew@...>
Date: 12/22/22 7:25 PM (GMT-08:00)
To: Rich Fuhler <rfuhler@...>
Subject: Re: [RISC-V] [tech-vector-ext] Overlapping P and V opcodes

Hey Rich,

Although I'm not Krste, I'll inject my two cents.  I think there are two separate issues here:

- The immediate one.  Formally at RVIA, the P extension is a proposal, not yet a standard.  So, vendors who have implemented the proposal are effectively implementing what the spec refers to as a non-conforming extension (because it uses standard encoding space despite not being a standard).  This is a perfectly legitimate thing to do.  Assuming Zvk* is standardized as proposed, then the usual technique for resolving conflicts between a standard extension and a non-conforming extension applies: a custom mode bit (specified by the vendor) can be used to enable the non-conforming extension.  When the custom mode bit enables the non-conforming extension, the processor does not implement Zvk*.  At the opposite polarity, the processor might implement Zvk*.

- The longer-term one.  When P (or more likely Zp*) is standardized, RVIA could choose to define analogous mode bits (e.g., as part of the *envcfg CSRs).  This would make sense to do as part of the standardization of Zp*, rather than as a preemptive measure, because we don't yet know with certainty which opcodes Zp* will use.

HTH,
Andrew

On Thu, Dec 22, 2022 at 4:36 PM Rich Fuhler <rfuhler@...> wrote:
Hi Krste,

At the summit, you mentioned that the V opcode space would be extended by using P opcodes since an implementation would not have both P and V. Andes vector cores do have the P extension and we would need the ability to determine in a multitasking scenario, which unit was in use by a thread. Is there one CSR that has this information in it?

-rich


Re: Overlapping P and V opcodes

Rich Fuhler
 

Thanks much, Andrew. Makes perfect sense.



-rich



-------- Original message --------
From: Andrew Waterman <andrew@...>
Date: 12/22/22 7:25 PM (GMT-08:00)
To: Rich Fuhler <rfuhler@...>
Cc: tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] Overlapping P and V opcodes

Hey Rich,

Although I'm not Krste, I'll inject my two cents.  I think there are two separate issues here:

- The immediate one.  Formally at RVIA, the P extension is a proposal, not yet a standard.  So, vendors who have implemented the proposal are effectively implementing what the spec refers to as a non-conforming extension (because it uses standard encoding space despite not being a standard).  This is a perfectly legitimate thing to do.  Assuming Zvk* is standardized as proposed, then the usual technique for resolving conflicts between a standard extension and a non-conforming extension applies: a custom mode bit (specified by the vendor) can be used to enable the non-conforming extension.  When the custom mode bit enables the non-conforming extension, the processor does not implement Zvk*.  At the opposite polarity, the processor might implement Zvk*.

- The longer-term one.  When P (or more likely Zp*) is standardized, RVIA could choose to define analogous mode bits (e.g., as part of the *envcfg CSRs).  This would make sense to do as part of the standardization of Zp*, rather than as a preemptive measure, because we don't yet know with certainty which opcodes Zp* will use.

HTH,
Andrew

On Thu, Dec 22, 2022 at 4:36 PM Rich Fuhler <rfuhler@...> wrote:
Hi Krste,

At the summit, you mentioned that the V opcode space would be extended by using P opcodes since an implementation would not have both P and V. Andes vector cores do have the P extension and we would need the ability to determine in a multitasking scenario, which unit was in use by a thread. Is there one CSR that has this information in it?

-rich


Re: Overlapping P and V opcodes

Andrew Waterman
 

Hey Rich,

Although I'm not Krste, I'll inject my two cents.  I think there are two separate issues here:

- The immediate one.  Formally at RVIA, the P extension is a proposal, not yet a standard.  So, vendors who have implemented the proposal are effectively implementing what the spec refers to as a non-conforming extension (because it uses standard encoding space despite not being a standard).  This is a perfectly legitimate thing to do.  Assuming Zvk* is standardized as proposed, then the usual technique for resolving conflicts between a standard extension and a non-conforming extension applies: a custom mode bit (specified by the vendor) can be used to enable the non-conforming extension.  When the custom mode bit enables the non-conforming extension, the processor does not implement Zvk*.  At the opposite polarity, the processor might implement Zvk*.

- The longer-term one.  When P (or more likely Zp*) is standardized, RVIA could choose to define analogous mode bits (e.g., as part of the *envcfg CSRs).  This would make sense to do as part of the standardization of Zp*, rather than as a preemptive measure, because we don't yet know with certainty which opcodes Zp* will use.
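
The mode-bit technique in the first point can be sketched as a small decode-selection function. This is only an illustrative model in Python: the function name, the string labels, and the three boolean inputs are hypothetical, and the real mode bit would be whatever the vendor (or, later, an *envcfg field) defines.

```python
def select_decoder(contested: bool, mode_bit: bool, has_standard: bool) -> str:
    """Decide which extension owns an instruction encoding.

    contested    -- the encoding is claimed by both the standard extension
                    (e.g. Zvk*) and the non-conforming one (e.g. draft P)
    mode_bit     -- hypothetical custom mode bit; True enables the
                    non-conforming extension
    has_standard -- whether the hart implements the standard extension at
                    the opposite polarity (it "might", per the text)
    """
    if not contested:
        # No conflict: the mode bit is irrelevant to this encoding.
        return "unaffected"
    if mode_bit:
        # Non-conforming extension enabled; the standard one is not implemented.
        return "non-conforming"
    return "standard" if has_standard else "illegal"


for bit in (True, False):
    print(bit, select_decoder(contested=True, mode_bit=bit, has_standard=True))
```

In a multitasking system the mode bit would presumably be part of the per-thread context, so that the operating system can tell which unit a thread is using.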

HTH,
Andrew


On Thu, Dec 22, 2022 at 4:36 PM Rich Fuhler <rfuhler@...> wrote:
Hi Krste,

At the summit, you mentioned that the V opcode space would be extended by using P opcodes since an implementation would not have both P and V. Andes vector cores do have the P extension and we would need the ability to determine in a multitasking scenario, which unit was in use by a thread. Is there one CSR that has this information in it?

-rich


Overlapping P and V opcodes

Rich Fuhler
 

Hi Krste,

At the summit, you mentioned that the V opcode space would be extended by using P opcodes since an implementation would not have both P and V. Andes vector cores do have the P extension and we would need the ability to determine in a multitasking scenario, which unit was in use by a thread. Is there one CSR that has this information in it?

-rich


Re: [RISC-V][tech-rvv-intrinsics] RISC-V V C Intrinsic API v1.0 release meeting reminder (November 28th, 2022)

eop Chen
 

On November 28, 2022, at 1:04 AM, eop Chen via lists.riscv.org <eop.chen=sifive.com@...> wrote:

Hi all,

A reminder that the next open meeting to discuss the RISC-V V C Intrinsic API v1.0 release will
be held on 2022/11/28 at 6AM (GMT -7) / 11PM (GMT +8).

For folks in Asia, please note that the daylight saving time change makes the meeting one hour later than the original time.

The agenda can be found on the second page of the meeting slides (link).
Please join the calendar to receive notifications: Google Calendar link, iCal.
We also now have a mailing list hosted by RISC-V International (link).

Regards,

eop Chen


RISC-V V C Intrinsic API v1.0 release meeting reminder (November 28th, 2022)

eop Chen
 

Hi all,

A reminder that the next open meeting to discuss the RISC-V V C Intrinsic API v1.0 release will
be held on 2022/11/28 at 6AM (GMT -7) / 11PM (GMT +8).

For folks in Asia, please note that the daylight saving time change makes the meeting one hour later than the original time.

The agenda can be found on the second page of the meeting slides (link).
Please join the calendar to receive notifications: Google Calendar link, iCal.
We also now have a mailing list hosted by RISC-V International (link).

Regards,

eop Chen


Re: Fix for omission in vector spec RVV 1.0 around source/dest overlap

Allen Baum
 

I'm going to have to eat my words (in tiny bites).
Krste is correct that implementations of the earlier spec version will remain spec compatible.
Architectural Compatibility tests would report both tail-undisturbed and tail-agnostic implementations to be so.

But, SW will not necessarily be compatible with the newer version; software written to not depend on a specific behavior will.
So compatibility must be defined carefully here.

As Guy, and others have said, there is little SW available at this time, and it is highly unlikely that any of it will be affected by this specific case.
But, as Guy points out, while the original behavior seems unusable, that doesn't mean that it couldn't be taken advantage of in a useful way.

So it is SW that needs to be cognizant of this, not so much the HW implementation.
Naming this as a profile requirement (i.e. specifying old or new behavior) is one way
to ensure that SW won't be blindsided (however unlikely that will really be).

From a selfish point of view, the existence of agnostic (specifically, that the resulting value has a non-deterministic set of two possible answers) makes our standard testing methodology difficult, but this isn't the worst of the non-deterministic cases.
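
To make the testing difficulty concrete, here is a minimal Python sketch of a checker that accepts either permitted outcome for each tail element. It assumes the RVV 1.0 definition of agnostic, under which each affected element either keeps its previous value or is overwritten with all 1s; the function name and the list-of-integers register model are hypothetical simplifications.

```python
def tail_matches_policy(old, new, vl, sew_bits, tail_agnostic):
    """Check the tail elements (index >= vl) of a destination register.

    old, new      -- element values before and after the instruction
    vl            -- number of body elements actually written
    sew_bits      -- element width in bits
    tail_agnostic -- True for tail-agnostic, False for tail-undisturbed
    """
    all_ones = (1 << sew_bits) - 1
    for i in range(vl, len(old)):
        kept = new[i] == old[i]
        # Agnostic additionally permits an all-1s overwrite, per element.
        filled = tail_agnostic and new[i] == all_ones
        if not (kept or filled):
            return False
    return True


old = [1, 2, 3, 4]
mixed = [9, 9, 3, 0xFF]  # one tail element kept, one filled with 1s
print(tail_matches_policy(old, mixed, vl=2, sew_bits=8, tail_agnostic=True))
print(tail_matches_policy(old, mixed, vl=2, sew_bits=8, tail_agnostic=False))
```

Because the choice is allowed per element, a test cannot compare against a single golden value; it has to accept the whole set of legal results, as above.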

On Tue, Nov 22, 2022 at 3:29 PM Guy Lemieux <guy.lemieux@...> wrote:
Changing the spec does not make implementations which honour the old
spec non-compliant. This is the first compatibility check.

However, changing the spec does make software that depends on that
behaviour of the old spec incompatible with implementations of the new
spec. In this sense, the specification change is not
backward-compatible with software.

There is not a lot of software out in the wild, so I don't think we
would be breaking very much.

First, I'd like to note there are two parts of the original register
that we are talking about:
(1) the non-written elements (left behind because the new EEW is smaller), and
(2) the tail (which would be ignored because of VL at the original EEW).

The problem with EEW-changing instructions where source/dest overlap
is in region (1). Normally, since EEW is smaller, this means the upper
1/2, 3/4, or 7/8 elements of the source vector register would be
non-written elements. (For the widening reductions, then there would
be even more non-written elements.)
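
The fractions quoted above follow from simple arithmetic, sketched here in Python. The sketch assumes the source and destination hold the same number of elements and start at the same register, so the narrower destination covers only dst_eew/src_eew of the source's bytes; the function name is hypothetical.

```python
from fractions import Fraction


def non_written_fraction(src_eew: int, dst_eew: int) -> Fraction:
    """Fraction of the source register left unwritten when the destination
    overlaps it with a smaller EEW (same element count assumed)."""
    if dst_eew <= 0 or src_eew % dst_eew != 0:
        raise ValueError("dst_eew must be a positive divisor of src_eew")
    return 1 - Fraction(dst_eew, src_eew)


# Narrowing a 64-bit source by factors of 2, 4, and 8.
for dst in (32, 16, 8):
    print(dst, non_written_fraction(64, dst))
```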

However, I don't really like this proposal because a few things come to mind:

(a) This introduces an inconsistency in the spec, which programmers
have no reason to expect. The reason for the inconsistency is "to make
implementations easier", which programmers do not fully understand
(and therefore do not expect). From a programmer's perspective, there
is a mode setting that says undisturbed, so implementations should
honour it, not choose whichever mode is easier depending upon the
instruction. I don't like the idea of introducing such non-predictable
inconsistencies, as they tend to cause debugging nightmares.

(b) The undisturbed mode gives the potential to use this feature as a
software-managed data cache (to reduce data fetches, either for
performance, power, or because the data is volatile). Although the
lower element(s) may get clobbered, there may be value in preserving
the upper element(s) which is based on the exact same argument for
having the undisturbed mode in the first place with regular
(EEW-preserving) instructions.

(c) Programmers already have the ability to choose the agnostic mode
for performance, so it should not matter that the undisturbed mode of
these EEW-changing instructions might run more slowly to copy or
rearrange the data.

(d) This seems to be asking for permission to be excused from the
really hard part of the homework because it's hard (or slow).

An alternative way to handle this is to add a profile/platform
allowance to the spec permitting implementations that are always agnostic
for all instructions (probably for both tails and masks). This would
simplify HPC-oriented implementations, and allow complete consistency
(aka architectural uniformity/predictability) within that
profile/platform. The biggest issue is how to handle the resulting
schism in software?

Thanks,
Guy


On Tue, Nov 22, 2022 at 1:14 PM Krste Asanovic <krste@...> wrote:
>
> Existing implementations of the ISA remain compatible - this text is correct and does not need to change.
> Yes, software could see the difference with the change, but outside of verification suites, this is not going to happen.
>
> I’d ask folks to go and understand the actual case that is now prohibited before proposing we search for it in software or take other more drastic actions.
> It is really not something software would do, so the effort would be wasted.
>
> That this case was missed when we were restricting other forms of EEW-mismatch overlap was an error.
> It should have been caught earlier, but the fix is benign.
>
> Krste
>
> On Nov 22, 2022, at 1:01 PM, Philip Reames <preames@...> wrote:
>
> Allen,
>
> Sounds like you agree that this isn't strictly compatible with 1.0, and we're now debating what to do about it.  Is that correct?
>
> Has there been any work done to assess whether the relevant bits of assembly appear in existing binaries?  I see a claim made here, but no evidence given.  I am neither agreeing nor disagreeing with the claim - I haven't done the work to form an opinion.  Has anyone else done that work in a form they can summarize and share?
>
> Philip
>
> On 11/22/22 12:19, Allen Baum wrote:
>
> There is some precedent for this case, specifically in the priv spec 1.10->1.11 preface:
>
> The following changes have been made since version 1.11, which, while not strictly backwards compatible,
> are not anticipated to cause software portability problems in practice:
>
> The rationale for this "clarification" explicitly says this changes the cases that "can not be sensibly used by application software", which is the key.
> So, the assertion here is that it is highly unlikely that there is any code in the wild that would take advantage of this "clarified" behavior.
>
> I would agree that the wording
>     "IMPORTANT: The proposed fix does not break compatibility of implementations adhering to the ratified v1.0 spec."
> is too strong. It seems like it would be more accurate to say
>     "IMPORTANT: The proposed fix is unlikely to break software compatibility of implementations adhering to the ratified v1.0 spec."
>
> On Tue, Nov 22, 2022 at 10:55 AM Philip Reames <preames@...> wrote:
>>
>> Krste,
>>
>> I am confused how this proposed change does not break compatibility with
>> the 1.0 vector spec.  If there's a bit of code in the wild which can
>> witness and rely upon the old behavior, doesn't the new restriction make
>> that bit of assembly non-compliant with the proposed spec version?
>>
>> I do accept that the proposed spec allows a subset of the legal assembly
>> programs that the old one does.  My point of confusion is how that can
>> claim to be compatible when there are assembly programs which are well
>> defined under the old spec, and yet not under the new spec.   Your point
>> below seems to address how hardware which implemented the v1.0 spec is
>> compatible with the spec after the proposed change, but I don't see the
>> same for software.  That is, this doesn't seem compatible with software
>> written to the old spec.
>>
>> Yours,
>> Philip Reames
>>
>> On 11/1/22 23:51, Krste Asanovic wrote:
>> > A few issues have been identified in corners of the vector spec.
>> >
>> > The first change was an error of omission in not catching some cases
>> > of source and destination register overlap that can not be sensibly
>> > used by application software, but which add complexity for
>> > implementations that internally rearrange data based on EEW.
>> >
>> > The problematic case is when source and destination overlap but have
>> > different EEW, and the instruction is mask-undisturbed or
>> > tail-undisturbed.  This case does not have a real use in software, as
>> > the elements being left undisturbed are a different EEW than the new
>> > elements being written.  This operation requires that the same
>> > architectural register is treated as two different EEWs by one
>> > instruction, which adds considerable complexity to implementations
>> > that rearrange data internally based on EEW for no benefit.
>> >
>> > Proposed addition is:
>> >
>> > "when source and destination registers overlap and have mismatched
>> > EEW, the instruction is mask- and tail-agnostic, regardless of vta and
>> > mta".
>> >
>> > The proposed solution defines this case as always agnostic so existing
>> > implementations can continue to work as before (e.g., implementing
>> > undisturbed when requested), while not burdening implementations that
>> > rearrange data internally.  The assertion is that no software would
>> > rely on the undisturbed behavior in this case.
>> >
>> > Note, this also applies to widening reductions.
>> >
>> > IMPORTANT: The proposed fix does not break compatibility of
>> > implementations adhering to the ratified v1.0 spec.
>> >
>> > The proposal is to add this to the vector spec as a bug fix.
>> >
>> > Krste
>> >
>> >
>> >
>> >
>> >
>>
>>
>>
>>
>>
>
>






Re: Fix for omission in vector spec RVV 1.0 around source/dest overlap

Nick Knight
 

I too was concerned about potential software issues when this thread began a few weeks ago. As a SW person at a HW company who has been developing against churning specs for several years, this is a sensitive subject. So, I did some research.

First, I grepped through all the RVV codes in SiFive's repos and upstream contributions (that I'm aware of). I could not find a single compatibility issue. (Plenty of destructive mixed-EEW examples, but none that rely on undisturbed behavior.)

Next, I asked some GCC and LLVM developers, who confirmed that neither of these compilers is yet able to leverage mixed-EEW src/dst overlap, regardless of mask/tail policy. So I think it highly unlikely that we would find incompatibilities in compiled code.

Lastly, I asked around the water cooler: my colleagues and I were unable to dream up a convincing use case for the old behavior.

At this point, I am no longer concerned. I feel this is a strict improvement to RVV and I support it.

Best,
Nick Knight
Algorithms & Libraries, SiFive

On Tue, Nov 22, 2022 at 6:39 PM Wei Wu (吴伟) <lazyparser@...> wrote:
Hi all,

I'm not against the proposal. I do think we should be careful/prudent.

On Wed, Nov 23, 2022 at 8:56 AM Bruce Hoult <bruce@...> wrote:
>
> >There is not a lot of software out in the wild, so I don't think we would be breaking very much.

OpenCV, OpenJDK, MLIR, NCNN and several other open source projects
have already been ported to RVV 1.0. We should be careful.

> Precisely zero, outside what people are running on emulators, yes? Or maybe in-house. There is zero RVV 1.0 hardware in the hands of end-users, and so zero software running on it.

We may still lack (enough) understanding of the progress of the RISC-V
ecosystem in China, Japan, and South Korea. There are many chip
startups in East Asia with the rapid realization of RVV as their main
competitive advantage.

Again, I'm not against the proposal. What I want to emphasize is that
we should be very careful about the consequences of changing a ratified
spec. RISC-V is an open standard, and we may have underestimated the
companies and users under the iceberg.

--
Best wishes,
Wei Wu (吴伟)






Re: Fix for omission in vector spec RVV 1.0 around source/dest overlap

Wei Wu (吴伟)
 

Hi all,

I'm not against the proposal. I do think we should be careful/prudent.

On Wed, Nov 23, 2022 at 8:56 AM Bruce Hoult <bruce@...> wrote:

> > There is not a lot of software out in the wild, so I don't think we would be breaking very much.

OpenCV, OpenJDK, MLIR, NCNN and several other open source projects
have already been ported to RVV 1.0. We should be careful.

> Precisely zero, outside what people are running on emulators, yes? Or maybe in-house. There is zero RVV 1.0 hardware in the hands of end-users, and so zero software running on it.

We may still lack (enough) understanding of the progress of the RISC-V
ecosystem in China, Japan, and South Korea. There are many chip
startups in East Asia with the rapid realization of RVV as their main
competitive advantage.

Again, I'm not against the proposal. What I want to emphasize is that
we should be very careful about the consequences of changing a ratified
spec. RISC-V is an open standard, and we may have underestimated the
companies and users under the iceberg.

--
Best wishes,
Wei Wu (吴伟)


Re: Fix for omission in vector spec RVV 1.0 around source/dest overlap

Bruce Hoult
 

>There is not a lot of software out in the wild, so I don't think we would be breaking very much.

Precisely zero, outside what people are running on emulators, yes? Or maybe in-house. There is zero RVV 1.0 hardware in the hands of end-users, and so zero software running on it.

I hear this is going to change very soon, but that's the situation right now, as I understand it.


On Wed, Nov 23, 2022 at 12:29 PM Guy Lemieux <guy.lemieux@...> wrote:
Changing the spec does not make implementations which honour the old
spec non-compliant. This is the first compatibility check.

However, changing the spec does make software that depends on that
behaviour of the old spec incompatible with implementations of the new
spec. In this sense, the specification change is not
backward-compatible with software.

There is not a lot of software out in the wild, so I don't think we
would be breaking very much.

First, I'd like to note there are two parts of the original register
that we are talking about:
(1) the non-written elements (left behind because the new EEW is smaller), and
(2) the tail (which would be ignored because of VL at the original EEW).

The problem with EEW-changing instructions where source/dest overlap
is in region (1). Normally, since EEW is smaller, this means the upper
1/2, 3/4, or 7/8 elements of the source vector register would be
non-written elements. (For the widening reductions, then there would
be even more non-written elements.)

However, I don't really like this proposal because a few things come to mind:

(a) This introduces an inconsistency in the spec, which programmers
have no reason to expect. The reason for the inconsistency is "to make
implementations easier", which programmers do not fully understand
(and therefore do not expect). From a programmer's perspective, there
is a mode setting that says undisturbed, so implementations should
honour it, not choose whichever mode is easier depending upon the
instruction. I don't like the idea of introducing such non-predictable
inconsistencies, as they tend to cause debugging nightmares.

(b) The undisturbed mode gives the potential to use this feature as a
software-managed data cache (to reduce data fetches, either for
performance, power, or because the data is volatile). Although the
lower element(s) may get clobbered, there may be value in preserving
the upper element(s) which is based on the exact same argument for
having the undisturbed mode in the first place with regular
(EEW-preserving) instructions.

(c) Programmers already have the ability to choose the agnostic mode
for performance, so it should not matter that the undisturbed mode of
these EEW-changing instructions might run more slowly to copy or
rearrange the data.

(d) This seems to be asking for permission to be excused from the
really hard part of the homework because it's hard (or slow).

An alternative way to handle this is to add a profile/platform
allowance to the spec for implementations which are always agnostic
for all instructions (probably for both tails and masks). This would
simplify HPC-oriented implementations, and allow complete consistency
(aka architectural uniformity/predictability) within that
profile/platform. The biggest issue is how to handle the resulting
schism in software.

Thanks,
Guy


On Tue, Nov 22, 2022 at 1:14 PM Krste Asanovic <krste@...> wrote:
>
> Existing implementations of the ISA remain compatible - this text is correct and does not need to change.
> Yes, software could see the difference with the change, but outside of verification suites, this is not going to happen.
>
> I’d ask folks to go and understand the actual case that is now prohibited before proposing we search for it in software or take other more drastic actions.
> It is really not something software would do, so the effort would be wasted.
>
> That this case was missed when we were restricting other forms of EEW-mismatch overlap was an error.
> It should have been caught earlier, but the fix is benign.
>
> Krste
>
> On Nov 22, 2022, at 1:01 PM, Philip Reames <preames@...> wrote:
>
> Allen,
>
> Sounds like you agree that this isn't strictly compatible with 1.0, and we're now debating what to do about it.  Is that correct?
>
> Has there been any work done to assess whether the relevant bits of assembly appear in existing binaries?  I see a claim made here, but no evidence given.  I am neither agreeing nor disagreeing with the claim - I haven't done the work to form an opinion.  Has anyone else done that work in a form they can summarize and share?
>
> Philip
>
> On 11/22/22 12:19, Allen Baum wrote:
>
> There is some precedent for this case, specifically in the priv spec 1.10->1.11 preface:
>
> The following changes have been made since version 1.11, which, while not strictly backwards compatible,
> are not anticipated to cause software portability problems in practice:
>
> The rationale for this "clarification" explicitly says this changes the cases that "can not be sensibly used by application software", which is the key.
> So, the assertion here is that it is highly unlikely that there is any code in the wild that would take advantage of this "clarified" behavior.
>
> I would agree that the wording
>     "IMPORTANT: The proposed fix does not break compatibility of implementations adhering to the ratified v1.0 spec."
> is too strong. It seems like it would be more accurate to say
>     "IMPORTANT: The proposed fix is unlikely to break software compatibility of implementations adhering to the ratified v1.0 spec."
>
> On Tue, Nov 22, 2022 at 10:55 AM Philip Reames <preames@...> wrote:
>>
>> Krste,
>>
>> I am confused how this proposed change does not break compatibility with
>> the 1.0 vector spec.  If there's a bit of code in the wild which can
>> witness and rely upon the old behavior, doesn't the new restriction make
>> that bit of assembly non-compliant with the proposed spec version?
>>
>> I do accept that the proposed spec allows a subset of the legal assembly
>> programs that the old one does.  My point of confusion is how that can
>> claim to be compatible when there are assembly programs which are well
>> defined under the old spec, and yet not under the new spec.   Your point
>> below seems to address how hardware which implemented the v1.0 spec is
>> compatible with the spec after the proposed change, but I don't see the
>> same for software.  That is, this doesn't seem compatible with software
>> written to the old spec.
>>
>> Yours,
>> Philip Reames
>>
>> On 11/1/22 23:51, Krste Asanovic wrote:
>> > A few issues have been identified in corners of the vector spec.
>> >
>> > The first change was an error of omission in not catching some cases
>> > of source and destination register overlap that can not be sensibly
>> > used by application software, but which add complexity for
>> > implementations that internally rearrange data based on EEW.
>> >
>> > The problematic case is when source and destination overlap but have
>> > different EEW, and the instruction is mask-undisturbed or
>> > tail-undisturbed.  This case does not have a real use in software, as
>> > the elements being left undisturbed are a different EEW than the new
>> > elements being written.  This operation requires that the same
>> > architectural register is treated as two different EEWs by one
>> > instruction, which adds considerable complexity to implementations
>> > that rearrange data internally based on EEW for no benefit.
>> >
>> > Proposed addition is:
>> >
>> > "when source and destination registers overlap and have mismatched
>> > EEW, the instruction is mask- and tail-agnostic, regardless of vta and
>> > vma".
>> >
>> > The proposed solution defines this case as always agnostic so existing
>> > implementations can continue to work as before (e.g., implementing
>> > undisturbed when requested), while not burdening implementations that
>> > rearrange data internally.  The assertion is that no software would
>> > rely on the undisturbed behavior in this case.
>> >
>> > Note, this also applies to widening reductions.
>> >
>> > IMPORTANT: The proposed fix does not break compatibility of
>> > implementations adhering to the ratified v1.0 spec.
>> >
>> > The proposal is to add this to the vector spec as a bug fix.
>> >
>> > Krste






Re: Fix for omission in vector spec RVV 1.0 around source/dest overlap

Guy Lemieux
 

Changing the spec does not make implementations which honour the old
spec non-compliant. This is the first compatibility check.

However, changing the spec does make software that depends on that
behaviour of the old spec incompatible with implementations of the new
spec. In this sense, the specification change is not
backward-compatible with software.
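
The two compatibility checks described above can be made concrete with a small model. What follows is a minimal sketch, not text from the spec: it treats each spec version as the set of outcomes an implementation is allowed to produce for elements marked undisturbed in the mismatched-EEW overlap case. The names `OLD_SPEC`, `NEW_SPEC`, `hw_compliant`, and `sw_portable` are hypothetical, and the outcome labels are a simplification (in RVV 1.0, agnostic permits either retaining the old value or writing all 1s).

```python
# Allowed outcomes for elements that the policy bits mark as undisturbed,
# restricted to the mismatched-EEW source/destination overlap case.
OLD_SPEC = frozenset({"preserved"})              # v1.0 text: undisturbed is honoured
NEW_SPEC = frozenset({"preserved", "all_ones"})  # proposal: treated as agnostic


def hw_compliant(impl_outcomes, spec):
    """An implementation complies if everything it can do is allowed."""
    return set(impl_outcomes) <= spec


def sw_portable(accepted_outcomes, spec):
    """Software is portable if it tolerates everything the spec allows."""
    return spec <= set(accepted_outcomes)


v1_hardware = {"preserved"}       # hypothetical core built to the v1.0 text
relying_software = {"preserved"}  # hypothetical code that needs undisturbed

print("v1.0 hardware complies with new spec:", hw_compliant(v1_hardware, NEW_SPEC))
print("relying software portable under old spec:", sw_portable(relying_software, OLD_SPEC))
print("relying software portable under new spec:", sw_portable(relying_software, NEW_SPEC))
```

Under these assumptions the new spec only enlarges the set of allowed outcomes, which is why existing hardware stays compliant while any software that depends on the preserved value does not.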

There is not a lot of software out in the wild, so I don't think we
would be breaking very much.

First, I'd like to note there are two parts of the original register
that we are talking about:
(1) the non-written elements (left behind because the new EEW is smaller), and
(2) the tail (which would be ignored because of VL at the original EEW).

The problem with EEW-changing instructions where source/dest overlap
is in region (1). Normally, since EEW is smaller, this means the upper
1/2, 3/4, or 7/8 elements of the source vector register would be
non-written elements. (For the widening reductions, then there would
be even more non-written elements.)
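
The 1/2, 3/4, and 7/8 figures above follow from the ratio of destination EEW to source EEW. A minimal sketch of that arithmetic, assuming the same number of elements is read and written and that the fraction is measured against the bytes of the source register group (the helper name `non_written_fraction` is hypothetical):

```python
from fractions import Fraction


def non_written_fraction(src_eew, dst_eew):
    """Fraction of the source bytes not overwritten by a narrower result."""
    if dst_eew >= src_eew:
        raise ValueError("expected a narrowing operation (dst_eew < src_eew)")
    return 1 - Fraction(dst_eew, src_eew)


# Narrowing to 8-bit results from 16-, 32-, and 64-bit sources.
for src, dst in [(16, 8), (32, 8), (64, 8)]:
    print(f"EEW {src} -> {dst}: {non_written_fraction(src, dst)} left non-written")
```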

However, I don't really like this proposal because a few things come to mind:

(a) This introduces an inconsistency in the spec, which programmers
have no reason to expect. The reason for the inconsistency is "to make
implementations easier", which programmers do not fully understand
(and therefore do not expect). From a programmer's perspective, there
is a mode setting that says undisturbed, so implementations should
honour it, not choose whichever mode is easier depending upon the
instruction. I don't like the idea of introducing such non-predictable
inconsistencies, as they tend to cause debugging nightmares.

(b) The undisturbed mode gives the potential to use this feature as a
software-managed data cache (to reduce data fetches, either for
performance, power, or because the data is volatile). Although the
lower element(s) may get clobbered, there may be value in preserving
the upper element(s) which is based on the exact same argument for
having the undisturbed mode in the first place with regular
(EEW-preserving) instructions.

(c) Programmers already have the ability to choose the agnostic mode
for performance, so it should not matter that the undisturbed mode of
these EEW-changing instructions might run more slowly to copy or
rearrange the data.

(d) This seems to be asking for permission to be excused from the
really hard part of the homework because it's hard (or slow).

An alternative way to handle this is to add a profile/platform
allowance to the spec for implementations which are always agnostic
for all instructions (probably for both tails and masks). This would
simplify HPC-oriented implementations, and allow complete consistency
(aka architectural uniformity/predictability) within that
profile/platform. The biggest issue is how to handle the resulting
schism in software.

Thanks,
Guy

On Tue, Nov 22, 2022 at 1:14 PM Krste Asanovic <krste@...> wrote:

Existing implementations of the ISA remain compatible - this text is correct and does not need to change.
Yes, software could see the difference with the change, but outside of verification suites, this is not going to happen.

I’d ask folks to go and understand the actual case that is now prohibited before proposing we search for it in software or take other more drastic actions.
It is really not something software would do, so the effort would be wasted.

That this case was missed when we were restricting other forms of EEW-mismatch overlap was an error.
It should have been caught earlier, but the fix is benign.

Krste

On Nov 22, 2022, at 1:01 PM, Philip Reames <preames@...> wrote:

Allen,

Sounds like you agree that this isn't strictly compatible with 1.0, and we're now debating what to do about it. Is that correct?

Has there been any work done to assess whether the relevant bits of assembly appear in existing binaries? I see a claim made here, but no evidence given. I am neither agreeing nor disagreeing with the claim - I haven't done the work to form an opinion. Has anyone else done that work in a form they can summarize and share?

Philip

On 11/22/22 12:19, Allen Baum wrote:

There is some precedent for this case, specifically in the priv spec 1.10->1.11 preface:

The following changes have been made since version 1.11, which, while not strictly backwards compatible,
are not anticipated to cause software portability problems in practice:

The rationale for this "clarification" explicitly says this changes the cases that "can not be sensibly used by application software", which is the key.
So, the assertion here is that it is highly unlikely that there is any code in the wild that would take advantage of this "clarified" behavior.

I would agree that the wording
"IMPORTANT: The proposed fix does not break compatibility of implementations adhering to the ratified v1.0 spec."
is too strong. It seems like it would be more accurate to say
"IMPORTANT: The proposed fix is unlikely to break software compatibility of implementations adhering to the ratified v1.0 spec."

On Tue, Nov 22, 2022 at 10:55 AM Philip Reames <preames@...> wrote:

Krste,

I am confused how this proposed change does not break compatibility with
the 1.0 vector spec. If there's a bit of code in the wild which can
witness and rely upon the old behavior, doesn't the new restriction make
that bit of assembly non-compliant with the proposed spec version?

I do accept that the proposed spec allows a subset of the legal assembly
programs that the old one does. My point of confusion is how that can
claim to be compatible when there are assembly programs which are well
defined under the old spec, and yet not under the new spec. Your point
below seems to address how hardware which implemented the v1.0 spec is
compatible with the spec after the proposed change, but I don't see the
same for software. That is, this doesn't seem compatible with software
written to the old spec.

Yours,
Philip Reames

On 11/1/22 23:51, Krste Asanovic wrote:
A few issues have been identified in corners of the vector spec.

The first change was an error of omission in not catching some cases
of source and destination register overlap that can not be sensibly
used by application software, but which add complexity for
implementations that internally rearrange data based on EEW.

The problematic case is when source and destination overlap but have
different EEW, and the instruction is mask-undisturbed or
tail-undisturbed. This case does not have a real use in software, as
the elements being left undisturbed are a different EEW than the new
elements being written. This operation requires that the same
architectural register is treated as two different EEWs by one
instruction, which adds considerable complexity to implementations
that rearrange data internally based on EEW for no benefit.

Proposed addition is:

"when source and destination registers overlap and have mismatched
EEW, the instruction is mask- and tail-agnostic, regardless of vta and
vma".

The proposed solution defines this case as always agnostic so existing
implementations can continue to work as before (e.g., implementing
undisturbed when requested), while not burdening implementations that
rearrange data internally. The assertion is that no software would
rely on the undisturbed behavior in this case.
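
To illustrate why the undisturbed elements are of little use here, the following is a simplified byte-level sketch, not a model of any real instruction. It assumes a 16-byte source register group holding eight 16-bit elements, a destination that is the low 8 bytes of that same group, and no masking. The function name `narrow_overlapping` and its flag are hypothetical. Agnostic behavior in RVV 1.0 also permits retaining the old bytes, so the all-1s case shown is only one allowed outcome.

```python
import struct


def narrow_overlapping(group, vl, tail_all_ones):
    """Narrow 16-bit elements to 8 bits, writing into the low half of the
    source group itself (source and destination overlap, EEW differs)."""
    n = len(group) // 2
    src = struct.unpack(f"<{n}H", bytes(group))  # read every source element first
    out = bytearray(group)
    dst_len = len(group) // 2                    # destination register = low half
    for i in range(dst_len):
        if i < vl:
            out[i] = src[i] & 0xFF               # body element: truncated result
        elif tail_all_ones:
            out[i] = 0xFF                        # one outcome agnostic allows
        # otherwise the old byte is left in place ("undisturbed")
    return out


group = struct.pack("<8H", 0x1100, 0x2201, 0x3302, 0x4403,
                    0x5504, 0x6605, 0x7706, 0x8807)

undisturbed = narrow_overlapping(group, vl=6, tail_all_ones=False)
agnostic = narrow_overlapping(group, vl=6, tail_all_ones=True)

print("undisturbed dest:", undisturbed[:8].hex(" "))
print("agnostic dest:   ", agnostic[:8].hex(" "))
```

In the undisturbed case the two tail bytes of the destination are simply the low and high bytes of an old 16-bit source element, so they do not form meaningful 8-bit elements of the new result.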

Note, this also applies to widening reductions.

IMPORTANT: The proposed fix does not break compatibility of
implementations adhering to the ratified v1.0 spec.

The proposal is to add this to the vector spec as a bug fix.

Krste








Re: Fix for omission in vector spec RVV 1.0 around source/dest overlap

Philip Reames
 

Krste,

Since my input is clearly not welcome, I will stop here.

Philip

On 11/22/22 14:46, Krste Asanovic wrote:

This does not need intuition.
Again - please read the actual delta to understand why software would not ever do this.
It is very clear once you understand the capability that is being removed.

Krste

On Nov 22, 2022, at 2:41 PM, Philip Reames <preames@...> wrote:

Krste,

Do you have any evidence to back up your claim that this isn't something software would do?  Or is this intuition?

I want to be clear here, I'm not arguing this claim is wrong.  I'm simply trying to understand what work has already been done here. 

Philip









Re: Fix for omission in vector spec RVV 1.0 around source/dest overlap

Krste Asanovic
 

This does not need intuition.
Again - please read the actual delta to understand why software would not ever do this.
It is very clear once you understand the capability that is being removed.

Krste

On Nov 22, 2022, at 2:41 PM, Philip Reames <preames@...> wrote:

Krste,

Do you have any evidence to back up your claim that this isn't something software would do?  Or is this intuition?

I want to be clear here, I'm not arguing this claim is wrong.  I'm simply trying to understand what work has already been done here. 

Philip









Re: Fix for omission in vector spec RVV 1.0 around source/dest overlap

Philip Reames
 

Krste,

Do you have any evidence to back up your claim that this isn't something software would do?  Or is this intuition?

I want to be clear here, I'm not arguing this claim is wrong.  I'm simply trying to understand what work has already been done here. 

Philip

On 11/22/22 13:14, Krste Asanovic wrote:

Existing implementations of the ISA remain compatible - this text is correct and does not need to change.
Yes, software could see the difference with the change, but outside of verification suites, this is not going to happen.

I’d ask folks to go and understand the actual case that is now prohibited before proposing we search for it in software or take other more drastic actions.
It is really not something software would do, so the effort would be wasted.

That this case was missed when we were restricting other forms of EEW-mismatch overlap was an error.
It should have been caught earlier, but the fix is benign.

Krste

On Nov 22, 2022, at 1:01 PM, Philip Reames <preames@...> wrote:

Allen,

Sounds like you agree that this isn't strictly compatible with 1.0, and we're now debating what to do about it.  Is that correct?

Has there been any work done to assess whether the relevant bits of assembly appear in existing binaries?  I see a claim made here, but no evidence given.  I am neither agreeing nor disagreeing with the claim - I haven't done the work to form an opinion.  Has anyone else done that work in a form they can summarize and share?

Philip

On 11/22/22 12:19, Allen Baum wrote:
There is some precedent for this case, specifically in the priv spec 1.10->1.11 preface:

The following changes have been made since version 1.11, which, while not strictly backwards compatible,
are not anticipated to cause software portability problems in practice:

The rationale for this "clarification" explicitly says this changes the cases that "can not be sensibly used by application software", which is the key.
So, the assertion here is that it is highly unlikely that there is any code in the wild that would take advantage of this "clarified" behavior.

I would agree that the wording 
    "IMPORTANT: The proposed fix does not break compatibility of implementations adhering to the ratified v1.0 spec."
is too strong. It seems like it would be more accurate to say 
    "IMPORTANT: The proposed fix is unlikely to break software compatibility of implementations adhering to the ratified v1.0 spec."

On Tue, Nov 22, 2022 at 10:55 AM Philip Reames <preames@...> wrote:
Krste,

I am confused how this proposed change does not break compatibility with
the 1.0 vector spec.  If there's a bit of code in the wild which can
witness and rely upon the old behavior, doesn't the new restriction make
that bit of assembly non-compliant with the proposed spec version?

I do accept that the proposed spec allows a subset of the legal assembly
programs that the old one does.  My point of confusion is how that can
claim to be compatible when there are assembly programs which are well
defined under the old spec, and yet not under the new spec.   Your point
below seems to address how hardware which implemented the v1.0 spec is
compatible with the spec after the proposed change, but I don't see the
same for software.  That is, this doesn't seem compatible with software
written to the old spec.

Yours,
Philip Reames

On 11/1/22 23:51, Krste Asanovic wrote:
> A few issues have been identified in corners of the vector spec.
>
> The first change was an error of omission in not catching some cases
> of source and destination register overlap that can not be sensibly
> used by application software, but which add complexity for
> implementations that internally rearrange data based on EEW.
>
> The problematic case is when source and destination overlap but have
> different EEW, and the instruction is mask-undisturbed or
> tail-undisturbed.  This case does not have a real use in software, as
> the elements being left undisturbed are a different EEW than the new
> elements being written.  This operation requires that the same
> architectural register is treated as two different EEWs by one
> instruction, which adds considerable complexity to implementations
> that rearrange data internally based on EEW for no benefit.
>
> Proposed addition is:
>
> "when source and destination registers overlap and have mismatched
> EEW, the instruction is mask- and tail-agnostic, regardless of vta and
> vma".
>
> The proposed solution defines this case as always agnostic so existing
> implementations can continue to work as before (e.g., implementing
> undisturbed when requested), while not burdening implementations that
> rearrange data internally.  The assertion is that no software would
> rely on the undisturbed behavior in this case.
>
> Note, this also applies to widening reductions.
>
> IMPORTANT: The proposed fix does not break compatibility of
> implementations adhering to the ratified v1.0 spec.
>
> The proposal is to add this to the vector spec as a bug fix.
>
> Krste
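
As a rough illustration of the proposed rule and of why existing hardware stays compliant, the following is a minimal Python sketch, not spec text. The names Policy, effective_policy, and agnostic_allows are hypothetical and exist only for this example. It assumes the usual vector-spec meaning of "agnostic": a destination element may either keep its previous value or be overwritten with all 1s.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical container for the policy an instruction is held to."""
    tail_agnostic: bool
    mask_agnostic: bool


def effective_policy(vta: bool, vma: bool, overlap: bool,
                     src_eew: int, dst_eew: int) -> Policy:
    """Apply the proposed rule: overlapping source/destination registers
    with mismatched EEW are treated as agnostic regardless of vta/vma."""
    if overlap and src_eew != dst_eew:
        return Policy(tail_agnostic=True, mask_agnostic=True)
    return Policy(tail_agnostic=vta, mask_agnostic=vma)


def agnostic_allows(old: int, result: int, eew: int) -> bool:
    """Under an agnostic policy an element may keep its old value
    or be overwritten with all 1s (assumption stated in the lead-in)."""
    all_ones = (1 << eew) - 1
    return result in (old, all_ones)


# Overlap with mismatched EEW: undisturbed was requested, agnostic applies.
print(effective_policy(vta=False, vma=False, overlap=True,
                       src_eew=16, dst_eew=32))

# Hardware that still leaves the element undisturbed produces an
# allowed agnostic result, so it remains compliant.
print(agnostic_allows(old=0x1234, result=0x1234, eew=16))
```

The second check shows the hardware side of the argument: an implementation that keeps performing undisturbed writes is still within what agnostic permits. It does not address the software-side question raised later in the thread, namely whether any existing code relies on the undisturbed result.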