Fix for omission in vector spec RVV 1.0 around source/dest overlap
A few issues have been identified in corners of the vector spec.
The first change was an error of omission in not catching some cases
of source and destination register overlap that can not be sensibly
used by application software, but which add complexity for
implementations that internally rearrange data based on EEW.
The problematic case is when source and destination overlap but have
different EEW, and the instruction is mask-undisturbed or
tail-undisturbed. This case does not have a real use in software, as
the elements being left undisturbed are a different EEW than the new
elements being written. This operation requires that the same
architectural register is treated as two different EEWs by one
instruction, which adds considerable complexity to implementations
that rearrange data internally based on EEW for no benefit.
Proposed addition is:
"when source and destination registers overlap and have mismatched
EEW, the instruction is mask- and tail-agnostic, regardless of vta and
vma".
The proposed solution defines this case as always agnostic so existing
implementations can continue to work as before (e.g., implementing
undisturbed when requested), while not burdening implementations that
rearrange data internally. The assertion is that no software would
rely on the undisturbed behavior in this case.
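To make the case concrete, here is a minimal Python sketch of the situation described above. It is a toy model, not spec text: the 64-bit register width, the function name `narrow_overlapping`, and the sample data are all assumptions made for illustration. It models a narrowing operation (16-bit source elements, 8-bit results) whose destination is the lowest-numbered register of the source group, and compares a tail-undisturbed result with one outcome that a tail-agnostic result is allowed to have.

```python
VLEN_BYTES = 8  # assumed toy register width (VLEN = 64 bits), chosen only to keep the dump short


def narrow_overlapping(src_group: bytes, vl: int, tail_undisturbed: bool) -> bytes:
    """Hypothetical model: narrow 16-bit source elements to 8 bits, writing the
    result over the first register of the source group. Returns that register."""
    # Read every active source element (EEW=16, little-endian) before writing.
    active = [int.from_bytes(src_group[2 * i:2 * i + 2], "little") for i in range(vl)]

    # The destination register starts out holding the low half of the source group.
    dest = bytearray(src_group[:VLEN_BYTES])
    for i, value in enumerate(active):
        dest[i] = value & 0xFF  # body elements, written at EEW=8

    if not tail_undisturbed:
        # Agnostic tails may be left alone or filled with all ones; this model picks all ones.
        for i in range(vl, VLEN_BYTES):
            dest[i] = 0xFF
    # Otherwise the tail keeps its old bytes, which were laid out as EEW=16 source data.
    return bytes(dest)


# Eight 16-bit source elements spanning two registers; element i is ((i + 1) * 0x11) << 8 | i.
group = b"".join((((i + 1) * 0x11) << 8 | i).to_bytes(2, "little") for i in range(8))

print(narrow_overlapping(group, vl=4, tail_undisturbed=True).hex(" "))
print(narrow_overlapping(group, vl=4, tail_undisturbed=False).hex(" "))
```

In the undisturbed result, the four tail bytes are the raw bytes of 16-bit source elements 2 and 3, so a single register ends up holding 8-bit results next to leftover 16-bit data. That is the mixed-EEW state the proposal lets implementations avoid.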
Note, this also applies to widening reductions.
IMPORTANT: The proposed fix does not break compatibility of
implementations adhering to the ratified v1.0 spec.
The proposal is to add this to the vector spec as a bug fix.
Krste
Krste,
I am confused how this proposed change does not break compatibility with the 1.0 vector spec. If there's a bit of code in the wild which can witness and rely upon the old behavior, doesn't the new restriction make that bit of assembly non-compliant with the proposed spec version?
I do accept that the proposed spec allows a subset of the legal assembly programs that the old one does. My point of confusion is how that can claim to be compatible when there are assembly programs which are well defined under the old spec, and yet not under the new spec. Your point below seems to address how hardware which implemented the v1.0 spec is compatible with the spec after the proposed change, but I don't see the same for software. That is, this doesn't seem compatible with software written to the old spec.
Yours,
Philip Reames
Allen,
Sounds like you agree that this isn't strictly compatible with 1.0, and we're now debating what to do about it. Is that correct?
Has there been any work done to assess whether the relevant bits of assembly appear in existing binaries? I see a claim made here, but no evidence given. I am neither agreeing nor disagreeing with the claim - I haven't done the work to form an opinion. Has anyone else done that work in a form they can summarize and share?
Philip
On 11/22/22 12:19, Allen Baum wrote:
There is some precedent for this case, specifically in the priv spec 1.10->1.11 preface:
"The following changes have been made since version 1.11, which, while not strictly backwards compatible, are not anticipated to cause software portability problems in practice:"
The rationale for this "clarification" explicitly says this changes the cases that "can not be sensibly used by application software", which is the key.
So, the assertion here is that it is highly unlikely that there is any code in the wild that would take advantage of this "clarified" behavior.
I would agree that the wording
"IMPORTANT: The proposed fix does not break compatibility of implementations adhering to the ratified v1.0 spec."
is too strong. It seems like it would be more accurate to say
"IMPORTANT: The proposed fix is unlikely to break software compatibility of implementations adhering to the ratified v1.0 spec."
Krste,
Do you have any evidence to back up your claim that this isn't something software would do? Or is this intuition?
I want to be clear here, I'm not arguing this claim is wrong. I'm simply trying to understand what work has already been done here.
Philip
On 11/22/22 13:14, Krste Asanovic wrote:
Existing implementations of the ISA remain compatible - this text is correct and does not need to change.
Yes, software could see the difference with the change, but outside of verification suites, this is not going to happen.
I’d ask folks to go and understand the actual case that is now prohibited before proposing we search for it in software or take other more drastic actions.
It is really not something software would do, so the effort would be wasted.
That this case was missed when we were restricting other forms of EEW-mismatch overlap was an error.
It should have been caught earlier, but the fix is benign.
Krste
Krste,
Since my input is clearly not welcome, I will stop here.
Philip
This does not need intuition.
Again - please read the actual delta to understand why software would not ever do this.
It is very clear once you understand the capability that is being removed.
Krste
spec non-compliant. This is the first compatibility check.
However, changing the spec does make software that depends on that
behaviour of the old spec incompatible with implementations of the new
spec. In this sense, the specification change is not
backward-compatible with software.
There is not a lot of software out in the wild, so I don't think we
would be breaking very much.
First, I'd like to note there are two parts of the original register
that we are talking about:
(1) the non-written elements (left behind because the new EEW is smaller), and
(2) the tail (which would be ignored because of VL at the original EEW).
The problem with EEW-changing instructions where source/dest overlap
is in region (1). Normally, since EEW is smaller, this means the upper
1/2, 3/4, or 7/8 elements of the source vector register would be
non-written elements. (For the widening reductions, then there would
be even more non-written elements.)
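The 1/2, 3/4, and 7/8 figures above follow from the ratio of destination EEW to source EEW. A small sketch of that arithmetic is below; the helper name `non_written_fraction` is hypothetical, and the model assumes both operands cover the same number of elements, so it counts only region (1) and ignores the tail.

```python
from fractions import Fraction


def non_written_fraction(src_eew: int, dst_eew: int) -> Fraction:
    """Fraction of the source bytes that narrower destination elements leave unwritten,
    assuming source and destination hold the same number of elements."""
    if dst_eew >= src_eew:
        raise ValueError("this sketch only covers destinations narrower than the source")
    return 1 - Fraction(dst_eew, src_eew)


# Narrowing to 8-bit results from 16-, 32-, and 64-bit sources.
for src_eew in (16, 32, 64):
    print(f"EEW {src_eew} -> 8: {non_written_fraction(src_eew, 8)} of the source left unwritten")
```

A reduction writes only a single destination element, so under the same assumptions the unwritten share is larger still, which matches the parenthetical remark about reductions.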
However, I don't really like this proposal because a few things come to mind:
(a) This introduces an inconsistency in the spec, which programmers
have no reason to expect. The reason for the inconsistency is "to make
implementations easier", which programmers do not fully understand
(and therefore do not expect). From a programmer's perspective, there
is a mode setting that says undisturbed, so implementations should
honour it, not choose whichever mode is easier depending upon the
instruction. I don't like the idea of introducing such non-predictable
inconsistencies, as they tend to cause debugging nightmares.
(b) The undisturbed mode gives the potential to use this feature as a
software-managed data cache (to reduce data fetches, either for
performance, power, or because the data is volatile). Although the
lower element(s) may get clobbered, there may be value in preserving
the upper element(s) which is based on the exact same argument for
having the undisturbed mode in the first place with regular
(EEW-preserving) instructions.
(c) Programmers already have the ability to choose the agnostic mode
for performance, so it should not matter that the undisturbed mode of
these EEW-changing instructions might run more slowly to copy or
rearrange the data.
(d) This seems to be asking for permission to be excused from the
really hard part of the homework because it's hard (or slow).
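To make the inconsistency in (a) concrete, the following is a minimal sketch of the proposed rule as quoted in the thread ("when source and destination registers overlap and have mismatched EEW, the instruction is mask- and tail-agnostic, regardless of vta and mta"). It is not from the thread: the names `Policy` and `effective_policy` are hypothetical, `vma` stands for the mask-agnostic setting (the proposal text writes "mta"), and the function models only what software could rely on, not what any implementation does.

```python
from typing import NamedTuple


class Policy(NamedTuple):
    tail_agnostic: bool
    mask_agnostic: bool


def effective_policy(vta: bool, vma: bool, overlap: bool,
                     src_eew: int, dst_eew: int) -> Policy:
    """Policy software may rely on under the proposed rule (illustrative model)."""
    if overlap and src_eew != dst_eew:
        # Proposed rule: agnostic regardless of the requested settings.
        return Policy(tail_agnostic=True, mask_agnostic=True)
    # Otherwise the requested settings apply unchanged.
    return Policy(tail_agnostic=vta, mask_agnostic=vma)
```

In this model, a request for undisturbed tails and masks (`vta=False, vma=False`) on an overlapping instruction with source EEW 32 and destination EEW 16 comes back as agnostic for both, while the same request without overlap is returned unchanged. Since agnostic behaviour permits leaving elements undisturbed, an implementation that still honours the request stays within the rule, which is the hardware-compatibility point made elsewhere in the thread.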
An alternative way to handle this is to add a profile/platform
allowance to the spec for implementations which are always agnostic
for all instructions (probably for both tails and masks). This would
simplify HPC-oriented implementations, and allow complete consistency
(aka architectural uniformity/predictability) within that
profile/platform. The biggest issue is how to handle the resulting
schism in software.
Thanks,
Guy
On Tue, Nov 22, 2022 at 1:14 PM Krste Asanovic <krste@...> wrote:
Existing implementations of the ISA remain compatible - this text is correct and does not need to change.
Yes, software could see the difference with the change, but outside of verification suites, this is not going to happen.
I’d ask folks to go and understand the actual case that is now prohibited before proposing we search for it in software or take other more drastic actions.
It is really not something software would do, so the effort would be wasted.
That this case was missed when we were restricting other forms of EEW-mismatch overlap was an error.
It should have been caught earlier, but the fix is benign.
Krste
On Nov 22, 2022, at 1:01 PM, Philip Reames <preames@...> wrote:
Allen,
Sounds like you agree that this isn't strictly compatible with 1.0, and we're now debating what to do about it. Is that correct?
Has there been any work done to assess whether the relevant bits of assembly appear in existing binaries? I see a claim made here, but no evidence given. I am neither agreeing nor disagreeing with the claim - I haven't done the work to form an opinion. Has anyone else done that work in a form they can summarize and share?
Philip
On 11/22/22 12:19, Allen Baum wrote:
There is some precedent for this case, specifically in the priv spec 1.10->1.11 preface:
The following changes have been made since version 1.11, which, while not strictly backwards compatible,
are not anticipated to cause software portability problems in practice:
The rationale for this "clarification" explicitly says this changes the cases that "can not be sensibly used by application software", which is the key.
So, the assertion here is that it is highly unlikely that there is any code in the wild that would take advantage of this "clarified" behavior.
I would agree that the wording
"IMPORTANT: The proposed fix does not break compatibility of implementations adhering to the ratified v1.0 spec."
is too strong. It seems like it would be more accurate to say
"IMPORTANT: The proposed fix is unlikely to break software compatibility of implementations adhering to the ratified v1.0 spec."
Hi all,
I'm not against the proposal. I do think we should be careful/prudent.
On Wed, Nov 23, 2022 at 8:56 AM Bruce Hoult <bruce@...> wrote:
>
> >There is not a lot of software out in the wild, so I don't think we would be breaking very much.
OpenCV, OpenJDK, MLIR, NCNN and several other open source projects
have already been ported to RVV 1.0. We should be careful.
> Precisely zero, outside what people are running on emulators, yes? Or maybe in-house, There is zero RVV 1.0 hardware in the hands of end-users, and so zero software running on it.
We may still lack (enough) understanding of the progress of the RISC-V
ecosystem in China, Japan, and South Korea. There are many chip
startups in East Asia with the rapid realization of RVV as their main
competitive advantage.
Again, I'm not against the proposal. What I want to emphasize is that
we should be very careful about the consequences of changing a ratified
spec. RISC-V is an open standard, and we may have underestimated the
companies and users under the iceberg.
--
Best wishes,
Wei Wu (吴伟)