Fix for omission in vector spec RVV 1.0 around source/dest overlap


Krste Asanovic
 

A few issues have been identified in corners of the vector spec.

The first change was an error of omission in not catching some cases
of source and destination register overlap that can not be sensibly
used by application software, but which add complexity for
implementations that internally rearrange data based on EEW.

The problematic case is when source and destination overlap but have
different EEW, and the instruction is mask-undisturbed or
tail-undisturbed. This case does not have a real use in software, as
the elements being left undisturbed are a different EEW than the new
elements being written. This operation requires that the same
architectural register is treated as two different EEWs by one
instruction, which adds considerable complexity to implementations
that rearrange data internally based on EEW for no benefit.
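To make the problematic case concrete, the following Python sketch models one overlap that v1.0 permits: a narrowing shift whose EEW=8 destination v0 is the low half of its EEW=16 source group v0:v1 (as in `vnsrl.wi v0, v0, 3` at LMUL=1), run with tail-undisturbed requested. The VLEN of 128 bits, the little-endian byte image, and the function name are illustrative assumptions; this is not an RVV simulator. It shows that the undisturbed tail of v0 ends up holding halves of the old EEW=16 elements, which mean nothing when read as EEW=8 results.

```python
VLEN_BYTES = 16  # assumed VLEN = 128 bits; one register = 16 bytes, little-endian


def vnsrl_wi_tail_undisturbed(src_group: bytes, vl: int, shift: int) -> bytes:
    """Hypothetical model of vnsrl.wi v0, v0, shift at SEW=8, LMUL=1, vta=0.

    src_group is the EEW=16 source group v0:v1 (32 bytes); the EEW=8
    destination v0 is its low 16 bytes. Returns the new contents of v0.
    """
    assert len(src_group) == 2 * VLEN_BYTES and 0 <= vl <= VLEN_BYTES
    v0 = bytearray(src_group[:VLEN_BYTES])  # old v0, kept in the undisturbed tail
    for i in range(vl):
        # Read EEW=16 source element i from the original bytes, then write
        # EEW=8 destination element i (shift uses the low 4 bits at SEW=8).
        wide = int.from_bytes(src_group[2 * i:2 * i + 2], "little")
        v0[i] = (wide >> (shift & 0xF)) & 0xFF
    return bytes(v0)


src = bytes(range(32))  # wide element i holds bytes 2*i (low) and 2*i + 1 (high)
new_v0 = vnsrl_wi_tail_undisturbed(src, vl=4, shift=0)
print(list(new_v0[:4]))  # [0, 2, 4, 6]: the narrowed body elements
print(list(new_v0[4:]))  # bytes 4..15 unchanged: halves of old EEW=16 elements 2..7
```

An implementation that keeps v0 in an internal layout arranged for EEW=16 would still have to reproduce this exact byte image, which is the complexity the proposal aims to remove.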

The proposed addition is:

"when source and destination registers overlap and have mismatched
EEW, the instruction is mask- and tail-agnostic, regardless of vta and
mta".

The proposed solution defines this case as always agnostic so existing
implementations can continue to work as before (e.g., implementing
undisturbed when requested), while not burdening implementations that
rearrange data internally. The assertion is that no software would
rely on the undisturbed behavior in this case.
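The proposed rule, and why it leaves existing hardware compliant, can be sketched as follows. This assumes the v1.0 meaning of agnostic (each affected destination element either keeps its old value or is overwritten with all 1s, in any combination); the helper names are hypothetical, and the same check is meant to apply to vta for tail elements and to vma for masked-off elements.

```python
from typing import Sequence


def effective_policy(agnostic_requested: bool, dest_eew: int, src_eew: int,
                     overlap: bool) -> str:
    """Apply the proposed rule to one policy bit (vta or vma); hypothetical helper."""
    if overlap and dest_eew != src_eew:
        return "agnostic"  # forced, regardless of the requested vta/vma value
    return "agnostic" if agnostic_requested else "undisturbed"


def result_allowed(policy: str, old: Sequence[int], new: Sequence[int], eew: int) -> bool:
    """Check the affected (tail or masked-off) destination elements, each an
    eew-bit unsigned integer, against the policy."""
    if len(old) != len(new):
        return False
    if policy == "undisturbed":
        return list(new) == list(old)
    all_ones = (1 << eew) - 1
    # Agnostic: each element independently keeps its old value or becomes all 1s.
    return all(n == o or n == all_ones for o, n in zip(old, new))


# Narrowing with overlap and vta=0 requested: dest EEW=8, source EEW=16.
policy = effective_policy(False, dest_eew=8, src_eew=16, overlap=True)
old_tail = [4, 5, 6, 7]
print(policy)                                                    # agnostic
print(result_allowed(policy, old_tail, old_tail, 8))             # True
print(result_allowed(policy, old_tail, [0xFF] * 4, 8))           # True
print(result_allowed(policy, old_tail, [0xFF, 5, 0xFF, 7], 8))   # True
print(result_allowed(policy, old_tail, [0] * 4, 8))              # False
```

The second and third checks capture both sides of the compatibility question: hardware that always honors undisturbed still produces a permitted result, while software can no longer depend on the old values surviving.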

Note that this also applies to widening reductions.

IMPORTANT: The proposed fix does not break compatibility of
implementations adhering to the ratified v1.0 spec.

The proposal is to add this to the vector spec as a bug fix.

Krste


Philip Reames
 

Krste,

I am confused about how this proposed change does not break compatibility with the 1.0 vector spec. If there's a bit of code in the wild which can witness and rely upon the old behavior, doesn't the new restriction make that bit of assembly non-compliant with the proposed spec version?

I do accept that the proposed spec allows a subset of the legal assembly programs that the old one does. My point of confusion is how that can be claimed as compatible when there are assembly programs which are well defined under the old spec, and yet not under the new spec. Your point below seems to address how hardware which implemented the v1.0 spec is compatible with the spec after the proposed change, but I don't see the same for software. That is, this doesn't seem compatible with software written to the old spec.

Yours,
Philip Reames

On 11/1/22 23:51, Krste Asanovic wrote:

Allen Baum
 

There is some precedent for this case, specifically in the priv spec 1.10->1.11 preface:

The following changes have been made since version 1.11, which, while not strictly backwards compatible,
are not anticipated to cause software portability problems in practice: 

The rationale for this "clarification" explicitly says that it changes cases that "can not be sensibly used by application software", which is the key.
So, the assertion here is that it is highly unlikely that there is any code in the wild that would take advantage of this "clarified" behavior.

I would agree that the wording 
    "IMPORTANT: The proposed fix does not break compatibility of implementations adhering to the ratified v1.0 spec."
is too strong. It seems like it would be more accurate to say 
    "IMPORTANT: The proposed fix is unlikely to break software compatibility of implementations adhering to the ratified v1.0 spec."

On Tue, Nov 22, 2022 at 10:55 AM Philip Reames <preames@...> wrote:

Philip Reames
 

Allen,

Sounds like you agree that this isn't strictly compatible with 1.0, and we're now debating what to do about it.  Is that correct?

Has there been any work done to assess whether the relevant bits of assembly appear in existing binaries? I see a claim made here, but no evidence given. I am neither agreeing nor disagreeing with the claim; I haven't done the work to form an opinion. Has anyone else done that work in a form they can summarize and share?

Philip

On 11/22/22 12:19, Allen Baum wrote:


Krste Asanovic
 

Existing implementations of the ISA remain compatible - this text is correct and does not need to change.
Yes, software could see the difference with the change, but outside of verification suites, this is not going to happen.

I’d ask folks to go and understand the actual case that is now prohibited before proposing we search for it in software or take other more drastic actions.
It is really not something software would do, so the effort would be wasted.

That this case was missed when we were restricting other forms of EEW-mismatch overlap was an error.
It should have been caught earlier, but the fix is benign.

Krste

On Nov 22, 2022, at 1:01 PM, Philip Reames <preames@...> wrote:


Philip Reames
 

Krste,

Do you have any evidence to back up your claim that this isn't something software would do? Or is this intuition?

I want to be clear here, I'm not arguing this claim is wrong.  I'm simply trying to understand what work has already been done here. 

Philip

On 11/22/22 13:14, Krste Asanovic wrote:


Krste Asanovic
 

This does not need intuition.
Again - please read the actual delta to understand why software would not ever do this.
It is very clear once you understand the capability that is being removed.

Krste

On Nov 22, 2022, at 2:41 PM, Philip Reames <preames@...> wrote:


Philip Reames
 

Krste,

Since my input is clearly not welcome, I will stop here.

Philip

On 11/22/22 14:46, Krste Asanovic wrote:

This does not need intuition.
Again - please read the actual delta to understand why software would not ever do this.
It is very clear once you understand the capability that is being removed.

Krste

On Nov 22, 2022, at 2:41 PM, Philip Reames <preames@...> wrote:

Krste,

Do you have any evidence to backup your claim that this isn't something software would do?  Or is this intuition?

I want to be clear here, I'm not arguing this claim is wrong.  I'm simply trying to understand what work has already been done here. 

Philip

On 11/22/22 13:14, Krste Asanovic wrote:
Existing implementations of the ISA remain compatible - this text is correct and does not need to change.
Yes, software could see the difference with the change, but outside of verification suites, this is not going to happen.

I’d ask folks to go and understand the actual case that is now prohibited before proposing we search for it in software or take other more drastic actions.
It is really not something software would do, so the effort would be wasted.

That this case was missed when we were restricting other forms of EEW-mismatch overlap was an error.
It should have been caught earlier, but the fix is benign.

Krste

On Nov 22, 2022, at 1:01 PM, Philip Reames <preames@...> wrote:

Allen,

Sounds like you agree that this isn't strictly compatible with 1.0, and we're now debating what to do about it.  Is that correct?

Has there been any work done to assess whether the relevant bits of assembly appear in existing binaries?  I see a claim made here, but no evidence given.  I am neither agreeing or disagreeing with the claim - I haven't done the work to form an opinion.  Has anyone else done that work in a form they can summarize and share?

Philip

On 11/22/22 12:19, Allen Baum wrote:
There is some precedence for this case, specifically in the priv spec 1.10->1.11 preface:

The following changes have been made since version 1.11, which, while not strictly backwards compatible,
are not anticipated to cause software portability problems in practice: 

 The rationale for this  "clarification" explicitly says  this changes the cases that " can not be sensibly used by application software, ", which is the key.
So, the assertion here is that it is highly unlikely that there is any code in the wild that would take advantage of this "clarified" behavior

I would agree that the wording 
    "IMPORTANT: The proposed fix does not break compatibility of implementations adhering to the ratified v1.0 spec."
is too strong. It seems like it would be more accurate to say 
    "IMPORTANT: The proposed fix is unlikely to break software compatibility of implementations adhering to the ratified v1.0 spec."

On Tue, Nov 22, 2022 at 10:55 AM Philip Reames <preames@...> wrote:
Krste,

I am confused how this proposed change does not break compatibility with
the 1.0 vector spec.  If there's a bit of code in the wild which can
witness and rely upon the old behavior, doesn't the new restriction make
that bit of assembly non-compliant with the proposed spec version?

I do accept that the proposed spec allows a subset of the legal assembly
programs that the old one does.  My point of confusion is how that can
claim to be compatible when there are assembly programs which are well
defined under the old spec, and yet not under the new spec.   Your point
below seems to address how hardware which implemented the v1.0 spec is
compatible with the spec after the proposed change, but I don't see the
same for software.  That is, this doesn't seem compatible with software
written to the old spec.

Yours,
Philip Reames

On 11/1/22 23:51, Krste Asanovic wrote:
> A few issues have been identified in corners of the vector spec.
>
> The first change was an error of omission in not catching some cases
> of source and destination register overlap that can not be sensibly
> used by application software, but which add complexity for
> implementations that internally rearrange data based on EEW.
>
> The problematic case is when source and destination overlap but have
> different EEW, and the instruction is mask-undisturbed or
> tail-undisturbed.  This case does not have a real use in software, as
> the elements being left undisturbed are a different EEW than the new
> elements being written.  This operation requires that the same
> architectural register is treated as two different EEWs by one
> instruction, which adds considerable complexity to implementations
> that rearrange data internally based on EEW for no benefit.
>
> Proposed addition is:
>
> "when source and destination registers overlap and have mismatched
> EEW, the instruction is mask- and tail-agnostic, regardless of vta and
> mta".
>
> The proposed solution defines this case as always agnostic so existing
> implementations can continue to work as before (e.g., implementing
> undisturbed when requested), while not burdening implementations that
> rearrange data internally.  The assertion is that no software would
> rely on the undisturbed behavior in this case.
>
> Note, this also applies to widening reductions.
>
> IMPORTANT: The proposed fix does not break compatibility of
> implementations adhering to the ratified v1.0 spec.
>
> The proposal is to add this to the vector spec as a bug fix.
>
> Krste
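To make the proposed rule concrete, here is a minimal C sketch of how a simulator or checker might compute the policy that actually applies. The `reg_group` and `policy` types and the `effective_policy` function are hypothetical names for illustration, not part of any real tool. The sketch assumes the caller already knows each operand's base register, register count and EEW. Note that the spec spells the vtype mask-policy bit `vma`.

```c
#include <stdbool.h>

/* Hypothetical description of one operand's register group. */
typedef struct {
    int base;   /* first architectural vector register, 0..31 */
    int nregs;  /* registers spanned by the group: 1, 2, 4 or 8 */
    int eew;    /* effective element width in bits */
} reg_group;

/* Policy actually applied to tail and masked-off elements. */
typedef struct {
    bool tail_agnostic;
    bool mask_agnostic;
} policy;

/* True if the two register groups share at least one register. */
static bool groups_overlap(reg_group a, reg_group b) {
    return a.base < b.base + b.nregs && b.base < a.base + a.nregs;
}

/* Sketch of the proposed rule: when any source group overlaps the
 * destination group with a different EEW, the instruction is treated as
 * tail- and mask-agnostic regardless of vtype.vta / vtype.vma.
 * Otherwise the requested policy is honoured unchanged. */
static policy effective_policy(reg_group dst, const reg_group *srcs,
                               int nsrcs, bool vta, bool vma) {
    policy p = { vta, vma };
    for (int i = 0; i < nsrcs; i++) {
        if (groups_overlap(dst, srcs[i]) && dst.eew != srcs[i].eew) {
            p.tail_agnostic = true;
            p.mask_agnostic = true;
            break;
        }
    }
    return p;
}
```

For example, with SEW=16 and LMUL=1, the legal narrowing overlap `vnsrl.wi v0, v0, 3` reads v0-v1 at EEW=32 and writes v0 at EEW=16, so this sketch reports agnostic even when vta=vma=0. Because agnostic still permits leaving elements undisturbed, an implementation that keeps honouring the undisturbed request here remains conformant, which is the basis of the hardware-compatibility claim.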

Guy Lemieux
 

Changing the spec does not make implementations which honour the old
spec non-compliant. This is the first compatibility check.

However, changing the spec does make software that depends on that
behaviour of the old spec incompatible with implementations of the new
spec. In this sense, the specification change is not
backward-compatible with software.

There is not a lot of software out in the wild, so I don't think we
would be breaking very much.

First, I'd like to note there are two parts of the original register
that we are talking about:
(1) the non-written elements (left behind because the new EEW is smaller), and
(2) the tail (which would be ignored because of VL at the original EEW).

The problem with EEW-changing instructions where source/dest overlap
is in region (1). Normally, since EEW is smaller, this means the upper
1/2, 3/4, or 7/8 of the elements of the source vector register would be
non-written elements. (For widening reductions, there would be even
more non-written elements.)
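The 1/2, 3/4 and 7/8 figures follow from the EEW ratio: if the same vl elements that occupied the register at the old EEW are rewritten at a narrower EEW, only dst_eew/src_eew of those bytes are overwritten. A minimal C sketch of that arithmetic, assuming vl is unchanged between the two instructions; `unwritten_fraction` is a hypothetical helper.

```c
/* Fraction num/den of the bytes occupied by vl source elements of width
 * src_eew that are left unwritten when vl destination elements of the
 * narrower width dst_eew are written over them (widths in bits).
 * vl cancels out: written = vl*dst_eew, occupied = vl*src_eew. */
typedef struct {
    unsigned num;
    unsigned den;
} fraction;

static unsigned gcd(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

static fraction unwritten_fraction(unsigned src_eew, unsigned dst_eew) {
    fraction f = { 0, 1 };
    if (dst_eew >= src_eew)   /* nothing is left behind unless EEW shrinks */
        return f;
    f.num = src_eew - dst_eew;
    f.den = src_eew;
    unsigned g = gcd(f.num, f.den);
    f.num /= g;
    f.den /= g;
    return f;
}
```

Because vl cancels, only the EEW ratio matters: 16-to-8, 32-to-8 and 64-to-8 leave 1/2, 3/4 and 7/8 of the bytes unwritten.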

However, I don't really like this proposal because a few things come to mind:

(a) This introduces an inconsistency in the spec, which programmers
have no reason to expect. The reason for the inconsistency is "to make
implementations easier", which programmers do not fully understand
(and therefore do not expect). From a programmer's perspective, there
is a mode setting that says undisturbed, so implementations should
honour it, not choose whichever mode is easier depending upon the
instruction. I don't like the idea of introducing such non-predictable
inconsistencies, as they tend to cause debugging nightmares.

(b) The undisturbed mode gives the potential to use this feature as a
software-managed data cache (to reduce data fetches, either for
performance, power, or because the data is volatile). Although the
lower element(s) may get clobbered, there may be value in preserving
the upper element(s). This is the exact same argument as for having
the undisturbed mode in the first place with regular (EEW-preserving)
instructions.

(c) Programmers already have the ability to choose the agnostic mode
for performance, so it should not matter that the undisturbed mode of
these EEW-changing instructions might run more slowly to copy or
rearrange the data.

(d) This seems to be asking for permission to be excused from the
really hard part of the homework because it's hard (or slow).

An alternative way to handle this might be to add a profile/platform
allowance to the spec for implementations that are always agnostic
for all instructions (probably for both tails and masks). This would
simplify HPC-oriented implementations, and allow complete consistency
(aka architectural uniformity/predictability) within that
profile/platform. The biggest issue is how to handle the resulting
schism in software.

Thanks,
Guy

On Tue, Nov 22, 2022 at 1:14 PM Krste Asanovic <krste@...> wrote:
>
> Existing implementations of the ISA remain compatible - this text is correct and does not need to change.
> Yes, software could see the difference with the change, but outside of verification suites, this is not going to happen.
>
> I’d ask folks to go and understand the actual case that is now prohibited before proposing we search for it in software or take other more drastic actions.
> It is really not something software would do, so the effort would be wasted.
>
> That this case was missed when we were restricting other forms of EEW-mismatch overlap was an error.
> It should have been caught earlier, but the fix is benign.
>
> Krste
>
> On Nov 22, 2022, at 1:01 PM, Philip Reames <preames@...> wrote:
>
> Allen,
>
> Sounds like you agree that this isn't strictly compatible with 1.0, and we're now debating what to do about it. Is that correct?
>
> Has there been any work done to assess whether the relevant bits of assembly appear in existing binaries? I see a claim made here, but no evidence given. I am neither agreeing nor disagreeing with the claim - I haven't done the work to form an opinion. Has anyone else done that work in a form they can summarize and share?
>
> Philip
>
> On 11/22/22 12:19, Allen Baum wrote:
>
> There is some precedent for this case, specifically in the priv spec 1.10->1.11 preface:
>
> The following changes have been made since version 1.11, which, while not strictly backwards compatible,
> are not anticipated to cause software portability problems in practice:
>
> The rationale for this "clarification" explicitly says this changes the cases that "can not be sensibly used by application software", which is the key.
> So, the assertion here is that it is highly unlikely that there is any code in the wild that would take advantage of this "clarified" behavior.
>
> I would agree that the wording
>     "IMPORTANT: The proposed fix does not break compatibility of implementations adhering to the ratified v1.0 spec."
> is too strong. It seems like it would be more accurate to say
>     "IMPORTANT: The proposed fix is unlikely to break software compatibility of implementations adhering to the ratified v1.0 spec."


Bruce Hoult
 

> There is not a lot of software out in the wild, so I don't think we would be breaking very much.

Precisely zero, outside what people are running on emulators, yes? Or maybe in-house. There is zero RVV 1.0 hardware in the hands of end users, and so zero software running on it.

I hear this is going to change very soon, but that's the situation right now, as I understand it.



Wei Wu (吴伟)
 

Hi all,

I'm not against the proposal, but I do think we should be careful/prudent.

On Wed, Nov 23, 2022 at 8:56 AM Bruce Hoult <bruce@...> wrote:
> > There is not a lot of software out in the wild, so I don't think we would be breaking very much.

OpenCV, OpenJDK, MLIR, NCNN and several other open source projects
have already been ported to RVV 1.0. We should be careful.

> Precisely zero, outside what people are running on emulators, yes? Or maybe in-house. There is zero RVV 1.0 hardware in the hands of end users, and so zero software running on it.

We may still lack (enough) understanding of the progress of the RISC-V
ecosystem in China, Japan, and South Korea. There are many chip
startups in East Asia that are rapidly implementing RVV as their main
competitive advantage.

Again, I'm not against the proposal. What I want to emphasize is that
we should be very careful about the consequences of changing a ratified
spec. RISC-V is an open standard, and we may have underestimated how
many companies and users are hidden below the tip of the iceberg.

--
Best wishes,
Wei Wu (吴伟)


Nick Knight
 

I too was concerned about potential software issues when this thread began a few weeks ago. As a SW person at a HW company who has been developing against churning specs for several years, this is a sensitive subject for me. So I did some research.

First, I grepped through all the RVV codes in SiFive's repos and upstream contributions (that I'm aware of). I could not find a single compatibility issue. (Plenty of destructive mixed-EEW examples, but none that rely on undisturbed behavior.)

Next, I asked some GCC and LLVM developers, who confirmed that neither of these compilers is yet able to leverage mixed-EEW src/dst overlap, regardless of mask/tail policy. So I think it highly unlikely that we will find incompatibilities in compiled code.

Lastly, I asked around the water cooler: my colleagues and I were unable to dream up a convincing use case for the old behavior.

At this point, I am no longer concerned. I feel this is a strict improvement to RVV and I support it.

Best,
Nick Knight
Algorithms & Libraries, SiFive


Allen Baum
 

I'm going to have to eat my words (in tiny bites).
Krste is correct that implementations of the earlier spec version will remain spec compatible.
Architectural compatibility tests would report both tail-undisturbed and tail-agnostic implementations as compliant.

But SW will not necessarily be compatible with the newer version; only software written not to depend on a specific behavior will be.
So compatibility must be defined carefully here.

As Guy and others have said, there is little SW available at this time, and it is highly unlikely that any of it will be affected by this specific case.
But, as Guy points out, while the original behavior seems unusable, that doesn't mean that it couldn't be taken advantage of in a useful way.

So it is SW that needs to be cognizant of this, not so much the HW implementation.
Naming this as a profile requirement (i.e., specifying old or new behavior) is one way
to ensure that SW won't be blindsided (however unlikely that really is).

From a selfish point of view, the existence of agnostic behavior (specifically, that the resulting value can non-deterministically be one of two possible answers) makes our standard testing methodology difficult, but this isn't the worst of the non-deterministic cases.
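The testing difficulty can be made concrete: under RVV 1.0 an agnostic element may either keep its old value or be overwritten with all 1s, so a result checker has to accept both outcomes. A minimal C sketch of such an oracle, assuming EEW=32 and checking only body and tail elements; `tail_result_ok` is a hypothetical name, not part of any existing compliance test framework.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Check a destination register after an instruction, with EEW = 32.
 * old:      register contents before the instruction
 * expected: reference results for body elements [0, vl)
 * actual:   contents observed on the device under test
 * Body elements must match exactly. Tail elements [vl, vlmax) must keep
 * their old value when tail-undisturbed; when tail-agnostic each one may
 * independently either keep its old value or become all 1s.
 * Masked-off elements are not modelled here. */
static bool tail_result_ok(const uint32_t *old, const uint32_t *expected,
                           const uint32_t *actual, size_t vl, size_t vlmax,
                           bool tail_agnostic) {
    for (size_t i = 0; i < vl; i++) {
        if (actual[i] != expected[i])
            return false;
    }
    for (size_t i = vl; i < vlmax; i++) {
        bool undisturbed = actual[i] == old[i];
        bool all_ones = actual[i] == UINT32_MAX;
        if (!undisturbed && !(tail_agnostic && all_ones))
            return false;
    }
    return true;
}
```

Under the proposal, such a checker would pass `tail_agnostic = true` for mismatched-EEW overlaps even when vtype requests tail-undisturbed, widening the set of results it must accept.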
