
Krste Asanovic
A few issues have been identified in corners of the vector spec.
The first change was an error of omission in not catching some cases of source and destination register overlap that can not be sensibly used by application software, but which add complexity for implementations that internally rearrange data based on EEW.
The problematic case is when source and destination overlap but have different EEW, and the instruction is mask-undisturbed or tail-undisturbed. This case does not have a real use in software, as the elements being left undisturbed are a different EEW than the new elements being written. This operation requires that the same architectural register is treated as two different EEWs by one instruction, which adds considerable complexity to implementations that rearrange data internally based on EEW for no benefit.
The proposed addition is:
"when source and destination registers overlap and have mismatched EEW, the instruction is mask- and tail-agnostic, regardless of vta and vma".
The proposed solution defines this case as always agnostic so existing implementations can continue to work as before (e.g., implementing undisturbed when requested), while not burdening implementations that rearrange data internally. The assertion is that no software would rely on the undisturbed behavior in this case.
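The rule can be sketched as a small predicate. This is an illustrative model only, not text from the spec: `Operand`, `regs_overlap`, and `effective_policy` are hypothetical names, a register group is modeled as a base register number plus a count of consecutive registers (a whole-number EMUL is assumed), and the two booleans stand for the agnostic bits in vtype (named `vta` and `vma` in the v1.0 spec).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    base: int   # first vector register number of the group (0-31)
    nregs: int  # registers in the group (EMUL, assumed >= 1 in this sketch)
    eew: int    # effective element width in bits


def regs_overlap(a: Operand, b: Operand) -> bool:
    """True if the two register groups share at least one register."""
    return a.base < b.base + b.nregs and b.base < a.base + a.nregs


def effective_policy(dest: Operand, sources: list[Operand],
                     vta: bool, vma: bool) -> tuple[bool, bool]:
    """Return (tail_agnostic, mask_agnostic) under the proposed rule.

    vta and vma are the requested settings (True means agnostic).
    """
    for src in sources:
        if regs_overlap(dest, src) and src.eew != dest.eew:
            # Overlap with mismatched EEW: agnostic regardless of the request.
            return (True, True)
    # Otherwise the requested policy applies unchanged.
    return (vta, vma)


# A narrowing operation whose 16-bit destination v4 overlaps the
# 32-bit source group v4-v5, with undisturbed requested for both.
narrow_dest = Operand(base=4, nregs=1, eew=16)
wide_src = Operand(base=4, nregs=2, eew=32)
print(effective_policy(narrow_dest, [wide_src], vta=False, vma=False))
```

Because an agnostic element may either keep its old value or be overwritten with all 1s, an implementation that continues to honour undisturbed in this case still satisfies the rule.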
Note, this also applies to widening reductions.
IMPORTANT: The proposed fix does not break compatibility of implementations adhering to the ratified v1.0 spec.
The proposal is to add this to the vector spec as a bug fix.
Krste

Philip Reames
Krste,
I am confused how this proposed change does not break compatibility with the 1.0 vector spec. If there's a bit of code in the wild which can witness and rely upon the old behavior, doesn't the new restriction make that bit of assembly non-compliant with the proposed spec version?
I do accept that the proposed spec allows a subset of the legal assembly programs that the old one does. My point of confusion is how that can claim to be compatible when there are assembly programs which are well defined under the old spec, and yet not under the new spec. Your point below seems to address how hardware which implemented the v1.0 spec is compatible with the spec after the proposed change, but I don't see the same for software. That is, this doesn't seem compatible with software written to the old spec.
Yours, Philip Reames
On 11/1/22 23:51, Krste Asanovic wrote:
> IMPORTANT: The proposed fix does not break compatibility of implementations adhering to the ratified v1.0 spec.

Allen Baum
There is some precedent for this case, specifically in the priv spec 1.11->1.12 preface:
"The following changes have been made since version 1.11, which, while not strictly backwards compatible, are not anticipated to cause software portability problems in practice:"
The rationale for this "clarification" explicitly says this changes the cases that "can not be sensibly used by application software", which is the key. So, the assertion here is that it is highly unlikely that there is any code in the wild that would take advantage of this "clarified" behavior.
I would agree that the wording "IMPORTANT: The proposed fix does not break compatibility of implementations adhering to the ratified v1.0 spec." is too strong. It seems like it would be more accurate to say "IMPORTANT: The proposed fix is unlikely to break software compatibility of implementations adhering to the ratified v1.0 spec."
On Tue, Nov 22, 2022 at 10:55 AM Philip Reames <preames@...> wrote:
> I am confused how this proposed change does not break compatibility with the 1.0 vector spec.

Philip Reames
Allen,
Sounds like you agree that this isn't strictly compatible with
1.0, and we're now debating what to do about it. Is that correct?
Has there been any work done to assess whether the relevant bits
of assembly appear in existing binaries? I see a claim made here,
but no evidence given. I am neither agreeing nor disagreeing with
the claim - I haven't done the work to form an opinion. Has
anyone else done that work in a form they can summarize and share?
Philip
On 11/22/22 12:19, Allen Baum wrote:
> So, the assertion here is that it is highly unlikely that there is any code in the wild that would take advantage of this "clarified" behavior

Krste Asanovic
Existing implementations of the ISA remain compatible - this text is correct and does not need to change. Yes, software could see the difference with the change, but outside of verification suites, this is not going to happen.
I’d ask folks to go and understand the actual case that is now prohibited before proposing we search for it in software or take other more drastic actions. It is really not something software would do, so the effort would be wasted.
That this case was missed when we were restricting other forms of EEW-mismatch overlap was an error. It should have been caught earlier, but the fix is benign.
Krste
On Nov 22, 2022, at 1:01 PM, Philip Reames <preames@...> wrote:
> Has there been any work done to assess whether the relevant bits of assembly appear in existing binaries?

Philip Reames
Krste,
Do you have any evidence to back up your claim that this isn't
something software would do? Or is this intuition?
I want to be clear here, I'm not arguing this claim is wrong.
I'm simply trying to understand what work has already been done
here.
Philip
On 11/22/22 13:14, Krste Asanovic wrote:
> It is really not something software would do, so the effort would be wasted.

Krste Asanovic
This does not need intuition. Again - please read the actual delta to understand why software would not ever do this. It is very clear once you understand the capability that is being removed.
On Nov 22, 2022, at 2:41 PM, Philip Reames <preames@...> wrote:
> Do you have any evidence to back up your claim that this isn't something software would do? Or is this intuition?

Philip Reames
Krste,
Since my input is clearly not welcome, I will stop here.
Philip
On 11/22/22 14:46, Krste Asanovic wrote:
> Again - please read the actual delta to understand why software would not ever do this.
|
|
Changing the spec does not make implementations which honour the old spec non-compliant. This is the first compatibility check.
However, changing the spec does make software that depends on that behaviour of the old spec incompatible with implementations of the new spec. In this sense, the specification change is not backward-compatible with software.
There is not a lot of software out in the wild, so I don't think we would be breaking very much.
First, I'd like to note there are two parts of the original register that we are talking about: (1) the non-written elements (left behind because the new EEW is smaller), and (2) the tail (which would be ignored because of VL at the original EEW).
The problem with EEW-changing instructions where source/dest overlap is in region (1). Normally, since the new EEW is smaller, this means the upper 1/2, 3/4, or 7/8 of the source vector register would be non-written elements. (For widening reductions, there would be even more non-written elements.)
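As a rough check of the fractions quoted for region (1), here is a minimal Python sketch. It assumes a narrowing operation whose destination overlaps the lowest-numbered part of the source register group and writes as many elements as the source holds. `unwritten_fraction` is a hypothetical helper name, not something from the spec.

```python
from fractions import Fraction


def unwritten_fraction(src_eew: int, dst_eew: int) -> Fraction:
    """Fraction of the source group's bits left unwritten by a narrowing write.

    Assumption: source and destination hold the same number of elements and
    the destination starts at the lowest-numbered source register.
    """
    if dst_eew >= src_eew:
        raise ValueError("only the narrowing case (dst_eew < src_eew) is modelled")
    return 1 - Fraction(dst_eew, src_eew)


# Source EEW is 2x, 4x, or 8x the destination EEW.
for ratio in (2, 4, 8):
    dst = 8
    src = dst * ratio
    print(f"EEW {src}->{dst}: {unwritten_fraction(src, dst)} of the source bits are not written")
```

Under these assumptions the ratios 2, 4, and 8 give the 1/2, 3/4, and 7/8 figures above.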
However, I don't really like this proposal because a few things come to mind:
(a) This introduces an inconsistency in the spec, which programmers have no reason to expect. The reason for the inconsistency is "to make implementations easier", which programmers do not fully understand (and therefore do not expect). From a programmer's perspective, there is a mode setting that says undisturbed, so implementations should honour it, not choose whichever mode is easier depending upon the instruction. I don't like the idea of introducing such non-predictable inconsistencies, as they tend to cause debugging nightmares.
(b) The undisturbed mode gives the potential to use this feature as a software-managed data cache (to reduce data fetches, either for performance, power, or because the data is volatile). Although the lower element(s) may get clobbered, there may be value in preserving the upper element(s) which is based on the exact same argument for having the undisturbed mode in the first place with regular (EEW-preserving) instructions.
(c) Programmers already have the ability to choose the agnostic mode for performance, so it should not matter that the undisturbed mode of these EEW-changing instructions might run more slowly to copy or rearrange the data.
(d) This seems to be asking for permission to be excused from the really hard part of the homework because it's hard (or slow).
An alternative way to handle this is to add a profile/platform allowance to the spec that permits implementations which are always agnostic for all instructions (probably for both tails and masks). This would simplify HPC-oriented implementations, and allow complete consistency (aka architectural uniformity/predictability) within that profile/platform. The biggest issue is how to handle the resulting schism in software.
Thanks, Guy
On Tue, Nov 22, 2022 at 1:14 PM Krste Asanovic <krste@...> wrote:
> Existing implementations of the ISA remain compatible - this text is correct and does not need to change.
> Yes, software could see the difference with the change, but outside of verification suites, this is not going to happen.
>
> I’d ask folks to go and understand the actual case that is now prohibited before proposing we search for it in software or take other more drastic actions.
> It is really not something software would do, so the effort would be wasted.
>
> That this case was missed when we were restricting other forms of EEW-mismatch overlap was an error.
> It should have been caught earlier, but the fix is benign.
>
> Krste
>
> On Nov 22, 2022, at 1:01 PM, Philip Reames <preames@...> wrote:
>
> Allen,
>
> Sounds like you agree that this isn't strictly compatible with 1.0, and we're now debating what to do about it. Is that correct?
>
> Has there been any work done to assess whether the relevant bits of assembly appear in existing binaries? I see a claim made here, but no evidence given. I am neither agreeing nor disagreeing with the claim - I haven't done the work to form an opinion. Has anyone else done that work in a form they can summarize and share?
>
> Philip
>
> On 11/22/22 12:19, Allen Baum wrote:
>
> There is some precedent for this case, specifically in the priv spec 1.10->1.11 preface:
>
> The following changes have been made since version 1.11, which, while not strictly backwards compatible,
> are not anticipated to cause software portability problems in practice:
>
> The rationale for this "clarification" explicitly says this changes the cases that "can not be sensibly used by application software", which is the key.
> So, the assertion here is that it is highly unlikely that there is any code in the wild that would take advantage of this "clarified" behavior.
>
> I would agree that the wording
> "IMPORTANT: The proposed fix does not break compatibility of implementations adhering to the ratified v1.0 spec."
> is too strong. It seems like it would be more accurate to say
> "IMPORTANT: The proposed fix is unlikely to break software compatibility of implementations adhering to the ratified v1.0 spec."
|
|

Bruce Hoult
> There is not a lot of software out in the wild, so I don't think we would be breaking very much.
Precisely zero, outside what people are running on emulators, yes? Or maybe in-house. There is zero RVV 1.0 hardware in the hands of end-users, and so zero software running on it.
I hear this is going to change very soon, but that's the situation right now, as I understand it.
|
|
Hi all,
I'm not against the proposal. I do think we should be careful/prudent.
On Wed, Nov 23, 2022 at 8:56 AM Bruce Hoult <bruce@...> wrote:
>
> > There is not a lot of software out in the wild, so I don't think we would be breaking very much.
OpenCV, OpenJDK, MLIR, NCNN and several other open source projects have already been ported to RVV 1.0. We should be careful.
> Precisely zero, outside what people are running on emulators, yes? Or maybe in-house. There is zero RVV 1.0 hardware in the hands of end-users, and so zero software running on it.
We may still lack (enough) understanding of the progress of the RISC-V ecosystem in China, Japan, and South Korea. There are many chip startups in East Asia with the rapid realization of RVV as their main competitive advantage.
Again, I'm not against the proposal. What I want to emphasize is that we should be very careful about the consequences of changing a ratified spec. RISC-V is an open standard, and we may have underestimated the companies and users under the iceberg.
--
Best wishes,
Wei Wu (吴伟)
|
|

Nick Knight
I too was concerned about potential software issues when this thread began a few weeks ago. As a SW person at a HW company who has been developing against churning specs for several years, this is a sensitive subject for me. So, I did some research.
First, I grepped through all the RVV codes in SiFive's repos and upstream contributions (that I'm aware of). I could not find a single compatibility issue. (Plenty of destructive mixed-EEW examples, but none that rely on undisturbed behavior.)
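The pattern being searched for can be stated as a small predicate. The Python sketch below only illustrates the condition in the proposal; it is not the method used for the search described above. `Operand`, `overlaps`, and `affected_by_proposal` are hypothetical names, and a register group is modelled simply as a base register number plus a register count. The policy bits follow the vtype convention (0 = undisturbed, 1 = agnostic), and the thread's "mta" is assumed to mean the vtype `vma` field.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Operand:
    base: int   # lowest vector register number in the group
    nregs: int  # number of registers in the group
    eew: int    # effective element width in bits


def overlaps(a: Operand, b: Operand) -> bool:
    """True if the two register groups share at least one register."""
    return a.base < b.base + b.nregs and b.base < a.base + a.nregs


def affected_by_proposal(dst: Operand, srcs: Iterable[Operand],
                         vta: int, vma: int, masked: bool) -> bool:
    """True if the instruction requests undisturbed behaviour in the
    mismatched-EEW overlap case that the proposal makes agnostic."""
    mismatched = any(overlaps(dst, s) and s.eew != dst.eew for s in srcs)
    tail_undisturbed = vta == 0
    mask_undisturbed = masked and vma == 0  # mask policy only matters if masked
    return mismatched and (tail_undisturbed or mask_undisturbed)


# vnsrl.wi v0, v0, 3 at SEW=8, LMUL=1: 16-bit source group v0-v1, 8-bit destination v0.
src = Operand(base=0, nregs=2, eew=16)
dst = Operand(base=0, nregs=1, eew=8)
print(affected_by_proposal(dst, [src], vta=0, vma=1, masked=False))  # tail-undisturbed requested
print(affected_by_proposal(dst, [src], vta=1, vma=1, masked=False))  # fully agnostic
```

Destructive mixed-EEW code that runs with agnostic policies falls outside this predicate, which matches the distinction drawn above.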
Next, I asked some GCC and LLVM developers, who confirmed that neither of these compilers is yet able to leverage mixed-EEW src/dst overlap, regardless of mask/tail policy. So I think it highly unlikely that we will find incompatibilities in compiled codes.
Lastly, I asked around the water cooler: my colleagues and I were unable to dream up a convincing use case for the old behavior.
At this point, I am no longer concerned. I feel this is a strict improvement to RVV and I support it.
Best, Nick Knight Algorithms & Libraries, SiFive
|
|

Allen Baum
I'm going to have to eat my words (in tiny bites). Krste is correct that implementations of the earlier spec version will remain spec compatible. Architectural compatibility tests would report both tail-undisturbed and tail-agnostic implementations as compatible.
But SW will not necessarily be compatible with the newer version; only software written so as not to depend on a specific behavior will be. So compatibility must be defined carefully here.
As Guy and others have said, there is little SW available at this time, and it is highly unlikely that any of it will be affected by this specific case. But, as Guy points out, while the original behavior seems unusable, that doesn't mean that it couldn't be taken advantage of in a useful way.
So it is SW that needs to be cognizant of this, not so much the HW implementation. Naming this as a profile requirement (i.e., specifying old or new behavior) is one way to ensure that SW won't be blindsided (however unlikely that really is).
From a selfish point of view, the existence of agnostic - specifically that the resulting value has a non-deterministic set of two possible answers - makes our standard testing methodology difficult, but this isn't the worst of the non-deterministic cases.
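To make the testing point concrete, here is a minimal Python sketch of an agnostic check. It assumes the v1.0 rule that an agnostic destination element may either keep its old value or be overwritten with all 1s, chosen independently per element. `agnostic_ok` is a hypothetical helper, and elements are modelled at the destination EEW.

```python
from typing import Sequence


def agnostic_ok(old: Sequence[int], new: Sequence[int], eew: int) -> bool:
    """True if every element of `new` is a legal agnostic result for `old`.

    Each element may independently keep its old value or become all 1s.
    """
    all_ones = (1 << eew) - 1
    return len(old) == len(new) and all(
        n == o or n == all_ones for o, n in zip(old, new)
    )


old_tail = [0x11, 0x22, 0x33]

print(agnostic_ok(old_tail, [0x11, 0x22, 0x33], eew=8))  # undisturbed result
print(agnostic_ok(old_tail, [0xFF, 0xFF, 0xFF], eew=8))  # all elements set to 1s
print(agnostic_ok(old_tail, [0x11, 0xFF, 0x33], eew=8))  # mixed, still legal
print(agnostic_ok(old_tail, [0x11, 0x00, 0x33], eew=8))  # neither old value nor all 1s
```

A test therefore has to accept a set of results per element rather than compare against a single golden value, and an implementation that leaves the elements undisturbed passes the agnostic check.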
|
|