Re: [RISC-V] [tech-cmo] Fault-on-first should be allowed to return randomly on non-faults (also, running SIMT code on vector ISA)
David Horner
> You're incorrectly characterizing FoF below. The FoF loads are not
> intended for software to dynamically probe the microarch state to
> check for possible faults
That is not what I am advocating.
> (though it can be misused that way). The point is to support software
> vector-length speculation, where whether an access is really needed
> is not known ahead of time.
That is not precisely the full use case.
Rather, your intended use case is: when the application is assured
that a constrained load can succeed
[the system guarantees that a termination condition for the load
exists, that it is detectable from the data read up to and including
the end point, and that all the data from the start point to the end
point is readable],
then FoF provides a convenient and expedited way to
advance through the load.
And if you define "not known ahead of time" to mean before
each successive load, then that time frame is not precisely true
either.
The load could be performed one unit at a time, and each time
the need would be known.
The unit requested could be of arbitrary length [successive
packets of ethernet data or crypto segments].
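The two ways of advancing through such a load can be compared in a small software model. This is only a sketch, not RVV code: `fof_load`, `VLMAX`, and `mapped_end` are hypothetical names, memory is a Python `bytes` object, and any access at or beyond `mapped_end` stands in for a fault. The model assumes the usual FoF rule that a fault on element 0 traps while a fault on a later element only shortens vl.

```python
VLMAX = 8  # assumed maximum vector length, in byte elements


class Fault(Exception):
    """Stands in for a trap taken by the load."""


def fof_load(mem, mapped_end, addr, vl):
    """Model of a fault-only-first byte load.

    A fault on element 0 raises; a fault on a later element truncates vl.
    """
    if addr >= mapped_end:
        raise Fault(addr)
    new_vl = min(vl, mapped_end - addr)
    return mem[addr:addr + new_vl], new_vl


def strlen_fof(mem, mapped_end, start):
    """Scan for a zero byte, advancing by whatever vl each load returns."""
    addr = start
    while True:
        data, vl = fof_load(mem, mapped_end, addr, VLMAX)
        idx = data.find(0)
        if idx >= 0:
            return addr + idx - start
        addr += vl


def strlen_unit(mem, mapped_end, start):
    """Same scan one element at a time: each access is known to be needed."""
    addr = start
    while True:
        data, _ = fof_load(mem, mapped_end, addr, 1)
        if data[0] == 0:
            return addr - start
        addr += 1
```

Both functions find the terminator when it lies inside the readable range; the FoF version simply takes fewer steps.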
I'm not trying to be obtuse and oppositional.
The value of FoF is to avoid the complexities of such
tracking,
but if an EE were to reasonably guarantee that the data to be
loaded
can be speculatively read up to a page boundary, then FoF is
not needed,
nor does it necessarily provide any hard advantage over the
regular strided load.
[some implementations may detect such things as debug
breakpoints and not trigger them, but as far as the software is
concerned it has the speculative to-the-end-of-the-page
guarantee, thus it will be content even if the debugger is
annoyed]
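Under that to-the-end-of-the-page guarantee, software can bound an ordinary load itself. A minimal sketch for the unit-stride case, assuming a 4 KiB page; `PAGE_SIZE` and `elems_to_page_end` are illustrative names, not part of any spec:

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def elems_to_page_end(addr, elem_bytes, vlmax):
    """Largest vl whose elements all lie within the page containing addr.

    Returns 0 if not even one whole element fits before the page boundary.
    """
    bytes_left = PAGE_SIZE - (addr % PAGE_SIZE)
    return min(vlmax, bytes_left // elem_bytes)
```

A loop would request `elems_to_page_end(addr, elem_bytes, vlmax)` elements with a regular load, then continue onto the next page only if the termination condition has not yet been seen.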
> The FoF loads are not intended for software to dynamically probe the
> microarch state to check for possible faults (though it can be
> misused that way).
The detection of microarch state is incidental to the
characterization I attribute to FoF.
And it is not only microarch state that can be revealed but
system and EE level state.
FoF fails in situations that are not covered by your use
case.
Specifically, what does the EE do when it detects a situation
in which forward progress is not possible,
e.g. the data requested is not mapped into the process?
As I understand your use case, the [standard] FoF load is
aborted, and its process as well.
The "enhanced/dangerous" FoF load will be allowed to return vl=0 to
identify the "abort" case.
Consider this scenario:
A process requests the EE to map into another process' [e.g.
child's] address space pages to scan,
and the asynchronous [child] co-process does the scanning.
An FoF return of vl=0 is eminently suited to this use case.
It is certainly possible to add to the
handshaking/synchronization process the current end point of the
data
that would need to be checked as each page is processed.
This can be substantial overhead and delay.
It is certainly possible to ensure that each request
overreaches the natural page alignment.
However, as FoF allows the processor to reduce vl at any
point, it could continually reduce vl so that it is better
aligned to cache, anticipating that the following request will be
optimized. The program will still work, and detect potential
page failures, but the false positives could be substantial and
even more costly and substantially variable across
implementations. [not to mention the EE thinking the process is
attempting a side-channel attack].
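The false-positive concern can be illustrated with a toy model. It assumes a hypothetical implementation that stops every byte load at the next cache-line boundary, and software that treats any short return as a potential page failure; `CACHE_LINE`, `PAGE_SIZE`, and both function names are illustrative only.

```python
CACHE_LINE = 64   # assumed cache-line size in bytes
PAGE_SIZE = 4096  # assumed page size in bytes


def fof_vl_trimming(addr, requested_vl):
    """Hypothetical implementation: trim each byte load at the line boundary."""
    to_line_end = CACHE_LINE - (addr % CACHE_LINE)
    return min(requested_vl, to_line_end)


def count_short_returns(start, length, requested_vl):
    """Walk [start, start+length) and classify every short return.

    Returns (short returns, short returns that stop on a page boundary).
    """
    addr, end = start, start + length
    short, at_page_boundary = 0, 0
    while addr < end:
        want = min(requested_vl, end - addr)
        got = fof_vl_trimming(addr, want)
        if got < want:
            short += 1
            if (addr + got) % PAGE_SIZE == 0:
                at_page_boundary += 1
        addr += got
    return short, at_page_boundary
```

In this model, `count_short_returns(4106, 8192, 128)` gives `(128, 2)`: 128 loads come back short, but only two of those stops coincide with a page boundary. The rest are false positives for software that reads a reduced vl as a sign of a page problem.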
These use cases argue for vl=0 return. And as I mentioned
before, these use cases will motivate the EE to return vl=0,
even without the application using the "new/corrupted" FoF
encoding for vl=0 allowed.
On Tue, Oct 20, 2020 at 5:08 AM Krste Asanovic <krste@...> wrote:
I believe I have shown practical uses above.
> The forward-progress guarantee must not add overhead to the
I certainly agree. But when does returning vl=0 serve no useful purpose?
> this is difficult to describe, especially when code may have several
There are different forward-progress guarantees.
As I mentioned before, a separate encoding
will not provide a practical benefit.
Once the new encoding is introduced,
legacy processors will just have their
EE emulate it by allowing vl=0 return
under the same conditions, and the
link editor will replace the new FoF with the old.
As mentioned before, if we think
outside the box of the "classic" use case,
there certainly are meaningful and
significant ways that applications can
handle EE level events (analogous to
divide by zero).
The default case is just such a
non-burdensome approach.
Check vl=0 if you are not guaranteed to
succeed.
Ignore vl=0 at your peril if you are
unsure (you could end up in an infinite loop).
Ignore vl=0 if you are guaranteed not
to read past valid memory.
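These three rules can be sketched with a software model of an FoF load that reports vl=0 instead of trapping. `fof_load_vl0` is a hypothetical stand-in for that behaviour, not an existing instruction or API, and `max_iters` exists only so that the unchecked demonstration terminates.

```python
def fof_load_vl0(mem, mapped_end, addr, vl):
    """Hypothetical FoF variant: vl=0 when element 0 is unreadable, no trap."""
    new_vl = max(0, min(vl, mapped_end - addr))
    return mem[addr:addr + new_vl], new_vl


def find_zero_checked(mem, mapped_end, start, vlmax=8):
    """Offset of the first zero byte, or None if vl=0 is reported first."""
    addr = start
    while True:
        data, vl = fof_load_vl0(mem, mapped_end, addr, vlmax)
        if vl == 0:
            return None  # no forward progress possible; the caller decides
        idx = data.find(0)
        if idx >= 0:
            return addr + idx - start
        addr += vl


def find_zero_unchecked(mem, mapped_end, start, vlmax=8, max_iters=1000):
    """Same loop without the vl=0 check."""
    addr = start
    for _ in range(max_iters):
        data, vl = fof_load_vl0(mem, mapped_end, addr, vlmax)
        idx = data.find(0)
        if idx >= 0:
            return addr + idx - start
        addr += vl  # vl == 0 leaves addr unchanged, so the loop spins
    raise RuntimeError("no forward progress")
```

When the terminator is guaranteed to lie in readable memory, both loops behave identically. Without that guarantee, only the checked loop returns.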
> Also, the guarantee would have to
The spec will need to address this case
in any event, even if only to say that we do not recommend that the EE
return with vl=0.
The spec cannot mandate that EE not
return vl=0. Certification does not extend to runtime
constrained EEs.
Code needs to be aware that this can
happen.
The net is, I don't believe the
"prohibition" significantly simplifies the spec.
It may actually make it more
contentious.
You simplified integer divide compared with
other ISAs that mandate a trap for divide by zero.
With this approach we mandate a trap
for FoF when vl=0 would be sufficient,
even though it is inevitable that EEs will do
the sensible thing and
return vl=0 when forward progress
[within reasonable constraints] is not possible.