---+ Expanded
From the vector meeting last Friday: trimming, fault-on-first. I realized
that fault-on-first is similar to the forms of SW-visible non-faulting
speculative loads that some machines, especially VLIWs, have. However,
instead of delivering a NaN or NaT, it is non-faulting except for vector
element 0, where it faults.

The NaT-ness is implied by the trimmed vector length. It could be implied
by a mask showing which vector operations had completed.

All such SW-visible non-faulting loads need a "was this correct"
operation, which might just be a faulting load and a comparison. Software
control flow must fall through such a check operation, and through a redo
of the faulting load if necessary. In scalar code, non-faulting and
faulting loads are different instructions, so there must be a branch.
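
The scalar pattern can be sketched in Python roughly as follows. This is a toy model only: `MEMORY`, `NAT`, `PageFault`, `load_faulting`, `load_nonfaulting`, and `speculative_then_check` are hypothetical names made up for illustration, not the instructions of any real ISA.

```python
class PageFault(Exception):
    """Raised by the faulting load when the address is unmapped (toy model)."""

NAT = object()  # hypothetical "Not a Thing" marker delivered instead of a fault

MEMORY = {0x100: 7, 0x104: 9}  # toy address space: only these addresses are mapped

def load_faulting(addr):
    """Ordinary load: traps on an unmapped address."""
    if addr not in MEMORY:
        raise PageFault(hex(addr))
    return MEMORY[addr]

def load_nonfaulting(addr):
    """Speculative load: never traps, returns NAT where a fault would occur."""
    return MEMORY.get(addr, NAT)

def speculative_then_check(addr):
    """Issue the load speculatively, then run the 'was this correct' check."""
    value = load_nonfaulting(addr)     # issued early, possibly above a branch
    if value is NAT:                   # the check: software must branch here
        value = load_faulting(addr)    # redo as a faulting load; traps if truly bad
    return value
```

The explicit `if` is the branch that the scalar case cannot avoid, because the check and the redo use a different instruction from the speculative load.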

The RISC-V fault-on-first approach has the correctness check for
non-faulting implied by redoing the instruction, i.e. it is its own
non-faulting check. It gets away with this because the trimmed vector
length indicates which parts were valid and which were not. Forward
progress is guaranteed by trapping on vector element zero, i.e. never
allowing a trim to zero length. If a non-faulting vector approach were
used instead of fault-on-first, it could return a vector completion mask,
but to make forward progress it would have to guarantee that at least one
vector element had completed.
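
A minimal sketch of the fault-on-first behavior, in Python, assuming a toy memory modeled as a dict of mapped addresses. `load_ff`, `strlen_ff`, `PageFault`, and `vlmax` are hypothetical names for illustration; this is not the RISC-V specification's pseudocode.

```python
class PageFault(Exception):
    """Trap taken when element 0 touches an unmapped address (toy model)."""

def load_ff(memory, base, vl):
    """Toy fault-on-first load: returns (values, trimmed_vl).

    memory is a dict of mapped addresses; element i lives at base + i.
    """
    values = []
    for i in range(vl):
        addr = base + i
        if addr not in memory:
            if i == 0:
                raise PageFault(addr)  # element 0 always traps: guarantees progress
            break                      # later elements: trim instead of trapping
        values.append(memory[addr])
    return values, len(values)

def strlen_ff(memory, base, vlmax=4):
    """Strip-mined search for a 0 terminator, speculating past the end."""
    n = 0
    while True:
        values, vl = load_ff(memory, base + n, vlmax)
        # vl >= 1 here, so every iteration consumes at least one element.
        if 0 in values:
            return n + values.index(0)
        n += vl  # the next load redoes the trimmed elements, starting at element 0
```

Note that the loop contains no separate check instruction: an element that was trimmed becomes element 0 of the next load, where it either completes or traps.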

David Horner's desire for a fault-on-first that may have performed no
operations at all (1) is reasonable IMHO (I think I managed to explain
that to Krste), but (2) would require some other mechanism for forward
progress, e.g. instead of trapping on element zero, the bitmask that I
described above. That is almost certainly a bigger architectural change
than RISC-V should make at this time.

Although more and more I am happy that I included such a completion
bitmask in nearly every vector instruction set that I've ever done,
particularly those vector instruction sets that were supposed to implement
SIMT efficiently. (I think of SIMT as a programming model that is
implemented on top of what amounts to a vector instruction set and
microarchitecture: https://pharr.org/matt/papers/ispc_inpar_2012.pdf )
It would be unfortunate for such a SIMT program to lose work completed
after the first fault.
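
To illustrate the work that a trim would throw away, here is a hedged Python sketch of a completion-mask load next to a count of what fault-on-first would keep for the same addresses. `load_masked`, `ff_completed`, `PageFault`, and the dict-based memory are hypothetical names and modeling assumptions, not any shipped instruction set.

```python
class PageFault(Exception):
    """Trap taken when no pending element can complete (toy model)."""

def load_masked(memory, addrs, pending):
    """Toy non-faulting vector load with a completion mask.

    Attempts every pending element; returns (values, done), where done[i]
    is True if element i completed on this attempt.
    """
    values = [None] * len(addrs)
    done = [False] * len(addrs)
    for i, addr in enumerate(addrs):
        if pending[i] and addr in memory:
            values[i] = memory[addr]
            done[i] = True
    if any(pending) and not any(done):
        # Forward progress: at least one pending element must complete.
        raise PageFault(addrs[pending.index(True)])
    return values, done

def ff_completed(memory, addrs):
    """For comparison: how many leading elements fault-on-first would keep.

    Returns 0 where a real fault-on-first load would trap on element 0.
    """
    n = 0
    for addr in addrs:
        if addr not in memory:
            break
        n += 1
    return n
```

With addresses `[10, 99, 12, 13]` and only `99` unmapped, the mask version completes three elements in one attempt, while the fault-on-first count is one.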

MORAL: fault-on-first may be suitable for vector loads that might
speculate past the end of the vector, where the length is not known, or is
inconvenient to compute, when the vector load instruction is started.
Fault-on-first is suboptimal for running SIMT on top of vectors. I.e.
fault-on-first is the equivalent of precise exceptions for in-order
execution, and for a single thread executing vector instructions, whereas
a completion mask allows out-of-order execution within a vector and/or
vector-length threading.

IMHO an important realization I made in that meeting is that
fault-on-first does not need to be just about faulting. It is totally fine
to have fault-on-first return only up to the first really long-latency
cache miss, as long as it always guarantees that at least vector element
zero has completed, because completion of vector element zero is what
guarantees forward progress.
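
A sketch of that idea in Python, assuming a toy model in which the cache is just a set of addresses that hit. `load_ff_trim_on_miss`, `PageFault`, and the `cached` set are hypothetical, chosen only to illustrate the policy.

```python
class PageFault(Exception):
    """Trap taken when element 0 touches an unmapped address (toy model)."""

def load_ff_trim_on_miss(memory, cached, base, vl):
    """Toy fault-on-first load that also trims at the first cache miss.

    memory: dict of mapped addresses -> value
    cached: set of addresses assumed to hit in the cache (updated in place)
    Returns (values, trimmed_vl) with trimmed_vl >= 1.
    """
    values = []
    for i in range(vl):
        addr = base + i
        if addr not in memory:
            if i == 0:
                raise PageFault(addr)  # true fault on element 0: trap
            break                      # fault on a later element: trim
        if i > 0 and addr not in cached:
            break                      # long-latency miss on a later element: trim
        # Element 0 always completes, even if it has to wait for its miss.
        cached.add(addr)
        values.append(memory[addr])
    return values, len(values)
```

Retrying from the trimmed position makes the missing element into element 0, which then waits for its miss and completes, so the loop around the load still makes progress.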

Furthermore, it is not even required that fault-on-first stop at the first
page-fault. An implementation could choose to actually take a page-fault
that does copy-on-write or swaps in from disk; that would be visible to
the operating system, but not to the user program. However, such an OS
implementation would have to guarantee that it would not kill a process as
a result of a true permissions-error page-fault. Or, if the virtual memory
architecture made the distinction between permissions faults and the sorts
of page-fault that are for disk swapping, copy-on-write, or copy-on-read,
the OS would not need to be involved.

EVERYTHING about fault-on-first is a microarchitecture security/information
leak channel and/or a virtualization hole (unless you trim only on true
faults, and not on COW, COR, or disk-swap page-faults). However,
fault-on-first on any page-fault is a much lower-bandwidth information
leak channel than fault-on-first on long-latency cache misses. So a
general-purpose system might choose to implement fault-on-first on any
page-fault, but might not want to implement fault-on-first on any cache
miss. However, there are some systems for which that sort of security
issue is not a concern, e.g. a data center or embedded system where all of
the CPUs are dedicated to a single problem. In that case, if they can gain
performance by doing fault-on-first on particular long-latency cache
misses, power to them!
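
To make the leak concrete, here is a small Python sketch, assuming the trim-on-any-cache-miss policy and a toy cache modeled as a set of resident addresses. `trimmed_vl` and `infer_first_uncached` are hypothetical names for illustration.

```python
def trimmed_vl(cached, base, vl):
    """Toy model: vector length returned by a fault-on-first load that
    trims at the first cache miss after element 0 (all addresses mapped)."""
    for i in range(1, vl):
        if base + i not in cached:
            return i
    return vl

def infer_first_uncached(cached, base, vl):
    """What an observer learns from the trimmed length alone."""
    n = trimmed_vl(cached, base, vl)
    return None if n == vl else base + n  # first address seen to be uncached
```

In this model the returned vector length is a direct function of cache state, so user code can read that state out of the trimmed length alone.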

Interestingly, although fault-on-first on long-latency cache misses is a
high-bandwidth information leak, it is actually much less of a
virtualization hole than fault-on-first on page-faults. The operating
system or hypervisor has very little control over cache misses, whereas
the OS and hypervisor have almost full control over page-faults. The usual
rule in security and virtualization is that an application should not be
able to detect that it has had an "innocent" page-fault, such as COW, COR,
or disk swapping.