Re: [RISC-V] [tech-cmo] [riscv-CMOs:master] reported: Can CMO extension support icache management? #github #risv #cmos


On Thu, Aug 11, 2022 at 6:48 AM Guy Lemieux <guy.lemieux@...> wrote:
On Wed, Aug 10, 2022 at 10:39 PM Derek Williams <striker@...> wrote:
> You state below that you are asking for a way to guarantee that no I-caches have stale data (for a given block -- my words there). I don't think that's really what you're asking for.

Yes it's what I'm asking for.

> I think what you're really trying to do is replace FENCE.I and you believe that if you just had an instruction that would blow away all the I-cache copies of a given address, that would be enough to get you there.

No, we already have FENCE.I, and I understand how to use it. I
understand that in a non-coherent system, it may not do anything at
all to the i-cache (e.g., if an IODMA channel replaces executable code
in memory, and those writes are not observable by the hart executing

> My point is that is NOT enough unless you make a whole mess of other assumptions about the system and somehow clean out the post I-cache pipes of stale instructions which is very bad architecture. A cache invalidate instruction is one leg at best of the three (or more) legs you need to hold this three-legged stool up.

I'm not trying to run self-modifying code or JIT code. I'm trying to
load new code from a non-coherent I/O device, and I want to ensure
there are no other copies floating in i-caches anywhere. I will settle
for an instruction that removes a cache block from the local hart
i-cache, because there is no coherence mechanism. In a coherent
system, an instruction that also removes a cache block in other harts'
i-caches is fine, but I don't care about that use-case; I know others
do care, so it should be defined as part of the CBO.INVAL.I operation,
and this would make it consistent with other CBO.* instructions. IODMA
is likely to be non-coherent in either case.

There is no need to worry about the post i-cache pipes because the OS
can guarantee that the code in that thread has stopped running and
therefore has nothing in flight. However, it has no way of knowing the
state of the i-caches.
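
A minimal sketch of how an OS loader might use such a per-block operation after a non-coherent DMA transfer has replaced code in memory. It assumes the thread that ran the old code has already stopped, as described above. The names icache_inval_range, icache_block_inval_fn and record_block are hypothetical, and the per-block hook stands in for an instruction like the CBO.INVAL.I proposed in this thread, which is not a ratified instruction. The block size is a parameter because it is implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block hook. On real hardware this would wrap whatever
 * instruction removes one block from the local hart's i-cache (for example
 * the CBO.INVAL.I proposed in this thread, which is NOT ratified).
 * Here it is a callback so the sketch can run on any host. */
typedef void (*icache_block_inval_fn)(uintptr_t block_addr, void *ctx);

/* Walk every cache block overlapping [base, base + len) and invoke the hook
 * once per block. Returns the number of blocks visited.
 * Assumes base + len does not wrap around the address space. */
static size_t icache_inval_range(uintptr_t base, size_t len, size_t block_size,
                                 icache_block_inval_fn inval, void *ctx)
{
    /* Block size must be a nonzero power of two for the mask below. */
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    if (len == 0)
        return 0;

    uintptr_t mask  = ~(uintptr_t)(block_size - 1);
    uintptr_t first = base & mask;              /* round the start down */
    uintptr_t last  = (base + len - 1) & mask;  /* block holding the last byte */
    size_t count = 0;

    for (uintptr_t a = first; ; a += block_size) {
        inval(a, ctx);
        count++;
        if (a == last)
            break;
    }
    return count;
}

/* Stand-in hook that only records the last block it was asked to drop. */
static void record_block(uintptr_t block_addr, void *ctx)
{
    *(uintptr_t *)ctx = block_addr;
}
```

The loop only covers the local hart's i-cache. Whether other harts' i-caches are affected would depend on how the instruction is eventually defined.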

I'm not asking for it, but I don't believe there is an instruction
that flushes the entire i-cache of a hart. I also believe there is no
way to flush all i-caches of a system. I'm not sure if those would be
useful; they are not on my radar, and they would be very disruptive to

> My question wasn't that, but was more along the lines of is this just an academic critique of the architecture as it exists now, or is there some real project that needs this defined right now.

I am starting a new project around IODMA and coherence issues.
Although it is academic, it will exist physically (real logic on an
FPGA) and run real code. This is not so urgent that I need the
instruction yesterday, or even in 12 months, as I can always create my
own instruction. I do not normally do OS-level work, but this is where
such an instruction would be used the most, and the maintainers of
RobotOS (ROS2), Embedded Linux, FreeRTOS, Zephyr, etc. should be
consulted to see how they handle this problem.

The problem, to be clear, is inadvertent execution of stale code
because there is no i-cache coherence and there are no i-cache
management instructions.

One "workaround" to this problem is for the OS to never re-use a
physical address for new code until it has to wrap around. This is a
"lazy" way that hopes i-cache contents are eventually replaced on
their own. However, it is not a guarantee.
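
The "lazy" workaround can be sketched as a bump allocator over a physical range reserved for code. This is only an illustration under stated assumptions: struct code_pool, code_pool_init and code_pool_alloc are hypothetical names, not taken from any existing OS, and the sketch does nothing to the i-cache itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump allocator for code regions: addresses are handed out in
 * increasing order and are only reused after the pool wraps around. */
struct code_pool {
    uintptr_t base;   /* start of the physical range reserved for code */
    size_t    size;   /* number of bytes in the range */
    size_t    next;   /* offset of the next never-used byte */
    unsigned  wraps;  /* how many times addresses have been recycled */
};

static void code_pool_init(struct code_pool *p, uintptr_t base, size_t size)
{
    assert(p != NULL && size != 0);
    p->base = base;
    p->size = size;
    p->next = 0;
    p->wraps = 0;
}

/* Returns the address for a new code image of len bytes,
 * or 0 if the request can never fit in the pool. */
static uintptr_t code_pool_alloc(struct code_pool *p, size_t len)
{
    if (len == 0 || len > p->size)
        return 0;
    if (len > p->size - p->next) {
        /* Out of fresh addresses: wrap and start reusing old ones.
         * Nothing guarantees the i-caches have dropped the old
         * contents by this point, which is the weakness noted above. */
        p->next = 0;
        p->wraps++;
    }
    uintptr_t addr = p->base + p->next;
    p->next += len;
    return addr;
}
```

The wraps counter marks the point where the scheme stops being safe on its own: after the first wrap, a new image may land on an address whose old contents are still cached.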

This problem is not the same as the one the J-group is attempting to solve.

> If this is just an academic critique, I see no real reason to fast track an I-cache invalidate instruction on its own, especially when that is not a complete solution to the problem and you'll be getting that soon (along with all the other necessary parts) with the J group proposal.

I have no insight into what is coming with the J group proposal.
However, I don't think it should be required to adopt the entire J
extension to get this capability. The J group is concerned with
self-modifying / dynamic generation of code on the fly, which is a
different use case which may care about what is in the current
execution pipeline.

> If you have something more than an academic critique here, please share that to the extent possible, but even if you do, the I-cache invalidate on its own isn't enough to provide a full solution, so I'm still not sure we should be fast-tracking anything.

My request for a fast-track wasn't due to a sense of urgency, but due
to a belief this is relatively simple and easy to define. I am not
attempting to fully define it at this point, but trying to see if
there is any support from others.

> We also need to know the expectations and use cases for the others you've seen request it.

I'll use Google to help us out, but I won't carefully read each link below.

A few discussion forums:

There are also several more, including the most recent one (a CMO
github issue, I believe) that spurred this conversation (but not this

You can also find code that expects to use icache flush instructions:
(this link assumes FENCE.I flushes the i-cache, which non-coherent
systems don't have to do, so it is technically incorrect)


> My concern is that the I-cache invalidate instruction isn't enough to do good architecture here, so it is unhelpful to ratify that without the rest of the pieces.

You've stated this several times, and I think you have become
entrenched in your position. To you, it seems that "good architecture"
means worrying about flushing the pipeline. However, there are other
use-cases you are failing to recognize.

I'm worried about "good system design". As you can see from the Linux
link above, that code has a bug in its assumptions about i-cache
flushes. As the code links above show, incorrect software is already
showing up.

