Re: [RISC-V] [tech-cmo] [riscv-CMOs:master] reported: Can CMO extension support icache management? #github #risv #cmos
Thank you for this email Guy. It clarifies many things.
From: Mark Himelstein <markhimelstein@...>
Sent: Thursday, August 11, 2022 9:01 AM
To: tech-cmo@... Group Moderators <tech-cmo@...>; Guy Lemieux <guy.lemieux@...>; tech-privileged <tech-privileged@...>
Cc: Derek Williams <striker@...>; Andrew Waterman <andrew@...>; allen.baum@... <allen.baum@...>; Martin Maas <mmaas@...>; John Ingalls <john.ingalls@...>; David Kruckemyer <dkruckemyer@...>
Subject: [EXTERNAL] Re: [RISC-V] [tech-cmo] [riscv-CMOs:master] reported: Can CMO extension support icache management? #github #risv #CMOs
On Thu, Aug 11, 2022 at 6:48 AM Guy Lemieux <guy.lemieux@...> wrote:
On Wed, Aug 10, 2022 at 10:39 PM Derek Williams <striker@...> wrote:
OK, I can believe you are asking for that instruction (or a variant -- see below), but you also say below you're going to use it in a way that really does replace what FENCE.I (and some IPIs and such) do... which is to bring the I-fetch side up to date with the D-side's latest values. I still don't think that works.
> I think what you're really trying to do is replace FENCE.I and you believe that if you just had an instruction that would blow away all the I-cache copies of a given address, that would be enough to get you there.
I'll skip over this para since I don't really understand FENCE.I and I am going to try really hard to never learn it 🙂.
> My point is that is NOT enough unless you make a whole mess of
Huh... ok. I'm not sure that it matters all that much if the D-side updates to create new instructions come from store instructions from a HART or from an I/O device (incoherent or not) -- at least as far as the post I-cache buffer flushing goes. I don't think that matters at all.
There is no need to worry about the post i-cache pipes because the OS
This is where we depart controlled flight. Just because the OS stops running the thread doesn't necessarily guarantee that there isn't something stale lurking in a loop cache or some other exotic structure down-wind of the I-cache that isn't necessarily cleared by the I-cache invalidation instruction. So, while I'm quite certain that your application might work, I can dream up ones where that doesn't necessarily hold. <spoiler alert, I'm lying just a bit here. There is one subtle point in the JIT stuff in the J-extension that might make this case have to work out, but even there you need more than just the I-cache invalidate instruction and the point remains... but we hold that aside for the moment>
So, in the end, I think by hanging everything on the I-cache invalidate instruction, you're inducing some exceptionally subtle additional requirements on the implementation of that I-cache invalidation instruction that are hard to even pin down or describe. I think you really have to have some equivalent of ISYNC/ISB/IMPORT.I to cleanly and architecturally close the hole in the post I-cache buffers.
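To make the concern above concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the class `ToyCore`, its `loop_buffer`, and the method names `icache_invalidate` and `instruction_sync` are hypothetical and do not describe any real RISC-V implementation or any proposed instruction encoding. `instruction_sync` merely stands in for "some equivalent of ISYNC/ISB/IMPORT.I". The model assumes a loop buffer sits downstream of the I-cache and is not cleared by a per-address I-cache invalidate.

```python
class ToyCore:
    """Toy fetch path: memory -> I-cache -> loop buffer (all hypothetical)."""

    def __init__(self, memory):
        self.memory = memory      # address -> instruction text (D-side view)
        self.icache = {}          # address -> cached instruction
        self.loop_buffer = {}     # post-I-cache structure, assumed to exist

    def store(self, addr, insn):
        # D-side write of a new instruction; the fetch side is not updated.
        self.memory[addr] = insn

    def fetch(self, addr):
        # The loop buffer is consulted first, so it can hide I-cache changes.
        if addr in self.loop_buffer:
            return self.loop_buffer[addr]
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        insn = self.icache[addr]
        self.loop_buffer[addr] = insn
        return insn

    def icache_invalidate(self, addr):
        # Assumption: drops the I-cache copy only, nothing downstream.
        self.icache.pop(addr, None)

    def instruction_sync(self):
        # Assumption: discards everything buffered after the I-cache.
        self.loop_buffer.clear()


def demo():
    core = ToyCore({0x100: "old"})
    first = core.fetch(0x100)             # fills I-cache and loop buffer
    core.store(0x100, "new")              # new code written on the D-side
    core.icache_invalidate(0x100)         # invalidate the I-cache copy only
    after_invalidate = core.fetch(0x100)  # still served by the loop buffer
    core.instruction_sync()               # flush the post-I-cache buffer
    after_sync = core.fetch(0x100)        # refetched through the I-cache
    return first, after_invalidate, after_sync


if __name__ == "__main__":
    print(demo())
```

In this model the fetch after the invalidate alone still returns the stale instruction, and only the additional sync step makes the new instruction visible. Whether a given implementation has such a structure, and whether its invalidate clears it, is exactly the kind of unstated requirement being discussed.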
So, yes, I think in the end you really do need to bite off the full J-extension if you want something that is architecture that works everywhere. Just using the I-cache invalidate might work in your application, but I don't think it closes generally.
I'm not asking for it, but I don't believe there is an instruction
I'm going to skip over this paragraph as well, other than to say that's a different debate for another day and my initial reaction is those instructions are really hard to define cleanly as architecture, have limited usefulness, and are much better left to implementation-specific instructions in the implementations that can actually justify needing them. They are not mainline architecture everyone needs.
> My question wasn't that, but was more along the lines of is this just an academic critique of the architecture as it exists now, or is there some real project that needs this defined right now.
OK... good. So I would suggest we stop trying to fast-track anything. We're going to be done with this J group extension if it kills me before your 12-month timeline (and it might). I am perfectly happy for whoever from all of those groups to show up at the J meetings and provide their input/tell us what we have that doesn't fit their needs (but that worked fine for ARM and Power for 25+ years).
The problem, to be clear, is inadvertent execution of stale code
Yes, though arguably FENCE.I is an i-cache management instruction, just one none of us like to use.
One "workaround" to this problem is for the OS to never re-use a
I'm not at all a fan of hack-like non-guarantees.
This problem is not the same as the one the J-group is attempting to solve.
Actually, I think it is. Since we disagree here, please explain the difference.
> If this is just an academic critique, I see no real reason to fast track an I-cache invalidate instruction on its own, especially when that is not a complete solution to the problem and you'll be getting that soon (along with all the other necessary parts) with the J group proposal.
As I say above, I think you will need the J group extension in most of its totality to have a solution that closes in all cases and doesn't rely on subtle unstated requirements/assumptions. I would be loath to try and architect something that tries to do it all with just the I-cache invalidation instruction.
The J extension isn't that costly in complexity or performance.
> If you have something more than an academic critique here, please share that to the extent possible, but even if you do, the I-cache invalidate on its own isn't enough to provide a full solution, so I'm still not sure we should be fast-tracking anything.
I agree that an instruction that invalidates the I-caches is well within the known state of the art to define. I just think defining that on its own is useless for the purposes we're trying to get to here.
> We also need to know the expectations and use cases for the others you've seen request it.
Thank you for this. It's 11:50pm though and I need to give up. I won't do a deeper dive on this until later.
A few discussion forums:
Entrenched position.. yes.. for 25 years across two major architectures (Power and ARM) for deeply held reasons and I will continue to repeat myself on that point.
I really don't think the use cases are different and I don't believe I'm failing to recognize your use case. Instead, I think you are looking at a specific "system design" point that gives you subtle characteristics, that aren't supported in every implementation, that would let you do what you're hoping to do. Totally fair if you want to set up your system design that way and document that and use your own instructions.
For me, good architecture means the architecture language prevents an implementer from building an implementation that breaks, provided they follow the architecture rules. I think the extension we're going to propose is one they can't break (it's held for 25+ years). However, I think I can build an implementation that would break an architecture that has only the I-cache invalidate instruction. To the extent that is possible, I don't think that should ever be architecture.
To me, I'm worried about "good system design", and as you can see by
I don't have time right now to look at all that, and in the end you'll probably have to point a bit more specifically to the places inside those web pages that support what you're trying to get across. But, would it shock me that people are a bit confused now? No. If this isn't clearly defined going in, it's very easy to confuse folks. And just because they may make an assumption about what an I-cache invalidate does, does not mean that that assumption is rational or should be pandered to.