MTIME update frequency
Ved Shanbhogue
I have a question about this requirement:
"The ACLINT MTIME update frequency (i.e. hardware clock) must be between 10 MHz and 100 MHz, and updates must be strictly monotonic." I do understand requiring a minimum frequency (10 MHz as stated) but I am not sure why an implementation should be considered non compliant if it has a MTIME frequency higher than 100 MHz. In a number of usages/systems having much higher resolution of time is important and few hundred instructions may execute before the MTIME tick advances with a 100 MHz time frequency. This requirement to not exceed 100 MHz seems to be limiting. Could the rationale for this restriction be shared? regards ved |
On 11/12/21 20:39, Vedvyas Shanbhogue wrote:
> I have a question about this requirement:

The resolution of the mtime register is defined as 10 ns. The 100 MHz value is irrelevant and should be deleted from the spec. (You could check if an mtime increment is needed at a 10 GHz rate. You still would end up incrementing mtime at a maximum average rate of 100 MHz because the resolution is 10 ns.)

> In a number of usages/systems having much higher resolution of time is
> important

Thinking about side channel attacks like Spectre, it might be desirable not to have a high resolution timer. Which applications do you have in mind where you need the time at higher resolution and not only the number of cycles in mcycle? Which resolution would these applications need within the foreseeable future?

> Could the rationale for this restriction be shared?

Best regards

Heinrich
Greg Favor
On Fri, Nov 12, 2021 at 5:09 PM Heinrich Schuchardt <xypron.glpk@...> wrote:

> "The ACLINT MTIME update frequency (i.e. hardware clock) must be between
> 10 MHz and 100 MHz, and updates must be strictly monotonic."

The intent of the wording is that the resolution is 10ns (as implied by the max update frequency of 100 MHz). But I agree that the current wording should change to just directly state a 10ns resolution.

Greg
Jonathan Behrens <behrensj@...>
On Fri, Nov 12, 2021 at 8:09 PM Heinrich Schuchardt via lists.riscv.org <xypron.glpk=gmx.de@...> wrote:

> Thinking about side channel attacks like Spectre it might be desirable
> not to have a high resolution timer.

100 MHz is way more than sufficient for carrying out a Spectre attack. Web browsers limit clock resolutions to 1 kHz or even 500 Hz to disrupt those sorts of attacks (and apply other mitigations as well, since they don't think that is enough on its own).

> Which applications do you have in mind where you need the time at higher
> resolution and not only the number of cycles in mcycle?

Any kind of microbenchmarking benefits from an extremely high resolution time source. As I understand it, mcycle doesn't currently provide enough guarantees to be useful for that. Specifically, I don't see anything about mcycle updating at a constant rate in the face of frequency scaling or changes in power states. That said, the better option would probably be adding requirements to mcycle rather than trying to have mtime take its place.

> Which resolution would these applications need within the foreseeable
> future?

I don't think I've ever heard anyone complain about clock_gettime having nanosecond accuracy on systems where that's true.

Jonathan
Ved Shanbhogue
On Sat, Nov 13, 2021 at 02:09:34AM +0100, Heinrich Schuchardt wrote:
> The resolution of the mtime register is defined as 10 ns. The 100 MHz
> value is irrelevant and should be deleted from the spec.

I can understand defining a minimum resolution, but I do not understand why it needs to be capped at 10ns. I do not think the specification should define a maximum resolution.

> (You could check if an mtime increment is needed at a 10 GHz rate. You
> still would end up incrementing mtime at a maximum average rate of
> 100 MHz because the resolution is 10 ns.)

Which Spectre variant was most concerning, and why is Spectre mitigated if the resolution is 100 MHz but not higher? I was looking for the rationale for the 100 MHz limit, so could the rationale be shared for why Spectre is not possible with 100 MHz time?

> Which applications do you have in mind where you need the time at higher
> resolution and not only the number of cycles in mcycle?

Cycles is not at a fixed frequency and is not useful by itself for timestamping. Sorry, at this time I am not at liberty to reveal the exact application, but nanosecond accuracy is important for it. I am looking for the reasoning and rationale behind mandating 100 MHz as the maximum frequency.

regards
ved
Ved Shanbhogue
On Fri, Nov 12, 2021 at 05:41:02PM -0800, Greg Favor wrote:
> On Fri, Nov 12, 2021 at 5:09 PM Heinrich Schuchardt <xypron.glpk@...> wrote:
> The intent of the wording is that the resolution is 10ns (as implied by
> the max update frequency of 100 MHz).

But why mandate that it cannot be a finer resolution than 10ns? I can see a requirement to state a minimum of 10ns, but what is the rationale for saying it must not be better than 10ns?

regards
ved
Ved Shanbhogue
So do we agree that the platform specification must not treat an implementation that has an MTIME update frequency higher than 100 MHz as non-compliant? Removing this upper bound does not, of course, force any implementation to implement anything higher than 100 MHz. Please suggest how to request this update.

regards
ved

On Fri, Nov 12, 2021 at 9:58 PM Vedvyas Shanbhogue via lists.riscv.org <ved=rivosinc.com@...> wrote:
Greg Favor
On Tue, Nov 16, 2021 at 9:17 AM Vedvyas Shanbhogue <ved@...> wrote:

> So do we agree that the platform specification must not treat an
> implementation that has an MTIME update frequency higher than 100 MHz as
> non-compliant?

Yes. A spec update to allow higher update frequencies is in the works (along with the other changes that have been recently discussed and decided).

Greg
Jonathan Behrens <behrensj@...>
Adding more configuration options increases complexity. Under the current draft, if software wants an interrupt 1ms in the future, it can set mtimecmp to the value of mtime plus 100,000. If we make the resolution of mtime vary between systems, then we have to do a bunch more specification and implementation work to pipe that information around. Based on Greg's message it sounds like that may be happening, but I also see the appeal of just picking the extremely simple option that works well enough for everyone's case (even if it isn't some people's top pick).

Jonathan

On Tue, Nov 16, 2021 at 12:17 PM Vedvyas Shanbhogue via lists.riscv.org <ved=rivosinc.com@...> wrote:

> So do we agree that the platform specification must not treat an
> implementation that has an MTIME update frequency higher than 100 MHz as
> non-compliant?
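As a concrete illustration of the fixed-resolution arithmetic described above, here is a minimal C sketch for arming a 1ms timeout. The register addresses and the arm_timer_ns() helper are illustrative placeholders (the ACLINT register layout is platform-specific), not anything mandated by the spec:

/* Sketch only: register addresses are hypothetical placeholders, and the
 * 10ns tick is the draft's assumed fixed resolution. Assumes RV64 so the
 * 64-bit MMIO registers can be accessed in one load/store. */
#include <stdint.h>

#define MTIME_ADDR     0x0200bff8UL   /* hypothetical ACLINT MTIME address    */
#define MTIMECMP0_ADDR 0x02004000UL   /* hypothetical ACLINT MTIMECMP, hart 0 */

#define MTIME     (*(volatile uint64_t *)MTIME_ADDR)
#define MTIMECMP0 (*(volatile uint64_t *)MTIMECMP0_ADDR)

/* Request a machine timer interrupt 'delay_ns' nanoseconds from now. */
static void arm_timer_ns(uint64_t delay_ns)
{
    const uint64_t TICK_NS = 10;               /* fixed 10ns resolution assumed */
    MTIMECMP0 = MTIME + delay_ns / TICK_NS;
}

/* arm_timer_ns(1000000) sets mtimecmp to mtime + 100,000 for a 1ms timeout. */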
Ved Shanbhogue
On Tue, Nov 16, 2021 at 12:49:11PM -0500, Jonathan Behrens wrote:
> Adding more configuration options increases complexity. Under the current
> draft, if software wants an interrupt 1ms in the future it can set
> mtimecmp to the value of mtime plus 100,000.

I hope I am reading the right current draft. The current draft states: "The ACLINT MTIME update frequency (i.e. hardware clock) must be between 10 MHz and 100 MHz, and updates must be strictly monotonic." So a value of 100,000 could mean a delay anywhere between 10ms and 1ms, and per the current draft it would be wrong for software to assume that 100,000 implies 1ms.

> If we make the resolution of mtime vary between systems, then we have to
> do a bunch more specification and implementation work to pipe that
> information around.

An enumeration of the MTIME frequency is needed per the current draft. I believe the device-tree binding for RISC-V uses the timebase-frequency property under the cpus node:
https://github.com/riscv-non-isa/riscv-device-tree-doc/blob/master/bindings/riscv/cpus.txt
Is further specification needed?

regards
ved
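A minimal sketch of the conversion software would need if the frequency is discoverable rather than fixed. The helper name and the assumption that timebase-frequency is a whole number of MHz are illustrative only, not taken from the binding:

#include <stdint.h>

/* Value parsed from the device tree "timebase-frequency" property
 * (discovery code not shown). */
static uint64_t timebase_hz;

/* ticks = ns * f / 1e9, assuming f is an integer multiple of 1 MHz as the
 * draft's 10-100 MHz range suggests. */
static uint64_t ns_to_ticks(uint64_t ns)
{
    return ns * (timebase_hz / 1000000u) / 1000u;
}

/* ns_to_ticks(1000000) is 10,000 at 10 MHz but 100,000 at 100 MHz, which is
 * why "plus 100,000 means 1ms" only holds if the resolution is fixed. */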
Jonathan Behrens <behrensj@...>
The draft says "Platform must support a default ACLINT MTIME counter resolution of 10ns" which I interpret to mean that 100,000 always corresponds to 1ms. The point of different frequencies just means that you can increase mtime by 1 every 10ns or by 2 every 20ns or by 10 every 100ns. Jonathan On Tue, Nov 16, 2021 at 1:10 PM Ved Shanbhogue <ved@...> wrote: On Tue, Nov 16, 2021 at 12:49:11PM -0500, Jonathan Behrens wrote: |
Ved Shanbhogue
On Tue, Nov 16, 2021 at 01:50:23PM -0500, Jonathan Behrens wrote:
The draft says "Platform must support a default ACLINT MTIME counterI see what you mean. I was sort of mixed up by the term "default" as not normative. Does the timebase frequency being enumerated not suffice for the platform to convert time to ticks? regards ved |
Greg Favor
On Tue, Nov 16, 2021 at 10:50 AM Jonathan Behrens <behrensj@...> wrote:
Yes. This gets to the heart of the difference between resolution and update frequency. For a given resolution, one is free to update (with +1 increments) at an update frequency corresponding to the resolution (i.e. 10ns and 100 MHz), or to update at a lower frequency (e.g. with +5 increments at 20 MHz). For that matter, one could do fractional updates at higher than 100 MHz (e.g. with +0.04 increments at 2.5 GHz, where the fractional part of 'time' does not appear in the architectural mtime and time registers).

To Ved's last post, it is the timebase resolution (and not the update frequency) that determines the conversion from time to ticks. So the question is whether there should be a fixed resolution so that platform-compliant software can simply do a fixed absolute-time to mtime/time conversion, and conversely how much or how little change to Linux would be required to support a discoverable, variable conversion.

Greg
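A rough sketch of the fractional-update scheme mentioned above, using a fixed-point accumulator; the fraction width and the 2.5 GHz / +0.04 figures are illustrative only:

#include <stdint.h>

#define FRAC_BITS 32   /* illustrative width of the hidden fractional part */

/* Upper bits hold mtime ticks (10ns units); lower FRAC_BITS hold a fraction
 * that is never architecturally visible. */
static uint64_t time_fixed;

/* Called once per cycle of a 2.5 GHz update clock: 0.4ns per cycle divided
 * by the 10ns resolution is +0.04 tick per update. */
static void on_update_clock(void)
{
    const uint64_t inc = (uint64_t)(0.04 * (1ull << FRAC_BITS));
    time_fixed += inc;   /* real hardware would pick an exact ratio */
}

static uint64_t read_mtime(void)
{
    return time_fixed >> FRAC_BITS;   /* only whole 10ns ticks are exposed */
}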
Ved Shanbhogue
On Tue, Nov 16, 2021 at 11:31:44AM -0800, Greg Favor wrote:
> On Tue, Nov 16, 2021 at 10:50 AM Jonathan Behrens <behrensj@...> wrote:
> ... how much or how little change to Linux would be required to support a
> discoverable, variable conversion?

Linux discovers the timebase from the device tree and does not assume a fixed frequency:
https://elixir.bootlin.com/linux/latest/source/arch/riscv/kernel/time.c#L14

regards
ved
Jonathan Behrens <behrensj@...>
Given that the device-tree mechanism is apparently already in place, it honestly probably wouldn't be a big deal to just formalize that and not require any particular mtime resolution. I still prefer the simplicity of always doing 10ns, but don't feel that strongly about it.

Jonathan

On Tue, Nov 16, 2021 at 3:13 PM Vedvyas Shanbhogue via lists.riscv.org <ved=rivosinc.com@...> wrote:

> On Tue, Nov 16, 2021 at 11:31:44AM -0800, Greg Favor wrote:
Anup Patel
Before we go ahead and change the MTIME resolution requirement in the platform spec, I would like to highlight the following points (from past discussions) which led to mandating a fixed MTIME resolution in the platform spec:
Based on the above points, I still think mandating a fixed MTIME resolution is desirable for OS-A platforms, and this has nothing to do with how the timebase-frequency is discovered (i.e. DT/ACPI).
Regards, Anup
From: <tech-unixplatformspec@...> on behalf of Jonathan Behrens <behrensj@...>

> Given that the device-tree mechanism is apparently already in place, it
> honestly probably wouldn't be a big deal to just formalize that and not
> require any particular mtime resolution. I still prefer the simplicity of
> always doing 10ns, but don't feel that strongly about it.
>
> Jonathan
>
> On Tue, Nov 16, 2021 at 3:13 PM Vedvyas Shanbhogue via lists.riscv.org
> <ved=rivosinc.com@...> wrote:
Ved Shanbhogue
On Wed, Nov 17, 2021 at 04:37:21AM +0000, Anup Patel wrote:
> Before we go ahead and change the MTIME resolution requirement in the
> platform spec, I would like to highlight the following points (from past
> discussions) which led to mandating a fixed MTIME resolution in the
> platform spec:

So synchronizing time between harts, dies or multiple sockets is an engineering problem. The architecture should not restrict the implementation to achieve the 1ns resolution. Synchronizing such counters, even at much higher frequencies, has been achieved in several implementations. I find Lamport's paper http://lamport.azurewebsites.net/pubs/time-clocks.pdf a good reference on this topic of time, clocks and the ordering of events in a distributed system. The goal of such synchronization would be for the system to achieve the property that if an event a occurs before event b, then a should happen at an earlier time than b. Whether that needs bounding to 1 tick, or to the latency of the fastest hart-to-hart transfer, should be an engineering problem.

Once the resolution is determined, I am not sure the update frequency needs an upper bound. Of course it is not very useful for an implementation to have an update frequency higher than the resolution supports, and implementations could pick the increment per update as the ratio between 1 GHz and the implemented clock (instead of the ratio between 100 MHz and the implemented clock, as in the current draft).

> 2. It is common in enterprise clouds to migrate a Guest/VM across
> different hosts. Considering the diversity in the RISC-V world, we have to
> support migration of a Guest/VM from host A to host B where these hosts
> can be from different vendors. Now if host A and host B have different
> MTIME resolution then Guest/VM migration will not work, and this problem
> also applies to the ARM world. This is another reason why ARM SBSA
> mandates a fixed timer resolution. Although the ARM world standardized the
> timer frequency quite late in the game, the RISC-V world can avoid these
> problems by standardizing the MTIME resolution quite early. Alternatively,
> there is also an SBI para-virt call possible to detect a change in MTIME
> resolution, but this would mean additional code (along with barriers) in
> the path of reading the time CSR (as an example, look at the KVM para-virt
> clock used in the x86 world).

If we do want to fix the resolution then 10ns is too coarse. I suggest we make it at least 1ns to address the requirements of systems many of us are building. With a 10ns resolution it will not be competitive against ARM systems that, as you noted, have a fixed resolution of 1ns, or x86 where the TSC operates at P0 frequency.

regards
ved
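A minimal sketch of the increment-ratio suggestion above, assuming for illustration that the implemented update clock divides 1 GHz evenly:

#include <stdint.h>

/* With a 1ns resolution, the per-update increment is the ratio of 1 GHz to
 * the implemented update frequency (values are illustrative). */
static uint64_t increment_for_1ns_resolution(uint64_t update_hz)
{
    return 1000000000u / update_hz;   /* e.g. 10 at 100 MHz, 1 at 1 GHz */
}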
Greg Favor
On Wed, Nov 17, 2021 at 5:09 AM Ved Shanbhogue <ved@...> wrote:

> So synchronizing time between harts, dies or multiple sockets is an
> engineering problem. The architecture should not restrict the
> implementation to achieve the 1ns resolution. Synchronizing such counters
> even at much higher frequencies has been achieved in several
> implementations.

I agree that there are established time synchronization techniques, although where they are used today, they don't achieve or try to achieve 1ns accuracy.

Taking a constrained example within a single die: even if one avoids trying to synchronize time across all harts in the die and instead simply distributes time to all harts in a tightly balanced manner (so as to satisfy the synchronization requirement), doing that to better than 1ns of accuracy can be challenging in the face of any async boundary crossings (especially if one has more than one crossing from mtime out to each hart's time), in the face of dynamic power management (DVFS) of cores and non-core logic, and in the face of other little engineering details. Although it is not impossible.

One technique some use in higher-end designs is to interpolate or up-sample from a timebase to a higher resolution and update rate, for where ns and sub-ns resolution is needed for certain purposes (without needing sub-1ns accuracy across harts). This avoids needing to do tight synchronization or distribution of the timebase itself at such a high resolution and update rate.

Greg
Ved Shanbhogue
On Wed, Nov 17, 2021 at 09:15:55AM -0800, Greg Favor wrote:
> On Wed, Nov 17, 2021 at 5:09 AM Ved Shanbhogue <ved@...> wrote:
> I agree that there are established time synchronization techniques,
> although where they are used today, they don't achieve or try to achieve
> 1ns accuracy.

Agree. Standard protocols like IEEE 1588 may also be used to achieve fine synchronization with a distributed time. Having distributed but synchronized time may avoid needing to send a large time bus across the die and having to deal with the issues you highlight such as async crossings, spread-spectrum clocking, DVFS, etc.

> So synchronizing time between harts, dies or multiple sockets is an
> engineering problem.

Besides harts, some systems would also want to ensure time synchronization between harts, accelerators and PCIe, e.g. to support precision time measurement (PTM), time-sensitive networking for automotive/industrial/financial apps, etc.

regards
ved
Greg Favor
On Wed, Nov 17, 2021 at 9:49 AM Ved Shanbhogue <ved@...> wrote:

> Agree. Standard protocols like IEEE 1588 may also be used to achieve fine
> synchronization with a distributed time. Having distributed but
> synchronized time may avoid needing to send a large time bus across the
> die and having to deal with the issues you highlight such as async
> crossings, spread-spectrum clocking, DVFS, etc.

Just to clarify what I was trying to say: whether distributing a large time bus or a "smarter" small time bus, which are functionally equivalent (and many designs I'm aware of do the latter), distributing that across CDCs introduces the obvious timing uncertainty. If the receive side is running at 1 GHz or 2 GHz, then right there appears 1ns to 0.5ns of time uncertainty/inaccuracy in a hart's final time value. Two CDCs (e.g. in larger-scale many-core designs) double that.

Leaving aside any CDCs, also keep in mind that even if the distribution of time to each hart is synchronous and "perfectly" balanced (i.e. the exact same number of clock cycles from mtime to each end point), ensuring that the clock skew between these potentially far-apart end points is sub-1ns is impractical (especially in leading-edge processes with long and highly variable wire delays). Even mesh-based clock distribution schemes won't achieve that.

Greg