MTIME update frequency


Greg Favor
 

On Mon, Nov 22, 2021 at 5:42 AM Darius Rad <darius@...> wrote:
> This also ensures a reasonable upper bound on the lack of resolution/accuracy in
> time CSR readings.

What is the rationale for mandating any specific time period in OS-A-ish
platforms?

For the reason stated above.  The proposal should impose no real burden on OS-A class implementations, while ensuring a reasonable upper bound on the lack of resolution/accuracy.  If anything, the discussion from others on this thread favors implementing even finer tick periods.  And, as others have noted, other architectures also provide a relatively fine-resolution "time" (meaning a >10 MHz tick frequency).  Without trying to list umpteen individual use cases, I think it's fair to say that this is a generally useful - yet modest - requirement.

Greg


Darius Rad
 

On Sun, Nov 21, 2021 at 11:33:43AM -0800, Greg Favor wrote:

Coming back around to the OS-A platform specs, the distinction between
resolution and update frequency obviously needs to go away. Given the need
for flexibility across the range of OS-A compliant implementations,
mandating any one tick period will be good for some and unacceptable for
others. I suggest mandating a max tick period of 100 ns (corresponding to
a relatively low 10 MHz frequency for OS-A class designs). This allows
implementations to do anything from 100 ns down to 1 ns and even lower (if
they are able to directly satisfy the synchronization requirement). This
also ensures a reasonable upper bound on the lack of resolution/accuracy in
time CSR readings.
What is the rationale for mandating any specific time period in OS-A-ish
platforms?

The period can be determined by device tree and/or ACPI, at least one of
which is required for OS-A platforms, so the idea that a fixed period makes
things easier is debatable.

If the argument is that a fixed period is necessary for migration of
virtual machines using the hypervisor extension, then perhaps it should
only be a requirement when the hypervisor extension is also present.

As Anup (I believe) mentioned, mandating a specific period will make
non-compliant some existing platforms that would otherwise be compliant.
Maybe this isn't a strong factor, but it was my understanding that one goal
of the first iteration of the platform specifications was to cover, to the
extent possible, existing platforms (hence the legacy PLIC and 8250
requirements, as well).

It is also unfortunate that there is no opportunity for recommendations,
where this requirement could be phrased in such a way that it is not
required, but provides rationale to hopefully persuade platform
implementations to comply.

// darius


Ved Shanbhogue
 

On Sun, Nov 21, 2021 at 11:33:43AM -0800, Greg Favor wrote:
Coming back around to the OS-A platform specs, the distinction between
resolution and update frequency obviously needs to go away. Given the need
for flexibility across the range of OS-A compliant implementations,
mandating any one tick period will be good for some and unacceptable for
others. I suggest mandating a max tick period of 100 ns (corresponding to
a relatively low 10 MHz frequency for OS-A class designs). This allows
implementations to do anything from 100 ns down to 1 ns and even lower (if
they are able to directly satisfy the synchronization requirement). This
also ensures a reasonable upper bound on the lack of resolution/accuracy in
time CSR readings.
Thanks Greg. This would be helpful: it does not prevent an implementation from engineering a finer-grained solution, but it also does not force an implementation to do so. I think this makes sense.

P.S. As a personal aside, I find it next to impractical to distribute
and/or explicitly synchronize time truly to within ~1 ns in medium to high
core count implementations (particularly, but not only, because of physical
design and process considerations). (And that leaves aside
multi-die/socket implementations.) But the above doesn't stop anyone from
figuring out how to actually do that.
Totally agree.

regards
ved


Ved Shanbhogue
 

On Sun, Nov 21, 2021 at 11:34:04AM -0800, Greg Favor wrote:

Based on above points, I still think mandating fixed MTIME resolution is
desirable for OS-A platforms and this has nothing to do with how
timebase-frequency is discovered (i.e. DT/ACPI).
If we do want to fix the resolution then 10ns is too coarse. I suggest we make it at least 1ns to address the requirements of systems many of us are building. With 10ns resolution it will not be competitive against ARM systems which, as you noted, have a fixed 1ns resolution, or against x86, where the TSC operates at P0 frequency.
Note that ARM SBSA did not mandate a 1 GHz update rate; it only mandated a
1ns resolution - and explicitly acknowledged that updates may be at a lower
rate using counter scaling.
Absolutely. I was only suggesting a 1ns resolution and not a 1ns update rate.

Greg


Greg Favor
 

On Wed, Nov 17, 2021 at 2:30 PM Darius Rad <darius@...> wrote:
> CSRs need to have consistent time at the observation point. The fastest way in most systems to "observe" the value in CSRs is through the cache or through memory. So the difference between the two CSRs should not be observable to the extent that the following test fails:
>
> Hart-X (sends a message to Hart-Y):
>     Read time
>     Write timestamp message to memory
> Hart-Y:
>     Read timestamp message from memory (gets the data from cache or memory)
>     Read time
>     compare timestamp to time (the timestamp should not be in the future)
>
> If this test passes, then the fact that the CSR in Hart-Y is 2 ticks behind or ahead of Hart-X is not observable.
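
A rough C sketch of that test, for illustration only (the rdtime helper, the use of pthreads, and the assumption that the two threads actually land on different harts are all assumptions about the environment, not anything mandated by the specs):

    /* Hart-X publishes a timestamp; Hart-Y must never see it in its own future.
     * RV64 only: read_time() uses the rdtime pseudoinstruction (time CSR). */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    static _Atomic uint64_t mailbox;                 /* 0 = no message yet */

    static inline uint64_t read_time(void) {
        uint64_t t;
        __asm__ volatile("rdtime %0" : "=r"(t));
        return t;
    }

    static void *hart_x(void *arg) {
        (void)arg;
        uint64_t t = read_time();                                  /* read time */
        atomic_store_explicit(&mailbox, t, memory_order_release);  /* write timestamp */
        return NULL;
    }

    static void *hart_y(void *arg) {
        (void)arg;
        uint64_t stamp;
        while ((stamp = atomic_load_explicit(&mailbox, memory_order_acquire)) == 0)
            ;                                        /* wait for the message */
        uint64_t now = read_time();                  /* then read local time */
        puts(now >= stamp ? "ok" : "FAIL: timestamp is in the future");
        return NULL;
    }

    int main(void) {
        pthread_t x, y;
        pthread_create(&y, NULL, hart_y, NULL);
        pthread_create(&x, NULL, hart_x, NULL);
        pthread_join(x, NULL);
        pthread_join(y, NULL);
        return 0;
    }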

Literally, such a "loosely synchronized" implementation would not be compliant with the RISC-V architecture and its tight synchronization requirement.  As Darius noted (below), that is just one way to observe the clocks and, most importantly, the arch spec requirement is not defined in terms of an observation litmus test like the one above.  The literal arch spec requirement is the actual requirement for arch compliance.

That is perhaps a useful test, but not the only way to observe the clocks,
nor is it how the accuracy is specified in the architecture.  The
architecture does not say the accuracy is with respect to what is
observable in another hart.  The time could, for example, be observed by
something external to both harts and with higher precision.

> Setting the granularity to 1 ns provides a path for RISC-V implementations
> that want the finer resolution to achieve the goal without penalizing
> implementations that do not.

As Anup noted early in this thread:  One could require a fixed 1ns resolution (like ARM SBSA), but for RISC-V one must also honor the architectural requirement that all time CSRs are synchronized to within 1 tick.  And for multi-socket (or multi-die) systems it becomes even more challenging (arguably impossible) to synchronize time to within 1ns resolution across an SMP domain.

Greg


Greg Favor
 

>Based on above points, I still think mandating fixed MTIME resolution is desirable for OS-A platforms and this has nothing to do with how timebase-frequency is discovered (i.e. DT/ACPI).
If we do want to fix the resolution then 10ns is too coarse. I suggest we make it at least 1ns to address the requirements of systems many of us are building. With 10ns resolution it will not be competitive against ARM systems which, as you noted, have a fixed 1ns resolution, or against x86, where the TSC operates at P0 frequency.

Note that ARM SBSA did not mandate a 1 GHz update rate; it only mandated a 1ns resolution - and explicitly acknowledged that updates may be at a lower rate using counter scaling.

Greg


Greg Favor
 

I took the opportunity to raise this with Andrew and Krste (who authored the text in question) as to the intended and proper reading of the following arch spec text:

The RDTIME pseudoinstruction reads the low XLEN bits of the time CSR, which counts wall-clock real time that has passed from an arbitrary start time in the past. The underlying 64-bit counter should never overflow in practice. The execution environment should provide a means of determining the period of the real-time counter (seconds/tick). The period must be constant. The real-time clocks of all harts in a single user application should be synchronized to within one tick of the real-time clock. The environment should provide a means to determine the accuracy of the clock.

Before summarizing their response, I'll note that the presence of two discoverable properties (tick period and time accuracy) might suggest that these correspond to time resolution and update frequency (which implies inaccuracy equal to the period of the update rate).  And other readings of this text might suggest differently.

In short, the intended meaning and proper reading of the arch text is the most literal reading of the first several sentences, i.e. one tick equals one ulp, and the counter monotonically increments (i.e. by +1) once each tick period.  The resolution and update period are one and the same.  Time CSRs across harts must be synchronized to within one ulp or tick period.

Secondly, the last sentence's statement about accuracy is meant to refer to the bounds on deviation from the nominal frequency of the RTC source.

Krste and Andrew will incorporate some extra words into the current text to make fully clear the intended literal meaning of the text (and hence cut off any alternative not-so-literal readings of the text), as well as to clarify what the accuracy sentence refers to.


Coming back around to the OS-A platform specs, the distinction between resolution and update frequency obviously needs to go away.  Given the need for flexibility across the range of OS-A compliant implementations, mandating any one tick period will be good for some and unacceptable for others.  I suggest mandating a max tick period of 100 ns (corresponding to a relatively low 10 MHz frequency for OS-A class designs).  This allows implementations to do anything from 100 ns down to 1 ns and even lower (if they are able to directly satisfy the synchronization requirement).  This also ensures a reasonable upper bound on the lack of resolution/accuracy in time CSR readings.
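
As a purely illustrative check of what that bound means for a given design (the 12 MHz value below is just an example of a discovered timebase-frequency, not anything from a real platform):

    /* Compute the tick period from a discovered timebase-frequency and check it
     * against the proposed <= 100 ns (>= 10 MHz) OS-A bound.  Illustrative only. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint64_t timebase_hz = 12000000;             /* example value from DT/ACPI */
        double tick_period_ns = 1e9 / (double)timebase_hz;
        printf("tick period = %.2f ns -> %s\n", tick_period_ns,
               tick_period_ns <= 100.0 ? "within the proposed 100 ns max"
                                       : "exceeds the proposed 100 ns max");
        return 0;
    }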

Greg

P.S. As a personal aside, I find it next to impractical to distribute and/or explicitly synchronize time truly to within ~1 ns in medium to high core count implementations (particularly, but not only, because of physical design and process considerations).  (And that leaves aside multi-die/socket implementations.)  But the above doesn't stop anyone from figuring out how to actually do that.


Ved Shanbhogue
 

On Fri, Nov 19, 2021 at 03:46:19PM -0500, Darius Rad wrote:
On Fri, Nov 19, 2021 at 02:32:45PM -0600, Vedvyas Shanbhogue wrote:
On Fri, Nov 19, 2021 at 03:27:40PM -0500, Jonathan Behrens wrote:
On Fri, Nov 19, 2021 at 3:02 PM Ved Shanbhogue <ved@...> wrote:

On Fri, Nov 19, 2021 at 12:42:13PM -0500, Jonathan Behrens wrote:
On Fri, Nov 19, 2021 at 12:27 PM Greg Favor <gfavor@...>
wrote:

The current specification says that the MTIME values seen by two HARTs must not be apart by more than 1. It does not say the MTIME must always increment in units of 1. I do not think the specification mandates incrementing by one on each clock tick. Presently it says the tick - the update frequency - can be 100 MHz or 10 MHz or somewhere in between. If the update frequency is 10 MHz then the MTIME increment per clock must be 10. If the update frequency is 100 MHz then the MTIME increment per clock is 1. So is your concern that an adder that adds 10 to MTIME per clock tick is hard?

I largely agree. While the clock "tick" is one ulp, an update can increment time by, for example, 10 ticks. (The same is true in ARM SBSA/BSA/etc.)

I think the argument is that you technically violate the ISA spec if you have two cores and you increment mtime on the first core by +10 and then a fraction of a nanosecond later update the second core's mtime by +10. During the tiny duration between the updates on the two cores mtime differs by 10, but the ISA requires mtime to differ by at most 1 at any given instant in time.

Which section of the specification should I look at for that requirement? The only requirement I could find is from the ACLINT spec: "On a RISC-V platform with multiple MTIMER devices residing on the same die, all must satisfy the RISC-V architectural requirement that all the MTIME registers with respect to each other, and all the per-HART time CSRs with respect to each other, are synchronized to within one MTIME tick (or MTIME update period)." And this requires synchronization to within one MTIME update period - not that the value of MTIME not differ by more than one.

The requirement seems to come from the Zicntr extension described in the
base (unprivileged) spec. I'll let somebody else say whether incrementing
mtime by +10 counts as 1 tick or 10 ticks.

The execution environment should provide a means of determining the period
of the real-time counter (seconds/tick). The period must be constant. The
real-time clocks of all harts in a single user application should be
synchronized to within one tick of the real-time clock. The environment
should provide a means to determine the accuracy of the clock.
This states synchronization to within one tick of the real-time clock. It does not state that the value must not differ by more than 1 count.
The ISA specification also says:

The execution environment should provide a means of determining the
period of the real-time counter (seconds/tick).

If the period of the counter is in units of seconds/tick, then 1 count (1
LSB) is 1 tick.
I read them separately, and I think the text deliberately used "counter" in the first sentence and "clock" in the second, placing the synchronization requirement on the clock and not the counter - which matches the ACLINT specification as well.

I hope the authors of the specification will clarify whether that was the intent.

regards
ved


Jonathan Behrens <behrensj@...>
 



On Fri, Nov 19, 2021 at 3:34 PM Vedvyas Shanbhogue via lists.riscv.org <ved=rivosinc.com@...> wrote:
On Fri, Nov 19, 2021 at 03:27:40PM -0500, Jonathan Behrens wrote:
>On Fri, Nov 19, 2021 at 3:02 PM Ved Shanbhogue <ved@...> wrote:
>
>> On Fri, Nov 19, 2021 at 12:42:13PM -0500, Jonathan Behrens wrote:
>> >On Fri, Nov 19, 2021 at 12:27 PM Greg Favor <gfavor@...>
>> wrote:
>> >
>> >> The current specification says that the MTIME values seen by two HARTs
>> to
>> >>> be not apart by more than 1. It does say the MTIME must always
>> increment in
>> >>> units of 1.  I do not think the specification mandates incrementing by
>> one
>> >>> on each clock tick. Presently it says the tick - the update frequency
>> - can
>> >>> be 100 MHz or 10 MHz or somewhere in between. If the update frequency
>> is 10
>> >>> MHz then the MTIME increment per clock must be 10. If the update
>> frequency
>> >>> is 100 Mhz then MTIME increment per clock is 1. So is your concern is
>> that
>> >>> an adder that adds 10 to MTIME per clock tick is hard?
>> >>
>> >>
>> >> I largely agree.  While the clock "tick" is one ulp, an update can
>> >> increment time by, for example, 10 ticks.  (The same is true in ARM
>> >> SBSA/BSA/etc.)
>> >>
>> >
>> >I think the argument is that you technically violate the ISA spec if you
>> >have two cores and you increment mtime on the first core by +10 then a
>> >fraction of a nanosecond later update the second core's mtime by +10.
>> >During the tiny duration between the updates on the two cores mtime
>> differs
>> >by 10, but the ISA requires mtime to differ by at most 1 at any given
>> >instant in time.
>> >
>> Which section of the specification should I look at for that requirement?
>> The only requirement I could find is from ACLINT spec: "On a RISC-V
>> platform with multiple MTIMER devices residing on the same die, all must
>> satisfy the RISC-V architectural requirement that all the MTIME registers
>> with respect to each other, and all the per-HART time CSRs with respect to
>> each other, are synchronized to within one MTIME tick (or MTIME update
>> period)." And this requires synchronized to within one MTIME update
>> period - not the value of MTIME be not different by more than one.
>>
>
>The requirement seems to come from the Zicntr extension described in the
>base (unprivileged) spec. I'll let somebody else say whether incrementing
>mtime by +10 counts as 1 tick or 10 ticks.
>
>The execution environment should provide a means of determining the period
>> of the real-time counter (seconds/tick). The period must be constant. The
>> real-time clocks of all harts in a single user application should be
>> synchronized to within one tick of the real-time clock. The environment
>> should provide a means to determine the accuracy of the clock.
>>
This states synchronization to within one tick of the real-time clock.
It does not state that the value must not differ by more than 1 count.

It seems that this comes down entirely to the semantics of what a "tick" means in this context. If 1 tick = 1 count = 10ns then it violates the spec, whereas if 1 tick = 10 counts = 100ns then it complies.


Darius Rad
 

On Fri, Nov 19, 2021 at 09:20:58AM -0800, Greg Favor wrote:
On Fri, Nov 19, 2021 at 7:36 AM Darius Rad <darius@...> wrote:

The current requirement effectively forces implementations to have a 100
MHz clock. If the resolution is changed to 1 ns, that becomes a 1 GHz
clock that implementations are forced to have.

The resolution requirement (whether =10ns or <=10ns) doesn't force a 100 or
100+ MHz clock. Only the >=10 MHz update rate forces a 10 MHz clock -
which is a pretty modest requirement for OS-A-class systems. (As a
side-note, most any DDR interface or high-speed I/O interface (e.g. USB,
SATA, PCIe/NVMe) will want significant clock frequencies supplied to them.)
Please read what I said. I said it *effectively* forces an implementation
to have a 100 MHz clock.

With the current wording in the draft specification, multihart
implementations must do one of the following:

1. Provide a 100 MHz (or greater) clock and update the time register one
tick at a time, within one tick per the ISA specification.

2. With a clock between 10 MHz and 100 MHz, update the time register more
than one tick at a time, and provide extremely strict synchronization
between all harts to ensure that all harts see exactly the same value
at all times.

3. Ignore the specification, either by:

a. Not guaranteeing that all harts see time within one tick (violating
the ISA specification), or

b. Using a different timebase for the time register (violating the
current draft platform specification).

It is my opinion that (2) is unreasonably burdensome, and that
implementations will be forced to choose (1) if they would like to adhere
to the specification. However, I think (3) will happen in practice.


Either way, this violates
the Platform Policy that says platforms should not mandate performance, but
certainly the higher clock rate is more of a burden.
Platforms shouldn't mandate specific software execution performance
requirements, but a platform spec may - in the interest of ensuring
functionality doesn't have horrible performance - choose to mandate certain
qualitative things. For example, a more application domain-targeted
platform may choose to disallow trapping and emulating of all
misaligned data accesses. (The "Embedded" platform, as a rather basic very
"broad" platform, doesn't do this.)
Could you please refer to the portion of the Platform Policy that you
relied on for that interpretation? The Policy says "a Platform shall avoid
performance requirements or whether mechanisms have to be implemented in
hardware (i.e. if they can be emulated through trap-and-emulate).", which
seems to directly contradict what you are saying.

In contrast, mandating that a system has certain hardware capabilities,
like a RISC-V standard interrupt controller with at least X number of
priorities, or time with some minimum resolution and update rate, is
ensuring a form of functionality. For example, an Automotive or
Mobile platform may be unhappy if the update rate and resolution are only
10 ms / 100 Hz.

Greg


Darius Rad
 

On Fri, Nov 19, 2021 at 02:32:45PM -0600, Vedvyas Shanbhogue wrote:
On Fri, Nov 19, 2021 at 03:27:40PM -0500, Jonathan Behrens wrote:
On Fri, Nov 19, 2021 at 3:02 PM Ved Shanbhogue <ved@...> wrote:

On Fri, Nov 19, 2021 at 12:42:13PM -0500, Jonathan Behrens wrote:
On Fri, Nov 19, 2021 at 12:27 PM Greg Favor <gfavor@...>
wrote:

The current specification says that the MTIME values seen by two HARTs must not be apart by more than 1. It does not say the MTIME must always increment in units of 1. I do not think the specification mandates incrementing by one on each clock tick. Presently it says the tick - the update frequency - can be 100 MHz or 10 MHz or somewhere in between. If the update frequency is 10 MHz then the MTIME increment per clock must be 10. If the update frequency is 100 MHz then the MTIME increment per clock is 1. So is your concern that an adder that adds 10 to MTIME per clock tick is hard?

I largely agree. While the clock "tick" is one ulp, an update can increment time by, for example, 10 ticks. (The same is true in ARM SBSA/BSA/etc.)

I think the argument is that you technically violate the ISA spec if you have two cores and you increment mtime on the first core by +10 and then a fraction of a nanosecond later update the second core's mtime by +10. During the tiny duration between the updates on the two cores mtime differs by 10, but the ISA requires mtime to differ by at most 1 at any given instant in time.

Which section of the specification should I look at for that requirement? The only requirement I could find is from the ACLINT spec: "On a RISC-V platform with multiple MTIMER devices residing on the same die, all must satisfy the RISC-V architectural requirement that all the MTIME registers with respect to each other, and all the per-HART time CSRs with respect to each other, are synchronized to within one MTIME tick (or MTIME update period)." And this requires synchronization to within one MTIME update period - not that the value of MTIME not differ by more than one.

The requirement seems to come from the Zicntr extension described in the
base (unprivileged) spec. I'll let somebody else say whether incrementing
mtime by +10 counts as 1 tick or 10 ticks.

The execution environment should provide a means of determining the period
of the real-time counter (seconds/tick). The period must be constant. The
real-time clocks of all harts in a single user application should be
synchronized to within one tick of the real-time clock. The environment
should provide a means to determine the accuracy of the clock.
This states synchronization to within one tick of the real-time clock. It does not state that the value must not differ by more than 1 count.
The ISA specification also says:

The execution environment should provide a means of determining the
period of the real-time counter (seconds/tick).

If the period of the counter is in units of seconds/tick, then 1 count (1
LSB) is 1 tick.


Ved Shanbhogue
 

On Fri, Nov 19, 2021 at 03:27:40PM -0500, Jonathan Behrens wrote:
On Fri, Nov 19, 2021 at 3:02 PM Ved Shanbhogue <ved@...> wrote:

On Fri, Nov 19, 2021 at 12:42:13PM -0500, Jonathan Behrens wrote:
On Fri, Nov 19, 2021 at 12:27 PM Greg Favor <gfavor@...>
wrote:

The current specification says that the MTIME values seen by two HARTs must not be apart by more than 1. It does not say the MTIME must always increment in units of 1. I do not think the specification mandates incrementing by one on each clock tick. Presently it says the tick - the update frequency - can be 100 MHz or 10 MHz or somewhere in between. If the update frequency is 10 MHz then the MTIME increment per clock must be 10. If the update frequency is 100 MHz then the MTIME increment per clock is 1. So is your concern that an adder that adds 10 to MTIME per clock tick is hard?

I largely agree. While the clock "tick" is one ulp, an update can increment time by, for example, 10 ticks. (The same is true in ARM SBSA/BSA/etc.)

I think the argument is that you technically violate the ISA spec if you have two cores and you increment mtime on the first core by +10 and then a fraction of a nanosecond later update the second core's mtime by +10. During the tiny duration between the updates on the two cores mtime differs by 10, but the ISA requires mtime to differ by at most 1 at any given instant in time.

Which section of the specification should I look at for that requirement? The only requirement I could find is from the ACLINT spec: "On a RISC-V platform with multiple MTIMER devices residing on the same die, all must satisfy the RISC-V architectural requirement that all the MTIME registers with respect to each other, and all the per-HART time CSRs with respect to each other, are synchronized to within one MTIME tick (or MTIME update period)." And this requires synchronization to within one MTIME update period - not that the value of MTIME not differ by more than one.

The requirement seems to come from the Zicntr extension described in the
base (unprivileged) spec. I'll let somebody else say whether incrementing
mtime by +10 counts as 1 tick or 10 ticks.

The execution environment should provide a means of determining the period
of the real-time counter (seconds/tick). The period must be constant. The
real-time clocks of all harts in a single user application should be
synchronized to within one tick of the real-time clock. The environment
should provide a means to determine the accuracy of the clock.
This states synchronization to within one tick of the real-time clock. It does not state that the value must not differ by more than 1 count.

regards
ved


Jonathan Behrens <behrensj@...>
 



On Fri, Nov 19, 2021 at 3:02 PM Ved Shanbhogue <ved@...> wrote:
On Fri, Nov 19, 2021 at 12:42:13PM -0500, Jonathan Behrens wrote:
>On Fri, Nov 19, 2021 at 12:27 PM Greg Favor <gfavor@...> wrote:
>
>> The current specification says that the MTIME values seen by two HARTs to
>>> be not apart by more than 1. It does say the MTIME must always increment in
>>> units of 1.  I do not think the specification mandates incrementing by one
>>> on each clock tick. Presently it says the tick - the update frequency - can
>>> be 100 MHz or 10 MHz or somewhere in between. If the update frequency is 10
>>> MHz then the MTIME increment per clock must be 10. If the update frequency
>>> is 100 Mhz then MTIME increment per clock is 1. So is your concern is that
>>> an adder that adds 10 to MTIME per clock tick is hard?
>>
>>
>> I largely agree.  While the clock "tick" is one ulp, an update can
>> increment time by, for example, 10 ticks.  (The same is true in ARM
>> SBSA/BSA/etc.)
>>
>
>I think the argument is that you technically violate the ISA spec if you
>have two cores and you increment mtime on the first core by +10 then a
>fraction of a nanosecond later update the second core's mtime by +10.
>During the tiny duration between the updates on the two cores mtime differs
>by 10, but the ISA requires mtime to differ by at most 1 at any given
>instant in time.
>
Which section of the specification should I look at for that requirement? The only requirement I could find is from the ACLINT spec: "On a RISC-V platform with multiple MTIMER devices residing on the same die, all must satisfy the RISC-V architectural requirement that all the MTIME registers with respect to each other, and all the per-HART time CSRs with respect to each other, are synchronized to within one MTIME tick (or MTIME update period)." And this requires synchronization to within one MTIME update period - not that the value of MTIME not differ by more than one.

The requirement seems to come from the Zicntr extension described in the base (unprivileged) spec. I'll let somebody else say whether incrementing mtime by +10 counts as 1 tick or 10 ticks.

The execution environment should provide a means of determining the period of the real-time counter (seconds/tick). The period must be constant. The real-time clocks of all harts in a single user application should be synchronized to within one tick of the real-time clock. The environment should provide a means to determine the accuracy of the clock.





Ved Shanbhogue
 

On Fri, Nov 19, 2021 at 12:42:13PM -0500, Jonathan Behrens wrote:
On Fri, Nov 19, 2021 at 12:27 PM Greg Favor <gfavor@...> wrote:

The current specification says that the MTIME values seen by two HARTs must not be apart by more than 1. It does not say the MTIME must always increment in units of 1. I do not think the specification mandates incrementing by one on each clock tick. Presently it says the tick - the update frequency - can be 100 MHz or 10 MHz or somewhere in between. If the update frequency is 10 MHz then the MTIME increment per clock must be 10. If the update frequency is 100 MHz then the MTIME increment per clock is 1. So is your concern that an adder that adds 10 to MTIME per clock tick is hard?

I largely agree. While the clock "tick" is one ulp, an update can
increment time by, for example, 10 ticks. (The same is true in ARM
SBSA/BSA/etc.)
I think the argument is that you technically violate the ISA spec if you
have two cores and you increment mtime on the first core by +10 then a
fraction of a nanosecond later update the second core's mtime by +10.
During the tiny duration between the updates on the two cores mtime differs
by 10, but the ISA requires mtime to differ by at most 1 at any given
instant in time.
Which section of the specification should I look at for that requirement? The only requirement I could find is from the ACLINT spec: "On a RISC-V platform with multiple MTIMER devices residing on the same die, all must satisfy the RISC-V architectural requirement that all the MTIME registers with respect to each other, and all the per-HART time CSRs with respect to each other, are synchronized to within one MTIME tick (or MTIME update period)." And this requires synchronization to within one MTIME update period - not that the value of MTIME not differ by more than one.

regards
ved


Jonathan Behrens <behrensj@...>
 

On Fri, Nov 19, 2021 at 12:27 PM Greg Favor <gfavor@...> wrote:
The current specification says that the MTIME values seen by two HARTs must not be apart by more than 1. It does not say the MTIME must always increment in units of 1. I do not think the specification mandates incrementing by one on each clock tick. Presently it says the tick - the update frequency - can be 100 MHz or 10 MHz or somewhere in between. If the update frequency is 10 MHz then the MTIME increment per clock must be 10. If the update frequency is 100 MHz then the MTIME increment per clock is 1. So is your concern that an adder that adds 10 to MTIME per clock tick is hard?

I largely agree.  While the clock "tick" is one ulp, an update can increment time by, for example, 10 ticks.  (The same is true in ARM SBSA/BSA/etc.)

I think the argument is that you technically violate the ISA spec if you have two cores and you increment mtime on the first core by +10 then a fraction of a nanosecond later update the second core's mtime by +10. During the tiny duration between the updates on the two cores mtime differs by 10, but the ISA requires mtime to differ by at most 1 at any given instant in time.

Jonathan


Greg Favor
 

The current specification says that the MTIME values seen by two HARTs must not be apart by more than 1. It does not say the MTIME must always increment in units of 1. I do not think the specification mandates incrementing by one on each clock tick. Presently it says the tick - the update frequency - can be 100 MHz or 10 MHz or somewhere in between. If the update frequency is 10 MHz then the MTIME increment per clock must be 10. If the update frequency is 100 MHz then the MTIME increment per clock is 1. So is your concern that an adder that adds 10 to MTIME per clock tick is hard?

I largely agree.  While the clock "tick" is one ulp, an update can increment time by, for example, 10 ticks.  (The same is true in ARM SBSA/BSA/etc.)

Greg


Greg Favor
 

On Fri, Nov 19, 2021 at 7:36 AM Darius Rad <darius@...> wrote:
The current requirement effectively forces implementations to have a 100
MHz clock.  If the resolution is changed to 1 ns, that becomes a 1 GHz
clock that implementations are forced to have. 

The resolution requirement (whether =10ns or <=10ns) doesn't force a 100 or 100+ MHz clock.  Only the >=10 MHz update rate forces a 10 MHz clock - which is a pretty modest requirement for OS-A-class systems.  (As a side-note, most any DDR interface or high-speed I/O interface (e.g. USB, SATA, PCIe/NVMe) will want significant clock frequencies supplied to them.)
 
Either way, this violates
the Platform Policy that says platforms should not mandate performance, but
certainly the higher clock rate is more of a burden.

Platforms shouldn't mandate specific software execution performance requirements, but a platform spec may - in the interest of ensuring functionality doesn't have horrible performance - choose to mandate certain qualitative things.  For example, a  more application domain-targeted platform may choose to disallow trapping and emulating of all misaligned data accesses.  (The "Embedded" platform, as a rather basic very "broad" platform, doesn't do this.)

In contrast, mandating that a system has certain hardware capabilities, like a RISC-V standard interrupt controller with at least X number of priorities, or time with some minimum resolution and update rate, is ensuring a form of functionality.  For example, an Automotive or Mobile platform may be unhappy if the update rate and resolution are only 10 ms / 100 Hz.

Greg


Ved Shanbhogue
 

On Fri, Nov 19, 2021 at 10:36:18AM -0500, Darius Rad wrote:
On Fri, Nov 19, 2021 at 08:59:32AM -0600, Vedvyas Shanbhogue wrote:
On Wed, Nov 17, 2021 at 04:37:12PM -0600, Vedvyas Shanbhogue via lists.riscv.org wrote:
On Wed, Nov 17, 2021 at 05:30:05PM -0500, Darius Rad wrote:
On Wed, Nov 17, 2021 at 03:50:17PM -0600, Vedvyas Shanbhogue wrote:
On Wed, Nov 17, 2021 at 04:36:10PM -0500, Darius Rad wrote:
On Wed, Nov 17, 2021 at 04:37:21AM +0000, Anup Patel wrote:
Before we go ahead and change the MTIME resolution requirement in the platform spec, I would like to highlight following points (from past discussions) which led to mandating a fixed MTIME resolution in the platform spec:


1. The Linux kernel creates a clock source on top of the time CSR (mirror of MMIO MTIME) with the timebase-frequency discovered from DT. The generic time management in the Linux kernel requires nanosecond granularity, so each value read from the clock source is converted to nanoseconds using a mult and shift (i.e. nanoseconds = (time_csr * mult) >> shift). In other words, the Linux kernel always uses integer operations to convert the X resolution of the time CSR to 1ns resolution, and this conversion will have some round-off errors. We could have mandated a fixed 1ns resolution (just like ARM SBSA) but for RISC-V we also need to honour the architectural requirement of all time CSRs being synchronized within 1 tick (i.e. one resolution period), and for multi-socket (or multi-die) systems it becomes challenging to synchronize multiple MTIME counters within 1ns resolution. Considering these facts, it made sense to have a fixed 10ns resolution for MTIME while the update frequency could be lower than 100MHz. (@Greg Favor<mailto:gfavor@...>, please add if I missed anything)
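
For illustration, a minimal standalone sketch of that mult/shift conversion (the numbers are made up, and this is not the actual kernel clocksource code):

    /* Convert time CSR ticks to nanoseconds with a mult/shift pair, conceptually
     * like the kernel's clocksource math.  Example: 10 ns per tick (100 MHz). */
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint32_t shift = 24;
        uint32_t mult  = (uint32_t)(10ull << shift);   /* 10 ns/tick * 2^24 */
        uint64_t time_csr = 123456789;                 /* example CSR reading */
        uint64_t ns = (time_csr * (uint64_t)mult) >> shift;
        /* For timebase frequencies that do not divide 1e9 evenly, mult is not
         * exact and the round-off mentioned above appears. */
        printf("%llu ticks -> %llu ns\n",
               (unsigned long long)time_csr, (unsigned long long)ns);
        return 0;
    }
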
Considering the requirement that all time CSRs be synchronized to within 1
tick, setting a fixed resolution indirectly makes synchronization much more
difficult for implementations that have a lower update frequency. For such
implementations, since each update is more than 1 tick, it would be
necessary to ensure that all time CSRs always have exactly the same value,
which is considerably more difficult than within 1 tick.
So an implementation that supports a 100MHz clock would need to update the mtime by 10 on each tick to meet the 1ns granularity. In the current spec, an implementation that supports a 10 MHz clock would need to update the mtime by 10 on each tick to meet the 10ns resolution. I am not sure incrementing by 1 vs. incrementing by 10 makes it much harder, as it was already required for a system that implements a 10MHz clock.
The specification says "synchronized to within one tick".

If the time CSR is incremented one tick at a time, then, for every update,
as long as one update propagates to all harts before the next update, this
requirement is met. Some harts may show time T and some may show T+1, but
those are within one tick, which is acceptable.

Now suppose there is a system that increments 10 ticks per update. If the
time value in any hart lags, where some show T and some show T+10, this is
not within one tick, which is not acceptable.

Thus, for systems that update one tick at a time, they have a full tick to
propagate updates to all harts. For systems that update multiple ticks at
a time, they have *no* time to propagate updates to all harts; all updates
must be instantaneous on *all* harts.
I agree. But that requirement is there already for a system that implements a 10 MHz clock and has to support 10ns resolution. If 10ns vs. 1ns is really a sticking point, could we have two values supported by the platform specification - similar to the range of update frequencies supported? We can leave it to the end user whether they want to put these two classes of systems in one migration pool.
Did that address your concern? Should we update the spec to support a 1ns resolution, or at least not disallow a 1ns resolution?
No, it does not address my concern.

As you said, that requirement was already there, and my concern applies to
the requirement as is, although increasing the resolution will exacerbate
the problem.

When the specification mandates a specific update resolution, multihart
implementations will have to choose between (1) updating one tick at a
time, or (2) very stringent synchronization between all harts. I suspect
implementations will be forced to choose (1), even if they otherwise would
have preferred to have a different update frequency or clock. Or they will
simply ignore the synchronization requirement in the ISA specification,
which I think is a real possibility.

The current requirement effectively forces implementations to have a 100
MHz clock. If the resolution is changed to 1 ns, that becomes a 1 GHz
clock that implementations are forced to have. Either way, this violates
the Platform Policy that says platforms should not mandate performance, but
certainly the higher clock rate is more of a burden.
The current specification says that the MTIME values seen by two HARTs must not be apart by more than 1. It does not say the MTIME must always increment in units of 1. I do not think the specification mandates incrementing by one on each clock tick. Presently it says the tick - the update frequency - can be 100 MHz or 10 MHz or somewhere in between. If the update frequency is 10 MHz then the MTIME increment per clock must be 10. If the update frequency is 100 MHz then the MTIME increment per clock is 1. So is your concern that an adder that adds 10 to MTIME per clock tick is hard?
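
To make the arithmetic in that last paragraph concrete (illustrative only, assuming the draft's fixed 10ns tick):

    /* Per-update MTIME increment needed to keep a 10 ns tick (100 MHz timebase)
     * at various update-clock frequencies. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint64_t tick_hz = 100000000;                  /* 10 ns resolution */
        uint64_t update_hz[] = { 100000000, 50000000, 10000000 };
        for (int i = 0; i < 3; i++)
            printf("update clock %9llu Hz -> MTIME += %llu per update\n",
                   (unsigned long long)update_hz[i],
                   (unsigned long long)(tick_hz / update_hz[i]));
        return 0;
    }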

regards
ved


Darius Rad
 

On Fri, Nov 19, 2021 at 08:59:32AM -0600, Vedvyas Shanbhogue wrote:
On Wed, Nov 17, 2021 at 04:37:12PM -0600, Vedvyas Shanbhogue via lists.riscv.org wrote:
On Wed, Nov 17, 2021 at 05:30:05PM -0500, Darius Rad wrote:
On Wed, Nov 17, 2021 at 03:50:17PM -0600, Vedvyas Shanbhogue wrote:
On Wed, Nov 17, 2021 at 04:36:10PM -0500, Darius Rad wrote:
On Wed, Nov 17, 2021 at 04:37:21AM +0000, Anup Patel wrote:
Before we go ahead and change the MTIME resolution requirement in the platform spec, I would like to highlight following points (from past discussions) which led to mandating a fixed MTIME resolution in the platform spec:


1. The Linux kernel creates a clock source on top of the time CSR (mirror of MMIO MTIME) with the timebase-frequency discovered from DT. The generic time management in the Linux kernel requires nanosecond granularity, so each value read from the clock source is converted to nanoseconds using a mult and shift (i.e. nanoseconds = (time_csr * mult) >> shift). In other words, the Linux kernel always uses integer operations to convert the X resolution of the time CSR to 1ns resolution, and this conversion will have some round-off errors. We could have mandated a fixed 1ns resolution (just like ARM SBSA) but for RISC-V we also need to honour the architectural requirement of all time CSRs being synchronized within 1 tick (i.e. one resolution period), and for multi-socket (or multi-die) systems it becomes challenging to synchronize multiple MTIME counters within 1ns resolution. Considering these facts, it made sense to have a fixed 10ns resolution for MTIME while the update frequency could be lower than 100MHz. (@Greg Favor<mailto:gfavor@...>, please add if I missed anything)
Considering the requirement that all time CSRs be synchronized to within 1
tick, setting a fixed resolution indirectly makes synchronization much more
difficult for implementations that have a lower update frequency. For such
implementations, since each update is more than 1 tick, it would be
necessary to ensure that all time CSRs always have exactly the same value,
which is considerably more difficult than within 1 tick.
So an implementation that supports a 100MHz clock would need to update the mtime by 10 on each tick to meet the 1ns granularity. In the current spec, an implementation that supports a 10 MHz clock would need to update the mtime by 10 on each tick to meet the 10ns resolution. I am not sure incrementing by 1 vs. incrementing by 10 makes it much harder, as it was already required for a system that implements a 10MHz clock.
The specification says "synchronized to within one tick".

If the time CSR is incremented one tick at a time, then, for every update,
as long as one update propagates to all harts before the next update, this
requirement is met. Some harts may show time T and some may show T+1, but
those are within one tick, which is acceptable.

Now suppose there is a system that increments 10 ticks per update. If the
time value in any hart lags, where some show T and some show T+10, this is
not within one tick, which is not acceptable.

Thus, for systems that update one tick at a time, they have a full tick to
propagate updates to all harts. For systems that update multiple ticks at
a time, they have *no* time to propagate updates to all harts; all updates
must be instantaneous on *all* harts.
I agree. But that requirement is there already for a system that implements a 10 MHz clock and has to support 10ns resolution. If 10ns vs. 1ns is really a sticking point, could we have two values supported by the platform specification - similar to the range of update frequencies supported? We can leave it to the end user whether they want to put these two classes of systems in one migration pool.
Did that address your concern? Should we update the spec to support a 1ns resolution, or at least not disallow a 1ns resolution?
No, it does not address my concern.

As you said, that requirement was already there, and my concern applies to
the requirement as is, although increasing the resolution will exacerbate
the problem.

When the specification mandates a specific update resolution, multihart
implementations will have to choose between (1) updating one tick at a
time, or (2) very stringent synchronization between all harts. I suspect
implementations will be forced to choose (1), even if they otherwise would
have preferred to have a different update frequency or clock. Or they will
simply ignore the synchronization requirement in the ISA specification,
which I think is a real possibility.

The current requirement effectively forces implementations to have a 100
MHz clock. If the resolution is changed to 1 ns, that becomes a 1 GHz
clock that implementations are forced to have. Either way, this violates
the Platform Policy that says platforms should not mandate performance, but
certainly the higher clock rate is more of a burden.
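
A toy numeric illustration of the gap between those two options (idealized, and assuming an arbitrary fixed 3 ns propagation skew between two harts' views of time):

    /* Worst-case difference between two harts' time values when one hart sees
     * each update 3 ns late: +1 per 10 ns update vs. +10 per 100 ns update. */
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t time_at(uint64_t t_ns, uint64_t inc, uint64_t period_ns,
                            uint64_t skew_ns) {
        if (t_ns < skew_ns) return 0;
        return ((t_ns - skew_ns) / period_ns) * inc;
    }

    int main(void) {
        uint64_t skew_ns = 3;                    /* hart 1 lags by 3 ns */
        uint64_t worst_a = 0, worst_b = 0;
        for (uint64_t t = 0; t < 1000; t++) {
            uint64_t d = time_at(t, 1, 10, 0) - time_at(t, 1, 10, skew_ns);
            if (d > worst_a) worst_a = d;        /* (1) +1 every 10 ns */
            d = time_at(t, 10, 100, 0) - time_at(t, 10, 100, skew_ns);
            if (d > worst_b) worst_b = d;        /* (2) +10 every 100 ns */
        }
        printf("worst-case difference: +1/update -> %llu tick(s), +10/update -> %llu tick(s)\n",
               (unsigned long long)worst_a, (unsigned long long)worst_b);
        return 0;
    }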


Ved Shanbhogue
 

On Wed, Nov 17, 2021 at 04:37:12PM -0600, Vedvyas Shanbhogue via lists.riscv.org wrote:
On Wed, Nov 17, 2021 at 05:30:05PM -0500, Darius Rad wrote:
On Wed, Nov 17, 2021 at 03:50:17PM -0600, Vedvyas Shanbhogue wrote:
On Wed, Nov 17, 2021 at 04:36:10PM -0500, Darius Rad wrote:
On Wed, Nov 17, 2021 at 04:37:21AM +0000, Anup Patel wrote:
Before we go ahead and change the MTIME resolution requirement in the platform spec, I would like to highlight following points (from past discussions) which led to mandating a fixed MTIME resolution in the platform spec:


1. The Linux kernel creates a clock source on top of the time CSR (mirror of MMIO MTIME) with the timebase-frequency discovered from DT. The generic time management in the Linux kernel requires nanosecond granularity, so each value read from the clock source is converted to nanoseconds using a mult and shift (i.e. nanoseconds = (time_csr * mult) >> shift). In other words, the Linux kernel always uses integer operations to convert the X resolution of the time CSR to 1ns resolution, and this conversion will have some round-off errors. We could have mandated a fixed 1ns resolution (just like ARM SBSA) but for RISC-V we also need to honour the architectural requirement of all time CSRs being synchronized within 1 tick (i.e. one resolution period), and for multi-socket (or multi-die) systems it becomes challenging to synchronize multiple MTIME counters within 1ns resolution. Considering these facts, it made sense to have a fixed 10ns resolution for MTIME while the update frequency could be lower than 100MHz. (@Greg Favor<mailto:gfavor@...>, please add if I missed anything)
Considering the requirement that all time CSRs be synchronized to within 1
tick, setting a fixed resolution indirectly makes synchronization much more
difficult for implementations that have a lower update frequency. For such
implementations, since each update is more than 1 tick, it would be
necessary to ensure that all time CSRs always have exactly the same value,
which is considerably more difficult than within 1 tick.
So an implementation that supports a 100MHz clock would need to update the mtime by 10 on each tick to meet the 1ns granularity. In the current spec, an implementation that supports a 10 MHz clock would need to update the mtime by 10 on each tick to meet the 10ns resolution. I am not sure incrementing by 1 vs. incrementing by 10 makes it much harder, as it was already required for a system that implements a 10MHz clock.
The specification says "synchronized to within one tick".

If the time CSR is incremented one tick at a time, then, for every update,
as long as one update propagates to all harts before the next update, this
requirement is met. Some harts may show time T and some may show T+1, but
those are within one tick, which is acceptable.

Now suppose there is a system that increments 10 ticks per update. If the
time value in any hart lags, where some show T and some show T+10, this is
not within one tick, which is not acceptable.

Thus, for systems that update one tick at a time, they have a full tick to
propagate updates to all harts. For systems that update multiple ticks at
a time, they have *no* time to propagate updates to all harts; all updates
must be instantaneous on *all* harts.
I agree. But that requirement is there already for a system that implements a 10 MHz clock and has to support 10ns resolution. If 10ns vs. 1ns is really a sticking point, could we have two values supported by the platform specification - similar to the range of update frequencies supported? We can leave it to the end user whether they want to put these two classes of systems in one migration pool.
Did that address your concern? Should we update the spec to support a 1ns resolution, or at least not disallow a 1ns resolution?


regards
ved