
Re: Unix platform working group future agenda (to be discussed in next meeting (06/09 8AM PST))

atishp@...
 

I have gathered all the requirements discussed till now and created a google doc.

https://docs.google.com/document/d/1sLYZJrK38R_QWj5KjogxU2vjJSTIq0bJjZu57djuGoA/edit?usp=sharing

The plan is to regularly update this document so that we keep track of all the requirements that should go into the platform specification.

It will also help identify which sections should be mandatory and added now.

If anybody is interested in expanding a particular area, you can edit the document or send a patch directly.

 

 


Re: Unix platform working group future agenda (to be discussed in next meeting (06/09 8AM PST))

Greg Favor
 

On Thu, Jun 4, 2020 at 11:45 PM Atish Patra <atish.patra@...> wrote:
> > Standard system watchdog timer?
>
> AFAIK, we don't have any system watchdog timer in any of the current
> RISC-V hardware. Maybe we can defer this?

Yes, I imagine so.  Eventually this would be desirable as a very basic form of RAS, i.e., when the system is hung or in livelock, this would bring control back to system firmware (to log, recover, reboot, etc.).

> > Standard system UART for early boot communication.
>
> Do we have to say which standard UART?

This is desirable for the sake of early boot software (before it's convenient to be loading in platform-specific drivers).  (As a side note, when ARM SBSA first mandated this, it painfully blew it by not standardizing a little detail like the clock frequency for the UART.)

In any case, this and the surrounding comments are about standardizing basics of the platform that provide no differentiation value and just complicate life when one wants to get to a world where one can take a compliant platform, install standard binaries, and have it all work out of the box.  Not an issue for embedded systems, but certainly important if and when RISC-V is going to get into parts of the server or edge compute space.

> A UART must be available, but the implementation details don't need to
> go into the platform spec.

> > The Debug spec (which provides facilities for self-hosted as well as
> > external debug) has most features as optional.  At least for self-
> > hosted debug purposes, should there be a minimum set of trigger
> > module requirements for how many triggers and what trigger features
> > are available?
>
> Is it a common practice to mandate a subset of the debug spec in a platform
> specification?

In embedded systems, probably.  But, as noted above, desirable on standardized server-class systems (even if going into high-end embedded "infrastructure" type systems, which are increasingly looking more like Linux server platforms and less like classic embedded Linux systems).

> > Presenting on-chip peripherals as PCIe integrated endpoints?  (SBSA
> > provides a lot of standardization along these lines.  But this may be
> > a bridge too far at this early stage of platform standardization.)
>
> I guess so.

Hopefully this suggestion makes more sense given my comments above.  Not necessary for embedded systems, but very desirable for high-end embedded and server-class systems (as ARM took a long time to realize).

Greg


Upcoming Event: Unix platform working group meeting (06/09 8AM PST) @ Tue June 09, 2020 8am - 9am (PST) - Tue, 06/09/2020 8:00am-9:00am #cal-reminder

tech-unixplatformspec@lists.riscv.org Calendar <tech-unixplatformspec@...>
 

Reminder: Unix platform working group meeting (06/09 8AM PST) @ Tue June 09, 2020 8am - 9am (PST)

When: Tuesday, 9 June 2020, 8:00am to 9:00am, (GMT-07:00) America/Los Angeles

Where: WebEx


Description: Hi,

The next Unix platform specification working group meeting is scheduled for June 9th at 8 AM PST.

The agenda:
- The system reset extension
  - https://github.com/riscv/riscv-sbi-doc/pull/39
- Discuss the changes that were merged after SBI v0.2 was tagged
- How to deal with errors unspecified in the SBI specification
  - https://github.com/riscv/riscv-sbi-doc/pull/51
- Future roadmap

Looking forward to catching up with everyone.
 

Regards,

Atish

 

 


Re: Unix platform working group future agenda (to be discussed in next meeting (06/09 8AM PST))

alankao
 

Since we didn't have the chance to discuss the topics on the agenda in the meeting, please allow me to post two comments from Andes here:

DRAM start address
Is there any good reason to standardize it?
I also notice that both BBL and OpenSBI have a default start address of 0x80000000, but is there any reason why 0x0 is not a good configuration?

> Minimum number and width of hardware performance counters.  Support for the small set of standard Linux perf mon events
We have made a proposal in the privileged task group: https://lists.riscv.org/g/tech-privileged-archive/message/488?p=,,,20,0,0,0::Created,,Proposal,20,2,40,32306071
For those who don't have access to the privileged task group, please check https://github.com/riscv/riscv-isa-manual/issues/402
This can provide the minimum functionality for perf to work. Any comments are welcome.


Re: Unix platform working group future agenda (to be discussed in next meeting (06/09 8AM PST))

Greg Favor
 

> DRAM start address
> Is there any good reason to standardize it?
> I also notice that both BBL and OpenSBI have default start address at 0x80000000, but is there any reason why 0x0 is not a good configuration?

Note that it can generally be useful to have 4K page 0x0 be no man's land (e.g. catch a bad PA pointer dereference), and that a number of implementations of the RISC-V Debug spec have the Debug Module and the Debug-Mode code located specifically in page 0x0 (so as to avoid the need for freeing up a GPR to serve as a D-mode base register).  (In the latter case, that debug stuff in page 0x0 might only be accessible while in D-mode.)

Greg

 


New file uploaded to tech-unixplatformspec@lists.riscv.org

tech-unixplatformspec@lists.riscv.org Notification <tech-unixplatformspec+notification@...>
 

Hello,

This email message is a notification to let you know that the following files have been uploaded to the Files area of the tech-unixplatformspec@... group.

Uploaded By: Atish Patra <atish.patra@...>

Description:
Recording of the Unix platform meeting held on 06/09/2020.

Cheers,
The RISCV Team


Proposal: Magic number in boot register

Jonathan Behrens <behrensj@...>
 

Hi everyone,

To start off discussion about requirements that should go into the platform spec, I propose a simple change to current software:

When entering S-mode for the first time, the a2 register should contain the value 0x54414c5058494e55 ("UNIXPLAT").

The intention here is that software should be able to look for this value and know that it has been booted in a Supervisor Execution Environment that is compliant with the Unix-class platform spec. This would distinguish such an environment both from old implementations that only support SBI v0.1 and from possible future execution environments designed by other groups.

Jonathan
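
For illustration only, here is a minimal sketch of how early supervisor code might test for the proposed value. Nothing in this thread fixes an implementation: the names `UNIXPLAT_MAGIC`, `magic_from_ascii`, `booted_on_unix_platform`, and `unixplat_self_check` are hypothetical, and the sketch assumes the entry code has already saved the boot-time a2 value and passes it in. The constant is the eight ASCII bytes of "UNIXPLAT" read as a little-endian 64-bit integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value proposed for a2 on first entry to S-mode (taken from the proposal). */
#define UNIXPLAT_MAGIC UINT64_C(0x54414c5058494e55)

/* Pack eight ASCII bytes into a 64-bit value, first byte least significant. */
static uint64_t magic_from_ascii(const char s[8])
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | (uint8_t)s[i];
    return v;
}

/* Hypothetical check: saved_a2 is the a2 value captured by the entry code. */
static bool booted_on_unix_platform(uint64_t saved_a2)
{
    return saved_a2 == UNIXPLAT_MAGIC;
}

/* Confirms that the hex constant really spells "UNIXPLAT" in memory order. */
static void unixplat_self_check(void)
{
    assert(magic_from_ascii("UNIXPLAT") == UNIXPLAT_MAGIC);
}
```

Any other value in a2 simply means the check fails; what the kernel should do in that case is exactly the open question for this thread.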


Re: Proposal: Magic number in boot register

atishp@...
 

On Tue, 2020-06-16 at 09:54 -0400, Jonathan Behrens wrote:
> Hi everyone,
>
> To start off discussion about requirements that should go into the
> platform spec, I propose a simple change to current software:
>
> When entering S-mode for the first time, the a2 register should
> contain the value 0x54414c5058494e55 ("UNIXPLAT").
>
> The intention here is that software should be able to look for this
> value and know that it has been booted in a Supervisor Execution
> Environment that is compliant with the Unix-class platform spec. This
> would distinguish both from old implementations that only support SBI
> v0.1, but also possible future execution environments designed by
> other groups.

For the SBI version, supervisor software should use the "sbi_get_spec_version"
API to identify the SBI version of the SBI implementation. For
v0.1, the above call will return a negative value, indicating that this is
v0.1.

That's how the Linux kernel currently detects the SBI version dynamically.

> Jonathan
--
Regards,
Atish
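
To make the detection flow above concrete, here is a hedged sketch of the decoding step only. It assumes the error/value return pair and the version encoding from the SBI v0.2 specification (major number in bits 30:24, minor number in bits 23:0), and it treats any error or non-positive value as legacy v0.1, which mirrors the behaviour described in the message. The `ecall` itself is left out so the snippet can be compiled anywhere; `decode_sbi_spec_version`, `struct sbi_version`, and `decode_examples` are hypothetical names, not the kernel's.

```c
#include <assert.h>

/* Return pair of an SBI call: a0 carries the error code, a1 the value. */
struct sbiret {
    long error;
    long value;
};

struct sbi_version {
    unsigned int major;
    unsigned int minor;
};

/* Turn the result of the version query into a major/minor pair. */
static struct sbi_version decode_sbi_spec_version(struct sbiret ret)
{
    struct sbi_version v = { 0, 1 };   /* default: assume legacy v0.1 */

    if (ret.error == 0 && ret.value > 0) {
        v.major = (unsigned int)((ret.value >> 24) & 0x7f);
        v.minor = (unsigned int)(ret.value & 0xffffff);
    }
    return v;
}

/* Illustrative inputs: a v0.2 firmware and a legacy one returning an error. */
static void decode_examples(void)
{
    struct sbiret v02 = { 0, 0x2 };
    struct sbiret legacy = { -2, 0 };  /* -2 is SBI_ERR_NOT_SUPPORTED */

    assert(decode_sbi_spec_version(v02).minor == 2);
    assert(decode_sbi_spec_version(legacy).minor == 1);
}
```

Note that this only distinguishes SBI versions from each other; it does not address the case raised later in the thread where no SBI implementation is present at all.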


Re: Proposal: Magic number in boot register

Jonathan Behrens <behrensj@...>
 

Thanks for that clarification! It is good to know that SBI v0.1 implementations are consistent about returning negative values for functions they don't recognize, like sbi_get_spec_version. This, however, doesn't work for environments which cannot or don't want to implement the SBI at all (what value do you return to say there is no SBI?).

Once RISC-V is more widely deployed, it is likely that there will be more platform specs written by other committees, or even groups entirely outside of the RISC-V Foundation. They may not want to require ecalls to detect capabilities, or might have other constraints. Yet, developers will likely want to write kernels that can boot across a range of these different environments. This has certainly been the case on x86, where there are lots of different bootloaders that each work with their own conventions.

To give one case where this already seems to be coming up: Linux can run in M-mode instead of S-mode, but only if it is configured that way at compile time. If Linux had a better way to know whether there was firmware present, it might be able to use a shared kernel binary for both cases.

Best,
Jonathan




Re: Proposal: Magic number in boot register

atishp@...
 



On Jun 19, 2020, at 1:26 PM, Jonathan Behrens <behrensj@...> wrote:

> Thanks for that clarification! It is good to know that SBI v0.1 implementations are consistent about returning negative values for functions they don't recognize like sbi_get_spec_version. This however doesn't work for environments which cannot or don't want to implement the SBI at all (what value do you return to say there is no SBI?)
>
> Once RISC-V is more widely deployed, it is likely that there will be more platform specs written by other committees, or even groups entirely outside of the RISC-V foundation. They may not want to require ecalls to detect capabilities, or might have other constraints. Yet, developers will likely want to write kernels that can boot across a range of these different environments. This has certainly been the case on x86 where there's lots of different bootloaders that each work with their own conventions.

Yes. That’s a possibility. If I understand you correctly, you want some identifier that lets the supervisor know that the M-mode firmware is an SBI-based one.

If that’s the only case, how about a DT property under the /chosen node instead of reserving a register for a fixed value?

> To give one case where this already seems to be coming up, Linux can run in M-mode instead of S-mode but only if it is configured that way at compile time. If Linux had a better way to know whether there was firmware present, it might be able to use a shared kernel binary for both cases.
>
> Best,
> Jonathan



Re: Proposal: Magic number in boot register

Jonathan Behrens <behrensj@...>
 


On Fri, Jun 19, 2020 at 5:42 PM Atish Patra <Atish.Patra@...> wrote:
> On Jun 19, 2020, at 1:26 PM, Jonathan Behrens <behrensj@...> wrote:
>
> > Thanks for that clarification! It is good to know that SBI v0.1 implementations are consistent about returning negative values for functions they don't recognize like sbi_get_spec_version. This however doesn't work for environments which cannot or don't want to implement the SBI at all (what value do you return to say there is no SBI?)
> >
> > Once RISC-V is more widely deployed, it is likely that there will be more platform specs written by other committees, or even groups entirely outside of the RISC-V foundation. They may not want to require ecalls to detect capabilities, or might have other constraints. Yet, developers will likely want to write kernels that can boot across a range of these different environments. This has certainly been the case on x86 where there's lots of different bootloaders that each work with their own conventions.
>
> Yes. That’s a possibility. If I understand you correctly, you want some identifier that let supervisor know that the M-mode
> firmware is an SBI based one.
>
> If that’s the only case, how about a DT property under /chosen node instead of reserving a register for a fixed value.

The register value would also signal that the other elements of this platform spec are being followed, notably that a1 actually points to a valid device tree. If we could count on a device tree always being present, then I agree that going the /chosen route would be cleaner. But if a future third-party standard decided to go with ACPI tables or something else instead, they may not be willing to require a dummy device tree just to allow software to blindly dereference a1.

Jonathan


Re: Proposal: Magic number in boot register

atishp@...
 

On Tue, 2020-06-23 at 16:37 -0400, Jonathan Behrens wrote:

> The register value would also signal the other elements of this
> platform spec are being followed. Notably including that a1 actually
> points to a valid device tree. If we could count on a device tree
> always being present then I agree that going the /chosen route would
> be cleaner, but if a future third party standard decided to go with
> ACPI tables or something instead then they may not be willing to
> require a dummy device tree just to allow software to blindly
> dereference a1.

For ACPI tables, a similar property can be added to the ACPI table.
We have to add other runtime properties to the ACPI table anyway, as we
currently do for the device tree.


--
Regards,
Atish


Re: Proposal: Magic number in boot register

Jonathan Behrens <behrensj@...>
 

But how will the booting OS know whether to look at ACPI tables or the device tree? Wouldn't you need some register to indicate which one is being used?

Jonathan





Re: Proposal: Magic number in boot register

atishp@...
 

On Wed, 2020-06-24 at 13:04 -0400, Jonathan Behrens wrote:
> But how will the booting OS know whether to look at ACPI tables or
> the device tree? Wouldn't you need some register to indicate which
> one is being used?

I am not sure how it will be implemented in RISC-V when we have ACPI.
However, this is the process followed on ARM64 [1]:

ACPI tables are passed via the UEFI system configuration table, while the DT
address is passed in x0. The kernel tries to use the DT first unless ACPI is
the preferred choice on the kernel command line. If it fails to find a DT,
it will try to use the ACPI tables, if they exist.


[1] https://lwn.net/Articles/642050/
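
The selection order described above can be sketched as a small pure function. This is only an illustration of the description in this message, not the actual arm64 kernel code: `enum hw_desc`, `choose_hw_desc`, and `choose_examples` are hypothetical names, and the fallback to the DT when ACPI is preferred but no ACPI tables are found is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

enum hw_desc { HW_DESC_NONE, HW_DESC_DT, HW_DESC_ACPI };

/*
 * acpi_preferred: ACPI was requested on the kernel command line.
 * dt_found:       a valid device tree was located (x0 on ARM64).
 * acpi_found:     ACPI tables were located via the UEFI configuration table.
 */
static enum hw_desc choose_hw_desc(bool acpi_preferred, bool dt_found,
                                   bool acpi_found)
{
    if (acpi_preferred && acpi_found)
        return HW_DESC_ACPI;
    if (dt_found)
        return HW_DESC_DT;      /* DT first unless ACPI is preferred */
    if (acpi_found)
        return HW_DESC_ACPI;    /* no DT: fall back to ACPI if present */
    return HW_DESC_NONE;
}

/* A few illustrative combinations of the three inputs. */
static void choose_examples(void)
{
    assert(choose_hw_desc(false, true, true) == HW_DESC_DT);
    assert(choose_hw_desc(true, true, true) == HW_DESC_ACPI);
    assert(choose_hw_desc(false, false, true) == HW_DESC_ACPI);
}
```

The point relevant to this thread is that all three inputs come from information the bootloader has already handed over by other means, so no dedicated register is needed on ARM64 to pick between the two.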
--
Regards,
Atish


Re: Proposal: Magic number in boot register

Jonathan Behrens <behrensj@...>
 



On Wed, Jun 24, 2020 at 1:55 PM Atish Patra <Atish.Patra@...> wrote:
On Wed, 2020-06-24 at 13:04 -0400, Jonathan Behrens wrote:
> But how will the booting OS know whether to look at ACPI tables or
> the device tree? Wouldn't you need some register to indicate which
> one is being used?
>

> I am not sure how it will be implemented in RISC-V when we have ACPI.
> However, this is the process followed on ARM64 [1]:
>
> ACPI tables are passed via the UEFI system configuration table, while the DT
> address is passed in x0. The kernel tries to use the DT first unless ACPI is
> the preferred choice on the kernel command line. If it fails to find a DT,
> it will try to use the ACPI tables, if they exist.
>
> [1] https://lwn.net/Articles/642050/

From that link it looks like the OS already has access to the kernel command line and the EFI system table before it starts looking at the DT / ACPI tables? In that case, the bootloader has already passed enough relevant information for the OS to know that it isn't about to dereference a bad pointer or something when trying to read from the DT.

However, perhaps I've been too pessimistic about the RISC-V ecosystem all conforming to the Unix-class platform spec. I'm way more familiar with x86, which didn't really manage to achieve something like this, but it looks like maybe ARM did? If changes like adding ACPI tables or whatever are all done in a compatible way (say, by having a stub DT), then there is no need for a magic number. If there are going to be multiple conflicting standards, then proactively using one out of the 31 registers to tell them apart might be worthwhile.

Jonathan

PS: I started this thread focusing on a small/not too technical question partially in the hope of generating more discussion on this mailing list. Please chime in if you have thoughts!

> On Wed, Jun 24, 2020 at 11:18 AM Atish Patra via lists.riscv.org
> <atish.patra=wdc.com@...> wrote:
> > On Tue, 2020-06-23 at 16:37 -0400, Jonathan Behrens wrote:
> > >
> > > On Fri, Jun 19, 2020 at 5:42 PM Atish Patra <Atish.Patra@...>
> > > wrote:
> > > > > On Jun 19, 2020, at 1:26 PM, Jonathan Behrens
> > > > > <behrensj@...> wrote:
> > > > >
> > > > > Thanks for that clarification! It is good to know that SBI v0.1
> > > > > implementations are consistent about returning negative values
> > > > > for functions they don't recognize like sbi_get_spec_version.
> > > > > This however doesn't work for environments which cannot or
> > > > > don't want to implement the SBI at all (what value do you
> > > > > return to say there is no SBI?)
> > > > >
> > > > > Once RISC-V is more widely deployed, it is likely that there
> > > > > will be more platform specs written by other committees, or
> > > > > even groups entirely outside of the RISC-V foundation. They may
> > > > > not want to require ecalls to detect capabilities, or might
> > > > > have other constraints. Yet, developers will likely want to
> > > > > write kernels that can boot across a range of these different
> > > > > environments. This has certainly been the case on x86 where
> > > > > there's lots of different bootloaders that each work with their
> > > > > own conventions.
> > > > >
> > > >
> > > > Yes. That’s a possibility. If I understand you correctly, you
> > > > want some identifier that let supervisor know that the M-mode
> > > > firmware is an SBI based one.
> > > >
> > > > If that’s the only case, how about a DT property under /chosen
> > > > node instead of reserving a register for a fixed value.
> > > >
> > >
> > > The register value would also signal the other elements of this
> > > platform spec are being followed. Notably including that a1
> > > actually points to a valid device tree. If we could count on a
> > > device tree always being present then I agree that going the
> > > /chosen route would be cleaner, but if a future third party
> > > standard decided to go with ACPI tables or something instead then
> > > they may not be willing to require a dummy device tree just to
> > > allow software to blindly dereference a1.
> > >
> >
> > For ACPI tables, a similar property can be added in the ACPI table.
> > We anyways have to add other run time properties to ACPI table as
> > we do currently for the device tree.
> >
> >
> > > Jonathan
> > > > > To give one case where this already seems to be coming up,
> > > > > Linux can run in M-mode instead of S-mode but only if it is
> > > > > configured that way at compile time. If Linux had a better way
> > > > > to know whether there was firmware present, it might be able to
> > > > > use a shared kernel binary for both cases.
> > > > >
> > > > > Best,
> > > > > Jonathan
> > > > >
> > > > > On Wed, Jun 17, 2020 at 2:56 PM Atish Patra
> > > > > <Atish.Patra@...> wrote:
> > > > > > On Tue, 2020-06-16 at 09:54 -0400, Jonathan Behrens wrote:
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > To start off discussion about requirements that should go
> > > > > > > into the platform spec, I propose a simple change to
> > > > > > > current software:
> > > > > > >
> > > > > > > When entering S-mode for the first time, the a2 register
> > > > > > > should contain the value 0x54414c5058494e55 ("UNIXPLAT").
> > > > > > >
> > > > > > > The intention here is that software should be able to look
> > > > > > > for this value and know that it has been booted in a
> > > > > > > Supervisor Execution Environment that is compliant with the
> > > > > > > Unix-class platform spec. This would distinguish both from
> > > > > > > old implementations that only support SBI v0.1, but also
> > > > > > > possible future execution environments designed by other
> > > > > > > groups.
> > > > > > >
> > > > > >
> > > > > > For SBI version, supervisor systems should use
> > > > > > "sbi_get_spec_version" API to identify what is the SBI
> > > > > > version of the SBI implementation. For v0.1, the above call
> > > > > > will return a -ve value indicating that this is a v0.1.
> > > > > >
> > > > > > That's how linux kernel currently detects the SBI version
> > > > > > dynamically.
> > > > > >
> > > > > >
> > > > > > > Jonathan
> >

--
Regards,
Atish


Re: Proposal: Magic number in boot register

atishp@...
 

On Fri, 2020-06-26 at 15:12 -0400, Jonathan Behrens wrote:
>
> On Wed, Jun 24, 2020 at 1:55 PM Atish Patra <Atish.Patra@...>
> wrote:
> > On Wed, 2020-06-24 at 13:04 -0400, Jonathan Behrens wrote:
> > > But how will the booting OS know whether to look at ACPI tables
> > > or the device tree? Wouldn't you need some register to indicate
> > > which one is being used?
> >
> > I am not sure how it will be implemented in RISC-V when we have
> > ACPI. However, this is the process followed in ARM64 [1].
> >
> > ACPI tables are passed via the UEFI system configuration table,
> > while the DT address is passed in x0. The kernel tries to use the
> > DT first if ACPI is not the preferred choice on the kernel command
> > line. If it fails to find a DT, it will try to use the ACPI tables
> > if they exist.
> >
> > [1] https://lwn.net/Articles/642050/
>
> From that link it looks like the OS already has access to the kernel
> command line and the EFI system table before it starts looking at the
> DT / ACPI tables?

Yes. ACPI is only usable via UEFI boot. Thus, the EFI system table is
already available to the kernel. Even though the kernel looks at the DT
before looking at the EFI system table, it removes all the memory
mappings from the DT for EFI boot and reinitializes all memory blocks
from the EFI memory mappings.
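
The ARM64 selection order described above can be sketched as a small decision function. This is a rough illustration, not kernel code: `choose_fw_table`, `enum fw_table` and the three boolean inputs are hypothetical names, and the behaviour when ACPI is preferred but no ACPI tables exist (falling back to the DT) is an assumption that the description above does not cover.

```c
#include <assert.h>
#include <stdbool.h>

/* Which firmware description the kernel ends up using. */
enum fw_table { FW_TABLE_NONE, FW_TABLE_DT, FW_TABLE_ACPI };

/* Hypothetical sketch of the order described in the thread:
 *  - ACPI is used when it is preferred on the kernel command line
 *    (and tables exist; the fallback to DT otherwise is an assumption);
 *  - otherwise the device tree is tried first;
 *  - if no usable DT is found, ACPI tables are used if they exist. */
static enum fw_table choose_fw_table(bool acpi_preferred, bool dt_usable,
                                     bool acpi_present)
{
    if (acpi_preferred && acpi_present)
        return FW_TABLE_ACPI;
    if (dt_usable)
        return FW_TABLE_DT;
    if (acpi_present)
        return FW_TABLE_ACPI;
    return FW_TABLE_NONE;
}

/* Usage example covering the cases spelled out in the thread. */
static void choose_fw_table_examples(void)
{
    assert(choose_fw_table(false, true, true) == FW_TABLE_DT);
    assert(choose_fw_table(false, false, true) == FW_TABLE_ACPI);
    assert(choose_fw_table(true, true, true) == FW_TABLE_ACPI);
}
```

Note that none of the inputs comes from a boot register: on ARM64 both the command line and the EFI system table are available before the choice is made.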

> In that case, the bootloader has already passed enough relevant
> information for the OS to know that it isn't about to dereference a
> bad pointer or something when trying to read from the DT.
>
> However, perhaps I've been too pessimistic about the RISC-V ecosystem
> all conforming to the Unix-class platform spec. I'm way more familiar
> with x86, which didn't really manage to achieve something like this,
> but it looks like maybe ARM did?

I guess ARM had to do it because it had to support both DT and ACPI,
unlike x86. We may have to follow a similar approach for RISC-V as well
in the future.

> If changes like adding ACPI tables or whatever are all done in a
> compatible way (say by having a stub DT), then there is no need for a
> magic number. If there are going to be multiple conflicting
> standards, then proactively using one of the 31 registers to tell
> them apart might be worthwhile.

I am hoping there will not be any conflicting standards and that we add
everything in a compatible way. In Linux land, we try to keep a single
Linux kernel image booting on all platforms (supporting S-mode).

> Jonathan
>
> PS: I started this thread focusing on a small/not too technical
> question partially in the hope of generating more discussion on this
> mailing list. Please chime in if you have thoughts!

--
Regards,
Atish


Proposal: SBI PMU Extension

Anup Patel
 

Hi All,

We don't have a dedicated RISC-V PMU extension but we do have HW performance
counters such as the CYCLE CSR, INSTRET CSR, and HPMCOUNTER CSRs. A RISC-V
CPU can allow monitoring HW events using a few HPMCOUNTER CSRs. The M-mode
software can also inhibit unused performance counters to save energy.

In addition to HW performance counters, an SBI implementation (e.g. OpenSBI,
Xvisor, KVM, etc.) can provide software counters for interesting events
such as the number of RFENCEs, number of IPIs, number of misaligned load/store
instructions, number of illegal instructions, etc.

We propose an SBI PMU extension which tries to cover the CYCLE CSR, INSTRET
CSR, HPMCOUNTER CSRs and software counters of the SBI implementation.

To define the SBI PMU extension, we first define counter_idx, which is a
unique number assigned to a counter, and event_idx, which is an encoded
number representing the event to be monitored.

The SBI PMU event_idx is a 15-bit number encoded as follows:
event_idx[14:12] = type
event_idx[11:0] = code
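
As a rough sketch of this encoding, the two fields can be packed and unpacked with shifts and masks. The macro and function names below are hypothetical and not part of the proposal; only the bit positions come from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout from the proposal: event_idx[14:12] = type,
 * event_idx[11:0] = code. */
#define SBI_PMU_EVENT_CODE_BITS 12u
#define SBI_PMU_EVENT_CODE_MASK 0xfffu
#define SBI_PMU_EVENT_TYPE_MASK 0x7u

/* Hypothetical helper: builds a 15-bit event_idx from its two fields. */
static uint32_t sbi_pmu_event_idx(uint32_t type, uint32_t code)
{
    assert(type <= SBI_PMU_EVENT_TYPE_MASK);
    assert(code <= SBI_PMU_EVENT_CODE_MASK);
    return (type << SBI_PMU_EVENT_CODE_BITS) | code;
}

/* Extracts event_idx.type (bits 14:12). */
static uint32_t sbi_pmu_event_type(uint32_t event_idx)
{
    return (event_idx >> SBI_PMU_EVENT_CODE_BITS) & SBI_PMU_EVENT_TYPE_MASK;
}

/* Extracts event_idx.code (bits 11:0). */
static uint32_t sbi_pmu_event_code(uint32_t event_idx)
{
    return event_idx & SBI_PMU_EVENT_CODE_MASK;
}
```

For example, type 2 with code 0x123 packs to 0x2123, and the largest representable value is 0x7fff.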

If event_idx.type == 0 then it is a HARDWARE event and event_idx.code can
be one of the following:
enum sbi_pmu_hw_id {
    /*
     * Common hardware events, generalized by the kernel:
     */
    PERF_COUNT_HW_CPU_CYCLES = 0,
    PERF_COUNT_HW_INSTRUCTIONS = 1,
    PERF_COUNT_HW_CACHE_REFERENCES = 2,
    PERF_COUNT_HW_CACHE_MISSES = 3,
    PERF_COUNT_HW_BRANCH_INSTRUCTIONS = 4,
    PERF_COUNT_HW_BRANCH_MISSES = 5,
    PERF_COUNT_HW_BUS_CYCLES = 6,
    PERF_COUNT_HW_STALLED_CYCLES_FRONTEND = 7,
    PERF_COUNT_HW_STALLED_CYCLES_BACKEND = 8,
    PERF_COUNT_HW_REF_CPU_CYCLES = 9,

    PERF_COUNT_HW_MAX, /* non-ABI */
};
(NOTE: Same as described in <linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 1 then it is a CACHE event and event_idx.code is encoded
as follows:
event_idx.code[11:4] = cache_id
event_idx.code[3:1] = op_id
event_idx.code[0:0] = result_id
enum sbi_pmu_hw_cache_id {
    PERF_COUNT_HW_CACHE_L1D = 0,
    PERF_COUNT_HW_CACHE_L1I = 1,
    PERF_COUNT_HW_CACHE_LL = 2,
    PERF_COUNT_HW_CACHE_DTLB = 3,
    PERF_COUNT_HW_CACHE_ITLB = 4,
    PERF_COUNT_HW_CACHE_BPU = 5,
    PERF_COUNT_HW_CACHE_NODE = 6,

    PERF_COUNT_HW_CACHE_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_id {
    PERF_COUNT_HW_CACHE_OP_READ = 0,
    PERF_COUNT_HW_CACHE_OP_WRITE = 1,
    PERF_COUNT_HW_CACHE_OP_PREFETCH = 2,

    PERF_COUNT_HW_CACHE_OP_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_result_id {
    PERF_COUNT_HW_CACHE_RESULT_ACCESS = 0,
    PERF_COUNT_HW_CACHE_RESULT_MISS = 1,

    PERF_COUNT_HW_CACHE_RESULT_MAX, /* non-ABI */
};
(NOTE: Same as described in <linux_source>/include/uapi/linux/perf_event.h)
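
To make the CACHE event layout concrete, here is a minimal sketch that packs the three sub-fields into event_idx.code and then into a full event_idx. The function names are hypothetical; the field positions and the type value 1 are taken from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs the 12-bit code of a CACHE event as
 * code[11:4] = cache_id, code[3:1] = op_id, code[0] = result_id. */
static uint32_t sbi_pmu_cache_event_code(uint32_t cache_id, uint32_t op_id,
                                         uint32_t result_id)
{
    assert(cache_id <= 0xffu);
    assert(op_id <= 0x7u);
    assert(result_id <= 0x1u);
    return (cache_id << 4) | (op_id << 1) | result_id;
}

/* Full 15-bit event_idx for a CACHE event: type 1 in bits [14:12]. */
static uint32_t sbi_pmu_cache_event_idx(uint32_t cache_id, uint32_t op_id,
                                        uint32_t result_id)
{
    return (1u << 12) | sbi_pmu_cache_event_code(cache_id, op_id, result_id);
}
```

With the enum values above, an L1D read miss (cache_id 0, op_id 0, result_id 1) becomes event_idx 0x1001, and a DTLB write access (3, 1, 0) becomes 0x1032.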

If event_idx.type == 2 then it is a RAW event and event_idx.code is just
a RAW event number.

In the future, more event_idx values can be defined without breaking ABI
compatibility of SBI calls.

Based on the above definitions of counter_idx and event_idx, we can
potentially have the following SBI calls:

1. SBI_PMU_NUM_COUNTERS
   This call will return the number of counters.
2. SBI_PMU_COUNTER_DESCRIBE
   This call takes two parameters: 1) counter_idx 2) physical address of a 4k page
   It will write the description of the SBI PMU counter at the specified
   physical address. The details of the SBI PMU counter written at the
   specified physical address are as follows:
   1. Name (64 bytes)
   2. CSR_Offset (4 bytes)
      (E.g. CSR_Offset == 0x2 implies CSR 0xC02)
      (E.g. CSR_Offset == 0xffffffff means it is an SBI implementation counter)
   3. CSR_Width (4 bytes)
      (Number of CSR bits implemented in HW)
   4. Event bitmap (2048 bytes) (i.e. 1 bit for each possible event_idx)
      (If the bit corresponding to an event_idx is 1 then that event_idx is
       supported by the counter)
   5. Anything else ??
3. SBI_PMU_COUNTER_SET_PHYS_ADDR
   This call takes two parameters: 1) counter_idx 2) physical address
   It will set the physical address where the SBI implementation will write
   the software counter. This SBI call is only for counters not mapped
   to any CSR (i.e. only for counters with CSR_Offset == 0xffffffff).
4. SBI_PMU_COUNTER_START
   This call takes two parameters: 1) counter_idx 2) event_idx
   It will inform the SBI implementation to configure and start/enable the
   specified counter on the calling HART to monitor a specific event. This
   SBI call will fail for counters which are not present.
5. SBI_PMU_COUNTER_STOP
   This call takes one parameter: 1) counter_idx
   It will inform the SBI implementation to stop/disable the specified
   counter on the calling HART. This SBI call will fail for counters which
   are not present.
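
One possible C view of the data written by SBI_PMU_COUNTER_DESCRIBE is sketched below. The struct and helper names are hypothetical, and the field order, the lack of padding and the bit-to-event mapping (bit n for event_idx n) are assumptions; the proposal only lists the field sizes. Note also that 2048 bytes hold 16384 bits while a 15-bit event_idx has 32768 possible values, so the sketch simply reports anything beyond the bitmap as unsupported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CSR_Offset value marking an SBI implementation (software) counter. */
#define SBI_PMU_CSR_OFFSET_NONE 0xffffffffu

/* Hypothetical layout using the field sizes listed in the proposal. */
struct sbi_pmu_counter_desc {
    char     name[64];
    uint32_t csr_offset;
    uint32_t csr_width;
    uint8_t  event_bitmap[2048];
};

/* The description has to fit in the 4k page passed to the call. */
_Static_assert(sizeof(struct sbi_pmu_counter_desc) <= 4096,
               "counter description must fit in one 4k page");

/* CSR number of a hardware counter: offsets are relative to 0xC00, so
 * CSR_Offset == 0x2 maps to CSR 0xC02 as in the proposal's example. */
static uint32_t sbi_pmu_counter_csr(const struct sbi_pmu_counter_desc *d)
{
    assert(d->csr_offset != SBI_PMU_CSR_OFFSET_NONE);
    return 0xC00u + d->csr_offset;
}

/* Assumption: bit n of the bitmap stands for event_idx n, with the
 * least significant bit of each byte first. */
static bool sbi_pmu_counter_supports(const struct sbi_pmu_counter_desc *d,
                                     uint32_t event_idx)
{
    if (event_idx >= 8u * sizeof(d->event_bitmap))
        return false;
    return (d->event_bitmap[event_idx / 8u] >> (event_idx % 8u)) & 1u;
}
```

With this layout the description occupies 2120 bytes, a little over half of the page.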

From the above, the RISC-V PMU driver will use most of the SBI calls at boot
time. Only SBI_PMU_COUNTER_START needs to be called once before using a
counter. The counter is then read by reading the CSR (for CSR_Offset !=
0xffffffff) or by reading the memory location (for CSR_Offset == 0xffffffff).
Counter overflow handling will have to be done in software by the Linux kernel.
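
A minimal sketch of such software overflow handling, assuming the kernel samples the raw counter often enough that it wraps at most once between two reads. All names are hypothetical; `width` stands for the CSR_Width value reported for the counter.

```c
#include <assert.h>
#include <stdint.h>

/* Number of events between two raw reads of a counter that implements
 * only `width` bits, assuming at most one wrap between the reads. */
static uint64_t sbi_pmu_counter_delta(uint64_t prev, uint64_t now,
                                      unsigned width)
{
    assert(width >= 1 && width <= 64);
    uint64_t mask = (width == 64) ? UINT64_MAX
                                  : ((UINT64_C(1) << width) - 1);
    return (now - prev) & mask;
}

/* Hypothetical 64-bit software accumulator kept per counter. */
struct sbi_pmu_sw_count {
    uint64_t total;    /* events accumulated so far */
    uint64_t last_raw; /* raw counter value at the previous read */
};

/* Folds a new raw reading into the accumulator. */
static void sbi_pmu_sw_count_update(struct sbi_pmu_sw_count *c,
                                    uint64_t raw, unsigned width)
{
    c->total += sbi_pmu_counter_delta(c->last_raw, raw, width);
    c->last_raw = raw;
}
```

The raw value would come from a CSR read or from the memory location set with SBI_PMU_COUNTER_SET_PHYS_ADDR, depending on CSR_Offset; the masking is the same in both cases.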

The information returned by SBI_PMU_NUM_COUNTERS and SBI_PMU_COUNTER_DESCRIBE
could be passed via DT/ACPI, but it would be difficult to maintain because
both hardware counters and SBI implementation counters are provided by the
SBI PMU extension. The SBI implementation counters are specific to the
underlying SBI implementation, so we would have to keep the counters/events
described in DT/ACPI in sync with the underlying SBI implementation.

Regards,
Anup


Re: Proposal: SBI PMU Extension

Zong Li
 

On Wed, Jul 1, 2020 at 8:26 PM Anup Patel <anup.patel@...> wrote:

> Hi All,
>
> We don't have a dedicated RISC-V PMU extension but we do have HW performance
> counters such as CYCLE CSR, INSTRET CSR, and HPMCOUNTER CSRs. A RISC-V
> CPU can allow monitoring HW events using few HPMCOUNTER CSRs. The M-mode
> software can also inhibit unused performance counters to save energy.

Do we need SBI calls to set mcounteren and mcountinhibit (an optional CSR)?
Or should these two CSRs not be changed at runtime from S-mode?


> In addition to HW performance counters, a SBI implementation (e.g. OpenSBI,
> Xvisor, KVM, etc) can provide software counters for interesting events
> such as number of RFENCEs, number of IPIs, number of misaligned load/store
> instructions, number of illegal instructions, etc.

I'm not sure whether I have misunderstood the usage of software counters, but
I don't see event_idx values for these software counter events. Maybe we could
define event_idx values for them in this proposal. For example, if
event_idx.type == 3, then it is a SOFTWARE event, and event_idx.code specifies
the monitored event, such as the number of RFENCEs, the number of IPIs and
so on.


> Based on above definition of counter_idx definition, we can potentially have
> the following SBI calls:

I don't see the definition of counter_idx, but I remember that you had
talked about it in other places as follows:
1. counter_idx = 0 to 2 are for CYCLE, TIME, and INSTRET
2. counter_idx = 3 to 31 are for HPMCOUNTERs
3. counter_idx >= 32 are for software counters


> 1. SBI_PMU_NUM_COUNTERS
>    This call will return the number of COUNTERs
> 2. SBI_PMU_COUNTER_DESCRIBE
>    This call takes two parameters: 1) counter_idx 2) physical address of 4k page
>    It will write the description of SBI PMU counter at specified physical
>    address. The details of the SBI PMU counter written at specified physical
>    address are as follows:
>    1. Name (64 bytes)
>    2. CSR_Offset (4 bytes)
>       (E.g. CSR_Offset == 0x2 imply CSR 0xC02)
>       (E.g. CSR_Offset == 0xffffffff means it is SBI implementation counter)

Maybe it would be clearer if we used counter_idx instead of CSR_Offset?

>    3. CSR_Width (4 bytes)
>       (Number of CSR bits implemented in HW)
>    4. Event bitmap (2048 bytes) (i.e. 1-bit for each possible event_idx)
>       (If bit corresponding to a event_idx is 1 then event_idx is supported
>        by the counter)

Is there more detail about which bit corresponds to which event? For example,
bit 0 corresponds to event_idx 0x0, and bit 10 corresponds to
event_idx 0x1000.

>    5. Anything else ??
> 3. SBI_PMU_COUNTER_SET_PHYS_ADDR
>    This call takes two parameters: 1) counter_idx 2) physical address
>    It will set the physical address where SBI implementation will write
>    the software counter. This SBI call is only for counters not mapped
>    to any CSR (i.e. only for counters with CSR_Offset == 0xffffffff).
> 4. SBI_PMU_COUNTER_START
>    This call takes two parameters: 1) counter_idx 2) event_idx
>    It will inform SBI implementation to configure and start/enable specified
>    counter on the calling HART to monitor specific event. This SBI call will
>    fail for counters which are not present.

Just want to make sure: is SBI_PMU_COUNTER_START for both hardware and
software counters? If so, we should define the event_idx values for
software counters as I mentioned above.

> 5. SBI_PMU_COUNTER_STOP
>    This call takes one parameter: 1) counter_idx
>    It will inform SBI implementation to stop/disable specified counters on the
>    calling HART. This SBI call will fail for counters which are not present.
>
> From above, the RISC-V PMU driver will use most of the SBI calls at boot time.
> Only SBI_PMU_COUNTER_START to be used once before using the counter. The reading
> the counter is by reading CSR (for CSR_Offset != 0xffffffff) OR by reading
> memory location (for CSR_Offset == 0xffffffff). The counter overflow handling
> will have to be done in software by Linux kernel.
>
> The information returned by SBI_PMU_NUM_COUNTERS and SBI_PMU_COUNTER_DESCRIBE
> can be passed via DT/ACPI but it will be difficult to maintain because we
> have hardware counters and SBI implementation counters both provided by SBI
> PMU extension. The SBI implementation counters are specific to underlying
> SBI implementation so we will have to keep counters/events described in
> DT/ACPI in-sync with underlying SBI implementation.

I have a proposal for a DT format for the PMU. It seems to me that we need to
add some information for software counters, such as the number of software
counters and their events. Are there any ideas?

> Regards,
> Anup



Re: Proposal: SBI PMU Extension

Greg Favor
 

Anup,

This is great to see - as part of standardizing how RISC-V HPM counters are configured and controlled by software.

I have a modest but important request:  Increase the size of the event_idx 'code' field from event_idx[11:0] to event_idx[15:0].  This is for two reasons:

- As with the size of the event_idx 'type' field, this allows a good amount of space for future growth, especially as more architecture extensions come along and motivate having additional events (starting with the vector, hypervisor, and bitmanip extensions).

- This allows space and flexibility for things like having "structured" events - meaning events with event-specific filter bits.  This would be applicable when event_idx.type==2 (aka RAW type events).  In our implementation, for example, part of the 'code' field would specify a particular type of event and another part of the 'code' field would specify filter bits to provide the flexibility in only counting selected sub-categories of that type of event.

Secondly, this proposal seems to only provide event_idx as information to be written into an hpmevent CSR (in the case of hardware counters)?  It would be desirable to have another parameter (e.g. event_info) that can be passed through this API to the eventual hpmevent CSR write.  One could imagine event_idx and event_info being concatenated to create what is written into a 32-bit or 64-bit hpmeventX CSR.

For example, in RV64, this could result in writing the 64-bit value {event_info[43:0], event_idx[19:0]} into hpmeventX  (assuming the above increase in event_idx size).  This provides a standard way for software to configure an entire hpmevent CSR.
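
As a rough illustration of that concatenation, the 64-bit value could be assembled as follows. The function and macro names are hypothetical; the field widths (44 bits of event_info above 20 bits of event_idx) are the ones given in the RV64 example above.

```c
#include <assert.h>
#include <stdint.h>

#define HPMEVENT_IDX_BITS  20u /* event_idx[19:0] in the low bits */
#define HPMEVENT_INFO_BITS 44u /* event_info[43:0] in the high bits */

/* Hypothetical helper: forms {event_info[43:0], event_idx[19:0]} as the
 * value to be written into a 64-bit hpmeventX CSR. */
static uint64_t hpmevent_compose(uint64_t event_info, uint32_t event_idx)
{
    assert(event_idx < (UINT32_C(1) << HPMEVENT_IDX_BITS));
    assert(event_info < (UINT64_C(1) << HPMEVENT_INFO_BITS));
    return (event_info << HPMEVENT_IDX_BITS) | event_idx;
}
```

For instance, event_info 0x1 with event_idx 0x2abcd gives 0x12abcd, and the two fields together use exactly 64 bits.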

Greg






Re: Proposal: SBI PMU Extension

Anup Patel
 

-----Original Message-----
From: tech-unixplatformspec@... <tech-
unixplatformspec@...> On Behalf Of Zong Li
Sent: 01 July 2020 20:32
To: Anup Patel <Anup.Patel@...>
Cc: tech-unixplatformspec@...; Andrew Waterman
<andrew@...>
Subject: Re: [RISC-V] [tech-unixplatformspec] Proposal: SBI PMU Extension

> On Wed, Jul 1, 2020 at 8:26 PM Anup Patel <anup.patel@...> wrote:
> >
> > Hi All,
> >
> > We don't have a dedicated RISC-V PMU extension but we do have HW
> > performance counters such as CYCLE CSR, INSTRET CSR, and HPMCOUNTER
> > CSRs. A RISC-V CPU can allow monitoring HW events using few
> > HPMCOUNTER CSRs. The M-mode software can also inhibit unused
> > performance counters to save energy.
>
> Do we need the SBI calls to set the mcounteren and mcountinhibit (optional
> CSR)?
> OR these two CSRs shouldn't be changed at runtime from S-mode?

The SBI_PMU_COUNTER_START call will set/clear the appropriate bits in the
MCOUNTEREN and MCOUNTINHIBIT CSRs. The SBI_PMU_COUNTER_STOP call
will do the reverse for the MCOUNTEREN and MCOUNTINHIBIT CSRs.

It is also possible that a RISC-V implementation has only a few HPMCOUNTER
CSRs but a lot of HW events to be monitored. In this case, the RISC-V
implementation will have an implementation-specific CSR to select the
particular HW event to be monitored by an HPMCOUNTER. The SBI implementation
(i.e. OpenSBI) will provide optional platform hooks which will be
called for the SBI_PMU_COUNTER_START and SBI_PMU_COUNTER_STOP calls.
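
A minimal sketch of the bit manipulation this implies, with the two CSRs modelled as plain struct fields so that the logic can run off-target (real M-mode firmware would use CSR set/clear instructions instead). The names are hypothetical, `n` is assumed to be the hardware counter number, and the choice of which bit is set and which is cleared follows the usual meaning of the two CSRs: a set mcounteren bit makes the counter readable from lower privilege levels, and a set mcountinhibit bit stops the counter from incrementing.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the two M-mode CSRs. */
struct counter_csrs {
    uint32_t mcounteren;    /* bit n set: counter n readable below M-mode */
    uint32_t mcountinhibit; /* bit n set: counter n does not increment */
};

/* START: let the counter run and expose it to lower privilege levels. */
static void pmu_counter_start(struct counter_csrs *csrs, unsigned n)
{
    assert(n < 32 && n != 1); /* bit 1 is TIME, which has no inhibit bit */
    csrs->mcountinhibit &= ~(UINT32_C(1) << n);
    csrs->mcounteren |= UINT32_C(1) << n;
}

/* STOP: the reverse of START. */
static void pmu_counter_stop(struct counter_csrs *csrs, unsigned n)
{
    assert(n < 32 && n != 1);
    csrs->mcounteren &= ~(UINT32_C(1) << n);
    csrs->mcountinhibit |= UINT32_C(1) << n;
}
```

A platform hook for selecting the monitored event, as mentioned above, would be called from the same two places.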



> > In addition to HW performance counters, a SBI implementation (e.g.
> > OpenSBI, Xvisor, KVM, etc) can provide software counters for
> > interesting events such as number of RFENCEs, number of IPIs, number
> > of misaligned load/store instructions, number of illegal instructions, etc.
>
> I'm not sure whether I misunderstood the usage of software counter, I don't
> see the event_idxs of these events for software counters, maybe we could
> define the event_idxs for them in this proposal, for example, if
> event_idx.type == 3, then it is SOFTWARE event, and event_idx.code is used
> to specify the monitoring events such as number of RFENCEs, number of IPIs
> and so on.

My bad, I forgot to define event_idx values for SBI implementation events.
I will update this in the next version. Thanks for catching it.



We propose SBI PMU extension which tries to cover CYCLE CSR, INSTRET
CSR, HPMCOUNTER CSRs and software counters of the SBI
implementation.

To define SBI PMU extension, we first define counter_idx which is a
unique number assigned to a counter and event_idx which is an encoded
number representing event to be monitored.

The SBI PMU event_idx is 15bit number encoded as follows:
event_idx[14:12] = type
event_idx[11:0] = code

If event_idx.type == 0 then it is HARDWARE event and event_idx.code
can be one of the following:
enum sbi_pmu_hw_id {
/*
* Common hardware events, generalized by the kernel:
*/
PERF_COUNT_HW_CPU_CYCLES = 0,
PERF_COUNT_HW_INSTRUCTIONS = 1,
PERF_COUNT_HW_CACHE_REFERENCES = 2,
PERF_COUNT_HW_CACHE_MISSES = 3,
PERF_COUNT_HW_BRANCH_INSTRUCTIONS = 4,
PERF_COUNT_HW_BRANCH_MISSES = 5,
PERF_COUNT_HW_BUS_CYCLES = 6,
PERF_COUNT_HW_STALLED_CYCLES_FRONTEND = 7,
PERF_COUNT_HW_STALLED_CYCLES_BACKEND = 8,
PERF_COUNT_HW_REF_CPU_CYCLES = 9,

PERF_COUNT_HW_MAX, /* non-ABI */
};
(NOTE: Same as described in
<linux_source>/include/uapi/linux/perf_event.h)

If event_idx.type == 1 then it is CACHE event and event_idx.code is
encoded as follows:
event_idx.code[11:4] = cache_id
event_idx.code[3:1] = op_id
event_idx.code[0:0] = result_id
enum sbi_pmu_hw_cache_id {
PERF_COUNT_HW_CACHE_L1D = 0,
PERF_COUNT_HW_CACHE_L1I = 1,
PERF_COUNT_HW_CACHE_LL = 2,
PERF_COUNT_HW_CACHE_DTLB = 3,
PERF_COUNT_HW_CACHE_ITLB = 4,
PERF_COUNT_HW_CACHE_BPU = 5,
PERF_COUNT_HW_CACHE_NODE = 6,

PERF_COUNT_HW_CACHE_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_id {
PERF_COUNT_HW_CACHE_OP_READ = 0,
PERF_COUNT_HW_CACHE_OP_WRITE = 1,
PERF_COUNT_HW_CACHE_OP_PREFETCH = 2,

PERF_COUNT_HW_CACHE_OP_MAX, /* non-ABI */
};
enum sbi_pmu_hw_cache_op_result_id {
PERF_COUNT_HW_CACHE_RESULT_ACCESS = 0,
PERF_COUNT_HW_CACHE_RESULT_MISS = 1,

PERF_COUNT_HW_CACHE_RESULT_MAX, /* non-ABI */
};
(NOTE: Same as described in
<linux_source>/include/uapi/linux/perf_event.h)

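To make the CACHE event encoding concrete, here is a small sketch that builds event_idx.code from cache_id, op_id and result_id, and then the full event_idx with type == 1. The function names are assumptions for illustration only; the field positions are the ones listed above.

```c
#include <stdint.h>

/*
 * Build the 12-bit code of a CACHE event:
 *   code[11:4] = cache_id, code[3:1] = op_id, code[0:0] = result_id
 */
static inline uint32_t sbi_pmu_cache_event_code(uint32_t cache_id,
                                                uint32_t op_id,
                                                uint32_t result_id)
{
    return ((cache_id & 0xffu) << 4) | ((op_id & 0x7u) << 1) | (result_id & 0x1u);
}

/* Build the full event_idx for a CACHE event (event_idx.type == 1). */
static inline uint32_t sbi_pmu_cache_event_idx(uint32_t cache_id,
                                               uint32_t op_id,
                                               uint32_t result_id)
{
    return (1u << 12) | sbi_pmu_cache_event_code(cache_id, op_id, result_id);
}
```

For example, a DTLB (3) write (1) miss (1) gives code 0x033 and event_idx 0x1033 under this reading of the layout.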
If event_idx.type == 2 then it is RAW event and event_idx.code is just
a RAW event number.

In future, more event_idx can be defined without breaking ABI
compatibility of SBI calls.

Based on the above definitions of counter_idx and event_idx, we can
potentially have the following SBI calls:
I don't see the definition of counter_idx, but I remember that you had talked
about that in other places as follows:
1. counter_idx = 0 to 2 are for CYCLE, TIME, and INSTRET
2. counter_idx = 3 to 31 are for HPMCOUNTERs
3. counter_idx >= 32 are for software counters
Initially, I had tied counter_idx with CSR numbers but TIME CSR will need
to be handled as special case.

It's better to treat counter_idx as a logical index of the available counters;
this also helps reduce SBI calls.



1. SBI_PMU_NUM_COUNTERS
This call will return the number of COUNTERs
2. SBI_PMU_COUNTER_DESCRIBE
This call takes two parameters: 1) counter_idx 2) physical address of 4k
page
It will write the description of SBI PMU counter at specified physical
address. The details of the SBI PMU counter written at specified physical
address are as follows:
1. Name (64 bytes)
2. CSR_Offset (4 bytes)
(E.g. CSR_Offset == 0x2 implies CSR 0xC02)
(E.g. CSR_Offset == 0xffffffff means it is SBI implementation
counter)
Maybe it would be more clear if we use counter_idx instead of CSR_Offset?
See my previous comment.

I agree CSR_offset is a little confusing. Let's have CSR_number instead
of CSR_offset. This way we can even use RISC-V implementation specific
CSR (i.e. non-HPMCOUNTER CSR) as counter. All counters with
CSR_number > 0xfff will be treated as SBI implementation counter.


3. CSR_Width (4 bytes)
(Number of CSR bits implemented in HW)
4. Event bitmap (2048 bytes) (i.e. 1-bit for each possible event_idx)
(If bit corresponding to an event_idx is 1 then event_idx is supported
by the counter)
Is there more detail about which bit corresponds to which event? For example,
does bit 0 correspond to event_idx 0x0, and bit 10 correspond to event_idx
0x1000?
It's a bitmap representing all possible event_idx values. If bit X is set then
it means event_idx = X can be monitored by this counter.


5. Anything else ??
3. SBI_PMU_COUNTER_SET_PHYS_ADDR
This call takes two parameters: 1) counter_idx 2) physical address
It will set the physical address where SBI implementation will write
the software counter. This SBI call is only for counters not mapped
to any CSR (i.e. only for counters with CSR_Offset == 0xffffffff).
4. SBI_PMU_COUNTER_START
This call takes two parameters: 1) counter_idx 2) event_idx
It will inform SBI implementation to configure and start/enable specified
counter on the calling HART to monitor specific event. This SBI call will
fail for counters which are not present.
Just want to make sure: is SBI_PMU_COUNTER_START for both hardware
and software counters? If so, we should define the event_idx for software
counters as I mentioned above.
Yes, it's for both HW and SW counters.

For HW counters (i.e. HPMCOUNTER CSRs), the SBI_PMU_COUNTER_START call will:
1. Enable access to the CSR using MCOUNTEREN CSR (or HCOUNTEREN CSR for hypervisor)
2. Disable inhibit using MCOUNTINHIBIT CSR
3. Do any platform specific event selection for the specified HW counter
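
The first two steps above amount to setting one bit and clearing another. A minimal sketch follows, assuming the counter is identified by its bit position n (CSR number minus 0xC00). On real hardware the firmware would use CSR set/clear instructions; here the two CSRs are modelled as plain struct fields so the sketch runs on any host. The struct and function names are hypothetical, and step 3 (platform specific event selection) is left out.

```c
#include <stdint.h>

/* Host-side stand-in for the two machine-mode CSRs touched by the steps above. */
struct hart_csrs {
    uint32_t mcounteren;    /* bit n = 1: lower privilege may read counter n */
    uint32_t mcountinhibit; /* bit n = 1: counter n does not increment */
};

/*
 * Start hardware counter n on the calling HART.
 * Returns 0 on success, -1 if n is not a startable counter in this sketch.
 */
static int sbi_pmu_hw_counter_start(struct hart_csrs *csrs, unsigned int n)
{
    /* Bit 1 is TIME, which has no inhibit bit, so it is rejected here. */
    if (n > 31 || n == 1)
        return -1;

    csrs->mcounteren |= (1u << n);     /* step 1: enable access */
    csrs->mcountinhibit &= ~(1u << n); /* step 2: disable inhibit */

    /* step 3: platform specific event selection would go here (omitted) */
    return 0;
}
```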


5. SBI_PMU_COUNTER_STOP
This call takes one parameter: 1) counter_idx
It will inform SBI implementation to stop/disable specified counters on
the calling HART. This SBI call will fail for counters which are not present.
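
For reference, the counter description written by SBI_PMU_COUNTER_DESCRIBE could be laid out roughly as in the sketch below. This is only an illustration: the struct, field and function names are assumptions, the field is called csr_number following the reply in this thread (values above 0xfff mean an SBI implementation counter, which also covers 0xffffffff), and the bit order inside the bitmap (bit X in byte X / 8, least significant bit first) is assumed because the thread does not specify it. Note that 2048 bytes hold 16384 bits, so this sketch reports any event_idx beyond that range as unsupported.

```c
#include <assert.h>
#include <stdint.h>

#define SBI_PMU_NAME_LEN         64
#define SBI_PMU_EVENT_BITMAP_LEN 2048

/* Description of one counter, as written into the 4k page. */
struct sbi_pmu_counter_desc {
    char     name[SBI_PMU_NAME_LEN];                 /* 1. Name (64 bytes) */
    uint32_t csr_number;                             /* 2. CSR number (4 bytes) */
    uint32_t csr_width;                              /* 3. CSR_Width (4 bytes) */
    uint8_t  event_bitmap[SBI_PMU_EVENT_BITMAP_LEN]; /* 4. Event bitmap */
};

static_assert(sizeof(struct sbi_pmu_counter_desc) <= 4096,
              "counter description must fit in the 4k page");

/* Returns 1 if bit event_idx is set in the bitmap, 0 otherwise. */
static inline int sbi_pmu_counter_supports(const struct sbi_pmu_counter_desc *d,
                                           uint32_t event_idx)
{
    if (event_idx >= SBI_PMU_EVENT_BITMAP_LEN * 8u)
        return 0;
    return (d->event_bitmap[event_idx / 8] >> (event_idx % 8)) & 1;
}

/* Returns 1 if the counter is not backed by a CSR (SBI implementation counter). */
static inline int sbi_pmu_counter_is_sbi_impl(const struct sbi_pmu_counter_desc *d)
{
    return d->csr_number > 0xfffu;
}
```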

From the above, the RISC-V PMU driver will use most of the SBI calls at boot
time.
Only SBI_PMU_COUNTER_START needs to be called once before using a
counter.
A counter is read either by reading its CSR (for CSR_Offset !=
0xffffffff) or by reading the memory location (for CSR_Offset ==
0xffffffff). Counter overflow handling will have to be done in software
by the Linux kernel.
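
One common way to do the software overflow handling mentioned above is to compute deltas modulo the implemented counter width. The sketch below assumes the width is the CSR_Width value from the counter description and that the counter wraps at most once between two reads; the function names are made up for illustration and this is not the kernel's actual code.

```c
#include <stdint.h>

/* Mask covering the implemented bits of a counter that is `width` bits wide. */
static inline uint64_t pmu_counter_mask(unsigned int width)
{
    return width >= 64 ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
}

/*
 * Events counted between two raw reads of the same counter.
 * Unsigned subtraction followed by masking gives the right answer
 * even if the counter wrapped once between `prev` and `now`.
 */
static inline uint64_t pmu_counter_delta(uint64_t prev, uint64_t now,
                                         unsigned int width)
{
    return (now - prev) & pmu_counter_mask(width);
}
```

The driver would add each delta to a 64-bit running total kept in memory, and read often enough that a second wrap cannot happen between reads.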

The information returned by SBI_PMU_NUM_COUNTERS and
SBI_PMU_COUNTER_DESCRIBE can be passed via DT/ACPI but it will be
difficult to maintain because we have hardware counters and SBI
implementation counters both provided by SBI PMU extension. The SBI
implementation counters are specific to underlying SBI implementation
so we will have to keep counters/events described in DT/ACPI in-sync with
underlying SBI implementation.

I have a proposal for the DT format of PMU. It seems to me that we need to add
some information for software counters, such as the number of software
counters and their events. Are there any ideas?
With the SBI_PMU_COUNTER_DESCRIBE call, we don't need to pass this
information in DT.

I think the DT format you had proposed has the following limitations:
1. It maps each counter to a particular HW event. In reality, we will have a few
HW counters and lots of HW events, and a HW counter can be configured to
monitor a particular event from a set of HW events. In other words, the relation
between HW counter and HW event is one-to-many and not one-to-one.
2. It does not deal with implementation specific HW events.

Both of the above limitations are taken care of by the SBI_PMU_COUNTER_DESCRIBE
call.

Regards,
Anup
