Re: [RFC PATCH 1/1] server extension: PCIe requirements


Josh Scheid

On Wed, Jun 9, 2021 at 11:27 AM Mayuresh Chitale <mchitale@...> wrote:
This patch adds requirements for PCIe support for the server extension

Signed-off-by: Mayuresh Chitale <mchitale@...>
---
 riscv-platform-spec.adoc | 133 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 132 insertions(+), 1 deletion(-)

diff --git a/riscv-platform-spec.adoc b/riscv-platform-spec.adoc
index 4418788..9de487e 100644
--- a/riscv-platform-spec.adoc
+++ b/riscv-platform-spec.adoc
@@ -363,7 +363,138 @@ https://lists.riscv.org/g/tech-privileged/message/404[Sstc] extension.
 ** Platforms are required to delegate the supervisor timer interrupt to 'S'
 mode. If the 'H' extension is implemented then the platforms are required to
 delegate the virtual supervisor timer interrupt to 'VS' mode.

Is this an M-mode SW requirement or a HW requirement that these interrupts are delegatable (writeable) in HW?

Why require the delegation by M-mode instead of allowing for M-mode to trap and pass down?  Is this just a performance benefit?

-* PCI-E
+
+===== PCIe
+Platforms are required to support PCIe
+footnote:[https://pcisig.com/specifications]. The following are the requirements:

Any particular baseline PCIe version and/or extensions?

+
+====== PCIe Config Space
+* Platforms shall support access to the PCIe config space via ECAM as described
+in the PCI Express Base specification.
+* The entire config space for a single PCIe domain should be accessible via a
+single ECAM I/O region.
+* Platform firmware should implement the MCFG table to allow operating
+systems to discover the supported PCIe domains and map the ECAM I/O region for
+each domain.
+* ECAM I/O regions shall be configured as channel 0 I/O regions.
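
For concreteness, the decomposition ECAM implies is roughly the following
(encoding per the PCI Express Base specification; the per-domain base
address comes from the corresponding MCFG entry):

    #include <stdint.h>

    /* ECAM gives each function a 4 KiB config window:
     *   address = base + (bus << 20) + (device << 15)
     *                  + (function << 12) + offset
     */
    static inline volatile uint32_t *
    ecam_cfg_addr(uintptr_t ecam_base, unsigned bus, unsigned dev,
                  unsigned fn, unsigned off)
    {
        uintptr_t addr = ecam_base
                       + ((uintptr_t)bus << 20)
                       + ((uintptr_t)dev << 15)
                       + ((uintptr_t)fn  << 12)
                       + (off & 0xFFFu);
        return (volatile uint32_t *)addr;
    }

    /* Example: read the vendor/device ID of bus 0, device 0, function 0. */
    uint32_t probe_root(uintptr_t ecam_base)
    {
        return *ecam_cfg_addr(ecam_base, 0, 0, 0, 0x0);
    }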
+
+====== PCIe Memory Space
+* PCIe Outbound region +
+Platforms are required to provide at least two I/O regions for mapping the
+memory requested by PCIe endpoints and PCIe bridges/switches through BARs.
+The first I/O region is required to be located below the 4G physical address
+limit to map the memory requested by non-prefetchable BARs. This region shall
+be configured as a channel 0 I/O region. The second I/O region is required to
+be located above the 4G physical address limit to map the memory requested by
+prefetchable BARs.
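
To make the intent concrete, a sketch of the BAR-steering policy those two
windows imply (the window addresses and names below are illustrative, not
from the spec):

    #include <stdint.h>

    /* Memory BAR attribute bits, per the PCI spec encoding. */
    #define BAR_MEM_TYPE_64   (1u << 2)   /* 64-bit address capable */
    #define BAR_PREFETCHABLE  (1u << 3)

    struct mem_window { uint64_t base, size; };

    /* Hypothetical windows standing in for the two required regions. */
    static const struct mem_window win_lo = { 0x40000000ULL,   0x40000000ULL  };
    static const struct mem_window win_hi = { 0x4000000000ULL, 0x100000000ULL };

    /* Prefetchable 64-bit BARs may live above 4G; everything else
     * (non-prefetchable, or 32-bit only) must sit below 4G. */
    static const struct mem_window *pick_window(uint32_t bar)
    {
        if ((bar & BAR_PREFETCHABLE) && (bar & BAR_MEM_TYPE_64))
            return &win_hi;
        return &win_lo;
    }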

Is there any guidance needed about the amount of total space available (below 4G), or about whether space needs to be allocated per domain?

I think this is only necessary in the platform because of the current lack of an IOMMU requirement or standard.  With an IOMMU, that component can be used to locate 32-bit BARs anywhere in the system address space.  So at least keep in mind that the requirement can be dropped at that point.
 
This region may be configured as an I/O region or as a memory region.

Is an SBI call needed to support S-mode configuration?  What is the default expected to be if there is no SBI call or no call is made?

IIRC, some older versions of some HCI standards (USB, SATA?) only had device support for 32-bit addresses.  I mention this to check whether the requirement is really just that non-prefetchable BARs need to be supported below 4GB, or whether it's also needed for other 32-bit BAR support.  If the latter, the platform may also need to support prefetchable BARs located below 4GB.
 
+
+* PCIe Inbound region +
+For security reasons, platforms are required to provide a mechanism to
+restrict inbound accesses over PCIe to specific regions in the address
+space, such as the DRAM.
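
For illustration only, since the text names no concrete mechanism, an
inbound filter of this kind might be programmed roughly as below (the
register layout here is entirely hypothetical, IOPMP or otherwise):

    #include <stdint.h>

    /* Hypothetical inbound address filter: inbound DMA may touch the
     * programmed window (e.g. DRAM) and nothing else. */
    struct inbound_filter {
        volatile uint64_t window_base;  /* lowest address DMA may reach */
        volatile uint64_t window_limit; /* highest address, inclusive   */
        volatile uint32_t enable;       /* 1 = reject TLPs outside it   */
    };

    static void restrict_inbound_to_dram(struct inbound_filter *f,
                                         uint64_t dram_base,
                                         uint64_t dram_size)
    {
        f->window_base  = dram_base;
        f->window_limit = dram_base + dram_size - 1;
        f->enable       = 1;
    }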

While a standard IOMMU is further off, is the current opinion that the IOPMP is not in a position to be required or suggested as an implementation of the above requirement?  If nothing concrete is named, it's hard to check for compliance.

Is this mechanism expected to be M-mode SW controlled, or is it also expected to be controlled by S-mode (either directly or via SBI)?

+
+====== PCIe Interrupts
+* Platforms shall support both INTx and MSI/MSI-X interrupts.
+* Integration with AIA +
+TBD
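
For the MSI half of the requirement above, the standard config-space
sequence is roughly the following (capability walk per the PCI spec;
cfg_read*/cfg_write* are assumed helpers over the function's ECAM window,
not defined here):

    #include <stdbool.h>
    #include <stdint.h>

    extern uint8_t  cfg_read8(uint16_t off);
    extern uint16_t cfg_read16(uint16_t off);
    extern void     cfg_write16(uint16_t off, uint16_t val);
    extern void     cfg_write32(uint16_t off, uint32_t val);

    #define PCI_CAP_PTR  0x34
    #define PCI_CAP_MSI  0x05

    /* Walk the capability list, then program and enable MSI.
     * Returns false if the function exposes no MSI capability. */
    static bool msi_enable(uint32_t msg_addr, uint16_t msg_data)
    {
        uint8_t cap = cfg_read8(PCI_CAP_PTR);
        while (cap) {
            if (cfg_read8(cap) == PCI_CAP_MSI) {
                uint16_t ctl = cfg_read16(cap + 2);
                cfg_write32(cap + 4, msg_addr);      /* Message Address */
                if (ctl & (1u << 7)) {               /* 64-bit capable  */
                    cfg_write32(cap + 8, 0);         /* upper address   */
                    cfg_write16(cap + 12, msg_data);
                } else {
                    cfg_write16(cap + 8, msg_data);
                }
                cfg_write16(cap + 2, ctl | 1u);      /* MSI Enable      */
                return true;
            }
            cap = cfg_read8(cap + 1);                /* next capability */
        }
        return false;
    }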

While this is TBD, one question of interest to me is whether it matters if the PCI RC implements its own INTx-to-MSI bridge, or if an AIA APLIC is required for that.

+
+====== PCIe I/O coherency
+The following are the requirements:
+
+* Platforms shall provide a mechanism to control the `NoSnoop` bit for any
+outbound TLP.

Is it implicit here whether this mechanism is provided to M-mode SW only, or also to S-mode?

+* If the host bridge/root port receives a TLP which does not have the
+`NoSnoop` bit set, then the hardware shall generate a snoop request.
+* If the host bridge/root port receives a TLP which has `NoSnoop` set, then no
+hardware coherency is required. Software coherency may be required via CMOs.
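
For requests a function itself initiates, the standard knob is the Enable
No Snoop bit (bit 11) of its PCIe Device Control register; whether the
platform mechanism for outbound TLPs is that bit on the root port or
something implementation-specific is left open by the text. A sketch,
reusing the assumed cfg_* helpers from the MSI example above:

    #include <stdbool.h>
    #include <stdint.h>

    extern uint16_t cfg_read16(uint16_t off);
    extern void     cfg_write16(uint16_t off, uint16_t val);

    #define PCIE_DEVCTL          8          /* Device Control, offset into
                                               the PCIe capability (ID 0x10) */
    #define PCIE_DEVCTL_NOSNOOP  (1u << 11) /* Enable No Snoop */

    /* Clearing Enable No Snoop forces the function to emit snooped
     * (hardware-coherent) requests; setting it permits NoSnoop TLPs,
     * which then need software CMOs for coherency. exp_cap is the
     * config-space offset of the PCI Express capability. */
    static void set_no_snoop(uint16_t exp_cap, bool allow)
    {
        uint16_t ctl = cfg_read16(exp_cap + PCIE_DEVCTL);
        ctl = allow ? (ctl | PCIE_DEVCTL_NOSNOOP)
                    : (ctl & (uint16_t)~PCIE_DEVCTL_NOSNOOP);
        cfg_write16(exp_cap + PCIE_DEVCTL, ctl);
    }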

I read this as primarily stating that inbound NoSnoop controls the "coherent" access attribute.  But why this instead of focusing on control of the "cacheable" vs "non-cacheable" attribute?  With the latter, it seems more apparent how harts would then manage coherency: by controlling accesses to use the same "cacheable" attribute.

+
+====== PCIe Topology
+Platforms are required to implement at least one of the following topologies
+and the components required in that topology.
+
+[ditaa]
+....
+
+            +----------+                             +----------+
+            |   CPU    |                             |   CPU    |
+            |          |                             |          |
+            +-----|----+                             +-----|----+
+                  |                                        |
+                  |                                        |
+    +-------------|------------+             +-------------|------------+
+    |        ROOT | COMPLEX    |             |        ROOT | COMPLEX    |
+    |                          |             |                          |
+    |      +------|-------+    |             |      +------|-------+    |
+    |      |  Host Bridge |    |             |      |  Host Bridge |    |
+    |      +------|-------+    |             |      +------|-------+    |
+    |             |            |             |             |            |
+    |             | BUS 0      |             |             | BUS 0      |
+    |     |-------|------|     |             |       +-----|-------+    |
+    |     |              |     |             |       | ROOT  PORT  |    |
+    |     |              |     |             |       +-----|-------+    |
+    | +---|---+      +---|---+ |             |             |            |
+    | | RCiEP |      | RCEC  | |
+    | +-------+      +-------+ |             |             |            |
+    |                          |             +-------------|------------+
+    +--------------------------+                           |
+                                                           |  BUS 1
+    RCiEP - Root complex integrated endpoint
+    RCEC - Root complex event collector
+....


Have we considered the option of requiring EPs to be behind virtual integrated RPs, instead of being RCiEPs?  This would seem to bypass some of the unique limitations of RCiEPs, including the need for an RCEC.

Do we need to ban or allow for impl-spec address mapping capabilities between PCI and system addresses?

Do we need to say anything about peer-to-peer support, or requirements if a system enables it?  Including ACS?

Should the system mtimer counter also be the time source for PCIe PTM (Precision Time Measurement)?

-Josh
