Re: Quality of Service (QoS)

Jonathan Behrens <behrensj@...>

I only skimmed some of the proposal, but one thing I noticed is that there doesn't seem to be much restriction on who can set the current RCID and MCID. In particular, with the H-extension it looks like a guest operating system can freely set its own IDs. That would mean, for instance, that a cloud provider running multiple customer VMs couldn't use this to monitor or limit the resource usage of individual VMs.


On Thu, Dec 16, 2021 at 3:59 PM Vedvyas Shanbhogue via <> wrote:
Greetings All!

So finally collected these thoughts and put them into a document:
The document also has some thoughts on a configuration interface for
cache and bandwidth controllers.
Look forward to feedback and comments.


On Fri, Dec 3, 2021 at 8:39 AM Ved Shanbhogue <ved@...> wrote:
> Continuing this thread with some more thoughts included.
> regards
> ved
> On Fri, Nov 12, 2021 at 08:19:01AM -0600, Vedvyas Shanbhogue via wrote:
> >Presently ASID is defined to be private to a hart. This was clarified in version 1.11 of the privileged specification, but commentary was added about the possibility of a future global ASID. However, for QoS purposes the ASID may not lend itself well as an identifier. The system may want to group multiple applications/virtual-machines/containers into a resource control group. Further, the ASID does not help differentiate between code execution and data access. One way to address that would be to carry a code/data indicator along with the request, but that may create some inefficiencies since the resource controllers would then have two sets of controls/counters per ID (one for code and the other for data), and when differentiated service for code vs. data is not required the per-ID code counters/controls would go unused. To support grouping, a lookup table may be employed in hardware to group multiple ASIDs together, but accessing a lookup table on each request increases hardware complexity, especially for high-speed implementations. So we may want to keep the hardware simpler and let the grouping be done by software.
> >
> >So to support QoS we may want to provide a mechanism by which an application can be associated with a resource control ID (RCID) and a monitoring counter ID (MCID) that accompany each request made by the application. We would also want a mechanism to associate these IDs with requests made by a device on behalf of the application. Here the term application is used generically to refer to a process or a VM or a container or other abstractions employed by the system for resource control.
> >
> >An application would be associated with one RCID and one MCID that
> >accompany its requests for data accesses and a potentially different
> >RCID and MCID that accompany its requests for code accesses. Data
> >accesses include requests generated by load and store instructions as
> >well as the implicit loads and stores to the first-stage and
> >second-stage page tables. Where differentiated QoS for code vs. data
> >is not required, the code and data RCID and MCID may be programmed to
> >be the same.
> >
> >A group of applications may be associated with the same RCID and one or more of these applications may be associated with a unique MCID for code and/or data. This allows measuring the resource consumption of a subset of applications that share a RCID to determine if the resource partitioning is optimal and to make adjustments as needed.
> >
> >The RCID and MCID would want to have a global scope across all caches, interconnects, and memory controllers that a request may access. To support maximum flexibility, the RCID and MCID may be defined to be up to 16 bits wide, but an implementation could limit them to a more reasonable number, e.g. 64 or 128 resource control IDs.
> >
> >These IDs may thus be programmed into a set of CSRs (one each for M/S/VS mode) where each CSR is 64 bits wide, holding the RCID and MCID for code and data accesses respectively. For device-initiated accesses these IDs could be programmed into the IOMMU so that the IOMMU can associate them with the requests made by the device. Other implementations may support directly configuring these IDs into the devices themselves.
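To make the CSR idea concrete, here is a minimal sketch of packing the four IDs into one 64-bit value. The proposal only says the CSR is 64 bits wide and holds the RCID and MCID for code and data accesses; the field order and the fixed 16-bit field width below are my assumptions, and the function names are hypothetical.

```python
# Assumed layout (not from the proposal): four 16-bit fields, lowest bits first.
FIELD_BITS = 16
FIELDS = ("data_rcid", "data_mcid", "code_rcid", "code_mcid")


def pack_qos_csr(data_rcid, data_mcid, code_rcid, code_mcid):
    """Pack the four IDs into a single 64-bit integer."""
    value = 0
    for index, field in enumerate((data_rcid, data_mcid, code_rcid, code_mcid)):
        if not 0 <= field < (1 << FIELD_BITS):
            raise ValueError(f"{FIELDS[index]} does not fit in {FIELD_BITS} bits")
        value |= field << (index * FIELD_BITS)
    return value


def unpack_qos_csr(value):
    """Split a 64-bit CSR value back into its four named IDs."""
    mask = (1 << FIELD_BITS) - 1
    return {
        name: (value >> (index * FIELD_BITS)) & mask
        for index, name in enumerate(FIELDS)
    }
```

Where differentiated QoS for code vs. data is not needed, software would simply pass the same RCID and MCID for both pairs.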
> >
> Quality of service enforcement in caches:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Caches that support the QoS extension allow the cache capacity to be allocated to applications and provide mechanisms to monitor the cache usage by the applications. The granularity of allocation is 1/MaxCacheBlocks, where MaxCacheBlocks is a property of the cache controller. A cache that supports this extension defines the number of blocks it supports. A cache block mask may then be configured in the cache controller for each supported RCID, where each bit of the mask corresponds to a cache block. All cache lookups scan the entire cache to determine if the requested line is present. If the requested cache line is not found then a cache line may be allocated from the set of cache blocks selected by the RCID. If allocating a line requires evicting a previously allocated cache line then the eviction candidate is obtained from the set of cache blocks selected by the RCID.
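A toy software model of the block-mask behaviour described above may help. It is only a sketch under assumptions: the class and method names are hypothetical, each block holds a fixed number of lines, and the eviction policy (oldest line in the first allowed block) is picked purely for illustration, since the proposal does not specify one.

```python
class PartitionedCache:
    """Toy model: lookups see every block, allocation honours the RCID mask."""

    def __init__(self, max_cache_blocks, lines_per_block):
        # One dict per block: address -> RCID that allocated the line.
        # Dicts keep insertion order, so the first key is the oldest line.
        self.blocks = [dict() for _ in range(max_cache_blocks)]
        self.lines_per_block = lines_per_block
        self.masks = {}

    def set_mask(self, rcid, mask):
        if mask == 0 or mask >> len(self.blocks):
            raise ValueError("mask must select at least one existing block")
        self.masks[rcid] = mask

    def access(self, rcid, addr):
        """Return True on a hit, False on a miss (the line is then allocated)."""
        # Lookup scans the entire cache, regardless of the RCID's mask.
        if any(addr in block for block in self.blocks):
            return True
        allowed = [
            block for index, block in enumerate(self.blocks)
            if (self.masks[rcid] >> index) & 1
        ]
        # Prefer a free slot in one of the blocks selected by the RCID.
        for block in allowed:
            if len(block) < self.lines_per_block:
                block[addr] = rcid
                return False
        # Otherwise evict, but only from the blocks selected by the RCID.
        victim_block = allowed[0]
        del victim_block[next(iter(victim_block))]
        victim_block[addr] = rcid
        return False
```

Note that a line allocated by one RCID can still be hit by another RCID, because only allocation and eviction are restricted by the mask.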
> The cache controller implements a monitoring counter per RCID and the counter can be programmed with a monitoring event ID that selects an event to count for requests with matching RCID. One such event ID would be to count the number of cache lines allocated and resident in the cache by requests with the matching RCID. Some events counted by the cache controller may not be precise but are expected to be statistically accurate over a reasonable monitoring period. When a monitoring counter is enabled, the count held in the counter may not be accurate until an implementation-defined number of requests have been observed by the cache controller. The controller provides a validity indication to indicate when the count is valid.
> Quality of service enforcement in interconnects and memory controllers:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Allocating interconnect and memory controller capacity, i.e. bandwidth, enables restricting the bandwidth consumed by an application to a programmed limit. The bandwidth allocation is represented as a ratio of the maximum available bandwidth. The granularity of allocation is 1/MaxBWBlocks, where MaxBWBlocks is a power of 10 with the smallest value of 100 (e.g., 100, 1000, or 10000). MaxBWBlocks is a property of the interconnect or memory bandwidth controller. Allocating bandwidth to an RCID involves configuring:
> - A guaranteed bandwidth - Gbw
> - A maximum bandwidth - Mbw
> - Priority - Mprio - high, medium, or low
> The Gbw is the minimum bandwidth in units of bandwidth blocks that is reserved for the RCID and must be at least one. The sum of Gbw across all RCIDs must not exceed MaxGBWBlocks. MaxGBWBlocks is a property of the interconnect or memory controller. In some implementations it may be the same as MaxBWBlocks. Other implementations may limit it to a fraction (e.g. 90%) of MaxBWBlocks. The Mbw is the maximum bandwidth in units of bandwidth blocks that the RCID may consume. If Mbw is <= Gbw then Mbw does not constrain the bandwidth usage. If Mbw is > Gbw, the bandwidth beyond Gbw is not guaranteed and the actual bandwidth available may depend on the priority - Mprio - of the RCIDs that contend for the non-guaranteed bandwidth. To enforce these limits, the controller needs to meter the bandwidth. The bandwidth metering involves counting bytes transferred (B), in both directions, over a time interval (T) to determine the bandwidth B / T.
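The constraints on Gbw lend themselves to a small software-side check before programming a controller. The following is a sketch only; the `BwAlloc` type and the helper names are hypothetical, and the numbers in the usage note are made up.

```python
from dataclasses import dataclass

PRIORITIES = ("high", "medium", "low")


@dataclass
class BwAlloc:
    gbw: int      # guaranteed bandwidth, in bandwidth blocks
    mbw: int      # maximum bandwidth, in bandwidth blocks
    mprio: str    # "high", "medium", or "low"


def validate_allocations(allocs, max_gbw_blocks):
    """Raise ValueError if the per-RCID settings break the stated rules."""
    for rcid, alloc in allocs.items():
        if alloc.gbw < 1:
            raise ValueError(f"RCID {rcid}: Gbw must be at least one block")
        if alloc.mprio not in PRIORITIES:
            raise ValueError(f"RCID {rcid}: unknown priority {alloc.mprio!r}")
    total = sum(alloc.gbw for alloc in allocs.values())
    if total > max_gbw_blocks:
        raise ValueError(f"sum of Gbw ({total}) exceeds MaxGBWBlocks ({max_gbw_blocks})")


def guaranteed_fraction(alloc, max_bw_blocks):
    """Guaranteed share of the maximum available bandwidth."""
    return alloc.gbw / max_bw_blocks


def mbw_is_limiting(alloc):
    """Mbw only constrains usage when it is larger than Gbw."""
    return alloc.mbw > alloc.gbw
```

For example, with MaxBWBlocks = 100 and MaxGBWBlocks = 90, two RCIDs with Gbw of 30 and 50 pass the check, while adding a third RCID with Gbw of 20 would be rejected.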
> The physical manifestations of such meters would be outside the scope of this specification. Implementations may use discrete time intervals to count bytes such that no history is preserved from one time interval to the next. In such implementations, the counter B is reset at the start of each time interval. Other implementations may use a sliding time interval wherein the start of the time interval advances at a uniform rate. In such a sliding time interval scheme, the counter B increments on each request and decreases by the number of bytes of older requests that are no longer in the time interval. Such a scheme may require carrying a history of requests received in any interval T.
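The difference between the two metering schemes can be sketched in software as follows. This is an illustration only, not a suggestion for how hardware would implement it; the class names are hypothetical and time is an abstract integer tick.

```python
from collections import deque


class DiscreteMeter:
    """Counter B is reset at the start of each fixed interval T."""

    def __init__(self, interval):
        self.interval = interval
        self.window_start = 0
        self.bytes = 0

    def record(self, now, nbytes):
        elapsed = now - self.window_start
        if elapsed >= self.interval:
            # Jump to the start of the interval containing `now`; no history kept.
            self.window_start = now - elapsed % self.interval
            self.bytes = 0
        self.bytes += nbytes
        return self.bytes


class SlidingMeter:
    """Counter B covers the most recent T ticks and needs a request history."""

    def __init__(self, interval):
        self.interval = interval
        self.history = deque()  # (timestamp, nbytes) of requests still in window
        self.bytes = 0

    def record(self, now, nbytes):
        # Drop requests that have fallen out of the window (now - T, now].
        while self.history and self.history[0][0] <= now - self.interval:
            _, old_bytes = self.history.popleft()
            self.bytes -= old_bytes
        self.history.append((now, nbytes))
        self.bytes += nbytes
        return self.bytes
```

The sliding meter's `history` deque is the cost referred to above: it has to remember every request that is still inside the window.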
> If there is contention for bandwidth then requests from RCIDs that have not consumed their Gbw have priority irrespective of the Mprio configured for the RCID. Requesters that have consumed their Gbw contend with other requesters for the best-effort available bandwidth until they have consumed Mbw. The contention for the non-guaranteed bandwidth is resolved using Mprio. The proportion of excess bandwidth that may be allocated to each Mprio class is configurable in the form of a weight associated with each priority level.
> The bandwidth controllers implement a monitoring counter for each MCID. The bandwidth monitoring counter reports the bytes that go past the monitoring point in the bandwidth controller. The bandwidth controller provides a mechanism to obtain a snapshot of the counter value and a timestamp at which the snapshot was taken. The timestamp shall be based on a timer that increments at the same rate as the clock used to provide the timestamp when reading the time CSR. By computing the difference between the byte counter values of two snapshots separated in time, and the difference between the timestamps of the two snapshots, the bandwidth consumed by the MCID in that interval can be determined. Each counter can be programmed with a monitoring event ID such as “local read bandwidth”, “local write bandwidth”, “local read and write bandwidth”, “remote read bandwidth”, “remote write bandwidth”, “remote read and write bandwidth”, “total read bandwidth”, “total write bandwidth”, or “total read and write bandwidth” to select the event to count. When the event ID selects read bandwidth, the counter increments by the number of bytes transferred in response to a read request. When the event ID selects write bandwidth, the counter increments by the number of bytes transferred by a write request. The distinction of local vs. remote exists for non-uniform memory architectures, where local bandwidth is the bandwidth consumed by the MCID when it accesses resources in its NUMA domain and remote bandwidth is the bandwidth consumed accessing resources outside its NUMA domain. The distinction of local vs. remote may not exist in some bandwidth controllers and such controllers may only support monitoring of total read and/or write bandwidth.
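The snapshot arithmetic is simple enough to write down. A minimal sketch, assuming each snapshot is a (byte count, timestamp) pair; the `timer_hz` parameter and the wraparound handling via `counter_bits` are my assumptions, since the proposal does not state a counter width.

```python
def bandwidth_bytes_per_sec(snap_a, snap_b, timer_hz, counter_bits=64):
    """Bandwidth consumed between two (byte_count, timestamp) snapshots.

    Timestamps are in timer ticks; timer_hz converts ticks to seconds.
    """
    bytes_a, ticks_a = snap_a
    bytes_b, ticks_b = snap_b
    delta_ticks = ticks_b - ticks_a
    if delta_ticks <= 0:
        raise ValueError("second snapshot must be taken after the first")
    # Modular subtraction tolerates a single wrap of the byte counter.
    delta_bytes = (bytes_b - bytes_a) % (1 << counter_bits)
    return delta_bytes * timer_hz / delta_ticks
```

For instance, snapshots of (1000, 0) and (5000, 10_000_000) with a 10 MHz timer correspond to 4000 bytes transferred in one second.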
> Configuration interface
> ~~~~~~~~~~~~~~~~~~~~~~~
> The configuration interface may be through a set of memory-mapped registers in each cache, interconnect, and memory controller.
> A cache controller would provide registers for:
> - Configuring the cache block allocations for an RCID
> - Configuring a monitoring event for an MCID
> - Registers to read the monitoring counters
> A bandwidth controller would provide registers for:
> - Configuring guaranteed b/w, maximum b/w and priority for an RCID
> - Configuring a monitoring event for an MCID
> - Registers to read the monitoring counters
