Re: Quality of Service (QoS)
We have a RAS committee in the organization that was approved by the BOD but has not yet been formed, and QoS is one of the areas it was intended to look at (as part of availability).
I wonder if we could use this as an opportunity to launch this committee. Once it has defined its strategy, gaps, and priorities (by itself or through a SIG), the idea is that the committee works with Priv to create a TG.
We would need an acting committee chair to drive this. Policy here.
On Wed, Nov 10, 2021 at 6:11 AM Vedvyas Shanbhogue <ved@...> wrote:
I would like to start a discussion on supporting QoS capabilities in RISC-V architecture. I hope I am posting on the right list/TG/HC.
First, a short background:
Quality of Service (QoS) is the minimal end-to-end performance that is guaranteed in advance by a service level agreement (SLA) to an application. The performance may be measured in the form of metrics such as instructions per cycle (IPC), latency of servicing work, etc.
Various factors such as the available cache capacity, memory bandwidth, interconnect bandwidth, CPU cycles, and system memory affect the performance of a computing system that runs multiple applications concurrently. Further, when arbitration is required for shared resources, the prioritization of an application's requests against other competing requests may also affect the performance of that application.
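As a rough illustration of how arbitration priority shifts performance between applications, the sketch below models a shared-resource arbiter that grants one request per cycle using smooth weighted round-robin over per-application identifiers. The weights, the identifiers, and the choice of weighted round-robin are assumptions made for illustration only; nothing here is a proposed RISC-V mechanism.

```python
def weighted_arbiter(pending, weights, cycles):
    """Grant one request per cycle using smooth weighted round-robin.

    pending: dict mapping an (illustrative) application ID to its queued request count
    weights: dict mapping the same IDs to arbitration weights (hypothetical values)
    Returns a dict with the number of grants each ID received.
    """
    pending = dict(pending)  # copy so the caller's dict is not modified
    credit = {app: 0 for app in weights}
    grants = {app: 0 for app in weights}
    for _ in range(cycles):
        eligible = [app for app in weights if pending.get(app, 0) > 0]
        if not eligible:
            break
        # Every eligible requester earns credit in proportion to its weight.
        for app in eligible:
            credit[app] += weights[app]
        # The requester with the most credit wins this cycle's grant...
        winner = max(eligible, key=lambda app: credit[app])
        # ...and pays back the total weight of all eligible requesters.
        credit[winner] -= sum(weights[app] for app in eligible)
        grants[winner] += 1
        pending[winner] -= 1
    return grants


# Two applications that both keep the shared resource saturated.
print(weighted_arbiter({"A": 100, "B": 100}, {"A": 3, "B": 1}, cycles=8))
```

With both applications saturating the resource, A receives three of every four grants; when A has nothing queued, B can use every cycle, so in this model the weights act as a priority under contention rather than a hard cap.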
When multiple applications run concurrently on modern processors with large core counts, multiple cache hierarchies, and multiple memory controllers, the performance of an application becomes less deterministic or even non-deterministic, because it depends on the behavior of all the other applications in the machine that contend for the shared resources, leading to interference. In many deployment scenarios, such as public cloud servers, the application owner may not be in control of the type and placement of other applications on the platform.
A typical use model involves profiling the application's resource usage against the desired performance goals and then establishing resource allocations/limits that allow the application to achieve those goals.
System software can control some of the resources available to an application, such as the number of hardware threads made available for execution, the amount of system memory allocated to it, and the number of CPU cycles provided for execution. However, it presently lacks the capabilities to control interference to an application, and thereby to reduce the variability in performance it experiences due to other applications' use of cache capacity, memory bandwidth, interconnect bandwidth, etc.
Some thoughts on supporting such a capability:
1. To provide differentiated services in the platform, a CSR may be provided to associate an identifier with an application (e.g., a process, VM, or container). This identifier is then attached to the application's requests to access shared resources such as caches, interconnects, and memory.
2. Configuration registers and counters are needed in resource controllers (e.g., memory, cache, and interconnect controllers) to set up resource allocations and monitor resource usage. The controllers may use the identifiers carried by requests to enforce the configured resource allocations and/or monitor resource consumption.
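To make the two points above more concrete, here is a toy software model, not a hardware proposal: a hart carries a hypothetical `qos_id` register (standing in for the CSR in point 1) that tags each request, and a cache controller (point 2) holds a per-ID way mask for allocation plus per-ID hit, miss, and occupancy counters for monitoring. Way-mask partitioning with LRU replacement is an assumption chosen for illustration; the class names, field names, and cache geometry are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hart:
    # Hypothetical stand-in for the proposed CSR; system software would
    # write it when switching between processes, VMs, or containers.
    qos_id: int = 0

    def request(self, addr):
        # Every request leaving the hart carries the current identifier.
        return (self.qos_id, addr)


class QosCacheController:
    """Set-associative cache with per-ID way masks and per-ID counters."""

    def __init__(self, num_sets, num_ways):
        self.num_sets, self.num_ways = num_sets, num_ways
        self.lines = [[None] * num_ways for _ in range(num_sets)]
        self.way_mask = {}                 # allocation config: ID -> way bitmask
        self.hits = defaultdict(int)       # monitoring counters, indexed by ID
        self.misses = defaultdict(int)
        self.occupancy = defaultdict(int)  # lines currently owned by each ID
        self.clock = 0                     # timestamp source for LRU

    def set_allocation(self, qos_id, mask):
        if not 0 < mask < (1 << self.num_ways):
            raise ValueError("mask must select at least one existing way")
        self.way_mask[qos_id] = mask

    def access(self, request):
        qos_id, addr = request
        self.clock += 1
        index, tag = addr % self.num_sets, addr // self.num_sets
        ways = self.lines[index]
        for line in ways:
            # Hits are served from any way; the mask only restricts fills.
            if line is not None and line["tag"] == tag:
                line["last_use"] = self.clock
                self.hits[qos_id] += 1
                return True
        self.misses[qos_id] += 1
        # IDs without a configured mask may allocate into every way.
        mask = self.way_mask.get(qos_id, (1 << self.num_ways) - 1)
        allowed = [w for w in range(self.num_ways) if (mask >> w) & 1]
        # Fill an empty allowed way if there is one, else evict the LRU allowed way.
        victim = min(allowed, key=lambda w: -1 if ways[w] is None else ways[w]["last_use"])
        if ways[victim] is not None:
            self.occupancy[ways[victim]["owner"]] -= 1
        ways[victim] = {"tag": tag, "owner": qos_id, "last_use": self.clock}
        self.occupancy[qos_id] += 1
        return False


def run_scenario(partitioned):
    cache = QosCacheController(num_sets=4, num_ways=4)
    if partitioned:
        cache.set_allocation(1, 0b0011)  # ID 1: latency-sensitive VM, ways 0-1
        cache.set_allocation(2, 0b1100)  # ID 2: streaming batch job, ways 2-3
    vm, batch = Hart(qos_id=1), Hart(qos_id=2)
    for addr in range(8):                # VM working set: 8 lines
        cache.access(vm.request(addr))
    for addr in range(1000, 1064):       # batch job streams through 64 lines
        cache.access(batch.request(addr))
    for addr in range(8):                # VM touches its working set again
        cache.access(vm.request(addr))
    return cache


for partitioned in (True, False):
    c = run_scenario(partitioned)
    print(f"partitioned={partitioned}: VM hits={c.hits[1]}, VM misses={c.misses[1]}")
```

In the partitioned run, the VM's second pass hits on all 8 of its lines; without partitioning, the streaming job evicts them and the second pass misses every time. The per-ID counters are the kind of data the profiling step described earlier would read when choosing allocations.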
Please share your comments and feedback. If there is already WIP on this, please point me to it.