I understand your concern. In fact, the goal is not to provide more than one vstimecmp CSR.
You are right that when a VM is context-switched out, its vstimecmp CSR is switched out with it. However, there are periods when the VM, or rather a vCPU, is not executing but its CSR context is still 'bound' to a hart. In other words, the vs- CSRs (including vstimecmp) still hold the values for that vCPU while the hart is executing hypervisor code. For example, when the hypervisor is handling certain exceptions caused by the VM, such as a guest page fault, there is no need to switch the vCPU context out.
During this time, since vstimecmp still contains the value for that vCPU, it can still fire an interrupt. When the hart receives this interrupt, it should obviously be the hypervisor that handles it, hence the VGTI. Without VGTI, the interrupt is delayed until the hypervisor returns to the vCPU. This delay can be long because the hypervisor may not return to the vCPU immediately; for example, it may well decide that the vCPU needs to yield and schedule something else. Of course, the hypervisor can check the value in vstimecmp at strategic points and act accordingly, but that makes the software more complex because it can be tricky to decide where to insert those checks.
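To make the delay concrete, here is a minimal sketch in C. It is a toy model, not hypervisor code: the function names, the single trap window, and the `have_vgti` flag are all assumptions made for illustration. The only part taken from the Sstc definition is the pending condition (time + htimedelta >= vstimecmp); VGTI is modelled as described in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sstc rule: the VS-level timer interrupt is pending once the guest's view
 * of time (time + htimedelta, wrapping modulo 2^64) reaches vstimecmp. */
static bool vs_timer_pending(uint64_t time, uint64_t htimedelta,
                             uint64_t vstimecmp)
{
    return time + htimedelta >= vstimecmp;
}

/* Hypothetical model: the vCPU traps into the hypervisor at trap_entry and
 * would be resumed at guest_resume, with its vs- CSRs bound to the hart the
 * whole time. deadline is the host time at which vstimecmp fires.
 * Returns the host time at which some software first reacts to the timer. */
static uint64_t timer_handled_at(uint64_t deadline, uint64_t trap_entry,
                                 uint64_t guest_resume, bool have_vgti)
{
    bool fires_in_hypervisor =
        deadline >= trap_entry && deadline < guest_resume;

    if (!fires_in_hypervisor)
        return deadline;        /* guest is running: taken right away */

    /* With VGTI the hypervisor is interrupted at the deadline; without it,
     * nothing happens until the hypervisor returns to the vCPU. */
    return have_vgti ? deadline : guest_resume;
}
```

For example, with a trap window of [100, 900) and a deadline of 150, `timer_handled_at(150, 100, 900, false)` gives 900 while `timer_handled_at(150, 100, 900, true)` gives 150.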
So VGTI is purely for the prompt handling of the vstimecmp interrupt when the vCPU is still bound to a hart but not executing.
Lastly, when the hypervisor finally decides to switch a vCPU out, vstimecmp is saved and the hypervisor uses its own timer, programmed through stimecmp, to track the deadline set by that vCPU. In this way the single vstimecmp is multiplexed among vCPUs, so there is no need for more than one vstimecmp.
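The multiplexing can be sketched roughly as follows. This is an illustrative model only: the CSRs are plain struct fields rather than real CSR accesses, and `struct hart`, `struct vcpu`, and every function name are hypothetical. Writing all ones to vstimecmp to disarm it is a convention assumed here, and the conversion to host time ignores wrap-around.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the per-hart CSRs (assumption: no real CSR access here). */
struct hart {
    uint64_t vstimecmp;
    uint64_t htimedelta;
    uint64_t stimecmp;
};

/* Timer state kept in memory for a vCPU that is switched out. */
struct vcpu {
    uint64_t saved_vstimecmp;
    uint64_t htimedelta;
};

/* The guest sees time + htimedelta, so its deadline in host time is
 * vstimecmp - htimedelta. */
static uint64_t host_deadline(const struct vcpu *v)
{
    return v->saved_vstimecmp - v->htimedelta;
}

/* Switch out: save the guest's deadline and disarm the hardware comparator. */
static void vcpu_switch_out(struct hart *h, struct vcpu *v)
{
    v->saved_vstimecmp = h->vstimecmp;
    v->htimedelta = h->htimedelta;
    h->vstimecmp = UINT64_MAX;
}

/* Switch in: hand the single vstimecmp back to this vCPU. */
static void vcpu_switch_in(struct hart *h, const struct vcpu *v)
{
    h->htimedelta = v->htimedelta;
    h->vstimecmp = v->saved_vstimecmp;
}

/* The hypervisor's own timer covers its own deadline and those of all
 * switched-out vCPUs: stimecmp gets the earliest of them. */
static void program_stimecmp(struct hart *h, uint64_t own_deadline,
                             const struct vcpu *parked, size_t n)
{
    uint64_t earliest = own_deadline;

    for (size_t i = 0; i < n; i++) {
        uint64_t d = host_deadline(&parked[i]);
        if (d < earliest)
            earliest = d;
    }
    h->stimecmp = earliest;
}
```

When stimecmp later fires for a parked vCPU's deadline, the hypervisor would mark that vCPU's timer interrupt as pending and decide whether to schedule it; that part is left out of the sketch.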