| author | Marc Zyngier <marc.zyngier@arm.com> | 2013-10-08 18:38:13 +0100 | 
|---|---|---|
| committer | Christoffer Dall <christoffer.dall@linaro.org> | 2013-10-15 18:02:05 -0700 | 
| commit | 1f5580986a3667e9d67b65d916bb4249fd86a400 (patch) | |
| tree | b1f2090427f597bcf768ba7170e80bd4f0b50465 /drivers/gpu/drm/omapdrm/omap_gem.c | |
| parent | a7265fb1751ffbfad553afc7f592a6dac6be48de (diff) | |
ARM: KVM: Yield CPU when vcpu executes a WFE
On an (even slightly) oversubscribed system, spinlocks quickly
become a bottleneck: some vcpus spin, waiting for a lock to be
released, while the vcpu holding the lock may not be running
at all.
This creates contention, and the observed slowdown is 40x for
hackbench. No, this isn't a typo.
The solution is to trap blocking WFEs and tell KVM that we're
now spinning. This ensures that other vcpus will get a scheduling
boost, allowing the lock to be released more quickly. Also, using
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
when the VM is severely overcommitted.
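
The dispatch described above can be sketched as a small userspace model. This is not the patch itself: the `sketch_*` names, the `SKETCH_WFX_IS_WFE` bit position, and the counters are all hypothetical stand-ins chosen for illustration. In the kernel, the two outcomes would presumably map onto the generic KVM helpers `kvm_vcpu_on_spin()` (the vcpu is spinning, boost someone else) and `kvm_vcpu_block()` (the vcpu is idle until an event arrives).

```c
#include <stdint.h>

/* Hypothetical stand-ins; the kernel's real types and helpers differ. */
#define SKETCH_WFX_IS_WFE (1U << 0) /* assumed syndrome bit: set = WFE, clear = WFI */

enum sketch_action {
    SKETCH_BLOCK, /* WFI: put the vcpu to sleep until an interrupt arrives */
    SKETCH_YIELD  /* WFE: the vcpu is spinning, give the CPU to another vcpu */
};

struct sketch_vcpu {
    uint32_t syndrome; /* trap syndrome recorded when the guest exited */
    unsigned spins;    /* times this vcpu was reported as spinning */
    unsigned blocks;   /* times this vcpu was blocked */
};

/* Decide what to do with a vcpu that trapped on WFI or WFE. */
static enum sketch_action sketch_handle_wfx(struct sketch_vcpu *vcpu)
{
    if (vcpu->syndrome & SKETCH_WFX_IS_WFE) {
        /* A trapped WFE most likely means the guest is waiting on a lock. */
        vcpu->spins++;
        return SKETCH_YIELD;
    }

    /* A WFI means the guest has nothing to do until the next interrupt. */
    vcpu->blocks++;
    return SKETCH_BLOCK;
}
```

The point of keeping the two cases apart is that a WFE waiter expects to resume shortly, once the lock holder has run, so yielding is a better fit than blocking.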
Quick test to estimate the performance: hackbench 1 process 1000
2xA15 host (baseline):	1.843s
2xA15 guest w/o patch:	2.083s
4xA15 guest w/o patch:	80.212s
8xA15 guest w/o patch:	Could not be bothered to find out
2xA15 guest w/ patch:	2.102s
4xA15 guest w/ patch:	3.205s
8xA15 guest w/ patch:	6.887s
So we go from a 40x degradation to 1.5x in the 2x overcommit case,
which is vaguely more acceptable.
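
The ratios quoted above can be rechecked from the table. This sketch assumes the reference point is the non-overcommitted 2xA15 guest for the matching kernel (w/o patch or w/ patch); the helper name `slowdown` is hypothetical.

```c
/* Slowdown of one run relative to a reference run (both in seconds). */
static double slowdown(double seconds, double reference_seconds)
{
    return seconds / reference_seconds;
}

/* 4xA15 guest vs 2xA15 guest, without the patch: about 38.5x. */
static double slowdown_without_patch(void)
{
    return slowdown(80.212, 2.083);
}

/* 4xA15 guest vs 2xA15 guest, with the patch: about 1.5x. */
static double slowdown_with_patch(void)
{
    return slowdown(3.205, 2.102);
}
```

Under that assumption, the "40x" figure is a rounding of roughly 38.5x, and the patched 2x overcommit case comes out at roughly 1.5x.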
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
