From aae760ed21cd690fe8a6db9f3a177ad55d7e12ab Mon Sep 17 00:00:00 2001 From: "Srivatsa S. Bhat" Date: Fri, 12 Jul 2013 03:45:37 +0530 Subject: cpufreq: Revert commit a66b2e to fix suspend/resume regression commit a66b2e (cpufreq: Preserve sysfs files across suspend/resume) has unfortunately caused several things in the cpufreq subsystem to break subtly after a suspend/resume cycle. The intention of that patch was to retain the file permissions of the cpufreq related sysfs files across suspend/resume. To achieve that, the commit completely removed the calls to cpufreq_add_dev() and __cpufreq_remove_dev() during suspend/resume transitions. But the problem is that those functions do 2 kinds of things: 1. Low-level initialization/tear-down that are critical to the correct functioning of cpufreq-core. 2. Kobject and sysfs related initialization/teardown. Ideally we should have reorganized the code to cleanly separate these two responsibilities, and skipped only the sysfs related parts during suspend/resume. Since we skipped the entire callbacks instead (which also included some CPU and cpufreq-specific critical components), cpufreq subsystem started behaving erratically after suspend/resume. So revert the commit to fix the regression. We'll revisit and address the original goal of that commit separately, since it involves quite a bit of careful code reorganization and appears to be non-trivial. (While reverting the commit, note that another commit f51e1eb (cpufreq: Fix cpufreq regression after suspend/resume) already reverted part of the original set of changes. So revert only the remaining ones). Signed-off-by: Srivatsa S. Bhat Acked-by: Viresh Kumar Tested-by: Paul Bolle Cc: 3.10+ Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/cpufreq.c | 4 +++- drivers/cpufreq/cpufreq_stats.c | 6 ++---- 2 files changed, 5 insertions(+), 5 deletions(-) (limited to 'drivers/cpufreq') diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 0937b8d6c2a..7dcfa685483 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1942,13 +1942,15 @@ static int __cpuinit cpufreq_cpu_callback(struct notifier_block *nfb, if (dev) { switch (action) { case CPU_ONLINE: + case CPU_ONLINE_FROZEN: cpufreq_add_dev(dev, NULL); break; case CPU_DOWN_PREPARE: - case CPU_UP_CANCELED_FROZEN: + case CPU_DOWN_PREPARE_FROZEN: __cpufreq_remove_dev(dev, NULL); break; case CPU_DOWN_FAILED: + case CPU_DOWN_FAILED_FROZEN: cpufreq_add_dev(dev, NULL); break; } diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index cd9e81713a7..12225d19ffc 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -353,13 +353,11 @@ static int __cpuinit cpufreq_stat_cpu_callback(struct notifier_block *nfb, cpufreq_update_policy(cpu); break; case CPU_DOWN_PREPARE: + case CPU_DOWN_PREPARE_FROZEN: cpufreq_stats_free_sysfs(cpu); break; case CPU_DEAD: - cpufreq_stats_free_table(cpu); - break; - case CPU_UP_CANCELED_FROZEN: - cpufreq_stats_free_sysfs(cpu); + case CPU_DEAD_FROZEN: cpufreq_stats_free_table(cpu); break; } -- cgit v1.2.3-18-g5258 From 4a6c41083df3bf4ad828c45e5f8ee1a224d537c6 Mon Sep 17 00:00:00 2001 From: Paul Bolle Date: Sun, 14 Jul 2013 20:29:56 +0200 Subject: cpufreq: s3c24xx: rename CONFIG_CPU_FREQ_S3C24XX_DEBUGFS The Kconfig symbol CPU_FREQ_S3C24XX_DEBUGFS was renamed to ARM_S3C24XX_CPUFREQ_DEBUGFS in commit f023f8dd59 ("cpufreq: s3c24xx: move cpufreq driver to drivers/cpufreq"). But that commit missed one instance of its macro CONFIG_CPU_FREQ_S3C24XX_DEBUGFS. Rename it too. Signed-off-by: Paul Bolle Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/s3c24xx-cpufreq.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'drivers/cpufreq') diff --git a/drivers/cpufreq/s3c24xx-cpufreq.c b/drivers/cpufreq/s3c24xx-cpufreq.c index 3513e747716..87781eb20d6 100644 --- a/drivers/cpufreq/s3c24xx-cpufreq.c +++ b/drivers/cpufreq/s3c24xx-cpufreq.c @@ -49,7 +49,7 @@ static struct clk *clk_hclk; static struct clk *clk_pclk; static struct clk *clk_arm; -#ifdef CONFIG_CPU_FREQ_S3C24XX_DEBUGFS +#ifdef CONFIG_ARM_S3C24XX_CPUFREQ_DEBUGFS struct s3c_cpufreq_config *s3c_cpufreq_getconfig(void) { return &cpu_cur; @@ -59,7 +59,7 @@ struct s3c_iotimings *s3c_cpufreq_getiotimings(void) { return &s3c24xx_iotiming; } -#endif /* CONFIG_CPU_FREQ_S3C24XX_DEBUGFS */ +#endif /* CONFIG_ARM_S3C24XX_CPUFREQ_DEBUGFS */ static void s3c_cpufreq_getcur(struct s3c_cpufreq_config *cfg) { -- cgit v1.2.3-18-g5258 From e8d05276f236ee6435e78411f62be9714e0b9377 Mon Sep 17 00:00:00 2001 From: "Srivatsa S. Bhat" Date: Tue, 16 Jul 2013 22:46:48 +0200 Subject: cpufreq: Revert commit 2f7021a8 to fix CPU hotplug regression commit 2f7021a8 "cpufreq: protect 'policy->cpus' from offlining during __gov_queue_work()" caused a regression in CPU hotplug, because it lead to a deadlock between cpufreq governor worker thread and the CPU hotplug writer task. Lockdep splat corresponding to this deadlock is shown below: [ 60.277396] ====================================================== [ 60.277400] [ INFO: possible circular locking dependency detected ] [ 60.277407] 3.10.0-rc7-dbg-01385-g241fd04-dirty #1744 Not tainted [ 60.277411] ------------------------------------------------------- [ 60.277417] bash/2225 is trying to acquire lock: [ 60.277422] ((&(&j_cdbs->work)->work)){+.+...}, at: [] flush_work+0x5/0x280 [ 60.277444] but task is already holding lock: [ 60.277449] (cpu_hotplug.lock){+.+.+.}, at: [] cpu_hotplug_begin+0x2b/0x60 [ 60.277465] which lock already depends on the new lock. [ 60.277472] the existing dependency chain (in reverse order) is: [ 60.277477] -> #2 (cpu_hotplug.lock){+.+.+.}: [ 60.277490] [] lock_acquire+0xa4/0x200 [ 60.277503] [] mutex_lock_nested+0x67/0x410 [ 60.277514] [] get_online_cpus+0x3c/0x60 [ 60.277522] [] gov_queue_work+0x2a/0xb0 [ 60.277532] [] cs_dbs_timer+0xc1/0xe0 [ 60.277543] [] process_one_work+0x1cd/0x6a0 [ 60.277552] [] worker_thread+0x121/0x3a0 [ 60.277560] [] kthread+0xdb/0xe0 [ 60.277569] [] ret_from_fork+0x7c/0xb0 [ 60.277580] -> #1 (&j_cdbs->timer_mutex){+.+...}: [ 60.277592] [] lock_acquire+0xa4/0x200 [ 60.277600] [] mutex_lock_nested+0x67/0x410 [ 60.277608] [] cs_dbs_timer+0x8d/0xe0 [ 60.277616] [] process_one_work+0x1cd/0x6a0 [ 60.277624] [] worker_thread+0x121/0x3a0 [ 60.277633] [] kthread+0xdb/0xe0 [ 60.277640] [] ret_from_fork+0x7c/0xb0 [ 60.277649] -> #0 ((&(&j_cdbs->work)->work)){+.+...}: [ 60.277661] [] __lock_acquire+0x1766/0x1d30 [ 60.277669] [] lock_acquire+0xa4/0x200 [ 60.277677] [] flush_work+0x3d/0x280 [ 60.277685] [] __cancel_work_timer+0x8a/0x120 [ 60.277693] [] cancel_delayed_work_sync+0x13/0x20 [ 60.277701] [] cpufreq_governor_dbs+0x529/0x6f0 [ 60.277709] [] cs_cpufreq_governor_dbs+0x17/0x20 [ 60.277719] [] __cpufreq_governor+0x48/0x100 [ 60.277728] [] __cpufreq_remove_dev.isra.14+0x80/0x3c0 [ 60.277737] [] cpufreq_cpu_callback+0x38/0x4c [ 60.277747] [] notifier_call_chain+0x5d/0x110 [ 60.277759] [] __raw_notifier_call_chain+0xe/0x10 [ 60.277768] [] _cpu_down+0x88/0x330 [ 60.277779] [] cpu_down+0x36/0x50 [ 60.277788] [] store_online+0x98/0xd0 [ 60.277796] [] dev_attr_store+0x18/0x30 [ 60.277806] [] sysfs_write_file+0xdb/0x150 [ 60.277818] [] vfs_write+0xbd/0x1f0 [ 60.277826] [] SyS_write+0x4c/0xa0 [ 60.277834] [] tracesys+0xd0/0xd5 [ 60.277842] other info that might help us debug this: [ 60.277848] Chain exists of: (&(&j_cdbs->work)->work) --> &j_cdbs->timer_mutex --> cpu_hotplug.lock [ 60.277864] Possible unsafe locking scenario: [ 60.277869] CPU0 CPU1 [ 60.277873] ---- ---- [ 60.277877] lock(cpu_hotplug.lock); [ 60.277885] lock(&j_cdbs->timer_mutex); [ 60.277892] lock(cpu_hotplug.lock); [ 60.277900] lock((&(&j_cdbs->work)->work)); [ 60.277907] *** DEADLOCK *** [ 60.277915] 6 locks held by bash/2225: [ 60.277919] #0: (sb_writers#6){.+.+.+}, at: [] vfs_write+0x1c3/0x1f0 [ 60.277937] #1: (&buffer->mutex){+.+.+.}, at: [] sysfs_write_file+0x3c/0x150 [ 60.277954] #2: (s_active#61){.+.+.+}, at: [] sysfs_write_file+0xc3/0x150 [ 60.277972] #3: (x86_cpu_hotplug_driver_mutex){+.+...}, at: [] cpu_hotplug_driver_lock+0x17/0x20 [ 60.277990] #4: (cpu_add_remove_lock){+.+.+.}, at: [] cpu_down+0x22/0x50 [ 60.278007] #5: (cpu_hotplug.lock){+.+.+.}, at: [] cpu_hotplug_begin+0x2b/0x60 [ 60.278023] stack backtrace: [ 60.278031] CPU: 3 PID: 2225 Comm: bash Not tainted 3.10.0-rc7-dbg-01385-g241fd04-dirty #1744 [ 60.278037] Hardware name: Acer Aspire 5741G /Aspire 5741G , BIOS V1.20 02/08/2011 [ 60.278042] ffffffff8204e110 ffff88014df6b9f8 ffffffff815b3d90 ffff88014df6ba38 [ 60.278055] ffffffff815b0a8d ffff880150ed3f60 ffff880150ed4770 3871c4002c8980b2 [ 60.278068] ffff880150ed4748 ffff880150ed4770 ffff880150ed3f60 ffff88014df6bb00 [ 60.278081] Call Trace: [ 60.278091] [] dump_stack+0x19/0x1b [ 60.278101] [] print_circular_bug+0x2b6/0x2c5 [ 60.278111] [] __lock_acquire+0x1766/0x1d30 [ 60.278123] [] ? __kernel_text_address+0x58/0x80 [ 60.278134] [] lock_acquire+0xa4/0x200 [ 60.278142] [] ? flush_work+0x5/0x280 [ 60.278151] [] flush_work+0x3d/0x280 [ 60.278159] [] ? flush_work+0x5/0x280 [ 60.278169] [] ? mark_held_locks+0x94/0x140 [ 60.278178] [] ? __cancel_work_timer+0x77/0x120 [ 60.278188] [] ? trace_hardirqs_on_caller+0xfd/0x1c0 [ 60.278196] [] __cancel_work_timer+0x8a/0x120 [ 60.278206] [] cancel_delayed_work_sync+0x13/0x20 [ 60.278214] [] cpufreq_governor_dbs+0x529/0x6f0 [ 60.278225] [] cs_cpufreq_governor_dbs+0x17/0x20 [ 60.278234] [] __cpufreq_governor+0x48/0x100 [ 60.278244] [] __cpufreq_remove_dev.isra.14+0x80/0x3c0 [ 60.278255] [] cpufreq_cpu_callback+0x38/0x4c [ 60.278265] [] notifier_call_chain+0x5d/0x110 [ 60.278275] [] __raw_notifier_call_chain+0xe/0x10 [ 60.278284] [] _cpu_down+0x88/0x330 [ 60.278292] [] ? cpu_hotplug_driver_lock+0x17/0x20 [ 60.278302] [] cpu_down+0x36/0x50 [ 60.278311] [] store_online+0x98/0xd0 [ 60.278320] [] dev_attr_store+0x18/0x30 [ 60.278329] [] sysfs_write_file+0xdb/0x150 [ 60.278337] [] vfs_write+0xbd/0x1f0 [ 60.278347] [] ? fget_light+0x320/0x4b0 [ 60.278355] [] SyS_write+0x4c/0xa0 [ 60.278364] [] tracesys+0xd0/0xd5 [ 60.280582] smpboot: CPU 1 is now offline The intention of that commit was to avoid warnings during CPU hotplug, which indicated that offline CPUs were getting IPIs from the cpufreq governor's work items. But the real root-cause of that problem was commit a66b2e5 (cpufreq: Preserve sysfs files across suspend/resume) because it totally skipped all the cpufreq callbacks during CPU hotplug in the suspend/resume path, and hence it never actually shut down the cpufreq governor's worker threads during CPU offline in the suspend/resume path. Reflecting back, the reason why we never suspected that commit as the root-cause earlier, was that the original issue was reported with just the halt command and nobody had brought in suspend/resume to the equation. The reason for _that_ in turn, as it turns out, is that earlier halt/shutdown was being done by disabling non-boot CPUs while tasks were frozen, just like suspend/resume.... but commit cf7df378a (reboot: migrate shutdown/reboot to boot cpu) which came somewhere along that very same time changed that logic: shutdown/halt no longer takes CPUs offline. Thus, the test-cases for reproducing the bug were vastly different and thus we went totally off the trail. Overall, it was one hell of a confusion with so many commits affecting each other and also affecting the symptoms of the problems in subtle ways. Finally, now since the original problematic commit (a66b2e5) has been completely reverted, revert this intermediate fix too (2f7021a8), to fix the CPU hotplug deadlock. Phew! Reported-by: Sergey Senozhatsky Reported-by: Bartlomiej Zolnierkiewicz Signed-off-by: Srivatsa S. Bhat Tested-by: Peter Wu Cc: 3.10+ Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/cpufreq_governor.c | 3 --- 1 file changed, 3 deletions(-) (limited to 'drivers/cpufreq') diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 46458769756..7b839a8db2a 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -25,7 +25,6 @@ #include #include #include -#include #include "cpufreq_governor.h" @@ -137,10 +136,8 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, if (!all_cpus) { __gov_queue_work(smp_processor_id(), dbs_data, delay); } else { - get_online_cpus(); for_each_cpu(i, policy->cpus) __gov_queue_work(i, dbs_data, delay); - put_online_cpus(); } } EXPORT_SYMBOL_GPL(gov_queue_work); -- cgit v1.2.3-18-g5258