path: root/kernel
Age  Commit message  Author
2009-08-02  sched: Fully integrate cpus_active_map and root-domain code  (Gregory Haskins)
Reflect "active" cpus in the rq->rd->online field, instead of the online_map. The motivation is that things that use the root-domain code (such as cpupri) only care about cpus classified as "active" anyway. By synchronizing the root-domain state with the active map, we allow several optimizations. For instance, we can remove an extra cpumask_and from the scheduler hotpath by utilizing rq->rd->online (since it is now a cached version of cpu_active_map & rq->rd->span). Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Max Krasnyansky <maxk@qualcomm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090730145723.25226.24493.stgit@dev.haskins.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02  sched: Enhance the pre/post scheduling logic  (Gregory Haskins)
We currently have an explicit "needs_post" vtable method which returns a stack variable for whether we should later run post-schedule. This leads to an awkward exchange of the variable as it bubbles back up out of the context switch. Peter Zijlstra observed that this information could be stored in the run-queue itself instead of handled on the stack. Therefore, we revert to the method of having context_switch return void, and update an internal rq->post_schedule variable when we require further processing. In addition, we fix a race condition where we try to access current->sched_class without holding the rq->lock. This is technically racy, as the sched-class could change out from under us. Instead, we reference the per-rq post_schedule variable with the runqueue unlocked, but with preemption disabled to see if we need to reacquire the rq->lock. Finally, we clean the code up slightly by removing the #ifdef CONFIG_SMP conditionals from the schedule() call, and implement some inline helper functions instead. This patch passes checkpatch, and rt-migrate. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090729150422.17691.55590.stgit@dev.haskins.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
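The description above maps to a small helper; a hedged sketch under the patch's own naming (rq->post_schedule, sched_class->post_schedule), not the literal diff:

    /*
     * Sketch: the need for post-schedule work is recorded in the
     * runqueue, so nothing is passed back up through context_switch().
     * The flag is read with preemption disabled but without rq->lock;
     * the lock is only retaken when there is actual work to do.
     */
    static inline void post_schedule(struct rq *rq)
    {
        if (rq->post_schedule) {
            unsigned long flags;

            spin_lock_irqsave(&rq->lock, flags);
            if (rq->curr->sched_class->post_schedule)
                rq->curr->sched_class->post_schedule(rq);
            spin_unlock_irqrestore(&rq->lock, flags);

            rq->post_schedule = 0;
        }
    }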
2009-08-02  sched: Add new prio to cpupri before removing old prio  (Steven Rostedt)
We need to add the new prio to the cpupri accounting before removing the old prio. This is because removing the old prio first will open a race window where the cpu will be removed from pri_active. In this case the cpu will not be visible for RT pushes and pulls. This could cause an RT task to not migrate appropriately, and create a very large latency. This bug was found with the use of ftrace sched events and trace_printk. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090729042526.438281019@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
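For illustration, a hedged sketch of the reordering (the vector helpers here are hypothetical names, not kernel APIs):

    /*
     * Publish the cpu under its new priority before retiring the old
     * one; a reader scanning pri_active then never sees the cpu
     * missing from both vectors.
     */
    static void cpupri_reorder(struct cpupri *cp, int cpu,
                               int oldpri, int newpri)
    {
        add_to_vec(cp, cpu, newpri);      /* hypothetical helper */
        remove_from_vec(cp, cpu, oldpri); /* hypothetical helper */
    }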
2009-08-02  sched: Check for pushing rt tasks after all scheduling  (Steven Rostedt)
The current method for pushing RT tasks after scheduling only happens after a context switch. But we found cases where a task is set up on a run queue to be pushed but the push never happens because the scheduler chooses the same task. This bug was found with the help of Gregory Haskins and the use of ftrace (trace_printk). It took several days for both of us, analyzing the code and the trace output, to find this. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090729042526.205923666@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02  sched: Optimize unused cgroup configuration  (Peter Zijlstra)
When cgroup group scheduling is built in, skip some code paths if we don't have any (but the root) cgroups configured. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02  sched: Fix cgroup smp fairness  (Peter Zijlstra)
Commit ec4e0e2fe018992d980910db901637c814575914 ("fix inconsistency when redistribute per-cpu tg->cfs_rq shares") broke cgroup smp fairness. In order to avoid starvation of newly placed tasks, we never quite set the share of an empty cpu group-task to 0, but instead we set it as if there's a single NICE-0 task present. If however we actually set this in cfs_rq[cpu]->shares, that means the total shares for that group will be slightly inflated every time we balance, causing the observed unfairness. Fix this by setting cfs_rq[cpu]->shares to 0 but actually setting the effective weight of the related se to the inflated number. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1248696557.6987.1615.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02  Merge branch 'sched/urgent' into sched/core  (Ingo Molnar)
Merge reason: avoid upcoming patch conflict. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02  sched: Fix race in cpupri introduced by cpumask_var changes  (Gregory Haskins)
Background: Several race conditions in the scheduler have cropped up recently, which Steven and I have tracked down using ftrace. The most recent one turns out to be a race in how the scheduler determines a suitable migration target for RT tasks, introduced recently with:

    commit 68e74568fbe5854952355e942acca51f138096d9
    Date: Tue Nov 25 02:35:13 2008 +1030
    sched: convert struct cpupri_vec cpumask_var_t.

The original design of cpupri allowed lockless readers to quickly determine a best-estimate target. Races between the pri_active bitmap and the vec->mask were handled in the original code because we would detect and return "0" when this occurred. The design was predicated on the *effective* atomicity (*) of caching the result of cpus_and() between the cpus_allowed and the vec->mask. Commit 68e74568 changed the behavior such that vec->mask is accessed multiple times. This introduces a subtle race; the result is that we can get a return of "1", but with an empty bitmap. *) yes, we know cpus_and() is not a locked operator across the entire composite array, but it is implicitly atomic on a per-word basis, which is all the design required to work. Implementation: Rather than forgoing the lockless design, or reverting to a stack-based cpumask_t, we simply check for when the race has been encountered and continue processing in that event. This renders the removal race as if the priority bit had been atomically cleared as well, and allows the algorithm to execute correctly. Signed-off-by: Gregory Haskins <ghaskins@novell.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090730145728.25226.92769.stgit@dev.haskins.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
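A hedged sketch of the check described, as it would sit in the cpupri_find() scan loop (context abbreviated):

    /*
     * After the lockless pri_active hit, look at the vector mask
     * again.  If the removal race emptied our intersection, treat
     * this priority level as a miss and keep scanning instead of
     * returning "1" with an empty bitmap.
     */
    if (cpumask_any_and(&p->cpus_allowed, vec->mask) >= nr_cpu_ids)
        continue;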
2009-08-02  sched: Fix latencytop and sleep profiling vs group scheduling  (Peter Zijlstra)
The latencytop and sleep accounting code assumes that any scheduler entity represents a task; this is not so. Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-30  Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds)
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing/stat: Fix seqfile memory leak
  function-graph: Fix seqfile memory leak
  trace_stack: Fix seqfile memory leak
  profile: Suppress warning about large allocations when profile=1 is specified
2009-07-30  kprobes: Use kernel_text_address() for checking probe address  (Masami Hiramatsu)
Use kernel_text_address() for checking the probe address instead of __kernel_text_address(), because __kernel_text_address() returns true for init functions even after releasing those functions. That will hit a BUG() in text_poke(). Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-29  profile: suppress warning about large allocations when profile=1 is specified  (Mel Gorman)
When profile= is used, a large buffer is allocated early at boot. This can be larger than what the page allocator can provide so it prints a warning. However, the caller is able to handle the situation so this patch suppresses the warning. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-29  cgroup avoid permanent sleep at rmdir  (KAMEZAWA Hiroyuki)
After commit ec64f51545fffbc4cb968f0cea56341a4b07e85a ("cgroup: fix frequent -EBUSY at rmdir"), cgroup's rmdir (especially against memcg) doesn't return -EBUSY on temporary ref counts. That commit expects all refs remaining after pre_destroy() to be temporary, but they weren't. As a result, rmdir can wait permanently. This patch tries to fix that with the following changes:

  - Set the CGRP_WAIT_ON_RMDIR flag before pre_destroy().
  - Clear the CGRP_WAIT_ON_RMDIR flag when the subsys finds a racy case; if there are sleeping tasks, wake them up.
  - rmdir() sleeps only while the CGRP_WAIT_ON_RMDIR flag is set.

Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Reported-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Reviewed-by: Paul Menage <menage@google.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-29  cgroups: fix pid namespace bug  (Li Zefan)
The bug was introduced by commit cc31edceee04a7b87f2be48f9489ebb72d264844 ("cgroups: convert tasks file to use a seq_file with shared pid array"). We cache a pid array for all threads that are opening the same "tasks" file, but the pids in the array are always from the namespace of the last process that opened the file, so all other threads will read pids from that namespace instead of their own namespaces. To fix it, we maintain a list of pid arrays, which is keyed by pid_ns. The list will be of length 1 most of the time. Reported-by: Paul Menage <menage@google.com> Idea-by: Paul Menage <menage@google.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Reviewed-by: Serge Hallyn <serue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-29  kexec: fix omitting offset in extended crashkernel syntax  (Hidetoshi Seto)
Setting "crashkernel=512M-2G:64M,2G-:128M" does not work, but it starts working if a trailing whitespace is appended, as in "crashkernel=512M-2G:64M,2G-:128M ". This was caused by a bug in the parser, which ran over the end of the cmdline. This patch adds a check for the string termination. Reported-by: Jin Dongming <jin.dongming@np.css.fujitsu.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Tested-by: Jin Dongming <jin.dongming@np.css.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-29  mm: copy over oom_adj value at fork time  (Rik van Riel)
Fix a post-2.6.31 regression which was introduced by 2ff05b2b4eac2e63d345fc731ea151a060247f53 ("oom: move oom_adj value from task_struct to mm_struct"). After moving the oom_adj value from the task struct to the mm_struct, the oom_adj value was no longer properly inherited by child processes. Copying over the oom_adj value at fork time fixes that bug. [kosaki.motohiro@jp.fujitsu.com: test for current->mm before dereferencing it] Signed-off-by: Rik van Riel <riel@redhat.com> Reported-by: Paul Menage <manage@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
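A hedged sketch of the described fix in the fork path (placement simplified; the current->mm test is the KOSAKI Motohiro guard for kernel threads, which have no mm to inherit from):

    /* inherit the parent's oom_adj when duplicating the mm */
    if (current->mm)
        mm->oom_adj = current->mm->oom_adj;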
2009-07-27  update the comment in kthread_stop()  (Oleg Nesterov)
Commit 63706172f332fd3f6e7458ebfb35fa6de9c21dc5 ("kthreads: rework kthread_stop()") removed the limitation that the thread function must not call do_exit() itself, but forgot to update the comment. Since that commit it is OK to use kthread_stop() even if the kthread can exit by itself. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-27  module: use MODULE_SYMBOL_PREFIX with module_layout  (Mike Frysinger)
The check_modstruct_version() needs to look up the symbol "module_layout" in the kernel, but it does so literally and not by a C identifier. The trouble is that it does not include a symbol prefix for those ports that need it (like the Blackfin and H8300 port). So make sure we tack on the MODULE_SYMBOL_PREFIX define to the front of it. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
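Since MODULE_SYMBOL_PREFIX is a string literal ("" on most architectures, "_" on ports like Blackfin and H8300), compile-time concatenation is all that's needed; a minimal sketch:

    /* look up the prefixed name, not the bare C identifier */
    const char *layout_sym = MODULE_SYMBOL_PREFIX "module_layout";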
2009-07-24  sched: Fix return value of migration_init()  (Thomas Gleixner)
migration_init() returns the return value of the hotplug notifier. In the success case this is NOTIFY_OK, which is 1. initcall_debug evaluates that as an error code, because init calls are expected to return 0 on success. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-07-23  tracing/stat: Fix seqfile memory leak  (Li Zefan)
Every time we cat a trace_stat file, we leak memory allocated by seq_open(). Also fix memory leak in a failure path in tracing_stat_open(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A67D92B.4060704@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-07-23  function-graph: Fix seqfile memory leak  (Li Zefan)
Every time we cat set_graph_function, we leak memory allocated by seq_open(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A67D907.2010500@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-07-23  trace_stack: Fix seqfile memory leak  (Li Zefan)
Every time we cat stack_trace, we leak memory allocated by seq_open(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A67D8E8.3020500@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
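The three seqfile fixes above share one shape; a hedged sketch (the ops table and the extra setup step are illustrative, not the exact diffs):

    /*
     * A seq_file allocated by seq_open() must be freed by
     * seq_release(): both in the normal .release path and in any
     * error path taken after seq_open() has already succeeded.
     */
    static int example_open(struct inode *inode, struct file *file)
    {
        int ret = seq_open(file, &example_seq_ops);

        if (!ret && example_setup(file) < 0) {  /* hypothetical step */
            seq_release(inode, file);           /* don't leak */
            ret = -ENOMEM;
        }
        return ret;
    }

    static const struct file_operations example_fops = {
        .open    = example_open,
        .read    = seq_read,
        .llseek  = seq_lseek,
        .release = seq_release,  /* the piece the leaky files lacked */
    };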
2009-07-22  genirq: Fix UP compile failure caused by irq_thread_check_affinity  (Bruno Premont)
Since commit 591d2fb02ea80472d846c0b8507007806bdd69cc ("genirq: Delegate irq affinity setting to the irq thread"), compilation with CONFIG_SMP=n fails with the following error:

    /usr/src/linux-2.6/kernel/irq/manage.c: In function 'irq_thread_check_affinity':
    /usr/src/linux-2.6/kernel/irq/manage.c:475: error: 'struct irq_desc' has no member named 'affinity'
    make[4]: *** [kernel/irq/manage.o] Error 1

That commit adds a new function irq_thread_check_affinity() which uses struct irq_desc.affinity, a member that is only available for CONFIG_SMP=y. Move that function under #ifdef CONFIG_SMP. [ tglx@brownpaperbag: compile and boot tested on UP and SMP ] Signed-off-by: Bruno Premont <bonbons@linux-vserver.org> LKML-Reference: <20090722222232.2eb3e1c4@neptune.home> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
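The fix shape, sketched (signatures abbreviated): keep the SMP-only body under the #ifdef and give UP builds an empty inline stub so callers need no conditionals of their own:

    #ifdef CONFIG_SMP
    static void irq_thread_check_affinity(struct irq_desc *desc,
                                          struct irqaction *action)
    {
        /* touches desc->affinity, which only exists on SMP */
    }
    #else
    static inline void irq_thread_check_affinity(struct irq_desc *desc,
                                                 struct irqaction *action) { }
    #endif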
2009-07-22  Merge branch 'perf-counters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf  (Linus Torvalds)
* 'perf-counters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf: (31 commits)
  perf_counter tools: Give perf top inherit option
  perf_counter tools: Fix vmlinux symbol generation breakage
  perf_counter: Detect debugfs location
  perf_counter: Add tracepoint support to perf list, perf stat
  perf symbol: C++ demangling
  perf: avoid structure size confusion by using a fixed size
  perf_counter: Fix throttle/unthrottle event logging
  perf_counter: Improve perf stat and perf record option parsing
  perf_counter: PERF_SAMPLE_ID and inherited counters
  perf_counter: Plug more stack leaks
  perf: Fix stack data leak
  perf_counter: Remove unused variables
  perf_counter: Make call graph option consistent
  perf_counter: Add perf record option to log addresses
  perf_counter: Log vfork as a fork event
  perf_counter: Synthesize VDSO mmap event
  perf_counter: Make sure we dont leak kernel memory to userspace
  perf_counter tools: Fix index boundary check
  perf_counter: Fix the tracepoint channel to perfcounters
  perf_counter, x86: Extend perf_counter Pentium M support
  ...
2009-07-22  Merge branch 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds)
* 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  softirq: introduce tasklet_hrtimer infrastructure
2009-07-22  Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds)
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  clocksource: Prevent NULL pointer dereference
  timer: Avoid reading uninitialized data
2009-07-22  Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds)
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq: Delegate irq affinity setting to the irq thread
2009-07-22  Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds)
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: fix nr_uninterruptible accounting of frozen tasks really
  sched: fix load average accounting vs. cpu hotplug
  sched: Account for vruntime wrapping
2009-07-22  perf: fix stack data leak  (Arjan van de Ven)
The "reserved" field was not initialized to zero, resulting in 4 bytes of stack data leaking to userspace. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-22  perf_counter: Fix throttle/unthrottle event logging  (Anton Blanchard)
Right now we only print PERF_EVENT_THROTTLE + 1 (i.e. PERF_EVENT_UNTHROTTLE). Fix this to print both throttle and unthrottle events. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090722130546.GE9029@kryten>
2009-07-22  perf_counter: PERF_SAMPLE_ID and inherited counters  (Peter Zijlstra)
Anton noted that for inherited counters the counter-id as provided by PERF_SAMPLE_ID isn't mappable to the id found through PERF_RECORD_ID because each inherited counter gets its own id. His suggestion was to always return the parent counter id, since that is the primary counter id as exposed. However, these inherited counters have a unique identifier so that events like PERF_EVENT_PERIOD and PERF_EVENT_THROTTLE can be specific about which counter gets modified, which is important when trying to normalize the sample streams. This patch removes PERF_EVENT_PERIOD in favour of PERF_SAMPLE_PERIOD, which is more useful anyway, since changing periods became a lot more common than initially thought -- rendering PERF_EVENT_PERIOD the less useful solution (also, PERF_SAMPLE_PERIOD reports the more accurate value, since it reports the value used to trigger the overflow, whereas PERF_EVENT_PERIOD simply reports the requested period changed, which might only take effect on the next cycle). This still leaves us PERF_EVENT_THROTTLE to consider, but since that _should_ be a rare occurrence, and linking it to a primary id is the most useful bit to diagnose the problem, we introduce a PERF_SAMPLE_STREAM_ID, for those few cases where the full reconstruction is important. [Does change the ABI a little, but I see no other way out] Suggested-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1248095846.15751.8781.camel@twins>
2009-07-22  perf_counter: Plug more stack leaks  (Peter Zijlstra)
Following the example of Arjan's patch, I went through and found a few more. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2009-07-22  perf: Fix stack data leak  (Arjan van de Ven)
The "reserved" field was not initialized to zero, resulting in 4 bytes of stack data leaking to userspace. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2009-07-22  Merge commit 'tip/perfcounters/core' into perf-counters-for-linus  (Peter Zijlstra)
2009-07-22  softirq: introduce tasklet_hrtimer infrastructure  (Peter Zijlstra)
commit ca109491f (hrtimer: removing all ur callback modes) moved all hrtimer callbacks into hard interrupt context when high resolution timers are active. That breaks code which relied on the assumption that the callback happens in softirq context. Provide a generic infrastructure which combines tasklets and hrtimers together to provide an in-softirq hrtimer experience. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: torvalds@linux-foundation.org Cc: kaber@trash.net Cc: David Miller <davem@davemloft.net> LKML-Reference: <1248265724.27058.1366.camel@twins> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
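A usage sketch of the new API (callback and period invented for the example); the hrtimer still fires in hard irq context, but the user's function runs from the backing tasklet, i.e. in softirq context as code before ca109491f expected:

    static struct tasklet_hrtimer my_timer;

    static enum hrtimer_restart my_work(struct hrtimer *t)
    {
        /* runs in softirq context via the tasklet */
        return HRTIMER_NORESTART;
    }

    /* in init code: */
    tasklet_hrtimer_init(&my_timer, my_work,
                         CLOCK_MONOTONIC, HRTIMER_MODE_REL);
    tasklet_hrtimer_start(&my_timer, ktime_set(0, 10 * NSEC_PER_MSEC),
                          HRTIMER_MODE_REL);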
2009-07-21  genirq: Delegate irq affinity setting to the irq thread  (Thomas Gleixner)
irq_set_thread_affinity() calls set_cpus_allowed_ptr() which might sleep, but irq_set_thread_affinity() is called with desc->lock held and can be called from hard interrupt context as well. The code has another bug as it does not hold a ref on the task struct as required by set_cpus_allowed_ptr(). Just set the IRQTF_AFFINITY bit in action->thread_flags. The next time the thread runs it migrates itself. Solves all of the above problems nicely. Add kerneldoc to irq_set_thread_affinity() while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission>
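A hedged sketch of the delegation pattern (the function name here is hypothetical; the real code lives in kernel/irq/manage.c):

    /*
     * Called with desc->lock held, possibly in hard irq context:
     * never sleep here, just flag each irq thread.  The thread
     * notices IRQTF_AFFINITY the next time it runs and calls
     * set_cpus_allowed_ptr() on itself from sleepable context.
     */
    static void mark_threads_for_affinity(struct irq_desc *desc)
    {
        struct irqaction *action = desc->action;

        while (action) {
            if (action->thread)
                set_bit(IRQTF_AFFINITY, &action->thread_flags);
            action = action->next;
        }
    }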
2009-07-19  clocksource: Prevent NULL pointer dereference  (Thomas Gleixner)
Writing a zero-length string to /sys/.../current_clocksource will cause a NULL pointer dereference if the clock events system is in one-shot (highres or nohz) mode. Pointed-out-by: Dan Carpenter <error27@gmail.com> LKML-Reference: <alpine.DEB.2.00.0907191545580.12306@bicker> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-07-18  timer: Avoid reading uninitialized data  (Pavel Roskin)
timer->expires may be uninitialized, so check timer_pending() before touching timer->expires to pacify kmemcheck. Signed-off-by: Pavel Roskin <proski@gnu.org> LKML-Reference: <20090718204602.5191.360.stgit@mj.roinet.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
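The rule the fix enforces, sketched: timer->expires carries meaningful data only while the timer is pending:

    unsigned long expires = 0;

    if (timer_pending(timer))
        expires = timer->expires;   /* safe: timer is queued */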
2009-07-18  sched: Pull up the might_sleep() check into cond_resched()  (Frederic Weisbecker)
might_sleep() is called late-ish in cond_resched(), after the need_resched()/preempt enabled/system running tests are checked. It's better to check for sleeping while atomic earlier, without depending on environment data that reduce the chances of detecting a problem. Also define the cond_resched_*() helpers as macros, so that the __FILE__:__LINE__ reported in the sleeping-while-atomic warning displays the real origin and not sched.h.

Changes in v2:
- Call __might_sleep() directly instead of might_sleep(), which may call cond_resched()
- Turn cond_resched() into a macro so that the file:line couple reported refers to the caller of cond_resched() and not __cond_resched() itself

Changes in v3:
- Also propagate this __might_sleep() pull-up to cond_resched_lock() and cond_resched_softirq()

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1247725694-6082-6-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
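A sketch of the macro shape this series results in (assuming the three-argument __might_sleep() from the companion patch below):

    #define cond_resched() ({                       \
        __might_sleep(__FILE__, __LINE__, 0);       \
        _cond_resched();                            \
    })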
2009-07-18  sched: Add a preempt count base offset to __might_sleep()  (Frederic Weisbecker)
Add a preempt count base offset to compare against the current preempt level count. This prepares for pulling up the might_sleep check from cond_resched() to cond_resched_lock() and cond_resched_bh(). For these two helpers, we need to ensure that once we unlock the given spinlock or re-enable local softirqs, we will reach a sleepable state. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> [ Move and rename preempt_count_equals() ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1247725694-6082-4-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18  sched: Cover the CONFIG_DEBUG_SPINLOCK_SLEEP off-case for __might_sleep()  (Frederic Weisbecker)
Cover the off case for __might_sleep(), so that we avoid #ifdefs in files that make use of it. Especially, this prepares for the __might_sleep() pull up on cond_resched(). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1247725694-6082-3-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18  sched: Remove obsolete comment in __cond_resched()  (Frederic Weisbecker)
Remove the outdated comment from __cond_resched() related to the now removed Big Kernel Semaphore. Reported-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1247725694-6082-2-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18  sched: Drop the need_resched() loop from cond_resched()  (Frederic Weisbecker)
The schedule() function is a loop that reschedules the current task while the TIF_NEED_RESCHED flag is set:

    void schedule(void)
    {
    need_resched:
        /* schedule code */
        if (need_resched())
            goto need_resched;
    }

And cond_resched() repeats this loop:

    do {
        add_preempt_count(PREEMPT_ACTIVE);
        schedule();
        sub_preempt_count(PREEMPT_ACTIVE);
    } while (need_resched());

This loop is needless because schedule() already does the check, and nothing can set TIF_NEED_RESCHED between the schedule() exit and the loop check in need_resched(). So remove this needless loop. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1247725694-6082-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18  Merge branch 'linus' into sched/core  (Ingo Molnar)
Merge reason: branch had an old upstream base (-rc1-ish), but also merge to avoid a conflict. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18  sched: fix nr_uninterruptible accounting of frozen tasks really  (Thomas Gleixner)
commit e3c8ca8336 (sched: do not count frozen tasks toward load) broke the nr_uninterruptible accounting on freeze/thaw. On freeze the task is excluded from accounting with a check for (task->flags & PF_FROZEN), but that flag is cleared before the task is thawed. So while we prevent a task in TASK_UNINTERRUPTIBLE state from being accounted to nr_uninterruptible on freeze, we still decrement nr_uninterruptible on thaw. Use a separate flag which is handled by the freezing task itself. Set it before calling the scheduler with TASK_UNINTERRUPTIBLE state and clear it after we return from the frozen state. Cc: <stable@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
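A hedged sketch of the freeze path with such a flag (shown here as PF_FREEZING; the loop and other details of the real refrigerator() are omitted):

    current->flags |= PF_FREEZING;
    set_current_state(TASK_UNINTERRUPTIBLE);
    schedule();     /* excluded from nr_uninterruptible accounting */
    current->flags &= ~PF_FREEZING;   /* cleared by the task itself,
                                         after it returns from frozen */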
2009-07-18  sched: fix load average accounting vs. cpu hotplug  (Thomas Gleixner)
The new load average code clears rq->calc_load_active on CPU_ONLINE. That's wrong as the new onlined CPU might have got a scheduler tick already and accounted the delta to the stale value of the time we offlined the CPU. Clear the value when we cleanup the dead CPU instead. Also move the update of the calc_load_update time for the newly online CPU to CPU_UP_PREPARE to avoid that the CPU plays catch up with the stale update time value. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-07-18  profile: Suppress warning about large allocations when profile=1 is specified  (Mel Gorman)
When profile= is used, a large buffer is allocated early at boot. This can be larger than what the page allocator can provide so it prints a warning. However, the caller is able to handle the situation so this patch suppresses the warning. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Linux Memory Management List <linux-mm@kvack.org> Cc: Heinz Diehl <htd@fancy-poultry.org> Cc: David Miller <davem@davemloft.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <1247656992-19846-3-git-send-email-mel@csn.ul.ie> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18  perf_counter: Log vfork as a fork event  (Anton Blanchard)
Right now we don't output vfork events. Even though we should always see an exec after a vfork, we may get perfcounter samples between the vfork and exec. These samples can lead to some confusion when parsing perfcounter data. To keep things consistent we should always log a fork event. It will result in a little more log data, but is less confusing to trace parsing tools. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090716104817.589309391@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18  perf_counter: Make sure we dont leak kernel memory to userspace  (Anton Blanchard)
There are a few places we are leaking tiny amounts of kernel memory to userspace. This happens when writing out strings because we always align the end to 64 bits. To avoid this we should always use an appropriately sized temporary buffer and ensure it is zeroed. Since d_path assembles the string from the end of the buffer backwards, we need to add 64 bits after the buffer to allow for alignment. We also need to copy arch_vma_name to the temporary buffer, because if we use it directly we may end up copying to userspace a number of bytes after the end of the string constant. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090716104817.273972048@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
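A hedged sketch of the d_path case (buffer size and the emit step are illustrative): the zeroed kzalloc() buffer plus 64 bits of headroom past the string area guarantees the aligned copy never includes stray kernel bytes:

    char *buf = kzalloc(PATH_MAX + sizeof(u64), GFP_KERNEL);

    if (buf) {
        const char *name = d_path(&file->f_path, buf, PATH_MAX);
        /* ...emit 'name' padded to a u64 boundary, then... */
        kfree(buf);
    }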
2009-07-18  sched: Account for vruntime wrapping  (Fabio Checconi)
I spotted two sites that didn't take vruntime wrap-around into account. Fix these by creating a comparison helper that does. Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
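The canonical wrap-safe form such a helper takes (a sketch, not necessarily the committed one): compare the signed difference instead of the raw values:

    static inline int entity_before(struct sched_entity *a,
                                    struct sched_entity *b)
    {
        /* correct even when one vruntime has wrapped past the other */
        return (s64)(a->vruntime - b->vruntime) < 0;
    }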