Age | Commit message (Collapse) | Author |
|
While calculating the scheduler tick max deferment, the delta is
converted from microseconds to nanoseconds through a multiplication
against NSEC_PER_USEC.
But this microseconds operand is an unsigned int, thus the result may
likely overflow. The result is cast to u64 but only once the operation
is completed, which is too late to avoid overflown result.
This is currently not a problem because the scheduler tick max deferment
is 1 second. But this may become an issue as we plan to make this
value tunable.
So lets fix this by casting the usecs value to u64 before multiplying by
NSECS_PER_USEC.
Also to prevent from this kind of mistake to happen again, move this
ad-hoc jiffies -> nsecs conversion to a new helper.
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Link: http://lkml.kernel.org/r/1387315388-31676-2-git-send-email-khilman@linaro.org
[move ad-hoc conversion to jiffies_to_nsecs helper]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
|
Code usually starts with 'tab' instead of 7 'space' in kernel
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Link: http://lkml.kernel.org/r/1386074112-30754-2-git-send-email-alex.shi@linaro.org
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
|
We don't need to fetch the timekeeping max deferment under the
jiffies_lock seqlock.
If the clocksource is updated concurrently while we stop the tick,
stop machine is called and the tick will be reevaluated again along with
uptodate jiffies and its related values.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Link: http://lkml.kernel.org/r/1387320692-28460-9-git-send-email-fweisbec@gmail.com
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
|
This makes the code more symetric against the existing tick functions
called on irq exit: tick_irq_exit() and tick_nohz_irq_exit().
These function are also symetric as they mirror each other's action:
we start to account idle time on irq exit and we stop this accounting
on irq entry. Also the tick is stopped on irq exit and timekeeping
catches up with the tickless time elapsed until we reach irq entry.
This rename was suggested by Peter Zijlstra a long while ago but it
got forgotten in the mass.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Link: http://lkml.kernel.org/r/1387320692-28460-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
|
The equivalent uapi struct uses __u32 so make the kernel
uses u32 too.
This can prevent some oddities where the limit is
logged/emitted as a negative value.
Convert kstrtol to kstrtouint to disallow negative values.
Signed-off-by: Joe Perches <joe@perches.com>
[eparis: do not remove static from audit_default declaration]
|
|
Add pr_fmt to prefix "audit: " to output
Convert printk(KERN_<LEVEL> to pr_<level>
Coalesce formats
Use pr_cont
Move a brace after switch
Signed-off-by: Joe Perches <joe@perches.com>
|
|
Using the generic kernel function causes the
object size to increase with gcc 4.8.1.
$ size kernel/audit.o*
text data bss dec hex filename
18577 6079 8436 33092 8144 kernel/audit.o.new
18579 6015 8420 33014 80f6 kernel/audit.o.old
Unsigned...
|
|
The trace buffer has a descriptor pointer that goes back to the trace
array. But it was never assigned. Luckily, nothing uses it (yet), but
it will in the future.
Although nothing currently uses this, if any of the new features get
backported to older kernels, and because this is such a simple change,
I'm marking it for stable too.
Cc: stable@vger.kernel.org # v3.10+
Fixes: 12883efb670c "tracing: Consolidate max_tr into main trace_array structure"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
An admin is likely to want to see old and new values next to each other.
Putting all of the old values followed by all of the new values is just
hard to read as a human.
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
We can simplify the AUDIT_TTY_SET code to only grab the spin_lock one
time. We need to determine if the new values are valid and if so, set
the new values at the same time we grab the old onces. While we are
here get rid of 'res' and just use err.
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
If userspace specified that it was setting values via the mask we do not
need a second check to see if they also set the version field high
enough to understand those values. (clearly if they set the mask they
knew those values).
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
Give names to the audit versions. Just something for a userspace
programmer to know what the version provides.
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
We had some craziness with signed to unsigned long casting which appears
wholely unnecessary. Just use signed long. Even though 2 values of the
math equation are unsigned longs the result is expected to be a signed
long. So why keep casting the result to signed long? Just make it
signed long and use it.
We also remove the needless "timeout" variable. We already have the
stack "sleep_time" variable. Just use that...
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
Add task information to the log when changing a feature state.
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
NETLINK_CB(skb).sk is the socket of user space process,
netlink_unicast in kauditd_send_skb wants the kernel
side socket. Since the sk_state of audit netlink socket
is not NETLINK_CONNECTED, so the netlink_getsockbyportid
doesn't return -ECONNREFUSED.
And the socket of userspace process can be released anytime,
so the audit_sock may point to invalid socket.
this patch sets the audit_sock to the kernel side audit
netlink socket.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
print the error message and then return -ENOMEM.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
Remove spaces between "new", "old" label modifiers and "auid", "ses" labels in
log output since userspace tools can't parse orphaned keywords.
Make variable names more consistent and intuitive.
Make audit_log_format() argument code easier to read.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
An error on an AUDIT_NEVER rule disabled logging on that rule.
On error on AUDIT_NEVER rules, log.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
The backlog cannot be consumed when audit_log_start is running on auditd
even if audit_log_start calls wait_for_auditd to consume it.
The situation is the deadlock because only auditd can consume the backlog.
If the other process needs to send the backlog, it can be also stopped
by the deadlock.
So, audit_log_start running on auditd should not stop.
You can see the deadlock with the following reproducer:
# auditctl -a exit,always -S all
# reboot
Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Reviewed-by: gaofeng@cn.fujitsu.com
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
We do not need to hold the audit_cmd_mutex for this family of cases. The
possible exception to this is the call to audit_filter_user(), so drop the lock
immediately after. To help in fixing the race we are trying to avoid, make
sure that nothing called by audit_filter_user() calls audit_log_start(). In
particular, watch out for *_audit_rule_match().
This fix will take care of systemd and anything USING audit. It still means
that we could race with something configuring audit and auditd shutting down.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Reported-by: toshi.okajima@jp.fujitsu.com
Tested-by: toshi.okajima@jp.fujitsu.com
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
Right now the sessionid value in the kernel is a combination of u32,
int, and unsigned int. Just use unsigned int throughout.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
Currently when the coredump signals are logged by the audit system, the
actual path to the executable is not logged. Without details of exe, the
system admin may not have an exact idea on what program failed.
This patch changes the audit_log_task() so that the path to the exe is also
logged.
This was copied from audit_log_task_info() and the latter enhanced to avoid
disappearing text fields.
Signed-off-by: Paul Davies C <pauldaviesc@gmail.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
There have been reports of auditd restarts resulting in kaudit not being able
to find a newly registered auditd. It results in reports such as:
kernel: [ 2077.233573] audit: *NO* daemon at audit_pid=1614
kernel: [ 2077.234712] audit: audit_lost=97 audit_rate_limit=0 audit_backlog_limit=320
kernel: [ 2077.234718] audit: auditd disappeared
(previously mis-spelled "dissapeared")
One possible cause is a race between the shutdown of an older auditd and a
newer one. If the newer one sets the daemon pid to itself in kauditd before
the older one has cleared the daemon pid, the newer daemon pid will be erased.
This could be caused by an automated system, or by manual intervention, but in
either case, there is no use in having the older daemon clear the daemon pid
reference since its old pid is no longer being referenced. This patch will
prevent that specific case, returning an error of EACCES.
The case for preventing a newer auditd from registering itself if there is an
existing auditd is a more difficult case that is beyond the scope of this
patch.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
audit_receive_msg() needlessly contained a fallthrough case that called
audit_receive_filter(), containing no common code between the cases. Separate
them to make the logic clearer. Refactor AUDIT_LIST_RULES, AUDIT_ADD_RULE,
AUDIT_DEL_RULE cases to create audit_rule_change(), audit_list_rules_send()
functions. This should not functionally change the logic.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
Log transition of config changes when AUDIT_TTY_SET is called, including both
enabled and log_passwd values now in the struct.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
kauditd_send_skb is called after audit_pid was checked to be non-zero.
However, it can be set to 0 due to auditd exiting while kauditd_send_skb
is still executed and this can result in a spurious warning about missing
auditd.
Re-check audit_pid before printing the message.
Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: linux-kernel@vger.kernel.org
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
The audit_log_abend() is used only by the audit_core_dumps(). Thus there is no
need of maintaining the audit_log_abend() as a separate function.
This patch drops the audit_log_abend() and pushes its functionalities back to
the audit_core_dumps(). Apart from that the "reason" field is also dropped
from being logged since the reason can be deduced from the signal number.
Signed-off-by: Paul Davies C <pauldaviesc@gmail.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
Since audit can already be disabled by "audit=0" on the kernel boot line, or by
the command "auditctl -e 0", it would be more useful to have the
audit_backlog_limit set to zero mean effectively unlimited (limited only by
system RAM).
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
If audit is disabled, we shouldn't generate loginuid audit
log.
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
we already have old_lock, no need to calculate it again.
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
If audit is disabled,we shouldn't generate the audit log.
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
The order of new feature and old feature is incorrect,
this patch fix it.
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
Since kernel parameter is operated before
initcall, so the audit_initialized must be
AUDIT_UNINITIALIZED or DISABLED in audit_enable.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
reaahead-collector abuses the audit logging facility to discover which files
are accessed at boot time to make a pre-load list
Add a tuning option to audit_backlog_wait_time so that if auditd can't keep up,
or gets blocked, the callers won't be blocked.
Bump audit_status API version to "2".
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
Re-named confusing local variable names (status_set and status_get didn't agree
with their command type name) and reduced their scope.
Future-proof API changes by not depending on the exact size of the audit_status
struct and by adding an API version field.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
The default audit_backlog_limit is 64. This was a reasonable limit at one time.
systemd causes so much audit queue activity on startup that auditd doesn't
start before the backlog queue has already overflowed by more than a factor of
2. On a system with audit= not set on the kernel command line, this isn't an
issue since that history isn't kept for auditd when it is available. On a
system with audit=1 set on the kernel command line, kaudit tries to keep that
history until auditd is able to drain the queue.
This default can be changed by the "-b" option in audit.rules once the system
has booted, but won't help with lost messages on boot.
One way to solve this would be to increase the default backlog queue size to
avoid losing any messages before auditd is able to consume them. This would
be overkill to the embedded community and insufficient for some servers.
Another way to solve it might be to add a kconfig option to set the default
based on the system type. An embedded system would get the current (or
smaller) default, while Workstations might get more than now and servers might
get more.
None of these solutions helps if a system's compiled default is too small to
see the lost messages without compiling a new kernel.
This patch adds a kernel set-up parameter (audit already has one to
enable/disable it) "audit_backlog_limit=<n>" that overrides the default to
allow the system administrator to set the backlog limit.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
These and similar errors were seen on a patched 3.8 kernel when the
audit subsystem was overrun during boot:
udevd[876]: worker [887] unexpectedly returned with status 0x0100
udevd[876]: worker [887] failed while handling
'/devices/pci0000:00/0000:00:03.0/0000:40:00.0'
udevd[876]: worker [880] unexpectedly returned with status 0x0100
udevd[876]: worker [880] failed while handling
'/devices/LNXSYSTM:00/LNXPWRBN:00/input/input1/event1'
udevadm settle - timeout of 180 seconds reached, the event queue
contains:
/sys/devices/LNXSYSTM:00/LNXPWRBN:00/input/input1/event1 (3995)
/sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/INT3F0D:00 (4034)
audit: audit_backlog=258 > audit_backlog_limit=256
audit: audit_lost=1 audit_rate_limit=0 audit_backlog_limit=256
The change below increases the efficiency of the audit code and prevents it
from being overrun:
Use add_wait_queue_exclusive() in wait_for_auditd() to put the
thread on the wait queue. When kauditd dequeues an skb, all
of the waiting threads are waiting for the same resource, but
only one is going to get it, so there's no need to wake up
more than one waiter.
See: https://lkml.org/lkml/2013/9/2/479
Signed-off-by: Dan Duval <dan.duval@oracle.com>
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
These and similar errors were seen on a patched 3.8 kernel when the
audit subsystem was overrun during boot:
udevd[876]: worker [887] unexpectedly returned with status 0x0100
udevd[876]: worker [887] failed while handling
'/devices/pci0000:00/0000:00:03.0/0000:40:00.0'
udevd[876]: worker [880] unexpectedly returned with status 0x0100
udevd[876]: worker [880] failed while handling
'/devices/LNXSYSTM:00/LNXPWRBN:00/input/input1/event1'
udevadm settle - timeout of 180 seconds reached, the event queue
contains:
/sys/devices/LNXSYSTM:00/LNXPWRBN:00/input/input1/event1 (3995)
/sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/INT3F0D:00 (4034)
audit: audit_backlog=258 > audit_backlog_limit=256
audit: audit_lost=1 audit_rate_limit=0 audit_backlog_limit=256
The change below increases the efficiency of the audit code and prevents it
from being overrun:
Only issue a wake_up in kauditd if the length of the skb queue is less than the
backlog limit. Otherwise, threads waiting in wait_for_auditd() will simply
wake up, discover that the queue is still too long for them to proceed, and go
back to sleep. This results in wasted context switches and machine cycles.
kauditd_thread() is the only function that removes buffers from audit_skb_queue
so we can't race. If we did, the timeout in wait_for_auditd() would expire and
the waiting thread would continue.
See: https://lkml.org/lkml/2013/9/2/479
Signed-off-by: Dan Duval <dan.duval@oracle.com>
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
If wait_for_auditd() times out, go immediately to the error function rather
than retesting the loop conditions.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
When the audit queue overflows and times out (audit_backlog_wait_time), the
audit queue overflow timeout is set to zero. Once the audit queue overflow
timeout condition recovers, the timeout should be reset to the original value.
See also:
https://lkml.org/lkml/2013/9/2/473
Cc: stable@vger.kernel.org # v3.8-rc4+
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Dan Duval <dan.duval@oracle.com>
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
Convert audit from only listening in init_net to use register_pernet_subsys()
to dynamically manage the netlink socket list.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
When being refactored from audit_log_start() to audit_log_task_info(), in
commit e23eb920 the tty and ses fields in the log output got transposed.
Restore to original order to avoid breaking search tools.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
Normally, netlink ports use the PID of the userspace process as the port ID.
If the PID is already in use by a port, the kernel will allocate another port
ID to avoid conflict. Re-name all references to netlink ports from pid to
portid to reflect this reality and avoid confusion with actual PIDs. Ports
use the __u32 type, so re-type all portids accordingly.
(This patch is very similar to ebiederman's 5deadd69)
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
- Always report the current process as capset now always only works on
the current process. This prevents reporting 0 or a random pid in
a random pid namespace.
- Don't bother to pass the pid as is available.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
(cherry picked from commit bcc85f0af31af123e32858069eb2ad8f39f90e67)
(cherry picked from commit f911cac4556a7a23e0b3ea850233d13b32328692)
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[eparis: fix build error when audit disabled]
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
The synchronization needed after ftrace_ops are unregistered must happen
after the callback is disabled from becing called by functions.
The current location happens after the function is being removed from the
internal lists, but not after the function callbacks were disabled, leaving
the functions susceptible of being called after their callbacks are freed.
This affects perf and any externel users of function tracing (LTTng and
SystemTap).
Cc: stable@vger.kernel.org # 3.0+
Fixes: cdbe61bfe704 "ftrace: Allow dynamically allocated function tracers"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
With various drivers wanting to inject idle time; we get people
calling idle routines outside of the idle loop proper.
Therefore we need to be extra careful about not missing
TIF_NEED_RESCHED -> PREEMPT_NEED_RESCHED propagations.
While looking at this, I also realized there's a small window in the
existing idle loop where we can miss TIF_NEED_RESCHED; when it hits
right after the tif_need_resched() test at the end of the loop but
right before the need_resched() test at the start of the loop.
So move preempt_fold_need_resched() out of the loop where we're
guaranteed to have TIF_NEED_RESCHED set.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-x9jgh45oeayzajz2mjt0y7d6@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Currently local_bh_disable() is out-of-line for no apparent reason.
So inline it to save a few cycles on call/return nonsense, the
function body is a single add on x86 (a few loads and store extra on
load/store archs).
Also expose two new local_bh functions:
__local_bh_{dis,en}able_ip(unsigned long ip, unsigned int cnt);
Which implement the actual local_bh_{dis,en}able() behaviour.
The next patch uses the exposed @cnt argument to optimize bh lock
functions.
With build fixes from Jacob Pan.
Cc: rjw@rjwysocki.net
Cc: rui.zhang@intel.com
Cc: jacob.jun.pan@linux.intel.com
Cc: Mike Galbraith <bitbucket@online.de>
Cc: hpa@zytor.com
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: lenb@kernel.org
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Doing some different tests, I discovered that function graph tracing, when
filtered via the set_ftrace_filter and set_ftrace_notrace files, does
not always keep with them if another function ftrace_ops is registered
to trace functions.
The reason is that function graph just happens to trace all functions
that the function tracer enables. When there was only one user of
function tracing, the function graph tracer did not need to worry about
being called by functions that it did not want to trace. But now that there
are other users, this becomes a problem.
For example, one just needs to do the following:
# cd /sys/kernel/debug/tracing
# echo schedule > set_ftrace_filter
# echo function_graph > current_tracer
# cat trace
[..]
0) | schedule() {
------------------------------------------
0) <idle>-0 => rcu_pre-7
------------------------------------------
0) ! 2980.314 us | }
0) | schedule() {
------------------------------------------
0) rcu_pre-7 => <idle>-0
------------------------------------------
0) + 20.701 us | }
# echo 1 > /proc/sys/kernel/stack_tracer_enabled
# cat trace
[..]
1) + 20.825 us | }
1) + 21.651 us | }
1) + 30.924 us | } /* SyS_ioctl */
1) | do_page_fault() {
1) | __do_page_fault() {
1) 0.274 us | down_read_trylock();
1) 0.098 us | find_vma();
1) | handle_mm_fault() {
1) | _raw_spin_lock() {
1) 0.102 us | preempt_count_add();
1) 0.097 us | do_raw_spin_lock();
1) 2.173 us | }
1) | do_wp_page() {
1) 0.079 us | vm_normal_page();
1) 0.086 us | reuse_swap_page();
1) 0.076 us | page_move_anon_rmap();
1) | unlock_page() {
1) 0.082 us | page_waitqueue();
1) 0.086 us | __wake_up_bit();
1) 1.801 us | }
1) 0.075 us | ptep_set_access_flags();
1) | _raw_spin_unlock() {
1) 0.098 us | do_raw_spin_unlock();
1) 0.105 us | preempt_count_sub();
1) 1.884 us | }
1) 9.149 us | }
1) + 13.083 us | }
1) 0.146 us | up_read();
When the stack tracer was enabled, it enabled all functions to be traced, which
now the function graph tracer also traces. This is a side effect that should
not occur.
To fix this a test is added when the function tracing is changed, as well as when
the graph tracer is enabled, to see if anything other than the ftrace global_ops
function tracer is enabled. If so, then the graph tracer calls a test trampoline
that will look at the function that is being traced and compare it with the
filters defined by the global_ops.
As an optimization, if there's no other function tracers registered, or if
the only registered function tracers also use the global ops, the function
graph infrastructure will call the registered function graph callback directly
and not go through the test trampoline.
Cc: stable@vger.kernel.org # 3.3+
Fixes: d2d45c7a03a2 "tracing: Have stack_tracer use a separate list of functions"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The below tells us the static_key conversion has a problem; since the
exact point of clearing that flag isn't too important, delay the flip
and use a workqueue to process it.
[ ] TSC synchronization [CPU#0 -> CPU#22]:
[ ] Measured 8 cycles TSC warp between CPUs, turning off TSC clock.
[ ]
[ ] ======================================================
[ ] [ INFO: possible circular locking dependency detected ]
[ ] 3.13.0-rc3-01745-g848b0d0322cb-dirty #637 Not tainted
[ ] -------------------------------------------------------
[ ] swapper/0/1 is trying to acquire lock:
[ ] (jump_label_mutex){+.+...}, at: [<ffffffff8115a637>] jump_label_lock+0x17/0x20
[ ]
[ ] but task is already holding lock:
[ ] (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109408b>] cpu_hotplug_begin+0x2b/0x60
[ ]
[ ] which lock already depends on the new lock.
[ ]
[ ]
[ ] the existing dependency chain (in reverse order) is:
[ ]
[ ] -> #1 (cpu_hotplug.lock){+.+.+.}:
[ ] [<ffffffff810def00>] lock_acquire+0x90/0x130
[ ] [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0
[ ] [<ffffffff81093fdc>] get_online_cpus+0x3c/0x60
[ ] [<ffffffff8104cc67>] arch_jump_label_transform+0x37/0x130
[ ] [<ffffffff8115a3cf>] __jump_label_update+0x5f/0x80
[ ] [<ffffffff8115a48d>] jump_label_update+0x9d/0xb0
[ ] [<ffffffff8115aa6d>] static_key_slow_inc+0x9d/0xb0
[ ] [<ffffffff810c0f65>] sched_feat_set+0xf5/0x100
[ ] [<ffffffff810c5bdc>] set_numabalancing_state+0x2c/0x30
[ ] [<ffffffff81d12f3d>] numa_policy_init+0x1af/0x1b7
[ ] [<ffffffff81cebdf4>] start_kernel+0x35d/0x41f
[ ] [<ffffffff81ceb5a5>] x86_64_start_reservations+0x2a/0x2c
[ ] [<ffffffff81ceb6a2>] x86_64_start_kernel+0xfb/0xfe
[ ]
[ ] -> #0 (jump_label_mutex){+.+...}:
[ ] [<ffffffff810de141>] __lock_acquire+0x1701/0x1eb0
[ ] [<ffffffff810def00>] lock_acquire+0x90/0x130
[ ] [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0
[ ] [<ffffffff8115a637>] jump_label_lock+0x17/0x20
[ ] [<ffffffff8115aa3b>] static_key_slow_inc+0x6b/0xb0
[ ] [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20
[ ] [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70
[ ] [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150
[ ] [<ffffffff81076612>] native_cpu_up+0x3a2/0x890
[ ] [<ffffffff810941cb>] _cpu_up+0xdb/0x160
[ ] [<ffffffff810942c9>] cpu_up+0x79/0x90
[ ] [<ffffffff81d0af6b>] smp_init+0x60/0x8c
[ ] [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197
[ ] [<ffffffff8164e32e>] kernel_init+0xe/0x130
[ ] [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0
[ ]
[ ] other info that might help us debug this:
[ ]
[ ] Possible unsafe locking scenario:
[ ]
[ ] CPU0 CPU1
[ ] ---- ----
[ ] lock(cpu_hotplug.lock);
[ ] lock(jump_label_mutex);
[ ] lock(cpu_hotplug.lock);
[ ] lock(jump_label_mutex);
[ ]
[ ] *** DEADLOCK ***
[ ]
[ ] 2 locks held by swapper/0/1:
[ ] #0: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81094037>] cpu_maps_update_begin+0x17/0x20
[ ] #1: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109408b>] cpu_hotplug_begin+0x2b/0x60
[ ]
[ ] stack backtrace:
[ ] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc3-01745-g848b0d0322cb-dirty #637
[ ] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
[ ] ffffffff82c9c270 ffff880236843bb8 ffffffff8165c5f5 ffffffff82c9c270
[ ] ffff880236843bf8 ffffffff81658c02 ffff880236843c80 ffff8802368586a0
[ ] ffff880236858678 0000000000000001 0000000000000002 ffff880236858000
[ ] Call Trace:
[ ] [<ffffffff8165c5f5>] dump_stack+0x4e/0x7a
[ ] [<ffffffff81658c02>] print_circular_bug+0x1f9/0x207
[ ] [<ffffffff810de141>] __lock_acquire+0x1701/0x1eb0
[ ] [<ffffffff816680ff>] ? __atomic_notifier_call_chain+0x8f/0xb0
[ ] [<ffffffff810def00>] lock_acquire+0x90/0x130
[ ] [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20
[ ] [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20
[ ] [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0
[ ] [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20
[ ] [<ffffffff8115a637>] jump_label_lock+0x17/0x20
[ ] [<ffffffff8115aa3b>] static_key_slow_inc+0x6b/0xb0
[ ] [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20
[ ] [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70
[ ] [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150
[ ] [<ffffffff81076612>] native_cpu_up+0x3a2/0x890
[ ] [<ffffffff810941cb>] _cpu_up+0xdb/0x160
[ ] [<ffffffff810942c9>] cpu_up+0x79/0x90
[ ] [<ffffffff81d0af6b>] smp_init+0x60/0x8c
[ ] [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197
[ ] [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
[ ] [<ffffffff8164e32e>] kernel_init+0xe/0x130
[ ] [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0
[ ] [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
[ ] ------------[ cut here ]------------
[ ] WARNING: CPU: 0 PID: 1 at /usr/src/linux-2.6/kernel/smp.c:374 smp_call_function_many+0xad/0x300()
[ ] Modules linked in:
[ ] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc3-01745-g848b0d0322cb-dirty #637
[ ] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
[ ] 0000000000000009 ffff880236843be0 ffffffff8165c5f5 0000000000000000
[ ] ffff880236843c18 ffffffff81093d8c 0000000000000000 0000000000000000
[ ] ffffffff81ccd1a0 ffffffff810ca951 0000000000000000 ffff880236843c28
[ ] Call Trace:
[ ] [<ffffffff8165c5f5>] dump_stack+0x4e/0x7a
[ ] [<ffffffff81093d8c>] warn_slowpath_common+0x8c/0xc0
[ ] [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0
[ ] [<ffffffff81093dda>] warn_slowpath_null+0x1a/0x20
[ ] [<ffffffff8110b72d>] smp_call_function_many+0xad/0x300
[ ] [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30
[ ] [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30
[ ] [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0
[ ] [<ffffffff8110ba96>] smp_call_function+0x46/0x80
[ ] [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30
[ ] [<ffffffff8110bb3c>] on_each_cpu+0x3c/0xa0
[ ] [<ffffffff810ca950>] ? sched_clock_idle_sleep_event+0x20/0x20
[ ] [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0
[ ] [<ffffffff8104f964>] text_poke_bp+0x64/0xd0
[ ] [<ffffffff810ca950>] ? sched_clock_idle_sleep_event+0x20/0x20
[ ] [<ffffffff8104ccde>] arch_jump_label_transform+0xae/0x130
[ ] [<ffffffff8115a3cf>] __jump_label_update+0x5f/0x80
[ ] [<ffffffff8115a48d>] jump_label_update+0x9d/0xb0
[ ] [<ffffffff8115aa6d>] static_key_slow_inc+0x9d/0xb0
[ ] [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20
[ ] [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70
[ ] [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150
[ ] [<ffffffff81076612>] native_cpu_up+0x3a2/0x890
[ ] [<ffffffff810941cb>] _cpu_up+0xdb/0x160
[ ] [<ffffffff810942c9>] cpu_up+0x79/0x90
[ ] [<ffffffff81d0af6b>] smp_init+0x60/0x8c
[ ] [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197
[ ] [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
[ ] [<ffffffff8164e32e>] kernel_init+0xe/0x130
[ ] [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0
[ ] [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
[ ] ---[ end trace 6ff1df5620c49d26 ]---
[ ] tsc: Marking TSC unstable due to check_tsc_sync_source failed
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-v55fgqj3nnyqnngmvuu8ep6h@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In order to avoid the runtime condition and variable load turn
sched_clock_stable into a static_key.
Also provide a shorter implementation of local_clock() and
cpu_clock(int) when sched_clock_stable==1.
MAINLINE PRE POST
sched_clock_stable: 1 1 1
(cold) sched_clock: 329841 221876 215295
(cold) local_clock: 301773 234692 220773
(warm) sched_clock: 38375 25602 25659
(warm) local_clock: 100371 33265 27242
(warm) rdtsc: 27340 24214 24208
sched_clock_stable: 0 0 0
(cold) sched_clock: 382634 235941 237019
(cold) local_clock: 396890 297017 294819
(warm) sched_clock: 38194 25233 25609
(warm) local_clock: 143452 71234 71232
(warm) rdtsc: 27345 24245 24243
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|