<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/Documentation/sysctl, branch v3.16</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/Documentation/sysctl?h=v3.16</id>
<link rel='self' href='https://git.amat.us/linux/atom/Documentation/sysctl?h=v3.16'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2014-06-23T23:47:44Z</updated>
<entry>
<title>kernel/watchdog.c: print traces for all cpus on lockup detection</title>
<updated>2014-06-23T23:47:44Z</updated>
<author>
<name>Aaron Tomlin</name>
<email>atomlin@redhat.com</email>
</author>
<published>2014-06-23T20:22:05Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=ed235875e2ca983197831337a986f0517074e1a0'/>
<id>urn:sha1:ed235875e2ca983197831337a986f0517074e1a0</id>
<content type='text'>
A 'softlockup' is defined as a bug that causes the kernel to loop in
kernel mode for more than a predefined period to time, without giving
other tasks a chance to run.

Currently, upon detection of this condition by the per-cpu watchdog
task, debug information (including a stack trace) is sent to the system
log.

On some occasions, we have observed that the "victim" rather than the
actual "culprit" (i.e.  the owner/holder of the contended resource) is
reported to the user.  Often this information has proven to be
insufficient to assist debugging efforts.

To avoid loss of useful debug information, for architectures which
support NMI, this patch makes it possible to improve soft lockup
reporting.  This is accomplished by issuing an NMI to each cpu to obtain
a stack trace.

If NMI is not supported we just revert back to the old method.  A sysctl
and boot-time parameter is available to toggle this feature.

[dzickus@redhat.com: add CONFIG_SMP in certain areas]
[akpm@linux-foundation.org: additional CONFIG_SMP=n optimisations]
[mq@suse.cz: fix warning]
Signed-off-by: Aaron Tomlin &lt;atomlin@redhat.com&gt;
Signed-off-by: Don Zickus &lt;dzickus@redhat.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Mateusz Guzik &lt;mguzik@redhat.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Jan Moskyto Matejka &lt;mq@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, pcp: allow restoring percpu_pagelist_fraction default</title>
<updated>2014-06-23T23:47:43Z</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2014-06-23T20:22:04Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=7cd2b0a34ab8e4db971920eef8982f985441adfb'/>
<id>urn:sha1:7cd2b0a34ab8e4db971920eef8982f985441adfb</id>
<content type='text'>
Oleg reports a division by zero error on zero-length write() to the
percpu_pagelist_fraction sysctl:

    divide error: 0000 [#1] SMP DEBUG_PAGEALLOC
    CPU: 1 PID: 9142 Comm: badarea_io Not tainted 3.15.0-rc2-vm-nfs+ #19
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    task: ffff8800d5aeb6e0 ti: ffff8800d87a2000 task.ti: ffff8800d87a2000
    RIP: 0010: percpu_pagelist_fraction_sysctl_handler+0x84/0x120
    RSP: 0018:ffff8800d87a3e78  EFLAGS: 00010246
    RAX: 0000000000000f89 RBX: ffff88011f7fd000 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000010
    RBP: ffff8800d87a3e98 R08: ffffffff81d002c8 R09: ffff8800d87a3f50
    R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000060
    R13: ffffffff81c3c3e0 R14: ffffffff81cfddf8 R15: ffff8801193b0800
    FS:  00007f614f1e9740(0000) GS:ffff88011f440000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007f614f1fa000 CR3: 00000000d9291000 CR4: 00000000000006e0
    Call Trace:
      proc_sys_call_handler+0xb3/0xc0
      proc_sys_write+0x14/0x20
      vfs_write+0xba/0x1e0
      SyS_write+0x46/0xb0
      tracesys+0xe1/0xe6

However, if the percpu_pagelist_fraction sysctl is set by the user, it
is also impossible to restore it to the kernel default since the user
cannot write 0 to the sysctl.

This patch allows the user to write 0 to restore the default behavior.
It still requires a fraction equal to or larger than 8, however, as
stated by the documentation for sanity.  If a value in the range [1, 7]
is written, the sysctl will return EINVAL.

This successfully solves the divide by zero issue at the same time.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Reported-by: Oleg Drokin &lt;green@linuxhacker.ru&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sysctl: allow for strict write position handling</title>
<updated>2014-06-06T23:08:13Z</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2014-06-06T21:37:19Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=f4aacea2f5d1a5f7e3154e967d70cf3f711bcd61'/>
<id>urn:sha1:f4aacea2f5d1a5f7e3154e967d70cf3f711bcd61</id>
<content type='text'>
When writing to a sysctl string, each write, regardless of VFS position,
begins writing the string from the start.  This means the contents of
the last write to the sysctl controls the string contents instead of the
first:

  open("/proc/sys/kernel/modprobe", O_WRONLY)   = 1
  write(1, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 4096) = 4096
  write(1, "/bin/true", 9)                = 9
  close(1)                                = 0

  $ cat /proc/sys/kernel/modprobe
  /bin/true

Expected behaviour would be to have the sysctl be "AAAA..." capped at
maxlen (in this case KMOD_PATH_LEN: 256), instead of truncating to the
contents of the second write.  Similarly, multiple short writes would
not append to the sysctl.

The old behavior is unlike regular POSIX files enough that doing audits
of software that interact with sysctls can end up in unexpected or
dangerous situations.  For example, "as long as the input starts with a
trusted path" turns out to be an insufficient filter, as what must also
happen is for the input to be entirely contained in a single write
syscall -- not a common consideration, especially for high level tools.

This provides kernel.sysctl_writes_strict as a way to make this behavior
act in a less surprising manner for strings, and disallows non-zero file
position when writing numeric sysctls (similar to what is already done
when reading from non-zero file positions).  For now, the default (0) is
to warn about non-zero file position use, but retain the legacy
behavior.  Setting this to -1 disables the warning, and setting this to
1 enables the file position respecting behavior.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: move misplaced hunk, per Randy]
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Documentation/sysctl/vm.txt: clarify vfs_cache_pressure description</title>
<updated>2014-06-04T23:54:13Z</updated>
<author>
<name>Denys Vlasenko</name>
<email>dvlasenk@redhat.com</email>
</author>
<published>2014-06-04T23:11:03Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=4a0da71b96b9d4080c0820e9e7d02470ebe62dc6'/>
<id>urn:sha1:4a0da71b96b9d4080c0820e9e7d02470ebe62dc6</id>
<content type='text'>
Existing description is worded in a way which almost encourages setting of
vfs_cache_pressure above 100, possibly way above it.

Users are left in a dark what this numeric value is - an int?  a
percentage?  what the scale is?

As a result, we are getting reports about noticeable performance
degradation from users who have set vfs_cache_pressure to ridiculously
high values - because they thought there is no downside to it.

Via code inspection it's obvious that this value is treated as a
percentage.  This patch changes text to reflect this fact, and adds a
cautionary paragraph advising against setting vfs_cache_pressure sky high.

Signed-off-by: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: disable zone_reclaim_mode by default</title>
<updated>2014-06-04T23:53:59Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2014-06-04T23:07:14Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=4f9b16a64753d0bb607454347036dc997fd03b82'/>
<id>urn:sha1:4f9b16a64753d0bb607454347036dc997fd03b82</id>
<content type='text'>
When it was introduced, zone_reclaim_mode made sense as NUMA distances
punished and workloads were generally partitioned to fit into a NUMA
node.  NUMA machines are now common but few of the workloads are
NUMA-aware and it's routine to see major performance degradation due to
zone_reclaim_mode being enabled but relatively few can identify the
problem.

Those that require zone_reclaim_mode are likely to be able to detect
when it needs to be enabled and tune appropriately so lets have a
sensible default for the bulk of users.

This patch (of 2):

zone_reclaim_mode causes processes to prefer reclaiming memory from
local node instead of spilling over to other nodes.  This made sense
initially when NUMA machines were almost exclusively HPC and the
workload was partitioned into nodes.  The NUMA penalties were
sufficiently high to justify reclaiming the memory.  On current machines
and workloads it is often the case that zone_reclaim_mode destroys
performance but not all users know how to detect this.  Favour the
common case and disable it by default.  Users that are sophisticated
enough to know they need zone_reclaim_mode will detect it.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Zhang Yanfei &lt;zhangyanfei@cn.fujitsu.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Reviewed-by: Christoph Lameter &lt;cl@linux.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>hung_task: check the value of "sysctl_hung_task_timeout_sec"</title>
<updated>2014-04-07T23:36:07Z</updated>
<author>
<name>Liu Hua</name>
<email>sdu.liu@huawei.com</email>
</author>
<published>2014-04-07T22:38:57Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=80df28476505ed4e6701c3448c63c9229a50c655'/>
<id>urn:sha1:80df28476505ed4e6701c3448c63c9229a50c655</id>
<content type='text'>
As sysctl_hung_task_timeout_sec is unsigned long, when this value is
larger then LONG_MAX/HZ, the function schedule_timeout_interruptible in
watchdog will return immediately without sleep and with print :

  schedule_timeout: wrong timeout value ffffffffffffff83

and then the funtion watchdog will call schedule_timeout_interruptible
again and again.  The screen will be filled with

	"schedule_timeout: wrong timeout value ffffffffffffff83"

This patch does some check and correction in sysctl, to let the function
schedule_timeout_interruptible allways get the valid parameter.

Signed-off-by: Liu Hua &lt;sdu.liu@huawei.com&gt;
Tested-by: Satoru Takeuchi &lt;satoru.takeuchi@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[3.4+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux</title>
<updated>2014-04-06T16:38:07Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-04-06T16:38:07Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=6f4c98e1c22c28e00b8f050cce895a6b74db15d1'/>
<id>urn:sha1:6f4c98e1c22c28e00b8f050cce895a6b74db15d1</id>
<content type='text'>
Pull module updates from Rusty Russell:
 "Nothing major: the stricter permissions checking for sysfs broke a
  staging driver; fix included.  Greg KH said he'd take the patch but
  hadn't as the merge window opened, so it's included here to avoid
  breaking build"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  staging: fix up speakup kobject mode
  Use 'E' instead of 'X' for unsigned module taint flag.
  VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.
  kallsyms: fix percpu vars on x86-64 with relocation.
  kallsyms: generalize address range checking
  module: LLVMLinux: Remove unused function warning from __param_check macro
  Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE
  module: remove MODULE_GENERIC_TABLE
  module: allow multiple calls to MODULE_DEVICE_TABLE() per module
  module: use pr_cont
</content>
</entry>
<entry>
<title>drop_caches: add some documentation and info message</title>
<updated>2014-04-03T23:21:04Z</updated>
<author>
<name>Dave Hansen</name>
<email>dave@linux.vnet.ibm.com</email>
</author>
<published>2014-04-03T21:48:19Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=5509a5d27b971a90b940e148ca9ca53312e4fa7a'/>
<id>urn:sha1:5509a5d27b971a90b940e148ca9ca53312e4fa7a</id>
<content type='text'>
There is plenty of anecdotal evidence and a load of blog posts
suggesting that using "drop_caches" periodically keeps your system
running in "tip top shape".  Perhaps adding some kernel documentation
will increase the amount of accurate data on its use.

If we are not shrinking caches effectively, then we have real bugs.
Using drop_caches will simply mask the bugs and make them harder to
find, but certainly does not fix them, nor is it an appropriate
"workaround" to limit the size of the caches.  On the contrary, there
have been bug reports on issues that turned out to be misguided use of
cache dropping.

Dropping caches is a very drastic and disruptive operation that is good
for debugging and running tests, but if it creates bug reports from
production use, kernel developers should be aware of its use.

Add a bit more documentation about it, a syslog message to track down
abusers, and vmstat drop counters to help analyze problem reports.

[akpm@linux-foundation.org: checkpatch fixes]
[hannes@cmpxchg.org: add runtime suppression control]
Signed-off-by: Dave Hansen &lt;dave@linux.vnet.ibm.com&gt;
Signed-off-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2014-03-31T18:21:19Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-03-31T18:21:19Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=971eae7c99212dd67b425a603f1fe3b763359907'/>
<id>urn:sha1:971eae7c99212dd67b425a603f1fe3b763359907</id>
<content type='text'>
Pull scheduler changes from Ingo Molnar:
 "Bigger changes:

   - sched/idle restructuring: they are WIP preparation for deeper
     integration between the scheduler and idle state selection, by
     Nicolas Pitre.

   - add NUMA scheduling pseudo-interleaving, by Rik van Riel.

   - optimize cgroup context switches, by Peter Zijlstra.

   - RT scheduling enhancements, by Thomas Gleixner.

  The rest is smaller changes, non-urgnt fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (68 commits)
  sched: Clean up the task_hot() function
  sched: Remove double calculation in fix_small_imbalance()
  sched: Fix broken setscheduler()
  sparc64, sched: Remove unused sparc64_multi_core
  sched: Remove unused mc_capable() and smt_capable()
  sched/numa: Move task_numa_free() to __put_task_struct()
  sched/fair: Fix endless loop in idle_balance()
  sched/core: Fix endless loop in pick_next_task()
  sched/fair: Push down check for high priority class task into idle_balance()
  sched/rt: Fix picking RT and DL tasks from empty queue
  trace: Replace hardcoding of 19 with MAX_NICE
  sched: Guarantee task priority in pick_next_task()
  sched/idle: Remove stale old file
  sched: Put rq's sched_avg under CONFIG_FAIR_GROUP_SCHED
  cpuidle/arm64: Remove redundant cpuidle_idle_call()
  cpuidle/powernv: Remove redundant cpuidle_idle_call()
  sched, nohz: Exclude isolated cores from load balancing
  sched: Fix select_task_rq_fair() description comments
  workqueue: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE
  sys: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE
  ...
</content>
</entry>
<entry>
<title>Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE</title>
<updated>2014-03-13T01:41:51Z</updated>
<author>
<name>Mathieu Desnoyers</name>
<email>mathieu.desnoyers@efficios.com</email>
</author>
<published>2014-03-13T01:41:30Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=66cc69e34e86a231fbe68d8918c6119e3b7549a3'/>
<id>urn:sha1:66cc69e34e86a231fbe68d8918c6119e3b7549a3</id>
<content type='text'>
Users have reported being unable to trace non-signed modules loaded
within a kernel supporting module signature.

This is caused by tracepoint.c:tracepoint_module_coming() refusing to
take into account tracepoints sitting within force-loaded modules
(TAINT_FORCED_MODULE). The reason for this check, in the first place, is
that a force-loaded module may have a struct module incompatible with
the layout expected by the kernel, and can thus cause a kernel crash
upon forced load of that module on a kernel with CONFIG_TRACEPOINTS=y.

Tracepoints, however, specifically accept TAINT_OOT_MODULE and
TAINT_CRAP, since those modules do not lead to the "very likely system
crash" issue cited above for force-loaded modules.

With kernels having CONFIG_MODULE_SIG=y (signed modules), a non-signed
module is tainted re-using the TAINT_FORCED_MODULE taint flag.
Unfortunately, this means that Tracepoints treat that module as a
force-loaded module, and thus silently refuse to consider any tracepoint
within this module.

Since an unsigned module does not fit within the "very likely system
crash" category of tainting, add a new TAINT_UNSIGNED_MODULE taint flag
to specifically address this taint behavior, and accept those modules
within Tracepoints. We use the letter 'X' as a taint flag character for
a module being loaded that doesn't know how to sign its name (proposed
by Steven Rostedt).

Also add the missing 'O' entry to trace event show_module_flags() list
for the sake of completeness.

Signed-off-by: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Acked-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
NAKed-by: Ingo Molnar &lt;mingo@redhat.com&gt;
CC: Thomas Gleixner &lt;tglx@linutronix.de&gt;
CC: David Howells &lt;dhowells@redhat.com&gt;
CC: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
</content>
</entry>
</feed>
