<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel, branch v2.6.33.19</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/kernel?h=v2.6.33.19</id>
<link rel='self' href='https://git.amat.us/linux/atom/kernel?h=v2.6.33.19'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2011-08-29T21:33:00Z</updated>
<entry>
<title>futex: Fix regression with read only mappings</title>
<updated>2011-08-29T21:33:00Z</updated>
<author>
<name>Shawn Bohrer</name>
<email>sbohrer@rgmadvisors.com</email>
</author>
<published>2011-06-30T16:21:32Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=1d57518f2aa045a64886781a66b42e072a9e7bf6'/>
<id>urn:sha1:1d57518f2aa045a64886781a66b42e072a9e7bf6</id>
<content type='text'>
commit 9ea71503a8ed9184d2d0b8ccc4d269d05f7940ae upstream.

commit 7485d0d3758e8e6491a5c9468114e74dc050785d (futexes: Remove rw
parameter from get_futex_key()) in 2.6.33 fixed two problems:  First, It
prevented a loop when encountering a ZERO_PAGE. Second, it fixed RW
MAP_PRIVATE futex operations by forcing the COW to occur by
unconditionally performing a write access get_user_pages_fast() to get
the page.  The commit also introduced a user-mode regression in that it
broke futex operations on read-only memory maps.  For example, this
breaks workloads that have one or more reader processes doing a
FUTEX_WAIT on a futex within a read only shared file mapping, and a
writer processes that has a writable mapping issuing the FUTEX_WAKE.

This fixes the regression for valid futex operations on RO mappings by
trying a RO get_user_pages_fast() when the RW get_user_pages_fast()
fails. This change makes it necessary to also check for invalid use
cases, such as anonymous RO mappings (which can never change) and the
ZERO_PAGE which the commit referenced above was written to address.

This patch does restore the original behavior with RO MAP_PRIVATE
mappings, which have inherent user-mode usage problems and don't really
make sense.  With this patch performing a FUTEX_WAIT within a RO
MAP_PRIVATE mapping will be successfully woken provided another process
updates the region of the underlying mapped file.  However, the mmap()
man page states that for a MAP_PRIVATE mapping:

  It is unspecified whether changes made to the file after
  the mmap() call are visible in the mapped region.

So user-mode users attempting to use futex operations on RO MAP_PRIVATE
mappings are depending on unspecified behavior.  Additionally a
RO MAP_PRIVATE mapping could fail to wake up in the following case.

  Thread-A: call futex(FUTEX_WAIT, memory-region-A).
            get_futex_key() return inode based key.
            sleep on the key
  Thread-B: call mprotect(PROT_READ|PROT_WRITE, memory-region-A)
  Thread-B: write memory-region-A.
            COW happen. This process's memory-region-A become related
            to new COWed private (ie PageAnon=1) page.
  Thread-B: call futex(FUETX_WAKE, memory-region-A).
            get_futex_key() return mm based key.
            IOW, we fail to wake up Thread-A.

Once again doing something like this is just silly and users who do
something like this get what they deserve.

While RO MAP_PRIVATE mappings are nonsensical, checking for a private
mapping requires walking the vmas and was deemed too costly to avoid a
userspace hang.

This Patch is based on Peter Zijlstra's initial patch with modifications to
only allow RO mappings for futex operations that need VERIFY_READ access.

Reported-by: David Oliver &lt;david@rgmadvisors.com&gt;
Signed-off-by: Shawn Bohrer &lt;sbohrer@rgmadvisors.com&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: peterz@infradead.org
Cc: eric.dumazet@gmail.com
Cc: zvonler@rgmadvisors.com
Cc: hughd@google.com
Link: http://lkml.kernel.org/r/1309450892-30676-1-git-send-email-sbohrer@rgmadvisors.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;


</content>
</entry>
<entry>
<title>PM / Hibernate: Fix free_unnecessary_pages()</title>
<updated>2011-07-13T03:31:26Z</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rjw@sisk.pl</email>
</author>
<published>2011-07-06T18:15:23Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=386f12d6c880e3c7b9a31da1c7b5cca3500470d0'/>
<id>urn:sha1:386f12d6c880e3c7b9a31da1c7b5cca3500470d0</id>
<content type='text'>
commit 4d4cf23cdde2f8f9324f5684a7f349e182039529 upstream.

There is a bug in free_unnecessary_pages() that causes it to
attempt to free too many pages in some cases, which triggers the
BUG_ON() in memory_bm_clear_bit() for copy_bm.  Namely, if
count_data_pages() is initially greater than alloc_normal, we get
to_free_normal equal to 0 and "save" greater from 0.  In that case,
if the sum of "save" and count_highmem_pages() is greater than
alloc_highmem, we subtract a positive number from to_free_normal.
Hence, since to_free_normal was 0 before the subtraction and is
an unsigned int, the result is converted to a huge positive number
that is used as the number of pages to free.

Fix this bug by checking if to_free_normal is actually greater
than or equal to the number we're going to subtract from it.

Signed-off-by: Rafael J. Wysocki &lt;rjw@sisk.pl&gt;
Reported-and-tested-by: Matthew Garrett &lt;mjg@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>PM / Hibernate: Avoid hitting OOM during preallocation of memory</title>
<updated>2011-07-13T03:31:26Z</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rjw@sisk.pl</email>
</author>
<published>2010-09-11T18:58:27Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=87f97028802669d593a5f8b99249d47b02a08a30'/>
<id>urn:sha1:87f97028802669d593a5f8b99249d47b02a08a30</id>
<content type='text'>
commit 6715045ddc7472a22be5e49d4047d2d89b391f45 upstream.

There is a problem in hibernate_preallocate_memory() that it calls
preallocate_image_memory() with an argument that may be greater than
the total number of available non-highmem memory pages.  If that's
the case, the OOM condition is guaranteed to trigger, which in turn
can cause significant slowdown to occur during hibernation.

To avoid that, make preallocate_image_memory() adjust its argument
before calling preallocate_image_pages(), so that the total number of
saveable non-highem pages left is not less than the minimum size of
a hibernation image.  Change hibernate_preallocate_memory() to try to
allocate from highmem if the number of pages allocated by
preallocate_image_memory() is too low.

Modify free_unnecessary_pages() to take all possible memory
allocation patterns into account.

Reported-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rjw@sisk.pl&gt;
Tested-by: M. Vefa Bicakci &lt;bicave@superonline.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>taskstats: don't allow duplicate entries in listener mode</title>
<updated>2011-07-13T03:31:25Z</updated>
<author>
<name>Vasiliy Kulikov</name>
<email>segoon@openwall.com</email>
</author>
<published>2011-06-27T23:18:11Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=dbc6022c6cf86812bc622378913fc73fa511201c'/>
<id>urn:sha1:dbc6022c6cf86812bc622378913fc73fa511201c</id>
<content type='text'>
commit 26c4caea9d697043cc5a458b96411b86d7f6babd upstream.

Currently a single process may register exit handlers unlimited times.
It may lead to a bloated listeners chain and very slow process
terminations.

Eg after 10KK sent TASKSTATS_CMD_ATTR_REGISTER_CPUMASKs ~300 Mb of
kernel memory is stolen for the handlers chain and "time id" shows 2-7
seconds instead of normal 0.003.  It makes it possible to exhaust all
kernel memory and to eat much of CPU time by triggerring numerous exits
on a single CPU.

The patch limits the number of times a single process may register
itself on a single CPU to one.

One little issue is kept unfixed - as taskstats_exit() is called before
exit_files() in do_exit(), the orphaned listener entry (if it was not
explicitly deregistered) is kept until the next someone's exit() and
implicit deregistration in send_cpu_listeners().  So, if a process
registered itself as a listener exits and the next spawned process gets
the same pid, it would inherit taskstats attributes.

Signed-off-by: Vasiliy Kulikov &lt;segooon@gmail.com&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>PM: Free memory bitmaps if opening /dev/snapshot fails</title>
<updated>2011-07-13T03:31:24Z</updated>
<author>
<name>Michal Kubecek</name>
<email>mkubecek@suse.cz</email>
</author>
<published>2011-06-18T18:34:01Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=92e438c204e4ae51e706725b6c007591e62504f2'/>
<id>urn:sha1:92e438c204e4ae51e706725b6c007591e62504f2</id>
<content type='text'>
commit 8440f4b19494467883f8541b7aa28c7bbf6ac92b upstream.

When opening /dev/snapshot device, snapshot_open() creates memory
bitmaps which are freed in snapshot_release(). But if any of the
callbacks called by pm_notifier_call_chain() returns NOTIFY_BAD, open()
fails, snapshot_release() is never called and bitmaps are not freed.
Next attempt to open /dev/snapshot then triggers BUG_ON() check in
create_basic_memory_bitmaps(). This happens e.g. when vmwatchdog module
is active on s390x.

Signed-off-by: Michal Kubecek &lt;mkubecek@suse.cz&gt;
Signed-off-by: Rafael J. Wysocki &lt;rjw@sisk.pl&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>clocksource: Make watchdog robust vs. interruption</title>
<updated>2011-07-13T03:31:23Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2011-06-16T14:22:08Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=43b951e4185f3132e77e6340d1aed42e90618e4b'/>
<id>urn:sha1:43b951e4185f3132e77e6340d1aed42e90618e4b</id>
<content type='text'>
commit b5199515c25cca622495eb9c6a8a1d275e775088 upstream.

The clocksource watchdog code is interruptible and it has been
observed that this can trigger false positives which disable the TSC.

The reason is that an interrupt storm or a long running interrupt
handler between the read of the watchdog source and the read of the
TSC brings the two far enough apart that the delta is larger than the
unstable treshold. Move both reads into a short interrupt disabled
region to avoid that.

Reported-and-tested-by: Vernon Mauery &lt;vernux@us.ibm.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>lockdep: Fix lock_is_held() on recursion</title>
<updated>2011-06-23T22:28:40Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2011-06-06T10:32:43Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=904b61178e165b531ea656c200719d2e585d313d'/>
<id>urn:sha1:904b61178e165b531ea656c200719d2e585d313d</id>
<content type='text'>
commit f2513cde93f0957d5dc6c09bc24b0cccd27d8e1d upstream.

The main lock_is_held() user is lockdep_assert_held(), avoid false
assertions in lockdep_off() sections by unconditionally reporting the
lock is taken.

[ the reason this is important is a lockdep_assert_held() in ttwu()
  which triggers a warning under lockdep_off() as in printk() which
  can trigger another wakeup and lock up due to spinlock
  recursion, as reported and heroically debugged by Arne Jansen ]

Reported-and-tested-by: Arne Jansen &lt;lists@die-jansens.de&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1307398759.2497.966.camel@laptop
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>time: Compensate for rounding on odd-frequency clocksources</title>
<updated>2011-06-23T22:28:35Z</updated>
<author>
<name>Kasper Pedersen</name>
<email>kkp2010@kasperkp.dk</email>
</author>
<published>2010-10-20T22:55:15Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=73048d1e37649b9cfc45fbc2c51383994354a35f'/>
<id>urn:sha1:73048d1e37649b9cfc45fbc2c51383994354a35f</id>
<content type='text'>
commit a386b5af8edda1c742ce9f77891e112eefffc005 upstream.

When the clocksource is not a multiple of HZ, the clock will be off.  For
acpi_pm, HZ=1000 the error is 127.111 ppm:

The rounding of cycle_interval ends up generating a false error term in
ntp_error accumulation since xtime_interval is not exactly 1/HZ.  So, we
subtract out the error caused by the rounding.

This has been visible since 2.6.32-rc2
	commit a092ff0f90cae22b2ac8028ecd2c6f6c1a9e4601
	time: Implement logarithmic time accumulation
That commit raised NTP_INTERVAL_FREQ and exposed the rounding error.

testing tool: http://n1.taur.dk/permanent/testpmt.c
Also tested with ntpd and a frequency counter.

Signed-off-by: Kasper Pedersen &lt;kkp2010@kasperkp.dk&gt;
Acked-by: john stultz &lt;johnstul@us.ibm.com&gt;
Cc: John Kacur &lt;jkacur@redhat.com&gt;
Cc: Clark Williams &lt;williams@redhat.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>ftrace: Only update the function code on write to filter files</title>
<updated>2011-06-23T22:28:33Z</updated>
<author>
<name>Steven Rostedt</name>
<email>srostedt@redhat.com</email>
</author>
<published>2011-04-30T02:35:33Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=7116a3c50456ac9634ce305fd51ad24fcbf13e58'/>
<id>urn:sha1:7116a3c50456ac9634ce305fd51ad24fcbf13e58</id>
<content type='text'>
commit 058e297d34a404caaa5ed277de15698d8dc43000 upstream.

If function tracing is enabled, a read of the filter files will
cause the call to stop_machine to update the function trace sites.
It should only call stop_machine on write.

Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>tick: Clear broadcast active bit when switching to oneshot</title>
<updated>2011-05-23T18:23:01Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2011-05-16T09:07:48Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d338c626a587146115812da877de871324b33f65'/>
<id>urn:sha1:d338c626a587146115812da877de871324b33f65</id>
<content type='text'>
commit 07f4beb0b5bbfaf36a64aa00d59e670ec578a95a upstream.

The first cpu which switches from periodic to oneshot mode switches
also the broadcast device into oneshot mode. The broadcast device
serves as a backup for per cpu timers which stop in deeper
C-states. To avoid starvation of the cpus which might be in idle and
depend on broadcast mode it marks the other cpus as broadcast active
and sets the brodcast expiry value of those cpus to the next tick.

The oneshot mode broadcast bit for the other cpus is sticky and gets
only cleared when those cpus exit idle. If a cpu was not idle while
the bit got set in consequence the bit prevents that the broadcast
device is armed on behalf of that cpu when it enters idle for the
first time after it switched to oneshot mode.

In most cases that goes unnoticed as one of the other cpus has usually
a timer pending which keeps the broadcast device armed with a short
timeout. Now if the only cpu which has a short timer active has the
bit set then the broadcast device will not be armed on behalf of that
cpu and will fire way after the expected timer expiry. In the case of
Christians bug report it took ~145 seconds which is about half of the
wrap around time of HPET (the limit for that device) due to the fact
that all other cpus had no timers armed which expired before the 145
seconds timeframe.

The solution is simply to clear the broadcast active bit
unconditionally when a cpu switches to oneshot mode after the first
cpu switched the broadcast device over. It's not idle at that point
otherwise it would not be executing that code.

[ I fundamentally hate that broadcast crap. Why the heck thought some
  folks that when going into deep idle it's a brilliant concept to
  switch off the last device which brings the cpu back from that
  state? ]

Thanks to Christian for providing all the valuable debug information!

Reported-and-tested-by: Christian Hoffmann &lt;email@christianhoffmann.info&gt;
Cc: John Stultz &lt;johnstul@us.ibm.com&gt;
Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1105161105170.3078%40ionos%3E
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
</feed>
