<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm, branch v3.14</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/mm?h=v3.14</id>
<link rel='self' href='https://git.amat.us/linux/atom/mm?h=v3.14'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2014-03-21T05:09:09Z</updated>
<entry>
<title>mm: fix swapops.h:131 bug if remap_file_pages raced migration</title>
<updated>2014-03-21T05:09:09Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2014-03-21T04:52:17Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=7e09e738afd21ef99f047425fc0b0c9be8b03254'/>
<id>urn:sha1:7e09e738afd21ef99f047425fc0b0c9be8b03254</id>
<content type='text'>
Add remove_linear_migration_ptes_from_nonlinear(), to fix an interesting
little include/linux/swapops.h:131 BUG_ON(!PageLocked) found by trinity:
indicating that remove_migration_ptes() failed to find one of the
migration entries that was temporarily inserted.

The problem comes from remap_file_pages()'s switch from vma_interval_tree
(good for inserting the migration entry) to i_mmap_nonlinear list (no good
for locating it again); but can only be a problem if the remap_file_pages()
range does not cover the whole of the vma (zap_pte() clears the range).

remove_migration_ptes() needs a file_nonlinear method to go down the
i_mmap_nonlinear list, applying linear location to look for migration
entries in those vmas too, just in case there was this race.

The file_nonlinear method does need rmap_walk_control.arg to do this;
but it never needed vma passed in - vma comes from its own iteration.

Reported-and-tested-by: Dave Jones &lt;davej@redhat.com&gt;
Reported-and-tested-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: fix bad rss-counter if remap_file_pages raced migration</title>
<updated>2014-03-19T23:21:49Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2014-03-19T00:38:38Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=887843961c4b4681ee993c36d4997bf4b4aa8253'/>
<id>urn:sha1:887843961c4b4681ee993c36d4997bf4b4aa8253</id>
<content type='text'>
Fix some "Bad rss-counter state" reports on exit, arising from the
interaction between page migration and remap_file_pages(): zap_pte()
must count a migration entry when zapping it.

And yes, it is possible (though very unusual) to find an anon page or
swap entry in a VM_SHARED nonlinear mapping: coming from that horrid
get_user_pages(write, force) case which COWs even in a shared mapping.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Tested-by: Sasha Levin sasha.levin@oracle.com&gt;
Tested-by: Dave Jones davej@redhat.com&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@gmail.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/Kconfig: fix URL for zsmalloc benchmark</title>
<updated>2014-03-11T00:26:20Z</updated>
<author>
<name>Ben Hutchings</name>
<email>ben@decadent.org.uk</email>
</author>
<published>2014-03-10T22:49:46Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=2216ee853017f9c9371106c5c02d4fe42f61cbfa'/>
<id>urn:sha1:2216ee853017f9c9371106c5c02d4fe42f61cbfa</id>
<content type='text'>
The help text for CONFIG_PGTABLE_MAPPING has an incorrect URL.  While
we're at it, remove the unnecessary footnote notation.

Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Acked-by: Minchan Kim &lt;minchan@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/compaction: break out of loop on !PageBuddy in isolate_freepages_block</title>
<updated>2014-03-11T00:26:20Z</updated>
<author>
<name>Laura Abbott</name>
<email>lauraa@codeaurora.org</email>
</author>
<published>2014-03-10T22:49:44Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=2af120bc040c5ebcda156df6be6a66610ab6957f'/>
<id>urn:sha1:2af120bc040c5ebcda156df6be6a66610ab6957f</id>
<content type='text'>
We received several reports of bad page state when freeing CMA pages
previously allocated with alloc_contig_range:

    BUG: Bad page state in process Binder_A  pfn:63202
    page:d21130b0 count:0 mapcount:1 mapping:  (null) index:0x7dfbf
    page flags: 0x40080068(uptodate|lru|active|swapbacked)

Based on the page state, it looks like the page was still in use.  The
page flags do not make sense for the use case though.  Further debugging
showed that despite alloc_contig_range returning success, at least one
page in the range still remained in the buddy allocator.

There is an issue with isolate_freepages_block.  In strict mode (which
CMA uses), if any pages in the range cannot be isolated,
isolate_freepages_block should return failure 0.  The current check
keeps track of the total number of isolated pages and compares against
the size of the range:

        if (strict &amp;&amp; nr_strict_required &gt; total_isolated)
                total_isolated = 0;

After taking the zone lock, if one of the pages in the range is not in
the buddy allocator, we continue through the loop and do not increment
total_isolated.  If in the last iteration of the loop we isolate more
than one page (e.g.  last page needed is a higher order page), the check
for total_isolated may pass and we fail to detect that a page was
skipped.  The fix is to bail out if the loop immediately if we are in
strict mode.  There's no benfit to continuing anyway since we need all
pages to be isolated.  Additionally, drop the error checking based on
nr_strict_required and just check the pfn ranges.  This matches with
what isolate_freepages_range does.

Signed-off-by: Laura Abbott &lt;lauraa@codeaurora.org&gt;
Acked-by: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Acked-by: Bartlomiej Zolnierkiewicz &lt;b.zolnierkie@samsung.com&gt;
Acked-by: Michal Nazarewicz &lt;mina86@mina86.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: fix GFP_THISNODE callers and clarify</title>
<updated>2014-03-11T00:26:19Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2014-03-10T22:49:43Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=e97ca8e5b864f88b028c1759ba8536fa827d6d96'/>
<id>urn:sha1:e97ca8e5b864f88b028c1759ba8536fa827d6d96</id>
<content type='text'>
GFP_THISNODE is for callers that implement their own clever fallback to
remote nodes.  It restricts the allocation to the specified node and
does not invoke reclaim, assuming that the caller will take care of it
when the fallback fails, e.g.  through a subsequent allocation request
without GFP_THISNODE set.

However, many current GFP_THISNODE users only want the node exclusive
aspect of the flag, without actually implementing their own fallback or
triggering reclaim if necessary.  This results in things like page
migration failing prematurely even when there is easily reclaimable
memory available, unless kswapd happens to be running already or a
concurrent allocation attempt triggers the necessary reclaim.

Convert all callsites that don't implement their own fallback strategy
to __GFP_THISNODE.  This restricts the allocation a single node too, but
at the same time allows the allocator to enter the slowpath, wake
kswapd, and invoke direct reclaim if necessary, to make the allocation
happen when memory is full.

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Jan Stancek &lt;jstancek@redhat.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: page_alloc: exempt GFP_THISNODE allocations from zone fairness</title>
<updated>2014-03-04T15:55:50Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2014-03-03T23:38:41Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=27329369c9ecf37771b2a65202cbf5578cff3331'/>
<id>urn:sha1:27329369c9ecf37771b2a65202cbf5578cff3331</id>
<content type='text'>
Jan Stancek reports manual page migration encountering allocation
failures after some pages when there is still plenty of memory free, and
bisected the problem down to commit 81c0a2bb515f ("mm: page_alloc: fair
zone allocator policy").

The problem is that GFP_THISNODE obeys the zone fairness allocation
batches on one hand, but doesn't reset them and wake kswapd on the other
hand.  After a few of those allocations, the batches are exhausted and
the allocations fail.

Fixing this means either having GFP_THISNODE wake up kswapd, or
GFP_THISNODE not participating in zone fairness at all.  The latter
seems safer as an acute bugfix, we can clean up later.

Reported-by: Jan Stancek &lt;jstancek@redhat.com&gt;
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: &lt;stable@kernel.org&gt;		[3.12+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking</title>
<updated>2014-03-04T15:55:48Z</updated>
<author>
<name>Vlastimil Babka</name>
<email>vbabka@suse.cz</email>
</author>
<published>2014-03-03T23:38:27Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9050d7eba40b3d79551668f54e68fd6f51945ef3'/>
<id>urn:sha1:9050d7eba40b3d79551668f54e68fd6f51945ef3</id>
<content type='text'>
Daniel Borkmann reported a VM_BUG_ON assertion failing:

  ------------[ cut here ]------------
  kernel BUG at mm/mlock.c:528!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: ccm arc4 iwldvm [...]
   video
  CPU: 3 PID: 2266 Comm: netsniff-ng Not tainted 3.14.0-rc2+ #8
  Hardware name: LENOVO 2429BP3/2429BP3, BIOS G4ET37WW (1.12 ) 05/29/2012
  task: ffff8801f87f9820 ti: ffff88002cb44000 task.ti: ffff88002cb44000
  RIP: 0010:[&lt;ffffffff81171ad0&gt;]  [&lt;ffffffff81171ad0&gt;] munlock_vma_pages_range+0x2e0/0x2f0
  Call Trace:
    do_munmap+0x18f/0x3b0
    vm_munmap+0x41/0x60
    SyS_munmap+0x22/0x30
    system_call_fastpath+0x1a/0x1f
  RIP   munlock_vma_pages_range+0x2e0/0x2f0
  ---[ end trace a0088dcf07ae10f2 ]---

because munlock_vma_pages_range() thinks it's unexpectedly in the middle
of a THP page.  This can be reproduced with default config since 3.11
kernels.  A reproducer can be found in the kernel's selftest directory
for networking by running ./psock_tpacket.

The problem is that an order=2 compound page (allocated by
alloc_one_pg_vec_page() is part of the munlocked VM_MIXEDMAP vma (mapped
by packet_mmap()) and mistaken for a THP page and assumed to be order=9.

The checks for THP in munlock came with commit ff6a6da60b89 ("mm:
accelerate munlock() treatment of THP pages"), i.e.  since 3.9, but did
not trigger a bug.  It just makes munlock_vma_pages_range() skip such
compound pages until the next 512-pages-aligned page, when it encounters
a head page.  This is however not a problem for vma's where mlocking has
no effect anyway, but it can distort the accounting.

Since commit 7225522bb429 ("mm: munlock: batch non-THP page isolation
and munlock+putback using pagevec") this can trigger a VM_BUG_ON in
PageTransHuge() check.

This patch fixes the issue by adding VM_MIXEDMAP flag to VM_SPECIAL, a
list of flags that make vma's non-mlockable and non-mergeable.  The
reasoning is that VM_MIXEDMAP vma's are similar to VM_PFNMAP, which is
already on the VM_SPECIAL list, and both are intended for non-LRU pages
where mlocking makes no sense anyway.  Related Lkml discussion can be
found in [2].

 [1] tools/testing/selftests/net/psock_tpacket
 [2] https://lkml.org/lkml/2014/1/10/427

Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Reported-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Tested-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Cc: Thomas Hellstrom &lt;thellstrom@vmware.com&gt;
Cc: John David Anglin &lt;dave.anglin@bell.net&gt;
Cc: HATAYAMA Daisuke &lt;d.hatayama@jp.fujitsu.com&gt;
Cc: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Cc: Carsten Otte &lt;cotte@de.ibm.com&gt;
Cc: Jared Hulbert &lt;jaredeh@gmail.com&gt;
Tested-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt; [3.11.x+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: reparent charges of children before processing parent</title>
<updated>2014-03-04T15:55:48Z</updated>
<author>
<name>Filipe Brandenburger</name>
<email>filbranden@google.com</email>
</author>
<published>2014-03-03T23:38:25Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=4fb1a86fb5e4209a7d4426d4e586c58e9edc74ac'/>
<id>urn:sha1:4fb1a86fb5e4209a7d4426d4e586c58e9edc74ac</id>
<content type='text'>
Sometimes the cleanup after memcg hierarchy testing gets stuck in
mem_cgroup_reparent_charges(), unable to bring non-kmem usage down to 0.

There may turn out to be several causes, but a major cause is this: the
workitem to offline parent can get run before workitem to offline child;
parent's mem_cgroup_reparent_charges() circles around waiting for the
child's pages to be reparented to its lrus, but it's holding
cgroup_mutex which prevents the child from reaching its
mem_cgroup_reparent_charges().

Further testing showed that an ordered workqueue for cgroup_destroy_wq
is not always good enough: percpu_ref_kill_and_confirm's call_rcu_sched
stage on the way can mess up the order before reaching the workqueue.

Instead, when offlining a memcg, call mem_cgroup_reparent_charges() on
all its children (and grandchildren, in the correct order) to have their
charges reparented first.

Fixes: e5fca243abae ("cgroup: use a dedicated workqueue for cgroup destruction")
Signed-off-by: Filipe Brandenburger &lt;filbranden@google.com&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Reviewed-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[v3.10+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: fix endless loop in __mem_cgroup_iter_next()</title>
<updated>2014-03-04T15:55:47Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2014-03-03T23:38:24Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=ce48225fe3b1b0d1fc9fceb96ac3d8a879e45114'/>
<id>urn:sha1:ce48225fe3b1b0d1fc9fceb96ac3d8a879e45114</id>
<content type='text'>
Commit 0eef615665ed ("memcg: fix css reference leak and endless loop in
mem_cgroup_iter") got the interaction with the commit a few before it
d8ad30559715 ("mm/memcg: iteration skip memcgs not yet fully
initialized") slightly wrong, and we didn't notice at the time.

It's elusive, and harder to get than the original, but for a couple of
days before rc1, I several times saw a endless loop similar to that
supposedly being fixed.

This time it was a tighter loop in __mem_cgroup_iter_next(): because we
can get here when our root has already been offlined, and the ordering
of conditions was such that we then just cycled around forever.

Fixes: 0eef615665ed ("memcg: fix css reference leak and endless loop in mem_cgroup_iter").
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Greg Thelen &lt;gthelen@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[3.12+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: close PageTail race</title>
<updated>2014-03-04T15:55:47Z</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2014-03-03T23:38:18Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=668f9abbd4334e6c29fa8acd71635c4f9101caa7'/>
<id>urn:sha1:668f9abbd4334e6c29fa8acd71635c4f9101caa7</id>
<content type='text'>
Commit bf6bddf1924e ("mm: introduce compaction and migration for
ballooned pages") introduces page_count(page) into memory compaction
which dereferences page-&gt;first_page if PageTail(page).

This results in a very rare NULL pointer dereference on the
aforementioned page_count(page).  Indeed, anything that does
compound_head(), including page_count() is susceptible to racing with
prep_compound_page() and seeing a NULL or dangling page-&gt;first_page
pointer.

This patch uses Andrea's implementation of compound_trans_head() that
deals with such a race and makes it the default compound_head()
implementation.  This includes a read memory barrier that ensures that
if PageTail(head) is true that we return a head page that is neither
NULL nor dangling.  The patch then adds a store memory barrier to
prep_compound_page() to ensure page-&gt;first_page is set.

This is the safest way to ensure we see the head page that we are
expecting, PageTail(page) is already in the unlikely() path and the
memory barriers are unfortunately required.

Hugetlbfs is the exception, we don't enforce a store memory barrier
during init since no race is possible.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Holger Kiehl &lt;Holger.Kiehl@dwd.de&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Rafael Aquini &lt;aquini@redhat.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
