<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm, branch v3.3</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/mm?h=v3.3</id>
<link rel='self' href='https://git.amat.us/linux/atom/mm?h=v3.3'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2012-03-16T00:03:03Z</updated>
<entry>
<title>memcg: free mem_cgroup by RCU to fix oops</title>
<updated>2012-03-16T00:03:03Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2012-03-15T22:17:07Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=59927fb984de1703c67bc640c3e522d8b5276c73'/>
<id>urn:sha1:59927fb984de1703c67bc640c3e522d8b5276c73</id>
<content type='text'>
After fixing the GPF in mem_cgroup_lru_del_list(), three times one
machine running a similar load (moving and removing memcgs while
swapping) has oopsed in mem_cgroup_zone_nr_lru_pages(), when retrieving
memcg zone numbers for get_scan_count() for shrink_mem_cgroup_zone():
this is where a struct mem_cgroup is first accessed after being chosen
by mem_cgroup_iter().

Just what protects a struct mem_cgroup from being freed, in between
mem_cgroup_iter()'s css_get_next() and its css_tryget()? css_tryget()
fails once css-&gt;refcnt is zero with CSS_REMOVED set in flags, yes: but
what if that memory is freed and reused for something else, which sets
"refcnt" non-zero? Hmm, and scope for an indefinite freeze if refcnt is
left at zero but flags are cleared.

It's tempting to move the css_tryget() into css_get_next(), to make it
really "get" the css, but I don't think that actually solves anything:
the same difficulty in moving from css_id found to stable css remains.

But we already have rcu_read_lock() around the two, so it's easily fixed
if __mem_cgroup_free() just uses kfree_rcu() to free mem_cgroup.

However, a big struct mem_cgroup is allocated with vzalloc() instead of
kzalloc(), and we're not allowed to vfree() at interrupt time: there
doesn't appear to be a general vfree_rcu() to help with this, so roll
our own using schedule_work().  The compiler decently removes
vfree_work() and vfree_rcu() when the config doesn't need them.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: revert fix to mapcount check for this release</title>
<updated>2012-03-09T23:32:20Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2012-03-09T21:37:32Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=be22aece684f5a700e6247b9861c3759d5798a3c'/>
<id>urn:sha1:be22aece684f5a700e6247b9861c3759d5798a3c</id>
<content type='text'>
Respectfully revert commit e6ca7b89dc76 "memcg: fix mapcount check
in move charge code for anonymous page" for the 3.3 release, so that
it behaves exactly like releases 2.6.35 through 3.2 in this respect.

Horiguchi-san's commit is correct in itself, 1 makes much more sense
than 2 in that check; but it does not go far enough - swapcount
should be considered too - if we really want such a check at all.

We appear to have reached agreement now, and expect that 3.4 will
remove the mapcount check, but had better not make 3.3 different.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Reviewed-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>vm: avoid using find_vma_prev() unnecessarily</title>
<updated>2012-03-07T02:23:36Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-03-07T02:23:36Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=097d59106a8e4b42d07c9892fdd7790f1659c6ff'/>
<id>urn:sha1:097d59106a8e4b42d07c9892fdd7790f1659c6ff</id>
<content type='text'>
Several users of "find_vma_prev()" were not in fact interested in the
previous vma if there was no primary vma to be found either.  And in
those cases, we're much better off just using the regular "find_vma()",
and then "prev" can be looked up by just checking vma-&gt;vm_prev.

The find_vma_prev() semantics are fairly subtle (see Mikulas' recent
commit 83cd904d271b: "mm: fix find_vma_prev"), and the whole "return
prev by reference" means that it generates worse code too.

Thus this "let's avoid using this inconvenient and clearly too subtle
interface when we don't really have to" patch.

Cc: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: fix find_vma_prev</title>
<updated>2012-03-07T00:48:03Z</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2012-03-05T00:52:03Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=83cd904d271ba960c53f3adbb037f3486518f1e6'/>
<id>urn:sha1:83cd904d271ba960c53f3adbb037f3486518f1e6</id>
<content type='text'>
Commit 6bd4837de96e ("mm: simplify find_vma_prev()") broke memory
management on PA-RISC.

After application of the patch, programs that allocate big arrays on the
stack crash with segfault, for example, this will crash if compiled
without optimization:

  int main()
  {
	char array[200000];
	array[199999] = 0;
	return 0;
  }

The reason is that PA-RISC has up-growing stack and the stack is usually
the last memory area.  In the above example, a page fault happens above
the stack.

Previously, if we passed too high address to find_vma_prev, it returned
NULL and stored the last VMA in *pprev.  After "simplify find_vma_prev"
change, it stores NULL in *pprev.  Consequently, the stack area is not
found and it is not expanded, as it used to be before the change.

This patch restores the old behavior and makes it return the last VMA in
*pprev if the requested address is higher than address of any other VMA.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Acked-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mmap: EINVAL not ENOMEM when rejecting VM_GROWS</title>
<updated>2012-03-06T21:49:08Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2012-03-06T20:28:52Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=ce8fea7aa4ad9e3b40999a08622ef27c77159659'/>
<id>urn:sha1:ce8fea7aa4ad9e3b40999a08622ef27c77159659</id>
<content type='text'>
Currently error is -ENOMEM when rejecting VM_GROWSDOWN|VM_GROWSUP
from shared anonymous: hoist the file case's -EINVAL up for both.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>page_cgroup: fix horrid swap accounting regression</title>
<updated>2012-03-06T16:18:23Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2012-03-06T04:52:55Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=c09ff089aa62380ad904ea785bd713c56720270e'/>
<id>urn:sha1:c09ff089aa62380ad904ea785bd713c56720270e</id>
<content type='text'>
Why is memcg's swap accounting so broken? Insane counts, wrong
ownership, unfreeable structures, which later get freed and then
accessed after free.

Turns out to be a tiny a little 3.3-rc1 regression in 9fb4b7cc0724
"page_cgroup: add helper function to get swap_cgroup": the helper
function (actually named lookup_swap_cgroup()) returns an address using
void* arithmetic, but the structure in question is a short.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Reviewed-by: Bob Liu &lt;lliubbo@gmail.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Johannes Weiner &lt;jweiner@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'akpm' (Andrew's patch bomb)</title>
<updated>2012-03-05T23:50:25Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-03-05T23:50:25Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=3e85fb9cd4f711d70c5d26baa86e438390731eab'/>
<id>urn:sha1:3e85fb9cd4f711d70c5d26baa86e438390731eab</id>
<content type='text'>
Merge the emailed seties of 19 patches from Andrew Morton

* akpm:
  rapidio/tsi721: fix queue wrapping bug in inbound doorbell handler
  memcg: fix mapcount check in move charge code for anonymous page
  mm: thp: fix BUG on mm-&gt;nr_ptes
  alpha: fix 32/64-bit bug in futex support
  memcg: fix GPF when cgroup removal races with last exit
  debugobjects: Fix selftest for static warnings
  floppy/scsi: fix setting of BIO flags
  memcg: fix deadlock by inverting lrucare nesting
  drivers/rtc/rtc-r9701.c: fix crash in r9701_remove()
  c2port: class_create() returns an ERR_PTR
  pps: class_create() returns an ERR_PTR, not NULL
  hung_task: fix the broken rcu_lock_break() logic
  vfork: kill PF_STARTING
  coredump_wait: don't call complete_vfork_done()
  vfork: make it killable
  vfork: introduce complete_vfork_done()
  aio: wake up waiters when freeing unused kiocbs
  kprobes: return proper error code from register_kprobe()
  kmsg_dump: don't run on non-error paths by default
</content>
</entry>
<entry>
<title>memcg: fix mapcount check in move charge code for anonymous page</title>
<updated>2012-03-05T23:49:43Z</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2012-03-05T22:59:20Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=e6ca7b89dc76abf77c80887fed54e0a60c87c0a8'/>
<id>urn:sha1:e6ca7b89dc76abf77c80887fed54e0a60c87c0a8</id>
<content type='text'>
Currently the charge on shared anonyous pages is supposed not to moved in
task migration.  To implement this, we need to check that mapcount &gt; 1,
instread of &gt; 2.  So this patch fixes it.

Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Reviewed-by: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: thp: fix BUG on mm-&gt;nr_ptes</title>
<updated>2012-03-05T23:49:43Z</updated>
<author>
<name>Andrea Arcangeli</name>
<email>aarcange@redhat.com</email>
</author>
<published>2012-03-05T22:59:20Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=1c641e84719429bbfe62a95ed3545ee7fe24408f'/>
<id>urn:sha1:1c641e84719429bbfe62a95ed3545ee7fe24408f</id>
<content type='text'>
Dave Jones reports a few Fedora users hitting the BUG_ON(mm-&gt;nr_ptes...)
in exit_mmap() recently.

Quoting Hugh's discovery and explanation of the SMP race condition:

  "mm-&gt;nr_ptes had unusual locking: down_read mmap_sem plus
   page_table_lock when incrementing, down_write mmap_sem (or mm_users
   0) when decrementing; whereas THP is careful to increment and
   decrement it under page_table_lock.

   Now most of those paths in THP also hold mmap_sem for read or write
   (with appropriate checks on mm_users), but two do not: when
   split_huge_page() is called by hwpoison_user_mappings(), and when
   called by add_to_swap().

   It's conceivable that the latter case is responsible for the
   exit_mmap() BUG_ON mm-&gt;nr_ptes that has been reported on Fedora."

The simplest way to fix it without having to alter the locking is to make
split_huge_page() a noop in nr_ptes terms, so by counting the preallocated
pagetables that exists for every mapped hugepage.  It was an arbitrary
choice not to count them and either way is not wrong or right, because
they are not used but they're still allocated.

Reported-by: Dave Jones &lt;davej@redhat.com&gt;
Reported-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Josh Boyer &lt;jwboyer@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[3.0.x, 3.1.x, 3.2.x]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: fix GPF when cgroup removal races with last exit</title>
<updated>2012-03-05T23:49:43Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2012-03-05T22:59:18Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=7512102cf64d36e3c7444480273623c7aab3563f'/>
<id>urn:sha1:7512102cf64d36e3c7444480273623c7aab3563f</id>
<content type='text'>
When moving tasks from old memcg (with move_charge_at_immigrate on new
memcg), followed by removal of old memcg, hit General Protection Fault in
mem_cgroup_lru_del_list() (called from release_pages called from
free_pages_and_swap_cache from tlb_flush_mmu from tlb_finish_mmu from
exit_mmap from mmput from exit_mm from do_exit).

Somewhat reproducible, takes a few hours: the old struct mem_cgroup has
been freed and poisoned by SLAB_DEBUG, but mem_cgroup_lru_del_list() is
still trying to update its stats, and take page off lru before freeing.

A task, or a charge, or a page on lru: each secures a memcg against
removal.  In this case, the last task has been moved out of the old memcg,
and it is exiting: anonymous pages are uncharged one by one from the
memcg, as they are zapped from its pagetables, so the charge gets down to
0; but the pages themselves are queued in an mmu_gather for freeing.

Most of those pages will be on lru (and force_empty is careful to
lru_add_drain_all, to add pages from pagevec to lru first), but not
necessarily all: perhaps some have been isolated for page reclaim, perhaps
some isolated for other reasons.  So, force_empty may find no task, no
charge and no page on lru, and let the removal proceed.

There would still be no problem if these pages were immediately freed; but
typically (and the put_page_testzero protocol demands it) they have to be
added back to lru before they are found freeable, then removed from lru
and freed.  We don't see the issue when adding, because the
mem_cgroup_iter() loops keep their own reference to the memcg being
scanned; but when it comes to mem_cgroup_lru_del_list().

I believe this was not an issue in v3.2: there, PageCgroupAcctLRU and
PageCgroupUsed flags were used (like a trick with mirrors) to deflect view
of pc-&gt;mem_cgroup to the stable root_mem_cgroup when neither set.
38c5d72f3ebe ("memcg: simplify LRU handling by new rule") mercifully
removed those convolutions, but left this General Protection Fault.

But it's surprisingly easy to restore the old behaviour: just check
PageCgroupUsed in mem_cgroup_lru_add_list() (which decides on which lruvec
to add), and reset pc to root_mem_cgroup if page is uncharged.  A risky
change?  just going back to how it worked before; testing, and an audit of
uses of pc-&gt;mem_cgroup, show no problem.

And there's a nice bonus: with mem_cgroup_lru_add_list() itself making
sure that an uncharged page goes to root lru, mem_cgroup_reset_owner() no
longer has any purpose, and we can safely revert 4e5f01c2b9b9 ("memcg:
clear pc-&gt;mem_cgroup if necessary").

Calling update_page_reclaim_stat() after add_page_to_lru_list() in swap.c
is not strictly necessary: the lru_lock there, with RCU before memcg
structures are freed, makes mem_cgroup_get_reclaim_stat_from_page safe
without that; but it seems cleaner to rely on one dependency less.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
