<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm, branch v3.0.87</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/mm?h=v3.0.87</id>
<link rel='self' href='https://git.amat.us/linux/atom/mm?h=v3.0.87'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2013-06-20T18:28:20Z</updated>
<entry>
<title>mm: migration: add migrate_entry_wait_huge()</title>
<updated>2013-06-20T18:28:20Z</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2013-06-12T21:05:04Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=79848ba66d91e0c171ff203363e0c96629279c15'/>
<id>urn:sha1:79848ba66d91e0c171ff203363e0c96629279c15</id>
<content type='text'>
commit 30dad30922ccc733cfdbfe232090cf674dc374dc upstream.

When we have a page fault for the address which is backed by a hugepage
under migration, the kernel can't wait correctly and do busy looping on
hugepage fault until the migration finishes.  As a result, users who try
to kick hugepage migration (via soft offlining, for example) occasionally
experience long delay or soft lockup.

This is because pte_offset_map_lock() can't get a correct migration entry
or a correct page table lock for hugepage.  This patch introduces
migration_entry_wait_huge() to solve this.

Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Reviewed-by: Wanpeng Li &lt;liwanp@linux.vnet.ibm.com&gt;
Reviewed-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion</title>
<updated>2013-06-20T18:28:20Z</updated>
<author>
<name>Rafael Aquini</name>
<email>aquini@redhat.com</email>
</author>
<published>2013-06-12T21:04:49Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=78ef884ebb6c02d45abefc95350f428be7390a26'/>
<id>urn:sha1:78ef884ebb6c02d45abefc95350f428be7390a26</id>
<content type='text'>
commit cbab0e4eec299e9059199ebe6daf48730be46d2b upstream.

read_swap_cache_async() can race against get_swap_page(), and stumble
across a SWAP_HAS_CACHE entry in the swap map whose page wasn't brought
into the swapcache yet.

This transient swap_map state is expected to be transitory, but the
actual placement of discard at scan_swap_map() inserts a wait for I/O
completion thus making the thread at read_swap_cache_async() to loop
around its -EEXIST case, while the other end at get_swap_page() is
scheduled away at scan_swap_map().  This can leave the system deadlocked
if the I/O completion happens to be waiting on the CPU waitqueue where
read_swap_cache_async() is busy looping and !CONFIG_PREEMPT.

This patch introduces a cond_resched() call to make the aforementioned
read_swap_cache_async() busy loop condition to bail out when necessary,
thus avoiding the subtle race window.

Signed-off-by: Rafael Aquini &lt;aquini@redhat.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Shaohua Li &lt;shli@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm/THP: use pmd_populate() to update the pmd with pgtable_t pointer</title>
<updated>2013-06-07T19:46:37Z</updated>
<author>
<name>Aneesh Kumar K.V</name>
<email>aneesh.kumar@linux.vnet.ibm.com</email>
</author>
<published>2013-05-24T22:55:21Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=2f28357cd8f0be2c507d3a7d6f79c0cca0d7f9ce'/>
<id>urn:sha1:2f28357cd8f0be2c507d3a7d6f79c0cca0d7f9ce</id>
<content type='text'>
commit 7c3425123ddfdc5f48e7913ff59d908789712b18 upstream.

We should not use set_pmd_at to update pmd_t with pgtable_t pointer.
set_pmd_at is used to set pmd with huge pte entries and architectures
like ppc64, clear few flags from the pte when saving a new entry.
Without this change we observe bad pte errors like below on ppc64 with
THP enabled.

  BUG: Bad page map in process ld mm=0xc000001ee39f4780 pte:7fc3f37848000001 pmd:c000001ec0000000

Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Reviewed-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: mmu_notifier: re-fix freed page still mapped in secondary MMU</title>
<updated>2013-06-07T19:46:36Z</updated>
<author>
<name>Xiao Guangrong</name>
<email>xiaoguangrong@linux.vnet.ibm.com</email>
</author>
<published>2013-05-24T22:55:11Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=329d6f2ca0653e8a078637ed42ba259f5414e872'/>
<id>urn:sha1:329d6f2ca0653e8a078637ed42ba259f5414e872</id>
<content type='text'>
commit d34883d4e35c0a994e91dd847a82b4c9e0c31d83 upstream.

Commit 751efd8610d3 ("mmu_notifier_unregister NULL Pointer deref and
multiple -&gt;release()") breaks the fix 3ad3d901bbcf ("mm: mmu_notifier:
fix freed page still mapped in secondary MMU").

Since hlist_for_each_entry_rcu() is changed now, we can not revert that
patch directly, so this patch reverts the commit and simply fix the bug
spotted by that patch

This bug spotted by commit 751efd8610d3 is:

    There is a race condition between mmu_notifier_unregister() and
    __mmu_notifier_release().

    Assume two tasks, one calling mmu_notifier_unregister() as a result
    of a filp_close() -&gt;flush() callout (task A), and the other calling
    mmu_notifier_release() from an mmput() (task B).

                        A                               B
    t1                                            srcu_read_lock()
    t2            if (!hlist_unhashed())
    t3                                            srcu_read_unlock()
    t4            srcu_read_lock()
    t5                                            hlist_del_init_rcu()
    t6                                            synchronize_srcu()
    t7            srcu_read_unlock()
    t8            hlist_del_rcu()  &lt;--- NULL pointer deref.

This can be fixed by using hlist_del_init_rcu instead of hlist_del_rcu.

The another issue spotted in the commit is "multiple -&gt;release()
callouts", we needn't care it too much because it is really rare (e.g,
can not happen on kvm since mmu-notify is unregistered after
exit_mmap()) and the later call of multiple -&gt;release should be fast
since all the pages have already been released by the first call.
Anyway, this issue should be fixed in a separate patch.

-stable suggestions: Any version that has commit 751efd8610d3 need to be
backported.  I find the oldest version has this commit is 3.0-stable.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Xiao Guangrong &lt;xiaoguangrong@linux.vnet.ibm.com&gt;
Tested-by: Robin Holt &lt;holt@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm compaction: fix of improper cache flush in migration code</title>
<updated>2013-06-07T19:46:36Z</updated>
<author>
<name>Leonid Yegoshin</name>
<email>Leonid.Yegoshin@imgtec.com</email>
</author>
<published>2013-05-24T22:55:18Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=c0872911a5926b9c0a3e570cf8bf2a027275a664'/>
<id>urn:sha1:c0872911a5926b9c0a3e570cf8bf2a027275a664</id>
<content type='text'>
commit c2cc499c5bcf9040a738f49e8051b42078205748 upstream.

Page 'new' during MIGRATION can't be flushed with flush_cache_page().
Using flush_cache_page(vma, addr, pfn) is justified only if the page is
already placed in process page table, and that is done right after
flush_cache_page().  But without it the arch function has no knowledge
of process PTE and does nothing.

Besides that, flush_cache_page() flushes an application cache page, but
the kernel has a different page virtual address and dirtied it.

Replace it with flush_dcache_page(new) which is the proper usage.

The old page is flushed in try_to_unmap_one() before migration.

This bug takes place in Sead3 board with M14Kc MIPS CPU without cache
aliasing (but Harvard arch - separate I and D cache) in tight memory
environment (128MB) each 1-3days on SOAK test.  It fails in cc1 during
kernel build (SIGILL, SIGBUS, SIGSEG) if CONFIG_COMPACTION is switched
ON.

Signed-off-by: Leonid Yegoshin &lt;Leonid.Yegoshin@imgtec.com&gt;
Cc: Leonid Yegoshin &lt;yegoshin@mips.com&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Cc: Russell King &lt;rmk@arm.linux.org.uk&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>vm: add vm_iomap_memory() helper function</title>
<updated>2013-04-26T04:23:49Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-04-16T20:45:37Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d1a01d18320e37367e23f006f0dfbd74ff32de68'/>
<id>urn:sha1:d1a01d18320e37367e23f006f0dfbd74ff32de68</id>
<content type='text'>
commit b4cbb197c7e7a68dbad0d491242e3ca67420c13e upstream.

Various drivers end up replicating the code to mmap() their memory
buffers into user space, and our core memory remapping function may be
very flexible but it is unnecessarily complicated for the common cases
to use.

Our internal VM uses pfn's ("page frame numbers") which simplifies
things for the VM, and allows us to pass physical addresses around in a
denser and more efficient format than passing a "phys_addr_t" around,
and having to shift it up and down by the page size.  But it just means
that drivers end up doing that shifting instead at the interface level.

It also means that drivers end up mucking around with internal VM things
like the vma details (vm_pgoff, vm_start/end) way more than they really
need to.

So this just exports a function to map a certain physical memory range
into user space (using a phys_addr_t based interface that is much more
natural for a driver) and hides all the complexity from the driver.
Some drivers will still end up tweaking the vm_page_prot details for
things like prefetching or cacheability etc, but that's actually
relevant to the driver, rather than caring about what the page offset of
the mapping is into the particular IO memory region.

Acked-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;

</content>
</entry>
<entry>
<title>hugetlbfs: add swap entry check in follow_hugetlb_page()</title>
<updated>2013-04-26T04:23:47Z</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2013-04-17T22:58:30Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=6cf9b8f1a9ae1640f73cf8804484530e74eb9d5d'/>
<id>urn:sha1:6cf9b8f1a9ae1640f73cf8804484530e74eb9d5d</id>
<content type='text'>
commit 9cc3a5bd40067b9a0fbd49199d0780463fc2140f upstream.

With applying the previous patch "hugetlbfs: stop setting VM_DONTDUMP in
initializing vma(VM_HUGETLB)" to reenable hugepage coredump, if a memory
error happens on a hugepage and the affected processes try to access the
error hugepage, we hit VM_BUG_ON(atomic_read(&amp;page-&gt;_count) &lt;= 0) in
get_page().

The reason for this bug is that coredump-related code doesn't recognise
"hugepage hwpoison entry" with which a pmd entry is replaced when a memory
error occurs on a hugepage.

In other words, physical address information is stored in different bit
layout between hugepage hwpoison entry and pmd entry, so
follow_hugetlb_page() which is called in get_dump_page() returns a wrong
page from a given address.

The expected behavior is like this:

  absent   is_swap_pte   FOLL_DUMP   Expected behavior
  -------------------------------------------------------------------
   true     false         false       hugetlb_fault
   false    true          false       hugetlb_fault
   false    false         false       return page
   true     false         true        skip page (to avoid allocation)
   false    true          true        hugetlb_fault
   false    false         true        return page

With this patch, we can call hugetlb_fault() and take proper actions (we
wait for migration entries, fail with VM_FAULT_HWPOISON_LARGE for
hwpoisoned entries,) and as the result we can dump all hugepages except
for hwpoisoned ones.

Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: HATAYAMA Daisuke &lt;d.hatayama@jp.fujitsu.com&gt;
Acked-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: prevent mmap_cache race in find_vma()</title>
<updated>2013-04-12T16:18:09Z</updated>
<author>
<name>Jan Stancek</name>
<email>jstancek@redhat.com</email>
</author>
<published>2013-04-08T20:00:02Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=05fc9336dbfe557067f472074c123d9474393f02'/>
<id>urn:sha1:05fc9336dbfe557067f472074c123d9474393f02</id>
<content type='text'>
commit b6a9b7f6b1f21735a7456d534dc0e68e61359d2c upstream.

find_vma() can be called by multiple threads with read lock
held on mm-&gt;mmap_sem and any of them can update mm-&gt;mmap_cache.
Prevent compiler from re-fetching mm-&gt;mmap_cache, because other
readers could update it in the meantime:

               thread 1                             thread 2
                                        |
  find_vma()                            |  find_vma()
    struct vm_area_struct *vma = NULL;  |
    vma = mm-&gt;mmap_cache;               |
    if (!(vma &amp;&amp; vma-&gt;vm_end &gt; addr     |
        &amp;&amp; vma-&gt;vm_start &lt;= addr)) {    |
                                        |    mm-&gt;mmap_cache = vma;
    return vma;                         |
     ^^ compiler may optimize this      |
        local variable out and re-read  |
        mm-&gt;mmap_cache                  |

This issue can be reproduced with gcc-4.8.0-1 on s390x by running
mallocstress testcase from LTP, which triggers:

  kernel BUG at mm/rmap.c:1088!
    Call Trace:
     ([&lt;000003d100c57000&gt;] 0x3d100c57000)
      [&lt;000000000023a1c0&gt;] do_wp_page+0x2fc/0xa88
      [&lt;000000000023baae&gt;] handle_pte_fault+0x41a/0xac8
      [&lt;000000000023d832&gt;] handle_mm_fault+0x17a/0x268
      [&lt;000000000060507a&gt;] do_protection_exception+0x1e2/0x394
      [&lt;0000000000603a04&gt;] pgm_check_handler+0x138/0x13c
      [&lt;000003fffcf1f07a&gt;] 0x3fffcf1f07a
    Last Breaking-Event-Address:
      [&lt;000000000024755e&gt;] page_add_new_anon_rmap+0xc2/0x168

Thanks to Jakub Jelinek for his insight on gcc and helping to
track this down.

Signed-off-by: Jan Stancek &lt;jstancek@redhat.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm/hotplug: correctly add new zone to all other nodes' zone lists</title>
<updated>2013-04-05T17:16:47Z</updated>
<author>
<name>Jiang Liu</name>
<email>jiang.liu@huawei.com</email>
</author>
<published>2013-03-19T11:36:58Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d9e61dba0c73294e9e6761f290dce2049f06bfac'/>
<id>urn:sha1:d9e61dba0c73294e9e6761f290dce2049f06bfac</id>
<content type='text'>
commit 08dff7b7d629807dbb1f398c68dd9cd58dd657a1 upstream.

When online_pages() is called to add new memory to an empty zone, it
rebuilds all zone lists by calling build_all_zonelists().  But there's a
bug which prevents the new zone to be added to other nodes' zone lists.

online_pages() {
	build_all_zonelists()
	.....
	node_set_state(zone_to_nid(zone), N_HIGH_MEMORY)
}

Here the node of the zone is put into N_HIGH_MEMORY state after calling
build_all_zonelists(), but build_all_zonelists() only adds zones from
nodes in N_HIGH_MEMORY state to the fallback zone lists.
build_all_zonelists()

    -&gt;__build_all_zonelists()
	-&gt;build_zonelists()
	    -&gt;find_next_best_node()
		-&gt;for_each_node_state(n, N_HIGH_MEMORY)

So memory in the new zone will never be used by other nodes, and it may
cause strange behavor when system is under memory pressure.  So put node
into N_HIGH_MEMORY state before calling build_all_zonelists().

Signed-off-by: Jianguo Wu &lt;wujianguo@huawei.com&gt;
Signed-off-by: Jiang Liu &lt;liuj97@gmail.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Cc: Yinghai Lu &lt;yinghai@kernel.org&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Keping Chen &lt;chenkeping@huawei.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/hugetlb: fix total hugetlbfs pages count when using memory overcommit accouting</title>
<updated>2013-03-28T19:06:03Z</updated>
<author>
<name>Wanpeng Li</name>
<email>liwanp@linux.vnet.ibm.com</email>
</author>
<published>2013-03-22T22:04:40Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=3993d37e21053878739ad1baa264617aa115f4af'/>
<id>urn:sha1:3993d37e21053878739ad1baa264617aa115f4af</id>
<content type='text'>
commit d00285884c0892bb1310df96bce6056e9ce9b9d9 upstream.

hugetlb_total_pages is used for overcommit calculations but the current
implementation considers only the default hugetlb page size (which is
either the first defined hugepage size or the one specified by
default_hugepagesz kernel boot parameter).

If the system is configured for more than one hugepage size, which is
possible since commit a137e1cc6d6e ("hugetlbfs: per mount huge page
sizes") then the overcommit estimation done by __vm_enough_memory()
(resp.  shown by meminfo_proc_show) is not precise - there is an
impression of more available/allowed memory.  This can lead to an
unexpected ENOMEM/EFAULT resp.  SIGSEGV when memory is accounted.

Testcase:
  boot: hugepagesz=1G hugepages=1
  the default overcommit ratio is 50
  before patch:

    egrep 'CommitLimit' /proc/meminfo
    CommitLimit:     55434168 kB

  after patch:

    egrep 'CommitLimit' /proc/meminfo
    CommitLimit:     54909880 kB

[akpm@linux-foundation.org: coding-style tweak]
Signed-off-by: Wanpeng Li &lt;liwanp@linux.vnet.ibm.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
