<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm, branch v3.2.38</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/mm?h=v3.2.38</id>
<link rel='self' href='https://git.amat.us/linux/atom/mm?h=v3.2.38'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2013-02-06T04:33:31Z</updated>
<entry>
<title>mm: use aligned zone start for pfn_to_bitidx calculation</title>
<updated>2013-02-06T04:33:31Z</updated>
<author>
<name>Laura Abbott</name>
<email>lauraa@codeaurora.org</email>
</author>
<published>2013-01-11T22:31:51Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=458b03804988d061567a1a0f0d17c86967dfdc21'/>
<id>urn:sha1:458b03804988d061567a1a0f0d17c86967dfdc21</id>
<content type='text'>
commit c060f943d0929f3e429c5d9522290584f6281d6e upstream.

The current calculation in pfn_to_bitidx assumes that (pfn -
zone-&gt;zone_start_pfn) &gt;&gt; pageblock_order will return the same bit for
all pfn in a pageblock.  If zone_start_pfn is not aligned to
pageblock_nr_pages, this may not always be correct.

Consider the following with pageblock order = 10, zone start 2MB:

  pfn     | pfn - zone start | (pfn - zone start) &gt;&gt; page block order
  ----------------------------------------------------------------
  0x26000 | 0x25e00	   |  0x97
  0x26100 | 0x25f00	   |  0x97
  0x26200 | 0x26000	   |  0x98
  0x26300 | 0x26100	   |  0x98

This means that calling {get,set}_pageblock_migratetype on a single page
will not set the migratetype for the full block.  Fix this by rounding
down zone_start_pfn when doing the bitidx calculation.

For our use case, the effects of this bug were mostly tied to the fact
that CMA allocations would either take a long time or fail to happen.
Depending on the driver using CMA, this could result in anything from
visual glitches to application failures.

Signed-off-by: Laura Abbott &lt;lauraa@codeaurora.org&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>mm: compaction: fix echo 1 &gt; compact_memory return error issue</title>
<updated>2013-02-06T04:33:31Z</updated>
<author>
<name>Jason Liu</name>
<email>r64343@freescale.com</email>
</author>
<published>2013-01-11T22:31:47Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=2c7ff1aa06ba37b36e07d4b47c55d7431ec9dff9'/>
<id>urn:sha1:2c7ff1aa06ba37b36e07d4b47c55d7431ec9dff9</id>
<content type='text'>
commit 7964c06d66c76507d8b6b662bffea770c29ef0ce upstream.

when run the folloing command under shell, it will return error

  sh/$ echo 1 &gt; /proc/sys/vm/compact_memory
  sh/$ sh: write error: Bad address

After strace, I found the following log:

  ...
  write(1, "1\n", 2)               = 3
  write(1, "", 4294967295)         = -1 EFAULT (Bad address)
  write(2, "echo: write error: Bad address\n", 31echo: write error: Bad address
  ) = 31

This tells system return 3(COMPACT_COMPLETE) after write data to
compact_memory.

The fix is to make the system just return 0 instead 3(COMPACT_COMPLETE)
from sysctl_compaction_handler after compaction_nodes finished.

Signed-off-by: Jason Liu &lt;r64343@freescale.com&gt;
Suggested-by: David Rientjes &lt;rientjes@google.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT</title>
<updated>2013-01-16T01:13:20Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.cz</email>
</author>
<published>2013-01-04T23:35:12Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9a1eb12a55fda70ed451c0b7b6122de0f7ec3aad'/>
<id>urn:sha1:9a1eb12a55fda70ed451c0b7b6122de0f7ec3aad</id>
<content type='text'>
commit 53a59fc67f97374758e63a9c785891ec62324c81 upstream.

Since commit e303297e6c3a ("mm: extended batches for generic
mmu_gather") we are batching pages to be freed until either
tlb_next_batch cannot allocate a new batch or we are done.

This works just fine most of the time but we can get in troubles with
non-preemptible kernel (CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY)
on large machines where too aggressive batching might lead to soft
lockups during process exit path (exit_mmap) because there are no
scheduling points down the free_pages_and_swap_cache path and so the
freeing can take long enough to trigger the soft lockup.

The lockup is harmless except when the system is setup to panic on
softlockup which is not that unusual.

The simplest way to work around this issue is to limit the maximum
number of batches in a single mmu_gather.  10k of collected pages should
be safe to prevent from soft lockups (we would have 2ms for one) even if
they are all freed without an explicit scheduling point.

This patch doesn't add any new explicit scheduling points because it
relies on zap_pmd_range during page tables zapping which calls
cond_resched per PMD.

The following lockup has been reported for 3.0 kernel with a huge
process (in order of hundreds gigs but I do know any more details).

  BUG: soft lockup - CPU#56 stuck for 22s! [kernel:31053]
  Modules linked in: af_packet nfs lockd fscache auth_rpcgss nfs_acl sunrpc mptctl mptbase autofs4 binfmt_misc dm_round_robin dm_multipath bonding cpufreq_conservative cpufreq_userspace cpufreq_powersave pcc_cpufreq mperf microcode fuse loop osst sg sd_mod crc_t10dif st qla2xxx scsi_transport_fc scsi_tgt netxen_nic i7core_edac iTCO_wdt joydev e1000e serio_raw pcspkr edac_core iTCO_vendor_support acpi_power_meter rtc_cmos hpwdt hpilo button container usbhid hid dm_mirror dm_region_hash dm_log linear uhci_hcd ehci_hcd usbcore usb_common scsi_dh_emc scsi_dh_alua scsi_dh_hp_sw scsi_dh_rdac scsi_dh dm_snapshot pcnet32 mii edd dm_mod raid1 ext3 mbcache jbd fan thermal processor thermal_sys hwmon cciss scsi_mod
  Supported: Yes
  CPU 56
  Pid: 31053, comm: kernel Not tainted 3.0.31-0.9-default #1 HP ProLiant DL580 G7
  RIP: 0010:  _raw_spin_unlock_irqrestore+0x8/0x10
  RSP: 0018:ffff883ec1037af0  EFLAGS: 00000206
  RAX: 0000000000000e00 RBX: ffffea01a0817e28 RCX: ffff88803ffd9e80
  RDX: 0000000000000200 RSI: 0000000000000206 RDI: 0000000000000206
  RBP: 0000000000000002 R08: 0000000000000001 R09: ffff887ec724a400
  R10: 0000000000000000 R11: dead000000200200 R12: ffffffff8144c26e
  R13: 0000000000000030 R14: 0000000000000297 R15: 000000000000000e
  FS:  00007ed834282700(0000) GS:ffff88c03f200000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 000000000068b240 CR3: 0000003ec13c5000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kernel (pid: 31053, threadinfo ffff883ec1036000, task ffff883ebd5d4100)
  Call Trace:
    release_pages+0xc5/0x260
    free_pages_and_swap_cache+0x9d/0xc0
    tlb_flush_mmu+0x5c/0x80
    tlb_finish_mmu+0xe/0x50
    exit_mmap+0xbd/0x120
    mmput+0x49/0x120
    exit_mm+0x122/0x160
    do_exit+0x17a/0x430
    do_group_exit+0x3d/0xb0
    get_signal_to_deliver+0x247/0x480
    do_signal+0x71/0x1b0
    do_notify_resume+0x98/0xb0
    int_signal+0x12/0x17
  DWARF2 unwinder stuck at int_signal+0x12/0x17

Signed-off-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>tmpfs mempolicy: fix /proc/mounts corrupting memory</title>
<updated>2013-01-16T01:13:14Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2013-01-02T10:01:33Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=b8709f81cee3b33995a35a7362c3bed139c77886'/>
<id>urn:sha1:b8709f81cee3b33995a35a7362c3bed139c77886</id>
<content type='text'>
commit f2a07f40dbc603c15f8b06e6ec7f768af67b424f upstream.

Recently I suggested using "mount -o remount,mpol=local /tmp" in NUMA
mempolicy testing.  Very nasty.  Reading /proc/mounts, /proc/pid/mounts
or /proc/pid/mountinfo may then corrupt one bit of kernel memory, often
in a page table (causing "Bad swap" or "Bad page map" warning or "Bad
pagetable" oops), sometimes in a vm_area_struct or rbnode or somewhere
worse.  "mpol=prefer" and "mpol=prefer:Node" are equally toxic.

Recent NUMA enhancements are not to blame: this dates back to 2.6.35,
when commit e17f74af351c "mempolicy: don't call mpol_set_nodemask() when
no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(),
which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags.
With slab poisoning, you can then rely on mpol_to_str() to set the bit
for node 0x6b6b, probably in the next page above the caller's stack.

mpol_parse_str() is only called from shmem_parse_options(): no_context
is always true, so call it unused for now, and remove !no_context code.
Set v.nodes or v.preferred_node or MPOL_F_LOCAL as mpol_to_str() might
expect.  Then mpol_to_str() can ignore its no_context argument also,
the mpol being appropriately initialized whether contextualized or not.
Rename its no_context unused too, and let subsequent patch remove them
(that's not needed for stable backporting, which would involve rejects).

I don't understand why MPOL_LOCAL is described as a pseudo-policy:
it's a reasonable policy which suffers from a confusing implementation
in terms of MPOL_PREFERRED with MPOL_F_LOCAL.  I believe this would be
much more robust if MPOL_LOCAL were recognized in switch statements
throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly
empty) nodes mask like everyone else, instead of its preferred_node
variant (I presume an optimization from the days before MPOL_LOCAL).
But that would take me too long to get right and fully tested.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>thp, memcg: split hugepage for memcg oom on cow</title>
<updated>2013-01-03T03:33:55Z</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2012-05-29T22:06:23Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=505bd8dcba128970db6fed2eca126f9e11a44290'/>
<id>urn:sha1:505bd8dcba128970db6fed2eca126f9e11a44290</id>
<content type='text'>
commit 1f1d06c34f7675026326cd9f39ff91e4555cf355 upstream.

On COW, a new hugepage is allocated and charged to the memcg.  If the
system is oom or the charge to the memcg fails, however, the fault
handler will return VM_FAULT_OOM which results in an oom kill.

Instead, it's possible to fallback to splitting the hugepage so that the
COW results only in an order-0 page being allocated and charged to the
memcg which has a higher liklihood to succeed.  This is expensive
because the hugepage must be split in the page fault handler, but it is
much better than unnecessarily oom killing a process.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Johannes Weiner &lt;jweiner@redhat.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>Revert "mm: vmscan: fix endless loop in kswapd balancing"</title>
<updated>2013-01-03T03:33:55Z</updated>
<author>
<name>Ben Hutchings</name>
<email>ben@decadent.org.uk</email>
</author>
<published>2012-12-27T18:39:54Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=40686464a6c29b4d8f5ef1d6c86117022e46ecd1'/>
<id>urn:sha1:40686464a6c29b4d8f5ef1d6c86117022e46ecd1</id>
<content type='text'>
This reverts commit 39d18dc4b8b0c000fa681cbae10ac3f8a132814b which
was commit 60cefed485a02bd99b6299dad70666fe49245da7 upstream.

This was not needed and is not suitable for 3.2.y.

Reported-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>mm: dmapool: use provided gfp flags for all dma_alloc_coherent() calls</title>
<updated>2013-01-03T03:33:36Z</updated>
<author>
<name>Marek Szyprowski</name>
<email>m.szyprowski@samsung.com</email>
</author>
<published>2012-11-07T14:37:07Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=56201be51f36f6e633872c0686016749c15853a0'/>
<id>urn:sha1:56201be51f36f6e633872c0686016749c15853a0</id>
<content type='text'>
commit 387870f2d6d679746020fa8e25ef786ff338dc98 upstream.

dmapool always calls dma_alloc_coherent() with GFP_ATOMIC flag,
regardless the flags provided by the caller. This causes excessive
pruning of emergency memory pools without any good reason. Additionaly,
on ARM architecture any driver which is using dmapools will sooner or
later  trigger the following error:
"ERROR: 256 KiB atomic DMA coherent pool is too small!
Please increase it with coherent_pool= kernel parameter!".
Increasing the coherent pool size usually doesn't help much and only
delays such error, because all GFP_ATOMIC DMA allocations are always
served from the special, very limited memory pool.

This patch changes the dmapool code to correctly use gfp flags provided
by the dmapool caller.

Reported-by: Soeren Moch &lt;smoch@web.de&gt;
Reported-by: Thomas Petazzoni &lt;thomas.petazzoni@free-electrons.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Tested-by: Andrew Lunn &lt;andrew@lunn.ch&gt;
Tested-by: Soeren Moch &lt;smoch@web.de&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>mm: highmem: export kmap_to_page for modules</title>
<updated>2013-01-03T03:32:57Z</updated>
<author>
<name>Will Deacon</name>
<email>will.deacon@arm.com</email>
</author>
<published>2012-10-19T13:03:31Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=8bc528cece6bc1adf4524eaab7bb82aa66438c4a'/>
<id>urn:sha1:8bc528cece6bc1adf4524eaab7bb82aa66438c4a</id>
<content type='text'>
commit f0263d2d222e9e25f2587e51a9dc58c6fb2a9352 upstream.

Some virtio device drivers (9p) need to translate high virtual addresses
to physical addresses, which are inserted into the virtqueue for
processing by userspace.

This patch exports the kmap_to_page symbol, so that the affected drivers
can be compiled as modules.

Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>mm: add kmap_to_page()</title>
<updated>2013-01-03T03:32:57Z</updated>
<author>
<name>Ben Hutchings</name>
<email>ben@decadent.org.uk</email>
</author>
<published>2012-07-31T23:45:02Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=fcb8996728fb59eddf84678df7cb213b2c9a2e26'/>
<id>urn:sha1:fcb8996728fb59eddf84678df7cb213b2c9a2e26</id>
<content type='text'>
This is extracted from Mel Gorman's commit 5a178119b0fb ('mm: add
support for direct_IO to highmem pages') upstream.

Required to backport commit b9cdc88df8e6 ('virtio: 9p: correctly pass
physical address to userspace for high pages').

Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>tmpfs: fix shared mempolicy leak</title>
<updated>2013-01-03T03:32:54Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2012-12-05T22:01:41Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=c6dc8bee909fd3b11d9f591165bd614e3c3ab99d'/>
<id>urn:sha1:c6dc8bee909fd3b11d9f591165bd614e3c3ab99d</id>
<content type='text'>
commit 18a2f371f5edf41810f6469cb9be39931ef9deb9 upstream.

This fixes a regression in 3.7-rc, which has since gone into stable.

Commit 00442ad04a5e ("mempolicy: fix a memory corruption by refcount
imbalance in alloc_pages_vma()") changed get_vma_policy() to raise the
refcount on a shmem shared mempolicy; whereas shmem_alloc_page() went
on expecting alloc_page_vma() to drop the refcount it had acquired.
This deserves a rework: but for now fix the leak in shmem_alloc_page().

Hugh: shmem_swapin() did not need a fix, but surely it's clearer to use
the same refcounting there as in shmem_alloc_page(), delete its onstack
mempolicy, and the strange mpol_cond_copy() and __mpol_cond_copy() -
those were invented to let swapin_readahead() make an unknown number of
calls to alloc_pages_vma() with one mempolicy; but since 00442ad04a5e,
alloc_pages_vma() has kept refcount in balance, so now no problem.

Reported-and-tested-by: Tommi Rantala &lt;tt.rantala@gmail.com&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
</feed>
