<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm, branch v3.7.2</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/mm?h=v3.7.2</id>
<link rel='self' href='https://git.amat.us/linux/atom/mm?h=v3.7.2'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2013-01-11T17:19:05Z</updated>
<entry>
<title>mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT</title>
<updated>2013-01-11T17:19:05Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.cz</email>
</author>
<published>2013-01-04T23:35:12Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=787f7301074ccd07a3e82236ca41eefd245f4e07'/>
<id>urn:sha1:787f7301074ccd07a3e82236ca41eefd245f4e07</id>
<content type='text'>
commit 53a59fc67f97374758e63a9c785891ec62324c81 upstream.

Since commit e303297e6c3a ("mm: extended batches for generic
mmu_gather") we are batching pages to be freed until either
tlb_next_batch cannot allocate a new batch or we are done.

This works just fine most of the time, but we can get into trouble with
a non-preemptible kernel (CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY)
on large machines, where too-aggressive batching might lead to soft
lockups during the process exit path (exit_mmap): there are no
scheduling points down the free_pages_and_swap_cache path, so the
freeing can take long enough to trigger the soft lockup.

The lockup is harmless except when the system is set up to panic on
soft lockup, which is not that unusual.

The simplest way to work around this issue is to limit the maximum
number of batches in a single mmu_gather.  10k collected pages should
be safe against soft lockups (each page would have to take about 2ms to
free before the watchdog fires) even if they are all freed without an
explicit scheduling point.

This patch doesn't add any new explicit scheduling points because it
relies on zap_pmd_range during page table zapping, which calls
cond_resched() once per PMD.
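
For reference, a minimal sketch of the cap, using the field and macro
names from the upstream patch (batch allocation details elided):
tlb_next_batch() simply refuses to grow the gather past the limit, so
the pages collected so far get flushed and freed before zapping
continues.

  #define MAX_GATHER_BATCH_COUNT  (10000UL / MAX_GATHER_BATCH)

  static int tlb_next_batch(struct mmu_gather *tlb)
  {
          struct mmu_gather_batch *batch = tlb-&gt;active;

          if (batch-&gt;next) {
                  tlb-&gt;active = batch-&gt;next;
                  return 1;
          }

          /* Cap reached: the caller flushes and frees what we have,
           * and zap_pmd_range()'s cond_resched() lets us reschedule. */
          if (tlb-&gt;batch_count == MAX_GATHER_BATCH_COUNT)
                  return 0;

          batch = (void *)__get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
          if (!batch)
                  return 0;

          tlb-&gt;batch_count++;
          /* ... initialize the new batch, link it in, make it active ... */
          return 1;
  }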

The following lockup has been reported for a 3.0 kernel with a huge
process (on the order of hundreds of GBs, but I don't know any more
details).

  BUG: soft lockup - CPU#56 stuck for 22s! [kernel:31053]
  Modules linked in: af_packet nfs lockd fscache auth_rpcgss nfs_acl sunrpc mptctl mptbase autofs4 binfmt_misc dm_round_robin dm_multipath bonding cpufreq_conservative cpufreq_userspace cpufreq_powersave pcc_cpufreq mperf microcode fuse loop osst sg sd_mod crc_t10dif st qla2xxx scsi_transport_fc scsi_tgt netxen_nic i7core_edac iTCO_wdt joydev e1000e serio_raw pcspkr edac_core iTCO_vendor_support acpi_power_meter rtc_cmos hpwdt hpilo button container usbhid hid dm_mirror dm_region_hash dm_log linear uhci_hcd ehci_hcd usbcore usb_common scsi_dh_emc scsi_dh_alua scsi_dh_hp_sw scsi_dh_rdac scsi_dh dm_snapshot pcnet32 mii edd dm_mod raid1 ext3 mbcache jbd fan thermal processor thermal_sys hwmon cciss scsi_mod
  Supported: Yes
  CPU 56
  Pid: 31053, comm: kernel Not tainted 3.0.31-0.9-default #1 HP ProLiant DL580 G7
  RIP: 0010:  _raw_spin_unlock_irqrestore+0x8/0x10
  RSP: 0018:ffff883ec1037af0  EFLAGS: 00000206
  RAX: 0000000000000e00 RBX: ffffea01a0817e28 RCX: ffff88803ffd9e80
  RDX: 0000000000000200 RSI: 0000000000000206 RDI: 0000000000000206
  RBP: 0000000000000002 R08: 0000000000000001 R09: ffff887ec724a400
  R10: 0000000000000000 R11: dead000000200200 R12: ffffffff8144c26e
  R13: 0000000000000030 R14: 0000000000000297 R15: 000000000000000e
  FS:  00007ed834282700(0000) GS:ffff88c03f200000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 000000000068b240 CR3: 0000003ec13c5000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kernel (pid: 31053, threadinfo ffff883ec1036000, task ffff883ebd5d4100)
  Call Trace:
    release_pages+0xc5/0x260
    free_pages_and_swap_cache+0x9d/0xc0
    tlb_flush_mmu+0x5c/0x80
    tlb_finish_mmu+0xe/0x50
    exit_mmap+0xbd/0x120
    mmput+0x49/0x120
    exit_mm+0x122/0x160
    do_exit+0x17a/0x430
    do_group_exit+0x3d/0xb0
    get_signal_to_deliver+0x247/0x480
    do_signal+0x71/0x1b0
    do_notify_resume+0x98/0xb0
    int_signal+0x12/0x17
  DWARF2 unwinder stuck at int_signal+0x12/0x17

Signed-off-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm/hugetlb: create hugetlb cgroup file in hugetlb_init</title>
<updated>2013-01-11T17:18:50Z</updated>
<author>
<name>Jianguo Wu</name>
<email>wujianguo@huawei.com</email>
</author>
<published>2012-12-18T22:23:19Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=61001c0a21fef142d5a462db15810afbcd95b416'/>
<id>urn:sha1:61001c0a21fef142d5a462db15810afbcd95b416</id>
<content type='text'>
commit 7179e7bf4592ac5a7b30257a7df6259ee81e51da upstream.

Build the kernel with CONFIG_HUGETLBFS=y, CONFIG_HUGETLB_PAGE=y and
CONFIG_CGROUP_HUGETLB=y, then specify a hugepagesz=xx boot option; the
system will fail to boot.

This failure is caused by the following code path:

  setup_hugepagesz
    hugetlb_add_hstate
      hugetlb_cgroup_file_init
        cgroup_add_cftypes
          kzalloc &lt;--slab is *not available* yet

On this path slab is not available yet, so the memory allocation fails
and triggers the WARN_ON() in hugetlb_cgroup_file_init().

So I moved hugetlb_cgroup_file_init() into hugetlb_init(), which runs
after slab is available.
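
A sketch of the shape of the fix (hugetlb_init() is the subsystem's
module_init-level hook, which only runs once slab is up; surrounding
calls elided):

  static int __init hugetlb_init(void)
  {
          ...
          hugetlb_init_hstates();
          ...
          /* Moved here from hugetlb_add_hstate(): by this point the
           * slab allocator works, so the kzalloc() down in
           * cgroup_add_cftypes() succeeds instead of tripping the
           * WARN_ON(). */
          hugetlb_cgroup_file_init();
          ...
          return 0;
  }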

[akpm@linux-foundation.org: tweak coding-style, remove pointless __init on inlined function]
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Jianguo Wu &lt;wujianguo@huawei.com&gt;
Signed-off-by: Jiang Liu &lt;jiang.liu@huawei.com&gt;
Reviewed-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>tmpfs mempolicy: fix /proc/mounts corrupting memory</title>
<updated>2013-01-11T17:18:19Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2013-01-02T10:01:33Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=680dfe59174a569b842cbcb1cecaf517923e8fc1'/>
<id>urn:sha1:680dfe59174a569b842cbcb1cecaf517923e8fc1</id>
<content type='text'>
commit f2a07f40dbc603c15f8b06e6ec7f768af67b424f upstream.

Recently I suggested using "mount -o remount,mpol=local /tmp" in NUMA
mempolicy testing.  Very nasty.  Reading /proc/mounts, /proc/pid/mounts
or /proc/pid/mountinfo may then corrupt one bit of kernel memory, often
in a page table (causing "Bad swap" or "Bad page map" warning or "Bad
pagetable" oops), sometimes in a vm_area_struct or rbnode or somewhere
worse.  "mpol=prefer" and "mpol=prefer:Node" are equally toxic.

Recent NUMA enhancements are not to blame: this dates back to 2.6.35,
when commit e17f74af351c "mempolicy: don't call mpol_set_nodemask() when
no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(),
which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags.
With slab poisoning, you can then rely on mpol_to_str() to set the bit
for node 0x6b6b, probably in the next page above the caller's stack.

mpol_parse_str() is only called from shmem_parse_options(): no_context
is always true, so rename that argument 'unused' for now, and remove
the !no_context code.  Set v.nodes or v.preferred_node or MPOL_F_LOCAL
as mpol_to_str() might expect.  Then mpol_to_str() can ignore its
no_context argument also, the mpol being appropriately initialized
whether contextualized or not.  Rename its no_context argument 'unused'
too, and let a subsequent patch remove them (that's not needed for
stable backporting, which would involve rejects).
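
Concretely, the initialization this fix adds to mpol_parse_str() looks
roughly like this (a sketch of the relevant hunk):

  /* Save nodes for mpol_to_str() to show the tmpfs mount options
   * for /proc/mounts, /proc/pid/mounts and /proc/pid/mountinfo. */
  if (mode != MPOL_PREFERRED)
          new-&gt;v.nodes = nodes;
  else if (nodelist)
          new-&gt;v.preferred_node = first_node(nodes);
  else
          new-&gt;flags |= MPOL_F_LOCAL;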

I don't understand why MPOL_LOCAL is described as a pseudo-policy:
it's a reasonable policy which suffers from a confusing implementation
in terms of MPOL_PREFERRED with MPOL_F_LOCAL.  I believe this would be
much more robust if MPOL_LOCAL were recognized in switch statements
throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED used the (possibly
empty) nodes mask like everyone else, instead of its preferred_node
variant (I presume an optimization from the days before MPOL_LOCAL).
But that would take me too long to get right and fully tested.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: fix calculation of dirtyable memory</title>
<updated>2013-01-11T17:18:18Z</updated>
<author>
<name>Sonny Rao</name>
<email>sonnyrao@chromium.org</email>
</author>
<published>2012-12-20T23:05:07Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9359dd031b11100a3532ef7bfc04f2fc7f23c33b'/>
<id>urn:sha1:9359dd031b11100a3532ef7bfc04f2fc7f23c33b</id>
<content type='text'>
commit c8b74c2f6604923de91f8aa6539f8bb934736754 upstream.

The system uses global_dirtyable_memory() to calculate the number of
dirtyable pages, i.e. pages that can be allocated to the page cache.  A
bug causes an underflow, making the page count look like a huge
unsigned number.  This in turn confuses the dirty writeback throttling
into aggressively writing back pages as they become dirty (usually one
page at a time).  This generally only affects systems with highmem,
because the underflowed count gets subtracted from the global count of
dirtyable memory.

The problem was introduced with v3.2-4896-gab8fabd.

The fix is to ensure we don't get an underflowed total of either
highmem or global dirtyable memory.
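
A sketch of the clamping in mm/page-writeback.c (the surrounding
accounting is elided; names follow the 3.7-era code):

  static unsigned long highmem_dirtyable_memory(unsigned long total)
  {
          ...
          /* Never report more than the total: the caller subtracts
           * this value, and an excess would underflow there. */
          return min(x, total);
  }

  static unsigned long global_dirtyable_memory(void)
  {
          unsigned long x;
          ...
          x -= min(x, dirty_balance_reserve);     /* cannot underflow */

          if (!vm_highmem_is_dirtyable)
                  x -= highmem_dirtyable_memory(x);

          return x + 1;   /* ensure we never return 0 */
  }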

Signed-off-by: Sonny Rao &lt;sonnyrao@chromium.org&gt;
Signed-off-by: Puneet Kumar &lt;puneetster@chromium.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Tested-by: Damien Wyart &lt;damien.wyart@free.fr&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: highmem: export kmap_to_page for modules</title>
<updated>2013-01-11T17:18:18Z</updated>
<author>
<name>Will Deacon</name>
<email>will.deacon@arm.com</email>
</author>
<published>2012-10-19T13:03:31Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=4a8d5b926259721cf7b931967b8e3f9e3936c2d7'/>
<id>urn:sha1:4a8d5b926259721cf7b931967b8e3f9e3936c2d7</id>
<content type='text'>
commit f0263d2d222e9e25f2587e51a9dc58c6fb2a9352 upstream.

Some virtio device drivers (9p) need to translate high virtual addresses
to physical addresses, which are inserted into the virtqueue for
processing by userspace.

This patch exports the kmap_to_page symbol, so that the affected drivers
can be compiled as modules.
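
The change itself is essentially a one-line export next to the existing
definition in mm/highmem.c:

  struct page *kmap_to_page(void *vaddr)
  {
          ...
  }
  EXPORT_SYMBOL(kmap_to_page);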

Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: dmapool: use provided gfp flags for all dma_alloc_coherent() calls</title>
<updated>2012-12-17T19:06:52Z</updated>
<author>
<name>Marek Szyprowski</name>
<email>m.szyprowski@samsung.com</email>
</author>
<published>2012-11-07T14:37:07Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=13a4fcec5484b83b61926d5304d86349e5530027'/>
<id>urn:sha1:13a4fcec5484b83b61926d5304d86349e5530027</id>
<content type='text'>
commit 387870f2d6d679746020fa8e25ef786ff338dc98 upstream.

dmapool always calls dma_alloc_coherent() with the GFP_ATOMIC flag,
regardless of the flags provided by the caller. This causes excessive
pruning of emergency memory pools without any good reason. Additionally,
on the ARM architecture any driver which uses dmapools will sooner or
later trigger the following error:
"ERROR: 256 KiB atomic DMA coherent pool is too small!
Please increase it with coherent_pool= kernel parameter!".
Increasing the coherent pool size usually doesn't help much and only
delays such an error, because all GFP_ATOMIC DMA allocations are always
served from the special, very limited memory pool.

This patch changes the dmapool code to correctly use gfp flags provided
by the dmapool caller.
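
The key change, roughly (in mm/dmapool.c's dma_pool_alloc(); error
handling and locking elided):

  void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
                       dma_addr_t *handle)
  {
          ...
          /* was: page = pool_alloc_page(pool, GFP_ATOMIC);
           * Honour the caller's flags instead, so GFP_KERNEL users
           * stop draining the small atomic coherent pool. */
          page = pool_alloc_page(pool, mem_flags);
          ...
  }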

Reported-by: Soeren Moch &lt;smoch@web.de&gt;
Reported-by: Thomas Petazzoni &lt;thomas.petazzoni@free-electrons.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Tested-by: Andrew Lunn &lt;andrew@lunn.ch&gt;
Tested-by: Soeren Moch &lt;smoch@web.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Revert "revert "Revert "mm: remove __GFP_NO_KSWAPD""" and associated damage</title>
<updated>2012-12-10T19:03:05Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-12-10T18:51:16Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=caf491916b1c1e939a2c7575efb7a77f11fc9bdf'/>
<id>urn:sha1:caf491916b1c1e939a2c7575efb7a77f11fc9bdf</id>
<content type='text'>
This reverts commits a50915394f1fc02c2861d3b7ce7014788aa5066e and
d7c3b937bdf45f0b844400b7bf6fd3ed50bac604.

This is a revert of a revert of a revert.  In addition, it reverts the
even older i915 change to stop using the __GFP_NO_KSWAPD flag due to the
original commits in linux-next.

It turns out that the original patch really was bogus, and that the
original revert was the correct thing to do after all.  We thought we
had fixed the problem, and then reverted the revert, but the problem
really is fundamental: waking up kswapd simply isn't the right thing to
do, and direct reclaim sometimes simply _is_ the right thing to do.

When certain allocations fail, we simply should try some direct reclaim,
and if that fails, fail the allocation.  That's the right thing to do
for THP allocations, which can easily fail, and the GPU allocations want
to do that too.

So starting kswapd is sometimes simply wrong, and removing the flag that
said "don't start kswapd" was a mistake.  Let's hope we never revisit
this mistake again - and certainly not this many times ;)

Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "mm: avoid waking kswapd for THP allocations when compaction is deferred or contended"</title>
<updated>2012-12-10T18:47:45Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-12-10T18:47:45Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=31f8d42d44b48ba72b586ca03e810cbbd21ea16b'/>
<id>urn:sha1:31f8d42d44b48ba72b586ca03e810cbbd21ea16b</id>
<content type='text'>
This reverts commit 782fd30406ecb9d9b082816abe0c6008fc72a7b0.

We are going to reinstate the __GFP_NO_KSWAPD flag that has been
removed, the removal reverted, and then removed again.  That makes this
commit a pointless fixup for a problem that was caused by the removal
of the __GFP_NO_KSWAPD flag.

The thing is, we really don't want to wake up kswapd for THP allocations
(because they fail quite commonly under any kind of memory pressure,
including when there is tons of memory free), and these patches were
just trying to fix up the underlying bug: the original removal of
__GFP_NO_KSWAPD in commit c654345924f7 ("mm: remove __GFP_NO_KSWAPD")
was simply bogus.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: vmscan: fix inappropriate zone congestion clearing</title>
<updated>2012-12-08T16:41:18Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2012-12-06T20:23:25Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=ed23ec4f0a510528e0ffe415f9394107418ae854'/>
<id>urn:sha1:ed23ec4f0a510528e0ffe415f9394107418ae854</id>
<content type='text'>
commit c702418f8a2f ("mm: vmscan: do not keep kswapd looping forever due
to individual uncompactable zones") removed zone watermark checks from
the compaction code in kswapd but left the zone congestion clearing in
place, which now happens unconditionally on higher-order reclaim.

This messes up the reclaim throttling logic for zones with
dirty/writeback pages, where zones should only lose their congestion
status when their watermarks have been restored.

Remove the clearing from the zone compaction section entirely.  The
preliminary zone check and the reclaim loop in kswapd will clear it if
the zone is considered balanced.
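
The remaining, correct clearing site in balance_pgdat() is the
balanced-zone path, roughly (a sketch of the 3.7-era logic, not the
exact hunk):

  /* Congestion status is lifted only once the zone's watermarks are
   * actually restored, not merely because compaction glanced at the
   * zone during higher-order reclaim. */
  if (zone_balanced(zone, testorder, 0, end_zone))
          zone_clear_flag(zone, ZONE_CONGESTED);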

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>tmpfs: fix shared mempolicy leak</title>
<updated>2012-12-06T19:56:43Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2012-12-05T22:01:41Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=18a2f371f5edf41810f6469cb9be39931ef9deb9'/>
<id>urn:sha1:18a2f371f5edf41810f6469cb9be39931ef9deb9</id>
<content type='text'>
This fixes a regression in 3.7-rc, which has since gone into stable.

Commit 00442ad04a5e ("mempolicy: fix a memory corruption by refcount
imbalance in alloc_pages_vma()") changed get_vma_policy() to raise the
refcount on a shmem shared mempolicy; whereas shmem_alloc_page() went
on expecting alloc_page_vma() to drop the refcount it had acquired.
This deserves a rework: but for now fix the leak in shmem_alloc_page().
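
The fix pairs the lookup with an explicit drop in shmem_alloc_page(),
roughly:

  /* Create a pseudo vma that just contains the policy */
  pvma.vm_policy = mpol_shared_policy_lookup(&amp;info-&gt;policy, index);

  page = alloc_page_vma(gfp, &amp;pvma, 0);

  /* Drop the reference taken by mpol_shared_policy_lookup();
   * since 00442ad04a5e, alloc_page_vma() keeps refcounts balanced
   * and no longer drops it for us. */
  mpol_cond_put(pvma.vm_policy);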

Hugh: shmem_swapin() did not need a fix, but surely it's clearer to use
the same refcounting there as in shmem_alloc_page(), delete its onstack
mempolicy, and the strange mpol_cond_copy() and __mpol_cond_copy() -
those were invented to let swapin_readahead() make an unknown number of
calls to alloc_pages_vma() with one mempolicy; but since 00442ad04a5e,
alloc_pages_vma() has kept refcount in balance, so now no problem.

Reported-and-tested-by: Tommi Rantala &lt;tt.rantala@gmail.com&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
