linux/mm, branch v3.4

memcg,thp: fix res_counter:96 regression

2012-05-19T17:10:27Z

Occasionally, testing memcg's move_charge_at_immigrate on rc7 shows a flurry of hundreds of warnings at kernel/res_counter.c:96, where res_counter_uncharge_locked() does WARN_ON(counter->usage < val). The first trace of each flurry implicates __mem_cgroup_cancel_charge() of mc.precharge, and an audit of mc.precharge handling points to mem_cgroup_move_charge_pte_range()'s THP handling in commit 12724850e806 ("memcg: avoid THP split in task migration"). Checking !mc.precharge is good everywhere else, when a single page is to be charged; but here the "mc.precharge -= HPAGE_PMD_NR" likely to follow, is liable to result in underflow (a lot can change since the precharge was estimated). Simply check against HPAGE_PMD_NR: there's probably a better alternative, trying precharge for more, splitting if unsuccessful; but this one-liner is safer for now - no kernel/res_counter.c:96 warnings seen in 26 hours. Signed-off-by: Hugh Dickins Signed-off-by: Linus Torvalds

slub: missing test for partial pages flush work in flush_all()

2012-05-18T01:00:51Z

I found some kernel messages such as: SLUB raid5-md127: kmem_cache_destroy called for cache that still has objects. Pid: 6143, comm: mdadm Tainted: G O 3.4.0-rc6+ #75 Call Trace: kmem_cache_destroy+0x328/0x400 free_conf+0x2d/0xf0 [raid456] stop+0x41/0x60 [raid456] md_stop+0x1a/0x60 [md_mod] do_md_stop+0x74/0x470 [md_mod] md_ioctl+0xff/0x11f0 [md_mod] blkdev_ioctl+0xd8/0x7a0 block_ioctl+0x3b/0x40 do_vfs_ioctl+0x96/0x560 sys_ioctl+0x91/0xa0 system_call_fastpath+0x16/0x1b Then using kmemleak I found these messages: unreferenced object 0xffff8800b6db7380 (size 112): comm "mdadm", pid 5783, jiffies 4294810749 (age 90.589s) hex dump (first 32 bytes): 01 01 db b6 ad 4e ad de ff ff ff ff ff ff ff ff .....N.......... ff ff ff ff ff ff ff ff 98 40 4a 82 ff ff ff ff .........@J..... backtrace: kmemleak_alloc+0x21/0x50 kmem_cache_alloc+0xeb/0x1b0 kmem_cache_open+0x2f1/0x430 kmem_cache_create+0x158/0x320 setup_conf+0x649/0x770 [raid456] run+0x68b/0x840 [raid456] md_run+0x529/0x940 [md_mod] do_md_run+0x18/0xc0 [md_mod] md_ioctl+0xba8/0x11f0 [md_mod] blkdev_ioctl+0xd8/0x7a0 block_ioctl+0x3b/0x40 do_vfs_ioctl+0x96/0x560 sys_ioctl+0x91/0xa0 system_call_fastpath+0x16/0x1b This bug was introduced by commit a8364d5555b ("slub: only IPI CPUs that have per cpu obj to flush"), which did not include checks for per cpu partial pages being present on a cpu. Signed-off-by: majianpeng Cc: Gilad Ben-Yossef Acked-by: Christoph Lameter Cc: Pekka Enberg Tested-by: Jeff Layton Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm: raise MemFree by reverting percpu_pagelist_fraction to 0

2012-05-11T16:23:39Z

Why is there less MemFree than there used to be? It perturbed a test, so I've just been bisecting linux-next, and now find the offender went upstream yesterday. Commit 93278814d359 "mm: fix division by 0 in percpu_pagelist_fraction()" mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8, which leaves 1/8th of memory on percpu lists (on each cpu??); but most of us expect it to be left unset at 0 (and it's not then used as a divisor). MemTotal: 8061476kB 8061476kB 8061476kB 8061476kB 8061476kB 8061476kB Repetitive test with percpu_pagelist_fraction 8: MemFree: 6948420kB 6237172kB 6949696kB 6840692kB 6949048kB 6862984kB Same test with percpu_pagelist_fraction back to 0: MemFree: 7945000kB 7944908kB 7948568kB 7949060kB 7948796kB 7948812kB Signed-off-by: Hugh Dickins [ We really should fix the crazy sysctl interface too, but that's a separate thing - Linus ] Signed-off-by: Linus Torvalds

Merge branch 'akpm' (Andrew's patch-bomb)

2012-05-10T22:17:24Z

Merge misc fixes from Andrew Morton. * emailed from Andrew Morton : (8 patches) MAINTAINERS: add maintainer for LED subsystem mm: nobootmem: fix sign extend problem in __free_pages_memory() drivers/leds: correct __devexit annotations memcg: free spare array to avoid memory leak namespaces, pid_ns: fix leakage on fork() failure hugetlb: prevent BUG_ON in hugetlb_fault() -> hugetlb_cow() mm: fix division by 0 in percpu_pagelist_fraction() proc/pid/pagemap: correctly report non-present ptes and holes between vmas

mm: nobootmem: fix sign extend problem in __free_pages_memory()

2012-05-10T22:06:44Z

Systems with 8 TBytes of memory or greater can hit a problem where only the the first 8 TB of memory shows up. This is due to "int i" being smaller than "unsigned long start_aligned", causing the high bits to be dropped. The fix is to change `i' to unsigned long to match start_aligned and end_aligned. Thanks to Jack Steiner for assistance tracking this down. Signed-off-by: Russ Anderson Cc: Jack Steiner Cc: Johannes Weiner Cc: Tejun Heo Cc: David S. Miller Cc: Yinghai Lu Cc: Gavin Shan Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

memcg: free spare array to avoid memory leak

2012-05-10T22:06:44Z

When the last event is unregistered, there is no need to keep the spare array anymore. So free it to avoid memory leak. Signed-off-by: Sha Zhengju Acked-by: KAMEZAWA Hiroyuki Reviewed-by: Kirill A. Shutemov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

hugetlb: prevent BUG_ON in hugetlb_fault() -> hugetlb_cow()

2012-05-10T22:06:44Z

Commit 66aebce747eaf ("hugetlb: fix race condition in hugetlb_fault()") added code to avoid a race condition by elevating the page refcount in hugetlb_fault() while calling hugetlb_cow(). However, one code path in hugetlb_cow() includes an assertion that the page count is 1, whereas it may now also have the value 2 in this path. The consensus is that this BUG_ON has served its purpose, so rather than extending it to cover both cases, we just remove it. Signed-off-by: Chris Metcalf Acked-by: Mel Gorman Acked-by: Hillf Danton Acked-by: Hugh Dickins Cc: Michal Hocko Cc: KAMEZAWA Hiroyuki Cc: [3.0.29+, 3.2.16+, 3.3.3+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm: fix division by 0 in percpu_pagelist_fraction()

2012-05-10T22:06:44Z

percpu_pagelist_fraction_sysctl_handler() has only considered -EINVAL as a possible error from proc_dointvec_minmax(). If any other error is returned, it would proceed to divide by zero since percpu_pagelist_fraction wasn't getting initialized at any point. For example, writing 0 bytes into the proc file would trigger the issue. Signed-off-by: Sasha Levin Reviewed-by: Minchan Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

kmemleak: Fix the kmemleak tracking of the percpu areas with !SMP

2012-05-09T17:13:29Z

Kmemleak tracks the percpu allocations via a specific API and the originally allocated areas must be removed from kmemleak (via kmemleak_free). The code was already doing this for SMP systems. Reported-by: Sami Liedes Cc: Tejun Heo Cc: Christoph Lameter Signed-off-by: Catalin Marinas Signed-off-by: Tejun Heo

percpu: pcpu_embed_first_chunk() should free unused parts after all allocs are complete

2012-05-09T17:08:16Z

pcpu_embed_first_chunk() allocates memory for each node, copies percpu data and frees unused portions of it before proceeding to the next group. This assumes that allocations for different nodes doesn't overlap; however, depending on memory topology, the bootmem allocator may end up allocating memory from a different node than the requested one which may overlap with the portion freed from one of the previous percpu areas. This leads to percpu groups for different nodes overlapping which is a serious bug. This patch separates out copy & partial free from the allocation loop such that all allocations are complete before partial frees happen. This also fixes overlapping frees which could happen on allocation failure path - out_free_areas path frees whole groups but the groups could have portions freed at that point. Signed-off-by: Tejun Heo Cc: stable@vger.kernel.org Reported-by: "Pavel V. Panteleev" Tested-by: "Pavel V. Panteleev" LKML-Reference: