<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm/compaction.c, branch v3.2.22</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/mm/compaction.c?h=v3.2.22</id>
<link rel='self' href='https://git.amat.us/linux/atom/mm/compaction.c?h=v3.2.22'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2012-02-13T19:17:01Z</updated>
<entry>
<title>mm: compaction: check for overlapping nodes during isolation for migration</title>
<updated>2012-02-13T19:17:01Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2012-02-09T01:13:38Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=a7d2576c858c397282602fe92adf8c8ac9d6a0e0'/>
<id>urn:sha1:a7d2576c858c397282602fe92adf8c8ac9d6a0e0</id>
<content type='text'>
commit dc9086004b3d5db75997a645b3fe08d9138b7ad0 upstream.

When isolating pages for migration, migration starts at the start of a
zone while the free scanner starts at the end of the zone.  Migration
avoids entering a new zone by never going beyond the free scanned.

Unfortunately, in very rare cases nodes can overlap.  When this happens,
migration isolates pages without the LRU lock held, corrupting lists
which will trigger errors in reclaim or during page free such as in the
following oops

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  IP: [&lt;ffffffff810f795c&gt;] free_pcppages_bulk+0xcc/0x450
  PGD 1dda554067 PUD 1e1cb58067 PMD 0
  Oops: 0000 [#1] SMP
  CPU 37
  Pid: 17088, comm: memcg_process_s Tainted: G            X
  RIP: free_pcppages_bulk+0xcc/0x450
  Process memcg_process_s (pid: 17088, threadinfo ffff881c2926e000, task ffff881c2926c0c0)
  Call Trace:
    free_hot_cold_page+0x17e/0x1f0
    __pagevec_free+0x90/0xb0
    release_pages+0x22a/0x260
    pagevec_lru_move_fn+0xf3/0x110
    putback_lru_page+0x66/0xe0
    unmap_and_move+0x156/0x180
    migrate_pages+0x9e/0x1b0
    compact_zone+0x1f3/0x2f0
    compact_zone_order+0xa2/0xe0
    try_to_compact_pages+0xdf/0x110
    __alloc_pages_direct_compact+0xee/0x1c0
    __alloc_pages_slowpath+0x370/0x830
    __alloc_pages_nodemask+0x1b1/0x1c0
    alloc_pages_vma+0x9b/0x160
    do_huge_pmd_anonymous_page+0x160/0x270
    do_page_fault+0x207/0x4c0
    page_fault+0x25/0x30

The "X" in the taint flag means that external modules were loaded but but
is unrelated to the bug triggering.  The real problem was because the PFN
layout looks like this

  Zone PFN ranges:
    DMA      0x00000010 -&gt; 0x00001000
    DMA32    0x00001000 -&gt; 0x00100000
    Normal   0x00100000 -&gt; 0x01e80000
  Movable zone start PFN for each node
  early_node_map[14] active PFN ranges
      0: 0x00000010 -&gt; 0x0000009b
      0: 0x00000100 -&gt; 0x0007a1ec
      0: 0x0007a354 -&gt; 0x0007a379
      0: 0x0007f7ff -&gt; 0x0007f800
      0: 0x00100000 -&gt; 0x00680000
      1: 0x00680000 -&gt; 0x00e80000
      0: 0x00e80000 -&gt; 0x01080000
      1: 0x01080000 -&gt; 0x01280000
      0: 0x01280000 -&gt; 0x01480000
      1: 0x01480000 -&gt; 0x01680000
      0: 0x01680000 -&gt; 0x01880000
      1: 0x01880000 -&gt; 0x01a80000
      0: 0x01a80000 -&gt; 0x01c80000
      1: 0x01c80000 -&gt; 0x01e80000

The fix is straight-forward.  isolate_migratepages() has to make a
similar check to isolate_freepage to ensure that it never isolates pages
from a zone it does not hold the LRU lock for.

This was discovered in a 3.0-based kernel but it affects 3.1.x, 3.2.x
and current mainline.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Michal Nazarewicz &lt;mina86@mina86.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for migration</title>
<updated>2012-02-13T19:16:55Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2012-02-03T23:37:18Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9da11afefb6f8ccc0f7731831f7ad73106fc87f3'/>
<id>urn:sha1:9da11afefb6f8ccc0f7731831f7ad73106fc87f3</id>
<content type='text'>
commit 0bf380bc70ecba68cb4d74dc656cc2fa8c4d801a upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [&lt;c0522399&gt;] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc-&gt;migrate_pfn
f = cc-&gt;free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh &lt;herbert.van.den.bergh@oracle.com&gt;
Tested-by: Herbert van den Bergh &lt;herbert.van.den.bergh@oracle.com&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Michal Nazarewicz &lt;mina86@mina86.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: compaction: make compact_zone_order() static</title>
<updated>2011-11-01T00:30:49Z</updated>
<author>
<name>Kyungmin Park</name>
<email>kyungmin.park@samsung.com</email>
</author>
<published>2011-11-01T00:09:08Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d43a87e68e9e71d2987a29cc239acec4e8f410c9'/>
<id>urn:sha1:d43a87e68e9e71d2987a29cc239acec4e8f410c9</id>
<content type='text'>
There's no compact_zone_order() user outside file scope, so make it static.

Signed-off-by: Kyungmin Park &lt;kyungmin.park@samsung.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: compaction: make isolate_lru_page() filter-aware</title>
<updated>2011-11-01T00:30:44Z</updated>
<author>
<name>Minchan Kim</name>
<email>minchan.kim@gmail.com</email>
</author>
<published>2011-11-01T00:06:51Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=39deaf8585152f1a35c1676d3d7dc6ae0fb65967'/>
<id>urn:sha1:39deaf8585152f1a35c1676d3d7dc6ae0fb65967</id>
<content type='text'>
In async mode, compaction doesn't migrate dirty or writeback pages.  So,
it's meaningless to pick the page and re-add it to lru list.

Of course, when we isolate the page in compaction, the page might be dirty
or writeback but when we try to migrate the page, the page would be not
dirty, writeback.  So it could be migrated.  But it's very unlikely as
isolate and migration cycle is much faster than writeout.

So, this patch helps cpu overhead and prevent unnecessary LRU churning.

Signed-off-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Reviewed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Reviewed-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: change isolate mode from #define to bitwise type</title>
<updated>2011-11-01T00:30:44Z</updated>
<author>
<name>Minchan Kim</name>
<email>minchan.kim@gmail.com</email>
</author>
<published>2011-11-01T00:06:47Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=4356f21d09283dc6d39a6f7287a65ddab61e2808'/>
<id>urn:sha1:4356f21d09283dc6d39a6f7287a65ddab61e2808</id>
<content type='text'>
Change ISOLATE_XXX macro with bitwise isolate_mode_t type.  Normally,
macro isn't recommended as it's type-unsafe and making debugging harder as
symbol cannot be passed throught to the debugger.

Quote from Johannes
" Hmm, it would probably be cleaner to fully convert the isolation mode
into independent flags.  INACTIVE, ACTIVE, BOTH is currently a
tri-state among flags, which is a bit ugly."

This patch moves isolate mode from swap.h to mmzone.h by memcontrol.h

Signed-off-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: compaction: trivial clean up in acct_isolated()</title>
<updated>2011-11-01T00:30:44Z</updated>
<author>
<name>Minchan Kim</name>
<email>minchan.kim@gmail.com</email>
</author>
<published>2011-11-01T00:06:44Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=b9e84ac1536d35aee03b2601f19694949f0bd506'/>
<id>urn:sha1:b9e84ac1536d35aee03b2601f19694949f0bd506</id>
<content type='text'>
acct_isolated of compaction uses page_lru_base_type which returns only
base type of LRU list so it never returns LRU_ACTIVE_ANON or
LRU_ACTIVE_FILE.  In addtion, cc-&gt;nr_[anon|file] is used in only
acct_isolated so it doesn't have fields in conpact_control.

This patch removes fields from compact_control and makes clear function of
acct_issolated which counts the number of anon|file pages isolated.

Signed-off-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Reviewed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Reviewed-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: compaction: abort compaction if too many pages are isolated and caller is asynchronous V2</title>
<updated>2011-06-16T03:04:02Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2011-06-15T22:08:52Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=f9e35b3b41f47c4e17d8132edbcab305a6aaa4b0'/>
<id>urn:sha1:f9e35b3b41f47c4e17d8132edbcab305a6aaa4b0</id>
<content type='text'>
Asynchronous compaction is used when promoting to huge pages.  This is all
very nice but if there are a number of processes in compacting memory, a
large number of pages can be isolated.  An "asynchronous" process can
stall for long periods of time as a result with a user reporting that
firefox can stall for 10s of seconds.  This patch aborts asynchronous
compaction if too many pages are isolated as it's better to fail a
hugepage promotion than stall a process.

[minchan.kim@gmail.com: return COMPACT_PARTIAL for abort]
Reported-and-tested-by: Ury Stankevich &lt;urykhy@gmail.com&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Reviewed-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: compaction: ensure that the compaction free scanner does not move to the next zone</title>
<updated>2011-06-16T03:04:02Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2011-06-15T22:08:50Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=7454f4ba40b419eb999a3c61a99da662bf1a2bb8'/>
<id>urn:sha1:7454f4ba40b419eb999a3c61a99da662bf1a2bb8</id>
<content type='text'>
Compaction works with two scanners, a migration and a free scanner.  When
the scanners crossover, migration within the zone is complete.  The
location of the scanner is recorded on each cycle to avoid excesive
scanning.

When a zone is small and mostly reserved, it's very easy for the migration
scanner to be close to the end of the zone.  Then the following situation
can occurs

  o migration scanner isolates some pages near the end of the zone
  o free scanner starts at the end of the zone but finds that the
    migration scanner is already there
  o free scanner gets reinitialised for the next cycle as
    cc-&gt;migrate_pfn + pageblock_nr_pages
    moving the free scanner into the next zone
  o migration scanner moves into the next zone

When this happens, NR_ISOLATED accounting goes haywire because some of the
accounting happens against the wrong zone.  One zones counter remains
positive while the other goes negative even though the overall global
count is accurate.  This was reported on X86-32 with !SMP because !SMP
allows the negative counters to be visible.  The fact that it is the bug
should theoritically be possible there.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Reviewed-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>compaction: checks correct fragmentation index</title>
<updated>2011-06-16T03:04:02Z</updated>
<author>
<name>Shaohua Li</name>
<email>shaohua.li@intel.com</email>
</author>
<published>2011-06-15T22:08:49Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=a582a738c763e106f47eab24b8146c698a9c700b'/>
<id>urn:sha1:a582a738c763e106f47eab24b8146c698a9c700b</id>
<content type='text'>
fragmentation_index() returns -1000 when the allocation might succeed
This doesn't match the comment and code in compaction_suitable(). I
thought compaction_suitable should return COMPACT_PARTIAL in -1000
case, because in this case allocation could succeed depending on
watermarks.

The impact of this is that compaction starts and compact_finished() is
called which rechecks the watermarks and the free lists.  It should have
the same result in that compaction should not start but is more expensive.

Acked-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Signed-off-by: Shaohua Li &lt;shaohua.li@intel.com&gt;
Cc: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: compaction: fix special case -1 order checks</title>
<updated>2011-06-16T03:04:00Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.cz</email>
</author>
<published>2011-06-15T22:08:25Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=3957c7768e5ea02fd3345176ddd340f820e5d285'/>
<id>urn:sha1:3957c7768e5ea02fd3345176ddd340f820e5d285</id>
<content type='text'>
Commit 56de7263fcf3 ("mm: compaction: direct compact when a high-order
allocation fails") introduced a check for cc-&gt;order == -1 in
compact_finished.  We should continue compacting in that case because
the request came from userspace and there is no particular order to
compact for.  Similar check has been added by 82478fb7 (mm: compaction:
prevent division-by-zero during user-requested compaction) for
compaction_suitable.

The check is, however, done after zone_watermark_ok which uses order as a
right hand argument for shifts.  Not only watermark check is pointless if
we can break out without it but it also uses 1 &lt;&lt; -1 which is not well
defined (at least from C standard).  Let's move the -1 check above
zone_watermark_ok.

[minchan.kim@gmail.com&gt; - caught compaction_suitable]
Signed-off-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Reviewed-by: KAMEZAWA Hiroyuki &lt;kamezawa.hioryu@jp.fujitsu.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
