<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm, branch v3.4.84</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/mm?h=v3.4.84</id>
<link rel='self' href='https://git.amat.us/linux/atom/mm?h=v3.4.84'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2014-03-11T23:10:04Z</updated>
<entry>
<title>mm/hotplug: correctly add new zone to all other nodes' zone lists</title>
<updated>2014-03-11T23:10:04Z</updated>
<author>
<name>Jiang Liu</name>
<email>jiang.liu@huawei.com</email>
</author>
<published>2012-07-31T23:43:30Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=446327d6f1763bbe7f9793cc8dae04fca82c9d48'/>
<id>urn:sha1:446327d6f1763bbe7f9793cc8dae04fca82c9d48</id>
<content type='text'>
commit 08dff7b7d629807dbb1f398c68dd9cd58dd657a1 upstream.

When online_pages() is called to add new memory to an empty zone, it
rebuilds all zone lists by calling build_all_zonelists().  But there's a
bug which prevents the new zone to be added to other nodes' zone lists.

online_pages() {
	build_all_zonelists()
	.....
	node_set_state(zone_to_nid(zone), N_HIGH_MEMORY)
}

Here the node of the zone is put into N_HIGH_MEMORY state after calling
build_all_zonelists(), but build_all_zonelists() only adds zones from
nodes in N_HIGH_MEMORY state to the fallback zone lists.
build_all_zonelists()

    -&gt;__build_all_zonelists()
	-&gt;build_zonelists()
	    -&gt;find_next_best_node()
		-&gt;for_each_node_state(n, N_HIGH_MEMORY)

So memory in the new zone will never be used by other nodes, and it may
cause strange behavor when system is under memory pressure.  So put node
into N_HIGH_MEMORY state before calling build_all_zonelists().

Signed-off-by: Jianguo Wu &lt;wujianguo@huawei.com&gt;
Signed-off-by: Jiang Liu &lt;liuj97@gmail.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Cc: Yinghai Lu &lt;yinghai@kernel.org&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Keping Chen &lt;chenkeping@huawei.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Cc: Qiang Huang &lt;h.huangqiang@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: vmscan: fix endless loop in kswapd balancing</title>
<updated>2014-03-11T23:10:02Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2012-11-29T21:54:23Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=f47929fd5093c4b5c134ff2b2811ae327102bafd'/>
<id>urn:sha1:f47929fd5093c4b5c134ff2b2811ae327102bafd</id>
<content type='text'>
commit 60cefed485a02bd99b6299dad70666fe49245da7 upstream.

Kswapd does not in all places have the same criteria for a balanced
zone.  Zones are only being reclaimed when their high watermark is
breached, but compaction checks loop over the zonelist again when the
zone does not meet the low watermark plus two times the size of the
allocation.  This gets kswapd stuck in an endless loop over a small
zone, like the DMA zone, where the high watermark is smaller than the
compaction requirement.

Add a function, zone_balanced(), that checks the watermark, and, for
higher order allocations, if compaction has enough free memory.  Then
use it uniformly to check for balanced zones.

This makes sure that when the compaction watermark is not met, at least
reclaim happens and progress is made - or the zone is declared
unreclaimable at some point and skipped entirely.

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reported-by: George Spelvin &lt;linux@horizon.com&gt;
Reported-by: Johannes Hirte &lt;johannes.hirte@fem.tu-ilmenau.de&gt;
Reported-by: Tomas Racek &lt;tracek@redhat.com&gt;
Tested-by: Johannes Hirte &lt;johannes.hirte@fem.tu-ilmenau.de&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[hq: Backported to 3.4: adjust context]
Signed-off-by: Qiang Huang &lt;h.huangqiang@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</content>
</entry>
<entry>
<title>mm: setup pageblock_order before it's used by sparsemem</title>
<updated>2014-02-20T18:45:32Z</updated>
<author>
<name>Xishi Qiu</name>
<email>qiuxishi@huawei.com</email>
</author>
<published>2012-07-31T23:43:19Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=3fea8b0a9f978cce6d0685464d4e8acb9bbd1acc'/>
<id>urn:sha1:3fea8b0a9f978cce6d0685464d4e8acb9bbd1acc</id>
<content type='text'>
commit ca57df79d4f64e1a4886606af4289d40636189c5 upstream.

On architectures with CONFIG_HUGETLB_PAGE_SIZE_VARIABLE set, such as
Itanium, pageblock_order is a variable with default value of 0.  It's set
to the right value by set_pageblock_order() in function
free_area_init_core().

But pageblock_order may be used by sparse_init() before free_area_init_core()
is called along path:
sparse_init()
    -&gt;sparse_early_usemaps_alloc_node()
	-&gt;usemap_size()
	    -&gt;SECTION_BLOCKFLAGS_BITS
		-&gt;((1UL &lt;&lt; (PFN_SECTION_SHIFT - pageblock_order)) *
NR_PAGEBLOCK_BITS)

The uninitialized pageblock_size will cause memory wasting because
usemap_size() returns a much bigger value then it's really needed.

For example, on an Itanium platform,
sparse_init() pageblock_order=0 usemap_size=24576
free_area_init_core() before pageblock_order=0, usemap_size=24576
free_area_init_core() after pageblock_order=12, usemap_size=8

That means 24K memory has been wasted for each section, so fix it by calling
set_pageblock_order() from sparse_init().

Signed-off-by: Xishi Qiu &lt;qiuxishi@huawei.com&gt;
Signed-off-by: Jiang Liu &lt;liuj97@gmail.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Yinghai Lu &lt;yinghai@kernel.org&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Keping Chen &lt;chenkeping@huawei.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/page_alloc.c: remove pageblock_default_order()</title>
<updated>2014-02-20T18:45:32Z</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@linux-foundation.org</email>
</author>
<published>2012-05-29T22:06:31Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=237597d8f73155bba781acf3f4f558f6ed08a8ee'/>
<id>urn:sha1:237597d8f73155bba781acf3f4f558f6ed08a8ee</id>
<content type='text'>
commit 955c1cd7401565671b064e499115344ec8067dfd upstream.

This has always been broken: one version takes an unsigned int and the
other version takes no arguments.  This bug was hidden because one
version of set_pageblock_order() was a macro which doesn't evaluate its
argument.

Simplify it all and remove pageblock_default_order() altogether.

Reported-by: rajman mekaco &lt;rajman.mekaco@gmail.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq()</title>
<updated>2014-02-20T18:45:32Z</updated>
<author>
<name>KOSAKI Motohiro</name>
<email>kosaki.motohiro@jp.fujitsu.com</email>
</author>
<published>2014-02-06T20:04:24Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=4d4bed8141f7883dfbed44749653f769367b3e01'/>
<id>urn:sha1:4d4bed8141f7883dfbed44749653f769367b3e01</id>
<content type='text'>
commit a85d9df1ea1d23682a0ed1e100e6965006595d06 upstream.

During aio stress test, we observed the following lockdep warning.  This
mean AIO+numa_balancing is currently deadlockable.

The problem is, aio_migratepage disable interrupt, but
__set_page_dirty_nobuffers unintentionally enable it again.

Generally, all helper function should use spin_lock_irqsave() instead of
spin_lock_irq() because they don't know caller at all.

   other info that might help us debug this:
    Possible unsafe locking scenario:

          CPU0
          ----
     lock(&amp;(&amp;ctx-&gt;completion_lock)-&gt;rlock);
     &lt;Interrupt&gt;
       lock(&amp;(&amp;ctx-&gt;completion_lock)-&gt;rlock);

    *** DEADLOCK ***

      dump_stack+0x19/0x1b
      print_usage_bug+0x1f7/0x208
      mark_lock+0x21d/0x2a0
      mark_held_locks+0xb9/0x140
      trace_hardirqs_on_caller+0x105/0x1d0
      trace_hardirqs_on+0xd/0x10
      _raw_spin_unlock_irq+0x2c/0x50
      __set_page_dirty_nobuffers+0x8c/0xf0
      migrate_page_copy+0x434/0x540
      aio_migratepage+0xb1/0x140
      move_to_new_page+0x7d/0x230
      migrate_pages+0x5e5/0x700
      migrate_misplaced_page+0xbc/0xf0
      do_numa_page+0x102/0x190
      handle_pte_fault+0x241/0x970
      handle_mm_fault+0x265/0x370
      __do_page_fault+0x172/0x5a0
      do_page_fault+0x1a/0x70
      page_fault+0x28/0x30

Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Larry Woodman &lt;lwoodman@redhat.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Johannes Weiner &lt;jweiner@redhat.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>slub: Fix calculation of cpu slabs</title>
<updated>2014-02-13T19:51:09Z</updated>
<author>
<name>Li Zefan</name>
<email>lizefan@huawei.com</email>
</author>
<published>2013-09-10T03:43:37Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=b06c0a0cc545114be1579934e90ecc477201fde7'/>
<id>urn:sha1:b06c0a0cc545114be1579934e90ecc477201fde7</id>
<content type='text'>
commit 8afb1474db4701d1ab80cd8251137a3260e6913e upstream.

  /sys/kernel/slab/:t-0000048 # cat cpu_slabs
  231 N0=16 N1=215
  /sys/kernel/slab/:t-0000048 # cat slabs
  145 N0=36 N1=109

See, the number of slabs is smaller than that of cpu slabs.

The bug was introduced by commit 49e2258586b423684f03c278149ab46d8f8b6700
("slub: per cpu cache for partial pages").

We should use page-&gt;pages instead of page-&gt;pobjects when calculating
the number of cpu partial slabs. This also fixes the mapping of slabs
and nodes.

As there's no variable storing the number of total/active objects in
cpu partial slabs, and we don't have user interfaces requiring those
statistics, I just add WARN_ON for those cases.

Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
Reviewed-by: Wanpeng Li &lt;liwanp@linux.vnet.ibm.com&gt;
Signed-off-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Pekka Enberg &lt;penberg@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: hugetlbfs: fix hugetlbfs optimization</title>
<updated>2014-02-06T19:05:46Z</updated>
<author>
<name>Andrea Arcangeli</name>
<email>aarcange@redhat.com</email>
</author>
<published>2013-11-21T22:32:02Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=50d8f1b5c57bb29f02ab5834be334b4f7922b856'/>
<id>urn:sha1:50d8f1b5c57bb29f02ab5834be334b4f7922b856</id>
<content type='text'>
commit 27c73ae759774e63313c1fbfeb17ba076cea64c5 upstream.

Commit 7cb2ef56e6a8 ("mm: fix aio performance regression for database
caused by THP") can cause dereference of a dangling pointer if
split_huge_page runs during PageHuge() if there are updates to the
tail_page-&gt;private field.

Also it is repeating compound_head twice for hugetlbfs and it is running
compound_head+compound_trans_head for THP when a single one is needed in
both cases.

The new code within the PageSlab() check doesn't need to verify that the
THP page size is never bigger than the smallest hugetlbfs page size, to
avoid memory corruption.

A longstanding theoretical race condition was found while fixing the
above (see the change right after the skip_unlock label, that is
relevant for the compound_lock path too).

By re-establishing the _mapcount tail refcounting for all compound
pages, this also fixes the below problem:

  echo 0 &gt;/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

  BUG: Bad page state in process bash  pfn:59a01
  page:ffffea000139b038 count:0 mapcount:10 mapping:          (null) index:0x0
  page flags: 0x1c00000000008000(tail)
  Modules linked in:
  CPU: 6 PID: 2018 Comm: bash Not tainted 3.12.0+ #25
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  Call Trace:
    dump_stack+0x55/0x76
    bad_page+0xd5/0x130
    free_pages_prepare+0x213/0x280
    __free_pages+0x36/0x80
    update_and_free_page+0xc1/0xd0
    free_pool_huge_page+0xc2/0xe0
    set_max_huge_pages.part.58+0x14c/0x220
    nr_hugepages_store_common.isra.60+0xd0/0xf0
    nr_hugepages_store+0x13/0x20
    kobj_attr_store+0xf/0x20
    sysfs_write_file+0x189/0x1e0
    vfs_write+0xc5/0x1f0
    SyS_write+0x55/0xb0
    system_call_fastpath+0x16/0x1b

Signed-off-by: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Tested-by: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Cc: Pravin Shelar &lt;pshelar@nicira.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Ben Hutchings &lt;bhutchings@solarflare.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Johannes Weiner &lt;jweiner@redhat.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Guillaume Morin &lt;guillaume@morinfr.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</content>
</entry>
<entry>
<title>mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully</title>
<updated>2014-01-29T13:10:42Z</updated>
<author>
<name>Jianguo Wu</name>
<email>wujianguo@huawei.com</email>
</author>
<published>2013-12-19T01:08:54Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9a22404cf92f3c5ec2b2bf2554f0c92f021e77e9'/>
<id>urn:sha1:9a22404cf92f3c5ec2b2bf2554f0c92f021e77e9</id>
<content type='text'>
commit a49ecbcd7b0d5a1cda7d60e03df402dd0ef76ac8 upstream.

After a successful hugetlb page migration by soft offline, the source
page will either be freed into hugepage_freelists or buddy(over-commit
page).  If page is in buddy, page_hstate(page) will be NULL.  It will
hit a NULL pointer dereference in dequeue_hwpoisoned_huge_page().

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
  IP: [&lt;ffffffff81163761&gt;] dequeue_hwpoisoned_huge_page+0x131/0x1d0
  PGD c23762067 PUD c24be2067 PMD 0
  Oops: 0000 [#1] SMP

So check PageHuge(page) after call migrate_pages() successfully.

[wujg: backport to 3.4:
 - adjust context
 - s/num_poisoned_pages/mce_bad_pages/]

Signed-off-by: Jianguo Wu &lt;wujianguo@huawei.com&gt;
Tested-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Reviewed-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</content>
</entry>
<entry>
<title>mm/hugetlb: check for pte NULL pointer in __page_check_address()</title>
<updated>2014-01-08T17:42:12Z</updated>
<author>
<name>Jianguo Wu</name>
<email>wujianguo@huawei.com</email>
</author>
<published>2013-12-19T01:08:59Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=2efb73fb4f0f5081ba6450e789b52f70247c80d9'/>
<id>urn:sha1:2efb73fb4f0f5081ba6450e789b52f70247c80d9</id>
<content type='text'>
commit 98398c32f6687ee1e1f3ae084effb4b75adb0747 upstream.

In __page_check_address(), if address's pud is not present,
huge_pte_offset() will return NULL, we should check the return value.

Signed-off-by: Jianguo Wu &lt;wujianguo@huawei.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: qiuxishi &lt;qiuxishi@huawei.com&gt;
Cc: Hanjun Guo &lt;guohanjun@huawei.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: fix aio performance regression for database caused by THP</title>
<updated>2013-11-13T03:01:49Z</updated>
<author>
<name>Khalid Aziz</name>
<email>khalid.aziz@oracle.com</email>
</author>
<published>2013-09-11T21:22:20Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=b07ef016454ff46f98e633b5a6247ca7e343fb67'/>
<id>urn:sha1:b07ef016454ff46f98e633b5a6247ca7e343fb67</id>
<content type='text'>
commit 7cb2ef56e6a8b7b368b2e883a0a47d02fed66911 upstream.

I am working with a tool that simulates oracle database I/O workload.
This tool (orion to be specific -
&lt;http://docs.oracle.com/cd/E11882_01/server.112/e16638/iodesign.htm#autoId24&gt;)
allocates hugetlbfs pages using shmget() with SHM_HUGETLB flag.  It then
does aio into these pages from flash disks using various common block
sizes used by database.  I am looking at performance with two of the most
common block sizes - 1M and 64K.  aio performance with these two block
sizes plunged after Transparent HugePages was introduced in the kernel.
Here are performance numbers:

		pre-THP		2.6.39		3.11-rc5
1M read		8384 MB/s	5629 MB/s	6501 MB/s
64K read	7867 MB/s	4576 MB/s	4251 MB/s

I have narrowed the performance impact down to the overheads introduced by
THP in __get_page_tail() and put_compound_page() routines.  perf top shows
&gt;40% of cycles being spent in these two routines.  Every time direct I/O
to hugetlbfs pages starts, kernel calls get_page() to grab a reference to
the pages and calls put_page() when I/O completes to put the reference
away.  THP introduced significant amount of locking overhead to get_page()
and put_page() when dealing with compound pages because hugepages can be
split underneath get_page() and put_page().  It added this overhead
irrespective of whether it is dealing with hugetlbfs pages or transparent
hugepages.  This resulted in 20%-45% drop in aio performance when using
hugetlbfs pages.

Since hugetlbfs pages can not be split, there is no reason to go through
all the locking overhead for these pages from what I can see.  I added
code to __get_page_tail() and put_compound_page() to bypass all the
locking code when working with hugetlbfs pages.  This improved performance
significantly.  Performance numbers with this patch:

		pre-THP		3.11-rc5	3.11-rc5 + Patch
1M read		8384 MB/s	6501 MB/s	8371 MB/s
64K read	7867 MB/s	4251 MB/s	6510 MB/s

Performance with 64K read is still lower than what it was before THP, but
still a 53% improvement.  It does mean there is more work to be done but I
will take a 53% improvement for now.

Please take a look at the following patch and let me know if it looks
reasonable.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Cc: Pravin B Shelar &lt;pshelar@nicira.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
