<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm/filemap.c, branch v3.4.11</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/mm/filemap.c?h=v3.4.11</id>
<link rel='self' href='https://git.amat.us/linux/atom/mm/filemap.c?h=v3.4.11'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2012-03-29T00:14:37Z</updated>
<entry>
<title>radix-tree: use iterators in find_get_pages* functions</title>
<updated>2012-03-29T00:14:37Z</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@openvz.org</email>
</author>
<published>2012-03-28T21:42:54Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=0fc9d1040313047edf6a39fd4d7c7defdca97c62'/>
<id>urn:sha1:0fc9d1040313047edf6a39fd4d7c7defdca97c62</id>
<content type='text'>
Replace radix_tree_gang_lookup_slot() and
radix_tree_gang_lookup_tag_slot() in page-cache lookup functions with
the brand-new radix-tree direct iteration.  This avoids the double
scanning and pointer copying.

The iterator doesn't stop after nr_pages page-get failures in a row; it
continues the lookup till the end of the radix tree.  Thus we can safely
remove these restart conditions.

Unfortunately, the old implementation didn't forbid nr_pages == 0.  This
corner case does not fit into the new code, so the patch adds an extra
check at the beginning.
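
For reference, a minimal sketch of the new lookup shape (simplified from
the real find_get_pages(): exceptional-entry handling and the slot
recheck after a speculative get are omitted here):

    unsigned find_get_pages(struct address_space *mapping, pgoff_t start,
                            unsigned int nr_pages, struct page **pages)
    {
        struct radix_tree_iter iter;
        void **slot;
        unsigned ret = 0;

        /* the new explicit check: the iterator cannot express nr_pages == 0 */
        if (unlikely(!nr_pages))
            return 0;

        rcu_read_lock();
        radix_tree_for_each_slot(slot, &amp;mapping-&gt;page_tree, &amp;iter, start) {
            struct page *page = radix_tree_deref_slot(slot);

            if (unlikely(!page))
                continue;
            if (!page_cache_get_speculative(page))
                continue;       /* no restart: just keep iterating */

            pages[ret] = page;
            if (++ret == nr_pages)
                break;
        }
        rcu_read_unlock();
        return ret;
    }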

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Tested-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'stable/for-linus-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm</title>
<updated>2012-03-23T02:52:47Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-03-23T02:52:47Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=aab008db8063364dc3c8ccf4981c21124866b395'/>
<id>urn:sha1:aab008db8063364dc3c8ccf4981c21124866b395</id>
<content type='text'>
Pull cleancache changes from Konrad Rzeszutek Wilk:
 "This has some patches for the cleancache API that should have been
  submitted a _long_ time ago.  They are basically cleanups:

   - rename of flush to invalidate

   - moving reporting of statistics into debugfs

   - use __read_mostly as necessary.

  Oh, and also the MAINTAINERS file change.  The files (except the
  MAINTAINERS file) have been in #linux-next for months now.  The late
  addition of the MAINTAINERS file is a brain-fart on my side - I didn't
  realize I needed it until I was typing this up - and I based that
  patch on v3.3 - so the tree is on top of v3.3."

* tag 'stable/for-linus-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm:
  MAINTAINERS: Adding cleancache API to the list.
  mm: cleancache: Use __read_mostly as appropriate.
  mm: cleancache: report statistics via debugfs instead of sysfs.
  mm: zcache/tmem/cleancache: s/flush/invalidate/
  mm: cleancache: s/flush/invalidate/
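
The debugfs statistics change in the list above amounts to the pattern
below (a hedged sketch; the directory and counter names here are
illustrative rather than quoted from the patch):

    #include &lt;linux/debugfs.h&gt;

    static u64 cleancache_succ_gets;    /* illustrative counter */

    static int __init cleancache_debugfs_init(void)
    {
        struct dentry *root = debugfs_create_dir("cleancache", NULL);

        if (!root)
            return -ENXIO;
        /* one read-only u64 file per statistic, no sysfs ABI to maintain */
        debugfs_create_u64("succ_gets", S_IRUGO, root, &amp;cleancache_succ_gets);
        return 0;
    }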
</content>
</entry>
<entry>
<title>cpuset: mm: reduce large amounts of memory barrier related damage v3</title>
<updated>2012-03-22T00:54:59Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2012-03-21T23:34:11Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=cc9a6c8776615f9c194ccf0b63a0aa5628235545'/>
<id>urn:sha1:cc9a6c8776615f9c194ccf0b63a0aa5628235545</id>
<content type='text'>
Commit c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when
changing cpuset's mems") wins a super prize for the largest number of
memory barriers entered into fast paths for one commit.

[get|put]_mems_allowed is incredibly heavy with pairs of full memory
barriers inserted into a number of hot paths.  This was detected while
investigating a large page allocator slowdown introduced some time
after 2.6.32.  The largest portion of this overhead was shown by
oprofile to be at an mfence introduced by this commit into the page
allocator hot path.

For extra style points, the commit introduced the use of yield() in an
implementation of what looks like a spinning mutex.

This patch replaces the full memory barriers on both read and write
sides with a sequence counter with just read barriers on the fast path
side.  This is much cheaper on some architectures, including x86.  The
main bulk of the patch is the retry logic if the nodemask changes in a
manner that can cause a false failure.

While updating the nodemask, a check is made to see if a false failure
is a risk.  If it is, the sequence number gets bumped and parallel
allocators will briefly stall while the nodemask update takes place.
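
The fast path after the patch has roughly the following shape (a
caller-side sketch with the surrounding allocator arguments elided;
get_mems_allowed()/put_mems_allowed() wrap read_seqcount_begin()/
read_seqcount_retry() on current-&gt;mems_allowed_seq):

    unsigned int cpuset_mems_cookie;
    struct page *page;

    retry_cpuset:
    cpuset_mems_cookie = get_mems_allowed();
    page = __alloc_pages_nodemask(gfp_mask, order, zonelist, nodemask);

    /*
     * If the allocation failed and the mems_allowed sequence moved while
     * we were allocating, the failure may be a false one: retry.
     */
    if (unlikely(!put_mems_allowed(cpuset_mems_cookie) &amp;&amp; !page))
        goto retry_cpuset;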

In a page fault test microbenchmark, oprofile samples from
__alloc_pages_nodemask went from 4.53% of all samples to 1.15%.  The
actual results were

                             3.3.0-rc3          3.3.0-rc3
                             rc3-vanilla        nobarrier-v2r1
    Clients   1 UserTime       0.07 (  0.00%)   0.08 (-14.19%)
    Clients   2 UserTime       0.07 (  0.00%)   0.07 (  2.72%)
    Clients   4 UserTime       0.08 (  0.00%)   0.07 (  3.29%)
    Clients   1 SysTime        0.70 (  0.00%)   0.65 (  6.65%)
    Clients   2 SysTime        0.85 (  0.00%)   0.82 (  3.65%)
    Clients   4 SysTime        1.41 (  0.00%)   1.41 (  0.32%)
    Clients   1 WallTime       0.77 (  0.00%)   0.74 (  4.19%)
    Clients   2 WallTime       0.47 (  0.00%)   0.45 (  3.73%)
    Clients   4 WallTime       0.38 (  0.00%)   0.37 (  1.58%)
    Clients   1 Flt/sec/cpu  497620.28 (  0.00%) 520294.53 (  4.56%)
    Clients   2 Flt/sec/cpu  414639.05 (  0.00%) 429882.01 (  3.68%)
    Clients   4 Flt/sec/cpu  257959.16 (  0.00%) 258761.48 (  0.31%)
    Clients   1 Flt/sec      495161.39 (  0.00%) 517292.87 (  4.47%)
    Clients   2 Flt/sec      820325.95 (  0.00%) 850289.77 (  3.65%)
    Clients   4 Flt/sec      1020068.93 (  0.00%) 1022674.06 (  0.26%)
    MMTests Statistics: duration
    Sys Time Running Test (seconds)             135.68    132.17
    User+Sys Time Running Test (seconds)         164.2    160.13
    Total Elapsed Time (seconds)                123.46    120.87

The overall improvement is small, but the System CPU time is much
improved, roughly in line with what oprofile reported (these performance
figures were taken without profiling, so some skew is expected).  The
actual number of page faults is noticeably improved.

For benchmarks like kernel builds, the overall benefit is marginal but
the system CPU time is slightly reduced.

To test the actual bug the commit fixed I opened two terminals.  The
first ran within a cpuset and continually ran a small program that
faulted 100M of anonymous data.  In a second window, the nodemask of the
cpuset was continually randomised in a loop.

Without the commit, the program would fail every so often (usually
within 10 seconds) and obviously with the commit everything worked fine.
With this patch applied, it also worked fine so the fix should be
functionally equivalent.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: update stale lock ordering comment for memory-failure.c</title>
<updated>2012-03-22T00:54:58Z</updated>
<author>
<name>Andi Kleen</name>
<email>ak@linux.intel.com</email>
</author>
<published>2012-03-21T23:34:09Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9a3c531df9462df6cc2b060f749651723ffc180c'/>
<id>urn:sha1:9a3c531df9462df6cc2b060f749651723ffc180c</id>
<content type='text'>
When i_mmap_lock was changed to a mutex, the locking order in memory
failure was changed to take the sleeping lock first.  But the big fat mm
lock ordering comment (BFMLO) wasn't updated.  Do this here.

Pointed out by Andrew.

Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: don't set __GFP_WRITE on ramfs/sysfs writes</title>
<updated>2012-03-22T00:54:58Z</updated>
<author>
<name>Fengguang Wu</name>
<email>fengguang.wu@intel.com</email>
</author>
<published>2012-03-21T23:34:08Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=1010bb1b80edb0713415dfe1f97114d320f58c4f'/>
<id>urn:sha1:1010bb1b80edb0713415dfe1f97114d320f58c4f</id>
<content type='text'>
There is not much point in skipping zones during allocation based on the
dirty usage which they'll never contribute to.  And we'd like to avoid
page reclaim waits when writing to ramfs/sysfs etc.
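
In sketch form, the corresponding change in grab_cache_page_write_begin()
(mapping_cap_account_dirty() tells us whether the backing device accounts
dirty pages at all; ramfs/sysfs do not):

    gfp_t gfp_mask = mapping_gfp_mask(mapping);

    /* only tag the allocation as dirtying when dirty accounting applies */
    if (mapping_cap_account_dirty(mapping))
        gfp_mask |= __GFP_WRITE;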

Signed-off-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Greg Thelen &lt;gthelen@google.com&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: remove the second argument of k[un]map_atomic()</title>
<updated>2012-03-20T13:48:27Z</updated>
<author>
<name>Cong Wang</name>
<email>amwang@redhat.com</email>
</author>
<published>2011-11-25T15:14:39Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9b04c5fec43c0da610a2c37f70c5b013101a6ad7'/>
<id>urn:sha1:9b04c5fec43c0da610a2c37f70c5b013101a6ad7</id>
<content type='text'>
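Across filemap.c this is the mechanical conversion sketched below (a
representative example, not an excerpt from the patch): the explicit
KM_* slot argument is dropped and the slot is managed implicitly.

    /* before */
    vaddr = kmap_atomic(page, KM_USER0);
    memcpy(vaddr, buf, len);
    kunmap_atomic(vaddr, KM_USER0);

    /* after */
    vaddr = kmap_atomic(page);
    memcpy(vaddr, buf, len);
    kunmap_atomic(vaddr);
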
Signed-off-by: Cong Wang &lt;amwang@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'stable/cleancache.v13' into linux-next</title>
<updated>2012-03-19T16:12:19Z</updated>
<author>
<name>Konrad Rzeszutek Wilk</name>
<email>konrad.wilk@oracle.com</email>
</author>
<published>2012-03-19T16:12:19Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=16c0cfa425b8e1488f7a1873bd112a7a099325f0'/>
<id>urn:sha1:16c0cfa425b8e1488f7a1873bd112a7a099325f0</id>
<content type='text'>
* stable/cleancache.v13:
  mm: cleancache: Use __read_mostly as appropriate.
  mm: cleancache: report statistics via debugfs instead of sysfs.
  mm: zcache/tmem/cleancache: s/flush/invalidate/
  mm: cleancache: s/flush/invalidate/
</content>
</entry>
<entry>
<title>readahead: fix pipeline break caused by block plug</title>
<updated>2012-02-04T00:16:41Z</updated>
<author>
<name>Shaohua Li</name>
<email>shaohua.li@intel.com</email>
</author>
<published>2012-02-03T23:37:17Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=3deaa7190a8da38453c4fabd9dec7f66d17fff67'/>
<id>urn:sha1:3deaa7190a8da38453c4fabd9dec7f66d17fff67</id>
<content type='text'>
Herbert Poetzl reported a performance regression since 2.6.39.  The test
is a simple dd read, but with big block size.  The reason is:

T1: ra (A, A+128k), (A+128k, A+256k)
T2: lock_page for page A, submit the 256k
T3: hit page A+128K, ra (A+256k, A+384k).  The range isn't submitted
because of the plug, and there isn't any lock_page till we hit page A+256k
because all pages from A to A+256k are in memory
T4: hit page A+256k, ra (A+384k, A+512k).  Because of the plug, the range
isn't submitted again.
T5: lock_page A+256k, so (A+256k, A+512k) will be submitted.  The task is
waiting for (A+256k, A+512k) to finish.

There is no request to disk in T3 and T4, so readahead pipeline breaks.

We really don't need the block plug in generic_file_aio_read() for
buffered I/O.  Readahead already has a plug and has fine-grained control
over when I/O should be submitted.  Deleting the plug for buffered I/O
fixes the regression.
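
In sketch form (the plug now remains only around the O_DIRECT branch):

    /* before: the whole buffered read ran under one plug */
    struct blk_plug plug;

    blk_start_plug(&amp;plug);
    do_generic_file_read(filp, ppos, &amp;desc, file_read_actor);
    blk_finish_plug(&amp;plug);

    /* after: no top-level plug for buffered I/O - readahead already
     * plugs around its own submissions and unplugs before sleeping */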

One side effect is that the plug makes the request size 256k, while it is
128k without it.  That is because the default readahead size is 128k,
and is not a reason we need the plug here.

Vivek said:

: We submit some readahead IO to device request queue but because of nested
: plug, queue never gets unplugged.  When read logic reaches a page which is
: not in page cache, it waits for page to be read from the disk
: (lock_page_killable()) and that time we flush the plug list.
:
: So effectively read ahead logic is kind of broken in parts because of
: nested plugging.  Removing top level plug (generic_file_aio_read()) for
: buffered reads, will allow unplugging queue earlier for readahead.

Signed-off-by: Shaohua Li &lt;shaohua.li@intel.com&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Reported-by: Herbert Poetzl &lt;herbert@13thfloor.at&gt;
Tested-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: cleancache: s/flush/invalidate/</title>
<updated>2012-01-23T21:06:24Z</updated>
<author>
<name>Dan Magenheimer</name>
<email>dan.magenheimer@oracle.com</email>
</author>
<published>2011-09-21T15:56:28Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=3167760f83899ccda312b9ad9306ec9e5dda06d4'/>
<id>urn:sha1:3167760f83899ccda312b9ad9306ec9e5dda06d4</id>
<content type='text'>
Per akpm's suggestion, alter the use of the term 'flush' to
'invalidate'.  The next patch will do this across all of MM.

This change is completely cosmetic.

[v9: akpm@linux-foundation.org: change "flush" to "invalidate", part 3]
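
A representative hunk of the rename (illustrative; the whole patch is
this substitution applied across the cleancache API):

    -   cleancache_flush_page(mapping, page);
    +   cleancache_invalidate_page(mapping, page);
    -   cleancache_flush_inode(mapping);
    +   cleancache_invalidate_inode(mapping);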

Signed-off-by: Dan Magenheimer &lt;dan.magenheimer@oracle.com&gt;
Cc: Kamezawa Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Jan Beulich &lt;JBeulich@novell.com&gt;
Reviewed-by: Seth Jennings &lt;sjenning@linux.vnet.ibm.com&gt;
Cc: Jeremy Fitzhardinge &lt;jeremy@goop.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Nitin Gupta &lt;ngupta@vflare.org&gt;
Cc: Matthew Wilcox &lt;matthew@wil.cx&gt;
Cc: Chris Mason &lt;chris.mason@oracle.com&gt;
Cc: Rik Riel &lt;riel@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
[v10: fixed conflict with the "fs: move code out of buffer.c" change]
Signed-off-by: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
</content>
</entry>
<entry>
<title>memcg: add mem_cgroup_replace_page_cache() to fix LRU issue</title>
<updated>2012-01-13T04:13:04Z</updated>
<author>
<name>KAMEZAWA Hiroyuki</name>
<email>kamezawa.hiroyu@jp.fujitsu.com</email>
</author>
<published>2012-01-13T01:17:44Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=ab936cbcd02072a34b60d268f94440fd5cf1970b'/>
<id>urn:sha1:ab936cbcd02072a34b60d268f94440fd5cf1970b</id>
<content type='text'>
Commit ef6a3c6311 ("mm: add replace_page_cache_page() function") added a
function replace_page_cache_page().  This function replaces a page in the
radix-tree with a new page.  When doing this, the memory cgroup needs to
fix up the accounting information; memcg needs to check the PCG_USED bit,
etc.

In some (many?) cases, 'newpage' is on the LRU before calling
replace_page_cache().  So memcg's LRU accounting information should be
fixed, too.

This patch adds mem_cgroup_replace_page_cache() and removes the old
hooks.  In that function, the old page is unaccounted without touching
res_counter and the new page is accounted to the memcg (of the old page).
When overwriting pc-&gt;mem_cgroup of newpage, take zone-&gt;lru_lock to
avoid races with LRU handling.
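
The newpage side, in sketch form (condensed from the new function; the
oldpage uncharge and the mem_cgroup_disabled() bail-out are omitted):

    zone = page_zone(newpage);
    pc = lookup_page_cgroup(newpage);

    spin_lock_irqsave(&amp;zone-&gt;lru_lock, flags);
    if (PageLRU(newpage))
        del_page_from_lru_list(zone, newpage, page_lru(newpage));
    /* commit the charge to the memcg taken from the old page */
    __mem_cgroup_commit_charge(memcg, newpage, 1, pc,
                               MEM_CGROUP_CHARGE_TYPE_CACHE);
    if (PageLRU(newpage))
        add_page_to_lru_list(zone, newpage, page_lru(newpage));
    spin_unlock_irqrestore(&amp;zone-&gt;lru_lock, flags);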

Background:
  replace_page_cache_page() is called by FUSE code in its splice() handling.
  Here, 'newpage' is replacing oldpage, but this newpage is not a newly
  allocated page and may be on the LRU.  LRU mis-accounting is critical for
  memory cgroups because rmdir() checks that the whole LRU is empty and that
  there is no account leak.  If a page is on a different LRU than it should
  be, rmdir() will fail.

This bug was added in March 2011, but there has been no bug report yet.
I guess there are not many people who use memcg and FUSE at the same
time with upstream kernels.

The result of this bug is that the admin cannot destroy a memcg because
of the account leak; so there is no panic and no deadlock.  And even if
an active cgroup exists, umount can succeed, so there is no problem at
shutdown.

Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Miklos Szeredi &lt;mszeredi@suse.cz&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
