<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm/memory.c, branch v2.6.30.2</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/mm/memory.c?h=v2.6.30.2</id>
<link rel='self' href='https://git.amat.us/linux/atom/mm/memory.c?h=v2.6.30.2'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2009-05-02T22:36:09Z</updated>
<entry>
<title>mm: close page_mkwrite races</title>
<updated>2009-05-02T22:36:09Z</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2009-04-30T22:08:16Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=b827e496c893de0c0f142abfaeb8730a2fd6b37f'/>
<id>urn:sha1:b827e496c893de0c0f142abfaeb8730a2fd6b37f</id>
<content type='text'>
Change page_mkwrite to allow implementations to return with the page
locked, and also change its callers (in page fault paths) to hold the
lock until the page is marked dirty.  This allows the filesystem to have
full control of page dirtying events coming from the VM.

Rather than simply hold the page locked over the page_mkwrite call, we
call page_mkwrite with the page unlocked and allow implementations to return with
it locked, so filesystems can avoid LOR conditions with page lock.

The problem with the current scheme is this: a filesystem that wants to
associate some metadata with a page as long as the page is dirty, will
perform this manipulation in its -&gt;page_mkwrite.  It currently then must
return with the page unlocked and may not hold any other locks (according
to existing page_mkwrite convention).

In this window, the VM could write out the page, clearing page-dirty.  The
filesystem has no good way to detect that a dirty pte is about to be
attached, so it will happily write out the page, at which point, the
filesystem may manipulate the metadata to reflect that the page is no
longer dirty.

It is not always possible to perform the required metadata manipulation in
-&gt;set_page_dirty, because that function cannot block or fail.  The
filesystem may need to allocate some data structure, for example.

And the VM cannot mark the pte dirty before page_mkwrite, because
page_mkwrite is allowed to fail, so we must not allow any window where the
page could be written to if page_mkwrite does fail.

This solution of holding the page locked over the 3 critical operations
(page_mkwrite, setting the pte dirty, and finally setting the page dirty)
closes out races nicely, preventing page cleaning for writeout being
initiated in that window.  This provides the filesystem with a strong
synchronisation against the VM here.
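
As a rough illustration only (a toy user-space model, not kernel code), the
ordering described above can be sketched as follows.  The Page class, the
fs_metadata field, try_writeback() and write_fault() are all hypothetical
names invented for this sketch; the only point is that writeout cannot
start while the fault path holds the page lock across the three steps.

```python
import threading


class Page:
    """Toy stand-in for a page cache page (not the kernel's struct page)."""

    def __init__(self):
        self.lock = threading.Lock()   # models the page lock
        self.dirty = False             # models the page-dirty flag
        self.pte_dirty = False         # models the dirty bit in the pte
        self.fs_metadata = False       # hypothetical per-page filesystem state


def try_writeback(page):
    """Writeout needs the page lock; return False if it cannot start."""
    if not page.lock.acquire(blocking=False):
        return False
    try:
        page.dirty = False
        page.fs_metadata = False       # filesystem now treats the page as clean
        return True
    finally:
        page.lock.release()


def write_fault(page, during_window=None):
    """Hold the page lock over mkwrite, pte dirtying and page dirtying."""
    with page.lock:
        page.fs_metadata = True        # step 1: what page_mkwrite would set up
        if during_window is not None:
            during_window(page)        # a racing writeout attempt lands here
        page.pte_dirty = True          # step 2: set the pte dirty
        page.dirty = True              # step 3: set the page dirty


page = Page()
attempts = []
write_fault(page, during_window=lambda p: attempts.append(try_writeback(p)))
print(attempts, page.dirty, page.fs_metadata)
```

In this model the writeout attempt made inside the window is refused, so
the metadata set up in step 1 is still in place when the page becomes
dirty.  Once the fault path has dropped the lock, writeout proceeds as usual.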

- Sage needs this race closed for ceph filesystem.
- Trond for NFS (http://bugzilla.kernel.org/show_bug.cgi?id=12913).
- I need it for fsblock.
- I suspect other filesystems may need it too (eg. btrfs).
- I have converted buffer.c to the new locking. Even simple block allocation
  under dirty pages might be susceptible to i_size changing under partial page
  at the end of file (we also have a buffer.c-side problem here, but it cannot
  be fixed properly without this patch).
- Other filesystems (eg. NFS, maybe btrfs) will need to change their
  page_mkwrite functions themselves.

[ This also moves page_mkwrite another step closer to fault, which should
  eventually allow page_mkwrite to be moved into -&gt;fault, and thus avoiding a
  filesystem calldown and page lock/unlock cycle in __do_fault. ]

[akpm@linux-foundation.org: fix derefs of NULL -&gt;mapping]
Cc: Sage Weil &lt;sage@newdream.net&gt;
Cc: Trond Myklebust &lt;trond.myklebust@fys.uio.no&gt;
Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Valdis Kletnieks &lt;Valdis.Kletnieks@vt.edu&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: fix pageref leak in do_swap_page()</title>
<updated>2009-05-02T22:36:09Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2009-04-30T22:08:08Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=bc43f75cd9815833b27831600ccade672edb5e43'/>
<id>urn:sha1:bc43f75cd9815833b27831600ccade672edb5e43</id>
<content type='text'>
By the time the memory cgroup code is notified about a swapin we
already hold a reference on the fault page.

If the cgroup callback fails make sure to unlock AND release the page
reference which was taken by lookup_swap_cache(), or we leak the reference.
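
The error path can be pictured with a small toy model (plain Python, not
kernel code).  The Page class, lookup() and swapin() below are hypothetical
names made up for this sketch, and the "oom" and "ok" return values are
placeholders rather than real fault codes.

```python
class Page:
    """Toy page with a reference count and a lock flag."""

    def __init__(self):
        self.refcount = 0
        self.locked = False


def lookup(page):
    """Models the lookup step: it hands back the page with a reference held."""
    page.refcount += 1
    return page


def swapin(page, charge_ok):
    """Models the swapin path; charge_ok stands in for the cgroup callback."""
    lookup(page)
    page.locked = True
    if not charge_ok:
        page.locked = False      # unlock the page ...
        page.refcount -= 1       # ... AND drop the reference from lookup
        return "oom"
    page.locked = False
    return "ok"                  # on success the mapping keeps the reference


failed = Page()
result = swapin(failed, charge_ok=False)
print(result, failed.refcount, failed.locked)
```

Without the decrement on the failure branch, the toy page would be left
with a reference that nothing ever drops, which is the leak described above.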

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: page_mkwrite change prototype to match fault</title>
<updated>2009-04-01T15:59:14Z</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2009-03-31T22:23:21Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=c2ec175c39f62949438354f603f4aa170846aabb'/>
<id>urn:sha1:c2ec175c39f62949438354f603f4aa170846aabb</id>
<content type='text'>
Change the page_mkwrite prototype to take a struct vm_fault, and return
VM_FAULT_xxx flags.  There should be no functional change.

This makes it possible to return much more detailed error information to
the VM (and also can provide more information eg.  virtual_address to the
driver, which might be important in some special cases).

This is required for a subsequent fix.  And will also make it easier to
merge page_mkwrite() with fault() in future.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Chris Mason &lt;chris.mason@oracle.com&gt;
Cc: Trond Myklebust &lt;trond.myklebust@fys.uio.no&gt;
Cc: Miklos Szeredi &lt;miklos@szeredi.hu&gt;
Cc: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
Cc: Mark Fasheh &lt;mfasheh@suse.com&gt;
Cc: Joel Becker &lt;joel.becker@oracle.com&gt;
Cc: Artem Bityutskiy &lt;dedekind@infradead.org&gt;
Cc: Felix Blyakher &lt;felixb@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: add comment why mark_page_accessed() would be better than pte_mkyoung() in follow_page()</title>
<updated>2009-04-01T15:59:12Z</updated>
<author>
<name>KOSAKI Motohiro</name>
<email>kosaki.motohiro@jp.fujitsu.com</email>
</author>
<published>2009-03-31T22:19:37Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=bd775c42ea5f7c766d03a287083837cf05e7e738'/>
<id>urn:sha1:bd775c42ea5f7c766d03a287083837cf05e7e738</id>
<content type='text'>
At first look, mark_page_accessed() in follow_page() seems a bit strange.
It seems pte_mkyoung() would be more consistent with other kernel code.

However, it is intentional. The commit log said:

    ------------------------------------------------
    commit 9e45f61d69be9024a2e6bef3831fb04d90fac7a8
    Author: akpm &lt;akpm&gt;
    Date:   Fri Aug 15 07:24:59 2003 +0000

    [PATCH] Use mark_page_accessed() in follow_page()

    Touching a page via follow_page() counts as a reference so we should be
    either setting the referenced bit in the pte or running mark_page_accessed().

    Altering the pte is tricky because we haven't implemented an atomic
    pte_mkyoung().  And mark_page_accessed() is better anyway because it has more
    aging state: it can move the page onto the active list.

    BKrev: 3f3c8acbplT8FbwBVGtth7QmnqWkIw
    ------------------------------------------------

The atomicity issue still holds today.  Adding a comment helps readers
understand the intent of the code.

[akpm@linux-foundation.org: clarify text]
Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: don't call mark_page_accessed() in do_swap_page()</title>
<updated>2009-04-01T15:59:11Z</updated>
<author>
<name>KOSAKI Motohiro</name>
<email>kosaki.motohiro@jp.fujitsu.com</email>
</author>
<published>2009-03-31T22:19:33Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=0a0dd05dd7e1a800241888cbf515bf8d3dc2e59c'/>
<id>urn:sha1:0a0dd05dd7e1a800241888cbf515bf8d3dc2e59c</id>
<content type='text'>
commit bf3f3bc5e734706730c12a323f9b2068052aa1f0 (mm: don't
mark_page_accessed in fault path) only removed the mark_page_accessed() in
filemap_fault().

Therefore, swap-backed pages and file-backed pages have inconsistent
behavior.  mark_page_accessed() should be removed from do_swap_page().

Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Hugh Dickins &lt;hugh@veritas.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>VM, x86, PAT: add a new vm flag to track full pfnmap at mmap</title>
<updated>2009-03-14T08:47:44Z</updated>
<author>
<name>Pallipadi, Venkatesh</name>
<email>venkatesh.pallipadi@intel.com</email>
</author>
<published>2009-03-13T23:35:44Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=895791dac6946d535991edd11341046f8e85ea77'/>
<id>urn:sha1:895791dac6946d535991edd11341046f8e85ea77</id>
<content type='text'>
Impact: cleanup

Add a new vm flag VM_PFN_AT_MMAP to identify a PFNMAP that is
fully mapped with remap_pfn_range. Patch removes the overloading
of VM_INSERTPAGE from the earlier patch.

Signed-off-by: Venkatesh Pallipadi &lt;venkatesh.pallipadi@intel.com&gt;
Signed-off-by: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Acked-by: Nick Piggin &lt;npiggin@suse.de&gt;
LKML-Reference: &lt;20090313233543.GA19909@linux-os.sc.intel.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>VM, x86, PAT: Change is_linear_pfn_mapping to not use vm_pgoff</title>
<updated>2009-03-13T03:28:50Z</updated>
<author>
<name>Pallipadi, Venkatesh</name>
<email>venkatesh.pallipadi@intel.com</email>
</author>
<published>2009-03-13T00:45:27Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=4bb9c5c02153dfc89a6c73a6f32091413805ad7d'/>
<id>urn:sha1:4bb9c5c02153dfc89a6c73a6f32091413805ad7d</id>
<content type='text'>
Impact: fix false positive PAT warnings - also fix VirtualBox hang

Use of vma-&gt;vm_pgoff to identify the pfnmaps that are fully
mapped at mmap time is broken. vm_pgoff is set by generic mmap
code even for cases where drivers are setting up the mappings
at the fault time.

The problem was originally reported here:

 http://marc.info/?l=linux-kernel&amp;m=123383810628583&amp;w=2

Change is_linear_pfn_mapping logic to overload VM_INSERTPAGE
flag along with VM_PFNMAP to mean full PFNMAP setup at mmap
time.

Problem also tracked at:

 http://bugzilla.kernel.org/show_bug.cgi?id=12800

Reported-by: Thomas Hellstrom &lt;thellstrom@vmware.com&gt;
Tested-by: Frans Pop &lt;elendil@planet.nl&gt;
Signed-off-by: Venkatesh Pallipadi &lt;venkatesh.pallipadi@intel.com&gt;
Signed-off-by: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: "ebiederm@xmission.com" &lt;ebiederm@xmission.com&gt;
Cc: &lt;stable@kernel.org&gt; # only for 2.6.29.1, not .28
LKML-Reference: &lt;20090313004527.GA7176@linux-os.sc.intel.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>do_wp_page: fix regression with execute in place</title>
<updated>2009-02-05T20:56:48Z</updated>
<author>
<name>Carsten Otte</name>
<email>cotte@de.ibm.com</email>
</author>
<published>2009-02-04T23:12:16Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=ab92661d5d9514647346047f30f67a7f35ffea67'/>
<id>urn:sha1:ab92661d5d9514647346047f30f67a7f35ffea67</id>
<content type='text'>
Fix do_wp_page for VM_MIXEDMAP mappings.

In the case where pfn_valid returns 0 for a pfn at the beginning of
do_wp_page and the mapping is not shared writable, the code branches to
label `gotten:' with old_page == NULL.

In case the vma is locked (vma-&gt;vm_flags &amp; VM_LOCKED), lock_page,
clear_page_mlock, and unlock_page try to access the old_page.

This patch checks whether old_page is valid before it is dereferenced.

The regression was introduced by "mlock: mlocked pages are unevictable"
(commit b291f000393f5a0b679012b39d79fbc85c018233).

Signed-off-by: Carsten Otte &lt;cotte@de.ibm.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: &lt;stable@kernel.org&gt;		[2.6.28.x]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>x86 PAT: change track_pfn_vma_new to take pgprot_t pointer param</title>
<updated>2009-01-13T18:13:01Z</updated>
<author>
<name>venkatesh.pallipadi@intel.com</name>
<email>venkatesh.pallipadi@intel.com</email>
</author>
<published>2009-01-10T00:13:11Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=e4b866ed197cef9989348e0479fed8d864ea465b'/>
<id>urn:sha1:e4b866ed197cef9989348e0479fed8d864ea465b</id>
<content type='text'>
Impact: cleanup

Change the protection parameter for track_pfn_vma_new() into a pgprot_t pointer.
Subsequent patch changes the x86 PAT handling to return a compatible
memtype in pgprot_t, if what was requested cannot be allowed due to conflicts.
No functionality change in this patch.

Signed-off-by: Venkatesh Pallipadi &lt;venkatesh.pallipadi@intel.com&gt;
Signed-off-by: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>x86 PAT: remove PFNMAP type on track_pfn_vma_new() error</title>
<updated>2009-01-13T18:12:59Z</updated>
<author>
<name>venkatesh.pallipadi@intel.com</name>
<email>venkatesh.pallipadi@intel.com</email>
</author>
<published>2009-01-10T00:13:09Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=a36706131182f5507d1e2cfbf391b0fa8d72203c'/>
<id>urn:sha1:a36706131182f5507d1e2cfbf391b0fa8d72203c</id>
<content type='text'>
Impact: fix (harmless) double-free of memtype entries and avoid warning

On track_pfn_vma_new() failure, reset the vm_flags so that there will be
no second cleanup happening when upper level routines call unmap_vmas().

This patch fixes part of the bug reported here:

  http://marc.info/?l=linux-kernel&amp;m=123108883716357&amp;w=2

Specifically the error message:

  X:5010 freeing invalid memtype d0000000-d0101000

is due to multiple frees on the error path; it will not happen with the patch below.
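
A toy model of the double cleanup (plain Python, not kernel code) may make
the idea easier to follow.  The Vma class, the "PFNMAP" flag string, and the
remap() and unmap() helpers are hypothetical names for this sketch only;
the reserved list stands in for the tracked memtype entries.

```python
class Vma:
    """Toy mapping descriptor holding a set of flag names."""

    def __init__(self):
        self.flags = set()


def remap(vma, reserved, track_ok):
    """Models the mmap-time setup; track_ok stands in for the tracking call."""
    vma.flags.add("PFNMAP")
    if not track_ok:
        # Tracking failed and nothing stays reserved for this mapping,
        # so reset the flag to keep later teardown from freeing again.
        vma.flags.discard("PFNMAP")
        return False
    reserved.append(vma)
    return True


def unmap(vma, reserved, errors):
    """Models teardown by the upper-level code."""
    if "PFNMAP" in vma.flags:
        if vma in reserved:
            reserved.remove(vma)
        else:
            errors.append("freeing invalid memtype")


vma = Vma()
reserved = []
errors = []
remap_ok = remap(vma, reserved, track_ok=False)
unmap(vma, reserved, errors)
print(remap_ok, errors)
```

If the flag were left set on the failure branch, unmap() in this model
would try to free an entry that is not reserved and record the error.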

Signed-off-by: Venkatesh Pallipadi &lt;venkatesh.pallipadi@intel.com&gt;
Signed-off-by: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
</feed>
