Age | Commit message (Collapse) | Author |
|
|
|
Fix nasty /proc vulnerability
We have a bad interaction with both the kernel and user space being able
to change some of the /proc file status. This fixes the most obvious
part of it, but I expect we'll also make it harder for users to modify
even their "own" files in /proc.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
Based on a patch from Ernie Petrides
During security research, Red Hat discovered a behavioral flaw in core
dump handling. A local user could create a program that would cause a
core file to be dumped into a directory they would not normally have
permissions to write to. This could lead to a denial of service (disk
consumption), or allow the local user to gain root privileges.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
Should have not been applied to 2.6.16
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
chunks [CVE-2006-2934]
When a packet without any chunks is received, the newconntrack variable
in sctp_packet contains an out of bounds value that is used to look up an
pointer from the array of timeouts, which is then dereferenced, resulting
in a crash. Make sure at least a single chunk is present.
Problem noticed by George A. Theall <theall@tenablesecurity.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
It fixes a crash in NTFS on architectures where flush_dcache_page()
is a real function. I never noticed this as all my testing is done on
i386 where flush_dcache_page() is NULL.
http://bugzilla.kernel.org/show_bug.cgi?id=6700
Many thanks to Pauline Ng for the detailed bug report and analysis!
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
Work around the oops reported in
http://bugzilla.kernel.org/show_bug.cgi?id=6478.
Thanks to Ralf Hildebrandt <ralf.hildebrandt@charite.de> for testing and
reporting.
Acked-by: Dave Jones <davej@codemonkey.org.uk>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
- Fixed locking of struct i2o_exec_wait in Executive-OSM
- Removed LCT Notify in i2o_exec_probe() which caused freeing memory and
accessing freed memory during first enumeration of I2O devices
- Added missing locking in i2o_exec_lct_notify()
- removed put_device() of I2O controller in i2o_iop_remove() which caused
the controller structure get freed to early
- Fixed size of mempool in i2o_iop_alloc()
- Fixed access to freed memory in i2o_msg_get()
See http://bugzilla.kernel.org/show_bug.cgi?id=6561
Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
The calculation of nr_pages in scsi_req_map_sg() doesn't account for
the fact that the first page could have an offset that pushes the end
of the buffer onto a new page.
Signed-off-by: Bryan Holty <lgeek@frontiernet.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
It looks like metapage_releasepage was making in invalid assumption that
the releasepage method would not be called on a dirty page. Instead of
issuing a warning and releasing the metapage, it should return 0, indicating
that the private data for the page cannot be released.
I also realized that metapage_releasepage had the return code all wrong. If
it is successful in releasing the private data, it should return 1, otherwise
it needs to return 0.
Lastly, there is no need to call wait_on_page_writeback, since
try_to_release_page will not call us with a page in writback state.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
do_lookup_path()
We're presently running lock_kernel() under fs_lock via nfs's ->permission
handler. That's a ranking bug and sometimes a sleep-in-spinlock bug. This
problem was introduced in the openat() patchset.
We should not need to hold the current->fs->lock for a codepath that doesn't
use current->fs.
[vsu@altlinux.ru: fix error path]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
I noticed a strange behavior in a tmpfs file system the other day, while
building packages - occasionally, and seemingly at random, make decided to
rebuild a target. However, only on tmpfs.
A file would be created, and if checked, it had a sub-second timestamp.
However, after an utimes related call where sub-seconds should be set, they
were zeroed instead. In the case that a file was created, and utimes(...,NULL)
was used on it in the same second, the timestamp on the file moved backwards.
After some digging, I found that this was being caused by tmpfs not having a
time granularity set, thus inheriting the default 1 second granularity.
Hugh adds: yes, we missed tmpfs when the s_time_gran mods went into 2.6.11.
Unfortunately, the granularity of CURRENT_TIME, often used in filesystems,
does not match the default granularity set by alloc_super. A few more such
discrepancies have been found, but this is the most important to fix now.
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
It seems there is error check missing in open_namei for errors returned
through intent.open.file (from lookup_instantiate_filp).
If there is plain open performed, then such a check done inside
__path_lookup_intent_open called from path_lookup_open(), but when the open
is performed with O_CREAT flag set, then __path_lookup_intent_open is only
called with LOOKUP_PARENT set where no file opening can occur yet.
Later on lookup_hash is called where exact opening might take place and
intent.open.file may be filled. If it is filled with error value of some
sort, then we get kernel attempting to dereference this error value as
address (and corresponding oops) in nameidata_to_filp() called from
filp_open().
While this is relatively simple to workaround in ->lookup() method by just
checking lookup_instantiate_filp() return value and returning error as
needed, this is not so easy in ->d_revalidate(), where we can only return
"yes, dentry is valid" or "no, dentry is invalid, perform full lookup
again", and just returning 0 on error would cause extra lookup (with
potential extra costly RPCs).
So in short, I believe that there should be no difference in error handling
for opening a file and creating a file in open_namei() and propose this
simple patch as a solution.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Both csum_partial() and the csum_partial_copy*() family of routines
forget to do a final fold on the computed checksum value on sparc64.
So do the standard Sparc "add + set condition codes, add carry"
sequence, then make sure the high 32-bits of the return value are
clear.
Based upon some excellent detective work and debugging done by
Richard Braun and Samuel Thibault.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Using asm-generic/dma-mapping.h does not work because pushing
the call down to pci_alloc_coherent() causes the gfp_t argument
of dma_alloc_coherent() to be ignored.
Fix this by implementing things directly, and adding a gfp_t
argument we can use in the internal call down to the PCI DMA
implementation of pci_alloc_coherent().
This fixes massive memory corruption when using the sound driver
layer, which passes things like __GFP_COMP down into these
routines and (correctly) expects that to work.
This is a disk eater when sound is used, so it's pretty critical.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
If we move a mapping from one virtual address to another,
and this changes the virtual color of the mapping to those
pages, we can see corrupt data due to D-cache aliasing.
Check for and deal with this by overriding the move_pte()
macro. Set things up so that other platforms can cleanly
override the move_pte() macro too.
This long standing bug corrupts user memory, and in particular
has been notorious for corrupting Debian package database
files on sparc64 boxes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Attached patch fixes spurious errors during firmware load.
Signed-off-by: Stuart MacDonald <stuartm@connecttech.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
|
|
Fix endless loop in the SCTP match similar to those already fixed in the
SCTP conntrack helper (was CVE-2006-1527).
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
do_exit() clears ->it_##clock##_expires, but nothing prevents
another cpu to attach the timer to exiting process after that.
arm_timer() tries to protect against this race, but the check
is racy.
After exit_notify() does 'write_unlock_irq(&tasklist_lock)' and
before do_exit() calls 'schedule() local timer interrupt can find
tsk->exit_state != 0. If that state was EXIT_DEAD (or another cpu
does sys_wait4) interrupted task has ->signal == NULL.
At this moment exiting task has no pending cpu timers, they were
cleanuped in __exit_signal()->posix_cpu_timers_exit{,_group}(),
so we can just return from irq.
John Stultz recently confirmed this bug, see
http://marc.theaimsgroup.com/?l=linux-kernel&m=115015841413687
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
If the local timer interrupt happens just after do_exit() sets PF_EXITING
(and before it clears ->it_xxx_expires) run_posix_cpu_timers() will call
check_process_timers() with tasklist_lock + ->siglock held and
check_process_timers:
t = tsk;
do {
....
do {
t = next_thread(t);
} while (unlikely(t->flags & PF_EXITING));
} while (t != tsk);
the outer loop will never stop.
Actually, the window is bigger. Another process can attach the timer
after ->it_xxx_expires was cleared (see the next commit) and the 'if
(PF_EXITING)' check in arm_timer() is racy (see the one after that).
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
This fixes a bug found by Dave Jones that means that it is possible
for userspace to provoke a machine check on 32-bit kernels. This
also fixes a couple of other places where I found similar problems
by inspection.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
|
|
I added a failure check in patch "sbp2: variable status FIFO address
(fix login timeout)" --- alas for a wrong error value. This is a bug
since Linux 2.6.16. Leads to NULL pointer dereference if the call
failed, and bogus failure handling if call succeeded.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
There is a firmware bug in several Apple iPods which prevents access to
these iPods under certain conditions. The disk size reported by the iPod
is one sector too big. Once access to the end of the disk is attempted,
the iPod becomes inaccessible. This problem has been known for USB iPods
for some time and has recently been discovered to exist with
FireWire/USB combo iPods too.
This patch is derived from the fix in Linux 2.6.17, commit
e9a1c52c7b19d10342226c12f170d7ab644427e2, to be applicable to 2.6.16.x
without prerequisite patches. It hard-wires a workaround for three known
affected model numbers (those of 4th generation iPod, iPod Photo, iPod
mini).
Note: This patch lacks Linux 2.6.17's ability to enable and disable the
workaround via a module parameter.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
This fixes a regression from the earlier DOS fix for non canonical
IRET addresses. It broke UML.
int_ret_from_syscall already does syscall exit tracing, so
no need to do it again in the caller.
This caused problems for UML and some other special programs doing
syscall interception.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
o Start booting into the capture kernel after an Oops if system is in a
unrecoverable state. System will boot into the capture kernel, if one is
pre-loaded by the user, and capture the kernel core dump.
o One of the following conditions should be true to trigger the booting of
capture kernel.
- panic_on_oops is set.
- pid of current thread is 0
- pid of current thread is 1
- Oops happened inside interrupt context.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
Currently iwlist ethX freq[uency]/channel lists all the channels the card
supported for the current region, which includes some channels can only
be used in infrastructure mode. This patch filters these channels out if
the card is currently in ad-hoc mode.
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
Okay, just to sum things up.
This forces libata to wait for up to 2 seconds for BUSY|DRQ to clear
on resume before continuing.
[jgarzik adds...] During testing we never saw DRQ asserted, but
nonetheless (a) this works and (b) testing for DRQ won't hurt.
Signed-off-by: Mark Lord <liml@rtr.ca>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Re-enable posted writes for status FIFO.
Besides bringing back a very minor bandwidth tweak from Linux 2.6.15.x
and older, this also fixes an interoperability regression since 2.6.16:
http://bugzilla.kernel.org/show_bug.cgi?id=6356
(sbp2: scsi_add_device failed. IEEE1394 HD is not working anymore.)
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Tested-by: Vanei Heidemann <linux@javanei.com.br>
Tested-by: Martin Putzlocher <mputzi@gmx.de> (chip type unconfirmed)
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Input: psmouse - fix new device detection logic
Reported to fix http://bugs.gentoo.org/130846
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
For a very long time, echoing 'standby' or 'mem' into /sys/power/state has
killed the machine on powerpc. This patch fixes that.
This patch adds the .valid callback to pm_ops on PowerMac so that only the
suspend to disk state can be entered. Note that just returning 0 would
suffice since the upper layers don't pass PM_SUSPEND_DISK down, but we
handle it there regardless just in case that changes.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Fix an infrequently encountered 'sleeping function called
from invalid context' in the cpuset hooks in __alloc_pages.
Could sleep while interrupts disabled.
The routine cpuset_zone_allowed() is called by code in
mm/page_alloc.c __alloc_pages() to determine if a zone is
allowed in the current tasks cpuset. This routine can sleep,
for certain GFP_KERNEL allocations, if the zone is on a memory
node not allowed in the current cpuset, but might be allowed
in a parent cpuset.
But we can't sleep in __alloc_pages() if in interrupt, nor
if called for a GFP_ATOMIC request (__GFP_WAIT not set in
gfp_flags).
The rule was intended to be:
Don't call cpuset_zone_allowed() if you can't sleep, unless you
pass in the __GFP_HARDWALL flag set in gfp_flag, which disables
the code that might scan up ancestor cpusets and sleep.
This rule was being violated due to a bogus change made (by myself,
pj) to __alloc_pages() as part of the November 2005 effort to
cleanup its logic.
The bogus change can be seen at:
http://linux.derkeiler.com/Mailing-Lists/Kernel/2005-11/4691.html
[PATCH 01/05] mm fix __alloc_pages cpuset ALLOC_* flags
This was first noticed on a tight memory system, in code that
was disabling interrupts and doing allocation requests with
__GFP_WAIT not set, which resulted in __might_sleep() writing
complaints to the log "Debug: sleeping function called ...",
when the code in cpuset_zone_allowed() tried to take the
callback_sem cpuset semaphore.
Special thanks to Dave Chinner, for figuring this out,
and a tip of the hat to Nick Piggin who warned me of this
back in Nov 2005, before I was ready to listen.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Currently loading the ioc3 as a module will cause the ports to be numbered
in reverse order. This mod maintains the proper order of cards for port
numbering.
Signed-off-by: Patrick Gefre <pfg@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
Currently loading the ioc4 as a module will cause the ports to be numbered
in reverse order. This mod maintains the proper order of cards for port
numbering.
Signed-off-by: Brent Casavant <bcasavan@sgi.com>
Cc: Pat Gefre <pfg@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
|
|
It appears that sockaddr_in.sin_zero is not zeroed during
getsockopt(...SO_ORIGINAL_DST...) operation. This can lead
to an information leak (CVE-2006-1343).
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
|
|
CVE-2006-2444 - Potential remote DoS in SNMP NAT helper.
Fix memory corruption caused by snmp_trap_decode:
- When snmp_trap_decode fails before the id and address are allocated,
the pointers contain random memory, but are freed by the caller
(snmp_parse_mangle).
- When snmp_trap_decode fails after allocating just the ID, it tries
to free both address and ID, but the address pointer still contains
random memory. The caller frees both ID and random memory again.
- When snmp_trap_decode fails after allocating both, it frees both,
and the callers frees both again.
The corruption can be triggered remotely when the ip_nat_snmp_basic
module is loaded and traffic on port 161 or 162 is NATed.
Found by multiple testcases of the trap-app and trap-enc groups of the
PROTOS c06-snmpv1 testsuite.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
|
|
If SCTP receives a badly formatted HB-ACK chunk, it is possible
that we may access invalid memory and potentially have a buffer
overflow. We should really make sure that the chunk format is
what we expect, before attempting to touch the data.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
(CVE-2006-1858)
When performing bound checks during the parameter processing, we
want to use the real chunk and paramter lengths for bounds instead
of the rounded ones. This prevents us from potentially walking of
the end if the chunk length was miscalculated. We still use rounded
lengths when advancing the pointer. This was found during a
conformance test that changed the chunk length without modifying
parameters.
(Vlad noted elsewhere: the most you'd overflow is 3 bytes, so problem
is parameter dependent).
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
Eric Biederman points out that we can't take the task_lock while holding
tasklist_lock for writing, because another CPU that holds the task lock
might take an interrupt that then tries to take tasklist_lock for writing.
Which would be a nasty deadlock, with one CPU spinning forever in an
interrupt handler (although admittedly you need to really work at
triggering it ;)
Since the ptrace_attach() code is special and very unusual, just make it
be extra careful, and use trylock+repeat to avoid the possible deadlock.
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
This holds the task lock (and, for ptrace_attach, the tasklist_lock)
over the actual attach event, which closes a race between attacking to a
thread that is either doing a PTRACE_TRACEME or getting de-threaded.
Thanks to Oleg Nesterov for reminding me about this, and Chris Wright
for noticing a lost return value in my first version.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Currently we check PageDirty() in order to make the decision to swap out
the page. However, the dirty information may be only be contained in the
ptes pointing to the page. We need to first unmap the ptes before checking
for PageDirty(). If unmap is successful then the page count of the page
will also be decreased so that pageout() works properly.
This is a fix necessary for 2.6.17. Without this fix we may migrate dirty
pages for filesystems without migration functions. Filesystems may keep
pointers to dirty pages. Migration of dirty pages can result in the
filesystem keeping pointers to freed pages.
Unmapping is currently not be separated out from removing all the
references to a page and moving the mapping. Therefore try_to_unmap will
be called again in migrate_page() if the writeout is successful. However,
it wont do anything since the ptes are already removed.
The coming updates to the page migration code will restructure the code
so that this is no longer necessary.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
Basic problem: pages of a shared memory segment can only be migrated once.
In 2.6.16 through 2.6.17-rc1, shared memory mappings do not have a
migratepage address space op. Therefore, migrate_pages() falls back to
default processing. In this path, it will try to pageout() dirty pages.
Once a shared memory page has been migrated it becomes dirty, so
migrate_pages() will try to page it out. However, because the page count
is 3 [cache + current + pte], pageout() will return PAGE_KEEP because
is_page_cache_freeable() returns false. This will abort all subsequent
migrations.
This patch adds a migratepage address space op to shared memory segments to
avoid taking the default path. We use the "migrate_page()" function
because it knows how to migrate dirty pages. This allows shared memory
segment pages to migrate, subject to other conditions such as # pte's
referencing the page [page_mapcount(page)], when requested.
I think this is safe. If we're migrating a shared memory page, then we
found the page via a page table, so it must be in memory.
Can be verified with memtoy and the shmem-mbind-test script, both
available at: http://free.linux.hp.com/~lts/Tools/
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
gather_stats() is called with a spinlock held from check_pte_range. We
cannot reschedule with a lock held.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|