Age | Commit message (Collapse) | Author |
|
commit 9bf729c0af67897ea8498ce17c29b0683f7f2028 upstream
On low memory boxes or those with highmem, kernel can OOM before the
background reclaims inodes via xfssyncd. Add a shrinker to run inode
reclaim so that it inode reclaim is expedited when memory is low.
This is more complex than it needs to be because the VM folk don't
want a context added to the shrinker infrastructure. Hence we need
to add a global list of XFS mount structures so the shrinker can
traverse them.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit f1f724e4b523d444c5a598d74505aefa3d6844d2 upstream
The radix-tree code requires it's users to serialize tag updates
against other updates to the tree. While XFS protects tag updates
against each other it does not serialize them against updates of the
tree contents, which can lead to tag corruption. Fix the inode
cache to always take pag_ici_lock in exclusive mode when updating
radix tree tags.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Patrick Schreurs <patrick@news-service.com>
Tested-by: Patrick Schreurs <patrick@news-service.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 77d7a0c2eeb285c9069e15396703d0cb9690ac50 upstream
The introduction of barriers to loop devices has created a new IO
order completion dependency that XFS does not handle. The loop
device implements barriers using fsync and so turns a log IO in the
XFS filesystem on the loop device into a data IO in the backing
filesystem. That is, the completion of log IOs in the loop
filesystem are now dependent on completion of data IO in the backing
filesystem.
This can cause deadlocks when a flush daemon issues a log force with
an inode locked because the IO completion of IO on the inode is
blocked by the inode lock. This in turn prevents further data IO
completion from occuring on all XFS filesystems on that CPU (due to
the shared nature of the completion queues). This then prevents the
log IO from completing because the log is waiting for data IO
completion as well.
The fix for this new completion order dependency issue is to make
the IO completion inode locking non-blocking. If the inode lock
can't be grabbed, simply requeue the IO completion back to the work
queue so that it can be processed later. This prevents the
completion queue from being blocked and allows data IO completion on
other inodes to proceed, hence avoiding completion order dependent
deadlocks.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit e8b217e7530c6a073ac69f1c85b922d93fdf5647 upstream
Date: Tue, 2 Feb 2010 10:16:26 +1100
We always need to flush the disk write cache and can't skip it just because
the no inode attributes have changed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit cbe132a8bdcff0f9afd9060948fb50597c7400b8 upstream
If we hold onto reserved blocks when doing a remount,ro we end
up writing the blocks used count to disk that includes the reserved
blocks. Reserved blocks are not actually used, so this results in
the values in the superblock being incorrect.
Hence if we run xfs_check or xfs_repair -n while the filesystem is
mounted remount,ro we end up with an inconsistent filesystem being
reported. Also, running xfs_copy on the remount,ro filesystem will
result in an inconsistent image being generated.
To fix this, unreserve the blocks when doing the remount,ro, and
reserved them again on remount,rw. This way a remount,ro filesystem
will appear consistent on disk to all utilities.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 9b00f30762fe9f914eb6e03057a616ed63a4e8ca upstream
A "df" run on an NFS client of an exported XFS file system reports
the wrong information for "available" blocks. When a block quota is
enforced, the amount reported as free is limited by the quota, but
the amount reported available is not (and should be).
Reported-by: Guk-Bong, Kwon <gbkwon@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit e09f98606dcc156de1146c209d45a0d6d5f51c3f upstream
When swapping extents, we can corrupt inodes by swapping data forks
that are in incompatible formats. This is caused by the two indoes
having different fork offsets due to the presence of an attribute
fork on an attr2 filesystem. xfs_fsr tries to be smart about
setting the fork offset, but the trick it plays only works on attr1
(old fixed format attribute fork) filesystems.
Changing the way xfs_fsr sets up the attribute fork will prevent
this situation from ever occurring, so in the kernel code we can get
by with a preventative fix - check that the data fork in the
defragmented inode is in a format valid for the inode it is being
swapped into. This will lead to files that will silently and
potentially repeatedly fail defragmentation, so issue a warning to
the log when this particular failure occurs to let us know that
xfs_fsr needs updating/fixing.
To help identify how to improve xfs_fsr to avoid this issue, add
trace points for the inodes being swapped so that we can determine
why the swap was rejected and to confirm that the code is making the
right decisions and modifications when swapping forks.
A further complication is even when the swap is allowed to proceed
when the fork offset is different between the two inodes then value
for the maximum number of extents the data fork can hold can be
wrong. Make sure these are also set correctly after the swap occurs.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 4b6a46882cca8349e8942e2650c33b11bc571c92 upstream
When reclaiming stale inodes, we need to guarantee that inodes are
unpinned before returning with a "clean" status. If we don't we can
reclaim inodes that are pinned, leading to use after free in the
transaction subsystem as transactions complete.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 57817c68229984818fea9e614d6f95249c3fb098 upstream
We cannot do direct inode reclaim without taking the flush lock to
ensure that we do not reclaim an inode under IO. We check the inode
is clean before doing direct reclaim, but this is not good enough
because the inode flush code marks the inode clean once it has
copied the in-core dirty state to the backing buffer.
It is the flush lock that determines whether the inode is still
under IO, even though it is marked clean, and the inode is still
required at IO completion so we can't reclaim it even though it is
clean in core. Hence the requirement that we need to take the flush
lock even on clean inodes because this guarantees that the inode
writeback IO has completed and it is safe to reclaim the inode.
With delayed write inode flushing, we could end up waiting a long
time on the flush lock even for a clean inode. The background
reclaim already handles this efficiently, so avoid all the problems
by killing the direct reclaim path altogether.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 018027be90a6946e8cf3f9b17b5582384f7ed117 upstream
The reclaim code will handle flushing of dirty inodes before reclaim
occurs, so avoid them when determining whether an inode is a
candidate for flushing to disk when walking the radix trees. This
is based on a test patch from Christoph Hellwig.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit c8e20be020f234c8d492927a424a7d8bbefd5b5d upstream
Make the inode tree reclaim walk exclusive to avoid races with
concurrent sync walkers and lookups. This is a version of a patch
posted by Christoph Hellwig that avoids all the code duplication.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit fd45e4784164d1017521086524e3442318c67370 upstream
When we search for and find a busy extent during allocation we
force the log out to ensure the extent free transaction is on
disk before the allocation transaction. The current implementation
has a subtle bug in it--it does not handle multiple overlapping
ranges.
That is, if we free lots of little extents into a single
contiguous extent, then allocate the contiguous extent, the busy
search code stops searching at the first extent it finds that
overlaps the allocated range. It then uses the commit LSN of the
transaction to force the log out to.
Unfortunately, the other busy ranges might have more recent
commit LSNs than the first busy extent that is found, and this
results in xfs_alloc_search_busy() returning before all the
extent free transactions are on disk for the range being
allocated. This can lead to potential metadata corruption or
stale data exposure after a crash because log replay won't replay
all the extent free transactions that cover the allocation range.
Modified-by: Alex Elder <aelder@sgi.com>
(Dropped the "found" argument from the xfs_alloc_busysearch trace
event.)
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 44e08c45cc14e6190a424be8d450070c8e508fad upstream
Because inodes remain in cache much longer than inode buffers do
under memory pressure, we can get the situation where we have
stale, dirty inodes being reclaimed but the backing storage has
been freed. Hence we should never, ever flush XFS_ISTALE inodes
to disk as there is no guarantee that the backing buffer is in
cache and still marked stale when the flush occurs.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit d6d59bada372bcf8bd36c3bbc71c485c29dd2a4b upstream
We currently have some rather odd code in xfs_setattr for
updating the a/c/mtime timestamps:
- first we do a non-transaction update if all three are updated
together
- second we implicitly update the ctime for various changes
instead of relying on the ATTR_CTIME flag
- third we set the timestamps to the current time instead of the
arguments in the iattr structure in many cases.
This patch makes sure we update it in a consistent way:
- always transactional
- ctime is only updated if ATTR_CTIME is set or we do a size
update, which is a special case
- always to the times passed in from the caller instead of the
current time
The only non-size caller of xfs_setattr that doesn't come from
the VFS is updated to set ATTR_CTIME and pass in a valid ctime
value.
Reported-by: Eric Blake <ebb9@byu.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit b44b1126279b60597f96bbe77507b1650f88a969 upstream
Add an assert for inodes not added to the inode cache in xfs_ireclaim,
to make sure we're not going to introduce something like the
famous nfsd inode cache bug again.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 44a743f68705c681439f264deb05f8f38e9048d3 upstream
Noticed that through glibc fallocate would return 28 rather than -1
and errno = 28 for ENOSPC. The xfs routines uses XFS_ERROR format
positive return error codes while the syscalls use negative return
codes. Fixup the two cases in xfs_vn_fallocate syscall to convert to
negative.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit fc5bc4c85c45f0bf854404e5736aa8b65720a18d upstream
Summary of problem:
If a journal record wraps at the physical end of the journal, it has to be
read in two parts in xlog_do_recovery_pass(): a read at the physical end and a
read at the physical beginning. If xlog_bread() has to re-align the first
read, the second read request does not take that re-alignment into account.
If the first read was re-aligned, the second read over-writes the end of the
data from the first read, effectively corrupting it. This can happen either
when reading the record header or reading the record data.
The first sanity check in xlog_recover_process_data() is to check for a valid
clientid, so that is the error reported.
Summary of fix:
If there was a first read at the physical end, XFS_BUF_PTR() returns where the
data was requested to begin. Conversely, because it is the result of
xlog_align(), offset indicates where the requested data for the first read
actually begins - whether or not xlog_bread() has re-aligned it.
Using offset as the base for the calculation of where to place the second read
data ensures that it will be correctly placed immediately following the data
from the first read instead of sometimes over-writing the end of it.
The attached patch has resolved the reported problem of occasional inability
to recover the journal (reporting "bad clientid").
Signed-off-by: Andy Poling <andy@realbig.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 80641dc66a2d6dfb22af4413227a92b8ab84c7bb upstream
When completing I/O requests we must not allow the memory allocator to
recurse into the filesystem, as we might deadlock on waiting for the
I/O completion otherwise. The only thing currently allocating normal
GFP_KERNEL memory is the allocation of the transaction structure for
the unwritten extent conversion. Add a memflags argument to
_xfs_trans_alloc to allow controlling the allocator behaviour.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Thomas Neumann <tneumann@users.sourceforge.net>
Tested-by: Thomas Neumann <tneumann@users.sourceforge.net>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit c56c9631cbe88f08854a56ff9776c1f310916830 upstream
When xfs_free_eofblocks is called from ->release the VM might already
hold the mmap_sem, but in the write path we take the iolock before
taking the mmap_sem in the generic write code.
Switch xfs_free_eofblocks to only trylock the iolock if called from
->release and skip trimming the prellocated blocks in that case.
We'll still free them later on the final iput.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 848ce8f731aed0a2d4ab5884a4f6664af73d2dd0 upstream
Currently the reclaim code for the case where we don't reclaim the
final reclaim is overly complicated. We know that the inode is clean
but instead of just directly reclaiming the clean inode we go through
the whole process of marking the inode reclaimable just to directly
reclaim it from the calling context. Besides being overly complicated
this introduces a race where iget could recycle an inode between
marked reclaimable and actually being reclaimed leading to panics.
This patch gets rid of the existing reclaim path, and replaces it with
a simple call to xfs_ireclaim if the inode was clean. While we're at
it we also use the slightly more lax xfs_inode_clean check we'd use
later to determine if we need to flush the inode here.
Finally get rid of xfs_reclaim function and place the remaining small
bits of reclaim code directly into xfs_fs_destroy_inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Patrick Schreurs <patrick@news-service.com>
Reported-by: Tommy van Leeuwen <tommy@news-service.com>
Tested-by: Patrick Schreurs <patrick@news-service.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Access to log items on the AIL is generally protected by m_ail_lock;
this is particularly needed when we're getting or setting the 64-bit
li_lsn on a 32-bit platform. This patch fixes a couple places where we
were accessing the log item after dropping the AIL lock on 32-bit
machines.
This can result in a partially-zeroed log->l_tail_lsn if
xfs_trans_ail_delete is racing with xfs_trans_ail_update, and in at
least some cases, this can leave the l_tail_lsn with a zero cycle
number, which means xlog_space_left will think the log is full (unless
CONFIG_XFS_DEBUG is set, in which case we'll trip an ASSERT), leading to
processes stuck forever in xlog_grant_log_space.
Thanks to Adrian VanderSpek for first spotting the race potential and to
Dave Chinner for debug assistance.
Signed-off-by: Nathaniel W. Turner <nate@houseofnate.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
Hi,
I was hit by a bug in linux 2.6.31 when XFS is not able to recover the
log after a crash if fs was mounted with quotas. Gory details in XFS
bugzilla: http://oss.sgi.com/bugzilla/show_bug.cgi?id=855.
It looks like wrong struct is used in buffer length check, and the following
patch should fix the problem.
xfs_dqblk_t has a size of 104+32 bytes, while xfs_disk_dquot_t is 104 bytes
long, and this is exactly what I see in system logs - "XFS: dquot too small
(104) in xlog_recover_do_dquot_trans."
Signed-off-by: Jan Rekorajski <baggins@sith.mimuw.edu.pl>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
* 'for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs:
xfs: fix xfs_quota remove error
xfs: free temporary cursor in xfs_dialloc
|
|
The xfs_quota returns ENOSYS when remove command is executed.
Reproducable with following steps.
# mount -t xfs -o uquota /dev/sda7 /mnt/mp1
# xfs_quota -x -c off -c remove
XFS_QUOTARM: Function not implemented.
The remove command is allowed during quotaoff, but xfs_fs_set_xstate()
checks whether quota is running, and it leads to ENOSYS.
To solve this problem, add a check for X_QUOTARM.
Signed-off-by: Ryota Yamauchi <r-yamauchi@vf.jp.nec.com>
Signed-off-by: Utako Kusaka <u-kusaka@wm.jp.nec.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Commit bd169565993b39b9b4b102cdac8b13e0a259ce2f seems
to have a slight regression where this code path:
if (!--searchdistance) {
/*
* Not in range - save last search
* location and allocate a new inode
*/
...
goto newino;
}
doesn't free the temporary cursor (tcur) that got dup'd in
this function.
This leaks an item in the xfs_btree_cur zone, and it's caught
on module unload:
===========================================================
BUG xfs_btree_cur: Objects remaining on kmem_cache_close()
-----------------------------------------------------------
It seems like maybe a single free at the end of the function might
be cleaner, but for now put a del_cursor right in this code block
similar to the handling in the rest of the function.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: fix double IRELE in xfs_dqrele_inode
|
|
|
|
xfs_dqrele_inode calls xfs_iput to release the ilock and a reference
and then also calls IRELE which does a second decrement of the reference
count. This leads to a premature freeing of inodes when quotas were turned
off while the filesystem was mounted.
Thanks to Utako Kusaka for reporting the bug and provinding a good testcase.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Utako Kusaka <u-kusaka@wm.jp.nec.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: stop calling filemap_fdatawait inside ->fsync
fix readahead calculations in xfs_dir2_leaf_getdents()
xfs: make sure xfs_sync_fsdata covers the log
xfs: mark inodes dirty before issuing I/O
xfs: cleanup ->sync_fs
xfs: fix xfs_quiesce_data
xfs: implement ->dirty_inode to fix timestamp handling
|
|
|
|
Now that the VFS actually waits for the data I/O to complete before
calling into ->fsync we can stop doing it ourselves.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
This is for bug #850,
http://oss.sgi.com/bugzilla/show_bug.cgi?id=850
XFS file system segfaults , repeatedly and 100% reproducable in 2.6.30 , 2.6.31
The above only showed up on a CONFIG_XFS_DEBUG=y kernel, because
xfs_bmapi() ASSERTs that it has been asked for at least one map,
and it was getting 0.
The root cause is that our guesstimated "bufsize" from xfs_file_readdir
was fairly small, and the
bufsize -= length;
in the loop was going negative - except bufsize is a size_t, so it
was wrapping to a very large number.
Then when we did
ra_want = howmany(bufsize + mp->m_dirblksize,
mp->m_sb.sb_blocksize) - 1;
with that very large number, the (int) ra_want was coming out
negative, and a subsequent compare:
if (1 + ra_want > map_blocks ...
was coming out -true- (negative int compare w/ uint) and we went
back to xfs_bmapi() for more, even though we did not need more,
and asked for 0 maps, and hit the ASSERT.
We have kind of a type mess here, but just keeping bufsize from
going negative is probably sufficient to avoid the problem.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
We want to always cover the log after writing out the superblock, and
in case of a synchronous writeout make sure we actually wait for the
log to be covered. That way a filesystem that has been sync()ed can
be considered clean by log recovery.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
To make sure they get properly waited on in sync when I/O is in flight and
we latter need to update the inode size. Requires a new helper to check if an
ioend structure is beyond the current EOF.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
Sort out ->sync_fs to not perform a superblock writeback for the wait = 0 case
as that is just an optional first pass and the superblock will be written back
properly in the next call with wait = 1. Instead perform an opportunistic
quota writeback to have less work later. Also remove the freeze special case
as we do a proper wait = 1 call in the freeze code anyway.
Also rename the function to xfs_fs_sync_fs to match the normal naming
convention, update comments and avoid calling into the laptop_mode logic on
an error.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
We need to do a synchronous xfs_sync_fsdata to make sure the superblock
actually is on disk when we return.
Also remove SYNC_BDFLUSH flag to xfs_sync_inodes because that particular
flag is never checked.
Move xfs_filestream_flush call later to only release inodes after they
have been written out.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
This is picking up on Felix's repost of Dave's patch to implement a
.dirty_inode method. We really need this notification because
the VFS keeps writing directly into the inode structure instead
of going through methods to update this state. In addition to
the long-known atime issue we now also have a caller in VM code
that updates c/mtime that way for shared writeable mmaps. And
I found another one that no one has noticed in practice in the FIFO
code.
So implement ->dirty_inode to set i_update_core whenever the
inode gets externally dirtied, and switch the c/mtime handling to
the same scheme we already use for atime (always picking up
the value from the Linux inode).
Note that this patch also removes the xfs_synchronize_atime call
in xfs_reclaim it was superflous as we already synchronize the time
when writing the inode via the log (xfs_inode_item_format) or the
normal buffers (xfs_iflush_int).
In addition also remove the I_CLEAR check before copying the Linux
timestamps - now that we always have the Linux inode available
we can always use the timestamps in it.
Also switch to just using file_update_time for regular reads/writes -
that will get us all optimization done to it for free and make
sure we notice early when it breaks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code
But leave TTM code alone, something is fishy there with global vm_ops
being used.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6
* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
HWPOISON: Enable error_remove_page on btrfs
HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
HWPOISON: Add madvise() based injector for hardware poisoned pages v4
HWPOISON: Enable error_remove_page for NFS
HWPOISON: Enable .remove_error_page for migration aware file systems
HWPOISON: The high level memory error handler in the VM v7
HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
HWPOISON: shmem: call set_page_dirty() with locked page
HWPOISON: Define a new error_remove_page address space op for async truncation
HWPOISON: Add invalidate_inode_page
HWPOISON: Refactor truncate to allow direct truncating of page v2
HWPOISON: check and isolate corrupted free pages v2
HWPOISON: Handle hardware poisoned pages in try_to_unmap
HWPOISON: Use bitmask/action code for try_to_unmap behaviour
HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
HWPOISON: Add poison check to page fault handling
HWPOISON: Add basic support for poisoned pages in fault handler v3
HWPOISON: Add new SIGBUS error codes for hardware poison signals
HWPOISON: Add support for poison swap entries v2
HWPOISON: Export some rmap vma locking to outside world
...
|
|
It's unused.
It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.
It _was_ used in two places at arch/frv for some reason.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
trivial: fix typo in aic7xxx comment
trivial: fix comment typo in drivers/ata/pata_hpt37x.c
trivial: typo in kernel-parameters.txt
trivial: fix typo in tracing documentation
trivial: add __init/__exit macros in drivers/gpio/bt8xxgpio.c
trivial: add __init macro/ fix of __exit macro location in ipmi_poweroff.c
trivial: remove unnecessary semicolons
trivial: Fix duplicated word "options" in comment
trivial: kbuild: remove extraneous blank line after declaration of usage()
trivial: improve help text for mm debug config options
trivial: doc: hpfall: accept disk device to unload as argument
trivial: doc: hpfall: reduce risk that hpfall can do harm
trivial: SubmittingPatches: Fix reference to renumbered step
trivial: fix typos "man[ae]g?ment" -> "management"
trivial: media/video/cx88: add __init/__exit macros to cx88 drivers
trivial: fix typo in CONFIG_DEBUG_FS in gcov doc
trivial: fix missing printk space in amd_k7_smp_check
trivial: fix typo s/ketymap/keymap/ in comment
trivial: fix typo "to to" in multiple files
trivial: fix typos in comments s/DGBU/DBGU/
...
|
|
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
trivial: fix typo "for for" in multiple files
Signed-off-by: Anand Gadiyar <gadiyar@ti.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
Enable removing of corrupted pages through truncation
for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs
These should cover most server needs.
I chose the set of migration aware file systems for this
for now, assuming they have been especially audited.
But in general it should be safe for all file systems
on the data area that support read/write and truncate.
Caveat: the hardware error handler does not take i_mutex
for now before calling the truncate function. Is that ok?
Cc: tytso@mit.edu
Cc: hch@infradead.org
Cc: mfasheh@suse.com
Cc: aia21@cantab.net
Cc: hugh.dickins@tiscali.co.uk
Cc: swhiteho@redhat.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Conflicts:
fs/xfs/linux-2.6/xfs_lrw.c
|
|
fix the following 'make includecheck' warning:
fs/xfs/linux-2.6/xfs_iops.c: xfs_acl.h is included more than once.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
create_proc_read_entry() is getting deprecated.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
Christoph Hellwig says that it is enough for XFS to call
filemap_write_and_wait_range() instead of sync_page_range() because we do
all the metadata syncing when forcing the log.
CC: Felix Blyakher <felixb@sgi.com>
CC: xfs@oss.sgi.com
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
We added the ENOSPC handling patch in xfs_create just after it got mered
with xfs_mkdir. Change the log reservation to the variable for either
the create or mkdir value so it does the right thing if get here for creating
a directory.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
|