1 files changed, 266 insertions, 91 deletions
diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index c7d5d0c7067..5be51fd888b 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -18,42 +18,137 @@ Mount Options
 =============
 
 When mounting an XFS filesystem, the following options are accepted.
-
-  biosize=size
-	Sets the preferred buffered I/O size (default size is 64K).
-	"size" must be expressed as the logarithm (base2) of the
-	desired I/O size.
-	Valid values for this option are 14 through 16, inclusive
-	(i.e. 16K, 32K, and 64K bytes).  On machines with a 4K
-	pagesize, 13 (8K bytes) is also a valid size.
-	The preferred buffered I/O size can also be altered on an
-	individual file basis using the ioctl(2) system call.
-
-  ikeep/noikeep
-	When inode clusters are emptied of inodes, keep them around
-	on the disk (ikeep) - this is the traditional XFS behaviour
-	and is still the default for now.  Using the noikeep option,
-	inode clusters are returned to the free space pool.
+For boolean mount options, the names with the (*) suffix is the
+default behaviour.
+
+  allocsize=size
+	Sets the buffered I/O end-of-file preallocation size when
+	doing delayed allocation writeout (default size is 64KiB).
+	Valid values for this option are page size (typically 4KiB)
+	through to 1GiB, inclusive, in power-of-2 increments.
+
+	The default behaviour is for dynamic end-of-file
+	preallocation size, which uses a set of heuristics to
+	optimise the preallocation size based on the current
+	allocation patterns within the file and the access patterns
+	to the file. Specifying a fixed allocsize value turns off
+	the dynamic behaviour.
+
+  attr2
+  noattr2
+	The options enable/disable an "opportunistic" improvement to
+	be made in the way inline extended attributes are stored
+	on-disk.  When the new form is used for the first time when
+	attr2 is selected (either when setting or removing extended
+	attributes) the on-disk superblock feature bit field will be
+	updated to reflect this format being in use.
+
+	The default behaviour is determined by the on-disk feature
+	bit indicating that attr2 behaviour is active. If either
+	mount option it set, then that becomes the new default used
+	by the filesystem.
+
+	CRC enabled filesystems always use the attr2 format, and so
+	will reject the noattr2 mount option if it is set.
+
+  barrier (*)
+  nobarrier
+	Enables/disables the use of block layer write barriers for
+	writes into the journal and for data integrity operations.
+	This allows for drive level write caching to be enabled, for
+	devices that support write barriers.
+
+  discard
+  nodiscard (*)
+	Enable/disable the issuing of commands to let the block
+	device reclaim space freed by the filesystem.  This is
+	useful for SSD devices, thinly provisioned LUNs and virtual
+	machine images, but may have a performance impact.
+
+	Note: It is currently recommended that you use the fstrim
+	application to discard unused blocks rather than the discard
+	mount option because the performance impact of this option
+	is quite severe.
+
+  grpid/bsdgroups
+  nogrpid/sysvgroups (*)
+	These options define what group ID a newly created file
+	gets.  When grpid is set, it takes the group ID of the
+	directory in which it is created; otherwise it takes the
+	fsgid of the current process, unless the directory has the
+	setgid bit set, in which case it takes the gid from the
+	parent directory, and also gets the setgid bit set if it is
+	a directory itself.
+
+  filestreams
+	Make the data allocator use the filestreams allocation mode
+	across the entire filesystem rather than just on directories
+	configured to use it.
+
+  ikeep
+  noikeep (*)
+	When ikeep is specified, XFS does not delete empty inode
+	clusters and keeps them around on disk.  When noikeep is
+	specified, empty inode clusters are returned to the free
+	space pool.
+
+  inode32
+  inode64 (*)
+	When inode32 is specified, it indicates that XFS limits
+	inode creation to locations which will not result in inode
+	numbers with more than 32 bits of significance.
+
+	When inode64 is specified, it indicates that XFS is allowed
+	to create inodes at any location in the filesystem,
+	including those which will result in inode numbers occupying
+	more than 32 bits of significance. 
+
+	inode32 is provided for backwards compatibility with older
+	systems and applications, since 64 bits inode numbers might
+	cause problems for some applications that cannot handle
+	large inode numbers.  If applications are in use which do
+	not handle inode numbers bigger than 32 bits, the inode32
+	option should be specified.
+
+
+  largeio
+  nolargeio (*)
+	If "nolargeio" is specified, the optimal I/O reported in
+	st_blksize by stat(2) will be as small as possible to allow
+	user applications to avoid inefficient read/modify/write
+	I/O.  This is typically the page size of the machine, as
+	this is the granularity of the page cache.
+
+	If "largeio" specified, a filesystem that was created with a
+	"swidth" specified will return the "swidth" value (in bytes)
+	in st_blksize. If the filesystem does not have a "swidth"
+	specified but does specify an "allocsize" then "allocsize"
+	(in bytes) will be returned instead. Otherwise the behaviour
+	is the same as if "nolargeio" was specified.
 
   logbufs=value
-	Set the number of in-memory log buffers.  Valid numbers range
-	from 2-8 inclusive.
-	The default value is 8 buffers for filesystems with a
-	blocksize of 64K, 4 buffers for filesystems with a blocksize
-	of 32K, 3 buffers for filesystems with a blocksize of 16K
-	and 2 buffers for all other configurations.  Increasing the
-	number of buffers may increase performance on some workloads
-	at the cost of the memory used for the additional log buffers
-	and their associated control structures.
+	Set the number of in-memory log buffers.  Valid numbers
+	range from 2-8 inclusive.
+
+	The default value is 8 buffers.
+
+	If the memory cost of 8 log buffers is too high on small
+	systems, then it may be reduced at some cost to performance
+	on metadata intensive workloads. The logbsize option below
+	controls the size of each buffer and so is also relevant to
+	this case.
 
   logbsize=value
-	Set the size of each in-memory log buffer.
-	Size may be specified in bytes, or in kilobytes with a "k" suffix.
-	Valid sizes for version 1 and version 2 logs are 16384 (16k) and 
-	32768 (32k).  Valid sizes for version 2 logs also include 
-	65536 (64k), 131072 (128k) and 262144 (256k).
-	The default value for machines with more than 32MB of memory
-	is 32768, machines with less memory use 16384 by default.
+	Set the size of each in-memory log buffer.  The size may be
+	specified in bytes, or in kilobytes with a "k" suffix.
+	Valid sizes for version 1 and version 2 logs are 16384 (16k)
+	and 32768 (32k).  Valid sizes for version 2 logs also
+	include 65536 (64k), 131072 (128k) and 262144 (256k). The
+	logbsize must be an integer multiple of the log
+	stripe unit configured at mkfs time.
+
+	The default value for for version 1 logs is 32768, while the
+	default value for version 2 logs is MAX(32768, log_sunit).
 
   logdev=device and rtdev=device
 	Use an external log (metadata journal) and/or real-time device.
@@ -63,10 +158,10 @@ When mounting an XFS filesystem, the following options are accepted.
 	section or contained within it.
 
   noalign
-	Data allocations will not be aligned at stripe unit boundaries.
-
-  noatime
-	Access timestamps are not updated when a file is read.
+	Data allocations will not be aligned at stripe unit
+	boundaries. This is only relevant to filesystems created
+	with non-zero data alignment parameters (sunit, swidth) by
+	mkfs.
 
   norecovery
 	The filesystem will be mounted without running log recovery.
@@ -77,41 +172,86 @@ When mounting an XFS filesystem, the following options are accepted.
 	the mount will fail.
 
   nouuid
-	Don't check for double mounted file systems using the file system uuid.
-	This is useful to mount LVM snapshot volumes.
+	Don't check for double mounted file systems using the file
+	system uuid.  This is useful to mount LVM snapshot volumes,
+	and often used in combination with "norecovery" for mounting
+	read-only snapshots.
 
-  osyncisosync
-	Make O_SYNC writes implement true O_SYNC.  WITHOUT this option,
-	Linux XFS behaves as if an "osyncisdsync" option is used,
-	which will make writes to files opened with the O_SYNC flag set
-	behave as if the O_DSYNC flag had been used instead.
-	This can result in better performance without compromising
-	data safety.
-	However if this option is not in effect, timestamp updates from
-	O_SYNC writes can be lost if the system crashes.
-	If timestamp updates are critical, use the osyncisosync option.
-
-  quota/usrquota/uqnoenforce
+  noquota
+	Forcibly turns off all quota accounting and enforcement
+	within the filesystem.
+
+  uquota/usrquota/uqnoenforce/quota
 	User disk quota accounting enabled, and limits (optionally)
-	enforced.
+	enforced.  Refer to xfs_quota(8) for further details.
 
-  grpquota/gqnoenforce
+  gquota/grpquota/gqnoenforce
 	Group disk quota accounting enabled and limits (optionally)
-	enforced.
+	enforced.  Refer to xfs_quota(8) for further details.
+
+  pquota/prjquota/pqnoenforce
+	Project disk quota accounting enabled and limits (optionally)
+	enforced.  Refer to xfs_quota(8) for further details.
 
   sunit=value and swidth=value
-	Used to specify the stripe unit and width for a RAID device or
-	a stripe volume.  "value" must be specified in 512-byte block
-	units.
-	If this option is not specified and the filesystem was made on
-	a stripe volume or the stripe width or unit were specified for
-	the RAID device at mkfs time, then the mount system call will
-	restore the value from the superblock.  For filesystems that
-	are made directly on RAID devices, these options can be used
-	to override the information in the superblock if the underlying
-	disk layout changes after the filesystem has been created.
-	The "swidth" option is required if the "sunit" option has been
-	specified, and must be a multiple of the "sunit" value.
+	Used to specify the stripe unit and width for a RAID device
+	or a stripe volume.  "value" must be specified in 512-byte
+	block units. These options are only relevant to filesystems
+	that were created with non-zero data alignment parameters.
+
+	The sunit and swidth parameters specified must be compatible
+	with the existing filesystem alignment characteristics.  In
+	general, that means the only valid changes to sunit are
+	increasing it by a power-of-2 multiple. Valid swidth values
+	are any integer multiple of a valid sunit value.
+
+	Typically the only time these mount options are necessary if
+	after an underlying RAID device has had it's geometry
+	modified, such as adding a new disk to a RAID5 lun and
+	reshaping it.
+
+  swalloc
+	Data allocations will be rounded up to stripe width boundaries
+	when the current end of file is being extended and the file
+	size is larger than the stripe width size.
+
+  wsync
+	When specified, all filesystem namespace operations are
+	executed synchronously. This ensures that when the namespace
+	operation (create, unlink, etc) completes, the change to the
+	namespace is on stable storage. This is useful in HA setups
+	where failover must not result in clients seeing
+	inconsistent namespace presentation during or after a
+	failover event.
+
+
+Deprecated Mount Options
+========================
+
+  delaylog/nodelaylog
+	Delayed logging is the only logging method that XFS supports
+	now, so these mount options are now ignored.
+
+	Due for removal in 3.12.
+
+  ihashsize=value
+	In memory inode hashes have been removed, so this option has
+	no function as of August 2007. Option is deprecated.
+
+	Due for removal in 3.12.
+
+  irixsgid
+	This behaviour is now controlled by a sysctl, so the mount
+	option is ignored.
+
+	Due for removal in 3.12.
+
+  osyncisdsync
+  osyncisosync
+	O_SYNC and O_DSYNC are fully supported, so there is no need
+	for these options any more.
+
+	Due for removal in 3.12.
 
 sysctls
 =======
@@ -119,19 +259,24 @@ sysctls
 The following sysctls are available for the XFS filesystem:
 
   fs.xfs.stats_clear		(Min: 0  Default: 0  Max: 1)
-	Setting this to "1" clears accumulated XFS statistics 
+	Setting this to "1" clears accumulated XFS statistics
 	in /proc/fs/xfs/stat.  It then immediately resets to "0".
-  
+
   fs.xfs.xfssyncd_centisecs	(Min: 100  Default: 3000  Max: 720000)
-  	The interval at which the xfssyncd thread flushes metadata
-  	out to disk.  This thread will flush log activity out, and
-  	do some processing on unlinked inodes.
+	The interval at which the filesystem flushes metadata
+	out to disk and runs internal cache cleanup routines.
 
-  fs.xfs.xfsbufd_centisecs	(Min: 50  Default: 100	Max: 3000)
-	The interval at which xfsbufd scans the dirty metadata buffers list.
+  fs.xfs.filestream_centisecs	(Min: 1  Default: 3000  Max: 360000)
+	The interval at which the filesystem ages filestreams cache
+	references and returns timed-out AGs back to the free stream
+	pool.
 
-  fs.xfs.age_buffer_centisecs	(Min: 100  Default: 1500  Max: 720000)
-	The age at which xfsbufd flushes dirty metadata buffers to disk.
+  fs.xfs.speculative_prealloc_lifetime
+		(Units: seconds   Min: 1  Default: 300  Max: 86400)
+	The interval at which the background scanning for inodes
+	with unused speculative preallocation runs. The scan
+	removes unused preallocation from clean inodes and releases
+	the unused space back to the free pool.
 
   fs.xfs.error_level		(Min: 0  Default: 3  Max: 11)
 	A volume knob for error reporting when internal errors occur.
@@ -143,9 +288,9 @@ The following sysctls are available for the XFS filesystem:
 		XFS_ERRLEVEL_HIGH:      5
 
   fs.xfs.panic_mask		(Min: 0  Default: 0  Max: 127)
-	Causes certain error conditions to call BUG(). Value is a bitmask; 
+	Causes certain error conditions to call BUG(). Value is a bitmask;
 	AND together the tags which represent errors which should cause panics:
-	
+
 		XFS_NO_PTAG                     0
 		XFS_PTAG_IFLUSH                 0x00000001
 		XFS_PTAG_LOGRES                 0x00000002
@@ -155,7 +300,7 @@ The following sysctls are available for the XFS filesystem:
 		XFS_PTAG_SHUTDOWN_IOERROR       0x00000020
 		XFS_PTAG_SHUTDOWN_LOGERROR      0x00000040
 
-	This option is intended for debugging only.		
+	This option is intended for debugging only.
 
   fs.xfs.irix_symlink_mode	(Min: 0  Default: 0  Max: 1)
 	Controls whether symlinks are created with mode 0777 (default)
@@ -164,25 +309,55 @@ The following sysctls are available for the XFS filesystem:
   fs.xfs.irix_sgid_inherit	(Min: 0  Default: 0  Max: 1)
 	Controls files created in SGID directories.
 	If the group ID of the new file does not match the effective group
-	ID or one of the supplementary group IDs of the parent dir, the 
-	ISGID bit is cleared if the irix_sgid_inherit compatibility sysctl 
+	ID or one of the supplementary group IDs of the parent dir, the
+	ISGID bit is cleared if the irix_sgid_inherit compatibility sysctl
 	is set.
 
-  fs.xfs.restrict_chown		(Min: 0  Default: 1  Max: 1)
-  	Controls whether unprivileged users can use chown to "give away"
-	a file to another user.
+  fs.xfs.inherit_sync		(Min: 0  Default: 1  Max: 1)
+	Setting this to "1" will cause the "sync" flag set
+	by the xfs_io(8) chattr command on a directory to be
+	inherited by files in that directory.
 
-  fs.xfs.inherit_sync		(Min: 0  Default: 1  Max 1)
-	Setting this to "1" will cause the "sync" flag set 
-	by the chattr(1) command on a directory to be
+  fs.xfs.inherit_nodump		(Min: 0  Default: 1  Max: 1)
+	Setting this to "1" will cause the "nodump" flag set
+	by the xfs_io(8) chattr command on a directory to be
 	inherited by files in that directory.
 
-  fs.xfs.inherit_nodump		(Min: 0  Default: 1  Max 1)
-	Setting this to "1" will cause the "nodump" flag set 
-	by the chattr(1) command on a directory to be
+  fs.xfs.inherit_noatime	(Min: 0  Default: 1  Max: 1)
+	Setting this to "1" will cause the "noatime" flag set
+	by the xfs_io(8) chattr command on a directory to be
 	inherited by files in that directory.
 
-  fs.xfs.inherit_noatime	(Min: 0  Default: 1  Max 1)
-	Setting this to "1" will cause the "noatime" flag set 
-	by the chattr(1) command on a directory to be
+  fs.xfs.inherit_nosymlinks	(Min: 0  Default: 1  Max: 1)
+	Setting this to "1" will cause the "nosymlinks" flag set
+	by the xfs_io(8) chattr command on a directory to be
 	inherited by files in that directory.
+
+  fs.xfs.inherit_nodefrag	(Min: 0  Default: 1  Max: 1)
+	Setting this to "1" will cause the "nodefrag" flag set
+	by the xfs_io(8) chattr command on a directory to be
+	inherited by files in that directory.
+
+  fs.xfs.rotorstep		(Min: 1  Default: 1  Max: 256)
+	In "inode32" allocation mode, this option determines how many
+	files the allocator attempts to allocate in the same allocation
+	group before moving to the next allocation group.  The intent
+	is to control the rate at which the allocator moves between
+	allocation groups when allocating extents for new files.
+
+Deprecated Sysctls
+==================
+
+  fs.xfs.xfsbufd_centisecs	(Min: 50  Default: 100	Max: 3000)
+	Dirty metadata is now tracked by the log subsystem and
+	flushing is driven by log space and idling demands. The
+	xfsbufd no longer exists, so this syctl does nothing.
+
+	Due for removal in 3.14.
+
+  fs.xfs.age_buffer_centisecs	(Min: 100  Default: 1500  Max: 720000)
+	Dirty metadata is now tracked by the log subsystem and
+	flushing is driven by log space and idling demands. The
+	xfsbufd no longer exists, so this syctl does nothing.
+
+	Due for removal in 3.14.