aboutsummaryrefslogtreecommitdiff
path: root/fs/ocfs2/ocfs2_fs.h
AgeCommit message (Collapse)Author
2010-12-22ocfs2: Fix system inodes cache overflow.Tao Ma
When we store system inodes cache in ocfs2_super, we use a array for global system inodes. But unfortunately, the range is calculated wrongly which makes it overflow and pollute ocfs2_super->local_system_inodes. This patch fix it by setting the range properly. The corresponding bug is ossbug1303. http://oss.oracle.com/bugzilla/show_bug.cgi?id=1303 Cc: stable@kernel.org Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-10-15Merge branch 'globalheartbeat-2' of ↵Joel Becker
git://oss.oracle.com/git/smushran/linux-2.6 into ocfs2-merge-window Conflicts: fs/ocfs2/ocfs2.h
2010-10-07ocfs2: Add support for heartbeat=global mount optionSunil Mushran
Adds support for heartbeat=global mount option. It ensures that the heartbeat mode passed matches the one enabled on disk. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2010-10-09ocfs2: Add an incompat feature flag OCFS2_FEATURE_INCOMPAT_CLUSTERINFOSunil Mushran
OCFS2_FEATURE_INCOMPAT_CLUSTERINFO allows us to use sb->s_cluster_info for both userspace and o2cb cluster stacks. It also allows us to extend cluster info to include stack flags. This patch also adds stackflags to sb->s_clusterinfo. It also introduces a clusterinfo flag OCFS2_CLUSTER_O2CB_GLOBAL_HEARTBEAT to denote the enabled global heartbeat mode. This incompat flag can be set/cleared using tunefs.ocfs2 --fs-features. The clusterinfo flag is set/cleared using tunefs.ocfs2 --update-cluster-stack. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2010-09-23ocfs2: Sync inode flags with ext2.Tao Ma
We sync our inode flags with ext2 and define them by hex values. But actually in commit 3669567(4 years ago), all these values are moved to include/linux/fs.h. So we'd better also use them as what ext2 did. So sync our inode flags with ext2 by using FS_*. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10ocfs2: Cache system inodes of other slots.Tao Ma
Durring orphan scan, if we are slot 0, and we are replaying orphan_dir:0001, the general process is that for every file in this dir: 1. we will iget orphan_dir:0001, since there is no inode for it. we will have to create an inode and read it from the disk. 2. do the normal work, such as delete_inode and remove it from the dir if it is allowed. 3. call iput orphan_dir:0001 when we are done. In this case, since we have no dcache for this inode, i_count will reach 0, and VFS will have to call clear_inode and in ocfs2_clear_inode we will checkpoint the inode which will let ocfs2_cmt and journald begin to work. 4. We loop back to 1 for the next file. So you see, actually for every deleted file, we have to read the orphan dir from the disk and checkpoint the journal. It is very time consuming and cause a lot of journal checkpoint I/O. A better solution is that we can have another reference for these inodes in ocfs2_super. So if there is no other race among nodes(which will let dlmglue to checkpoint the inode), for step 3, clear_inode won't be called and for step 1, we may only need to read the inode for the 1st time. This is a big win for us. So this patch will try to cache system inodes of other slots so that we will have one more reference for these inodes and avoid the extra inode read and journal checkpoint. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-18ocfs2: enable discontig block group support.Tao Ma
Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-05-17ocfs2: Add ocfs2_gd_is_discontig.Tao Ma
Add ocfs2_gd_is_discontig so that we can test whether a group descriptor is discontiguous or not. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-04-13ocfs2: ocfs2_group_bitmap_size has to handle old volume.Tao Ma
ocfs2_group_bitmap_size has to handle the case when the volume don't have discontiguous block group support. So pass the feature_incompat in and check it. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-03-26ocfs2: Add suballoc_loc to metadata blocks.Joel Becker
We need a suballoc_loc field on any suballocated block. Define them. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-04-13ocfs2: Allocate discontiguous block groups.Joel Becker
If we cannot get a contiguous region for a block group, allocate a discontiguous one when the filesystem supports it. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-04-13ocfs2: Define data structures for discontiguous block groups.Joel Becker
Defines the OCFS2_FEATURE_INCOMPAT_DISCONTIG_BG feature bit and modifies struct ocfs2_group_desc for the feature. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-05-05ocfs2: increase the default size of local alloc windowsMark Fasheh
I have observed that the current size of 8M gives us pretty poor fragmentation on multi-threaded workloads which do lots of writes. Generally, I can increase the size of local alloc windows and observe a marked decrease in fragmentation, even up and beyond window sizes of 512 megabytes. This makes sense for a couple reasons - larger local alloc means more room for reservation windows. On multi-node workloads the larger local alloc helps as well because we don't have to do window slides as often. Also, I removed the OCFS2_DEFAULT_LOCAL_ALLOC_SIZE constant as it is no longer used and the comment above it was out of date. To test fragmentation, I used a workload which launched 4 threads that did 4k writes into a series of about 140 alternating files. With resv_level=2, and a 4k/4k file system I observed the following average fragmentation for various localalloc= parameters: localalloc= avg. fragmentation 8 48 32 16 64 10 120 7 On larger cluster sizes, the difference is more dramatic. The new default size top out at 256M, which we'll only get for cluster sizes of 32K and above. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-02Ocfs2: Move ocfs2 ioctl definitions from ocfs2_fs.h to newly added ocfs2_ioctl.hTristan Ye
Currently we were adding ioctl cmds/structures for ocfs2 into ocfs2_fs.h which was used for define ocfs2 on-disk layout. That sounds a little bit confusing, and it may be quickly polluted espcially when growing the ocfs2_info_request ioctls afterwards(it will grow i bet). As a result, such OCFS2 IOCs do need to be placed somewhere other than ocfs2_fs.h, a separated ocfs2_ioctl.h will be added to store such ioctl structures and definitions which could also be used from userspace to invoke ioctls call. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-01-25ocfs2: Sync max_inline_data_with_xattr from tools.Tao Ma
In ocfs2-tools, we have added ocfs2_max_inline_data_with_xattr, so add it in the kernel's ocfs2_fs.h. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-12-17ocfs2: replace u8 by __u8 in ocfs2_fs.hColy Li
This patch replaces date type 'u8' with '__u8', which follows the coding style of ocfs2_fs.h, and portable to user space for ocfs2-tools. Signed-off-by: Coly Li <coly.li@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-22ocfs2: Add ioctl for reflink.Tao Ma
The ioctl will take 3 parameters: old_path, new_path and preserve and call vfs_reflink. It is useful when we backport reflink features to old kernels. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22ocfs2: Enable refcount tree support.Tao Ma
Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22ocfs2: Add support for incrementing refcount in the tree.Tao Ma
Given a physical cpos and length, increment the refcount in the tree. If the extent has not been seen before, a refcount record is created for it. Refcount records may be merged or split by this operation. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22ocfs2: Define refcount tree structure.Tao Ma
Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-04-03ocfs2: Enable indexed directoriesMark Fasheh
Since the disk format is finalized, we can set this feature bit in the supported mask. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <Joel.Becker@oracle.com>
2009-04-03ocfs2: Add total entry count to dx_root_blockMark Fasheh
This little bit of extra accounting speeds up ocfs2_empty_dir() dramatically by allowing us to short-circuit the full directory scan. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-04-03ocfs2: Increase max links countMark Fasheh
Since we've now got a directory format capable of handling a large number of entries, we can increase the maximum link count supported. This only gets increased if the directory indexing feature is turned on. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>
2009-04-03ocfs2: Introduce dir free space listMark Fasheh
The only operation which doesn't get faster with directory indexing is insert, which still has to walk the entire unindexed directory portion to find a free block. This patch provides an improvement in directory insert performance by maintaining a singly linked list of directory leaf blocks which have space for additional dirents. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>
2009-04-03ocfs2: Store dir index records inlineMark Fasheh
Allow us to store a small number of directory index records in the ocfs2_dx_root_block. This saves us a disk read on small to medium sized directories (less than about 250 entries). The inline root is automatically turned into a root block with extents if the directory size increases beyond it's capacity. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>
2009-04-03ocfs2: Add a name indexed b-tree to directory inodesMark Fasheh
This patch makes use of Ocfs2's flexible btree code to add an additional tree to directory inodes. The new tree stores an array of small, fixed-length records in each leaf block. Each record stores a hash value, and pointer to a block in the traditional (unindexed) directory tree where a dirent with the given name hash resides. Lookup exclusively uses this tree to find dirents, thus providing us with constant time name lookups. Some of the hashing code was copied from ext3. Unfortunately, it has lots of unfixed checkpatch errors. I left that as-is so that tracking changes would be easier. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>
2009-03-12ocfs2: tweak to get the maximum inline data size with xattrTiger Yang
Replace max_inline_data with max_inline_data_with_xattr to ensure it correct when xattr inlined. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: Enable metadata checksums.Joel Becker
Add OCFS2_FEATURE_INCOMPAT_META_ECC to the list of supported features. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: Checksum and ECC for directory blocks.Joel Becker
Use the db_check field of ocfs2_dir_block_trailer to crc/ecc the dirblocks. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: Add directory block trailers.Mark Fasheh
Future ocfs2 features metaecc and indexed directories need to store a little bit of data in each dirblock. For compatibility, we place this in a trailer at the end of the dirblock. The trailer plays itself as an empty dirent, so that if the features are turned off, it can be reused without requiring a tunefs scan. This code adds the trailer and validates it when the block is read in. [ Mark is the original author, but I reinserted this code before his dir index work. -- Joel ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: Add the on-disk structures for metadata checksums.Joel Becker
Define struct ocfs2_block_check, an 8-byte structure containing a 32bit crc32_le and a 16bit hamming code ecc. This will be used for metadata checksums. Add the structure to free spaces in the various metadata structures. Add the OCFS2_FEATURE_INCOMPAT_META_ECC bit. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: Enable quota accounting on mount, disable on umountJan Kara
Enable quota usage tracking on mount and disable it on umount. Also add support for quota on and quota off quotactls and usrquota and grpquota mount options. Add quota features among supported ones. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: Implementation of local and global quota file handlingJan Kara
For each quota type each node has local quota file. In this file it stores changes users have made to disk usage via this node. Once in a while this information is synced to global file (and thus with other nodes) so that limits enforcement at least aproximately works. Global quota files contain all the information about usage and limits. It's mostly handled by the generic VFS code (which implements a trie of structures inside a quota file). We only have to provide functions to convert structures from on-disk format to in-memory one. We also have to provide wrappers for various quota functions starting transactions and acquiring necessary cluster locks before the actual IO is really started. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: Assign feature bits and system inodes to quota feature and quota filesJan Kara
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-16ocfs2: Add JBD2 compat feature bit.Joel Becker
Define the OCFS2_FEATURE_COMPAT_JBD2 bit in the filesystem header. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10ocfs2: Fix some typos in xattr annotations.Tao Ma
Fix some typos in the xattr annotations. Signed-off-by: Tao Ma <tao.ma@oracle.com> Reported-by: Coly Li <coyli@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Add incompatible flag for extended attributeTiger Yang
This patch adds the s_incompat flag for extended attribute support. This helps us ensure that older versions of Ocfs2 or ocfs2-tools will not be able to mount a volume with xattr support. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Add xattr bucket iteration for large numbers of EAsTao Ma
Ocfs2 breaks up xattr index tree leaves into 4k regions, called buckets. Attributes are stored within a given bucket, depending on hash value. After a discussion with Mark, we decided that the per-bucket index (xe_entry[]) would only exist in the 1st block of a bucket. Likewise, name/value pairs will not straddle more than one block. This allows the majority of operations to work directly on the buffer heads in a leaf block. This patch adds code to iterate the buckets in an EA. A new abstration of ocfs2_xattr_bucket is added. It records the bhs in this bucket and ocfs2_xattr_header. This keeps the code neat, improving readibility. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Add extended attribute supportTiger Yang
This patch implements storing extended attributes both in inode or a single external block. We only store EA's in-inode when blocksize > 512 or that inode block has free space for it. When an EA's value is larger than 80 bytes, we will store the value via b-tree outside inode or block. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: reserve inline space for extended attributeTiger Yang
Add the structures and helper functions we want for handling inline extended attributes. We also update the inline-data handlers so that they properly function in the event that we have both inline data and inline attributes sharing an inode block. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Add the basic xattr disk layout in ocfs2_fs.hTao Ma
Ocfs2 uses a very flexible structure for storing extended attributes on disk. Small amount of attributes are stored directly in the inode block - up to 256 bytes worth. If that fills up, attributes are also stored in an external block, linked to from the inode block. That block can in turn expand to a btree, capable of storing large numbers of attributes. Individual attribute values are stored inline if they're small enough (currently about 80 bytes, this can be changed though), and otherwise are expanded to a btree. The theoretical limit to the size of an individual attribute is about the same as an inode, though the kernel's upper bound on the size of an attributes data is far smaller. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-07-31[PATCH 1/2] ocfs2: Add counter in struct ocfs2_dinode to track journal replaysSunil Mushran
This patch renames the ij_pad to ij_recovery_generation in struct ocfs2_dinode. This will be used to keep count of journal replays after an unclean shutdown. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-07-14ocfs2: Don't snprintf() without a format.Joel Becker
Some system files are per-slot. Their names include the slot number. ocfs2_sprintf_system_inode_name() uses the system inode definitions to fill in the slot number with snprintf(). For global system files, there is no node number, and the name was printed as a format with no arguments. -Wformat-nonliteral and -Wformat-security don't like this. Instead, use a static "%s" format and the name as the argument. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2008-04-18ocfs2: Add the USERSPACE_STACK incompat bit.Joel Becker
The filesystem gains the USERSPACE_STACK incomat bit and the s_cluster_info field on the superblock. When a userspace stack is in use, the name of the stack is stored on-disk for mount-time verification. The "cluster_stack" option is added to mount(2) processing. The mount process needs to pass the matching stack name. If the passed name and the on-disk name do not match, the mount is failed. When using the classic o2cb stack, the incompat bit is *not* set and no mount option is used other than the usual heartbeat=local. Thus, the filesystem is compatible with older tools. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18ocfs2: New slot map formatJoel Becker
The old slot map had a few limitations: - It was limited to one block, so the maximum slot count was 255. - Each slot was signed 16bits, limiting node numbers to INT16_MAX. - An empty slot was marked by the magic 0xFFFF (-1). The new slot map format provides 32bit node numbers (UINT32_MAX), a separate space to mark a slot in use, and extra room to grow. The slot map is now bounded by i_size, not a block. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18ocfs2: Define the contents of the slot_map file.Joel Becker
The slot map file is merely an array of __le16. Wrap it in a structure for cleaner reference. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-01-25ocfs2: Local alloc window size changeable via mount optionSunil Mushran
Local alloc is a performance optimization in ocfs2 in which a node takes a window of bits from the global bitmap and then uses that for all small local allocations. This window size is fixed to 8MB currently. This patch allows users to specify the window size in MB including disabling it by passing in 0. If the number specified is too large, the fs will use the default value of 8MB. mount -o localalloc=X /dev/sdX /mntpoint Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25[PATCH 2/2] ocfs2: Implement group add for online resizeTao Ma
This patch adds the ability for a userspace program to request that a properly formatted cluster group be added to the main allocation bitmap for an Ocfs2 file system. The request is made via an ioctl, OCFS2_IOC_GROUP_ADD. On a high level, this is similar to ext3, but we use a different ioctl as the structure which has to be passed through is different. During an online resize, tunefs.ocfs2 will format any new cluster groups which must be added to complete the resize, and call OCFS2_IOC_GROUP_ADD on each one. Kernel verifies that the core cluster group information is valid and then does the work of linking it into the global allocation bitmap. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25[PATCH 1/2] ocfs2: Add group extend for online resizeTao Ma
This patch adds the ability for a userspace program to request an extend of last cluster group on an Ocfs2 file system. The request is made via ioctl, OCFS2_IOC_GROUP_EXTEND. This is derived from EXT3_IOC_GROUP_EXTEND, but is obviously Ocfs2 specific. tunefs.ocfs2 would call this for an online-resize operation if the last cluster group isn't full. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12ocfs2: Structure updates for inline dataMark Fasheh
Add the disk, network and memory structures needed to support data in inode. Struct ocfs2_inline_data is defined and embedded in ocfs2_dinode for storing inline data. A new inode field, i_dyn_features, is added to facilitate tracking of dynamic inode state. Since it will be used often, we want to mirror it on ocfs2_inode_info, and transfer it via the meta data lvb. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>